In the first part of this blog series on everything you need to know about assessments, we covered why employers use assessments, the types of assessments available, and assessment formats.
In part 2, we will cover the use of AI in assessment and the psychometric concepts of validation and fairness.
No discussion of assessments would be complete without mentioning the benefits of using artificial intelligence (AI) as a tool for assessment. The benefit of AI is that the technology can accurately automate tasks that were previously done by people. In the assessment context, this can include tasks such as scoring content, evaluating speech and writing, and making predictions.
Interestingly, when these tasks are completed by humans, there is always some degree of unconscious bias that creeps into the process. This is because we all have different backgrounds and come from different cultures, so we tend to view things through our own unique lens. The use of AI to score and rate assessments, however, can eliminate many of these common human rater biases, such as halo, severity, leniency, central tendency, and similar-to-me effects.
Now that you are an expert on the many types of assessments and the various formats in which they can be administered, it’s time for us to dive deeper into the science and learn a bit about how to evaluate the effectiveness and fairness of these tools. We call this field of study psychometrics: the science of measuring (metrics) psychological traits (psycho). In this discussion we will learn about the critical concepts of validation and fairness when using assessments for candidate selection. We will start with a term that many use, and often use incorrectly: validation.
The most common misconception, even among some more junior I/O psychologists, is that tests and assessments are either valid or not. If you learn only this one key psychometric fact, you will surely have attained a level of expertise in this field: tests are not valid; rather, it is the inference that we make from a test score that is either valid or not. In other words, if we infer from a high score on a test that someone has mastered some domain of knowledge or that they are likely to perform well on the job, and that inference is correct, then we have made a valid decision based on a test score!
Validation evidence answers the question: does the test do what it is supposed to do? Put another way, are the inferences we make from the test scores true? There are a few types of validity evidence that we rely on to answer this question: content, construct, and criterion-related validity evidence.
Content validity evidence is widely used when evaluating knowledge or skills assessments and it involves the degree to which the content of the test matches a content domain associated with the construct. For example, a test of the ability to add two numbers should include a range of combinations of digits. A test with only one-digit numbers, or only even numbers, would not have good coverage of the content domain.
Construct validity evidence is more widely used when evaluating behavioral or personality assessments. Construct validity evidence is usually obtained by examining the correlation of the assessment in question with other scales purported to measure the same construct.
Criterion-related validity evidence can be used to support the hiring decisions made from any type of assessment. This evidence is generally gathered through correlational analyses examining the extent to which test scores correlate with some valued performance outside the test (the criterion) such as supervisory ratings of work performance or achievement of sales goals.
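To make the correlational analysis described above a little more concrete, here is a minimal Python sketch that computes a Pearson correlation between test scores and a criterion measure. The function name `pearson_r` and the sample numbers are invented for illustration only. A real validation study would typically rely on much larger samples and additional statistical checks that are not shown here.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations from each mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: assessment scores and later supervisory ratings
# for the same six employees (made-up numbers, not real results).
test_scores = [55, 62, 70, 74, 81, 90]
supervisor_ratings = [2.9, 3.1, 3.4, 3.3, 4.0, 4.4]

validity_coefficient = pearson_r(test_scores, supervisor_ratings)
print(f"criterion-related validity coefficient: {validity_coefficient:.2f}")
```

The closer the coefficient is to 1, the more strongly higher test scores go along with higher values on the criterion in this sample.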
Another critical factor when evaluating assessments is fairness. In the science of psychometrics, we define fairness as the assessment functioning the same way for everyone, regardless of gender, ethnicity, age, socioeconomic status, geography, or any other demographic characteristic. Aside from the legal ramifications of tests being unfair, or biased, fairness is of paramount concern simply because it’s always important to operate with integrity and do the right thing, so we go to great lengths in this area.
First, when assessments are developed, they should be written by experts, not laymen. Although everyone has good intentions, expert test developers are keenly aware of troublesome areas and do their best to write test questions that are free of bias. These questions are then reviewed by experts for cultural and gender sensitivity and many other factors. Despite this hard work, sometimes even the best test developer will have unconscious cognitive biases that can creep into the assessment. For this reason, new assessments are field-tested and analyzed to better understand how they are functioning. And finally, when the assessments are administered to job candidates, a continuous process of review is employed to ensure that the tests are working as intended.
On the legal front, validation and fairness are key issues for organizations using pre-employment assessments. Specifically, the U.S. Equal Employment Opportunity Commission (EEOC) oversees claims of employment discrimination. If a candidate were to fail a test and be denied a job, they could claim that the test discriminated against them. This is unlikely with an objective test of knowledge and skills; however, if tests are not observably job-related, a legal challenge could result. The company would then conduct an adverse impact study according to the EEOC guidelines. Here is how the EEOC defines adverse impact in assessments using the four-fifths rule:
A selection rate for any race, sex, or ethnic group which is less than four-fifths (4/5) (or eighty percent) of the rate for the group with the highest rate will generally be regarded by the Federal enforcement agencies as evidence of adverse impact, while a greater than four-fifths rate will generally not be regarded by Federal enforcement agencies as evidence of adverse impact.
If no adverse impact is identified, the claim is dismissed. If there is adverse impact, the company must demonstrate that the test is valid (i.e., job-related) and that’s where the validation evidence we discussed comes into the picture.
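The four-fifths rule quoted above comes down to simple arithmetic, which can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `selection_rates` and `four_fifths_check`, the group labels, and the applicant counts are all hypothetical, and an actual adverse impact study usually involves more than this one ratio.

```python
def selection_rates(applicants, selected):
    """Selection rate per group = number selected / number of applicants."""
    return {group: selected[group] / applicants[group] for group in applicants}


def four_fifths_check(applicants, selected, threshold=0.8):
    """Compare each group's selection rate with the highest group's rate.

    A group is flagged when its rate is less than `threshold` (four-fifths)
    of the rate for the group with the highest selection rate.
    """
    rates = selection_rates(applicants, selected)
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no group had any candidates selected")
    report = {}
    for group, rate in rates.items():
        impact_ratio = rate / highest
        report[group] = {
            "rate": rate,
            "impact_ratio": impact_ratio,
            "flagged": impact_ratio < threshold,
        }
    return report


# Hypothetical applicant and selection counts (made-up numbers).
applicants = {"group_a": 100, "group_b": 80, "group_c": 50}
selected = {"group_a": 60, "group_b": 36, "group_c": 25}

report = four_fifths_check(applicants, selected)
for group, row in report.items():
    print(group, f"rate={row['rate']:.2f}", f"ratio={row['impact_ratio']:.2f}",
          "flagged" if row["flagged"] else "ok")
```

In this made-up data, group_a has the highest selection rate (60 of 100, or 0.60). The rate for group_b is 36 of 80, or 0.45, which is 0.75 of the highest rate and therefore falls below the four-fifths threshold, while group_c stays above it.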
At this point in our series, you have a good grasp of some fundamental assessment concepts. You learned the assessment basics in part 1, and the use of AI and psychometrics above. In the next blog, you will learn how the use of valid and fair assessments can impact the bottom line of an organization. Specifically, we will look at how we quantify value through increased worker productivity, known as utility analysis. I’ll then wrap up with what we’re doing at Talview with assessments to provide you with insights into candidates.
Learn more about how Talview Assessments can benefit your company.