
What is Criterion Validity? Definition, Types and Examples


Imagine a psychologist developing a new questionnaire to measure levels of anxiety among working mothers. To ensure the test is valid, they compare its results with a well-established clinical anxiety assessment. If the new test produces similar results to the established tool, it demonstrates criterion validity—a key measure of how well an assessment reflects real-world observations.  

 

What is criterion validity?

Criterion validity evaluates how accurately a test or assessment measures an abstract concept—such as depression, intelligence, or job performance—by comparing its results to a recognized standard or external benchmark. This is essential in research, as it helps establish whether a test genuinely measures what it claims to.

Because many psychological and social constructs (e.g., mood swings or stress levels) cannot be measured directly, researchers rely on carefully designed tests and instruments to assess them. By analyzing how well test outcomes align with an existing “gold standard” measure, researchers can determine whether their assessment is genuinely valid.

What is the gold standard assessment for criterion validity?

A gold standard is a widely accepted benchmark used for comparison. In the field of research, a theory may suggest that a specific criterion is linked to a particular concept. To ensure accurate validation, the chosen gold standard or criterion must measure the same or a closely related concept.

However, if the gold standard itself is flawed or biased, any measure validated against it may inherit the same issues, making it challenging to achieve actual criterion validity—even if the test itself is otherwise reliable. Common examples of criterion variables include clinical assessments, well-established questionnaires, and other validated tools. 

Types of Criterion Validity 

Criterion validity comes in two primary types, which differ in when the criterion is measured relative to the test: concurrent validity and predictive validity.

  • Concurrent validity: Concurrent validity is determined by comparing test scores with scores on the criterion variable obtained at the same time. This type of validity is useful when a new test or instrument needs to be evaluated against an existing one. It answers the question: “Does the test accurately reflect the current state of the criterion?” For example, scores on a depression test should correlate with currently observed behaviours.
  • Predictive validity: This type of validity focuses on the test’s ability to predict future outcomes or behaviours. It answers the question, “Can the test forecast a future criterion?” For example, a university admission test predicts students’ academic performance in the course.

How is criterion validity measured?  

Criterion validity is assessed by comparing a test with a widely accepted criterion for the construct. This is done statistically, typically by calculating the correlation between test scores and the criterion variable using a correlation coefficient such as Pearson’s r. The coefficient takes a value between -1 and 1 and shows both the strength and the direction of the linear relationship between the two variables. A value of -1 indicates a perfect negative correlation, 0 indicates no linear relationship, and 1 indicates a perfect positive correlation.
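To make the coefficient concrete, here is a minimal Python sketch that computes Pearson’s r directly from its definition. The function name pearson_r and all of the score lists are made up for illustration; they do not come from any real study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of the products of the deviations from each mean
    co_dev = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator: square root of the product of the summed squared deviations
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return co_dev / math.sqrt(ss_x * ss_y)

# Hypothetical scores, invented for illustration only
test_scores = [1, 2, 3, 4, 5]
r_positive = pearson_r(test_scores, [2, 4, 6, 8, 10])  # criterion rises with the test
r_negative = pearson_r(test_scores, [10, 8, 6, 4, 2])  # criterion falls as the test rises
r_none = pearson_r(test_scores, [3, 1, 2, 1, 3])       # no linear relationship

print(r_positive, r_negative, r_none)
```

The three toy data sets are chosen to land on the extremes described above: a perfect positive correlation, a perfect negative correlation, and no linear relationship. Real test data will almost always fall somewhere in between.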

Establishing concurrent validity 

First, identify a well-established criterion. The new measurement technique and the established criterion are then administered at the same time. For example, when a university literature club selects its president, candidates for the post are usually screened.

One of the screening tests may assess the candidates’ debating skills, as rated by the teaching staff of the English department. At the same time, a cross-section of students evaluates the same candidates’ debating skills. Both evaluations are carried out simultaneously.

The scores from the new technique and the established criterion are recorded independently of each other and then analyzed statistically using correlation coefficients. A strong positive correlation between the two indicates good concurrent validity. In the above example, if the student evaluations correlate highly with the teacher evaluations, the student evaluation exhibits good concurrent validity.

Establishing predictive validity 

First, identify the relevant criterion. The measurement technique is then administered to participants. After a specific period, which may be months or years, the outcome is assessed using the chosen criterion. For example, if the criterion is academic performance, data can be collected from the results of a later semester’s examination.

The scores from the measurement technique and the future outcomes are then analyzed statistically using correlation coefficients. A strong positive correlation indicates good predictive validity.

As discussed, determining criterion validity requires a widely accepted standard for comparison. If no such standard is available or documented, criterion validity is difficult to assess. The quality of the criterion variable also matters: a test validated against a flawed criterion cannot be shown to be valid.

Clearly, criterion validity is not a one-size-fits-all concept. It requires careful consideration of the specific context and purpose of the test. Whether you are developing a new psychological assessment, an educational test, or a job performance tool, establishing criterion validity is essential for ensuring that your test truly measures what it claims to measure.  

