Validity is certainly the most important single characteristic of a test. If a test is not valid, it is not worth much, even if it is reliable. The reason is that a reliable test may not be valid, whereas a valid test is to some extent reliable as well. Furthermore, whereas reliability is an independent statistical concept that has nothing to do with the content of the test, validity is directly related to the content and form of the test. In fact, validity is defined as "the extent to which a test measures what it is supposed to measure". This means that if a test is designed to measure examinees’ language ability, it should measure their language ability and nothing else. Otherwise, it will not be a valid test for the intended purposes.
As an example, suppose a test of reading comprehension is given to a student and, on the basis of the test score, it is claimed that the student is very good at listening comprehension. This kind of interpretation is quite invalid. That particular score can be a valid indication of the student’s reading comprehension ability; however, the same score will be an invalid indication of the same student’s listening comprehension ability. Thus, a test can be valid for one purpose but not for another. In other words, a good test of grammar may be valid for measuring the grammatical ability of the examinees but not for measuring other kinds of abilities. Validity, then, is not an all-or-none phenomenon but a relative one.
In order to guarantee that a test is valid, it should be evaluated from different dimensions. Every dimension constitutes a different kind of validity and contributes to the total validity of the test. Among many types of validity, three types – face validity, content validity, and criterion-related validity – are considered important. Each will be discussed briefly.
1. Face Validity
Face validity refers to the extent to which the physical appearance of the test corresponds to what it is claimed to measure. For instance, a test of grammar must contain grammatical items and not vocabulary items. Of course, a test of vocabulary may very well measure grammatical ability as well; however, that type of test will not show high face validity. It should be mentioned that face validity is not a crucial or decisive type of validity. In most cases, a test that does not show high face validity has proven to be highly valid by other criteria. Therefore, teachers and administrators need not be overly concerned about the face validity of their tests. They should, however, be careful about other types of validity.
2. Content Validity
Content validity refers to the correspondence between the content of the test and the content of the materials to be tested. Of course, a test cannot include all the elements of the content to be tested. Nevertheless, the content of the test should be a reasonable and representative sample of the total content to be tested. In order to determine the content validity of a test, a careful examination of the direct correspondence between the content of the test and the materials to be tested is necessary. This is possible through scrutinizing the table of specifications explained in the previous article. Although content validity, like face validity, is determined subjectively, it is crucial for the validity of the test. Subjectivity, therefore, should not be taken to imply insignificance. It is simply the only way that the content validity of a test can be determined.
3. Criterion-Related Validity (Empirical Validity)
Criterion-related validity refers to the correspondence between the results of the test in question and the results obtained from an outside criterion. The outside criterion is usually a measurement device for which the validity is already established. In contrast to face validity and content validity, which are determined subjectively, criterion-related validity is established quite objectively. That is why it is often referred to as empirical validity. Criterion-related validity is determined by correlating the scores on a newly developed test with scores on an already-established test.
As an example, assume that a new test of language proficiency, called ‘ROSHD,’ is developed by a group of teachers. In this case, the criterion must also be a language proficiency test. Assume further that ‘TOEFL’ is selected as the outside criterion. In order to determine the criterion-related validity of ‘ROSHD,’ the two tests should be administered to the same group of students and the two sets of scores should be correlated. The degree of correlation is the validity index of the ‘ROSHD’ test validated against ‘TOEFL.’ This means that, to the extent that the two tests correlate, they provide the same information on examinees’ language proficiency.
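The correlation step in this example can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not name a particular correlation coefficient, so the Pearson product-moment coefficient is assumed here; the function name `pearson_r` is hypothetical; and the score lists are invented numbers, not real ‘ROSHD’ or ‘TOEFL’ results.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return sxy / math.sqrt(sxx * syy)


# Invented scores for six students who took both tests (illustrative only).
roshd_scores = [52, 61, 47, 70, 58, 65]
toefl_scores = [480, 530, 450, 600, 510, 560]

validity_index = pearson_r(roshd_scores, toefl_scores)
print(f"Validity index of the new test against the criterion: {validity_index:.2f}")
```

Each position in the two lists must belong to the same student, since the coefficient is computed over paired scores. A value close to 1 would indicate that the two tests rank the examinees in much the same way.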
Criterion-related validity is of two major kinds. If a newly developed test is given concurrently with a criterion test, the validity index is called concurrent validity. However, when the two tests are given with a time interval between them, the correlation between the two sets of scores is called predictive validity. Both concurrent and predictive validity indexes serve the purposes of prediction. The magnitude of the correlation indicates how well the performance of the examinees on the criterion measure can be predicted from their performance on the newly developed test, or vice versa. The question may arise as to how the criterion measure itself is validated. The answer is that it is validated against another valid test, and the question can be traced back to the very first validated test. In this sense, criterion-related validity is a relative concept. That is, a test is valid in comparison to another test, which is itself validated against still another test. No matter how tedious validation procedures might be, they are an inevitable part of the test construction process. Test developers must go through the validation process, since face validity and content validity are not sufficient indicators for a test to be considered valid.
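The predictive use of the two sets of scores mentioned above can be illustrated with a short Python sketch. It assumes a simple least-squares regression line as the prediction method, which the text does not specify; the function name `fit_line` is hypothetical, and the scores are invented for illustration.

```python
def fit_line(x, y):
    """Least-squares line y = slope * x + intercept for paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("cannot fit a line when all x scores are identical")
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented paired scores: new test first, criterion measure second.
new_test_scores = [52, 61, 47, 70, 58, 65]
criterion_scores = [480, 530, 450, 600, 510, 560]

slope, intercept = fit_line(new_test_scores, criterion_scores)

# Predicted criterion score for an examinee who scored 60 on the new test.
predicted = slope * 60 + intercept
print(f"Predicted criterion score: {predicted:.0f}")
```

The higher the correlation between the two tests, the closer such predictions are likely to be to the scores examinees actually obtain on the criterion measure.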