By: Selow
Whenever dating (pacaran) comes up in connection with Islam, we already know the answer. Yup, what else but "not allowed". Those words became a cliché a long time ago, and clichés being what they are, hardly anyone believes them. In fact, many of the very people who preach "no dating" don't even know the reason behind their own call. Maybe that is the main reason this truth stays a mere cliché. Wait, I should clarify what kind of dating I mean. I mean the kind where two people strike a deal supposedly based on love (whatever sort of beast that is, hehehe), attraction, and so on, a deal that usually involves no family ties, let alone religion. Phew, what did I just say... anyway, moving on...
I have dated before, I admit it... that is probably also what pushed me to write this article. I dated once, for a year and a half, and it led to nothing. Whoops, now I'm just venting. Anyway, I have some general reasons ready so that we don't get caught flat-footed when we say "dating is forbidden". Oh, and before I give those reasons, one request: stop dating, but don't stop reading. Okay okay okay...
1. It brings you close to zina
"Well, that depends on the dating, doesn't it..." Hey, don't be so sure, bro. Once two people are attached, saying "I love you", hearts melting for each other, it gets frighteningly easy to slide toward zina. "Babe, come by my boarding room for a sec, I just need to grab my wallet." It happens to be midday and the place is empty, the door swings shut on its own, and somehow what shows up isn't fear but excitement... Uh oh, danger, danger, danger... And if the zina goes all the way to sex, and the sex becomes a habit, then come the sexually transmitted diseases, AIDS... stay on guard... hehe
2. It's a huge waste of money
A huge waste, isn't it? When you hang out, food and drinks are mandatory expenses, and you rarely get away with less than Rp 20,000. That adds up, right? Especially for a boarding-house kid like me... don't believe me, do the math yourself! "Honey, I'm thirsty..." "Okay, hang on, I'll buy you something"... ugh. And don't get it wrong, plenty of girls end up paying too...
3. Your grades drop
I'm not saying that if you don't date your grades automatically improve. Obviously, if you want better grades, you study, hehe. "But can't you date and just study harder?" If someone asks you that, just say: "It's better still to study and not date at all, with nobody suddenly texting you to take them here or there, and no threat of being dumped if you don't, and so on..." Hehe, think up the other comebacks yourself, haha.
4. It's useless
Just look at what dating couples actually talk about. What is it... "Babe, your shirt is nice, your shoes are nice, your bag is nice, your car is nice"... "Babe, where are we going tomorrow? Have you eaten yet?" At best they get as far as gossiping about a neighbor's breakup. And the texts, morning, noon, evening and night, never stray far from "whatcha doing, babe? eaten yet, babe? go to sleep, it's late! showered yet? pooped yet?" Does anyone ever discuss the good of society, politics going ever more off the rails, our shambolic bureaucracy and bureaucrats, education, the national economy, and so on?
5. It causes heartache
Because the bond dating couples make is fundamentally weak (usually only the two of them even know about it), it ends in failure almost 100% of the time. When it fails, one or both sides get hurt; when they're hurt, they may slander each other, attack each other, or even kill themselves... tsk tsk... rather than killing yourself, better to kill mosquitoes, those things drive me crazy... wait, what am I even talking about...
6. It severs silaturahim
Once a couple breaks up, instead of the relationship going back to the way it was, it usually turns into mutual pride, mutual anger, mutual grudges and whatnot. So the silaturahim (the bond of kinship and friendship) is severed, and that too is forbidden in Islam, especially the grudge-holding part.
7. It doesn't fit the rules of Islam
If you look into it, there are a great many verses, hadith, and points of ijma' that ultimately point to dating being forbidden. Once everything we do is grounded in Islam, we immediately see the good in not dating. If you don't believe me, ask an ustadz, a kiai, and so on about it.
All of this traces back to one very fundamental cause, a cause that is deeply systematic. So what is this systematic cause? Wait for the follow-up in my next article, okay...
Hah, oh please... hahaha
Sunday, May 30, 2010
Tuesday, May 11, 2010
VALIDITY OF A GOOD TEST
Validity is certainly the most important single characteristic of a test. If it is not valid, even a reliable test is not worth much. The reason is that a reliable test may not be valid, whereas a valid test is, to some extent, reliable as well. Furthermore, while reliability is an independent statistical concept that has nothing to do with the content of the test, validity is directly related to the content and form of the test. In fact, validity is defined as "the extent to which a test measures what it is supposed to measure". This means that if a test is designed to measure examinees' language ability, it should measure their language ability and nothing else. Otherwise, it will not be a valid test for the purposes intended.
As an example, suppose a test of reading comprehension is given to a student and, on the basis of his test score, it is claimed that the student is very good at listening comprehension. This kind of interpretation is quite invalid. That particular score can be a valid indication of the student's reading comprehension ability; however, the same score will be an invalid indication of the same student's listening comprehension ability. Thus, a test can be valid for one purpose but not for another. In other words, a good test of grammar may be valid for measuring the grammatical ability of the examinees but not for measuring other kinds of abilities. Validity, then, is not an all-or-none phenomenon, but a relative one.
In order to guarantee that a test is valid, it should be evaluated from different dimensions. Every dimension constitutes a different kind of validity and contributes to the total validity of the test. Among many types of validity, three types – face validity, content validity, and criterion-related validity – are considered important. Each will be discussed briefly.
1. Face Validity
Face validity refers to the extent to which the physical appearance of the test corresponds to what it is claimed to measure. For instance, a test of grammar must contain grammatical items and not vocabulary items. Of course, a test of vocabulary may very well measure grammatical ability as well; however, that type of test will not show high face validity. It should be mentioned that face validity is not a very crucial or decisive type of validity. In many cases, a test that does not show high face validity has proven to be highly valid by other criteria. Therefore, teachers and administrators should not be overly concerned about the face validity of their tests. They should, however, be careful about other types of validity.
2. Content Validity
Content validity refers to the correspondence between the content of the test and the content of the materials to be tested. Of course, a test cannot include all the elements of the content to be tested. Nevertheless, the content of the test should be a reasonable and representative sample of the total content to be tested. In order to determine the content validity of a test, a careful examination of the direct correspondence between the content of the test and the materials to be tested is necessary. This is possible by scrutinizing the table of specifications explained in the previous article. Although content validity, like face validity, is determined subjectively, it is nevertheless crucial to the validity of the test. Subjectivity, therefore, should not imply insignificance; it is simply the only way the content validity of a test can be determined.
3. Criterion-Related Validity (Empirical Validity)
Criterion-related validity refers to the correspondence between the results of the test in question and the results obtained from an outside criterion. The outside criterion is usually a measurement device for which the validity is already established. In contrast to face validity and content validity, which are determined subjectively, criterion-related validity is established quite objectively. That is why it is often referred to as empirical validity. Criterion-related validity is determined by correlating the scores on a newly developed test with scores on an already-established test.
As an example, assume that a new test of language proficiency, called ‘ROSHD’ is developed by a group of teachers. In this case, the criterion must also be a language proficiency test. Assume further that ‘TOEFL’ is selected as the outside criterion. In order to determine the criterion-related validity of ‘ROSHD,’ these two tests should be administered to a group of students and the two sets of scores be correlated. The degree of correlation is the validity index of the ‘ROSHD’ test validated against ‘TOEFL.’ It means that to the extent that the two tests correlate, they provide the same information on examinees’ language proficiency.
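To make the procedure concrete, here is a minimal sketch in Python of correlating the two sets of scores. The helper function and the score values are illustrative assumptions only, not part of the original article.

```python
# Minimal sketch: criterion-related validity as the Pearson correlation
# between scores on a new test and scores on an established criterion.
# All score values below are invented for illustration only.
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

new_test_scores  = [52, 61, 70, 45, 88, 73, 66, 59]         # hypothetical 'ROSHD' scores
criterion_scores = [480, 520, 560, 450, 630, 570, 540, 500]  # hypothetical 'TOEFL' scores

print(f"Criterion-related validity index: {pearson_r(new_test_scores, criterion_scores):.2f}")
```

The higher this correlation, the more the new test can be said to provide the same information as the criterion.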
Criterion-related validity is of two major kinds. If a newly developed test is given concurrently with a criterion test, the validity index is called concurrent validity. When the two tests are given within a time interval, however, the correlation between the two sets of scores is called predictive validity. Both concurrent and predictive validity indexes serve the purposes of prediction. The magnitude of the correlation predicts the performance of the examinees on the criterion measure from their performance on the newly developed test, or vice versa. The question may arise as to how the criterion measure itself is validated. The answer, of course, is that it is validated against another valid test, and the question can be traced back to the very first validated test. In this sense, criterion-related validity is a relative concept; that is, a test is valid in comparison to another test which is itself validated against still another test. No matter how tedious validation procedures might be, they are an inevitable part of the test construction process. Test developers must go through the validation process, since face validity and content validity are not sufficient indicators for a test to be considered valid.
RELIABILITY OF A GOOD TEST
Reliability is one of the most important characteristics of all tests in general, and language tests in particular. In fact, an unreliable test is worth nothing. In order to understand the concept of reliability, an example may prove helpful. Suppose a student took a test of grammar comprising one hundred items and received a score of 90. Suppose further that the same student took the same test two days later and got a score of 45. Finally suppose that the same student took the same test for the third time and received a score of 70. What would you think of these scores? What would you think of the student?
Assuming that the student’s knowledge of English cannot go under drastic changes within this short period, the best explanation would be that there must have been something wrong with the test. How would you rely on a test that does not produce consistent scores? How can you make a sound decision on the basis of such test scores? This is the essence of the concept of reliability, i.e., producing consistent scores.
Reliability, then, can be technically defined as “the extent to which a test produces consistent scores at different administrations to the same or similar group of examinees”. If a test produced exactly the same scores at different administrations to the same group, that test would be perfectly reliable. This perfect reliability, nevertheless, does not practically exist in reality. There are many factors influencing test score reliability.
These factors include:
1. the sampling of test tasks;
2. poor student motivation;
3. conditions beyond the test writer's or the examiner's control;
4. a range of other conditions, from the examinees' differing mental and physical states to the precision of the test items and the administration and scoring procedures.
Therefore, reliability is “the extent to which a test produces consistent scores." This means that the higher the extent, the more reliable the test.
Statistically speaking, reliability is represented by the letter "r", whose magnitude fluctuates between zero and one, with zero indicating the minimum and one the maximum degree of test-score reliability. It should be mentioned that "r" is an independent statistical concept: it has nothing to do with the content or the form of the test and deals solely with the scores produced by a test. In fact, one can estimate "r" without having any information about the content of the test. Thus, when one talks about the reliability of a test, one refers to the scores and not to the content or the form of the test.
Having understood the concept of reliability, one should next estimate "r", which requires some statistical competency. In the following section, an attempt is made to explain the procedures for estimating "r" in as non-technical terms as possible. Four methods of estimating reliability will be discussed: (1) test-retest, (2) parallel forms, (3) split-half, and (4) KR-21.
1. Test-Retest Method
As the name implies, in this method a single test is administered to a single group of examinees twice. The first administration is called “test” and the second administration is referred to as “retest”. The correlation between the two sets of scores, obtained from testing and retesting, would determine the magnitude of reliability. Since there is a time interval (usually more than two weeks) between the two administrations, this kind of reliability estimate is also known as “stability of scores over time”.
Although obtaining reliability estimates through test-retest method seems very easy, it has some practical disadvantages. First, it is not very easy to have the same group of examinees available in two different administrations. Second, the time interval creates two obstacles. On the one hand, if it is very short, there might be practice effect as well as memorization effect carried over from the first administration. On the other hand, if the interval is too long, there will be the learning effect, i.e., the examinee’s state of knowledge will not be the same as it was in the first administration. To avoid these problems, other methods of estimating reliability have been developed.
2. Parallel Forms Method
In order to remove some of the problems inherent in the test-retest method, experts have developed the parallel forms method. In this method, two parallel forms of a single test are given to one group of examinees. The correlation between the scores obtained from the two tests is computed to indicate the reliability of the scores. This method has an advantage over the test-retest method in that there is no need for administering the test twice. Thus, the problem of examinees’ knowledge undergoing changes does not exist in this method. Nevertheless, this method has a major shortcoming. That is, constructing two parallel forms of a test is not an easy task.
There are certain logical and statistical criteria that a pair of parallel forms must meet. Therefore, most teachers and test developers avoid this method. Due to the complexity of the task, they prefer to use other methods of estimating reliability.
3. Split-Half Method
In the test-retest method, one group of examinees is needed for two administrations. In the parallel forms method, on the other hand, two forms of a single test are needed. Each of these requirements is considered a disadvantage. To obviate these shortcomings, the split-half method has been developed. In this method, a single form of a test is given to a single group of examinees. Then each examinee's test is split (divided) into two halves. The correlation between the examinees' scores on the first half and on the second half determines the reliability of the test scores. The only problem with this method is how to divide the test items into two halves. The best way is to use odd and even items to form each half, i.e., items numbered 1, 3, 5, 7, etc. constitute the first half, and items numbered 2, 4, 6, 8, etc. form the second half.
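As a rough sketch of what this splitting looks like in practice, the Python fragment below forms the odd and even halves from an invented matrix of right/wrong item scores and correlates them, reusing the pearson_r helper from the earlier validity sketch:

```python
# Minimal sketch of the split-half procedure. Each row holds one examinee's
# item scores (1 = correct, 0 = wrong); these values are hypothetical.
item_scores = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 0, 1],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 1, 1, 1],
]

odd_half  = [sum(row[0::2]) for row in item_scores]  # items 1, 3, 5, 7
even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, 8

# pearson_r is the correlation helper defined in the earlier validity sketch.
print("Split-half reliability estimate:", round(pearson_r(odd_half, even_half), 2))
```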
4. The KR-21 Method
The previously mentioned methods of estimating test score reliability require a statistical procedure called 'correlation'. The majority of teachers and non-professional test developers, however, are not very familiar with statistics. Thus, they may have some problems in using statistical formulas and interpreting the outcome of statistical analyses. To overcome these problems, two statisticians, Kuder and Richardson, developed a series of formulas. One of these formulas is used to estimate test score reliability through simple mathematical operations. The formula is called KR-21, in which K and R refer to the initials of the two statisticians and 21 refers to the number of the formula in the series. This formula is used to estimate the reliability of a single test given to one group of examinees in a single administration. The method only requires testers and teachers to be able to calculate two simple statistical parameters: (1) the mean and (2) the variance. The methods of computing the mean and the variance are explained in almost all introductory statistics books. However, for purposes of clarification, a brief explanation of each parameter is given here; for further information, interested readers should consult statistics books.
(1) The Mean: The mean, commonly known as the average, is the most frequently used concept in statistics. It simply refers to a single score that best represents the scores of a group. If each score is symbolized as X, then the mean (represented by X̄ and read "X bar") is computed by adding up all the Xs and dividing the sum by the number of scores (represented by N). To represent the sum of scores, the Greek letter Σ, read 'sigma', is used in statistics. Thus the statistical formula to compute the mean is:

X̄ = ΣX / N

This simply means: add up all the scores (ΣX) and divide the total by the number of scores (N). A numerical example may be helpful. Consider the scores of fifteen students who took a language test:
98 89 78
97 89 73
95 84 70
93 82 60
90 82 50
To determine the mean score of the test, add the fifteen scores, giving ΣX = 1230. Then divide by N, 15, to give X̄ = 82.
(2) The Variance: The variance, represented by the letter V, refers to the variation of scores around the mean. Although the formula for computing the variance may seem cumbersome, it is not actually difficult:

V = Σ(X − X̄)² / (N − 1)

The formula means to do the following operations:
1. Compute the mean (X̄).
2. Compute the deviation scores by subtracting the mean from each single score (X − X̄).
3. Square every deviation score: (X − X̄)².
4. Add up all the squared deviation scores: Σ(X − X̄)².
5. Divide the result of step 4 by N − 1.
In order to clarify the computational procedures, a numerical example is given below.
Consider the scores of ten subjects on a short grammar test: 3, 2, 3, 4, 5, 5, 5, 6, 6, 8.
To compute the variance we follow the instructions given before:
1. Compute the mean.
2. Compute the deviation scores.
3. Square each deviation score.
X    X̄     X − X̄   (X − X̄)²
3    4.7   -1.7    2.89
2    4.7   -2.7    7.29
3    4.7   -1.7    2.89
4    4.7   -0.7    0.49
5    4.7    0.3    0.09
5    4.7    0.3    0.09
5    4.7    0.3    0.09
6    4.7    1.3    1.69
6    4.7    1.3    1.69
8    4.7    3.3    10.89
4. Add up all the squared deviation scores: Σ(X − X̄)² = 28.10.
5. Divide the result of step 4 by N − 1: 28.10 / 9 ≈ 3.12, so V ≈ 3.12 (a small Python check of this arithmetic follows below).
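For readers who want the machine to do the arithmetic, here is a minimal Python sketch of the same mean-and-variance computation, using the ten hypothetical grammar scores above:

```python
# Minimal sketch: mean and variance (with the N - 1 divisor used in the text)
# for the ten grammar scores from the worked example above.
scores = [3, 2, 3, 4, 5, 5, 5, 6, 6, 8]

n = len(scores)
mean = sum(scores) / n                             # X-bar = 4.7
sum_sq_dev = sum((x - mean) ** 2 for x in scores)  # sum of squared deviations = 28.10
variance = sum_sq_dev / (n - 1)                    # V ≈ 3.12

print(f"mean = {mean}, variance = {variance:.2f}")
```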
Having computed the magnitudes of the mean and the variance, we are now ready to put these values into the KR-21 formula and obtain the reliability of the test scores. The formula is as follows:

r = [K / (K − 1)] × [1 − X̄(K − X̄) / (K × V)]

In this formula, K refers to the number of items in the test, X̄ represents the mean of the test scores, and V is the variance of the test scores. Again, a numerical example follows.
Suppose a one-hundred-item test is administered to a group of students, and the mean and variance are computed to be 65 and 100, respectively. The reliability of the scores is then computed using the KR-21 formula:

r = [100 / 99] × [1 − 65(100 − 65) / (100 × 100)] = (100 / 99) × (1 − 0.2275) ≈ 0.78
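The same calculation can be sketched in a few lines of Python; K, the mean, and the variance are simply the values from the example above:

```python
# Minimal sketch of the KR-21 estimate:
# r = (K / (K - 1)) * (1 - mean * (K - mean) / (K * variance))
K, mean, variance = 100, 65, 100  # values taken from the example above

kr21 = (K / (K - 1)) * (1 - mean * (K - mean) / (K * variance))
print(f"KR-21 reliability ≈ {kr21:.2f}")  # prints ≈ 0.78
```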
The procedure may seem a little complex, but with some practice it proves easy and very useful. This method is especially valuable for those who do not have a strong statistical background. Of the four methods of estimating reliability, the KR-21 method is the most practical and the most commonly used; it is therefore the one recommended to teachers and administrators. Having covered the first characteristic of a good test, i.e., reliability, and the ways of estimating it, the next section is devoted to explaining the second characteristic, namely, validity.