Understanding Key Terminology in the Education Testing Process: A Comprehensive Guide


Table of Contents

Introduction
Testing Terms
Conclusion
Reference List

Introduction

In any educational setting, testing forms an integral part of the learning process: it is one of the primary methods of ascertaining whether learners have mastered the content taught in class. The learning process is dynamic because learners differ in ability, background, motivation, and experience, so testing can also help to identify areas in which learners are weak. In addition, testing plays an important role in informing parents and guardians of their children’s level of educational performance.

Although most parents and other stakeholders receive results of learners’ performance, these results are sometimes hard to understand because of the terminology used. Therefore, to interpret the results and offer learners the guidance they need, one must know the terms commonly used in testing. In addition to parents, teachers and curriculum developers need to understand these terms, as this understanding can facilitate the testing process. This paper defines and discusses the primary terms used in the education testing process, including age equivalence, stanine, Z-score, standard score, raw score, reliability, percentile rank, and norm-referenced testing (Clark, 2010, pp. 1-2).

Testing Terms

One of the most commonly used terms in testing is age equivalency. Age equivalency means comparing a child’s educational performance with the standard score that learners of a certain age are expected to obtain under normal learning conditions. For example, suppose a twelve-year-old learner obtains a raw score of forty-five on an examination, and forty-five is the score expected of a fourteen-year-old learner. That learner’s age equivalent is then fourteen. Age-equivalent scores help to compare the performance of learners within a certain age group and to establish the level of performance of an individual learner.
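The lookup behind an age-equivalent score can be sketched as follows. This is a minimal illustration that assumes a hypothetical norm table listing the median raw score expected at each age. Only the "forty-five at age fourteen" entry comes from the example above. The other table values and the function name `age_equivalent` are illustrative assumptions. Real norm tables are usually finer-grained, often in years and months.

```python
from typing import Optional

# Hypothetical norm table: median raw score expected of learners at each age.
# Only the age-14 value (45) comes from the example in the text.
NORM_MEDIANS = {12: 38, 13: 42, 14: 45, 15: 49}


def age_equivalent(raw_score: int, norms: dict[int, int] = NORM_MEDIANS) -> Optional[int]:
    """Return the oldest age whose median raw score does not exceed raw_score.

    Returns None when the score is below the median of the youngest age listed.
    """
    matching_ages = [age for age, median in norms.items() if median <= raw_score]
    return max(matching_ages) if matching_ages else None


# A twelve-year-old scoring 45 performs like a typical fourteen-year-old.
print(age_equivalent(45))  # 14
```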

Closely related to age equivalency is chronological age. A learner’s chronological age is the exact time elapsed from the learner’s birth to the date on which the learner takes a specific test. Chronological age is one of the most important tools during assessment because it allows educators to use test results to derive different categories of test scores. Such information is important for educational placement and for formulating curriculum guides. To obtain a learner’s chronological age, one ascertains the exact date of birth and subtracts it from the date on which the learner took the test (McPherson, 2011, p. 1).
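The subtraction described above can be sketched as follows. This is a minimal illustration that reports age in completed years and months. Some score sheets also report days and use their own borrowing conventions, so the years-and-months format, the function name `chronological_age`, and the example dates are all assumptions.

```python
from datetime import date


def chronological_age(birth: date, test_day: date) -> tuple[int, int]:
    """Return a learner's age on the test date as (completed years, completed months)."""
    if test_day < birth:
        raise ValueError("test date precedes date of birth")
    # Whole calendar months elapsed, minus one if the birth day of the month
    # has not yet been reached in the test month.
    months = (test_day.year - birth.year) * 12 + (test_day.month - birth.month)
    if test_day.day < birth.day:
        months -= 1
    return divmod(months, 12)


# Hypothetical learner born 15 March 2011 and tested on 2 November 2023.
print(chronological_age(date(2011, 3, 15), date(2023, 11, 2)))  # (12, 7)
```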

Another term commonly used in testing is criterion-referenced tests. These are tests administered to learners to measure their mastery of specific skills or to ascertain the effectiveness of the adopted teaching programs. They are the most common form of test in most learning scenarios because of the numerous skills and ideas learned within a specific period. Criterion-referenced tests are normally designed to be scored out of one hundred points (Valenzuela, 2010, p. 1).
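A criterion-referenced test compares each learner with a fixed standard rather than with other learners. Scoring therefore reduces to checking a result against a cut-off. The sketch below assumes a 100-point test, as described above. The mastery cut-off of 80 points is a hypothetical value; in practice the test designer sets the criterion.

```python
# Hypothetical mastery cut-off on a 100-point criterion-referenced test.
MASTERY_CUTOFF = 80


def has_mastered(points: int, max_points: int = 100, cutoff: int = MASTERY_CUTOFF) -> bool:
    """Judge mastery against the fixed criterion, independent of other learners' scores."""
    if not 0 <= points <= max_points:
        raise ValueError("points out of range")
    return points >= cutoff


# Each learner is judged on their own score; classmates' results do not change the outcome.
print([has_mastered(p) for p in (92, 80, 65)])  # [True, True, False]
```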

Another term used in assessment is grade-level equivalency. This term indicates how typical learners at a certain grade level perform on an examination. Grade-level performance is the median score of a test’s norm group at that grade. A grade equivalent therefore expresses a score as the grade level and month at which learners in the norm group obtained that median score. For example, suppose seventh-grade students in the norming group were examined in November and their median scale score was 480. The grade equivalent of a scale score of 480 on that examination is then 7.2.

The “7” represents grade seven, and the “2” represents November. Months are counted from September, which marks the start of the school year, so September is .0, October is .1, and November is .2.

Closely related to grade-level equivalency is the term “norm-referenced tests”. These are standardized tests used to compare a learner’s skills with those of other learners in the same age group. Developing a norm-referenced test involves selecting test items and administering them to a group of learners, the norm group. After this norming process, statistical methods and other standardized, highly structured procedures are used to gauge learners’ performance against the norm group. Common examples of norm-referenced tests are the multiple-choice tests used to assess learners’ basic skills (Harvey, 2004, p. 1).
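The grade-equivalent notation described above (the grade, followed by the months elapsed since September) can be sketched as follows. The ten-month, September-to-June school year and the function name `grade_equivalent` are assumptions for illustration. A particular test's norm tables may use a different calendar.

```python
# Assumed ten-month school year; September is counted as month 0.
SCHOOL_MONTHS = [
    "September", "October", "November", "December", "January",
    "February", "March", "April", "May", "June",
]


def grade_equivalent(grade: int, month: str) -> str:
    """Format a grade and testing month as 'grade.month', e.g. '7.2'."""
    offset = SCHOOL_MONTHS.index(month)  # raises ValueError for months outside the school year
    return f"{grade}.{offset}"


# Seventh graders tested in November: the norm-group median maps to 7.2.
print(grade_equivalent(7, "November"))  # 7.2
```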

Another term used in testing is percentile rank. It is the percentage of scores in a frequency distribution that are equal to or below a given score. For example, a score that is equal to or higher than 25% of the scores obtained by a cohort of learners who sat an examination is at the 25th percentile rank. Percentile ranks are commonly used to show what percentage of a group scored at or below a given score (Stockburger, 2009, p. 1). Before any interpretation of learners’ performance can be made, however, raw scores must first be collected.
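The "equal to or below" definition above can be sketched as a simple count. The cohort scores are hypothetical. The sketch counts every tied score as "at or below", following the definition in the text. Some references instead count only half of the tied scores, which gives slightly different ranks.

```python
def percentile_rank(scores: list[float], score: float) -> float:
    """Percentage of scores in the distribution that are equal to or below `score`."""
    if not scores:
        raise ValueError("empty score distribution")
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)


cohort = [55, 60, 62, 70, 75, 80, 85, 90]  # hypothetical raw scores
print(percentile_rank(cohort, 70))  # 50.0
```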

Raw scores are learners’ original scores, which have not been converted into any other form, such as standardized scores. A test used to collect raw scores should be reliable, meaning it should yield the same results when administered to other students at the same learning level. Reliability is the ability of a testing instrument or data-collection method to yield similar outcomes when used repeatedly. A test should also be valid, since validity is one of the primary ways of ensuring that a test achieves its intended goal.
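One common way to put a number on reliability is the test-retest method: the same test is given twice, and the Pearson correlation between the two sets of scores is computed. The sketch below assumes that method and uses hypothetical scores. A value close to 1 indicates that the test ranks learners consistently across administrations. Other reliability estimates exist, such as internal-consistency measures.

```python
import math


def pearson_r(first: list[float], second: list[float]) -> float:
    """Pearson correlation between two administrations of the same test."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_1 = sum(first) / len(first)
    mean_2 = sum(second) / len(second)
    cov = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    ss_1 = sum((a - mean_1) ** 2 for a in first)
    ss_2 = sum((b - mean_2) ** 2 for b in second)
    return cov / math.sqrt(ss_1 * ss_2)


# Hypothetical scores for the same five learners on two sittings of one test.
sitting_1 = [62, 70, 75, 81, 90]
sitting_2 = [60, 72, 74, 83, 88]
print(round(pearson_r(sitting_1, sitting_2), 2))  # 0.98
```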

The validity of a test is the extent to which it assesses all the concepts it is supposed to assess (Key, 1997, p. 1). After raw data have been collected and converted into the desired form, various measures can be applied to them, such as the standard deviation. The standard deviation shows how much the test scores vary from the average. It therefore provides a way of judging how much confidence to place in statistical conclusions (University of Iowa, College of Education, 2011, p. 1).
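As a worked illustration, the mean and standard deviation can be computed with Python's standard `statistics` module. The scores are the class scores listed in the stanine example later in this paper. Treating the class as the whole population (`pstdev`) rather than as a sample (`stdev`) is an assumption of this sketch.

```python
import statistics

# Class scores listed in the stanine example in this paper.
scores = [8, 13, 18, 23, 28, 33, 38, 33, 28]

mean = statistics.mean(scores)
# pstdev treats the class as the whole population; statistics.stdev would give a sample estimate.
sd = statistics.pstdev(scores)
print(f"mean = {mean:.2f}, standard deviation = {sd:.2f}")  # mean = 24.67, standard deviation = 9.43
```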

Using the standard deviation obtained, one can calculate the standard error of measurement (SEM) with the formula $\mathrm{SEM} = S\sqrt{1 - r}$, where $S$ is the standard deviation of the test scores and $r$ is the test’s reliability coefficient. The SEM estimates the amount of measurement error in a learner’s observed test score and is normally used when interpreting individual scores. Learners’ test scores can also be analyzed using standard scores (St. Ambrose University, 2006, p. 1).
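The SEM formula translates directly into code. In the sketch below, the standard deviation of 10, the reliability of 0.91, and the observed score of 70 are hypothetical values. Reporting a band of one SEM on either side of the observed score is shown as one common way of using the result.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = S * sqrt(1 - r), where S is the standard deviation and r the reliability coefficient."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability coefficient must lie between 0 and 1")
    return sd * math.sqrt(1 - reliability)


# Hypothetical test with standard deviation 10 and reliability coefficient 0.91.
sem = standard_error_of_measurement(10, 0.91)
print(round(sem, 2))  # 3.0
# An observed score of 70 can then be reported as a band of one SEM either side.
print(f"{70 - sem:.1f} to {70 + sem:.1f}")  # 67.0 to 73.0
```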

Standard scores express a learner’s test score relative to the performance of a whole group, such as a class. A standard score is obtained by subtracting the group mean from the individual score and dividing the difference by the standard deviation: $Z = \frac{X - \mu}{\sigma}$. In this formula, $X$ is the raw score to be standardized, $\mu$ is the population mean, and $\sigma$ is the standard deviation. To obtain reliable and dependable results, educators should administer standard tests. These are tests that are administered, marked, graded, and interpreted regularly. An example is the continuous assessment test given after the completion of every topic.

There are three types of standard scores: T-scores, stanines, and Z-scores. Z-scores are statistical measures that show how an individual raw score compares with the rest of the score distribution. They indicate how many standard deviations a raw score lies from the overall mean of the test results being analyzed. For example, if the mean of the test scores is 1000 and the standard deviation is 200, a score of 900 has a Z-score of $(900 - 1000)/200 = -0.5$ (Popham, 1999, pp. 9-14; Azzolino, 2005, p. 1).
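The Z-score formula and the worked example above can be sketched as follows. The paper names T-scores but does not define them. The `t_score` helper therefore assumes the conventional T-scale, which has a mean of 50 and a standard deviation of 10.

```python
def z_score(raw: float, mean: float, sd: float) -> float:
    """Number of standard deviations a raw score lies from the mean: Z = (X - mu) / sigma."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw - mean) / sd


def t_score(z: float) -> float:
    """Assumed conventional T-score: the Z-score rescaled to mean 50 and standard deviation 10."""
    return 50 + 10 * z


z = z_score(900, mean=1000, sd=200)
print(z, t_score(z))  # -0.5 45.0
```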

Like a Z-score, a stanine is a standard score; however, a stanine places test scores on a fixed nine-point scale. Unlike Z-scores, which can be negative or positive, stanine values are positive whole numbers ranging from one to nine. To obtain the stanine of a test score, the normal distribution of scores is divided into nine intervals. Each interval is half a standard deviation wide, except the first and last, which extend to the ends of the distribution. The process starts with ranking test scores from lowest to highest. Each score is then assigned a stanine value according to the interval in which it falls. For example, if a class’s test scores are 8, 13, 18, 23, 28, 33, 38, 33, and 28, these values should be distributed between stanines one and nine. In such a scenario, a test score of 13 falls in stanine 2, and a score of 33 falls in stanine 8 (Azzolino, 2005, p. 1).
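The half-standard-deviation intervals described above can be sketched in code. The sketch computes each score's Z-score from the class mean and population standard deviation, then maps it to a band. Stanine 5 is centred on the mean. The scores in the usage line are hypothetical. Assigning stanines from the percentile ranks of a norm group is another common approach, and for small classes the two methods can assign different stanines.

```python
import math
import statistics


def stanine_from_z(z: float) -> int:
    """Map a Z-score to a stanine using half-standard-deviation bands.

    Stanine 5 covers -0.25 <= z < 0.25; each neighbouring band is 0.5 SD wide,
    and stanines 1 and 9 take every score beyond -1.75 and +1.75 respectively.
    """
    return min(9, max(1, math.floor(2 * z + 0.5) + 5))


def stanines(scores: list[float]) -> list[int]:
    """Assign a stanine to every raw score using the class mean and standard deviation."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    return [stanine_from_z((s - mean) / sd) for s in scores]


print(stanines([10, 20, 30, 40, 50]))  # [2, 4, 5, 6, 8]
```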

Conclusion

In conclusion, it is very important for educators to understand the terms used in testing, as these terms are essential tools in any testing process. This is mainly because different learners have different abilities, so educators need to assess each learner’s needs, formulate strategies for meeting them, and offer academic guidance. An understanding of these terms is also important for curriculum developers, who need to match a curriculum’s content with learners’ abilities.

Reference List

Azzolino, A. (2005). A standard normal distribution with nine specific intervals. Web.

Clark, D. (2010). Kirkpatrick’s four level training evaluation models. Web.

Harvey, L. (2004). Analytic quality glossary, quality research international. Web.

Key, J. (1997). Research design in occupational education. Oklahoma State University. Web.

McPherson, K.F. (2011). Chronological age. Web.

Popham, W.J. (1999). Why standardized tests don’t measure educational quality. Educational Leadership, 56(6), 8–15.

Stockburger, D. W. (2009). Introductory statistics: concepts, models, and application; Score transformations. Web.

St. Ambrose University. (2006). Psychological tests and assessment. St. Ambrose University. Web.

University of Iowa, College of Education. (2011). Interpreting test scores. University of Iowa. Web.

Valenzuela, S. (2010). What is criterion-referenced assessment? Web.