Classroom English Language Test


Introduction

The modern globalized economy is characterized by high levels of worker mobility. Individuals travel far from home to improve their living conditions and find better jobs. English is considered to be the most popular and widespread language in the world due to its historical popularity and relative simplicity compared to other languages. It is the language of world trade as well (Unver & Sahin, 2017). Because of this, the role of English teaching, testing, and skill evaluation has become even more important. Testing is considered an integral part of the teaching process, as it provides data about the learner’s levels of growth, difficulties, anxieties, and predispositions towards different styles of learning.

Effective teaching is impossible without an effective testing system to accompany it. Modern teaching programs encompass numerous aims and goals, syllabuses, materials, and methods of teaching. Testing makes it possible to evaluate these frameworks based not only on the achievement of learners but also on the effectiveness of the teaching materials and methods used. The purpose of this paper is to evaluate a classroom test and provide a comprehensive critique based on test items, test techniques, the overall goals behind both parts of the test, and the academic literature on the subject.

Test Descriptions

The test evaluated in the scope of this paper is comprised of two parts. The first part presents four situations, which are similar in that they ask the learner to talk about their personal experiences. It is meant for two students, with the teacher acting as an intermediary rather than an independent party in the dialogue. The situations are relatively simple, allowing the learner to choose a setting or a situation they are most familiar with, or one they could describe better. It is meant to be completed orally.

The second part of the test is to be taken between an interlocutor and a student. Although it is suggested that it is a speech task, the questions can be used for a written exam as well. The majority of the questions are related to Saudi Arabia and require a degree of preparation to answer, as the knowledge required to answer them is not very common. The questions are open-ended, but more complex in nature, as they ask about particular issues and situations, thus not allowing as much leeway and maneuvering for the student.

Aims and Purposes of the Test

The test provided for evaluation in the scope of this paper is a proficiency test. It aims to establish the learner’s readiness to use the language to fulfill a particular communicative role (Unver & Sahin, 2017). Such tests are frequently used to get an overall feel for the taker’s language-speaking capabilities, in the broad sense of the word. They usually involve a dialogue with no pre-determined subjects or parameters, and they define language ability as a relatively stable trait.

Passing or failing a proficiency test makes it possible to make predictions about the student’s language performance in the future. The presented test is a high-stakes test, which is inherently more difficult than standard written tests, in which students have the time to think and prepare their answers. However, there are some differences in goals between the tasks. The student-student part is aimed at facilitating conversations with individuals who may not know English well and at piecing together incomplete information to keep the conversation going (Unver & Sahin, 2017). The student-teacher part, on the other hand, is simpler in that regard, as the teacher speaks the language properly.

Judgment Criteria

Desheng and Varghese (2013) state that all language tests should be evaluated based on their usefulness to determine the level of knowledge of an individual student. They propose a list of criteria for grading the usefulness of language tests, which will be used in the scope of this paper. These criteria are as follows (Desheng & Varghese, 2013):

Reliability: The capability to determine the levels of student knowledge with relative accuracy and stability in the results. If a test cannot deliver accurate results consistently, it cannot be considered reliable.
Validity: One of the most important criteria in testing. It stands for the capability of tests to measure what they are supposed to measure. If the test fails to do so, then it is pointless.
Impact authenticity: The ability to influence both the learner and the teacher in the right direction. The results of the test must be hard to misinterpret.
Interactivity: The number of ways in which a student can approach a particular task or problem. High levels of interactivity mean that the test allows the learner to act as they would in a real conversation. Low levels are more suited for grammar tests.
Practicality: The test should be relatively easy to administer, evaluate, and grade. It should not take too much time, since most English classes have only limited time to dedicate to evaluating each student.

To summarize, the criteria used for this evaluation are as follows: Usefulness = Reliability + Validity + Impact authenticity + Interactivity + Practicality. In addition, the test will be graded based on the language knowledge levels of the students.

Test Evaluation

Language Knowledge Levels

The test is meant for ESL speakers with pre-operational levels of speaking, reading, and writing fluency. Pre-operational levels are characterized by the influence of the native language on speaking and writing patterns, the capability to use basic grammatical structures and sentence patterns, basic vocabulary ranges, relatively slow speech, and relatively high comprehension of common topics (McArthur, Lam-McArthur, & Fontaine, 2018). Responses are typically quick when the subject is familiar, but any deviations from the topic can produce pauses and inadequate responses. Both tasks are simple enough, as answering the questions offered in all scenarios does not require advanced understanding or knowledge. A student should be able to hold a conversation while using only basic sentence structures and vocabulary.

Reliability

Proficiency tests are supposed to provide reliable results on at least one of the following scales: pronunciation, fluency, grammar, and overall comprehensibility. According to Boldt (1992), both tasks presented in the scope of this paper score relatively high in providing accurate results on all four scales. These results were achieved by examining 1,528 students using a double-blinded cross-examination. Two raters were used in the experiment. In all cases, both raters showed high levels of homogeneity in grading results. In 91-95% of cases, grading scores were relatively close to one another. Therefore, the test is reliable, due to being simple and efficient. No significant advantage could be given to either portion of the test.
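The notion of two raters producing scores that are "relatively close to one another" can be illustrated with a simple agreement rate. The following is a minimal sketch rather than the procedure used by Boldt (1992): the function name agreement_rate, the tolerance parameter, and the ten pairs of scores are all hypothetical and were invented purely for illustration.

```python
def agreement_rate(rater_a, rater_b, tolerance=0):
    """Return the share of examinees whose two scores differ by at most `tolerance`."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must score the same examinees.")
    if not rater_a:
        raise ValueError("At least one pair of scores is required.")
    close_pairs = sum(
        1 for a, b in zip(rater_a, rater_b) if abs(a - b) <= tolerance
    )
    return close_pairs / len(rater_a)


# Hypothetical scores on a 1-5 scale; these are NOT data from any study.
rater_a = [3, 4, 2, 5, 4, 3, 1, 4, 5, 2]
rater_b = [3, 4, 3, 5, 2, 3, 1, 4, 4, 2]

exact = agreement_rate(rater_a, rater_b)                 # identical scores only
adjacent = agreement_rate(rater_a, rater_b, tolerance=1)  # scores within one point

print(f"Exact agreement: {exact:.0%}")
print(f"Agreement within one point: {adjacent:.0%}")
```

With these invented scores, the raters agree exactly on seven of ten students and within one point on nine of ten. A simple percentage of this kind does not correct for agreement that occurs by chance, so it should be read only as a rough indicator of rater consistency.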

Validity

The items used in the test seek to assess the communicative capabilities of the student. They were constructed using a structuralist approach. The validity of the method is supported by the view that even if a student knows how to properly construct sentences, utilize grammar, and recognize the sound system of the language, they still will not be able to communicate without communicative competence (Chowdhury & Sultana, 2016). Both tasks are supposed to help evaluate said communicative competence as well as its components:

Grammatical competence. Both items enable the teacher to evaluate a student’s grammatical competence by observing the grammatical structure of their replies. Since the questions offered to the students are open-ended, a student cannot rephrase them into answers and use them as a crutch.
Sociolinguistic competence. Both items put the students in imaginary situations and environments, which suggest different kinds of appropriate language use.
Strategic competence. The student-student item requires a higher level of strategic competence, as it demands the capability to compensate for incomplete or imperfect linguistic resources.

Overall, both items have high validity ratings, as they directly test the communicative competence of the students across all three of its components.

Impact Authenticity

Impact authenticity is heavily dependent on teachers conducting the evaluation. Depending on their grading criteria, the perceived results of student performance could be different. Nevertheless, both items evaluated in the scope of this paper provide plenty of opportunities for the teachers to evaluate and interpret the students’ capabilities of speaking and understanding the language. The presented student-teacher item is superior in that regard due to the relative uniformity of answers in comparison to the student-student test. Both tasks have acceptable levels of impact authenticity.

Interactivity

Both items show relatively high levels of interactivity. The student-student task, however, is more interactive since the taker has the option of choosing the subject to talk about, within the parameters of the question. For example, Prompt #2A asks the student to talk about a day when he or she received good news. It offers a myriad of situations for the student to choose from, including making a situation up in order to use their available vocabulary. In contrast, the student-teacher item is more complicated and less flexible. One of the questions in Set G is “What rules should foreign people follow when they are invited to a wedding in Saudi Arabia?” This question allows for a degree of flexibility for the student, but the bulk of the information required would largely be static. Therefore, the student-student item has a higher degree of interactivity.

Practicality

Astawa, Mantra, and Widiastuti (2017) state that the practicality of tests lies in the ability to apply them in a learning setting. However, the researchers also highlight the importance of creating realistic and feasible test scenarios that could replicate potential situations happening in real life. The student-student tests presented in this paper have high practicality value in both regards. They are relatively short and therefore easy to administer. In addition, they offer helpful and realistic scenarios, such as talking about a restaurant, discussing good and bad news, and other matters. The student-teacher test is also highly practical, though in a different way. It focuses on the student’s cultural heritage and asks them to tell more about their country. Not only does it help reinforce a student’s national identity, but it also provides practice for communication with other cultures.

Recommendations

The test presented in the scope of this paper utilizes tried-and-true methods of student evaluation, which have been used for a very long time. Both items can be improved, however, depending on the situation. The student-student item, while providing plenty of interactivity, also encourages the students to take the path of least resistance and choose a subject they are most proficient in, thus creating a false picture of proficiency (Morrow, 2018). Its prompts should be restricted to particular subjects based on previous assessments, in order to test the students where they are the weakest.

The student-teacher item is more elaborate and provides a plethora of open-ended and closed-ended questions, which could be used to test the student’s speaking capability from all angles. Nevertheless, a good half of these questions require very specific knowledge, which a student may or may not possess. A test should not assume one’s cultural identity based on their country of origin. Some of the questions presented should be changed into more universal ones. It would also make the test applicable to all students, not just the ones coming from Saudi Arabia.

Conclusions

Both the student-student and the student-teacher tests presented in the scope of this paper have scored reasonably high on the five criteria of usefulness: Reliability, Validity, Impact authenticity, Interactivity, and Practicality. The student-teacher part of the test scored higher on impact authenticity and practicality, whereas the student-student item offered greater interactivity. At the same time, the former was noted to be a bit rigid in its formal evaluation of student knowledge, whereas the latter was too liberal, with an emphasis on strategic competence rather than sociolinguistic or grammatical competencies. If these items were used as stand-alone assessment tools, the teachers would be required to account for their limitations. Together, however, they manage to cover for each other’s weaknesses and provide the teachers with a well-rounded tool for student evaluation and analysis.

References

Astawa, I. N., Mantra, I. B. N., & Widiastuti, I. A. M. S. (2017). Developing communicative English language tests for tourism vocational high school students. International Journal of Social Sciences and Humanities, 1(2), 58-64.

Boldt, R. F. (1992). Reliability of the test of spoken English revisited. ETS Research Report Series, 1992(2), 1-22.

Chowdhury, M. A., & Sultana, R. (2016). A study of the validity of English language testing at the higher secondary level in Bangladesh. International Journal of Applied Linguistics & English Literature, 5(6), 64-75.

Desheng, C., & Varghese, A. (2013). Testing and evaluation of language skills. IOSR Journal of Research & Method in Education, 1(2), 31-33.

McArthur, T., Lam-McArthur, J., & Fontaine, L. (2018). The Oxford companion to the English language (2nd ed.). Oxford, UK: Oxford University Press.

Morrow, C. K. (2018). Communicative language testing. New York, NY: Wiley.

Unver, M. M., & Sahin, M. D. (2017). An exploration of the effectiveness of a language proficiency test for student mobility. European Journal of Foreign Language Teaching, 2(2), 158-169.