A validation framework for an online English language Exit Test: A case study using Moodle as an assessment management system

Research output: Doctoral Thesis



Technology-enhanced language tests are increasingly being hosted on course management systems (CMSs) like Moodle. Despite the increased use of CMS-hosted tests and the rising concerns over the reliability and construct validity of computerised tests due to a potential testing mode effect (Chapelle & Douglas, 2006; Fulcher, 2003), validation research on these tests is lacking. This study therefore seeks to fill this gap with empirical validation research, using a case study of administering and validating a CMS-hosted test. The test was a technology-enhanced English Language Proficiency Exit Test that was hosted on Moodle (hereafter the Moodle-hosted test) and administered to a group of EFL students (N = 207) at Sultan Qaboos University in Oman. The overall aim of the study was to provide a validity argument about using a Moodle-hosted test for its intended purpose by empirically establishing reliability and construct validity evidence. To achieve this aim, a study framework was applied following the principles of the Assessment Use Argument (AUA) framework of Bachman (2005) and Bachman and Palmer (2010). Applying the framework as a pragmatic tool for validation research led to the structuring of an evidence-based argument about test reliability and construct validity, drawing on multiple sources of evidence (Kane, 1992) collected via a mixed-methods design.

The results of the Rasch analysis revealed that a quarter of the test items, which were of the gap-filling type requiring typed responses, were overly difficult and had unacceptably high measurement error values. Although the study outcomes demonstrated warrants of statistically acceptable reliability estimates, two threats to reliability and construct validity were identified: construct irrelevance and construct under-representation. The overly difficult items introduced construct-irrelevant difficulty, as some test takers found the construct difficult and the resulting scores might have been invalidly low. Thirty percent of the test items also had unacceptable fit statistics, suggesting that they did not contribute independently to test reliability and that they assessed student performance inconsistently. Items with unacceptable fit statistics indicated a departure from unidimensionality, as the test might have measured construct-irrelevant sub-dimensions other than the single dimension of language proficiency. Construct under-representation was identified through gaps between item difficulty and person ability measures, suggesting that the test did not capture examinees’ ability levels well. As the difficulty of the items did not match the ability levels of the test takers, the test construct might have been under-represented by the set of items, and better-quality items might be needed to address a range of ability levels. Given this evidence of reliability and construct validity issues, the test scores might not be reliable and valid indicators of the target test construct. Further investigation examined a number of factors that could be potential sources of the reliability and construct validity issues interfering with test performance results in the Moodle-hosted technology-enhanced testing mode.
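
For readers unfamiliar with the Rasch quantities mentioned above, the following is a minimal sketch of the dichotomous Rasch model, the conventional infit and outfit mean-square statistics, and a simple person-item targeting gap. It is an illustration only and not the analysis or software used in the thesis; the function names (`rasch_probability`, `item_fit`, `targeting_gap`) and the toy data are hypothetical, and person and item measures are assumed to be already estimated in logits.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model.

    Both arguments are in logits; the probability is 0.5 when they are equal.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def item_fit(responses, abilities, difficulty):
    """Return (infit, outfit) mean-square statistics for a single item.

    responses: 0/1 scores, one per person (illustrative data).
    abilities: person measures in logits, in the same order.
    """
    squared_residuals = []
    variances = []
    standardized_squares = []
    for score, ability in zip(responses, abilities):
        expected = rasch_probability(ability, difficulty)
        variance = expected * (1.0 - expected)
        squared_residuals.append((score - expected) ** 2)
        variances.append(variance)
        standardized_squares.append((score - expected) ** 2 / variance)
    # Infit: information-weighted, so it is less sensitive to outlying persons.
    infit = sum(squared_residuals) / sum(variances)
    # Outfit: unweighted mean of standardized squared residuals.
    outfit = sum(standardized_squares) / len(standardized_squares)
    return infit, outfit


def targeting_gap(abilities, difficulties):
    """Mean person ability minus mean item difficulty, in logits.

    A large negative value suggests the items are, on average, too hard
    for this group of test takers.
    """
    mean_ability = sum(abilities) / len(abilities)
    mean_difficulty = sum(difficulties) / len(difficulties)
    return mean_ability - mean_difficulty


if __name__ == "__main__":
    # Toy example: low-ability persons succeed and high-ability persons fail,
    # which is the kind of unexpected pattern that inflates fit statistics.
    infit, outfit = item_fit([1, 0], [-2.0, 2.0], difficulty=0.0)
    print(f"infit={infit:.2f}, outfit={outfit:.2f}")
    print(f"gap={targeting_gap([-1.0, 0.0], [1.0, 2.0]):.2f}")
```

Mean-square values near 1 indicate responses consistent with the model, while values well above 1 flag the kind of unexpected response patterns the abstract describes as unacceptable fit. The cut-offs applied in the thesis are not stated here, so none are assumed.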

Based on a comparison of test scores with examinees’ post-test questionnaire responses, the study revealed that test performance was significantly affected by the testing mode due to construct-irrelevant technology-related factors. These were strong rebuttals to the reliability and construct validity claims in the validity argument. The construct-irrelevant technology-related variables that significantly affected test performance included: 1) test takers’ familiarity with and level of experience of technology, familiarity with Moodle tests, and computer literacy; 2) the functionality of headphones during the exam; 3) test takers’ attitudes towards the testing format; 4) the need to type responses for constructed-response test items; and 5) test time sufficiency and the use of a count-down timer. Other construct-irrelevant technology-related issues did not significantly interfere with test performance but were still considered causes for concern: 1) screen layout and scrolling; 2) note-taking and text highlighting features; and 3) eye fatigue. Because negative evidence indicated that the testing mode effect threatened reliability and construct validity and created unfairness or bias issues, it was concluded in the validity argument that decisions based on the Moodle-hosted test scores cannot be justified as either reliable or valid. The research questions were answered in the validity argument based on combined evidence from the study outputs, including test and post-test questionnaire responses. A significant finding from this study was therefore that statistical analysis of test responses alone is insufficient for developing computerised tests that are holistically fit for purpose.

This study contributes knowledge to the field as its findings lay out significant implications and recommendations about the testing mode effect. Practitioners and researchers may wish to adopt these implications and recommendations as guidelines for creating, developing, implementing, and researching reliable and valid large-scale high-stakes tests delivered on Moodle, other course management systems, or any other computerised test delivery tools. To ensure policy-makers are informed about whether using test outcomes can be justifiably fair to students, future validation research studies should be conducted so that potential issues with this testing mode can be further identified and addressed.
Original language: English
Qualification: Doctor of Philosophy
Awarding institution
  • School of Education, The University of Queensland, Australia
  • Hillier, Mathew, Supervisor, External person
  • Iwashita, Noriko, Supervisor, External person
  • Campbell, Chris, Supervisor, External person
Award date: 20 December 2017
Publication status: Published - 20 December 2017

