The World in Your Hands: Automated Essay Scoring: Applications to Educational Technology

Thursday, 24 March 2016

The Intelligent Essay Assessor (IEA) is a set of software tools for scoring the quality of essay content. The IEA uses Latent Semantic Analysis (LSA), which is both a computational model of human knowledge representation and a method for extracting semantic similarity of words and passages from text. Simulations of psycholinguistic phenomena show that LSA reflects similarities of human meaning effectively. To assess essay quality, LSA is first trained on domain-representative text. Then student essays are characterized by LSA representations of the meaning of their contained words and compared with essays of known quality on degree of conceptual relevance and amount of relevant content. Over many diverse topics, the IEA scores agreed with human experts as accurately as expert scores agreed with each other. Implications are discussed for incorporating automatic essay scoring in more general forms of educational technology.

Introduction

While writing is an essential part of the educational process, many instructors find it difficult to incorporate large numbers of writing assignments in their courses due to the effort required to evaluate them. However, the ability to convey information verbally is an important educational achievement in its own right, and one that is not sufficiently well assessed by other kinds of tests. In addition, essay-based testing is thought to encourage a better conceptual understanding of the material on the part of students and to reflect a deeper, more useful level of knowledge and application by students. Thus grading and criticizing written products is important not only as an assessment method, but also as a feedback device to help students better learn both content and the skills of thinking and writing. Nevertheless, essays have been neglected in many computer-based assessment applications since there exist few techniques to score essays directly by computer. In this paper we describe a method for performing automated essay scoring of the conceptual content of essays. Based on a statistical approach to analyzing the essays and content information from the domain, the technique can provide scores that prove to be an accurate measure of the quality of essays.

The text analysis underlying the essay grading schemes is based on Latent Semantic Analysis (LSA). Detailed treatments of LSA, both as a theory of aspects of human knowledge acquisition and representation and as a method for extracting the semantic content of text, are beyond the scope of this article. They are fully presented elsewhere (Deerwester, Dumais, Furnas, Landauer, & Harshman, 1990; Landauer & Dumais, 1997; Landauer, Foltz & Laham, 1998), as are a number of simulations of cognitive and psycholinguistic phenomena showing that LSA captures a great deal of the similarity of meanings expressed in discourse (Rehder, Schreiner, Wolfe, Laham, Landauer, & Kintsch, 1998; Wolfe, Schreiner, Rehder, Laham, Foltz, Kintsch, & Landauer, 1998).

The LSA similarity between words and passages is measured by the cosine of the angle between their vectors in a 300-dimensional "semantic space". LSA-measured similarities have been shown to closely mimic human judgments of meaning similarity, and human performance based on such similarity, in a variety of ways. For example, after training on about 2,000 pages of English text, LSA scored as well as average test-takers on the synonym portion of the TOEFL, the ETS Test of English as a Foreign Language (Landauer & Dumais, 1997). After training on an introductory psychology textbook, it achieved passing scores on two different multiple-choice exams used in introductory psychology courses (Landauer, Foltz & Laham, in preparation). This similarity comparison is the basis for automated essay scoring, which works by comparing the similarity of meaning between essays.
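
To make the cosine comparison concrete, here is a minimal sketch in Python with NumPy. It is not the IEA implementation. It assumes raw word counts (real LSA systems typically weight the counts first), a toy four-passage corpus, and 2 dimensions instead of 300. The function names (`build_space`, `embed`, `cosine`, `score_essay`) are hypothetical. The scoring rule, a similarity-weighted average of the grades of the most similar pre-graded essays, is one plausible reading of "compared with essays of known quality", not a method stated in this text.

```python
import numpy as np


def build_space(corpus, k):
    """Build a k-dimensional LSA space from a list of training passages."""
    vocab = sorted({w for doc in corpus for w in doc.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    # Term-by-passage count matrix (rows: words, columns: passages).
    # Assumption: raw counts, with no term weighting applied.
    counts = np.zeros((len(vocab), len(corpus)))
    for j, doc in enumerate(corpus):
        for w in doc.lower().split():
            counts[index[w], j] += 1.0
    # Singular value decomposition; keep the k largest singular directions.
    u, _, _ = np.linalg.svd(counts, full_matrices=False)
    return index, u[:, :k]


def embed(text, index, u_k):
    """Project a passage into the space; words outside the vocabulary are ignored."""
    q = np.zeros(len(index))
    for w in text.lower().split():
        if w in index:
            q[index[w]] += 1.0
    return q @ u_k


def cosine(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def score_essay(essay_vec, graded_vecs, grades, n_neighbors=2):
    """Hypothetical scoring rule: similarity-weighted mean of the nearest grades."""
    sims = np.array([cosine(essay_vec, g) for g in graded_vecs])
    nearest = np.argsort(sims)[::-1][:n_neighbors]
    weights = np.clip(sims[nearest], 0.0, None)  # ignore negative similarities
    if weights.sum() == 0:
        return float(np.mean(grades))  # nothing similar: fall back to the mean
    picked = np.asarray(grades, dtype=float)[nearest]
    return float(np.average(picked, weights=weights))


# Toy "domain-representative" training text (made up for illustration).
corpus = [
    "heart pumps blood through arteries",
    "blood flows through veins to heart",
    "plants use sunlight for photosynthesis",
    "sunlight drives photosynthesis in leaves",
]
index, u_k = build_space(corpus, k=2)

on_topic = embed("blood flows to the heart", index, u_k)
off_topic = embed("leaves need sunlight", index, u_k)
graded_vecs = [embed(corpus[0], index, u_k), embed(corpus[2], index, u_k)]
grades = [5, 1]  # made-up grades for a prompt about the circulatory system

print("on-topic vs. graded essay 0:", cosine(on_topic, graded_vecs[0]))
print("off-topic vs. graded essay 0:", cosine(off_topic, graded_vecs[0]))
print("predicted score, on-topic essay:", score_essay(on_topic, graded_vecs, grades))
```

In this toy corpus the two topics share no words, so the two retained dimensions separate them cleanly. A realistic training corpus would produce graded, not all-or-nothing, similarities.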