

 * Chapter 37 **


 * Directions in automated essay analysis **

by Jill Burstein & Martin Chodorow

Computers have been used to analyze essays since the early 1960s. More recently, computers have scored essays in high-stakes testing, such as college entrance exams, and in lower-stakes Web-based writing practice. Computer systems have been developed for holistic scoring, which assigns a single numerical score to the overall quality of an essay's writing. An important issue for all forms of evaluation is test validity: in this case, how adequately computer-based scoring represents the underlying aspects of the assessment. When people rate essays, they follow a set of scoring criteria based on the writer's organization of ideas, sentence construction, and vocabulary. The challenge for computational linguistics is to develop methods that identify these features of writing and combine them into a single score. The computational analyses of lexical, syntactic, and discourse structure are discussed here in the context of three systems:

WWB – Writer's Workbench
IEA – Intelligent Essay Assessor
E-rater – Electronic Essay Rater

The Writer's Workbench (WWB) is a computer tool designed for text analysis. It analyzes text under three main categories: (1) proofreading, (2) stylistic analysis, and (3) English usage. It also looks for violations of rules that characterize good writing. IEA and E-rater assess essays by comparing them with other essays that have already been judged by human readers. For example, in holistic scoring, E-rater is trained on a few hundred essays that represent the full range of scores assigned by readers. Both IEA and E-rater are used in operational applications. E-rater's assessments agree with those of a human reader approximately 92% of the time.
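WWB's actual rule set is not reproduced here, but a rule-based proofreading pass in the same spirit can be sketched in a few lines. The specific checks below (doubled words, overlong sentences) are illustrative assumptions, not WWB's real rules:

```python
import re

def proofread(text, max_sentence_words=40):
    """Flag doubled words and overlong sentences, in the spirit of
    rule-based proofreading tools such as WWB (illustrative only)."""
    problems = []
    # Doubled words, e.g. "the the" (backreference honors IGNORECASE)
    for m in re.finditer(r"\b(\w+)\s+\1\b", text, flags=re.IGNORECASE):
        problems.append(f"doubled word: {m.group(1)!r}")
    # Very long sentences are a common stylistic warning
    for sentence in re.split(r"[.!?]+\s*", text):
        words = sentence.split()
        if len(words) > max_sentence_words:
            problems.append(f"long sentence ({len(words)} words)")
    return problems
```

Real systems layer many such rules; the point is that each check is a local pattern over the text rather than a comparison against scored essays.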

* Computational Modeling of Writing Components **


* Content Analysis **

Lexical content – overall word frequency – is assessed by WWB, which computes the percentage of a text's words that appear in a list of abstract words drawn from psychological research. Topical content is evaluated by E-rater by comparing the essay's vocabulary with that of manually graded essays, on the assumption that the writers of good essays use similar words. Resemblance is computed by content vector analysis (CVA), which converts each essay into a word-frequency vector. Gaps can occur because CVA fails to recognize synonyms; this problem is addressed by latent semantic analysis (LSA), the technique underlying IEA, which can generalize across synonyms.
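A minimal sketch of content vector analysis – converting texts to word-frequency vectors and comparing them by cosine similarity – illustrates both the technique and its synonym blind spot. E-rater's actual implementation, with its training corpus and feature weighting, is more elaborate:

```python
from collections import Counter
import math

def cosine_similarity(text_a, text_b):
    """Compare two texts as word-frequency vectors (a minimal sketch of
    content vector analysis, not E-rater's actual implementation)."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    shared = set(va) & set(vb)
    dot = sum(va[w] * vb[w] for w in shared)
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0
```

Note that two texts sharing no word forms score 0.0 even when they are synonymous ("car auto" vs. "automobile vehicle") – exactly the gap that LSA is designed to close by generalizing across synonyms.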

* Syntactic Analysis **

Syntactic analysis uses tools such as part-of-speech taggers, which label each word, and parsers, which link words into phrases and sentences. How these tools are applied determines the way syntactic features enter into essay scoring; the analysis relies on text heuristics as well as grammatical structure.
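As a toy illustration of a syntactic feature, one can count subordinate-clause cue words per sentence as a crude proxy for syntactic variety. Both the cue list and the measure itself are illustrative assumptions here, not part of any of the systems described:

```python
import re

# Words that often introduce subordinate clauses (illustrative list only)
SUBORDINATORS = {"because", "although", "while", "since", "whereas",
                 "if", "that", "which", "who"}

def syntactic_variety(text):
    """Crude proxy for syntactic variety: subordinate-clause cues
    per sentence. Real systems use taggers and parsers instead."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    cues = sum(1 for w in text.lower().split()
               if w.strip(".,;") in SUBORDINATORS)
    return cues / len(sentences) if sentences else 0.0
```

A parser-based feature would be far more reliable, but even this shallow count shows how a syntactic property can be reduced to a number that a scoring model can use.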

* Discourse Analysis **

Discourse analysis, as its name suggests, deals with the structure of discourse. E-rater, for example, identifies discourse cue words, the relationships they signal between parts of the text, and other features of discourse.
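Cue-word identification can be sketched with a small lexicon mapping cue phrases to the discourse relations they signal. The lexicon below is an illustrative sample, not E-rater's actual one:

```python
# Illustrative cue-phrase lexicon (not E-rater's actual resource)
CUES = {
    "parallel": ["first", "second", "third", "finally"],
    "contrast": ["however", "on the other hand", "in contrast"],
    "summary": ["in conclusion", "in summary", "to sum up"],
}

def find_discourse_cues(text):
    """Return the discourse relations signalled by cue phrases in the text."""
    lowered = text.lower()
    return sorted({relation
                   for relation, phrases in CUES.items()
                   for phrase in phrases if phrase in lowered})
```

Features like these let a scoring model check, for instance, whether an essay signals an introduction, a contrast of viewpoints, and a conclusion.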

* Assessing Validity **

There are always some differences between predicted and reader-assigned scores. These differences are the basis of R² – //the proportion of variation in writing scores for which the model accounts// – which is used to measure the validity of a holistic scoring model. Validity has both a statistical and a conceptual side: a model is statistically valid if it accounts for much of the variance in reader-assigned scores (even a superficial feature such as essay length can do this), and conceptually valid if its score rests on the aspects of writing that the scoring criteria actually describe. The best choice is a model that balances the two.
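R² can be computed directly from the definition above:

```python
def r_squared(reader_scores, predicted_scores):
    """Proportion of variance in reader-assigned scores for which
    the model's predictions account (R^2)."""
    n = len(reader_scores)
    mean = sum(reader_scores) / n
    ss_total = sum((y - mean) ** 2 for y in reader_scores)
    ss_resid = sum((y - p) ** 2
                   for y, p in zip(reader_scores, predicted_scores))
    return 1 - ss_resid / ss_total
```

A perfect predictor gives 1.0; a model no better than always predicting the mean score gives 0.0. A high R² alone, however, says nothing about conceptual validity.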

* Diagnostic Analysis **

Future development and improvement of the teaching and learning of writing require text analysis that produces instructional feedback. The grammar checkers in current word-processing software, as well as WWB's proofreading and stylistic analyses, serve this purpose to some extent.

E-rater was developed to predict a holistic score, but diagnostic analysis aims to assess more specific information about individual elements of writing. A study by Burstein, Wolff, Breland, and Kubota showed that E-rater could rate individual features of writing in essays and that these ratings could be used for diagnostic feedback and instruction. The study also found that E-rater's feature ratings could agree with a human reader's, sometimes exactly, as in judgments of rhetorical strategy.
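Exact and adjacent agreement – the statistics typically reported when comparing machine and human ratings – can be computed as follows. This is a generic sketch, not the study's own analysis code:

```python
def agreement_rates(human, machine):
    """Exact and adjacent (within one point) agreement between two
    raters, as commonly reported for automated essay scoring."""
    n = len(human)
    exact = sum(1 for h, m in zip(human, machine) if h == m)
    adjacent = sum(1 for h, m in zip(human, machine) if abs(h - m) <= 1)
    return exact / n, adjacent / n
```

Adjacent agreement is reported because two trained human readers also frequently differ by one point on a typical 6-point scale.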


* Feedback in the Form of Summaries **

Automated text summarization techniques can be used to generate text abstracts. According to Marcu (1999), salient summaries can be generated automatically from a text on the basis of its rhetorical structure. Moreover, E-rater's lexical content analysis of such noise-reduced summaries performs comparably to analysis of the full-text versions (Burstein and Marcu 2000b). Essay summaries could be used in two ways: (1) refinement of the technique could generate essay outlines, and (2) the information in the summaries could be used to evaluate the content of subject-based essays.
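Marcu's summarizer ranks text spans by their place in the rhetorical structure; as a much simpler stand-in, a frequency-based extractive summarizer conveys the general idea of selecting salient sentences:

```python
import re
from collections import Counter

def _words(s):
    return re.findall(r"[a-z']+", s.lower())

def extract_summary(text, n_sentences=1):
    """Pick the sentences whose words are most frequent in the text
    (a simple extractive stand-in; Marcu's method uses rhetorical
    structure rather than word frequency)."""
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freqs = Counter(_words(text))

    def score(sentence):
        ws = _words(sentence)
        return sum(freqs[w] for w in ws) / len(ws) if ws else 0.0

    return sorted(sentences, key=score, reverse=True)[:n_sentences]
```

Sentences built from the essay's most frequent vocabulary rank highest, so the extract tends to retain the topical core that a content analysis would then operate on.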
* Future Directions **

E-rater can also score domain-specific prompts with reliability comparable to that achieved by human readers (Boodoo and Burstein, in press). Future research should continue to raise the validity of automated scoring, so that computer-based methods of essay analysis keep pace with the educational goals of writing instruction and so that systems can appropriately represent the underlying concepts of writing assessment.