Reference: Cotos, E., & Pendar, N. (2016). Discourse classification into rhetorical functions for AWE feedback. CALICO Journal, 33(1), 92.
- Computational techniques can be exploited to provide individualized feedback to learners on writing.
- Genre analysis of writing identifies moves (communicative goals) and steps (rhetorical functions that help achieve each goal) [Swales, 1990].
- Natural language processing (NLP) and machine learning classification approaches are widely used to identify discourse structures automatically (e.g., Mover, prior work on IADE).
- Goal: develop an automated analysis system, the ‘Research Writing Tutor’ (RWT), that identifies rhetorical structures (moves and steps) in research writing and provides feedback to students.
- Sentence-level analysis – each sentence is classified into a move and a step within that move.
- Data: Introduction sections from 1,020 articles – 51 disciplines with 20 articles each, 1,322,089 words in total.
- Annotation Scheme:
- 3 moves, 17 steps – see Table 1 in the original paper for the detailed annotation scheme (based on the CARS model).
- Manual annotation with XML-based markup in the Callisto Workbench.
- Supervised learning approach steps:
- Feature selection:
- Important features – unigrams, trigrams
- The n-gram feature set contained 5,825 unigrams and 11,630 trigrams for moves, and 27,689 unigrams and 27,160 trigrams for steps.
- Sentence representation:
- Each sentence is represented as an n-dimensional vector in the Euclidean space R^n, where n is the number of features.
- Boolean representation: each component indicates the presence or absence of a feature in the sentence.
- Training classifier:
- SVM model for classification.
- 10-fold cross validation.
- Precision was higher than recall – 70.3% versus 61.2% for the move classifier and 68.6% versus 55% for the step classifier; the objective is to maximize accuracy.
- The RWT analyzer has two cascaded SVMs – a move classifier followed by a step classifier.
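
The Boolean sentence representation and the cascade described above can be sketched in a few lines. This is only an illustration, not the RWT implementation: the whitespace tokenizer, the unfiltered vocabulary (the paper selects features, this sketch keeps every n-gram), and the function names are assumptions. The two classifiers are passed in as plain callables standing in for trained SVMs, and how the step classifier actually uses the predicted move is not stated in these notes, so passing the move as a second argument is also an assumption.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Gram = Tuple[str, ...]


def ngrams(tokens: Sequence[str], n: int) -> List[Gram]:
    """Return all contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(sentence: str) -> List[Gram]:
    """Unigrams and trigrams of a sentence (naive whitespace tokenizer, an assumption)."""
    tokens = sentence.lower().split()
    return ngrams(tokens, 1) + ngrams(tokens, 3)


def build_vocabulary(sentences: Sequence[str]) -> Dict[Gram, int]:
    """Assign a vector index to every unigram and trigram seen in the corpus.

    Simplification: no feature selection is applied here.
    """
    vocab: Dict[Gram, int] = {}
    for sentence in sentences:
        for gram in features(sentence):
            vocab.setdefault(gram, len(vocab))
    return vocab


def to_boolean_vector(sentence: str, vocab: Dict[Gram, int]) -> List[int]:
    """Encode a sentence as a 0/1 vector: 1 if the feature occurs, else 0."""
    vector = [0] * len(vocab)
    for gram in features(sentence):
        index = vocab.get(gram)
        if index is not None:  # n-grams outside the vocabulary are ignored
            vector[index] = 1
    return vector


def classify_cascaded(
    vector: List[int],
    move_classifier: Callable[[List[int]], int],
    step_classifier: Callable[[List[int], int], int],
) -> Tuple[int, int]:
    """Run the move classifier first, then the step classifier (hypothetical interface)."""
    move = move_classifier(vector)
    step = step_classifier(vector, move)
    return move, step
```

In practice the two callables would wrap trained SVM models; any library that exposes a predict-style function could be plugged in here.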
- Results:
- Move and step classifiers predict some elements better than others (see the paper for detailed results):
- Move 2 was the most difficult to identify (sparse training data).
- Move 1 achieved the best recall (less ambiguous cues).
- 10 out of 17 steps were predicted well.
- Overall move accuracy of 72.6% and step accuracy of 72.9%.
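
For readers less familiar with the metrics quoted in these notes, the following sketch shows how per-label precision and recall are computed from gold and predicted labels. The function name and the toy labels are illustrative assumptions; none of the numbers come from the paper.

```python
from typing import Sequence, Tuple


def precision_recall(
    gold: Sequence[str], predicted: Sequence[str], label: str
) -> Tuple[float, float]:
    """Precision and recall for one label, given parallel gold/predicted lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)  # correct hits
    fp = sum(1 for g, p in pairs if g != label and p == label)  # false alarms
    fn = sum(1 for g, p in pairs if g == label and p != label)  # misses
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data (invented): the classifier rarely predicts M2, so when it does
# it is right (high precision), but it misses most M2 sentences (low recall).
gold = ["M1", "M2", "M2", "M2", "M3"]
predicted = ["M1", "M2", "M1", "M3", "M3"]
m2_precision, m2_recall = precision_recall(gold, predicted, "M2")
```

A classifier that only commits to a label on strong cues shows this pattern: few false alarms, many misses.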
- Moving beyond sentence level to incorporate context information and sequence of moves/steps.
- Knowledge-based approach for hard-to-identify steps – handwritten rules and patterns.
- Voting algorithm using independent analyzers.
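
The notes do not say how the voting algorithm would combine the independent analyzers. One simple possibility, shown here purely as an assumption, is a plain majority vote in which an analyzer may abstain (for example, a rule-based analyzer whose patterns do not match). All names below are hypothetical.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

Analyzer = Callable[[str], Optional[str]]


def vote(
    sentence: str,
    analyzers: Sequence[Analyzer],
    fallback: Optional[str] = None,
) -> Optional[str]:
    """Return the label most analyzers agree on; use `fallback` on a tie or no votes."""
    predictions = [analyzer(sentence) for analyzer in analyzers]
    predictions = [p for p in predictions if p is not None]  # drop abstentions
    if not predictions:
        return fallback
    ranked = Counter(predictions).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return fallback  # top two labels are tied
    return ranked[0][0]


def gap_rule(sentence: str) -> Optional[str]:
    """Toy handwritten rule: a contrast marker suggests Move 2, otherwise abstain."""
    return "M2" if "however" in sentence.lower() else None
```

For instance, `vote("However, little is known.", [gap_rule, lambda s: "M2", lambda s: "M1"])` returns `"M2"`, because two of the three analyzers agree.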