rhetoric – Shibani’s blog

Reference: Cotos, E., & Pendar, N. (2016). Discourse classification into rhetorical functions for AWE feedback. CALICO Journal, 33(1), 92.

Background:

Computational techniques can be exploited to provide individualized feedback to learners on writing.
Genre analysis of writing identifies moves (communicative goals) and steps (rhetorical functions that help achieve the goal) [Swales, 1990].
Natural language processing (NLP) and machine learning categorization approaches are widely used to automatically identify discourse structures (e.g., Mover, prior work on IADE).

Purpose:

To develop an automated analysis system, the ‘Research Writing Tutor’ (RWT), that identifies rhetorical structures (moves and steps) in research writing and provides feedback to students.

Method:

Sentence-level analysis – Each sentence is classified into a move and a step within that move.
Data: Introduction sections from 1,020 articles (51 disciplines with 20 articles each), totaling 1,322,089 words.
Annotation Scheme:
- 3 moves, 17 steps – Refer to Table 1 in the original paper for the detailed annotation scheme (based on the CARS model).
- Manual annotation using XML-based markup in the Callisto Workbench.
Supervised learning approach steps:
1. Feature selection:
  - Important features – unigrams, trigrams
  - n-gram feature set contained 5,825 unigrams and 11,630 trigrams for moves, and 27,689 unigrams and 27,160 trigrams for steps.
2. Sentence representation:
  - Each sentence is represented as an n-dimensional vector in the Euclidean space R^n.
  - Boolean representation to indicate presence or absence of feature in sentence.
3. Training classifier:
  - SVM model for classification.
  - 10-fold cross validation.
  - Precision was higher than recall: 70.3% versus 61.2% for the move classifier and 68.6% versus 55% for the step classifier (the objective is to maximize accuracy).
  - The RWT analyzer consists of two cascaded SVMs: a move classifier followed by a step classifier.
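
The Boolean n-gram representation and the two-stage cascade described in the steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the RWT implementation: the whitespace tokenizer, every function name (`ngrams`, `extract_features`, `build_vocabulary`, `to_boolean_vector`, `cascade_predict`), and the idea of passing the predicted move into the step classifier are hypothetical. The paper's feature selection and the SVM training itself are not reproduced; the classifiers are plain callables.

```python
def ngrams(tokens, n):
    """Return all n-grams (as tuples) found in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(sentence):
    """Collect the unigrams and trigrams of a sentence.

    Assumption: a naive lowercase whitespace tokenizer stands in for
    whatever preprocessing the original system uses.
    """
    tokens = sentence.lower().split()
    return set(ngrams(tokens, 1)) | set(ngrams(tokens, 3))


def build_vocabulary(sentences):
    """Map every n-gram seen in the given sentences to a vector index."""
    features = set()
    for sentence in sentences:
        features |= extract_features(sentence)
    return {feat: idx for idx, feat in enumerate(sorted(features))}


def to_boolean_vector(sentence, vocabulary):
    """Represent a sentence as a 0/1 vector: 1 if the feature is present."""
    vector = [0] * len(vocabulary)
    for feat in extract_features(sentence):
        idx = vocabulary.get(feat)
        if idx is not None:  # n-grams outside the vocabulary are ignored
            vector[idx] = 1
    return vector


def cascade_predict(vector, move_classifier, step_classifier):
    """Run two classifiers in sequence: move first, then step.

    Assumption: the step classifier receives the predicted move; how the
    second stage is conditioned in the real system is not stated here.
    """
    move = move_classifier(vector)
    step = step_classifier(vector, move)
    return move, step
```

In use, `build_vocabulary` would be called once on the training sentences, and `to_boolean_vector` on each sentence to be classified; any trained model exposing a predict-style callable could be plugged into `cascade_predict`.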

Results:

Move and step classifiers predict some elements better than others (refer to the paper for detailed results):
- Move 2 was the most difficult to identify (sparse training data).
- Move 1 achieved the best recall (less ambiguous cues).
- 10 out of 17 steps were predicted well.
- Overall move accuracy of 72.6% and step accuracy of 72.9%.
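
For readers less familiar with the metrics quoted in this summary, here is a minimal sketch of how per-class precision and recall and overall accuracy are computed from gold and predicted labels. The function names and any labels passed in are hypothetical, and how the paper aggregates scores across moves or steps is not specified in this summary.

```python
def precision_recall(gold, predicted, label):
    """Precision and recall for one class label.

    precision = TP / (TP + FP), recall = TP / (TP + FN);
    a score is reported as 0.0 when its denominator is zero.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def accuracy(gold, predicted):
    """Fraction of sentences whose predicted label matches the gold label."""
    if not gold:
        raise ValueError("gold must not be empty")
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)
```

A classifier with higher precision than recall, as reported for both RWT classifiers, is one that is usually right when it assigns a label but misses a larger share of the sentences that truly carry it.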

Future Work:

Moving beyond sentence level to incorporate context information and sequence of moves/steps.
Knowledge-based approach for hard-to-identify steps – handwritten rules and patterns.
Voting algorithm using independent analyzers.
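
One simple form a voting algorithm could take is a majority vote over the labels proposed by several independent analyzers. The sketch below is only an assumption about what such a scheme might look like: the paper does not specify the voting rule, and `majority_vote` and its tie-breaking behavior are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label proposed by the most analyzers.

    Ties are broken in favor of the label that appears first in
    `predictions` (an arbitrary choice made for this sketch).
    """
    if not predictions:
        raise ValueError("predictions must not be empty")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label
```

For example, if three analyzers label a sentence as move 1, move 2, and move 1, the combined prediction is move 1.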
