Notes: Discourse classification into rhetorical functions

Reference: Cotos, E., & Pendar, N. (2016). Discourse classification into rhetorical functions for AWE feedback. calico journal, 33(1), 92.

Background:

  • Computational techniques can be exploited to provide individualized feedback to learners on writing.
  • Genre analysis on writing to identify moves (communicative goal) and steps (rhetorical functions to help achieve the goal) [Swales, 1990].
  • Natural language processing (NLP) and machine learning categorization approach are widely used to automatically identify discourse structures (E.g. Mover, prior work on IADE).

Purpose:

  • To develop an automated analysis system ‘Research Writing Tutor‘ (RWT) for identifying rhetorical structures (moves and steps) from research writing and provide feedback to students.

Method:

  • Sentence level analysis – Each sentence classified to a move, step within the move.
  • Data: Introduction section from 1020 articles – 51 disciplines, each discipline containing 20 articles, total of 1,322,089 words.
  • Annotation Scheme:
    • 3 moves, 17 steps – Refer Table 1 from the original paper for detailed annotation scheme (Based on the CARS model).
    • Manual annotation using XML based markup by the Callisto Workbench.
  • Supervised learning approach steps:
    1. Feature selection:
      • Important features – unigrams, trigrams
      • n-gram feature set contained 5,825 unigrams and 11,630 trigrams for moves, and 27,689 unigrams and 27,160 trigrams for steps.
    2. Sentence representation:
      • Each sentence is represented as a n-dimensional vector in the R^n Euclidean space.
      • Boolean representation to indicate presence or absence of feature in sentence.
    3. Training classifier:
      • SVM model for classification.
      • 10-fold cross validation.
      • precision higher than recall – 70.3% versus 61.2% for the move classifier and 68.6% versus 55% for the step classifier – objective is to maximize accuracy.
      • RWT analyzer has two cascaded SVM – move classifier followed by step classifier.

Results:

  • Move and step classifiers predict some elements better than the others (Refer paper for detailed results):
    • Move 2 most difficult to identify (sparse training data).
    • Move 1 gained best recall- less ambiguous cues.
    • 10 out of 17 steps were predicted well.
    • Overall move accuracy of 72.6% and step accuracy of 72.9%.

Future Work:

  • Moving beyond sentence level to incorporate context information and sequence of moves/steps.
  • Knowledge-based approach for hard to identify steps – hand written rules and patterns.
  • Voting algorithm using independent analyzers.