Tools for automated rhetorical analysis of academic writing

Alert – Long post!

In this post, I’m presenting a summary of my review of tools for automatically analyzing rhetorical structure in academic writing.

The tools considered are designed to cater to different users and purposes. AWA and RWT aim to provide feedback for improving students’ academic writing; Mover and SAPIENTA, on the other hand, are meant to help researchers identify the structure of research articles. ‘Mover’ even allows users to give a second opinion on the classification of moves and to add new training data (which can lead to a less accurate model if students with less expertise add potentially wrong training data). However, these tools share a common thread and fulfill the following criteria:

  • They look at scientific text – full research articles, abstracts or introductions. Tools that automate argumentative zoning of other open text (Example) are not considered.
  • They automate the identification of rhetorical structures (zones, moves) in research articles (RA), with the sentence as the unit of analysis.
  • They are broadly based on the Argumentative Zoning (AZ) scheme by Simone Teufel or the CARS model by John Swales (either the original scheme or a modified version of it).

Tools (in alphabetical order):

  1. Academic Writing Analytics (AWA) – Summary notes here

AWA also has a reflective parser to give feedback on students’ reflective writing, but the focus of this post is on the analytical parser. AWA demo, video courtesy of Dr. Simon Knight:

  2. Mover – Summary notes here

Available for download as a standalone application. Sample screenshot below:

[Screenshot: AntMover]

  3. Research Writing Tutor (RWT) – Summary notes here

RWT demo, video courtesy of Dr. Elena Cotos:

  4. SAPIENTA – Summary notes here.

Available for download as a standalone Java application, or it can be accessed as a web service. Sample screenshot of tagged output from the SAPIENTA web service below:

[Screenshot: SAPIENTA tagged output]

Annotation Scheme:

The schemes are generally intended to apply to all academic writing, and this has been tested successfully on data from different disciplines. The schemes used by the tools are compared below (a small illustrative example of sentence-level labelling follows the comparison):

  • AWA – AWA Analytical scheme (modified from AZ for sentence-level parsing):
    • Summarizing
    • Background knowledge
    • Contrasting ideas
    • Novelty
    • Significance
    • Surprise
    • Open question
    • Generalizing
  • Mover – Modified CARS model (three main moves with further steps):
    • 1. Establish a territory
      • Claim centrality
      • Generalize topics
      • Review previous research
    • 2. Establish a niche
      • Counter-claim
      • Indicate a gap
      • Raise questions
      • Continue a tradition
    • 3. Occupy the niche
      • Outline purpose
      • Announce research
      • Announce findings
      • Evaluate research
      • Indicate RA structure
  • RWT – Modified CARS model (3 moves, 17 steps):
    • Move 1. Establishing a territory
      • 1. Claiming centrality
      • 2. Making topic generalizations
      • 3. Reviewing previous research
    • Move 2. Identifying a niche
      • 4. Indicating a gap
      • 5. Highlighting a problem
      • 6. Raising general questions
      • 7. Proposing general hypotheses
      • 8. Presenting a justification
    • Move 3. Addressing the niche
      • 9. Introducing present research descriptively
      • 10. Introducing present research purposefully
      • 11. Presenting research questions
      • 12. Presenting research hypotheses
      • 13. Clarifying definitions
      • 14. Summarizing methods
      • 15. Announcing principal outcomes
      • 16. Stating the value of the present research
      • 17. Outlining the structure of the paper
  • SAPIENTA – Finer-grained AZ scheme (CoreSC scheme with 11 categories in the first layer):
    • Background (BAC)
    • Hypothesis (HYP)
    • Motivation (MOT)
    • Goal (GOA)
    • Object (OBJ)
    • Method (MET)
    • Model (MOD)
    • Experiment (EXP)
    • Observation (OBS)
    • Result (RES)
    • Conclusion (CON)
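
To make the sentence-level unit of analysis concrete, here is a tiny illustrative example of sentences paired with CoreSC-style labels. The sentences and the (sentence, label) layout are my own invention for illustration; they do not reproduce SAPIENTA’s (or any other tool’s) actual input or output format.

```python
# Illustrative only: sentence-level annotation with CoreSC-style labels.
# The sentences and this (sentence, label) layout are invented for this
# example; they are not the real format used by any of the tools above.
annotated_sentences = [
    ("Rhetorical structure has long been studied in academic writing.", "BAC"),   # Background
    ("We hypothesise that move cues generalise across disciplines.", "HYP"),      # Hypothesis
    ("Our goal is to label every sentence with a rhetorical category.", "GOA"),   # Goal
    ("Sentences were manually annotated to train a supervised classifier.", "MET"),  # Method
    ("The classifier assigns one of 11 categories to each unseen sentence.", "RES"),  # Result
]

for sentence, label in annotated_sentences:
    print(f"[{label}] {sentence}")
```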

Method:

The tools are built on different data sets and use different methods to automate the analysis. Most of them use manually annotated data as a gold standard for training a model to classify the categories automatically. Details below (a simplified sketch of such a supervised pipeline follows this comparison):

  • AWA – Data type: any research writing. Automation method: rule-based NLP – the Xerox Incremental Parser (XIP) annotates rhetorical functions in discourse.
  • Mover – Data type: abstracts. Automation method: supervised learning – a Naïve Bayes classifier with data represented as a bag of clusters with location information.
  • RWT – Data type: introductions. Automation method: supervised learning using a Support Vector Machine (SVM) with an n-dimensional vector representation and n-gram features.
  • SAPIENTA – Data type: full articles. Automation method: supervised learning using an SVM with sentence aspect features, plus sequence labelling with Conditional Random Fields (CRF) for sentence dependencies.
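
To give a feel for the supervised approaches above, here is a minimal sketch of a Mover-style move classifier trained on manually labelled sentences. It uses plain bag-of-words features with a Naïve Bayes classifier (Mover itself uses a bag of clusters with location information); the toy data and settings are my assumptions, not any tool’s actual implementation.

```python
# Minimal sketch of a supervised rhetorical-move classifier (Mover-style).
# Requires scikit-learn. Toy sentences and labels are illustrative only;
# plain bag-of-words stands in for Mover's bag-of-clusters features.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_sentences = [
    "Previous studies have examined rhetorical structure in abstracts.",
    "Research on academic writing has received considerable attention.",
    "However, little attention has been paid to interdisciplinary corpora.",
    "This paper proposes a sentence-level classifier for rhetorical moves.",
]
train_moves = [
    "Establish a territory",
    "Establish a territory",
    "Establish a niche",
    "Occupy the niche",
]

# Bag-of-words features + Naive Bayes: a classic text-categorization baseline.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_sentences, train_moves)

print(model.predict(["Few studies have considered full research articles."]))
```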

Others:

  • The SciPo tool helps students write summaries and introductions for scientific texts in Portuguese.
  • Another tool, CARE, is a word concordancer used to search for words and moves in research abstracts – summary notes here.
  • A machine learning approach considering three different schemes for annotating scientific abstracts (no tool).

If you think I’ve missed a tool which does similar automated tagging in research articles, do let me know so I can include it in my list 🙂

Notes: Discourse classification into rhetorical functions

Reference: Cotos, E., & Pendar, N. (2016). Discourse classification into rhetorical functions for AWE feedback. CALICO Journal, 33(1), 92.

Background:

  • Computational techniques can be exploited to provide individualized feedback to learners on writing.
  • Genre analysis of writing identifies moves (communicative goals) and steps (rhetorical functions that help achieve the goal) [Swales, 1990].
  • Natural language processing (NLP) and machine learning categorization approaches are widely used to automatically identify discourse structures (e.g. Mover, prior work on IADE).

Purpose:

  • To develop an automated analysis system, ‘Research Writing Tutor’ (RWT), that identifies rhetorical structures (moves and steps) in research writing and provides feedback to students.

Method:

  • Sentence-level analysis – each sentence is classified into a move and a step within that move.
  • Data: introduction sections from 1,020 articles – 51 disciplines, each containing 20 articles, for a total of 1,322,089 words.
  • Annotation Scheme:
    • 3 moves, 17 steps – refer to Table 1 in the original paper for the detailed annotation scheme (based on the CARS model).
    • Manual annotation using XML-based markup in the Callisto Workbench.
  • Supervised learning approach steps:
    1. Feature selection:
      • Important features – unigrams, trigrams
      • n-gram feature set contained 5,825 unigrams and 11,630 trigrams for moves, and 27,689 unigrams and 27,160 trigrams for steps.
    2. Sentence representation:
      • Each sentence is represented as an n-dimensional vector in the R^n Euclidean space.
      • A Boolean representation indicates the presence or absence of each feature in the sentence.
    3. Training classifier:
      • SVM model for classification.
      • 10-fold cross validation.
      • Precision was higher than recall – 70.3% versus 61.2% for the move classifier and 68.6% versus 55% for the step classifier – as the objective was to maximize accuracy.
      • The RWT analyzer uses two cascaded SVMs – a move classifier followed by a step classifier (a simplified sketch of this cascade follows the list below).
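
A minimal sketch of the cascade described above, assuming scikit-learn: boolean unigram and trigram features feed a linear SVM for moves, and a second SVM stands in for the step classifier. The toy data, library choice and settings are my assumptions, not the authors’ implementation.

```python
# Minimal sketch of an RWT-style cascade: boolean unigram + trigram features,
# a linear SVM for moves, then a second SVM for steps. Toy data and settings
# are illustrative assumptions, not the RWT code.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.svm import LinearSVC

sentences = [
    "Research on academic writing has grown rapidly.",
    "Rhetorical moves have been studied in many disciplines.",
    "However, existing tools rarely give formative feedback.",
    "Few systems target the introduction section specifically.",
    "This study develops an automated analysis of introductions.",
    "We present a classifier that labels each sentence with a step.",
]
moves = ["Move1", "Move1", "Move2", "Move2", "Move3", "Move3"]
steps = ["Step2", "Step3", "Step4", "Step5", "Step10", "Step11"]

def ngram_features():
    """Boolean (presence/absence) unigram and trigram features, as in the paper."""
    return FeatureUnion([
        ("unigrams", CountVectorizer(binary=True, ngram_range=(1, 1))),
        ("trigrams", CountVectorizer(binary=True, ngram_range=(3, 3))),
    ])

# Stage 1: move classifier (the paper reports 10-fold CV; 2 folds fit the toy data).
move_clf = make_pipeline(ngram_features(), LinearSVC())
print("move CV accuracy:", cross_val_score(move_clf, sentences, moves, cv=2).mean())

# Stage 2: step classifier; a full system would train one per move on the
# sentences routed to it by stage 1.
step_clf = make_pipeline(ngram_features(), LinearSVC())
step_clf.fit(sentences, steps)
print(step_clf.predict(["This paper introduces a new tutoring system."]))
```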

Results:

  • The move and step classifiers predict some elements better than others (refer to the paper for detailed results):
    • Move 2 was the most difficult to identify (sparse training data).
    • Move 1 gained the best recall – less ambiguous cues.
    • 10 out of 17 steps were predicted well.
    • Overall move accuracy of 72.6% and step accuracy of 72.9%.

Future Work:

  • Moving beyond the sentence level to incorporate context information and the sequence of moves/steps (see the sketch after this list).
  • Knowledge-based approach for hard-to-identify steps – hand-written rules and patterns.
  • Voting algorithm using independent analyzers.
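
As a rough illustration of the sequence idea flagged above (and of the CRF sequence labelling SAPIENTA already uses), the sketch below labels an ordered sequence of sentences jointly with a linear-chain CRF via the sklearn-crfsuite package. The library choice, features and toy data are my assumptions, not the authors’ setup.

```python
# Minimal sketch of sentence-sequence labelling with a linear-chain CRF.
# Requires sklearn-crfsuite (pip install sklearn-crfsuite). The features and
# toy data are illustrative assumptions, not SAPIENTA's or RWT's feature set.
import sklearn_crfsuite

def sentence_features(sentences, i):
    """Simple per-sentence cues plus a little left context."""
    feats = {
        "first_word": sentences[i].split()[0].lower(),
        "has_however": "however" in sentences[i].lower(),
        "position": float(i),
    }
    if i > 0:
        feats["prev_first_word"] = sentences[i - 1].split()[0].lower()
    return feats

# One "document": an ordered sequence of sentences with move labels.
doc = [
    "Research on writing analytics has expanded in recent years.",
    "However, sentence-level feedback on rhetorical moves remains rare.",
    "This paper presents a move classifier that uses sequence information.",
]
labels = ["Move1", "Move2", "Move3"]

X = [[sentence_features(doc, i) for i in range(len(doc))]]  # list of sequences
y = [labels]                                                # matching label sequences

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X))
```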

Notes: XIP – Automated rhetorical parsing of scientific metadiscourse

Reference: Simsek, D., Buckingham Shum, S., Sandor, A., De Liddo, A., & Ferguson, R. (2013). XIP Dashboard: visual analytics from automated rhetorical parsing of scientific metadiscourse. In: 1st International Workshop on Discourse-Centric Learning Analytics, 8 Apr 2013, Leuven, Belgium.

Background:

Learners should be able to critically evaluate research articles and identify the claims and ideas in the scientific literature.

Purpose:

  • Automating the analysis of research articles to identify the evolution of ideas and findings.
  • Describing the Xerox Incremental Parser (XIP), which identifies rhetorically significant structures in research text.
  • Designing a visual analytics dashboard to provide overviews of the student corpus.

Method:

  • Argumentative Zoning (AZ), by Simone Teufel, is used to annotate moves in research articles.
  • Rhetorical moves tagged by XIP – these partly overlap with and partly differ from the AZ scheme: SUMMARIZING, BACKGROUND KNOWLEDGE, CONTRASTING IDEAS, NOVELTY, SIGNIFICANCE, SURPRISE, OPEN QUESTION, GENERALIZING
  • Sample discourse moves:
    • Summarizing: “The purpose of this article….”
    • Contrasting ideas: “With an absence of detailed work…”
      • Sub-classes: novelty, surprise, importance, emerging issue, open question
  • XIP outputs a raw output file containing semantic tags and concepts extracted from text.
  • Data: papers from the LAK and EDM conferences and journal – 66 LAK and 239 EDM papers, yielding 7,847 sentences and 40,163 concepts.
  • Dashboard design – refer to the original paper for the process involved in prototyping the visualizations.

Tool:

  • XIP is now embedded in the Academic Writing Analytics (AWA) tool by UTS. AWA provides analytical and reflective reports on students’ writing.

Reading research articles

Reading research articles can be a daunting task for new students. Even after reading many articles over the last few years, I still take time to read, understand and critically evaluate research articles (it takes me double the time for theoretical ones, since I’m from a technical background). I’m no expert on this topic (or any topic for that matter :p), but I thought this post could be useful for fellow students who toil over research papers just like me. The post is a combination of a few tips from my own experience plus a useful course I attended at UTS by Dr. Terry Royce (Reading & writing for your Literature Review: Getting started and what to look for).


The first thing to keep in mind while reading articles is that it is a time-consuming process. So do not get dejected if it takes longer than your allotted time. Not everyone reads at the same pace as you. Give yourself more time, especially if you’re reading a new topic. Your reading skills will definitely improve with experience.

Concentration is key, so take a break and refresh your mind if you’re stuck with an article for very long. How I wish it was as easy as reading a fiction novel for hours with absolute concentration… Sigh!!

Read articles in whichever form is comfortable for you – soft copy or hard copy, your choice. I recently moved from hard copy to soft copy format since it is easier to look up my notes anytime and more portable. I still print important articles and make them ugly with highlights though 🙂 Managing and organising all the articles you’ve read or are going to read is another arduous task, for which you should probably plan early!

Now for the ‘real strategies’ for reading:

  • Read widely and extensively. When, after extensive reading, you get the sense that you’ve reached the boundary (new articles don’t add anything new), that’s when you can stop. PhD students might want to read over 300 articles before writing their thesis 😮
  • Learn ‘purposeful focused reading’ – you don’t necessarily have to read a whole book if you only need a chapter of it. Similarly, you can read only what you need from an article.
  • Employ these reading strategies:
    • Reviewing (looking at title, keywords and flipping through)
    • Skimming (for an overview)
    • Scanning (locating specific information or ideas)
    • Reading analytically (text structure, categories, hierarchies)
    • Close reading (observing details)
    • Reading critically (connecting what you read to what you know)
  • Identify the key features and the research arguments from the paper.
  • Look out for the important and relevant details from different entry points:
    • Abstract – What is the issue/project/question? What are the methods/argument/point of view? What are the results and implications?
    • Introduction – What is the topic area? What are the definitions, issue parameters and stages?
    • Conclusion – What are the general areas and specifics covered? Is the focus from the introduction reiterated? What are the implications?
    • Thesis – What is the claim made?
    • Evidence – Is the evidence presented, interpreted and connected to the claim?
    • Rhetorical Staging – How is the article rhetorically staged? (If the article is well written, you should be able to draw a diagram of how it builds and reinforces its points)
    • Alternative views – Are there other points of view or counter arguments?
  • Think critically by learning to hear your own voice amidst the authors’ voices in the paper, and be sceptical (not cynical) – this critical thinking skill is a topic in itself!
  • Get started, summarise relevant points, assess the claims around your research and reflect to make critical judgements.

Happy Reading 😀