CHI’24 research publications

I attended the prestigious Human-Computer Interaction (HCI) conference CHI’24 in Honolulu, Hawaii in May 2024. While I’m quite familiar with the field of HCI, this was my first time attending the conference, as its scope is much wider than my main areas of research (Learning Analytics, AI in education, and Writing Analytics). The sheer scale of the conference (~2K to 3K attendees) and the broad range of topics it covers (check out the full program) make it almost impossible to fully grasp!

TLDR: Go to the end for the list of papers from CHI’24.

My personal highlight was the Intelligent Writing Assistants Workshop, which was running for the third time at CHI, organized by a bunch of fun people who are all super keen on researching the use of AI to assist writing. Picture from our workshop below (thanks, Thiemo, for the LinkedIn post).

Pictured: Participants of the Intelligent Writing Assistants CHI’24 workshop at the end of the session

The workshop had many mini presentations on the overall theme of Dark Sides: Envisioning, Understanding, and Preventing Harmful Effects of Writing Assistants. I presented my work with Prof. Simon Buckingham Shum on AI-Assisted Writing in Education: Ecosystem Risks and Mitigations, where we examined key, often hidden factors in the broader socio-technical ecosystem that need consideration when implementing AI writing assistants at scale in educational contexts.


This was in fact a deep dive into the Ecosystem aspect of a larger piece of work we presented at CHI, A Design Space for Intelligent and Interactive Writing Assistants. The design space in our full paper mapped the landscape of intelligent writing assistants by reviewing 115 papers from HCI and NLP, with a team of 36 authors led by Mina Lee.

Figure: Design space for intelligent and interactive writing assistants consisting of five key aspects—task, user, technology, interaction, and ecosystem from our full paper.

An interactive tool is also available to explore the literature in detail.


I also had a late-breaking work poster presentation on Critical Interaction with AI on Written Assessment (I have a separate post about it!) where we explored how students engaged with generative AI tools like ChatGPT for their writing tasks, and whether they were able to navigate this interaction critically.

A cherished memory to hold on to was the time I spent with my friend Vanessa, currently a Research Fellow at Monash University, during this trip to Hawaii. Vanessa and I started our PhDs together at the Connected Intelligence Centre at UTS ~8 years ago, and it was really nice to catch up after a long time (along with a few others). I had also visited Monash University’s CoLAM just a week before, for a talk and to meet fellow Learning Analytics researchers, hosted by her and Roberto. The group does interesting work in Learning Analytics that is worth checking out.

6 years apart… On the left: Vanessa and I in 2018, while attending AIED/ICLS 2018 in London; on the right: us attending CHI 2024 in Hawaii.


TLDR -> Research publications:

Here are all the papers from the work we presented at CHI’24:

Mina Lee, Katy Ilonka Gero, John Joon Young Chung, Simon Buckingham Shum, Vipul Raheja, Hua Shen, Subhashini Venugopalan, Thiemo Wambsganss, David Zhou, Emad A. Alghamdi, Tal August, Avinash Bhat, Madiha Zahrah Choksi, Senjuti Dutta, Jin L.C. Guo, Md Naimul Hoque, Yewon Kim, Simon Knight, Seyed Parsa Neshaei, Agnia Sergeyuk, Antonette Shibani, Disha Shrivastava, Lila Shroff, Jessi Stark, Sarah Sterman, Sitong Wang, Antoine Bosselut, Daniel Buschek, Joseph Chee Chang, Sherol Chen, Max Kreminski, Joonsuk Park, Roy Pea, Eugenia H. Rho, Shannon Zejiang Shen, and Pao Siangliulue. 2024. A Design Space for Intelligent and Interactive Writing Assistants. In Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI ’24), May 11–16, 2024, Honolulu, HI, USA. ACM, New York, NY, USA, 33 pages. https://doi.org/10.1145/3613904.3642697

Antonette Shibani, Simon Knight, Kirsty Kitto, Ajanie Karunanayake, Simon Buckingham Shum (2024). Untangling Critical Interaction with AI in Students’ Written Assessment. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI ’24), May 11–16, 2024, Honolulu, HI, USA. https://doi.org/10.1145/3613905.3651083

Antonette Shibani & Simon Buckingham Shum (2024). AI-Assisted Writing in Education: Ecosystem Risks and Mitigations. In The Third Workshop on Intelligent and Interactive Writing Assistants @ CHI ’24, Honolulu, HI, USA. https://arxiv.org/abs/2404.10281

Tamil Co-Writer: Inclusive AI for writing support

Next week, I’m presenting my work at the First Workshop on Generative AI for Learning Analytics (GenAI-LA) at the 14th International Conference on Learning Analytics and Knowledge (LAK 2024):

Antonette Shibani, Faerie Mattins, Srivarshan Selvaraj, Ratnavel Rajalakshmi & Gnana Bharathy (2024) Tamil Co-Writer: Towards inclusive use of generative AI for writing support. In Joint Proceedings of LAK 2024 Workshops, co-located with 14th International Conference on Learning Analytics and Knowledge (LAK 2024), Kyoto, Japan, March 18-22, 2024.

With colleagues in India, we developed Tamil Co-Writer, a GenAI-supported writing tool that offers AI suggestions for writing in the regional Indian language Tamil (which is my first language). The majority of AI-based writing assistants are created for English language users and do not address the needs of linguistically diverse groups of learners. Catering to languages typically under-represented in NLP is important in the generative AI era for the inclusive use of AI for learner support. Combined with analytics on AI usage, the tool can offer writers improved productivity and a chance to reflect on their optimal/sub-optimal collaborations with AI.

The tool combined the following elements:

  1. An interactive AI writing environment that offers several input modes to write in Tamil
  2. Analytics of the writer’s AI interactions in the session, for reflection (see the post on CoAuthorViz for details, and the related paper here)

A short video summarising the key insights from the paper is below:

Understanding human-AI collaboration in writing (CoAuthorViz)

Generative AI (GenAI) has captured global attention since ChatGPT was publicly released in November 2022. The remarkable capabilities of AI have sparked a myriad of discussions around its vast potential, ethical considerations, and transformative impact across diverse sectors, including education. In particular, how humans can learn to work with AI to augment their intelligence rather than undermine it greatly interests many communities.

My own interest in writing research led me to explore human-AI partnerships for writing. We are not far from generative AI technologies being part of everyday writing, with co-pilots becoming the norm rather than the exception. It is quite possible that a ubiquitous tool like Microsoft Word, which many use as their preferred platform for digital writing, will come with AI support as an essential feature for improved productivity (and early research shows how people are imagining these). But at what cost?

In our recent full paper, we explored an analytic approach to study writers’ support-seeking behaviour and dependence on AI in a co-writing environment:

Antonette Shibani, Ratnavel Rajalakshmi, Srivarshan Selvaraj, Faerie Mattins, Simon Knight (2023). Visual representation of co-authorship with GPT-3: Studying human-machine interaction for effective writing. In M. Feng, T. Käser, and P. Talukdar, editors, Proceedings of the 16th International Conference on Educational Data Mining, pages 183–193, Bengaluru, India, July 2023. International Educational Data Mining Society [PDF].

Using keystroke data from the interactive writing environment CoAuthor, powered by GPT-3, we developed CoAuthorViz (see example figure below) to characterize writer interaction with AI feedback. CoAuthorViz captures key constructs such as the writer incorporating GPT-3-suggested text as is (GPT-3 suggestion selection), the writer not incorporating a GPT-3 suggestion (empty GPT-3 call), the writer modifying the suggested text (GPT-3 suggestion modification), and the writer’s own writing (user text addition). We demonstrated how such visualizations (and associated metrics) help characterise varied levels of AI interaction in writing, from low to high dependency on AI.
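
To make these constructs concrete, here is a minimal Python sketch of how a simplified event log could be mapped to the four constructs. The event format and field names below are invented for illustration; the actual CoAuthor logs (and our CoAuthorViz code) are richer than this.

# A minimal sketch (not the paper's implementation) of classifying
# simplified writing events into the four CoAuthorViz constructs.
# The event schema here is hypothetical, invented for illustration.

def classify_event(event):
    """Map a simplified writing event to a CoAuthorViz construct."""
    if event["type"] != "suggestion_shown":
        return "user text addition"              # writer's own typing
    if not event.get("accepted_text"):
        return "empty GPT-3 call"                # suggestion requested, none taken
    if event.get("edited"):
        return "GPT-3 suggestion modification"   # suggestion taken, then edited
    return "GPT-3 suggestion selection"          # suggestion taken as is

events = [
    {"type": "text_insert"},
    {"type": "suggestion_shown", "accepted_text": "the cat sat", "edited": False},
    {"type": "suggestion_shown", "accepted_text": None},
    {"type": "suggestion_shown", "accepted_text": "a dog ran", "edited": True},
]

sequence = [classify_event(e) for e in events]
print(sequence)

# A simple dependency-style metric: share of events that originated from AI.
ai_share = sum("GPT-3 suggestion" in s for s in sequence) / len(sequence)
print("AI-originated events: {:.0%}".format(ai_share))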

Figure: CoAuthorViz legend and three samples of AI-assisted writing (squares denote writer written text, and triangles denote AI suggested text)

Full details of the work can be found in the resources below:

Several complex questions are yet to be answered:

  • Is autonomy (self-writing, without AI support) preferable to better quality writing (with AI support)?
  • As AI becomes embedded into our everyday writing, do we lose our own writing skills? And if so, is that of concern, or will writing become one of those outdated skills in the future that AI can do much better than humans?
  • Do we lose our ‘uniquely human’ attributes if we continue to write with AI?
  • What is an acceptable use of AI in writing that still lets you think? (We know by writing we think more clearly; would an AI tool providing the first draft restrict our thinking?)
  • What knowledge and skills do writers need to use AI tools appropriately?

Edit: If you want to delve into the topic further, here’s an intriguing article that imagines how writing might look in the future: https://simon.buckinghamshum.net/2023/03/the-writing-synth-hypothesis/

Automated Writing Feedback in AcaWriter

You might be familiar with my research in the field of Writing Analytics, particularly Automated Writing Feedback, during my PhD and beyond. The work is based on an automated feedback tool called AcaWriter (previously called Automated Writing Analytics/AWA), which we developed at the Connected Intelligence Centre, University of Technology Sydney.

Recently, we have put together resources to spread the word and introduce the tool to anyone who wants to learn more. The first is an introductory blog post I wrote for the Society for Learning Analytics Research (SoLAR) Nexus publication. You can access the full blog post here: https://www.solaresearch.org/2020/11/acawriter-designing-automated-feedback-on-writing-that-teachers-and-students-trust/

We also ran a two-hour online workshop as part of a LALN event, adding more detail and resources for others to participate. Details are here: http://wa.utscic.edu.au/events/laln-2020-workshop/

A video recording from the event is available for replay:

Learn more: https://cic.uts.edu.au/tools/awa/

Automated Revision Graphs – AIED 2020

I recently had my writing analytics work published at the 21st International Conference on Artificial Intelligence in Education (AIED 2020), where the theme was “Augmented Intelligence to Empower Education”. It is a short paper describing a text analysis and visualisation method to study revisions: it introduces ‘Automated Revision Graphs’ to study revisions in short texts at a sentence level by visualising text as a graph, with open source code (a toy sketch of the idea follows the citation below).

Shibani A. (2020) Constructing Automated Revision Graphs: A Novel Visualization Technique to Study Student Writing. In: Bittencourt I., Cukurova M., Muldner K., Luckin R., Millán E. (eds) Artificial Intelligence in Education. AIED 2020. Lecture Notes in Computer Science, vol 12164. Springer, Cham. [pdf] https://doi.org/10.1007/978-3-030-52240-7_52
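
Purely as a toy illustration of the general idea (sentences as nodes, matched across drafts), here is a minimal sketch. This is not the paper’s released code: the similarity measure, the 0.5 threshold, and the networkx dependency are my arbitrary choices for this sketch.

# Toy sketch of a sentence-level revision graph (not the paper's code).
import difflib
import networkx as nx   # assumed installed: pip install networkx

# Two toy drafts of the same short text, split into sentences.
drafts = [
    ["The cat sat.", "It was warm outside."],
    ["The cat sat on the mat.", "It was warm outside.", "Birds sang."],
]

G = nx.DiGraph()
for d, draft in enumerate(drafts):
    for s, sentence in enumerate(draft):
        G.add_node((d, s), text=sentence)   # one node per sentence per draft

# Link each sentence in draft d to its closest match in draft d+1,
# labelling the edge as kept (identical) or modified (similar but changed).
for d in range(len(drafts) - 1):
    for s, sent in enumerate(drafts[d]):
        scores = [(difflib.SequenceMatcher(None, sent, nxt).ratio(), t)
                  for t, nxt in enumerate(drafts[d + 1])]
        score, t = max(scores)
        if score > 0.5:   # arbitrary threshold for "the same sentence"
            G.add_edge((d, s), (d + 1, t),
                       label="kept" if score == 1.0 else "modified")

for u, v, data in G.edges(data=True):
    print(u, "->", v, data["label"])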

I did a short introductory video for the conference, which can be viewed below:

I also had another paper I co-authored on multi-modal learning analytics, led by Roberto Martinez, which received the best paper award at the conference. The main contribution of the paper is a set of conceptual mappings from x-y positional data (captured from sensors) to meaningful measurable constructs in physical classroom movements, grounded in the theory of Spatial Pedagogy. Great effort by the team!

Details of the second paper can be found here:

Martinez-Maldonado R., Echeverria V., Schulte J., Shibani A., Mangaroska K., Buckingham Shum S. (2020) Moodoo: Indoor Positioning Analytics for Characterising Classroom Teaching. In: Bittencourt I., Cukurova M., Muldner K., Luckin R., Millán E. (eds) Artificial Intelligence in Education. AIED 2020. Lecture Notes in Computer Science, vol 12163. Springer, Cham. [pdf] https://doi.org/10.1007/978-3-030-52237-7_29

New Research Publications in Learning Analytics

Three of my journal articles were published recently: two on learning analytics/writing analytics implementations [Learning Analytics Special Issue in The Internet and Higher Education journal], and one on a text analysis method [Educational Technology Research and Development journal] that I worked on many years ago and which only just got published!

Article 1: Educator Perspectives on Learning Analytics in Classroom Practice

The first one is predominantly qualitative in nature, based on instructor interviews of their experiences in using Learning Analytics tools such as the automated Writing feedback tool AcaWriter. It provides a practical account of implementing learning analytics in authentic classroom practice from the voices of educators. Details below:

Abstract: Failing to understand the perspectives of educators, and the constraints under which they work, is a hallmark of many educational technology innovations’ failure to achieve usage in authentic contexts, and sustained adoption. Learning Analytics (LA) is no exception, and there are increasingly recognised policy and implementation challenges in higher education for educators to integrate LA into their teaching. This paper contributes a detailed analysis of interviews with educators who introduced an automated writing feedback tool in their classrooms (triangulated with student and tutor survey data), over the course of a three-year collaboration with researchers, spanning six semesters’ teaching. It explains educators’ motivations, implementation strategies, outcomes, and challenges when using LA in authentic practice. The paper foregrounds the views of educators to support cross-fertilization between LA research and practice, and discusses the importance of cultivating educators’ and students’ agency when introducing novel, student-facing LA tools.

Keywords: learning analytics; writing analytics; participatory research; design research; implementation; educator

Citation and article link: Antonette Shibani, Simon Knight and Simon Buckingham Shum (2020). Educator Perspectives on Learning Analytics in Classroom Practice [Author manuscript]. The Internet and Higher Education. https://doi.org/10.1016/j.iheduc.2020.100730. [Publisher’s free download link valid until 8 May 2020].

Article 2: Implementing Learning Analytics for Learning Impact: Taking Tools to Task

The second one led by Simon Knight provides a broader framing for how we define impact in learning analytics. It defines a model addressing the key challenges in LA implementations based on our writing analytics example. Details below:

Abstract: Learning analytics has the potential to impact student learning, at scale. Embedded in that claim are a set of assumptions and tensions around the nature of scale, impact on student learning, and the scope of infrastructure encompassed by ‘learning analytics’ as a socio-technical field. Drawing on our design experience of developing learning analytics and inducting others into its use, we present a model that we have used to address five key challenges we have encountered. In developing this model, we recommend: A focus on impact on learning through augmentation of existing practice; the centrality of tasks in implementing learning analytics for impact on learning; the commensurate centrality of learning in evaluating learning analytics; inclusion of co-design approaches in implementing learning analytics across sites; and an attention to both social and technical infrastructure.

Keywords: learning analytics, implementation, educational technology, learning design

Citation and article link:  Simon Knight, Andrew Gibson and Antonette Shibani (2020). Implementing Learning Analytics for Learning Impact: Taking Tools to Task. The Internet and Higher Education. https://doi.org/10.1016/j.iheduc.2020.100729.

Article 3: Identifying patterns in students’ scientific argumentation: content analysis through text mining using LDA

The third one, led by Wanli Xing, discusses the use of Latent Dirichlet Allocation (LDA), a text mining method, to study argumentation patterns in student writing (in an unsupervised way). Details below, with a small illustrative sketch after the citation:

Abstract: Constructing scientific arguments is an important practice for students because it helps them to make sense of data using scientific knowledge and within the conceptual and experimental boundaries of an investigation. In this study, we used a text mining method called Latent Dirichlet Allocation (LDA) to identify underlying patterns in students’ written scientific arguments about a complex scientific phenomenon called Albedo Effect. We further examined how identified patterns compare to existing frameworks related to explaining evidence to support claims and attributing sources of uncertainty. LDA was applied to electronically stored arguments written by 2472 students and concerning how decreases in sea ice affect global temperatures. The results indicated that each content topic identified in the explanations by the LDA—“data only,” “reasoning only,” “data and reasoning combined,” “wrong reasoning types,” and “restatement of the claim”—could be interpreted using the claim–evidence–reasoning framework. Similarly, each topic identified in the students’ uncertainty attributions—“self-evaluations,” “personal sources related to knowledge and experience,” and “scientific sources related to reasoning and data”—could be interpreted using the taxonomy of uncertainty attribution. These results indicate that LDA can serve as a tool for content analysis that can discover semantic patterns in students’ scientific argumentation in particular science domains and facilitate teachers’ providing help to students.

Keywords: text mining, latent dirichlet allocation, educational data mining, scientific argumentation

Citation and article link:  Wanli Xing, Hee-Sun Lee and Antonette Shibani (2020). Identifying patterns in students’ scientific argumentation: content analysis through text mining using Latent Dirichlet Allocation. Educational Technology Research and Development. https://doi.org/10.1007/s11423-020-09761-w.
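
For readers who want to try the method, here is a minimal, illustrative LDA sketch using scikit-learn. The toy sentences, the number of topics, and the preprocessing are my assumptions for the sketch, not the paper’s actual pipeline.

# A minimal, hedged sketch of LDA-based content analysis with
# scikit-learn (not the authors' pipeline; toy data for illustration).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

arguments = [
    "The data show temperatures rise as sea ice decreases.",
    "I am not sure because my experience with graphs is limited.",
    "Less ice reflects less sunlight, so the ocean absorbs more heat.",
    "The claim is that melting ice warms the planet.",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(arguments)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# Inspect the top words per topic to interpret patterns
# (e.g. "data only" vs "reasoning only" in the paper's framing).
terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print("Topic {}: {}".format(k, ", ".join(top)))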

Working with Jupyter notebooks #code

Jupyter is an open-source program that lets you share and run code in many different programming languages. Jupyter notebooks are great for quickly prototyping different versions of code, as they are easy to edit and make it simple to try different outputs. The format of a Jupyter notebook is similar to the R Markdown reports commonly used in R: it can contain blocks of text, code, equations and results (including visualizations) all on one page. We’ve used Jupyter notebooks to run text analysis workshops at conferences, and the feedback was pretty good.

The Writing Analytics workshop is starting at #LAK18. Jupyter notebooks are being used. #great pic.twitter.com/56Zd66ku9L

I find that Jupyter notebooks are great for sharing code and results across different people, and if you’re hosting it, it saves a lot of trouble when organising a workshop where participants would otherwise need to install software. It works well for a non-technical audience too, since they can choose to ignore what’s inside a code block by simply running it and focusing on the results. Notebooks are quite popular now for data science experiments, so this post is a good place to start learning and using them. You can use an already available notebook (for example, one downloaded from GitHub) and play with it, or create your own Jupyter notebook from scratch. This post will guide you through creating your own notebook from scratch, demonstrating some basic text analysis in Python.

Installing Jupyter

If you want to try a Jupyter notebook first without installing anything, you can do so in this notebook hosted in the official Jupyter site. If you want to install your own copy of Jupyter running in your machine to develop code, then use one of the two options below:

  • If you are new to Python programming and don’t have Python installed on your machine, the easiest way to install Jupyter is by downloading the Anaconda distribution. This comes with Python built in (you can choose either the 2.7 or 3.6 version of Python when you download the distribution – the code in this post is written for 2.7).
  • If you already have Python working on your machine (as I did), the easiest way is to install Jupyter using the pip command, as you would for any Python package. Note that if pip and python are already set up in your system path, you can simply use $ pip install jupyter from the command prompt.

Now that Jupyter is installed, type the command below in your Anaconda prompt/command prompt to start a Jupyter notebook:

$ jupyter notebook

The Jupyter homepage opens in your default browser at http://localhost:8888, displaying the files present in the current folder. You can now create a new Python Jupyter notebook by clicking New -> Python 2 (or Python 3 if you have Python version 3). You can move between folders or create a new folder for your Python notebooks. To change the default opening directory, first move to the required path using cd in the command prompt, and then type $ jupyter notebook. Open the created notebook, which would look like this:

This cell is a code block by default; it can be changed to a markdown text block from the drop-down list (check the figure above) to add narrative text accompanying the Python code. Now name your notebook, and try adding both a code block and a markdown block with different levels of text, following the sample here:

To execute the blocks, click on the Run button (Alternatively, use Ctrl+Enter in Windows – Keyboard shortcuts can be found in Help -> Keyboard shortcuts). This renders the output of your code and your markdown text like this:

That’s it. You have a simple Jupyter notebook running on your machine. To try a bit more, here’s the sample code you can download and run to do some basic text analysis. I’ve defined three steps in this code: importing required packages, defining input text, and analysis. Before importing the packages/libraries you need in step 1, however, they should first be installed on your machine. This can be done using the pip command in the command prompt/Anaconda prompt like this: $ pip install wordcloud (if you run into problems with that, the other option is to download an appropriate version of the package’s wheel from here and install it using $ pip install C:/some-dir/some-file.whl).

Python code for the three steps is below:

#Step 1 - Importing libraries
 
from wordcloud import WordCloud, STOPWORDS  #For word cloud generation
import matplotlib.pyplot as plt             #For displaying figures
import re                          #Regular expressions for string operations


#Step 2 - Defining input text

inputtext = "A cockatoo is a parrot that is any of the 21 species belonging to the bird family Cacatuidae, the only family in the superfamily Cacatuoidea. Along with the Psittacoidea (true parrots) and the Strigopoidea (large New Zealand parrots), they make up the order Psittaciformes (parrots). The family has a mainly Australasian distribution, ranging from the Philippines and the eastern Indonesian islands of Wallacea to New Guinea, the Solomon Islands and Australia. Cockatoos are recognisable by the showy crests and curved bills. Their plumage is generally less colourful than that of other parrots, being mainly white, grey or black and often with coloured features in the crest, cheeks or tail. On average they are larger than other parrots; however, the cockatiel, the smallest cockatoo species, is a small bird. The phylogenetic position of the cockatiel remains unresolved, other than that it is one of the earliest offshoots of the cockatoo lineage. The remaining species are in two main clades. The five large black coloured cockatoos of the genus Calyptorhynchus form one branch. The second and larger branch is formed by the genus Cacatua, comprising 11 species of white-plumaged cockatoos and four monotypic genera that branched off earlier; namely the pink and white Major Mitchell's cockatoo, the pink and grey galah, the mainly grey gang-gang cockatoo and the large black-plumaged palm cockatoo. Cockatoos prefer to eat seeds, tubers, corms, fruit, flowers and insects. They often feed in large flocks, particularly when ground-feeding. Cockatoos are monogamous and nest in tree hollows. Some cockatoo species have been adversely affected by habitat loss, particularly from a shortage of suitable nesting hollows after large mature trees are cleared; conversely, some species have adapted well to human changes and are considered agricultural pests. Cockatoos are popular birds in aviculture, but their needs are difficult to meet. The cockatiel is the easiest cockatoo species to maintain and is by far the most frequently kept in captivity. White cockatoos are more commonly found in captivity than black cockatoos. Illegal trade in wild-caught birds contributes to the decline of some cockatoo species in the wild. Source: https://en.wikipedia.org/wiki/Cockatoo"


print("\nInput text for analysis:\n ")
print(inputtext)


#Step 3 - Analysis

print "Summary statistics of input text:"

wordcount = len(re.findall(r'\w+', inputtext))
print "Wordcount: ", wordcount

charcount = len(inputtext) #including spaces
print "Number of characters: ", charcount

#More options for wordclouds here: https://github.com/amueller/word_cloud
wordcloud = WordCloud(stopwords=STOPWORDS,
                      background_color='black').generate(inputtext)

plt.imshow(wordcloud, interpolation="bilinear")
plt.axis('off')
plt.show()

The downloadable ipynb file is available on GitHub.

Other notes:

  • This post is intended for anyone who wants to start working with Jupyter notebooks, and assumes prior understanding of programming in Python. The Jupyter notebook is another environment to easily work with code, but the coding process is still very traditional. If you’re new to Python programming, this website is a good place to start.
  • You can use multiple versions of Python to run Jupyter notebooks by changing its Kernel (the computational engine which executes the code). I have both Python 2 & Python 3 installed, and I switch between them for different programs as needed.
  • While Jupyter notebooks are mainly used to run Python code, they can also be used to run R programs, which requires the R kernel to be installed. The blog post below is a useful guide for doing that: https://www.datacamp.com/community/blog/jupyter-notebook-r

Tools for automated rhetorical analysis of academic writing

Alert – Long post!

In this post, I’m presenting a summary of my review on tools for automatically analyzing rhetorical structures from academic writing.

The tools considered are designed for different users and purposes. AWA and RWT aim to provide feedback to improve students’ academic writing. Mover and SAPIENTA, on the other hand, help researchers identify the structure of research articles. Mover even allows users to give a second opinion on the classification of moves and add new training data (which can lead to a less accurate model if students with less expertise add potentially wrong training data). These tools nevertheless share a common thread and fulfil the following criteria:

  • They look at scientific text – full research articles, abstracts or introductions. Tools that automate argumentative zoning of other open text (example) are not considered.
  • They automate the identification of rhetorical structures (zones, moves) in research articles (RA), with the sentence as the unit of analysis.
  • They are broadly based on the Argumentative Zoning (AZ) scheme by Simone Teufel or the CARS model by John Swales (either the original schema or a modified version of it).

Tools (in alphabetical order):

  1. Academic Writing Analytics (AWA) – Summary notes here

AWA also has a reflective parser to give feedback on students’ reflective writing, but the focus of this post is on the analytical parser. AWA demo, video courtesy of Dr. Simon Knight:

  2. Mover – Summary notes here

Available for download as a standalone application. Sample screenshot below:


  3. Research Writing Tutor (RWT) – Summary notes here

RWT demo, video courtesy of Dr. Elena Cotos:

  4. SAPIENTA – Summary notes here.

Available for download as a standalone Java application, or it can be accessed as a web service. Sample screenshot of tagged output from the SAPIENTA web service below:

Annotation Scheme:

The schemes aim to be applicable to all academic writing, and this has been successfully tested across data from different disciplines. A comparison of the schemes used by the tools is shown in the table below:

ToolSource & DescriptionAnnotation Scheme
AWAAWA Analytical scheme (Modified from AZ for sentence level parsing)-Summarizing
-Background knowledge
-Contrasting ideas
-Novelty
-Significance
-Surprise
-Open question
-Generalizing
Mover Modified CARS model
-three main moves and further steps
1. Establish a territory
-Claim centrality
-Generalize topics
-Review previous research
2. Establish a niche
-Counter claim
-Indicate a gap
-Raise questions
-Continue a tradition
3. Occupy the niche
-Outline purpose
-Announce research
-Announce findings
-Evaluate research
-Indicate RA structure
RWTModified CARS model
-3 moves, 17 steps
Move 1. Establishing a territory
-1. Claiming centrality
-2. Making topic generalizations
-3. Reviewing previous research
Move 2. Identifying a niche
-4. Indicating a gap
-5. Highlighting a problem
-6. Raising general questions
-7. Proposing general hypotheses
-8. Presenting a justification
Move 3. Addressing the niche
-9. Introducing present research descriptively
-10. Introducing present research purposefully
-11. Presenting research questions
-12. Presenting research hypotheses
-13. Clarifying definitions
-14. Summarizing methods
-15. Announcing principal outcomes
-16. Stating the value of the present research
-17. Outlining the structure of the paper
SAPIENTAfiner grained AZ scheme
-CoreSC scheme with 11 categories in the first layer
-Background (BAC)
-Hypothesis (HYP)
-Motivation (MOT)
-Goal (GOA)
-Object (OBJ)
-Method (MET)
-Model (MOD)
-Experiment (EXP)
-Observation (OBS)
-Result (RES)
-Conclusion (CON)

Method:

The tools are built on different data sets and methods for automating the analysis. Most of them use manually annotated data as a gold standard for training a model to automatically classify the categories. Details below:

AWA
Data type: Any research writing
Automation method: NLP rule-based – Xerox Incremental Parser (XIP) to annotate rhetorical functions in discourse.

Mover
Data type: Abstracts
Automation method: Supervised learning – Naïve Bayes classifier with data represented as a bag of clusters with location information.

RWT
Data type: Introductions
Automation method: Supervised learning using a Support Vector Machine (SVM) with n-dimensional vector representation and n-gram features.

SAPIENTA
Data type: Full articles
Automation method: Supervised learning using SVM with sentence aspect features, and sequence labelling using Conditional Random Fields (CRF) for sentence dependencies.

Others:

  • The SciPo tool helps students write summaries and introductions for scientific texts in Portuguese.
  • Another tool, CARE, is a word concordancer used to search for words and moves in research abstracts – summary notes here.
  • An ML approach considering three different schemes for annotating scientific abstracts (no tool).

If you think I’ve missed a tool which does similar automated tagging in research articles, do let me know so I can include it in my list 🙂

Notes: Discourse classification into rhetorical functions

Reference: Cotos, E., & Pendar, N. (2016). Discourse classification into rhetorical functions for AWE feedback. calico journal, 33(1), 92.

Background:

  • Computational techniques can be exploited to provide individualized feedback to learners on writing.
  • Genre analysis on writing to identify moves (communicative goal) and steps (rhetorical functions to help achieve the goal) [Swales, 1990].
  • Natural language processing (NLP) and machine learning categorization approaches are widely used to automatically identify discourse structures (e.g. Mover, prior work on IADE).

Purpose:

  • To develop an automated analysis system, ‘Research Writing Tutor’ (RWT), for identifying rhetorical structures (moves and steps) in research writing and providing feedback to students.

Method:

  • Sentence-level analysis – each sentence is classified into a move, and a step within that move.
  • Data: Introduction sections from 1020 articles – 51 disciplines, each discipline containing 20 articles, a total of 1,322,089 words.
  • Annotation Scheme:
    • 3 moves, 17 steps – Refer Table 1 from the original paper for detailed annotation scheme (Based on the CARS model).
    • Manual annotation using XML based markup by the Callisto Workbench.
  • Supervised learning approach steps:
    1. Feature selection:
      • Important features – unigrams, trigrams
      • n-gram feature set contained 5,825 unigrams and 11,630 trigrams for moves, and 27,689 unigrams and 27,160 trigrams for steps.
    2. Sentence representation:
      • Each sentence is represented as a n-dimensional vector in the R^n Euclidean space.
      • Boolean representation to indicate presence or absence of feature in sentence.
    3. Training classifier:
      • SVM model for classification.
      • 10-fold cross validation.
      • Precision was higher than recall – 70.3% versus 61.2% for the move classifier and 68.6% versus 55% for the step classifier – as the objective was to maximize accuracy.
      • The RWT analyzer has two cascaded SVMs – a move classifier followed by a step classifier (a minimal sketch of this idea follows the list below).
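
To make the pipeline concrete, here is a minimal sketch of the cascaded-classifier idea using scikit-learn. This is not the RWT implementation: the training data is toy, the feature set includes bigrams for brevity (the paper uses unigrams and trigrams only), and the step classifier here is not actually conditioned on the predicted move as RWT’s cascade is.

# A minimal sketch (assumptions, not RWT's code) of move/step
# classification with binary n-gram features and linear SVMs.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training data: (sentence, move, step) triples - invented examples.
train = [
    ("Recent studies have examined X.", "move1", "reviewing_previous_research"),
    ("X has attracted much attention.", "move1", "claiming_centrality"),
    ("However, little is known about Y.", "move2", "indicating_a_gap"),
    ("In this paper we investigate Y.", "move3", "introducing_present_research"),
]
sentences = [s for s, _, _ in train]

# Boolean presence/absence of n-gram features in each sentence.
def ngram_svm():
    return make_pipeline(
        CountVectorizer(ngram_range=(1, 3), binary=True),
        LinearSVC(),
    )

move_clf = ngram_svm().fit(sentences, [m for _, m, _ in train])
step_clf = ngram_svm().fit(sentences, [st for _, _, st in train])

new_sentence = "Little research has addressed Z."
print(move_clf.predict([new_sentence])[0])
print(step_clf.predict([new_sentence])[0])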

Results:

  • Move and step classifiers predict some elements better than the others (Refer paper for detailed results):
    • Move 2 most difficult to identify (sparse training data).
    • Move 1 gained best recall- less ambiguous cues.
    • 10 out of 17 steps were predicted well.
    • Overall move accuracy of 72.6% and step accuracy of 72.9%.

Future Work:

  • Moving beyond sentence level to incorporate context information and sequence of moves/steps.
  • Knowledge-based approach for hard to identify steps – hand written rules and patterns.
  • Voting algorithm using independent analyzers.

Notes: XIP – Automated rhetorical parsing of scientific metadiscourse

Reference: Simsek, D., Buckingham Shum, S., Sandor, A., De Liddo, A., & Ferguson, R. (2013). XIP Dashboard: visual analytics from automated rhetorical parsing of scientific metadiscourse. In: 1st International Workshop on Discourse-Centric Learning Analytics, 8 Apr 2013, Leuven, Belgium.

Background:

Learners should be able to critically evaluate research articles and identify the claims and ideas in scientific literature.

Purpose:

  • Automating analysis of research articles to identify evolution of ideas and findings.
  • Describing the Xerox Incremental Parser (XIP) which identifies rhetorically significant structures from research text.
  • Designing a visual analytics dashboard to provide overviews of the student corpus.

Method:

  • Argumentative Zoning (AZ) to annotate moves in research articles by Simone Teufel.
  • Rhetorical moves tagged by XIP – partly overlapping with and partly different from the AZ scheme: SUMMARIZING, BACKGROUND KNOWLEDGE, CONTRASTING IDEAS, NOVELTY, SIGNIFICANCE, SURPRISE, OPEN QUESTION, GENERALIZING (a toy illustration of this style of rule-based tagging follows this list).
  • Sample discourse moves:
    • Summarizing: “The purpose of this article….”
    • Contrasting ideas: “With an absence of detailed work…”
      • Sub-classes: novelty, surprise, importance, emerging issue, open question
  • XIP outputs a raw file containing semantic tags and concepts extracted from the text.
  • Data: Papers from the LAK and EDM conferences and journal – 66 LAK and 239 EDM papers, yielding 7,847 sentences and 40,163 concepts.
  • Dashboard design – refer to the original paper to see the process involved in prototyping the visualizations.
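
XIP itself is a proprietary Xerox parser, but the flavour of rule-based rhetorical tagging can be illustrated with a toy Python sketch. Every pattern below is invented for illustration; none is taken from XIP.

# A toy, hedged illustration of rule-based rhetorical tagging in the
# spirit of XIP (these patterns are invented, not XIP's rules).
import re

RULES = {
    "SUMMARIZING": [r"\bthe purpose of this (article|paper|study)\b",
                    r"\bin this (paper|article) we\b"],
    "CONTRASTING_IDEAS": [r"\bwith an absence of\b", r"\bhowever\b",
                          r"\bin contrast\b"],
    "BACKGROUND_KNOWLEDGE": [r"\bit is well known\b", r"\bprior work\b"],
}

def tag_sentence(sentence):
    """Return all rhetorical moves whose patterns match the sentence."""
    lowered = sentence.lower()
    return [move for move, patterns in RULES.items()
            if any(re.search(p, lowered) for p in patterns)] or ["NONE"]

print(tag_sentence("The purpose of this article is to review prior work."))
# ['SUMMARIZING', 'BACKGROUND_KNOWLEDGE']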

Tool:

  • XIP is now embedded in the Academic Writing Analytics (AWA) tool by UTS. AWA provides analytical and reflective reports on students’ writing.