Understanding human-AI collaboration in writing (CoAuthorViz)

Generative AI (GenAI) has captured global attention since ChatGPT was publicly released in November 2022. The remarkable capabilities of AI have sparked a myriad of discussions around its vast potential, ethical considerations, and transformative impact across diverse sectors, including education. In particular, how humans can learn to work with AI to augment their intelligence rather than undermine it greatly interests many communities.

My own interest in writing research led me to explore human-AI partnerships for writing. We are not far from using generative AI in everyday writing, with co-pilots becoming the norm rather than the exception. A ubiquitous tool like Microsoft Word, which many use as their preferred platform for digital writing, may well come with AI support as an essential feature for improved productivity (early research shows how people are imagining these). But at what cost?

In our recent full paper, we explored an analytic approach to study writers’ support-seeking behaviour and dependence on AI in a co-writing environment:

Antonette Shibani, Ratnavel Rajalakshmi, Srivarshan Selvaraj, Faerie Mattins, Simon Knight (2023). Visual representation of co-authorship with GPT-3: Studying human-machine interaction for effective writing. In M. Feng, T. Käser, and P. Talukdar, editors, Proceedings of the 16th International Conference on Educational Data Mining, pages 183–193, Bengaluru, India, July 2023. International Educational Data Mining Society [PDF].

Using keystroke data from CoAuthor, an interactive writing environment powered by GPT-3, we developed CoAuthorViz (see example figure below) to characterize writer interaction with AI feedback. CoAuthorViz captures key constructs such as the writer incorporating a GPT-3 suggested text as is (GPT-3 suggestion selection), the writer not incorporating a GPT-3 suggestion (empty GPT-3 call), the writer modifying the suggested text (GPT-3 suggestion modification), and the writer’s own writing (user text addition). We demonstrated how such visualizations (and associated metrics) help characterise varied levels of AI interaction in writing, from low to high dependency on AI.
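To make these constructs concrete, here is a minimal Python sketch of how a session could be summarized once each logged event has been labelled with one of the four constructs. The event labels, the `summarize_session` function, and the acceptance-rate ratio are hypothetical names chosen for illustration; they are not the CoAuthor log format or the metrics defined in the paper.

```python
from collections import Counter

# Hypothetical, simplified event labels; real CoAuthor keystroke logs are richer.
EVENT_TYPES = (
    "user_text_addition",
    "suggestion_selection",
    "suggestion_modification",
    "empty_call",
)


def summarize_session(events):
    """Count each event type and derive a simple AI-interaction ratio."""
    counts = Counter(events)
    unknown = set(counts) - set(EVENT_TYPES)
    if unknown:
        raise ValueError(f"unknown event types: {sorted(unknown)}")
    # Every GPT-3 call ends in one of three outcomes: used as is,
    # used and then modified, or not used at all (empty call).
    accepted = counts["suggestion_selection"] + counts["suggestion_modification"]
    calls = accepted + counts["empty_call"]
    return {
        "counts": {name: counts[name] for name in EVENT_TYPES},
        "gpt3_calls": calls,
        # Illustrative ratio only: share of calls whose suggestion was kept.
        "acceptance_rate": accepted / calls if calls else 0.0,
    }


session = [
    "user_text_addition",
    "suggestion_selection",
    "empty_call",
    "suggestion_modification",
    "user_text_addition",
]
print(summarize_session(session))
```

A session dominated by user text additions with few calls would sit at the low-dependency end, while one with many accepted suggestions would sit at the high-dependency end.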

Figure: CoAuthorViz legend and three samples of AI-assisted writing (squares denote writer written text, and triangles denote AI suggested text)

Full details of the work can be found in the resources below:

Several complex questions are yet to be answered:

  • Is autonomy (self-writing, without AI support) preferable to better quality writing (with AI support)?
  • As AI becomes embedded into our everyday writing, do we lose our own writing skills? And if so, is that of concern, or will writing become one of those outdated skills in the future that AI can do much better than humans?
  • Do we lose our ‘uniquely human’ attributes if we continue to write with AI?
  • What is an acceptable use of AI in writing that still lets you think? (We know by writing we think more clearly; would an AI tool providing the first draft restrict our thinking?)
  • What knowledge and skills do writers need to use AI tools appropriately?

Edit: If you want to delve into the topic further, here’s an intriguing article that imagines how writing might look in the future: https://simon.buckinghamshum.net/2023/03/the-writing-synth-hypothesis/

Tools for automated rhetorical analysis of academic writing

Alert – Long post!

In this post, I’m presenting a summary of my review on tools for automatically analyzing rhetorical structures from academic writing.

The tools considered are designed to cater to different users and purposes. AWA and RWT aim to provide feedback for improving students’ academic writing. Mover and SAPIENTA, on the other hand, help researchers identify the structure of research articles. Mover even allows users to give a second opinion on the classification of moves and add new training data (this can lead to a less accurate model if students with less expertise add potentially wrong training data). However, these tools share a common thread and fulfill the following criteria:

  • They look at scientific text – Full research articles, abstracts or introductions. Tools to automate argumentative zoning of other open text (Example) are not considered.
  • They automate the identification of rhetorical structures (zones, moves) in research articles (RA), with the sentence as the unit of analysis.
  • They are broadly based on the Argumentative Zoning (AZ) scheme by Simone Teufel or the CARS model by John Swales (either the original schema or a modified version of it).

Tools (in alphabetical order):

  1. Academic Writing Analytics (AWA) – Summary notes here

AWA also has a reflective parser to give feedback on students’ reflective writing, but the focus of this post is on the analytical parser. AWA demo, video courtesy of Dr. Simon Knight:

  2. Mover – Summary notes here

Available for download as a standalone application. Sample screenshot below:

Figure: AntMover screenshot

  3. Research Writing Tutor (RWT) – Summary notes here

RWT demo, video courtesy of Dr. Elena Cotos:

  4. SAPIENTA – Summary notes here.

Available for download as a standalone Java application, or accessible as a web service. Sample screenshot of tagged output from the SAPIENTA web service below:

Figure: SAPIENTA tagged output

Annotation Scheme:

The schemes are generally intended to be applicable to all academic writing, and this has been successfully tested across data from different disciplines. A comparison of the schemes used by the tools is shown below:

AWA
  • Source & description: AWA Analytical scheme (modified from AZ for sentence-level parsing)
  • Annotation scheme:
    • Summarizing
    • Background knowledge
    • Contrasting ideas
    • Novelty
    • Significance
    • Surprise
    • Open question
    • Generalizing

Mover
  • Source & description: Modified CARS model, with three main moves and further steps
  • Annotation scheme:
    1. Establish a territory
      • Claim centrality
      • Generalize topics
      • Review previous research
    2. Establish a niche
      • Counter claim
      • Indicate a gap
      • Raise questions
      • Continue a tradition
    3. Occupy the niche
      • Outline purpose
      • Announce research
      • Announce findings
      • Evaluate research
      • Indicate RA structure

RWT
  • Source & description: Modified CARS model, with 3 moves and 17 steps
  • Annotation scheme:
    Move 1. Establishing a territory
      1. Claiming centrality
      2. Making topic generalizations
      3. Reviewing previous research
    Move 2. Identifying a niche
      4. Indicating a gap
      5. Highlighting a problem
      6. Raising general questions
      7. Proposing general hypotheses
      8. Presenting a justification
    Move 3. Addressing the niche
      9. Introducing present research descriptively
      10. Introducing present research purposefully
      11. Presenting research questions
      12. Presenting research hypotheses
      13. Clarifying definitions
      14. Summarizing methods
      15. Announcing principal outcomes
      16. Stating the value of the present research
      17. Outlining the structure of the paper

SAPIENTA
  • Source & description: Finer-grained AZ scheme (CoreSC scheme with 11 categories in the first layer)
  • Annotation scheme:
    • Background (BAC)
    • Hypothesis (HYP)
    • Motivation (MOT)
    • Goal (GOA)
    • Object (OBJ)
    • Method (MET)
    • Model (MOD)
    • Experiment (EXP)
    • Observation (OBS)
    • Result (RES)
    • Conclusion (CON)

Method:

The tools are built on different data sets and methods for automating the analysis. Most of them use manually annotated data as a standard for training the model to automatically classify the categories. Details below:

AWA
  • Data type: Any research writing
  • Automation method: NLP rule-based, using the Xerox Incremental Parser (XIP) to annotate rhetorical functions in discourse.

Mover
  • Data type: Abstracts
  • Automation method: Supervised learning, using a Naïve Bayes classifier with data represented as a bag of clusters with location information.

RWT
  • Data type: Introductions
  • Automation method: Supervised learning, using a Support Vector Machine (SVM) with n-dimensional vector representation and n-gram features.

SAPIENTA
  • Data type: Full article
  • Automation method: Supervised learning, using an SVM with sentence aspect features, and sequence labelling using Conditional Random Fields (CRF) for sentence dependencies.

Others:

  • The SciPo tool helps students write summaries and introductions for scientific texts in Portuguese.
  • Another tool, CARE, is a word concordancer used to search for words and moves in research abstracts – Summary notes here.
  • An ML approach considering three different schemes for annotating scientific abstracts (no tool).

If you think I’ve missed a tool which does similar automated tagging in research articles, do let me know so I can include it in my list 🙂

Notes: Computational analysis of move structures in academic abstracts

Reference:

Wu, J. C., Chang, Y. C., Liou, H. C., & Chang, J. S. (2006, July). Computational analysis of move structures in academic abstracts. In Proceedings of the COLING/ACL on Interactive presentation sessions (pp. 41-44). Association for Computational Linguistics.

Background:

  • Swales pattern for research articles: Introduction, Methods, Results, Discussion (IMRD) and Creating a Research Space (CARS) model.
  • Studying the rhetorical structure of texts is found to be useful as an aid to reading and writing (Mover tool notes here).

Purpose:

  • To automatically analyze move structures (Background, Purpose, Method, Result, and Conclusion) from research article abstracts.
  • To develop an online learning system CARE (Concordancer for Academic wRiting in English) using move structures to help novice writers.

Method:

  • Processes involved:

Figure: CARE system

  • The TANGO concordancer was used to extract collocations with chunking and clause information. Sample verb-noun collocation structures in the corpus: VP+NP, VP+PP+NP, and VP+NP+PP (Ref: Jian, J. Y., Chang, Y. C., & Chang, J. S. (2004, July). TANGO: Bilingual collocational concordancer. In Proceedings of the ACL 2004 on Interactive poster and demonstration sessions (p. 19). Association for Computational Linguistics.)
    • TANGO Tool accessible here.
  • Data: Corpus of 20,306 abstracts (95,960 sentences) from Citeseer. Moves were manually tagged in 106 abstracts containing 709 sentences. 72,708 collocation types were extracted, and 317 collocations were manually tagged with moves.
  • Hidden Markov Model (HMM) trained using 115 abstracts containing 684 sentences.
  • Different parameters evaluated for the HMM model: “the frequency of collocation types, the number of sentences with collocation in each abstract, move sequence score and collocation score”
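As a rough illustration of how an HMM can assign one move to each sentence of an abstract, the sketch below runs Viterbi decoding over the five moves. It is not the authors' implementation: the way the transition and collocation scores are combined (a weighted sum of log scores, with `trans_weight` on the transition term) is an assumption, and all parameter values are made up for the example.

```python
import math

MOVES = ["Background", "Purpose", "Method", "Result", "Conclusion"]


def viterbi(emission_scores, start_logp, trans_logp, trans_weight=0.7):
    """Return the best-scoring move sequence for one abstract.

    emission_scores: one dict per sentence mapping move -> log score
                     (for example, derived from move-tagged collocations).
    start_logp:      dict move -> log probability of opening the abstract.
    trans_logp:      dict (prev_move, move) -> log transition probability.
    trans_weight:    weight on the transition term; the emission term gets
                     1 - trans_weight (an assumed way to combine the two).
    """
    if not emission_scores:
        return []
    emit_weight = 1.0 - trans_weight
    best = {m: trans_weight * start_logp[m] + emit_weight * emission_scores[0][m]
            for m in MOVES}
    back = []  # one dict of back-pointers per sentence after the first
    for scores in emission_scores[1:]:
        new_best, pointers = {}, {}
        for m in MOVES:
            prev = max(MOVES, key=lambda p: best[p] + trans_weight * trans_logp[(p, m)])
            new_best[m] = (best[prev] + trans_weight * trans_logp[(prev, m)]
                           + emit_weight * scores[m])
            pointers[m] = prev
        best = new_best
        back.append(pointers)
    # Follow the back-pointers from the best final move.
    path = [max(MOVES, key=lambda m: best[m])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy parameters (made up for illustration, not estimated from any corpus).
start = {m: math.log(0.9 if m == "Background" else 0.025) for m in MOVES}
trans = {(p, m): math.log(0.6 if j == i + 1 else 0.1)
         for i, p in enumerate(MOVES) for j, m in enumerate(MOVES)}
emissions = [{m: math.log(0.2) for m in MOVES} for _ in range(3)]
print(viterbi(emissions, start, trans))
```

With uninformative emission scores, as in the toy call, the decoder falls back on the move order favoured by the transition scores and prints the path Background, Purpose, Method.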

Results:

  • A precision of 80.54% was achieved on the 627 qualifying sentences with the following parameters: a weight of 0.7 for the transitional probability function and a frequency threshold of 18 for a collocation to be applicable (crucial for excluding unreliable collocations).

Conclusion:

  • CARE system interface created for querying and looking up sentences for a specific move.
  • The system is expected to help non-native speakers write abstracts for research articles.

Notes: Mover – a Machine Learning tool to analyze technical research papers

Reference: Anthony, L., & Lashkia, G. V. (2003). Mover: A machine learning tool to assist in the reading and writing of technical papers. IEEE Transactions on Professional Communication, 46(3), 185-193.

Background:

  • Identifying the structure of text helps in reading and writing research articles.
  • The structure of research article introductions in terms of moves is explained in the CARS model (Ref: J. M. Swales, “Aspects of Article Introductions,” Univ. Aston, Language Studies Unit, Birmingham, UK, Res. Rep. No. 1, 1981.).

Problem:

  • Identifying the moves in a particular type of article.
  • Time-consuming identification of moves by raters (manual annotation) with no immediate feedback.

Purpose:

  • To provide immediate feedback on move structures in the given text.

Method:

  • Using supervised learning to identify moves from 100 IT research article (RA) abstracts.
  • Machine readable abstracts were further pre-processed with subroutines to remove irrelevant characters from raw text.
  • Data labelled based on the modified CARS model which had three main moves with further steps under each move as below: (Ref: L. Anthony, “Writing research article introductions in software engineering: How accurate is a standard model?,” IEEE Trans. Prof. Commun., vol. 42, pp. 38–46, Mar. 1999.)
    1. Establish a territory
      1. Claim centrality
      2. Generalize topics
      3. Review previous research
    2. Establish a niche
      1. Counter claim
      2. Indicate a gap
      3. Raise questions
      4. Continue a tradition
    3. Occupy the niche
      1. Outline purpose
      2. Announce research
      3. Announce findings
      4. Evaluate research
      5. Indicate RA structure

 

  • Supervised Learning System – Implementation details:
    • Bag of clusters representation was implemented:
      • Dividing input text into clusters of 1-5 word tokens to capture key phrases and discourse markers as features.
      • A bag-of-words model splits input text into single word tokens and so ignores word order and semantics, which makes it less useful at the discourse level.
      • E.g. “Once upon a time there were three bears” clusters –> “once”, “upon”, “once upon”, “once upon a time”
      • Noise removal: Information Gain (IG) scores were used to remove unhelpful clusters that fell below a threshold.
      • A ‘Location’ feature (the position index of the sentence in the abstract) was added to take account of preceding and following sentences.
        • Additional training feature for the classifier – probability of common structural step groupings.
    • Naive Bayes learning classifier outperformed other models.
    • Tool available for download as AntMover.
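The bag of clusters idea described above can be sketched in a few lines of Python. This is only an illustration of generating clusters of 1-5 word tokens; the function names and the way the position index is encoded as a feature are my own assumptions, and the IG-based filtering step is not shown.

```python
def word_clusters(sentence, max_len=5):
    """Return every contiguous run of 1..max_len word tokens, lowercased."""
    tokens = sentence.lower().split()
    clusters = []
    for size in range(1, max_len + 1):
        for start in range(len(tokens) - size + 1):
            clusters.append(" ".join(tokens[start:start + size]))
    return clusters


def sentence_features(sentence, position, max_len=5):
    """Combine the clusters with a location feature.

    Encoding the position as a string feature is a hypothetical choice
    made for this sketch, not necessarily how Mover represents it.
    """
    features = set(word_clusters(sentence, max_len))
    features.add(f"__position_{position}")
    return features


clusters = word_clusters("Once upon a time there were three bears")
print(len(clusters))
print(clusters[:10])
```

Unlike single word tokens, clusters such as "once upon a time" keep short phrases intact, which is what lets discourse markers act as features.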

Results:

  • Evaluation of Mover:
    • Training: 554 examples, Test: 138 examples
    • Five-fold cross validation, Average accuracy: 68%
    • Classes (steps within the structural moves) with few examples had lower accuracy. Incorrectly classified steps were mostly from the same move (note the similarity among 3.1, 3.2, 3.3).
    • Features to improve accuracy:
      • When the two most likely classes from the Naive Bayes probabilities were accepted (instead of predicting only one class), accuracy increased to 86%.
      • Flow optimization improved accuracy by 2%.
      • Manual correction of steps added new training data (students’ second opinions on the moves classified by the system were used to retrain the model).
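The "two most likely decisions" evaluation mentioned above can be sketched as a top-k accuracy over per-class probabilities. The function below is a generic illustration with made-up probabilities and step labels; it is not code from Mover.

```python
def top_k_accuracy(probabilities, gold_labels, k=2):
    """Share of examples whose gold label is among the k most probable classes."""
    if len(probabilities) != len(gold_labels):
        raise ValueError("need one probability dict per gold label")
    if not gold_labels:
        return 0.0
    hits = 0
    for probs, gold in zip(probabilities, gold_labels):
        # Rank class labels from most to least probable.
        ranked = sorted(probs, key=probs.get, reverse=True)
        if gold in ranked[:k]:
            hits += 1
    return hits / len(gold_labels)


# Made-up classifier outputs for three sentences (step labels as in the list above).
probs = [
    {"1.1": 0.6, "1.2": 0.3, "3.1": 0.1},
    {"1.1": 0.2, "3.1": 0.5, "3.2": 0.3},
    {"1.1": 0.5, "2.2": 0.1, "3.3": 0.4},
]
gold = ["1.1", "3.2", "2.2"]
print(top_k_accuracy(probs, gold, k=1))
print(top_k_accuracy(probs, gold, k=2))
```

In the toy data the second sentence is wrong under a single prediction but counted as correct once the runner-up class is accepted, which is the effect behind the reported jump in accuracy.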

Discussion:

  • Based on two practical applications in the classroom, using ‘Mover’ helped students to
    • identify moves that went unnoticed in manual analysis.
    • analyze moves much faster than in manual analysis.
    • better understand their own writing and avoid distorted views.
  • Implications:
    • Important vocabulary can be identified for teaching from the ordered cluster of words.
    • Trained examples can be used as exemplars.
    • Aid for immediate analysis of text structure.
  • Future Work:
    • Increasing the accuracy of Mover.
    • Expanding to more fields – currently implemented for engineering and science text types.