Automated Revision Graphs – AIED 2020

I’ve recently had my writing analytics work published at the 21st international conference on artificial intelligence in education (AIED 2020) where the theme was “Augmented Intelligence to Empower Education”. It is a short paper describing a text analysis and visualisation method to study revisions. It introduced ‘Automated Revision Graphs’ to study revisions in short texts at a sentence level by visualising text as graph, with open source code.

Shibani A. (2020) Constructing Automated Revision Graphs: A Novel Visualization Technique to Study Student Writing. In: Bittencourt I., Cukurova M., Muldner K., Luckin R., Millán E. (eds) Artificial Intelligence in Education. AIED 2020. Lecture Notes in Computer Science, vol 12164. Springer, Cham. [pdf] https://doi.org/10.1007/978-3-030-52240-7_52

I did a short introductory video for the conference, which can be viewed below:

I also had another paper I co-authored on multi-modal learning analytics lead by Roberto Martinez, which received the best paper award in the conference. The main contribution of the paper is a set of conceptual mappings from x-y positional data (captured from sensors) to meaningful measurable constructs in physical classroom movements, grounded in the theory of Spatial Pedagogy. Great effort by the team!

Details of the second paper can be found here:

Martinez-Maldonado R., Echeverria V., Schulte J., Shibani A., Mangaroska K., Buckingham Shum S. (2020) Moodoo: Indoor Positioning Analytics for Characterising Classroom Teaching. In: Bittencourt I., Cukurova M., Muldner K., Luckin R., Millán E. (eds) Artificial Intelligence in Education. AIED 2020. Lecture Notes in Computer Science, vol 12163. Springer, Cham. [pdf] https://doi.org/10.1007/978-3-030-52237-7_29

Working with Jupyter notebooks #code

Jupyter is an open source program that helps you share and run code in many different programming languages. Jupyter notebooks are great to quickly prototype different versions of code, as they are easy to edit and try different outputs. The format of a Jupyter notebook is similar to reports in the form of Markdowns that are usually used in R. It can contain blocks of text, code, equations and results (including visualizations) all in one page. We’ve used Jupyter notebooks to run text analysis workshops in conferences, and the feedback was pretty good.

The Writing Analytics workshop is starting at #LAK18. Jupyter notebooks are being used. #great pic.twitter.com/56Zd66ku9L

I find that Jupyter notebooks are great for sharing code and results across different people, and if you’re hosting it, it saves a lot of trouble in organising a workshop where you want participants to install software. It works well for non-technical audience too, since they can choose to ignore what’s inside the code block by simply running it and focus on the results block. They are quite popular now for data science experiments, so this post will be a good place to start to know and use them. You can use an already available notebook (if you’ve downloaded one from Github) and play with it, or create your own Jupyter notebook from scratch. This post will guide you to create your own notebook from scratch demonstrating some basic text analysis in Python.

Installing Jupyter

If you want to try a Jupyter notebook first without installing anything, you can do so in this notebook hosted in the official Jupyter site. If you want to install your own copy of Jupyter running in your machine to develop code, then use one of the two options below:

  • If you are new to Python programming, and don’t have python installed in your machine, the easiest way to install Jupyter is by downloading the Anaconda distribution. This comes with in-built Python (you can choose either 2.7 or 3.6 version of Python when you download the distribution – the code I’m writing in this post is in 2.7).
  • If you already have Python working in your machine (as I did), the easiest way is to install Jupyter using the pip command as you do for any Python package. Note that if pip and python are already setup in your system path, you can simply use $ pip install jupyter from the command prompt.

Now that Jupyter is installed, type the command below in your anaconda prompt/command prompt to start a Jupyter notebook:

$ jupyter notebook

The Jupyter homepage opens in your default browser at http://localhost:8888, displaying the files present in the current folder like below. You can now create a new Python jupyter notebook by clicking on New -> Python2 (or Python 3 if you have Python version 3). You can move between folders or create a new folder for your Python notebooks. To change the default opening directory, you should first move to the required path using cd in the command prompt, and then type$ jupyter notebookOpen the created notebook, which would look like this:

This cell is a code block by default, which can be changed to a markdown text block from the drop-down list (check the figure above) to add narrative text accompanying the Python code. Now name your notebook, and try adding both a code block, and markdown block with different levels of text following the sample here:

To execute the blocks, click on the Run button (Alternatively, use Ctrl+Enter in Windows – Keyboard shortcuts can be found in Help -> Keyboard shortcuts). This renders the output of your code and your markdown text like this:

That’s it. You have a simple Jupyter notebook running on your machine. Now to try a bit more, here’s the sample code you can download and run to do some basic text analysis. I’ve defined three steps in this code: Importing required packages, defining input text, and analysis. Before importing the packages/ libraries you need in step 1 however, they should be first installed in your machine. This can be done using the Pip command in the command prompt/anaconda prompt like this:  $ pip install wordcloud (If you run into problems with that, the other option is to download an appropriate version of the package’s wheel from here and install it using $pip install C:/some-dir/some-file.whl).

Python code for the three steps is below:

#Step 1 - Importing libraries
 
from wordcloud import WordCloud, STOPWORDS  #For word cloud generation
import matplotlib.pyplot as plt             #For displaying figures
import re                          #Regular expresions for string operations


#Step 2 - Defining input text

inputtext = "A cockatoo is a parrot that is any of the 21 species belonging to the bird family Cacatuidae, the only family in the superfamily Cacatuoidea. Along with the Psittacoidea (true parrots) and the Strigopoidea (large New Zealand parrots), they make up the order Psittaciformes (parrots). The family has a mainly Australasian distribution, ranging from the Philippines and the eastern Indonesian islands of Wallacea to New Guinea, the Solomon Islands and Australia. Cockatoos are recognisable by the showy crests and curved bills. Their plumage is generally less colourful than that of other parrots, being mainly white, grey or black and often with coloured features in the crest, cheeks or tail. On average they are larger than other parrots; however, the cockatiel, the smallest cockatoo species, is a small bird. The phylogenetic position of the cockatiel remains unresolved, other than that it is one of the earliest offshoots of the cockatoo lineage. The remaining species are in two main clades. The five large black coloured cockatoos of the genus Calyptorhynchus form one branch. The second and larger branch is formed by the genus Cacatua, comprising 11 species of white-plumaged cockatoos and four monotypic genera that branched off earlier; namely the pink and white Major Mitchell's cockatoo, the pink and grey galah, the mainly grey gang-gang cockatoo and the large black-plumaged palm cockatoo. Cockatoos prefer to eat seeds, tubers, corms, fruit, flowers and insects. They often feed in large flocks, particularly when ground-feeding. Cockatoos are monogamous and nest in tree hollows. Some cockatoo species have been adversely affected by habitat loss, particularly from a shortage of suitable nesting hollows after large mature trees are cleared; conversely, some species have adapted well to human changes and are considered agricultural pests. Cockatoos are popular birds in aviculture, but their needs are difficult to meet. The cockatiel is the easiest cockatoo species to maintain and is by far the most frequently kept in captivity. White cockatoos are more commonly found in captivity than black cockatoos. Illegal trade in wild-caught birds contributes to the decline of some cockatoo species in the wild. Source: https://en.wikipedia.org/wiki/Cockatoo"


print("\nInput text for analysis:\n ")
print(inputtext)


#Step 3 - Analysis

print "Summary statistics of input text:"

wordcount = len(re.findall(r'\w+', inputtext))
print "Wordcount: ", wordcount

charcount = len(inputtext) #including spaces
print "Number of characters: ", charcount

#More options for wordclouds here: https://github.com/amueller/word_cloud
wordcloud = WordCloud(    stopwords=STOPWORDS,
                          background_color='black',
                         ).generate(inputtext)

plt.imshow(wordcloud, interpolation="bilinear")
plt.axis('off')
plt.show()

The downloadable ipynb file is available on Github.

Other notes:

  • This post is intended for anyone who wants to start working with Jupyter notebooks, and assumes prior understanding of programming in Python. The Jupyter notebook is another environment to easily work with code, but the coding process is still very traditional. If you’re new to Python programming, this website is a good place to start.
  • You can use multiple versions of Python to run Jupyter notebooks by changing its Kernel (the computational engine which executes the code). I have both Python 2 & Python 3 installed, and I switch between them for different programs as needed.
  • While Jupyter notebooks are mainly used to run Python code, they can also be used to run R programs, which requires R kernel to be installed. The blog post below is a useful guide to do that: https://www.datacamp.com/community/blog/jupyter-notebook-r

Telling stories with data and visualizations – Some key messages

The topic of telling stories from data is huge and probably needs many many hours and books to explain the ideal ways of doing it. But Dr. Roberto Martinez did a great job in giving us a quick introduction to the topic and its pragmatic application in an hour at his talk at the UTS LX lab. It very much aligned with the Connected Intelligence Centre‘s  vision of building staff capacity in data science particularly by keeping human in the center of the data. This post includes my notes from this talk where I summarize some of the key messages.

Humans are producing enormous amounts of data these days. According to recent statistics, 2.5 quintillion bytes of data are created every day and the pace keeps growing. But, there is a stark contrast between data and knowledge – Data by itself means very little, and knowledge is created only when the data is made sense of. We might be drowning in data, but not in knowledge. Roberto compares this abundance of data to oysters and an insight to a pearl. We need to open many oysters to maybe find one pearl.

The rest of the blog is divided into two main sections 1. Data Storytelling, 2. Data visualization, and a few overall key messages that I took away from the talk.

Data Storytelling:

The value of data is not the data itself, but how we present it. This is what makes storytelling really important to present insights from data. It is not about presenting ALL the data we have, but to highlight the main insights from the data that should be noted. It is about finding patterns from the data to make people engaged with the story just like finding hooks in a fictional story. It often operates in conjunction with data visualization to communicate results from data. Check out the list of resources given at the end of this post for detailed reading.

There are a few ways to make the insights clear and pop out when communicating the story from data:

  • The first step is to declutter the data by removing all the noise. This can be done by stripping down all the unwanted information and building up on the useful insights.
  • The next key thing to do is to foreground things that are important. We do not want too much ink/ data that makes the results too complicated to understand.
  • A data story approach can be used merging narrative and visuals together to engage audience and point to key messages from the data (see examples of line graphs annotated this way here). Also check out this interesting article and podcast on the good and bad of storytelling for further reading.

Continue reading “Telling stories with data and visualizations – Some key messages”