Questioning Learning Analytics – Cultivating critical engagement (LAK’22)

Gist of LAK 22 paper

Our full research paper has been nominated for Best Paper at the prestigious Learning Analytics and Knowledge (LAK) Conference:

Antonette Shibani, Simon Knight and Simon Buckingham Shum (2022, Forthcoming). Questioning learning analytics? Cultivating critical engagement as student automated feedback literacy. [BEST RESEARCH PAPER NOMINEE] The 12th International Learning Analytics & Knowledge Conference (LAK ’22).

Here’s the gist of what the paper talks about:

  • Learning Analytics (LA) still requires substantive evidence of impact in educational practice. A human-centered approach can bring about better uptake of LA.
  • We need critical engagement and interaction with LA to help tackle issues ranging from black-boxing, imperfect analytics, and the lack of explainability of algorithms and artificial intelligence systems, to the required relevant skills and capabilities of LA users when dealing with such advanced technologies.
  • Students must be able to, and should be encouraged to, question analytics in student-facing LA systems, as critical engagement is a metacognitive capacity that both demonstrates and builds student understanding.
  • This puts power back in the hands of users and gives them agency when using LA.
  • Critical engagement with LA should be facilitated with careful design for learning; we provide an example case with automated writing feedback – see the paper for details on what the design involved.
  • We show empirical data and findings from student annotations of automated feedback from AcaWriter, where we want them to develop their automated feedback literacy.

The full paper is available for download at this link: [Author accepted manuscript pdf].

This paper was the hardest for me to write personally since I was running on 2-3 hours of sleep right after joining work part-time following my maternity leave. Super stoked to hear about the best paper nomination, as my work as a new mum paid off. Good to be back at work while also taking care of the little bubba 🙂 Thanks to my co-authors for accommodating my writing request really close to the deadline!

Also, workshops coming up in LAK22:

  • Antonette Shibani, Andrew Gibson, Simon Knight, Philip H Winne, Diane Litman (2022, Forthcoming). Writing Analytics for higher-order thinking skills. Accepted workshop at The 12th International Learning Analytics & Knowledge Conference (LAK ’22).
  • Yi-Shan Tsai, Melanie Peffer, Antonette Shibani, Isabel Hilliger, Bodong Chen, Yizhou Fan, Rogers Kaliisa, Nia Dowell and Simon Knight (2022, Forthcoming). Writing for Publication: Engaging Your Audience. Accepted workshop at The 12th International Learning Analytics & Knowledge Conference (LAK ’22).

Automated Writing Feedback in AcaWriter

You might be familiar with my research in the field of Writing Analytics, particularly Automated Writing Feedback during my PhD and beyond. The work is based off an automated feedback tool called AcaWriter (previously called Automated Writing Analytics/ AWA) which we developed at the Connected Intelligence Centre, University of Technology Sydney.

Recently we have come up with resources to spread the word and introduce the tool to anyone who wants to learn more. First is an introductory blog post I wrote for the Society for Learning Analytics Research (SoLAR) Nexus publication. You can access the full blog post here:

We also ran a two-hour online workshop as part of a LALN event to add more detail and resources for others to participate. Details are here:

Video recording from the event is available for replay:

Learn more:

Automated Revision Graphs – AIED 2020

I’ve recently had my writing analytics work published at the 21st International Conference on Artificial Intelligence in Education (AIED 2020), where the theme was “Augmented Intelligence to Empower Education”. It is a short paper describing a text analysis and visualisation method to study revisions. It introduces ‘Automated Revision Graphs’ to study revisions in short texts at a sentence level by visualising text as a graph, with open source code.

Shibani A. (2020) Constructing Automated Revision Graphs: A Novel Visualization Technique to Study Student Writing. In: Bittencourt I., Cukurova M., Muldner K., Luckin R., Millán E. (eds) Artificial Intelligence in Education. AIED 2020. Lecture Notes in Computer Science, vol 12164. Springer, Cham. [pdf]

I did a short introductory video for the conference, which can be viewed below:

I also co-authored another paper on multi-modal learning analytics, led by Roberto Martinez, which received the best paper award at the conference. The main contribution of the paper is a set of conceptual mappings from x-y positional data (captured from sensors) to meaningful measurable constructs in physical classroom movements, grounded in the theory of Spatial Pedagogy. Great effort by the team!

Details of the second paper can be found here:

Martinez-Maldonado R., Echeverria V., Schulte J., Shibani A., Mangaroska K., Buckingham Shum S. (2020) Moodoo: Indoor Positioning Analytics for Characterising Classroom Teaching. In: Bittencourt I., Cukurova M., Muldner K., Luckin R., Millán E. (eds) Artificial Intelligence in Education. AIED 2020. Lecture Notes in Computer Science, vol 12163. Springer, Cham. [pdf]

New Research Publications in Learning Analytics

Three of my journal articles got published recently: two on learning analytics/writing analytics implementations [Learning Analytics Special Issue in The Internet and Higher Education journal], and one on a text analysis method [Educational Technology Research and Development journal] that I worked on many years ago and which has only just been published.

Article 1: Educator Perspectives on Learning Analytics in Classroom Practice

The first one is predominantly qualitative in nature, based on interviews with instructors about their experiences of using Learning Analytics tools such as the automated writing feedback tool AcaWriter. It provides a practical account of implementing learning analytics in authentic classroom practice from the voices of educators. Details below:

Abstract: Failing to understand the perspectives of educators, and the constraints under which they work, is a hallmark of many educational technology innovations’ failure to achieve usage in authentic contexts, and sustained adoption. Learning Analytics (LA) is no exception, and there are increasingly recognised policy and implementation challenges in higher education for educators to integrate LA into their teaching. This paper contributes a detailed analysis of interviews with educators who introduced an automated writing feedback tool in their classrooms (triangulated with student and tutor survey data), over the course of a three-year collaboration with researchers, spanning six semesters’ teaching. It explains educators’ motivations, implementation strategies, outcomes, and challenges when using LA in authentic practice. The paper foregrounds the views of educators to support cross-fertilization between LA research and practice, and discusses the importance of cultivating educators’ and students’ agency when introducing novel, student-facing LA tools.

Keywords: learning analytics; writing analytics; participatory research; design research; implementation; educator

Citation and article link: Antonette Shibani, Simon Knight and Simon Buckingham Shum (2020). Educator Perspectives on Learning Analytics in Classroom Practice [Author manuscript]. The Internet and Higher Education. [Publisher’s free download link valid until 8 May 2020].

Article 2: Implementing Learning Analytics for Learning Impact: Taking Tools to Task

The second one, led by Simon Knight, provides a broader framing for how we define impact in learning analytics. It defines a model addressing the key challenges in LA implementations based on our writing analytics example. Details below:

Abstract: Learning analytics has the potential to impact student learning, at scale. Embedded in that claim are a set of assumptions and tensions around the nature of scale, impact on student learning, and the scope of infrastructure encompassed by ‘learning analytics’ as a socio-technical field. Drawing on our design experience of developing learning analytics and inducting others into its use, we present a model that we have used to address five key challenges we have encountered. In developing this model, we recommend: A focus on impact on learning through augmentation of existing practice; the centrality of tasks in implementing learning analytics for impact on learning; the commensurate centrality of learning in evaluating learning analytics; inclusion of co-design approaches in implementing learning analytics across sites; and an attention to both social and technical infrastructure.

Keywords: learning analytics, implementation, educational technology, learning design

Citation and article link:  Simon Knight, Andrew Gibson and Antonette Shibani (2020). Implementing Learning Analytics for Learning Impact: Taking Tools to Task. The Internet and Higher Education.

Article 3: Identifying patterns in students’ scientific argumentation: content analysis through text mining using LDA

The third one, led by Wanli Xing, discusses the use of Latent Dirichlet Allocation (LDA), a text mining method, to study argumentation patterns in student writing in an unsupervised way. Details below:

Abstract: Constructing scientific arguments is an important practice for students because it helps them to make sense of data using scientific knowledge and within the conceptual and experimental boundaries of an investigation. In this study, we used a text mining method called Latent Dirichlet Allocation (LDA) to identify underlying patterns in students written scientific arguments about a complex scientific phenomenon called Albedo Effect. We further examined how identified patterns compare to existing frameworks related to explaining evidence to support claims and attributing sources of uncertainty. LDA was applied to electronically stored arguments written by 2472 students and concerning how decreases in sea ice affect global temperatures. The results indicated that each content topic identified in the explanations by the LDA— “data only,” “reasoning only,” “data and reasoning combined,” “wrong reasoning types,” and “restatement of the claim”—could be interpreted using the claim–evidence–reasoning framework. Similarly, each topic identified in the students’ uncertainty attributions— “self-evaluations,” “personal sources related to knowledge and experience,” and “scientific sources related to reasoning and data”—could be interpreted using the taxonomy of uncertainty attribution. These results indicate that LDA can serve as a tool for content analysis that can discover semantic patterns in students’ scientific argumentation in particular science domains and facilitate teachers’ providing help to students.

Keywords: text mining, latent dirichlet allocation, educational data mining, scientific argumentation

Citation and article link:  Wanli Xing, Hee-Sun Lee and Antonette Shibani (2020). Identifying patterns in students’ scientific argumentation: content analysis through text mining using Latent Dirichlet Allocation. Educational Technology Research and Development.

2019 Year in review

Welcome 2020! A new year is the perfect time to reflect on the past year, so I wanted to take a step back and think about it. 2019 was one of the most successful years for me professionally (and personally) with a range of experiences and productive outcomes. Quite a few achievements I’m really proud of happened this year. This post is mostly a note for myself to remind me of all those 🙂

I started the year on a positive note – I allocated quality time for me to do some coding for a novel graph analysis method I developed for writing analytics. Recovering from my laptop loss from the previous year (noting how important backing up your work is), I redid it from scratch, and made a version better than what I had last time. Coding up those interactive automated revision graphs was probably the first successful outcome in the year for me.

My biggest achievement this year was completing my PhD from the Connected Intelligence Centre. Even at the start of the year, I hadn’t started writing my thesis and I was still finishing up data analysis. Even when I started writing my thesis in February, I was unsure if I could complete it before the August deadline. The main chapter seemed like a monster job since most of the analysis had to be done afresh and I hadn’t written it up before. The best decision I made at that point was to start off with this hard chapter instead of the opening chapters or the easier ones where I had already written material (like the literature review or the introduction). A pat on the back – I stuck with the deadline of completing it before I flew out to LAK19 in March. It was quite intense, both emotionally and physically taxing, but I made it! I emailed the first version of this chapter with an overall skeleton of the thesis to my supervisors when I was on a bus home – I was literally making use of every minute I had before flying out to the conference.

My participation in LAK19 was quite a success. I’ve written a whole post on it before, so I’m not gonna dive into details. But I presented a full paper and got some amazing comments, facilitated a workshop (almost solo since my co-organizers couldn’t make it at the last minute) and joined the SoLAR executive committee. I had received the ACM-Women in Computing Scholarship to attend this conference.

While I was writing the rest of my thesis, I applied for a Lectureship at UTS Faculty of Transdisciplinary Innovation and got it! I decided to go for this one over postdoctoral research positions to stay long term in academia. Searching and applying for jobs is such an ordeal and my skills were rusty; I’m super glad mine went smoothly since it was the only job I applied for, and the timing worked out perfectly.

I had to start the lectureship in July, which pushed my thesis submission deadline a month earlier. I couldn’t take a break after thesis submission, so I took a small break after sending out the full draft of my thesis to my internal reviewers in June. I went home to India for 2 weeks, which just flew by. I worked super hard to submit the final thesis after my return, to the point where I didn’t really want to take another look at it anymore! Finally, I submitted my thesis on the 25th of July, 2019.

I started lecturing right from the first month of me joining the Faculty of Transdisciplinary Innovation. It was pretty hard, truth be told, as I was trying to juggle between a few different things. First time teaching a subject from preparation to delivery, handling student queries, the admin, the mentoring, managing difficult students – it was a handful. I even dropped my plans to take part in the 3MT competition coz my schedule was so tight.

In the meantime, the reviews for my thesis came back. I passed with flying colours and the reviews were extremely positive, with appreciation of it being one of the best theses the reviewers had reviewed! Both reviewers accepted the thesis for publication without any changes. I did make some minor changes for final publication based on their comments and my degree was conferred on the 12th of November 2019.

I also got a few invitations (both internal and external to my university) to take part in events, which went really well. I was invited as a panel speaker at Intel, Sydney, where we discussed ‘Artificial Intelligence Today for Our Tomorrow’ with some great minds. I co-organised a workshop on Data with our Faculty staff in September for the Festival of Learning Design. I gave a short talk at UTS TeachMeet “The Future Starts Now” in October, hosted by the School of International Studies and Education, UTS – Video of Highlights here. I visited the Centre for Research in Assessment and Digital Learning (CRADLE) at Deakin University, Melbourne in October to participate as an invited delegate at the “Advancing research in student feedback literacy” international symposium – had good conversations and set plans to move the research forward in our upcoming work.

I received the Future Women Leaders Conference Award and visited Monash for two days in November for the conference, which offered a series of workshops and talks supporting future women leaders in academia from engineering and IT. I also created from scratch and published a podcast (Episode 3 of SoLAR Spotlight) that month – lots of learning happened in putting it together, from preparation to editing. I do have regrets about turning down some good opportunities that came my way, just because I did not have enough hours in the day to manage everything. But I guess it is a part of growing as an academic, since you prioritize and decide what is more important, and try to achieve work-life balance. At the end of November, I co-organized a workshop at ALASI. That was the end of work-related events in 2019, but the best was yet to come.

I went to India in December for my long-awaited wedding with my sweetheart. It was a big fat south Indian wedding, so lots of prep and stress, but loads of fun! Here’s a picture from the wedding 🙂

Notes: ‘Digital support for academic writing: A review of technologies and pedagogies’

I came across this review article on writing tools published in 2019, and wanted to make some quick notes in this post to come back to. I’m following the usual format I use for article notes, which summarizes the gist of a paper with short descriptions under respective headers. I also describe a few things I thought the paper missed.


Carola Strobl, Emilie Ailhaud, Kalliopi Benetos, Ann Devitt, Otto Kruse, Antje Proske, Christian Rapp (2019). Digital support for academic writing: A review of technologies and pedagogies. Computers & Education 131 (33–48).


  • To present a review of the technologies designed to support writing instruction in secondary and higher education.


Data collection:

  • Writing tools collected from two sources: 1) Systematic search in literature databases and search engines, 2) Responses from the online survey sent to research communities on writing instruction.
  • 44 tools selected for fine-grained analysis.

Tools selected:

Academic Vocabulary
Article Writing Tool
C-SAW (Computer-Supported Argumentative Writing)
Carnegie Mellon prose style tool
Correct English (Vantage Learning)
Deutsch-uni online
DicSci (Dictionary of Verbs in Science)
Editor (Serenity Software)
Essay Jack
Essay Map
Klinkende Taal
Marking Mate (standard version)
My Access!
Open Essayist
Paper rater
PEG Writing
Research Writing Tutor
Right Writer
SWAN (Scientific Writing Assistant)
Scribo – Research Question and Literature Search Tool
Thesis Writer
Turnitin (Revision Assistant)
White Smoke

Exclusion criteria:

  • Tools intended solely for primary and secondary education were excluded, since the main focus of the paper was on higher education.
  • Tools with the sole focus on features like grammar, spelling, style, or plagiarism detection were excluded.
  • Technologies without an instructional focus, like pure online text editors and tools, platforms or content management systems, were excluded.

I have concerns about the way tools were selected for this analysis, particularly because some key tools like AWA/AcaWriter, Writing Mentor, Essay Critic, and Grammarly were not considered. This is one of the main limitations I found in the study. It is not clear how the tools were selected in the systematic search, as there is no information about the databases and keywords used for the search. The way tools focusing on higher education were picked is not explained either.


LAK 2019 in Tempe, Arizona

I attended the Learning Analytics and Knowledge Conference (LAK) this year in the midst of my tight thesis writing schedule, and did not regret it 🙂 This 9th International LAK (4-8 Mar, 2019) was held in Tempe, Arizona, which meant a 15-hour flight plus transit from Sydney one way; I survived, thankfully.

First of all, I was excited to have been awarded a scholarship from ACM-W that supports Women in Computing for conference travel. And super excited to have received it for LAK which competes with journals in publishing some of the most influential work in educational technology.

I kicked off LAK2019 with the full day Writing Analytics workshop I chaired on Advances in Writing Analytics: Mapping the state of the field. Unfortunately the other workshop organizers could not make it that day, but I’m thankful for the support from UTS CIC colleagues and the participants in helping to run a successful workshop. This fourth workshop in the series of Writing Analytics workshops at LAK had great participation and discussions. We saw interesting presentations on writing analytics from various speakers, and tried a demo version of AcaWriter to see the tool in action – check out tweets with #WaLAK19 and #LAK19. We brainstormed utopian and dystopian visions of what writing analytics in 2030 would look like, and discussed ways to get to a desirable future from where we are now. The potential formation of a Special Interest Group on Writing Analytics (SIGWA) was discussed to facilitate a community of researchers in the area. Notes from the workshop are shared here.

In the main conference, I presented our full research paper, co-authored by Dr. Simon Knight and Prof. Simon Buckingham Shum on Contextualizable Learning Analytics Design: A Generic Model and Writing Analytics Evaluations. We emphasized the need for flexible Learning Analytics Applications that can provide contextualized support, and demonstrated the CLAD model with our example.

I recommend watching the keynote recordings from LAK’19, which have been added to the SoLAR YouTube channel. I would have loved to go into more detail to highlight some of the interesting work across LAK, but my notes for this conference are shorter than usual since I’m now back to thesis writing and frantically managing time 😂. I did come across exciting work and met lots of interesting people, most of whom I followed up with (I think!), so I hope there will be new collaborations! I also officially joined the Society for Learning Analytics Research (SoLAR) executive committee as the elected student member. Thrilled and looking forward to serving on the committee!

Contextualizable learning analytics for writing support

Recently I gave a talk on Augmenting pedagogical writing support with contextualizable learning analytics at the CRLI seminar series at the University of Sydney. It was a great opportunity to share and discuss ideas from my PhD research, and indeed a privilege to be invited to present at this seminar. A long time slot means fewer time constraints, so I enjoyed doing the 1 hour+ session. The talk is recorded and available for viewing on YouTube, and the slides are here. This post is a summary of the key ideas from this talk and an upcoming paper on ‘Contextualizable Learning Analytics Design (CLAD)’.

Big data, learning analytics and education:

Big data and artificial intelligence are changing many of the ways we do things to improve our lives (for better or for worse). Companies around the world including Facebook, Google, Apple and Amazon use data every day to get big insights to support us. What can more traditional organizations like educational institutions use data for? Can we harness this technology and data to improve learning? To answer these questions, Learning Analytics (LA) emerged as a field that attempts to tackle huge amounts of data in education. Although data has been available in education research for decades, different granularities of data from multiple sources in authentic scenarios and the technical affordances of new tools can now support many causes which were not previously feasible. This origin of the field has probably been a reason for its emphasis on ‘big impact’ and generalizable solutions that can cater to and scale up to huge numbers. Massive Open Online Courses (MOOCs) are a classic example of how we can scale teaching to a large number of learners using technology. However, the problem with scalable, generalizable solutions in learning analytics is that education is inherently contextual, and a one-size-fits-all approach would not work the same way in all contexts. This has led to the argument for moving from big data to meaningful data for learning analytics.

Bringing in the context:

To bring the educational context to Learning Analytics (LA), it must be coupled with pedagogical approaches. This involves the integration of LA in pedagogical contexts to augment the learning design and provide analytics that are aligned with the intended learning outcomes. Learning Design (LD) describes an educational process, and involves the design of units of learning, learning activities or learning environments which are pedagogically informed. LA can provide the necessary data, methodologies and tools to test the assumptions of the learning design, and LD can add value to the analytics by making it meaningful for the learner. Brought together, LA and LD can contribute to each other and close the gap between the potential and actual use of technology.

Contextualizable Learning Analytics Design:

We introduce the Contextualizable Learning Analytics Design (CLAD) model in a forthcoming article by bringing together the elements of LA and LD for context. The educators are involved with LA developers to co-design this contextualization. This involves LD elements of assessment and task design, and LA elements of features and feedback working dynamically and in sync for different contexts, rather than being rigidly fixed. The CLAD model is demonstrated by implementing the Writing Analytics tool ‘AcaWriter’ in different learning contexts (Law essay writing, Accounting business report writing). AcaWriter, developed by the Connected Intelligence Centre, UTS provides automated feedback on student writing based on rhetorical moves. To contextualize the use of this LA tool for students, the elements of the CLAD model were employed as follows:

  • Assessment formed the basis of contextualization to align AcaWriter with the intended learning outcomes.
  • The features of data that are important for the context were picked so that AcaWriter can bring them to the attention of the learners.
  • The feedback from AcaWriter was tuned to make it relevant for the context of writing by mapping it back to assessment criteria.
  • Task design ensured that AcaWriter activities are relevant to the learner and grounded by pedagogic theory.

With such contextualized LA, the educator has agency to design learning analytics that is relevant to the learning context, and the learner finds it meaningful due to its embedding in the curriculum. This ensures that LA contributes to learning in authentic practice by augmenting existing good pedagogic practice. The approach scales over multiple learning contexts by transferring good design patterns from one learning context to another (for example from law essay writing to accounting business report writing).

More details on the above can be found in the following article, and related resources are available on the HETA project website.


Working with Jupyter notebooks #code

Jupyter is an open source program that helps you share and run code in many different programming languages. Jupyter notebooks are great for quickly prototyping different versions of code, as they are easy to edit and make it simple to try different outputs. The format of a Jupyter notebook is similar to the R Markdown reports commonly used in R. It can contain blocks of text, code, equations and results (including visualizations) all in one page. We’ve used Jupyter notebooks to run text analysis workshops in conferences, and the feedback was pretty good.

The Writing Analytics workshop is starting at #LAK18. Jupyter notebooks are being used. #great

I find that Jupyter notebooks are great for sharing code and results across different people, and if you’re hosting them, they save a lot of trouble in organising a workshop where you would otherwise want participants to install software. They work well for a non-technical audience too, since participants can choose to ignore what’s inside the code block by simply running it and focusing on the results block. They are quite popular now for data science experiments, so this post is a good place to get to know them and start using them. You can use an already available notebook (if you’ve downloaded one from GitHub) and play with it, or create your own Jupyter notebook from scratch. This post will guide you to create your own notebook from scratch demonstrating some basic text analysis in Python.

Installing Jupyter

If you want to try a Jupyter notebook first without installing anything, you can do so in this notebook hosted on the official Jupyter site. If you want to install your own copy of Jupyter on your machine to develop code, then use one of the two options below:

  • If you are new to Python programming, and don’t have Python installed on your machine, the easiest way to install Jupyter is by downloading the Anaconda distribution. This comes with Python built in (you can choose either the 2.7 or 3.6 version of Python when you download the distribution – the code I’m writing in this post is in 2.7).
  • If you already have Python working on your machine (as I did), the easiest way is to install Jupyter using the pip command as you do for any Python package. Note that if pip and python are already set up in your system path, you can simply use $ pip install jupyter from the command prompt.
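Whichever option you pick, it can help to confirm what actually got installed before going further. The snippet below is only a rough sketch and assumes Python 3 (importlib.util.find_spec is not available in 2.7). The helper name is_installed is made up for illustration, and the module names it checks (notebook, wordcloud, matplotlib) are my assumption about what the relevant packages are imported as.

```python
import sys
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package with this name can be found for import."""
    return importlib.util.find_spec(package_name) is not None


# Report the running Python version, then check a few packages.
# The names below are assumptions about how the packages are imported.
print("Python version:", sys.version.split()[0])
for name in ("notebook", "wordcloud", "matplotlib"):
    status = "installed" if is_installed(name) else "missing"
    print(name, "->", status)
```

Anything reported as missing can then be installed with pip from the command prompt as described above.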

Now that Jupyter is installed, type the command below in your anaconda prompt/command prompt to start a Jupyter notebook:

$ jupyter notebook

The Jupyter homepage opens in your default browser at http://localhost:8888, displaying the files present in the current folder like below. You can now create a new Python Jupyter notebook by clicking on New -> Python 2 (or Python 3 if you have Python version 3). You can move between folders or create a new folder for your Python notebooks. To change the default opening directory, you should first move to the required path using cd in the command prompt, and then type:

$ jupyter notebook

Open the created notebook, which would look like this:

This cell is a code block by default, which can be changed to a markdown text block from the drop-down list (check the figure above) to add narrative text accompanying the Python code. Now name your notebook, and try adding both a code block, and markdown block with different levels of text following the sample here:
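To make the idea of code blocks and markdown blocks a bit more concrete, here is a small sketch of how a notebook with one of each could be assembled by hand. It assumes Python 3 and relies on the fact that an .ipynb file is plain JSON in the version 4 notebook format. The helper names (markdown_cell, code_cell, save_notebook) and the cell contents are invented for illustration; they are not the sample from this post.

```python
import json


def markdown_cell(text):
    """Build a markdown (narrative text) cell for a version 4 notebook."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(code):
    """Build a code cell that has not been run yet (no outputs)."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": code,
    }


# A notebook is a list of cells plus some metadata.
notebook = {
    "cells": [
        # '#' and '##' give different heading levels in markdown
        markdown_cell("# My first notebook\n## A smaller heading\nSome narrative text."),
        code_cell("message = 'Hello, Jupyter'\nprint(message)"),
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 2,
}

notebook_json = json.dumps(notebook, indent=1)


def save_notebook(path):
    """Write the notebook JSON to a file that Jupyter can open."""
    with open(path, "w") as handle:
        handle.write(notebook_json)
```

Calling save_notebook("sample_notebook.ipynb") (a hypothetical file name) in the folder where Jupyter is running should make the file show up on the Jupyter homepage, where it can be opened and run like any other notebook.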

To execute the blocks, click on the Run button (Alternatively, use Ctrl+Enter in Windows – Keyboard shortcuts can be found in Help -> Keyboard shortcuts). This renders the output of your code and your markdown text like this:

That’s it. You have a simple Jupyter notebook running on your machine. Now to try a bit more, here’s the sample code you can download and run to do some basic text analysis. I’ve defined three steps in this code: importing required packages, defining input text, and analysis. Before importing the packages/libraries you need in step 1, however, they should first be installed on your machine. This can be done using the pip command in the command prompt/anaconda prompt like this: $ pip install wordcloud (If you run into problems with that, the other option is to download an appropriate version of the package’s wheel from here and install it using $ pip install C:/some-dir/some-file.whl).
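If installing wordcloud turns out to be a hassle, the shape of the three steps can still be tried with the standard library alone. The sketch below is not the code from this post: it counts word frequencies, which is roughly the information a word cloud displays by word size. The sample sentence, the tiny stop-word list and the function name word_frequencies are all made up for illustration.

```python
# Step 1 - Importing libraries (standard library only)
import re
from collections import Counter

# Step 2 - Defining input text (a short stand-in, not the text used in this post)
sample_text = "Cockatoos are parrots. Cockatoos nest in tree hollows, and parrots eat seeds."

# A tiny hypothetical stop-word list; real lists are much longer
STOP_WORDS = {"a", "and", "are", "in", "is", "of", "the", "to"}


# Step 3 - Analysis
def word_frequencies(text, stop_words=STOP_WORDS):
    """Lower-case the text, split it into words and count the non-stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in stop_words)


frequencies = word_frequencies(sample_text)
for word, count in frequencies.most_common(3):
    print(word, count)
```

Words with higher counts are the ones that would typically appear larger in a word cloud.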

Python code for the three steps is below:

#Step 1 - Importing libraries
from wordcloud import WordCloud, STOPWORDS  #For word cloud generation
import matplotlib.pyplot as plt             #For displaying figures
import re                                   #Regular expressions for string operations

#Step 2 - Defining input text

inputtext = "A cockatoo is a parrot that is any of the 21 species belonging to the bird family Cacatuidae, the only family in the superfamily Cacatuoidea. Along with the Psittacoidea (true parrots) and the Strigopoidea (large New Zealand parrots), they make up the order Psittaciformes (parrots). The family has a mainly Australasian distribution, ranging from the Philippines and the eastern Indonesian islands of Wallacea to New Guinea, the Solomon Islands and Australia. Cockatoos are recognisable by the showy crests and curved bills. Their plumage is generally less colourful than that of other parrots, being mainly white, grey or black and often with coloured features in the crest, cheeks or tail. On average they are larger than other parrots; however, the cockatiel, the smallest cockatoo species, is a small bird. The phylogenetic position of the cockatiel remains unresolved, other than that it is one of the earliest offshoots of the cockatoo lineage. The remaining species are in two main clades. The five large black coloured cockatoos of the genus Calyptorhynchus form one branch. The second and larger branch is formed by the genus Cacatua, comprising 11 species of white-plumaged cockatoos and four monotypic genera that branched off earlier; namely the pink and white Major Mitchell's cockatoo, the pink and grey galah, the mainly grey gang-gang cockatoo and the large black-plumaged palm cockatoo. Cockatoos prefer to eat seeds, tubers, corms, fruit, flowers and insects. They often feed in large flocks, particularly when ground-feeding. Cockatoos are monogamous and nest in tree hollows. Some cockatoo species have been adversely affected by habitat loss, particularly from a shortage of suitable nesting hollows after large mature trees are cleared; conversely, some species have adapted well to human changes and are considered agricultural pests. Cockatoos are popular birds in aviculture, but their needs are difficult to meet. The cockatiel is the easiest cockatoo species to maintain and is by far the most frequently kept in captivity. White cockatoos are more commonly found in captivity than black cockatoos. Illegal trade in wild-caught birds contributes to the decline of some cockatoo species in the wild. Source:"

print("\nInput text for analysis:\n")
print(inputtext)

#Step 3 - Analysis

print("Summary statistics of input text:")

wordcount = len(re.findall(r'\w+', inputtext))
print("Wordcount: ", wordcount)

charcount = len(inputtext) #including spaces
print("Number of characters: ", charcount)

#More options for wordclouds here:
wordcloud = WordCloud(stopwords=STOPWORDS).generate(inputtext)

plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")  #Hide the axes around the word cloud
plt.show()

The downloadable ipynb file is available on Github.
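To get a feel for what the word cloud is showing, here is a small sketch that counts words directly. A word cloud roughly sizes each word by how often it appears once common stopwords are dropped, and the same idea can be tried with just the standard library. The short sample text and the tiny stopword list are my own stand-ins for illustration; the STOPWORDS set in the wordcloud package is much longer.

```python
import re
from collections import Counter

# Short stand-in sample; swap in the inputtext variable from Step 2
sample = ("Cockatoos are popular birds in aviculture. "
          "Cockatoos prefer to eat seeds, and cockatoos often feed in large flocks.")

# Tiny illustrative stopword list (an assumption, not the wordcloud STOPWORDS set)
stopwords = {"are", "in", "to", "and", "often"}

# Lowercase the text so "Cockatoos" and "cockatoos" are counted together
words = re.findall(r"\w+", sample.lower())
counts = Counter(w for w in words if w not in stopwords)

# Show the most frequent remaining words with their counts
for word, n in counts.most_common(3):
    print(word, n)
```

The words with the highest counts here are the ones that would appear largest in the cloud.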

Other notes:

  • This post is intended for anyone who wants to start working with Jupyter notebooks, and assumes prior understanding of programming in Python. The Jupyter notebook is another environment to easily work with code, but the coding process is still very traditional. If you’re new to Python programming, this website is a good place to start.
  • You can use multiple versions of Python to run Jupyter notebooks by changing the kernel (the computational engine which executes the code). I have both Python 2 & Python 3 installed, and I switch between them for different programs as needed.
  • While Jupyter notebooks are mainly used to run Python code, they can also be used to run R programs, which requires the R kernel to be installed. The blog post below is a useful guide for doing that:

Telling stories with data and visualizations – Some key messages

The topic of telling stories from data is huge and probably needs many hours and books to explain the ideal ways of doing it. But Dr. Roberto Martinez did a great job of giving us a quick introduction to the topic and its pragmatic application in an hour-long talk at the UTS LX lab. It very much aligned with the Connected Intelligence Centre’s vision of building staff capacity in data science, particularly by keeping humans at the center of the data. This post includes my notes from the talk, where I summarize some of the key messages.

Humans are producing enormous amounts of data these days. According to recent statistics, 2.5 quintillion bytes of data are created every day and the pace keeps growing. But there is a stark contrast between data and knowledge – data by itself means very little, and knowledge is created only when the data is made sense of. We might be drowning in data, but not in knowledge. Roberto compares this abundance of data to oysters and an insight to a pearl: we need to open many oysters to maybe find one pearl.

The rest of this post is divided into two main sections, (1) Data Storytelling and (2) Data Visualization, followed by a few overall key messages that I took away from the talk.

Data Storytelling:

The value of data is not the data itself, but how we present it. This is what makes storytelling so important for presenting insights from data. It is not about presenting ALL the data we have, but about highlighting the main insights that should be noted. It is about finding patterns in the data that engage people with the story, much like finding hooks in a fictional story. It often operates in conjunction with data visualization to communicate results from data. Check out the list of resources given at the end of this post for detailed reading.

There are a few ways to make the insights clear and help them pop out when communicating the story from data:

  • The first step is to declutter the data by removing all the noise. This can be done by stripping out all the unwanted information and building on the useful insights.
  • The next key thing to do is to foreground the things that are important. We do not want too much ink/data that makes the results too complicated to understand.
  • A data story approach can merge narrative and visuals to engage the audience and point to key messages from the data (see examples of line graphs annotated this way here). Also check out this interesting article and podcast on the good and bad of storytelling for further reading.
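The three points above can be tried out on a simple line graph. The sketch below is my own illustration rather than anything shown in the talk, and the numbers and group names are made up purely for the example. It greys out the background series, foregrounds the one the story is about, removes some chart clutter, and adds a short narrative annotation with matplotlib.

```python
import matplotlib.pyplot as plt

# Hypothetical data, invented only to illustrate the idea
weeks = [1, 2, 3, 4, 5, 6]
series = {
    "Group A": [12, 14, 13, 15, 14, 16],
    "Group B": [10, 11, 12, 11, 13, 12],
    "Group C": [9, 10, 14, 19, 23, 27],  # the series the story is about
}
highlight = "Group C"
message = "Group C takes off after week 3"

fig, ax = plt.subplots(figsize=(6, 4))

# Foreground the important series; push the rest into the background
for name, values in series.items():
    if name == highlight:
        ax.plot(weeks, values, color="tab:red", linewidth=2.5, label=name)
    else:
        ax.plot(weeks, values, color="lightgrey", linewidth=1.5, label=name)

# Declutter: drop the top and right borders and the grid
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.grid(False)

# Narrative: point at the moment the trend changes and say what happened
ax.annotate(message, xy=(4, 19), xytext=(1.2, 24),
            arrowprops=dict(arrowstyle="->", color="grey"))

ax.set_title(message)  # the title states the insight, not just the variable names
ax.set_xlabel("Week")
ax.set_ylabel("Posts per week")
```

In a Jupyter notebook the figure is drawn when the cell finishes; in a plain script, add plt.show() at the end.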
