The topic of telling stories from data is huge, and it would take many hours and books to explain the ideal ways of doing it. But Dr. Roberto Martinez did a great job of giving us a quick introduction to the topic and its pragmatic application in an hour-long talk at the UTS LX lab. It aligned closely with the Connected Intelligence Centre’s vision of building staff capacity in data science, particularly by keeping humans at the centre of the data. This post contains my notes from the talk, where I summarize some of the key messages.
Humans are producing enormous amounts of data these days. According to recent statistics, 2.5 quintillion bytes of data are created every day, and the pace keeps growing. But there is a stark contrast between data and knowledge: data by itself means very little, and knowledge is created only when the data is made sense of. We might be drowning in data, but not in knowledge. Roberto compares this abundance of data to oysters and an insight to a pearl: we need to open many oysters to maybe find one pearl.
The rest of this post is divided into two main sections, (1) data storytelling and (2) data visualization, followed by a few overall key messages that I took away from the talk.
The value of data is not the data itself, but how we present it. This is what makes storytelling so important for presenting insights from data. It is not about presenting ALL the data we have, but about highlighting the main insights that should be noted. It is about finding patterns in the data that engage people with the story, just like finding hooks in a fictional story. Storytelling often operates in conjunction with data visualization to communicate results from data. Check out the list of resources given at the end of this post for detailed reading.
There are a few ways to make the insights clear and pop out when communicating the story from data:
- The first step is to declutter the data by removing the noise. This can be done by stripping away unwanted information and building on the useful insights.
- The next key thing is to foreground what is important. We do not want so much ink or data that the results become too complicated to understand.
- A data story approach merges narrative and visuals to engage the audience and point to key messages from the data (see examples of line graphs annotated this way here). Also check out this interesting article and podcast on the good and bad of storytelling for further reading.
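The declutter-and-foreground steps above can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not something shown in the talk: the helper name `foreground_styles`, the series names, and the colour values are all hypothetical. It gives the one series that carries the message a strong colour and a thicker line, and mutes everything else to grey.

```python
# Hypothetical helper: names and colour values are illustrative assumptions.
HIGHLIGHT = "#d62728"  # strong colour for the series that carries the message
MUTED = "#bbbbbb"      # light grey for series that only provide context


def foreground_styles(series_names, key_series):
    """Return one style dict per series: bold for the key series, grey for the rest."""
    if key_series not in series_names:
        raise ValueError(f"{key_series!r} is not one of the series")
    styles = {}
    for name in series_names:
        is_key = name == key_series
        styles[name] = {
            "color": HIGHLIGHT if is_key else MUTED,
            "linewidth": 3.0 if is_key else 1.0,
            "show_label": is_key,  # label only the key series to reduce clutter
        }
    return styles


styles = foreground_styles(["Team A", "Team B", "Team C"], key_series="Team B")
for name, style in styles.items():
    print(name, style)
```

The `color` and `linewidth` values could then be handed to whichever plotting tool is in use; the point is only that the emphasis is decided once, up front, rather than left to the tool's default palette.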
Visualizations make data and statistics easier to understand and can reveal insights that would be hard to get otherwise. To appreciate the power of visualized data as opposed to numbers alone, take a look at Anscombe’s quartet, which illustrates how datasets with nearly identical simple descriptive statistics can look very different when graphed.
But before diving into creating a visualization, it is important to understand its intended purpose. For instance, is it meant to help the audience explore the data, understand it, and find patterns and insights in it? Such a tool for making sense of the data is an exploratory visualization. If the purpose is to explain something from the data when we already have an insight to highlight, then it is an explanatory visualization. To craft the most appropriate visualization, context is key. Think about the following:
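To see the Anscombe's quartet point in numbers, here is a small sketch using only Python's standard library. The data values are the published quartet (Anscombe, 1973); the helper names `pearson_r` and `summarize` are my own. It computes the mean, sample variance, and correlation for each of the four datasets, which come out nearly identical even though the four scatter plots look nothing alike.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet: datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / spread


def summarize(xs, ys):
    """Simple descriptive statistics for one x/y dataset."""
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "var_x": variance(xs),  # sample variance
        "var_y": variance(ys),
        "r": pearson_r(xs, ys),
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, stats in summaries.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Plotting each dataset as a scatter plot (with any charting tool) shows a roughly linear cloud, a curve, a line with one outlier, and a vertical stack with one outlier, none of which the summary table hints at.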
- Who – Who is the audience of the visualization?
- You – What is your relationship to the audience? Do they trust you and find you credible?
- What – What is the action being prompted from the visualization? Is there a goal for the story to be told?
- How – How will you deliver the story and the insight?
To apply these ideas practically, below is a list of fundamental visualization ideas that can be used to represent data in different forms, and the considerations to keep in mind while using them:
- Line charts are great for showing trends in time.
- Bar charts are simple but useful, provided the design principles are thought through, e.g. are the bars too thin, too thick, or just right?
- Text is a way to present data too. When there are only one or two numbers to share, it can be simpler and more effective than a basic bar chart, e.g. 20% of children had a stay-at-home mom in 2012 compared to 41% in 1970.
- Tables can be turned into heat maps, which combine the table's text with colour cues so that patterns are easier to see.
- Scatter plots are useful for showing the relationship between two variables, and the points can be grouped by category, e.g. a partitioned data set.
- Waterfall charts can be used to represent information along a timeline, e.g. biography details.
- Horizontal stacked bar charts work well for a list of responses rated on a scale, e.g. survey responses ranging from 1 to 5.
- Slopegraphs can be very effective for comparing pre- and post-tests, since they plot the change between the two points in time.
- In vertical stacked bar charts, comparing the upper segments is hard because they all start at different levels.
- Some visualizations might need to be divided into sub-visualizations to declutter the data, e.g. multiple line charts.
- Donut charts and pie charts should be used with caution, especially in 3D, since the perceived sizes of their segments do not correspond well to the data.
- Colours used to highlight key aspects should be picked appropriately for the visualization; generally, less colour is more. Design for inclusion so that the visualization also works for people with colour blindness. Tools like Tableau offer palettes that cater to them, and there are simulators for testing our visualizations against different types of colour blindness.
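As a rough complement to the colour point above, one tool-independent check is whether a highlight colour still stands out by lightness alone, so that the emphasis does not depend purely on hue. The sketch below is my own illustration, not something from the talk: it uses the relative luminance and contrast ratio formulas from the WCAG accessibility guidelines, and the function names and example colours are hypothetical. It is not a colour-blindness simulation and does not replace the simulators mentioned above.

```python
def _linear(channel_8bit):
    """Convert one 8-bit sRGB channel to linear light (WCAG-style formula)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_colour):
    """Relative luminance of a '#rrggbb' colour: 0.0 is black, 1.0 is white."""
    h = hex_colour.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(colour_a, colour_b):
    """Contrast ratio between two colours, from 1 (identical) up to 21."""
    la, lb = relative_luminance(colour_a), relative_luminance(colour_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


# Illustrative colours (assumptions): a highlight, a muted grey, a white background.
highlight, muted, background = "#d62728", "#bbbbbb", "#ffffff"
print("highlight vs background:", round(contrast_ratio(highlight, background), 2))
print("muted vs background:", round(contrast_ratio(muted, background), 2))
print("highlight vs muted:", round(contrast_ratio(highlight, muted), 2))
```

A highlight that differs from the muted colours in lightness as well as hue is more likely to survive greyscale printing and some forms of colour vision deficiency, though a proper simulator is still the better test.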
My key take-home messages:
- Data by itself means nothing, unless we get insights out of it.
- Not everything in the data is important; highlight the message you want to convey in your story.
- Know the purpose of a visualization before constructing it – is it exploratory or explanatory?
- Context is key to finding the best way to visualize. Think about ‘Who, You, What, How’.
- Not all designs are the same – find the best chart for the data, keep it simple, use the right colours and design.