GUIDE: How to get started with data journalism

Researched by Adi Eyal and Raymond Joseph

There is no such thing as a data journalist. There are, however, journalists who work with data.

The fundamentals of journalism have not changed with the introduction of data, analysis and interactive visualisation. Likewise, the core elements of the profession – discovering, reporting and disseminating news – remain intact.

What is data journalism?

Data journalism refers to a new set of tools that can help journalists work with large amounts of information, to dig deeper and communicate better.

It is actually an old concept under a new name. Computer-assisted reporting, as it was once known, involves the use of computers to analyse large amounts of data by searching and ordering information within databases.

As the number of documents that we have grows, it becomes impossible to read each one individually. Computers are vastly superior to humans when it comes to doing things quickly. This becomes even more apparent when you look at thousands of numbers, such as the value of all procurements by the South African government or crime statistics at each police station – a head-spinning 330,000 rows of numbers in all crime categories across South Africa since 2004.

Conceptually, the data journalism process can be thought of as a series of steps that you need to take in order to turn numbers into insights. It consists of four steps:

  • acquisition,
  • cleaning,
  • analysis, and
  • storytelling.

You do not necessarily need to do each one of these steps for every story, but they do need to be approached in this order.

We discuss each of these steps below. Be warned: You are going to have to summon your inner nerd. But you do not need to learn how to program a computer in order to learn about data journalism, unless you want to. Or put another way, you don’t have to be a mechanic to drive a car.

Some of the best data journalism is the result of teamwork between a journalist and a coder or data wrangler.

1. Acquisition

At the heart of data journalism is a juicy dataset. Good examples might be the salaries of elected officials, the national crime statistics or even the causes of death in South Africa. Even seemingly boring data, such as fine revenue by municipality, has good stories to tell.

So, where do we find it? As a data journalist, you will need to create your own list of data sources. In South Africa there are a few standard places that you should look at first: Statistics South Africa, the Independent Electoral Commission and Treasury. Data First at the University of Cape Town has an enviable collection of datasets.

Looking further afield, the World Bank and United Nations data portals are extremely useful. There are also datasets scattered over government websites in South Africa. The City of Cape Town has launched its open data portal and Code for South Africa (Code4SA) maintains a niche portal that hosts unusual or difficult-to-find data.

Unfortunately, it is not always as easy as downloading a data file from the Internet. More often than not, data is published in inconvenient formats and you may need to “scrape” – extract – the data off a website. (Note: The chapter “Getting Data” in the Data Journalism Handbook gives you a rundown of the process. School of Data also has a basic tutorial and offers some excellent free online data journalism training.)
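
For readers who do want to see what scraping looks like in code, here is a minimal sketch that pulls the rows out of an HTML table using only Python's standard library. The HTML snippet, the municipality names and the figures are all invented for the example. A real page would first have to be downloaded and would usually need more careful handling.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collects the text of every table cell, one list per table row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row currently being read, if any
        self._cell = None   # text pieces of the current cell, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical page content, made up for this example.
SAMPLE_HTML = """
<table>
  <tr><th>Municipality</th><th>Fine revenue (R)</th></tr>
  <tr><td>Municipality A</td><td>1200000</td></tr>
  <tr><td>Municipality B</td><td>850000</td></tr>
</table>
"""

scraper = TableScraper()
scraper.feed(SAMPLE_HTML)
header, *records = scraper.rows  # first row is the header, the rest is data
```

Once the rows are in a list like this, they can be saved to a spreadsheet-friendly format such as CSV for the cleaning step.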

If all else fails and you cannot lay your hands on data you know exists in South Africa, you can turn to the Promotion of Access to Information Act (PAIA) to request it. It should never be your first resort, as it can be costly and time-consuming, but it is worth a try if you reach a dead end.

The South African History Archive offers a service to help organisations, researchers, journalists and individuals submit PAIA requests.

2. Cleaning

Now that you have your dataset, it might be “dirty” and not ready to reveal its secrets. This can mean many things, but a typical example might be misspellings of a place name, such as ‘Rosebank’ vs. ‘Rosbank’.

Unfortunately, this is the part where you will need to roll up your sleeves and give the data a good scrub to fix all the spelling mistakes, remove junk characters and generally make the data look more presentable. If you do not do it, your results are probably going to stink.
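
If you are comfortable with a little code, one possible way to catch misspellings such as ‘Rosbank’ is to compare each value against a list of correctly spelled names. The sketch below uses the get_close_matches function from Python's standard difflib module. The list of known places and the similarity cutoff of 0.8 are assumptions chosen for the example, not recommended settings.

```python
import difflib

# Hypothetical list of correctly spelled place names.
KNOWN_PLACES = ["Rosebank", "Sandton", "Soweto"]


def clean_place(raw, known=KNOWN_PLACES, cutoff=0.8):
    """Return the closest known place name, or the tidied input if none is close."""
    # Trim and collapse stray spaces, then normalise the capitalisation.
    tidy = " ".join(raw.split()).title()
    matches = difflib.get_close_matches(tidy, known, n=1, cutoff=cutoff)
    return matches[0] if matches else tidy


dirty = ["Rosbank", " rosebank ", "SANDTON", "Durban"]
cleaned = [clean_place(name) for name in dirty]
```

Names with no close match are left alone so that a person can review them. Automated corrections deserve a second look too, because a fuzzy match can "correct" a place that is genuinely different.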

This is the least glamorous part of working with data. Also, it generally takes up the most time – as much as 80% of the entire project.

There are lots of ways of going about it. Your spreadsheet programme is a good place to start. More complex data may need something a little heavier. OpenRefine is a great tool and is your last stop before you need to learn how to code.

3. Analysis

You now have a clean, or as clean as possible, dataset in your hands. This is where the fun – the real journalism – kicks in.

The most important tool is your own curiosity and your journalist’s nose for news. Interesting facts, unusual anecdotes and fascinating stories can be found within many datasets, but unfortunately they are hiding amongst a mishmash of numbers.

In order to tease these stories out, you will need to interrogate the data. What was the value of all procurements in 2014? How many individual procurements were there? Which companies won the largest tenders? The more questions you ask, the better and more specific the answers become, until eventually you learn something unexpected.
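
The same questions can be asked in a spreadsheet with filters, sums and sorting. For those curious about how they look in code, here is a rough Python sketch. The procurement records, company names and values are invented for the example.

```python
# Invented procurement records: (year, company, value in rand).
procurements = [
    (2014, "Company A", 500_000),
    (2014, "Company B", 1_500_000),
    (2014, "Company A", 750_000),
    (2013, "Company C", 2_000_000),
]

# Keep only the 2014 records.
in_2014 = [row for row in procurements if row[0] == 2014]

# What was the value of all procurements in 2014?
total_value = sum(value for _, _, value in in_2014)

# How many individual procurements were there?
number_of_procurements = len(in_2014)

# Which companies won the largest tenders? Sort with the biggest value first.
largest_first = sorted(in_2014, key=lambda row: row[2], reverse=True)
largest_tender = largest_first[0]
```

Each answer tends to prompt the next question, for example whether the company with the largest single tender also received the most money overall.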

You can find many tools to help you with your analysis but the best by far is Excel. (Note: Download this excellent how-to guide.)

4. Storytelling

And now you are where we wanted to get to in the first place. You have found something interesting in the data and you are ready to tell the world about it.

Wait a minute.

Regardless of the tools you are using to find the story, you still need to do the journalism. You will need to do the appropriate fact checking, speak to experts and all the other things required for any other story. Also, always treat a dataset (and your discoveries) with the appropriate scepticism and verify your work. Beware of the data – it can be misleading, or even lie.
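
Some basic sanity checks can be automated before the human fact checking starts. The sketch below, written in Python with invented rows, looks for three common warning signs: exact duplicates, missing values and impossible values. It is an illustration of the idea only and no substitute for checking your findings with the source of the data.

```python
from collections import Counter

# Invented rows: (police station, year, reported crimes).
rows = [
    ("Station A", 2013, 120),
    ("Station A", 2014, 135),
    ("Station A", 2014, 135),   # exact duplicate of the row above
    ("Station B", 2014, None),  # missing value
    ("Station C", 2014, -4),    # a count can never be negative
]

# Rows that appear more than once.
duplicates = [row for row, seen in Counter(rows).items() if seen > 1]

# Rows with no value recorded.
missing = [row for row in rows if row[2] is None]

# Rows with a value that cannot be right.
impossible = [row for row in rows if row[2] is not None and row[2] < 0]
```

A flagged row is not necessarily wrong, but each one is a question to put to whoever published the data.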

You will need to find an appropriate way to report your story. And, because readers don’t want to be confronted with a string of numbers, you will need to package the story in a way that is easy to consume.

In many cases, it is useful to visualise your findings using graphs or other visual aids. If you are publishing online, you could even go further and create interactive tools to help your audience engage with the data.
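
Even a very simple chart can make a comparison easier to read than a string of numbers. As a rough illustration, the Python sketch below turns a handful of figures into a plain-text bar chart. The municipalities and amounts are invented, and the function name text_bar_chart is made up for this example. A real story would normally use a proper charting tool.

```python
# Invented totals for the example.
fine_revenue = {
    "Municipality A": 1_200_000,
    "Municipality B": 850_000,
    "Municipality C": 300_000,
}


def text_bar_chart(values, width=40):
    """Return one line per item, with a bar scaled to the largest value."""
    largest = max(values.values())
    label_width = max(len(label) for label in values)
    lines = []
    # Show the biggest value first so the ranking is obvious at a glance.
    for label, value in sorted(values.items(), key=lambda item: item[1], reverse=True):
        bar = "#" * round(width * value / largest)
        lines.append(f"{label:<{label_width}} {bar} {value:,}")
    return lines


chart = text_bar_chart(fine_revenue)
print("\n".join(chart))
```

Sorting the bars and scaling them to the largest value are design choices that keep the reader's attention on the comparison rather than on the chart itself.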

It is easy to get carried away though. You can get lost in creating a beautiful visualisation and completely forget why you did all this work in the first place.

Graphics are not the story, but are there to support it and to assist readers to better understand what you are saying. Too often, graphics become the story and while you will find your article shared on social networks, readers might only be looking at your graphic and completely forget about the underlying story.

Some free data visualisation tools to explore are:

Addition to journalism toolkit

Data journalism is an exciting and rapidly evolving addition to the standard journalism toolkit. It may seem intimidating at first, especially if the thought of opening up a spreadsheet makes you uncomfortable.

Fortunately, a lot of work has gone into creating tools that make working with data much more accessible. To get started, this factsheet contains tools that you can use to get your feet wet. You could also subscribe to the local Cape Town and Johannesburg data journalism mailing lists. Finally, subscribe to Naked Data, Code for South Africa’s weekly data journalism newsletter.

Adi Eyal is the founder and director of Code4SA. Ray Joseph is a freelance journalist, journalism trainer and media consultant.

© Copyright Africa Check 2015. You may reproduce this report or content from it for the purpose of reporting and/or discussing news and current events, subject to providing a credit to "Africa Check a non-partisan organisation which promotes accuracy in public debate and the media. Twitter @AfricaCheck and www.africacheck.org".