Census 2011: StatsSA failed the open data test
Last week's release of the census findings for England and Wales offered an excellent opportunity to compare how it was handled with the launch of South Africa's Census 2011 in October.
Judged against the extent of the content, and the format, in which the UK Office for National Statistics (ONS) released its data, it is clear that Statistics South Africa (StatsSA) has a long way to go to get it right.
While the ONS released extensive data sets both as PDFs and as user-friendly Excel spreadsheets, allowing researchers and data journalists to interrogate the data independently, in South Africa it was a different story.
Here the census release was accompanied by expensive countrywide road shows and much self-congratulatory back-slapping by StatsSA for a job well done. But because the data released to South African journalists was locked inside PDFs, it was difficult to extract without hours of mind-numbing work to scrape and clean it, a problem common to most of the official data available.
StatsSA also offered several different tools for interrogating its data, some relatively simple to use and others quite complicated, along with Excel and some CSV files. But each had some indicators not available in the others.
For example, the Google Public Data Explorer had some indicators available nowhere else, but because it is merely a visualisation tool, the underlying data cannot be extracted. Shockingly, the data accompanying some of these tools differed from that supplied with others, and so far queries to StatsSA about these discrepancies have gone unanswered.
The quality of the reportage in the UK and South Africa was also worlds apart. While data journalists on the serious UK papers dug deep into the data and delivered some excellent reportage, including interactive visualisations, most South African media merely took what StatsSA released and regurgitated it (including colourful graphs and charts) without any serious attempt at interrogation, interpretation or analysis.
South African newsrooms are light years behind better-resourced British and American newsrooms such as The Guardian and The New York Times, but there has been a concerted effort in 2012 to introduce data-driven journalism to local newsrooms through local chapters of HacksHackers, a worldwide movement that facilitates collaboration between journalists and coders.
Census 2011 was a golden opportunity to boost data journalism in our newsrooms, but the shoddy, unstructured way in which the data has been released so far means that opportunity has been lost.
Raymond Joseph is a freelance journalist, journalism trainer and media consultant. He is also the volunteer convenor of the Cape Town chapter of HacksHackers. Contact him via Twitter on @rayjoe