Fact checking with national statistical office data

Full Fact’s “How Fact Checking Works” lists the steps in the fact checking process:

  1. Decide what you’re checking
  2. Check the claim’s context
  3. Find the source
  4. Empathise with the audience
  5. Check the source quality
  6. Ask deeper questions
  7. Write up
  8. Review

While all these steps are important, steps 3 and 5 are clearly related to finding and using data. For fact checkers this is often national statistical data, particularly from national statistical offices (NSOs). If NSO data is difficult to find, understand or analyse, it can slow down a fact check or even make it impossible.

From this, we can then list the ways national statistics publishing affects fact checkers.

  • What data exists – know what data is available in the country
  • Find data – look for data related to a claim
  • Access data – read, download, and interact with the data
  • Analyse data – analyse data, or use analysis provided
  • Contact about data – get more info about the data
  • Reference data – present and discuss the data in the published fact check

In the following sections, we describe these in more detail. We show examples from NSO websites, describing where, in our view, they go well and where they could be improved.

What data exists

NSOs can make it clear what topics they cover and what data they hold. This can be high-level lists of themes or categories, along with more detailed inventories of data.

This helps fact checkers to have a broad understanding of the scope of the NSO in their country. And in turn this helps them know which claims can and can’t be fact checked.

For example, nearly all NSOs will have high-level statistics about the economy. But not all will publish data on, say, the environment. If a fact checker knows environment data is not available, they won’t lose time trying to check claims related to the topic. Instead, they can put their efforts into advocating for the NSO to collect data on the environment.

Fact checks are also affected by the quality and timeliness of the published data. If someone claims that inflation has risen this quarter but the NSO only publishes inflation data yearly, then this affects whether the claim can be verified.

NSOs show the range of topics they cover in different ways. Statistics South Africa has a list of categories in the right-hand bar on their site. Statistics Canada has a page listing the subjects they cover. Argentina’s INDEC homepage shows a list of topics when hovering over the ESTADÍSTICAS heading. The Pakistan Bureau of Statistics keeps a simple list on the left-hand side of their homepage.

Find data

Fact checkers search for NSO datasets in a number of ways. They can use an external search engine like Google, or use the search on the NSO website, or navigate through the site, or search and scroll through reports which contain data tables.

All of these can be made easier by how NSOs describe and publish data. NSOs can make data available alongside reports, structure their website so it’s easy to navigate, and include good dataset descriptions.

A useful search function on the NSO website with filters can be of great help. For example, the search results page on the UK’s Office for National Statistics site allows the user to filter by publication or data type. The Department of Statistics Singapore uses Google as a site search, but adds a number of filters for things like data, publications, PDFs and Excel files. By contrast, the search on the India Ministry of Statistics and Programme Implementation site returns just a straight list of results.

Metadata is data that describes datasets. NSOs can add quality metadata to their datasets so that they’re easier to understand, find and work with. This slide deck from the United Nations Statistics Division describes the benefits of good metadata for statistics:

Metadata (and data) that follow specific standardized patterns:

  • are easier for users to interpret and lend themselves to machine readability and electronic exchange
  • use a common template for organizing metadata
  • improve comparability of data, both at the global level (between countries), and within countries – to understand comparability in time series over time

Adding standardised metadata to datasets can make it easier for external search engines to find them. In turn it makes it easier for fact checkers to find the data they need.

Google’s Dataset Search is a tool that indexes datasets online that are marked up in the right format (info on this formatting from Google). To see the effect of this, compare two searches in Dataset Search. Searching for the website of the UK’s Office for National Statistics produces lots of results, whereas searching for the website of CSO Ireland does not produce results, which indicates that Dataset Search and the CSO site are not sharing information about datasets.
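As a rough illustration of what this kind of markup can look like, the sketch below builds a schema.org Dataset description as JSON-LD, which is one of the formats Google's guidance covers. The dataset name, description and the stats.example.org URLs are placeholders invented for this example, and the helper function dataset_jsonld is hypothetical; a real NSO would fill these in from its own catalogue and would likely add more properties.

```python
import json


def dataset_jsonld(name, description, page_url, downloads):
    """Build a schema.org Dataset description as a JSON-LD string.

    `downloads` is a list of (encoding_format, content_url) pairs.
    """
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": page_url,
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": fmt,
                "contentUrl": url,
            }
            for fmt, url in downloads
        ],
    }
    return json.dumps(record, indent=2, ensure_ascii=False)


# Hypothetical dataset on a hypothetical NSO site (example.org is a placeholder).
markup = dataset_jsonld(
    name="Consumer price index, monthly",
    description="Monthly consumer price index for all items, national level.",
    page_url="https://stats.example.org/datasets/cpi-monthly",
    downloads=[("text/csv", "https://stats.example.org/datasets/cpi-monthly.csv")],
)

# The markup would be embedded in the dataset's web page inside a script tag.
html_snippet = '<script type="application/ld+json">\n' + markup + "\n</script>"
print(html_snippet)
```

The point of the sketch is that the metadata sits in the page itself, in a predictable structure, so a crawler does not need to guess what the page is about or where the downloadable file lives.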

Access data

Different file formats are used for different purposes. Some, such as Word documents or PDFs, are more suitable for reports. Some are good for graphs or images. Others, such as spreadsheets, are more suitable for storing data.

Choosing the right file format involves considering the data user, their data skills, and their use cases. When possible, NSOs can also publish the data in multiple formats to cover a wider range of users.

Spreadsheets

Data tables (tabular data) are a fundamental part of publishing statistical data. Often referred to as spreadsheets, they provide data in rows and columns which can be viewed, used and analysed with software applications.

Unfortunately, some NSOs include data tables solely inside reports published as PDF files. This makes analysis hard, because users have to copy all the cells into their own spreadsheet software. In this example, Statistics South Africa have financial statistics in a PDF report but also provide the data tables in both a separate PDF and an XLS file.

Portals

Some NSOs provide data within data portals. Statistics Canada has transport data viewable in a table on the website. This works as a type of data portal, allowing users to bring up different slices of the data. It also allows downloading data in different formats and slices. Another example of a data portal is the database of the federal statistical office of Germany, where users can slice and dice the data for their needs. It also offers data visualisations and multiple file formats for data downloads. The Uganda Bureau of Statistics has a collection of data portals. Some of these portals are impressive and in good shape, others perhaps less so.

This points to a challenge with portals and other dashboards. When done well they can be quite powerful for data access and analysis. However, dashboards take resources to maintain and update. Simple alternatives like a single webpage with good metadata and a downloadable CSV or spreadsheet file take less work and may cover most of the use cases.

APIs

A more advanced form of publishing data is an Application Programming Interface (API). This is software that allows access to databases in a consistent way so that a computer can query the data. Because APIs follow consistent standards, information retrieval can be automated, without always needing a human to download data from a website.
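To make the idea of a computer querying data more concrete, here is a minimal sketch of the two halves of an API call: building a request URL and parsing the structured response. Everything specific in it is an assumption for illustration: the api.stats.example.org endpoint, the parameter names, the response layout and the figures are invented, and do not describe the API of any real statistical office. The response is a hard-coded sample so the sketch runs without network access.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint: not the API of any real statistical office.
BASE_URL = "https://api.stats.example.org/v1/observations"


def build_query_url(dataset_id, start, end):
    """Return the URL requesting one dataset over a period."""
    params = {"dataset": dataset_id, "from": start, "to": end}
    return BASE_URL + "?" + urlencode(params)


def parse_observations(payload):
    """Turn a JSON response body into a {period: value} dictionary."""
    body = json.loads(payload)
    return {obs["period"]: float(obs["value"]) for obs in body["observations"]}


url = build_query_url("unemployment-rate", "2023-Q1", "2023-Q2")

# A made-up response body, standing in for what a server might return.
sample_payload = """
{"observations": [
  {"period": "2023-Q1", "value": "5.1"},
  {"period": "2023-Q2", "value": "4.8"}
]}
"""
rates = parse_observations(sample_payload)
print(url)
print(rates)
```

Because the request and the response both have a fixed shape, the same few lines could be rerun whenever new figures are published, which is what makes tools built on APIs practical.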

It’s not common for fact checkers to use APIs, or for NSOs to publish data with them. However, in our research we found more tech-savvy fact checkers do want their NSOs to publish data with APIs. Also, technologists working in fact checking organisations can use APIs to build tools that help fact checkers.

A few examples of NSOs with APIs include the UK’s Office for National Statistics, Statistics Sweden’s API, and the Web Data Service from Statistics Canada.

Analyse data

An important part of an NSO’s role is to provide analysis of data they publish. Good analyses can save fact checkers time as they clearly highlight the most important insights from a dataset. Publishing every interpretation of every dataset is of course impossible, and so NSOs need to understand the main analyses their users expect of them.

Fact checkers will also perform their own analysis on NSO datasets they use. This may be because the NSO does not have the analysis available, or it needs to be adapted, or the style doesn’t align with the fact checker’s.

As mentioned before, static files like PDFs make it hard to do analysis. This report on population estimates from Statistics South Africa has the quite common approach of data tables and graphs inside a report PDF. This is fine for viewing, but the data should also be available as a separate download in a CSV or Excel file.
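To show why a separate CSV download matters, the sketch below reads a small table and computes a year-on-year percentage change, the kind of quick calculation a fact checker might run against a claim. The column names and the figures are made up for illustration and are not real population estimates; with a real download, the text would come from the NSO's file instead of the SAMPLE_CSV string.

```python
import csv
import io

# Made-up figures for illustration only; not real population estimates.
SAMPLE_CSV = """year,population
2021,1000000
2022,1020000
2023,1045500
"""


def load_series(csv_text):
    """Read 'year' and 'population' columns into a {year: value} dict."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(row["year"]): int(row["population"]) for row in reader}


def percent_change(series, start_year, end_year):
    """Percentage change between two years of the series."""
    start, end = series[start_year], series[end_year]
    return (end - start) / start * 100


series = load_series(SAMPLE_CSV)
change = percent_change(series, 2022, 2023)
print(f"Change 2022 to 2023: {change:.1f}%")
```

The same calculation from a table locked inside a PDF would first require retyping or copying the cells by hand, which is slower and introduces room for transcription errors.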

Visualisations like graphs or plots can also be made available. In this productivity overview from the UK’s Office for National Statistics, we can see graphs are available to download as image files. These image files include figure details and the “Source: Office for National Statistics” watermark. It means fact checkers don’t have to create screenshots of graphs or add surrounding info themselves. For more advanced use, the page also allows users to copy the visualisation code so that it can be included in websites elsewhere. Another example is this page on aviation statistics from the Central Statistics Office Ireland. It has some useful features, including highlighting data on hover, links to open the tables in their data portal, and the option to download data as an Excel file.

Contact about data

Fact checkers contact national statistical offices about data for a number of reasons: to enquire whether certain datasets exist, to understand extra context about a dataset, or to get further analysis.

However, fact checkers can often spend a lot of time just trying to get this information from NSOs. It may be that they can’t find the right person and are passed through multiple people. Or they send an email and they get no response. Or the contact details are out of date – the person has left or the phone number is invalid.

There are lots of options for points of contact, but the most important thing is that they are reliable. They could be phone numbers, emails, fax, postal addresses or social media, and they could be for the entire organisation, certain teams, or individuals. As long as it’s clear who to contact about which issues, and those contact points are reliable, this would greatly speed up data use.

The Pakistan Bureau of Statistics contacts page has a simple, clear list of individuals in the organisation, including their job title, name, phone, fax and email. CSO Ireland lists subject matter experts, giving a contact name, phone number and email address. The UK’s Office for National Statistics, meanwhile, includes who to contact about each dataset (for example this crime dataset).

NSOs can also have a more active engagement with their audience. They can find their data users, understand their requirements, tell them about new developments, and use them to test new services. This could be through workshops, surveys, user testing or consultancy. This would help build familiarity and trust between the NSO and the fact checking community. A good example is Statistics Canada’s Data service centre. This is a data user engagement programme which works with different groups to improve relations and understand how the statistical service can be more effective.

Reference data

Fact checkers should always provide a reference to the source data in a fact check. Usually this is a link to the web page where they found the data. They should also provide enough context about the data so that the reader trusts the numbers.

Two examples of referencing NSO data are in these fact checks from Full Fact.

However, there are still open questions on how fact checkers should reference NSO data.

  • What exactly should the fact check link to? To the actual dataset, or the webpage with the dataset, or the report which talks about the data?
  • Does the fact check need to explain who the NSO is?
  • If the NSO provides context and caveats about the data, how should they be presented in the fact check?
  • How should the fact check make clear whether any analysis or visualisation came from the NSO or the fact checker?

The fact checking community could perhaps develop some best practices for referencing data to help fact checkers handle these issues.