nso-stats-fetcher


National Statistical Offices Statistics Fetcher

Fetches and cleans data from NSO websites and publishes it in a standardised tidy data format.

This work has two goals

The data files follow a simple timescale,observation format. Time is formatted as YYYY-MM, and the observation is a percentage change. For example:

month,observation
1996-01,47.56
1996-02,43.645
1996-03,41.9048
...
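A file in this format can be read with the Python standard library alone. The following is a minimal sketch, not code from this repository: the function name `read_observations` is hypothetical, and it assumes the header row is exactly `month,observation` as in the example above.

```python
import csv
import io
from datetime import datetime


def read_observations(handle):
    """Parse a month,observation CSV into (datetime, float) pairs.

    Hypothetical helper: assumes the columns are named 'month' and
    'observation' and that months are written as YYYY-MM.
    """
    rows = []
    for record in csv.DictReader(handle):
        # strptime rejects malformed months, so bad rows fail loudly.
        month = datetime.strptime(record["month"], "%Y-%m")
        rows.append((month, float(record["observation"])))
    return rows


# In-memory stand-in for one of the files in ./data, using the rows shown above.
sample = io.StringIO(
    "month,observation\n"
    "1996-01,47.56\n"
    "1996-02,43.645\n"
    "1996-03,41.9048\n"
)
observations = read_observations(sample)
```

To read a real file, pass an open file handle instead of the `io.StringIO` object, for example `open(path, newline="")`.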

These are the statistics that are fetched, reformatted and stored in the ./data directory:

In almost all cases the data file is downloaded and read in (the exception is the Philippines, where the numbers were hard-coded). Ideally the source files would be JSON or CSV, but some countries publish PDFs or XLS files. The online location of each source file, along with other metadata, is recorded in the data/nso_stats_metadata.json file.
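The reformatting step for a CSV source can be sketched roughly as follows. This is an illustrative assumption rather than the repository's actual code: the function `to_tidy_csv`, its parameters, and the source layout (the `Period` and `Change` columns, MM/YYYY dates) are all hypothetical, and the download step is left out so the example runs offline.

```python
import csv
import io
from datetime import datetime


def to_tidy_csv(raw_csv, date_column, value_column, date_format):
    """Convert source CSV text into month,observation CSV text.

    Hypothetical helper: the column names and date format are supplied by
    the caller because each statistical office publishes its own layout.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["month", "observation"])
    for record in csv.DictReader(io.StringIO(raw_csv)):
        month = datetime.strptime(record[date_column], date_format)
        # Normalise the date to YYYY-MM and the value to a plain float.
        writer.writerow([month.strftime("%Y-%m"), float(record[value_column])])
    return out.getvalue()


# Made-up source layout, reusing the example values from this README.
raw = "Period,Change\n01/1996,47.56\n02/1996,43.645\n"
tidy = to_tidy_csv(raw, "Period", "Change", "%m/%Y")
```

PDF and XLS sources would need a format-specific parser in front of this step; the sketch covers only the simplest case.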

It is also deployed as a GitHub Action which runs several times between 6am and 10am UTC, so some of the statistics should stay up to date. You can view this GitHub Action in .github/workflows/fetch_stats.yaml. However, given how variable these statistics sources are, it wouldn’t be surprising if the action breaks at some point when a published format changes.

Dependencies

Setup

Clone this repo

git clone https://github.com/FullFact/nso-stats-fetcher.git

Install required libraries

Either

poetry install

or

pip install -r requirements.txt

To run the scripts and fetch updated versions of all the statistics data, run:

python src/nsofetch/fetch_all.py

Alternatively, run a single country’s script on its own. We use ISO 3166 country codes as standardised country identifiers.