Fetches and cleans data from NSO websites and publishes it in a standardised tidy data format.
This work has two goals
The data files follow a simple timescale,observation
format. The time is given as YYYY-MM, and the observation is a percentage change. For example:
month,observation
1996-01,47.56
1996-02,43.645
1996-03,41.9048
...
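As a rough illustration of consuming this format, here is a minimal sketch that parses the sample above using only the Python standard library. The function name `read_tidy` is hypothetical and not part of this repo, and the sketch assumes every file has the month,observation header shown in the example.

```python
import csv
import io


def read_tidy(text):
    """Parse month,observation CSV text into a list of (month, value) pairs.

    Hypothetical helper for illustration; not part of this repo.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        month = row["month"]
        # Check that the time value looks like YYYY-MM before accepting it.
        year, mon = month.split("-")
        if len(year) != 4 or len(mon) != 2 or not 1 <= int(mon) <= 12:
            raise ValueError(f"unexpected month value: {month!r}")
        rows.append((month, float(row["observation"])))
    return rows


sample = "month,observation\n1996-01,47.56\n1996-02,43.645\n1996-03,41.9048\n"
rows = read_tidy(sample)
```

To read one of the published files, pass its contents in, for example `read_tidy(open(path).read())`, where `path` points at a file in the ./data directory.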
These are the statistics that are fetched, reformatted and stored in the ./data
directory:
In almost all cases the data file is downloaded and read in (except for the Philippines, where the numbers were hard-coded). Preferably the files would be JSON or CSV, but some countries publish PDF or XLS files. The online location of all these files, along with other metadata, is in the data/nso_stats_metadata.json file.
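The reformatting step might look something like the sketch below, which converts an already-downloaded CSV into the tidy format. Everything about the input is an assumption made for illustration: the `Period` and `Value` column names, the "Jan 1996" date style, and the `to_tidy` function itself are hypothetical and not taken from any real NSO file or from this repo's scripts.

```python
import csv
import io

# Map English month abbreviations to month numbers (assumed input style).
MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}


def to_tidy(source_text, period_col="Period", value_col="Value"):
    """Convert source CSV text into month,observation CSV text.

    Hypothetical helper; the column names are assumptions.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["month", "observation"])
    for row in csv.DictReader(io.StringIO(source_text)):
        # "Jan 1996" -> "1996-01"
        name, year = row[period_col].split()
        writer.writerow([f"{year}-{MONTHS[name]:02d}", float(row[value_col])])
    return out.getvalue()


source = "Period,Value\nJan 1996,47.56\nFeb 1996,43.645\n"
tidy = to_tidy(source)
```

A real source file will rarely be this clean, so each country would still need its own parsing logic. The sketch only shows the shape of the output.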
It is also deployed as a GitHub Action which runs several times between 6am and 10am UTC, so the statistics should stay reasonably up to date. You can view this GitHub Action in .github/workflow/fetch_stats.yaml. However, given how variable these statistical releases are, it wouldn’t be surprising if the action breaks at some point when a published format changes.
Clone this repo
git clone https://github.com/FullFact/nso-stats-fetcher.git
Install required libraries
Either
poetry install
or
pip install -r requirements.txt
To run the scripts and fetch updated versions of all the statistics data, run:
python src/nsofetch/fetch_all.py
Alternatively, run a single country’s script on its own. We use ISO 3166 country codes for standardised country names.