OpenAQ aggregates air quality data from disparate sources around the world and provides access to them in a single location.
OpenAQ uses an ETL (Extract-Transform-Load) process to ingest and harmonize air quality data. The data process has four main components: fetch, storage, presentation and archive.
After a source of air quality data is identified, a fetch adapter is developed to pull data from the source and parse it into a standard format. Depending on the source, OpenAQ fetch scripts range from HTML scrapers and FTP directory scanners to REST API clients.
Data fetching is currently split between two repositories:
https://github.com/openaq/openaq-fetch: fetches primarily reference-grade and government sources. Written in NodeJS and runs on AWS Fargate.
https://github.com/openaq/openaq-lcs-fetch: our newer generation of fetch processes, primarily fetching low-cost sensor sources. Written in NodeJS and runs on AWS Lambda.
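To make the parse step concrete, here is a minimal Python sketch of a fetch adapter that normalizes one row from a hypothetical source into a standard record. The `Measurement` fields, the `PARAMETER_NAMES` mapping, the source row keys (`station`, `pollutant`, `reading`, `unit`, `time`, `lat`, `lon`), and `parse_source_row` are all illustrative assumptions, not OpenAQ's actual schema or code; the production adapters in the repositories above are written in NodeJS.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


# Hypothetical standard record; OpenAQ's real schema may differ.
@dataclass(frozen=True)
class Measurement:
    location: str
    parameter: str      # standardized pollutant name, e.g. "pm25"
    value: float
    unit: str
    date_utc: datetime  # timestamps are normalized to UTC
    latitude: float
    longitude: float


# Hypothetical mapping from one source's pollutant labels to standard names.
PARAMETER_NAMES = {"PM2.5": "pm25", "PM10": "pm10", "O3": "o3", "NO2": "no2"}


def parse_source_row(row: dict, utc_offset_hours: int) -> Optional[Measurement]:
    """Convert one row from a hypothetical source into the standard record.

    Returns None for unrecognized pollutants or missing readings, so the
    caller can skip the row instead of storing bad data.
    """
    parameter = PARAMETER_NAMES.get(row.get("pollutant", ""))
    if parameter is None or row.get("reading") in (None, ""):
        return None
    # The source reports local time; attach its fixed offset, then convert to UTC.
    local = datetime.strptime(row["time"], "%Y-%m-%d %H:%M")
    local = local.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return Measurement(
        location=row["station"].strip(),
        parameter=parameter,
        value=float(row["reading"]),
        unit=row["unit"],
        date_utc=local.astimezone(timezone.utc),
        latitude=float(row["lat"]),
        longitude=float(row["lon"]),
    )
```

Returning `None` for unusable rows keeps the adapter's output uniform: every record that reaches storage has already been mapped to standard names and UTC timestamps.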
Data is stored in a PostgreSQL database that uses the TimescaleDB extension for time-series functionality and the PostGIS extension for geospatial functionality.
View the source code for the database on GitHub (https://github.com/openaq/openaq-db).
OpenAQ provides a REST API for programmatic access to the database.
Read more in the API section below, or see the API Reference for full details.
View the source code for the REST API on GitHub (https://github.com/openaq/openaq-api-v2).
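As an illustration of programmatic access, the sketch below builds a request URL for the REST API using only the Python standard library. The base URL `https://api.openaq.org/v2/measurements` and the `location_id`, `parameter`, `date_from`, `date_to`, `limit`, and `page` query parameters are assumptions modeled on the v2 API, and `build_measurements_url` is a hypothetical helper; check the API Reference for the exact endpoints, parameters, and authentication before relying on them. The request itself is not sent, so the example runs offline.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumed base URL and path for the v2 API; verify against the API Reference.
BASE_URL = "https://api.openaq.org/v2/measurements"


def build_measurements_url(location_id: int, parameter: str,
                           date_from: datetime, date_to: datetime,
                           limit: int = 100, page: int = 1) -> str:
    """Return a query URL for one location's measurements over a time window.

    Both dates must be timezone-aware; they are sent as ISO 8601 strings
    in UTC. Parameter names mirror the assumed v2 query string.
    """
    if date_to <= date_from:
        raise ValueError("date_to must be after date_from")
    query = {
        "location_id": location_id,
        "parameter": parameter,
        "date_from": date_from.astimezone(timezone.utc).isoformat(),
        "date_to": date_to.astimezone(timezone.utc).isoformat(),
        "limit": limit,
        "page": page,
    }
    # urlencode percent-escapes ":" and "+" in the timestamps.
    return f"{BASE_URL}?{urlencode(query)}"
```

The resulting URL could then be fetched with `urllib.request.urlopen` or any HTTP client, incrementing `page` to walk through large result sets.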
After insertion into the database, data is also archived in a publicly available AWS S3 bucket through the Open Data on AWS program.
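Because the archive is an ordinary S3 bucket, any S3 client can read it. As a hedged illustration, the helper below builds the object key for one location and day. The bucket name `openaq-data-archive` and the `records/csv.gz/locationid=<id>/year=<YYYY>/month=<MM>/location-<id>-<YYYYMMDD>.csv.gz` layout are assumptions about the archive and may change, and `archive_key`/`archive_uri` are hypothetical helpers; list the bucket to confirm the actual layout.

```python
from datetime import date

# Assumed bucket name and key layout; confirm by listing the bucket.
BUCKET = "openaq-data-archive"


def archive_key(location_id: int, day: date) -> str:
    """Return the assumed S3 key for one location's measurements on one day."""
    return (
        f"records/csv.gz/locationid={location_id}"
        f"/year={day.year}/month={day.month:02d}"
        f"/location-{location_id}-{day:%Y%m%d}.csv.gz"
    )


def archive_uri(location_id: int, day: date) -> str:
    """Return the full s3:// URI for the same object."""
    return f"s3://{BUCKET}/{archive_key(location_id, day)}"
```

A public bucket can typically be read without AWS credentials, for example by passing `--no-sign-request` to `aws s3 ls` or `aws s3 cp` with a URI like the one `archive_uri` returns.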