Implementation

OpenAQ aggregates air quality data from disparate sources around the world and makes it accessible in a single location.

OpenAQ uses an ETL (extract, transform, load) process to ingest and harmonize air quality data. The pipeline has four main components: fetch, storage, presentation, and archive.

Fetch

After a source of air quality data is identified, a fetch adapter is developed to pull data from the source and parse it into a standard file format. Depending on the source, OpenAQ fetch scripts range from HTML scrapers and FTP directory scanners to REST API clients.
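As a rough illustration of what a fetch adapter does, the sketch below maps records from a hypothetical source payload onto one common shape and serializes them as newline-delimited JSON. The `Measurement` fields, the source keys (`station`, `pollutant`, `reading`, `ts`), and the output format are all assumptions made for this example; they are not OpenAQ's actual schema or file format.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json


@dataclass
class Measurement:
    """One harmonized reading (illustrative fields, not OpenAQ's schema)."""
    location: str
    parameter: str
    value: float
    unit: str
    datetime_utc: str


def parse_source_record(raw: dict) -> Measurement:
    """Map one record from the hypothetical source onto the common shape."""
    # The hypothetical source reports epoch seconds; normalize to ISO 8601 UTC.
    timestamp = datetime.fromtimestamp(raw["ts"], tz=timezone.utc)
    return Measurement(
        location=raw["station"].strip(),
        parameter=raw["pollutant"].lower(),
        value=float(raw["reading"]),
        unit=raw.get("unit", "µg/m³"),  # assumed default unit
        datetime_utc=timestamp.isoformat(),
    )


def to_ndjson(raw_records: list) -> str:
    """Serialize parsed records as newline-delimited JSON, one per line."""
    lines = [
        json.dumps(asdict(parse_source_record(raw)), ensure_ascii=False)
        for raw in raw_records
    ]
    return "\n".join(lines)
```

Each real source would need its own `parse_source_record`, while the output shape stays the same, which is the point of harmonizing at the fetch step.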

Currently data fetching is split between two repositories:

Storage

Data is stored in a PostgreSQL database, using the TimescaleDB extension for added time series functionality and PostGIS for geospatial functionality.

View the source code for the database on GitHub: https://github.com/openaq/openaq-db

Presentation

OpenAQ provides a REST API for programmatic access to the database.

Read more in the API section below, or see the API Reference for full details.

View the source code for the REST API on GitHub: https://github.com/openaq/openaq-api-v2
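Programmatic access typically amounts to building a request URL and decoding the JSON response. The sketch below uses only the Python standard library. The base URL, the `measurements` resource, and the query parameter names are assumptions for illustration; check the API Reference for the current endpoints, parameters, and any authentication requirements.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import json

# Assumed base URL for illustration; confirm against the API Reference.
BASE_URL = "https://api.openaq.org/v2"


def build_url(resource: str, **params) -> str:
    """Build a request URL with query parameters in a stable (sorted) order."""
    query = urlencode(sorted(params.items()))
    url = f"{BASE_URL}/{resource}"
    return f"{url}?{query}" if query else url


def get_json(url: str, timeout: float = 10.0):
    """Fetch a URL and decode the response body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, `build_url("measurements", country="GH", limit=5)` produces a URL that can be passed to `get_json`. Keeping URL construction separate from the network call makes the former easy to test offline.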

Archive

After insertion into the database, data is stored in a publicly available AWS S3 bucket via the Open Data on AWS program.
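Archives of this kind are commonly organized under partitioned key prefixes so that consumers can list only the objects they need. The sketch below builds such a prefix for one location and month. The key layout shown (`records/csv.gz/locationid=.../year=.../month=.../`) is an assumption for illustration and should be verified against the archive documentation before use.

```python
from datetime import date


def archive_prefix(location_id: int, day: date) -> str:
    """Build a partition-style key prefix for one location and month.

    The layout is an assumed example, not a confirmed OpenAQ convention.
    """
    return (
        f"records/csv.gz/locationid={location_id}/"
        f"year={day.year}/month={day.month:02d}/"
    )
```

A prefix like this would be passed to an S3 listing call to enumerate a location's files for that month without scanning the whole bucket.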