Skip to content

About Open Data on AWS

The bucket root url is

https://openaq-data-archive.s3.amazonaws.com/

The files are organized with the the following structure:

  • Directoryrecords
    • Directorycsv.gz
      • Directorylocationid={locationid}
        • Directoryyear={year}
          • Directorymonth={month}
            • location-{locationid}-{year}{month}{day}.csv.gz

example file path:

/records/csv.gz/locationid=2178/year=2022/month=05/location-2178-20220503.csv.gz

The prefix structure follows the Apache Hive partitioning format with key value pairing in the directory title i.e. key=value. Hive formatting is not used for the top two directories named records and csv.gz.

records

Top level directory containing individual records from locations.

file format

The file format of the data. Currently only csv.gz (gzip csv)

locationid={locationid}

The OpenAQ id for the location, a numerical value, e.g. location_id=42

year={year}

The four digit number of the year, e.g. year=2022

month={month}

The zero padded two digit number of the month, e.g. month=02

Measurements are stored as csv (comma-separated values) compressed with gzip (csv.gz). The measurements file holds measurement values for all sensors for a given location on a given day. The file follows the following naming convention:

loc-{locationid}-{year}{month}{day}.{ext}

Data are stored in narrow format with the following columns:

| Column name | description | example | | :---------- | :------------------------------------------------------------------------------------------------------------- | :------------------------ | | location_id | the OpenAQ location id of the station | 2178 | | sensor_id | the OpenAQ sensor id of the pollutant measured | 3919 | | location | name of location | Del Norte-2178 | | datetime | date and time in ISO-8601 format with timezone offset of the measurement | 2021-12-16 01:00:00+00:00 | | lat | decimal degree (EPSG:4326) representation of the latitude (Y) of the sensor. Minimum of -90. Maximum of 90. | 35.1353 | | lon | decimal degree (EPSG:4326) representation of the longitude (X) of the sensor. Minimum of -180, Maximum of 180. | -106.584702 | | parameter | type of pollutant being reported | pm10 | | unit | unit of the parameter | µg/m³ | | value | the decimal value of the measurement | 42.2 |

location_id,sensors_id,location,datetime,lat,lon,parameter,units,value
2178,3919,Del Norte-2178,2023-02-22T01:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,14.0
2178,3919,Del Norte-2178,2023-02-22T02:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,4.0
2178,3919,Del Norte-2178,2023-02-22T03:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,7.0
2178,3919,Del Norte-2178,2023-02-22T04:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,6.0
2178,3919,Del Norte-2178,2023-02-22T05:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,5.0
2178,3919,Del Norte-2178,2023-02-22T06:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,9.0
2178,3919,Del Norte-2178,2023-02-22T07:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,20.0
2178,3919,Del Norte-2178,2023-02-22T08:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,28.0
2178,3919,Del Norte-2178,2023-02-22T09:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,19.0
2178,3919,Del Norte-2178,2023-02-22T10:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,26.0
2178,3919,Del Norte-2178,2023-02-22T11:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,127.0
2178,3919,Del Norte-2178,2023-02-22T12:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,318.0
2178,3919,Del Norte-2178,2023-02-22T13:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,241.0
2178,3919,Del Norte-2178,2023-02-22T14:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,9.0
...

An AWS SNS Topic is available for subscribing to S3 object creation events.

SNS ARN:

arn:aws:sns:us-east-1:817926761842:openaq-data-archive-object_created

To learn more about how to subscribe to SNS topics, view the AWS SNS documentation.

Files are written 72 hours after the end of day, i.e. 0:00 for the given location’s timezone. Files may be retroactively patched if data are missing due to fetching error or from historical data scrape.