About Open Data on AWS

Bucket structure

The bucket root URL is:

https://openaq-data-archive.s3.amazonaws.com/

The files are organized with the following structure:

  • records/
    • csv.gz/
      • locationid={locationid}/
        • year={year}/
          • month={month}/
            • location-{locationid}-{year}{month}{day}.csv.gz

Example file path:

/records/csv.gz/locationid=2178/year=2022/month=05/location-2178-20220503.csv.gz
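
The path layout above can be sketched as a small helper that builds the object key and URL for one location and one day. This is a minimal sketch, not an official client: the function names measurement_key and measurement_url are hypothetical, and the URL is simply the bucket root joined with the key.

```python
from datetime import date

# Bucket root URL as given in this document.
BUCKET_ROOT = "https://openaq-data-archive.s3.amazonaws.com"


def measurement_key(location_id: int, day: date) -> str:
    """Build the object key for one location and one day (hypothetical helper)."""
    return (
        f"records/csv.gz/locationid={location_id}"
        f"/year={day:%Y}/month={day:%m}"
        f"/location-{location_id}-{day:%Y%m%d}.csv.gz"
    )


def measurement_url(location_id: int, day: date) -> str:
    """Join the bucket root and the object key into a full URL."""
    return f"{BUCKET_ROOT}/{measurement_key(location_id, day)}"


# Reproduces the example file path shown above.
print(measurement_url(2178, date(2022, 5, 3)))
```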

Prefixes

The prefix structure follows the Apache Hive partitioning format, with a key-value pair in each directory name, i.e. key=value. Hive formatting is not used for the top two directories, records and csv.gz.

records

Top-level directory containing individual records from locations.

file format

The file format of the data. Currently the only format is csv.gz (gzip-compressed CSV).

locationid={locationid}

The OpenAQ id for the location, a numerical value, e.g. locationid=42

year={year}

The four-digit year, e.g. year=2022

month={month}

The zero-padded two-digit month, e.g. month=02
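
Because the prefixes use key=value directory names, the partition values can be recovered from an object key with plain string handling. The following is a minimal sketch; parse_partitions is a hypothetical helper name, and it simply skips any path segment without an equals sign (such as records, csv.gz, and the file name).

```python
def parse_partitions(key: str) -> dict[str, str]:
    """Extract Hive-style key=value pairs from an object key (hypothetical helper)."""
    partitions = {}
    for segment in key.strip("/").split("/"):
        name, sep, value = segment.partition("=")
        if sep:  # only segments that contain "=" are partitions
            partitions[name] = value
    return partitions


example = "/records/csv.gz/locationid=2178/year=2022/month=05/location-2178-20220503.csv.gz"
print(parse_partitions(example))
```

Values are kept as strings so that the zero padding of the month is preserved.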

Measurement File

Measurements are stored as CSV (comma-separated values) compressed with gzip (csv.gz). Each measurement file holds the measurement values for all sensors at a given location on a given day. File names use the following convention:

location-{locationid}-{year}{month}{day}.{ext}

File structure

Data are stored in narrow format with the following columns:

| Column name | Description | Example |
|---|---|---|
| location_id | the OpenAQ location id of the station | 2178 |
| sensors_id | the OpenAQ sensor id of the pollutant measured | 3919 |
| location | name of location | Del Norte-2178 |
| datetime | date and time of the measurement in ISO-8601 format with timezone offset | 2021-12-16 01:00:00+00:00 |
| lat | decimal degree (EPSG:4326) representation of the latitude (Y) of the sensor. Minimum of -90, maximum of 90. | 35.1353 |
| lon | decimal degree (EPSG:4326) representation of the longitude (X) of the sensor. Minimum of -180, maximum of 180. | -106.584702 |
| parameter | type of pollutant being reported | pm10 |
| units | unit of the parameter | µg/m³ |
| value | the decimal value of the measurement | 42.2 |

Example file

location_id,sensors_id,location,datetime,lat,lon,parameter,units,value
2178,3919,Del Norte-2178,2023-02-22T01:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,14.0
2178,3919,Del Norte-2178,2023-02-22T02:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,4.0
2178,3919,Del Norte-2178,2023-02-22T03:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,7.0
2178,3919,Del Norte-2178,2023-02-22T04:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,6.0
2178,3919,Del Norte-2178,2023-02-22T05:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,5.0
2178,3919,Del Norte-2178,2023-02-22T06:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,9.0
2178,3919,Del Norte-2178,2023-02-22T07:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,20.0
2178,3919,Del Norte-2178,2023-02-22T08:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,28.0
2178,3919,Del Norte-2178,2023-02-22T09:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,19.0
2178,3919,Del Norte-2178,2023-02-22T10:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,26.0
2178,3919,Del Norte-2178,2023-02-22T11:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,127.0
2178,3919,Del Norte-2178,2023-02-22T12:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,318.0
2178,3919,Del Norte-2178,2023-02-22T13:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,241.0
2178,3919,Del Norte-2178,2023-02-22T14:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,9.0
...
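
A file in this format can be read with the Python standard library alone. The sketch below is illustrative: read_measurements is a hypothetical helper, the sample bytes are built in memory from the first two rows of the example above rather than downloaded, and UTF-8 encoding is an assumption (the document does not state the encoding).

```python
import csv
import gzip
import io
from datetime import datetime

# First two rows of the example file, used here as in-memory test data.
SAMPLE = (
    "location_id,sensors_id,location,datetime,lat,lon,parameter,units,value\n"
    "2178,3919,Del Norte-2178,2023-02-22T01:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,14.0\n"
    "2178,3919,Del Norte-2178,2023-02-22T02:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,4.0\n"
)


def read_measurements(raw: bytes) -> list[dict]:
    """Decompress gzip bytes and parse the narrow-format CSV rows."""
    rows = []
    # Assumption: the files are UTF-8 encoded.
    with gzip.open(io.BytesIO(raw), mode="rt", encoding="utf-8", newline="") as fh:
        for row in csv.DictReader(fh):
            row["value"] = float(row["value"])
            row["datetime"] = datetime.fromisoformat(row["datetime"])
            rows.append(row)
    return rows


rows = read_measurements(gzip.compress(SAMPLE.encode("utf-8")))
for row in rows:
    print(row["datetime"].isoformat(), row["parameter"], row["value"], row["units"])
```

For a real file, the bytes passed to read_measurements would be the body of the downloaded object instead of the compressed sample.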

SNS Notification

An AWS SNS Topic is available for subscribing to S3 object creation events.

SNS ARN:

arn:aws:sns:us-east-1:817926761842:openaq-data-archive-object_created

To learn more about how to subscribe to SNS topics, view the AWS SNS documentation.
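
Once subscribed, a consumer needs to pull the new object keys out of each notification. The sketch below assumes the topic relays standard S3 event notifications, where the SNS envelope carries the S3 event as a JSON string in its Message field and object keys are URL-encoded. The payload here is hand-built for illustration and is not a captured message from this topic; created_keys is a hypothetical helper.

```python
import json
from urllib.parse import unquote_plus


def created_keys(sns_envelope: str) -> list[str]:
    """Return decoded object keys from an SNS-wrapped S3 event (assumed format)."""
    envelope = json.loads(sns_envelope)
    event = json.loads(envelope["Message"])  # the S3 event is nested as a JSON string
    keys = []
    for record in event.get("Records", []):
        if record.get("eventName", "").startswith("ObjectCreated"):
            # S3 event notifications URL-encode the key, e.g. "=" becomes "%3D".
            keys.append(unquote_plus(record["s3"]["object"]["key"]))
    return keys


# Hypothetical payload, hand-built to follow the S3 event notification shape.
s3_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "openaq-data-archive"},
                "object": {
                    "key": "records/csv.gz/locationid%3D2178/year%3D2022/month%3D05/location-2178-20220503.csv.gz"
                },
            },
        }
    ]
}
sample_envelope = json.dumps({"Type": "Notification", "Message": json.dumps(s3_event)})

print(created_keys(sample_envelope))
```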

Update frequency

Files are written 72 hours after the end of the day, i.e. after 0:00 in the given location’s timezone. Files may be retroactively patched if data were missing because of a fetching error, or when historical data are scraped.
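
The timing rule can be turned into a rough estimate of when a day's file should first appear. This sketch reads "end of day" as the midnight that closes the measurement day in the location's timezone, which is an interpretation rather than something the document spells out. earliest_write_time is a hypothetical helper, and the fixed UTC-7 offset in the usage example is taken from the example file rows.

```python
from datetime import date, datetime, time, timedelta, timezone, tzinfo

# Delay stated in this document.
WRITE_DELAY = timedelta(hours=72)


def earliest_write_time(day: date, tz: tzinfo) -> datetime:
    """Estimate, in UTC, when the file for `day` is expected to be written."""
    # Assumption: "end of day" is 0:00 local time at the start of the next day.
    end_of_day = datetime.combine(day + timedelta(days=1), time(0, 0), tzinfo=tz)
    # Convert to UTC before adding the delay so the result is 72 elapsed hours.
    return end_of_day.astimezone(timezone.utc) + WRITE_DELAY


local_tz = timezone(timedelta(hours=-7))
print(earliest_write_time(date(2023, 2, 22), local_tz).isoformat())
```

Since files may be patched later, this is only a lower bound for the first write, not a guarantee that the file is final.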