About Open Data on AWS
Bucket structure
The bucket root url is
https://openaq-data-archive.s3.amazonaws.com/
The files are organized with the the following structure:
Directoryrecords
Directorycsv.gz
Directorylocationid={locationid}
Directoryyear={year}
Directorymonth={month}
- location-{locationid}-{year}{month}{day}.csv.gz
- …
example file path:
/records/csv.gz/locationid=2178/year=2022/month=05/location-2178-20220503.csv.gz
Prefixes
The prefix structure follows the Apache Hive partitioning format with key value pairing in the directory title i.e. key=value
. Hive formatting is not used for the top two directories named records
and csv.gz
.
records
Top level directory containing individual records from locations.
file format
The file format of the data. Currently only csv.gz
(gzip csv)
locationid={locationid}
The OpenAQ id for the location, a numerical value, e.g. location_id=42
year={year}
The four digit number of the year, e.g. year=2022
month={month}
The zero padded two digit number of the month, e.g. month=02
Measurement File
Measurements are stored as csv (comma-separated values) compressed with gzip (csv.gz). The measurements file holds measurement values for all sensors for a given location on a given day. The file follows the following naming convention:
loc-{locationid}-{year}{month}{day}.{ext}
File structure
Data are stored in narrow format with the following columns:
Column name | description | example |
---|---|---|
location_id | the OpenAQ location id of the station | 2178 |
sensor_id | the OpenAQ sensor id of the pollutant measured | 3919 |
location | name of location | Del Norte-2178 |
datetime | date and time in ISO-8601 format with timezone offset of the measurement | 2021-12-16 01:00:00+00:00 |
lat | decimal degree (EPSG:4326) representation of the latitude (Y) of the sensor. Minimum of -90. Maximum of 90. | 35.1353 |
lon | decimal degree (EPSG:4326) representation of the longitude (X) of the sensor. Minimum of -180, Maximum of 180. | -106.584702 |
parameter | type of pollutant being reported | pm10 |
unit | unit of the parameter | µg/m³ |
value | the decimal value of the measurement | 42.2 |
Example file
location_id,sensors_id,location,datetime,lat,lon,parameter,units,value2178,3919,Del Norte-2178,2023-02-22T01:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,14.02178,3919,Del Norte-2178,2023-02-22T02:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,4.02178,3919,Del Norte-2178,2023-02-22T03:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,7.02178,3919,Del Norte-2178,2023-02-22T04:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,6.02178,3919,Del Norte-2178,2023-02-22T05:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,5.02178,3919,Del Norte-2178,2023-02-22T06:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,9.02178,3919,Del Norte-2178,2023-02-22T07:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,20.02178,3919,Del Norte-2178,2023-02-22T08:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,28.02178,3919,Del Norte-2178,2023-02-22T09:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,19.02178,3919,Del Norte-2178,2023-02-22T10:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,26.02178,3919,Del Norte-2178,2023-02-22T11:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,127.02178,3919,Del Norte-2178,2023-02-22T12:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,318.02178,3919,Del Norte-2178,2023-02-22T13:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,241.02178,3919,Del Norte-2178,2023-02-22T14:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,9.0...
SNS Notification
An AWS SNS Topic is available for subscribing to S3 object creation events.
SNS ARN:
arn:aws:sns:us-east-1:817926761842:openaq-data-archive-object_created
To learn more about how to subscribe to SNS topics, view the AWS SNS documentation.
Update frequency
Files are written 72 hours after the end of day, i.e. 0:00 for the given location’s timezone. Files may be retroactively patched if data are missing due to fetching error or from historical data scrape.