About Open Data on AWS
Bucket structure
The bucket root url is
The files are organized with the following structure:
- records
  - csv.gz
    - locationid={locationid}
      - year={year}
        - month={month}
          - location-{locationid}-{year}{month}{day}.csv.gz
          - …
example file path:
Prefixes
The prefix structure follows the Apache Hive partitioning format, with key-value pairs in the directory names, i.e. key=value. Hive formatting is not used for the top two directories, named records and csv.gz.
records
Top level directory containing individual records from locations.
file format
The file format of the data. Currently only csv.gz (gzip-compressed CSV) is available.
locationid={locationid}
The OpenAQ id for the location, a numerical value, e.g. locationid=42
year={year}
The four-digit year, e.g. year=2022
month={month}
The zero-padded, two-digit month, e.g. month=02
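As an illustration of the prefix layout described above, the following minimal Python sketch builds the prefix for one location and month. The function name `month_prefix` is hypothetical and not part of any OpenAQ tooling, and the returned string is relative to the bucket root.

```python
def month_prefix(location_id: int, year: int, month: int) -> str:
    """Build the Hive-style prefix for one location and month.

    Hypothetical helper for illustration only. The first two path
    segments (records, csv.gz) are plain directory names; the rest
    use key=value pairs.
    """
    if not 1 <= month <= 12:
        raise ValueError(f"month must be between 1 and 12, got {month}")
    return (
        "records/csv.gz/"
        f"locationid={location_id}/"
        f"year={year:04d}/"
        f"month={month:02d}/"  # the month is zero padded to two digits
    )
```

A prefix built this way can be passed to any S3 listing tool to enumerate the daily files for that location and month.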
Measurement File
Measurements are stored as CSV (comma-separated values) compressed with gzip (csv.gz). Each measurements file holds the measurement values from all sensors at a given location on a given day. Files are named according to the following convention:
location-{locationid}-{year}{month}{day}.{ext}
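The naming convention can be sketched in Python as a pair of helpers that build and parse a file name. This is a minimal sketch under two assumptions: it uses the `location-` file name prefix shown in the directory listing, and it assumes the day is zero padded to two digits like the month. The names `measurement_file_name` and `parse_measurement_file_name` are hypothetical.

```python
import re
from datetime import date

# Assumed pattern: location-{locationid}-{year}{month}{day}.csv.gz,
# with a four-digit year and two-digit month and day.
FILE_NAME_RE = re.compile(
    r"^location-(?P<locationid>\d+)-"
    r"(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})\.csv\.gz$"
)


def measurement_file_name(location_id: int, day: date, ext: str = "csv.gz") -> str:
    """Build the file name for one location and day."""
    return f"location-{location_id}-{day:%Y%m%d}.{ext}"


def parse_measurement_file_name(name: str) -> tuple:
    """Return (location_id, day) parsed from a measurement file name."""
    match = FILE_NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a measurement file name: {name!r}")
    day = date(int(match["year"]), int(match["month"]), int(match["day"]))
    return int(match["locationid"]), day
```

For example, `measurement_file_name(2178, date(2022, 1, 5))` yields a name that `parse_measurement_file_name` turns back into the same location id and date.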
File structure
Data are stored in narrow format with the following columns:
Column name | description | example |
---|---|---|
location_id | the OpenAQ location id of the station | 2178 |
sensor_id | the OpenAQ sensor id of the pollutant measured | 3919 |
location | name of location | Del Norte-2178 |
datetime | date and time of the measurement in ISO 8601 format with timezone offset | 2021-12-16 01:00:00+00:00 |
lat | decimal degree (EPSG:4326) representation of the latitude (Y) of the sensor. Minimum of -90, maximum of 90. | 35.1353 |
lon | decimal degree (EPSG:4326) representation of the longitude (X) of the sensor. Minimum of -180, maximum of 180. | -106.584702 |
parameter | type of pollutant being reported | pm10 |
unit | unit of the parameter | µg/m³ |
value | the decimal value of the measurement | 42.2 |
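To show how the narrow format might be consumed, here is a minimal Python sketch that reads a locally downloaded measurement file using only the standard library. It assumes the file is UTF-8 encoded and starts with a header row using the column names from the table above; the function name `read_measurements` is hypothetical.

```python
import csv
import gzip
from datetime import datetime


def read_measurements(path):
    """Yield one dict per measurement row from a local csv.gz file.

    Assumes a header row with the column names listed in the table
    above and that every numeric field is populated.
    """
    # Open in text mode so the csv module receives decoded lines.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        for row in csv.DictReader(handle):
            yield {
                "location_id": int(row["location_id"]),
                "sensor_id": int(row["sensor_id"]),
                "location": row["location"],
                # ISO 8601 timestamp with timezone offset
                "datetime": datetime.fromisoformat(row["datetime"]),
                "lat": float(row["lat"]),
                "lon": float(row["lon"]),
                "parameter": row["parameter"],
                "unit": row["unit"],
                "value": float(row["value"]),
            }
```

Because each row carries a single sensor value, grouping the rows by `parameter` or `sensor_id` is the usual first step before pivoting to a wide layout.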
Example file
SNS Notification
An AWS SNS Topic is available for subscribing to S3 object creation events.
SNS ARN:
To learn more about how to subscribe to SNS topics, view the AWS SNS documentation.
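As a rough sketch of what a subscriber might do with a notification, the Python function below extracts the object keys from one SNS message. It assumes the topic forwards the standard S3 event notification JSON inside the `Message` field of the SNS envelope, which is how S3 event notifications are normally delivered through SNS; the exact envelope depends on the subscription protocol, and the function name `object_keys_from_sns` is hypothetical.

```python
import json
from urllib.parse import unquote_plus


def object_keys_from_sns(envelope: dict) -> list:
    """Return the S3 object keys referenced by one SNS notification.

    Assumes `envelope` is the parsed SNS message and that its
    "Message" field holds an S3 event notification as a JSON string.
    """
    event = json.loads(envelope["Message"])
    keys = []
    for record in event.get("Records", []):
        # S3 URL-encodes keys in event notifications, so the "=" in
        # key=value prefixes arrives as "%3D" and must be decoded.
        keys.append(unquote_plus(record["s3"]["object"]["key"]))
    return keys
```

The decoding step matters here because every Hive-style prefix contains an equals sign, so the raw key in the notification will not match the key stored in the bucket until it is unquoted.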
Update frequency
Files are written 72 hours after the end of the day, i.e. 0:00 in the given location’s timezone. Files may be retroactively patched if data were missing due to a fetching error or are added later from a historical data scrape.
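The timing rule can be illustrated with a short Python sketch that estimates the earliest moment a day's file could be expected. It reads the 72 hours as elapsed time counted from local midnight at the end of the measurement day, which is an interpretation rather than a documented guarantee; the function name `earliest_expected_write` is hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone, tzinfo


def earliest_expected_write(day: date, location_tz: tzinfo) -> datetime:
    """Estimate, in UTC, when the file for `day` is first expected.

    `location_tz` is the timezone of the location, e.g. a
    zoneinfo.ZoneInfo instance or a fixed-offset timezone.
    """
    # End of the measurement day: 0:00 local time on the following day.
    end_of_day = datetime.combine(
        day + timedelta(days=1), time.min, tzinfo=location_tz
    )
    # Convert to UTC before adding so the 72 hours are elapsed time,
    # even if a daylight saving change falls inside the window.
    return end_of_day.astimezone(timezone.utc) + timedelta(hours=72)
```

Since files may also be patched retroactively, a consumer that needs complete data may want to re-check older prefixes rather than rely on this estimate alone.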