About Open Data on AWS
Bucket structure
The bucket root URL is
https://openaq-data-archive.s3.amazonaws.com/
The files are organized with the following structure:
records/
  csv.gz/
    locationid={locationid}/
      year={year}/
        month={month}/
          location-{locationid}-{year}{month}{day}.csv.gz
          …
Example file path:
/records/csv.gz/locationid=2178/year=2022/month=05/location-2178-20220503.csv.gz
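As a minimal sketch of the layout above, the object key and URL for one location and day can be assembled from the template. The function names `measurement_key` and `measurement_url` are illustrative and not part of any OpenAQ library.

```python
from datetime import date

# Bucket root as given above, without the trailing slash.
BUCKET_ROOT = "https://openaq-data-archive.s3.amazonaws.com"


def measurement_key(location_id: int, day: date) -> str:
    """Build the object key for one location and one day (hypothetical helper)."""
    return (
        f"records/csv.gz/locationid={location_id}/"
        f"year={day:%Y}/month={day:%m}/"  # month is zero padded to two digits
        f"location-{location_id}-{day:%Y%m%d}.csv.gz"
    )


def measurement_url(location_id: int, day: date) -> str:
    """Join the bucket root and the object key into a full URL."""
    return f"{BUCKET_ROOT}/{measurement_key(location_id, day)}"


print(measurement_url(2178, date(2022, 5, 3)))
```

For location 2178 on 3 May 2022, the key matches the example file path shown above.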
Prefixes
The prefix structure follows the Apache Hive partitioning format, with key-value pairs in the directory names, i.e. key=value. Hive formatting is not used for the top two directories, records and csv.gz.
records
Top level directory containing individual records from locations.
file format
The file format of the data. Currently only csv.gz (gzip-compressed CSV) is available.
locationid={locationid}
The OpenAQ id for the location, a numerical value, e.g. locationid=42
year={year}
The four digit number of the year, e.g. year=2022
month={month}
The zero padded two digit number of the month, e.g. month=02
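Because the partition directories use the key=value form, the partition values can be read back from an object key with plain string handling. The sketch below is illustrative only; `parse_partitions` is a hypothetical helper name.

```python
def parse_partitions(key: str) -> dict[str, str]:
    """Collect Hive-style key=value path segments from an object key."""
    partitions = {}
    for segment in key.strip("/").split("/"):
        name, sep, value = segment.partition("=")
        if sep:  # segments without "=" (records, csv.gz, the file name) are skipped
            partitions[name] = value
    return partitions


example = "/records/csv.gz/locationid=2178/year=2022/month=05/location-2178-20220503.csv.gz"
print(parse_partitions(example))
```

Values are kept as strings so that zero padding, as in month=05, is preserved.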
Measurement File
Measurements are stored as CSV (comma-separated values) compressed with gzip (csv.gz). Each measurement file holds the measurement values for all sensors at a given location on a given day. Files use the following naming convention:
location-{locationid}-{year}{month}{day}.{ext}
File structure
Data are stored in narrow format with the following columns:
Column name | description | example |
---|---|---|
location_id | the OpenAQ location id of the station | 2178 |
sensors_id | the OpenAQ sensor id of the pollutant measured | 3919 |
location | name of location | Del Norte-2178 |
datetime | date and time in ISO-8601 format with timezone offset of the measurement | 2021-12-16 01:00:00+00:00 |
lat | decimal degree (EPSG:4326) representation of the latitude (Y) of the sensor. Minimum of -90. Maximum of 90. | 35.1353 |
lon | decimal degree (EPSG:4326) representation of the longitude (X) of the sensor. Minimum of -180, Maximum of 180. | -106.584702 |
parameter | type of pollutant being reported | pm10 |
units | unit of the parameter | µg/m³ |
value | the decimal value of the measurement | 42.2 |
Example file
location_id,sensors_id,location,datetime,lat,lon,parameter,units,value
2178,3919,Del Norte-2178,2023-02-22T01:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,14.0
2178,3919,Del Norte-2178,2023-02-22T02:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,4.0
2178,3919,Del Norte-2178,2023-02-22T03:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,7.0
2178,3919,Del Norte-2178,2023-02-22T04:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,6.0
2178,3919,Del Norte-2178,2023-02-22T05:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,5.0
2178,3919,Del Norte-2178,2023-02-22T06:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,9.0
2178,3919,Del Norte-2178,2023-02-22T07:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,20.0
2178,3919,Del Norte-2178,2023-02-22T08:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,28.0
2178,3919,Del Norte-2178,2023-02-22T09:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,19.0
2178,3919,Del Norte-2178,2023-02-22T10:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,26.0
2178,3919,Del Norte-2178,2023-02-22T11:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,127.0
2178,3919,Del Norte-2178,2023-02-22T12:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,318.0
2178,3919,Del Norte-2178,2023-02-22T13:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,241.0
2178,3919,Del Norte-2178,2023-02-22T14:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,9.0
...
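A file with this layout can be read with the Python standard library alone. The sketch below assumes the header names shown in the example file and UTF-8 text encoding, which this page does not state. It builds a one-row sample in memory rather than downloading anything, and `read_measurements` is a hypothetical helper name.

```python
import csv
import gzip
import io
from datetime import datetime


def read_measurements(raw: bytes) -> list[dict]:
    """Decompress a csv.gz payload and convert each row to typed values."""
    text = gzip.decompress(raw).decode("utf-8")  # UTF-8 is an assumption
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "location_id": int(row["location_id"]),
            "sensors_id": int(row["sensors_id"]),
            "location": row["location"],
            # fromisoformat keeps the timezone offset of the measurement
            "datetime": datetime.fromisoformat(row["datetime"]),
            "lat": float(row["lat"]),
            "lon": float(row["lon"]),
            "parameter": row["parameter"],
            "units": row["units"],
            "value": float(row["value"]),
        })
    return rows


# One-row sample copied from the example file, compressed in memory.
sample = (
    "location_id,sensors_id,location,datetime,lat,lon,parameter,units,value\n"
    "2178,3919,Del Norte-2178,2023-02-22T01:00:00-07:00,35.1353,-106.584702,pm10,µg/m³,14.0\n"
)
rows = read_measurements(gzip.compress(sample.encode("utf-8")))
print(rows[0]["parameter"], rows[0]["value"], rows[0]["units"])
```

Since the data are in narrow format, each row is one measurement from one sensor, so a file for a multi-sensor location mixes parameters and should be filtered or grouped on `parameter` or `sensors_id`.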
SNS Notification
An AWS SNS topic is available for subscribing to S3 object creation events.
SNS ARN:
arn:aws:sns:us-east-1:817926761842:openaq-data-archive-object_created
To learn more about how to subscribe to SNS topics, view the AWS SNS documentation.
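Once subscribed, a consumer has to extract the new object keys from each notification. The sketch below assumes the topic delivers the standard S3 event notification JSON inside the `Message` field of the SNS envelope. That is the usual AWS format, but this page does not confirm the payload. The sample envelope is made up for illustration, and `object_keys_from_sns` is a hypothetical helper name.

```python
import json
from urllib.parse import unquote_plus


def object_keys_from_sns(envelope: str) -> list[str]:
    """Return the S3 object keys named in one SNS notification body."""
    body = json.loads(envelope)
    event = json.loads(body["Message"])  # the S3 event is a JSON string inside the envelope
    # Object keys in S3 event notifications are URL encoded, so "=" arrives as "%3D".
    return [
        unquote_plus(record["s3"]["object"]["key"])
        for record in event.get("Records", [])
    ]


# Hypothetical notification body, trimmed to the fields used above.
s3_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "openaq-data-archive"},
                "object": {
                    "key": "records/csv.gz/locationid%3D2178/year%3D2022/"
                           "month%3D05/location-2178-20220503.csv.gz"
                },
            },
        }
    ]
}
envelope = json.dumps({"Type": "Notification", "Message": json.dumps(s3_event)})
print(object_keys_from_sns(envelope))
```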
Update frequency
Files are written 72 hours after the end of the day, i.e. 0:00 in the given location’s timezone. Files may be retroactively patched if data were missing because of a fetching error, or when data are added from a historical data scrape.
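To estimate when a given day's file should first appear, the rule above can be turned into a small calculation. The sketch assumes "end of day" means the midnight that follows the measurement date in the location's timezone. It uses a fixed UTC offset for illustration, and `earliest_expected_write` is a hypothetical helper name.

```python
from datetime import date, datetime, time, timedelta, timezone, tzinfo


def earliest_expected_write(day: date, tz: tzinfo) -> datetime:
    """Midnight after `day` in the location's timezone, plus 72 hours."""
    end_of_day = datetime.combine(day + timedelta(days=1), time.min, tzinfo=tz)
    return end_of_day + timedelta(hours=72)


# Fixed offset of UTC-07:00, matching the timestamps in the example file.
local = timezone(timedelta(hours=-7))
print(earliest_expected_write(date(2023, 2, 22), local).isoformat())
```

This is only a lower bound for polling. Because files may be patched later, a file that already exists can still change.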