Guide - Querying with AWS Athena
AWS Athena provides a powerful SQL interface to query data from the OpenAQ Archive S3 bucket.
To start create a table.
The Athena table definition defines all the columns and partitions for the data file in the OpenAQ Archive. The Apache Hive Partitioning format help limit objects scans throughout the bucket, reducing cost and time for querying. Partitions are optional and are typically best optimized on a case by case basis. Read more about Athena partitioning on S3 on the AWS docs.
Example
Loading all partitions can be very slow, so for this example we will partition manually for a single location and month that we are interested in:
Now with the table defined we can query the dataset using the SQL. To get all measurements for a single location (locationid 2178) in a single month (December 2022) we can use the following:
Now due to the partitioning Athena will only scan the selected locations instead of the entire bucket, reducing costs and time to compute. This will return all measurements for December 2022 in this following form: