Derive the Data Swiftly by Querying using Amazon Athena

October 3, 2022 | Comments(0) |

TABLE OF CONTENT

1. Overview
2. Introduction to Amazon Athena
3. Need for Amazon Athena
4. Steps to query the data present in Amazon S3 using Athena and Glue Data Catalog
5. Limitations of Amazon Athena
6. Conclusion
7. About CloudThat
8. FAQs

 

Overview

In this fast-moving world, there is a lot of data being generated and stored in multiple data stores which can later be used to get insights into it. But when you want to know the details of a person either to personalize or to promote we need to access only specific information from our entire data, so this is where querying comes into the picture. Here we have a query service called Athena provided by AWS which is a serverless and SQL-based service.

Introduction to Amazon Athena

Amazon Athena is a serverless and interactive query service that uses standard SQL to query data directly from Amazon S3 to analyze it. We can directly point Amazon Athena to the data in S3 using the AWS management console to run queries and get results in a very less period also Athena scales parallelly by running the queries on a huge amount of data.

Need for Amazon Athena

Athena works directly with querying without worrying about the data store and data load. This can query data in different formats like structured, semi-structured, and unstructured data which is stored in S3. Athena can integrate with AWS QuickSight to generate reports from the queried data to get insights into the data. Athena also integrates with Glue Data Catalog which is the metadata store for data present in S3. This helps to create tables and query data from the central metadata store available.

If you want to know more about the Top 5 Data Analytics Tools in AWS here.

Steps to query the data present in Amazon S3 using Athena and Glue Data Catalog

Step 1: Create two buckets in S3, one bucket is for your data and the other bucket is to store the results of queries.

Step 2: We need to store the JSON file in the data bucket.

  • Note: When you open Athena in the settings which is in the right corner make sure you mention the results in the exact bucket path.

Step 3: After you open the Athena query editor, first we need to connect to data sources, here our data sources are S3 and Glue Data Catalog and click next.

Step 4: Now, choose an option to set up a crawler in AWS Glue to retrieve the schema information and click on connect to AWS Glue.

Step 5: Create a crawler in the AWS Glue by providing the S3 bucket details where the file is present as well as the database where this data should be present also, choose run it on demand, and once the crawler is created click on run crawler.

Step 6: Once the crawler run is successful, the data will be shown in the Athena. Now you can perform any query, and your result will be stored on the S3 result bucket.

Limitations of Amazon Athena

  • If your source files start with an underscore or a dot then it will treat them as hidden
  • The rows and column size in Athena should not exceed 32 MB
  • Athena cannot query data in S3 Glacier and S3 Glacier Deep archive

Conclusion

Amazon Athena is a serverless service that uses SQL to query. This is easy to use and also flexible to run multiple queries at the same time. In Athena, we pay only for the queries we run. Athena uses IAM for security and also can integrate with other AWS services. Hence, Athena can run queries parallelly for large data sets by making complex queries fast.

About CloudThat

CloudThat is also the official AWS (Amazon Web Services) Advanced Consulting Partner and Training partner and Microsoft gold partner, helping people develop knowledge of the cloud and help their businesses aim for higher goals using best in industry cloud computing practices and expertise. We are on a mission to build a robust cloud computing ecosystem by disseminating knowledge on technological intricacies within the cloud space. Our blogs, webinars, case studies, and white papers enable all the stakeholders in the cloud computing sphere.

Drop a query if you have any questions regarding AWS Athena, AWS S3, or Glue Data Catalog and I will get back to you quickly.

To get started, go through our Consultancy page and Managed Services Package that is CloudThat’s offerings.

FAQs

Q1. What are the data formats supported by Athena?


A. Amazon Athena supports data formats like CSV, TSV, JSON, or text files and also supports open-source columnar formats such as Apache ORC and Apache Parquet. Athena also supports compressed data in Snappy, Zlib, LZO, and GZIP formats.


Q2. What kind of data types does Amazon Athena support?


A. Amazon Athena supports both simple data types such as INTEGER, DOUBLE, and VARCHAR and complex data types such as MAPS, ARRAY and STRUCT.

 

Q3. What are the AWS data sources Athena can connect to?


A. Athena provides built-in connectors for several data stores including Amazon Redshift, Amazon DynamoDB, Amazon DocumentDB, and BigQuery. You can use these connectors to enable SQL analytics use cases on structured, semi-structured, object, graph, time series, and other data storage types.


Leave a Reply