Data Lake Solution for Live Streaming Text & Audio Data with AWS Deep-Learning/ML Services

AWS Solution Overview

The architecture diagram below illustrates the high-level infrastructure components of the customer’s production environment.

[Architecture diagram]

Data Engineering Pipeline for Text and Audio Data

This solution established a data engineering pipeline for source data that arrives as JSON text and as audio.

The pipeline performs five steps on the source data:

Source text data coming from the customer application is first stored in a central data lake on Amazon S3.
An ETL operation cleans the raw data.
A querying mechanism extracts the important fields from the cleaned data.
The audio data is transcribed into SRT format and translated into the desired language.
The formatted audio output is stored back in the central data store so that further features can be built on it.
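
The transcription step produces SRT, a plain-text subtitle format made of numbered cues, each with a time range and a caption. The sketch below shows what that formatting involves, assuming the transcript is already available as timed segments. The `Segment` shape and both function names are hypothetical and are not taken from the customer's implementation.

```python
from typing import Iterable, Tuple

# Hypothetical input shape: (start_seconds, end_seconds, caption_text)
Segment = Tuple[float, float, str]


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: Iterable[Segment]) -> str:
    """Build SRT text: a cue number, a time range line, then the caption."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        time_range = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        cues.append(f"{index}\n{time_range}\n{text.strip()}\n")
    # Cues are separated by a blank line.
    return "\n".join(cues)


if __name__ == "__main__":
    print(segments_to_srt([(0.0, 1.5, "Hello"), (1.5, 3.0, "World")]))
```

Keeping the cue timing alongside the text is what lets the translated captions stay aligned with the original audio.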

AWS Services Leveraged

Amazon API Gateway
Amazon DynamoDB
Amazon S3
AWS Lambda
Amazon Transcribe
Amazon Translate
Amazon CloudFront
AWS Glue (crawler, Data Catalog, and ETL jobs)
Amazon Athena
Amazon Kinesis Data Streams
Amazon Kinesis Data Firehose
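
As an illustration of how two of the services above might be invoked, the sketch below assembles request parameters for Amazon Transcribe (`StartTranscriptionJob`, with SRT subtitles requested) and Amazon Translate (`TranslateText`). The source does not describe how the customer's jobs were configured, so the helper names, job name, bucket names, S3 URI, and language codes are all assumptions made for this example.

```python
from typing import Any, Dict


def build_transcription_request(
    job_name: str, media_uri: str, language_code: str, output_bucket: str
) -> Dict[str, Any]:
    """Assemble keyword arguments for Transcribe's StartTranscriptionJob."""
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "LanguageCode": language_code,
        "OutputBucketName": output_bucket,
        # Ask Transcribe to emit an .srt subtitle file next to the transcript.
        "Subtitles": {"Formats": ["srt"]},
    }


def build_translation_request(
    text: str, source_lang: str, target_lang: str
) -> Dict[str, str]:
    """Assemble keyword arguments for Translate's TranslateText."""
    return {
        "Text": text,
        "SourceLanguageCode": source_lang,
        "TargetLanguageCode": target_lang,
    }


# Hypothetical values, for illustration only.
transcribe_request = build_transcription_request(
    job_name="example-audio-job",
    media_uri="s3://example-input-bucket/audio/sample.mp3",
    language_code="en-US",
    output_bucket="example-output-bucket",
)
translate_request = build_translation_request("Hello", "en", "hi")

# With boto3 installed and AWS credentials configured, the calls would be:
#   boto3.client("transcribe").start_transcription_job(**transcribe_request)
#   boto3.client("translate").translate_text(**translate_request)
```

Building the parameters separately from the API call keeps the request shape easy to inspect and test without touching an AWS account.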

Solution Outcome

The data-driven architecture supported new feature releases for the customer's application, leading to more customer registrations and a better customer experience.
The Amazon Transcribe and Amazon Translate integration helped attract customers from different language backgrounds.

WRITTEN BY Bhavesh Goswami

Bhavesh Goswami is the Founder & CEO of CloudThat Technologies. He is a leading expert in the Cloud Computing space with over a decade of experience. He was on the initial development team of Amazon Simple Storage Service (S3) at Amazon Web Services (AWS) in Seattle and has been working in the Cloud Computing and Big Data fields for over 12 years. He is a public speaker and has been the Keynote Speaker at the ‘International Conference on Computer Communication and Informatics’. He has also authored numerous research papers and patents in various fields.