In this fast-growing world, a huge amount of data is being produced from sources in every part of the world: logs from machines, data from traffic signals, data from IoT devices and smart devices installed in homes and IT industries, and many other sources. Once this vast amount of data is produced, another problem arises: storing, configuring, managing, and streaming it.
How do you manage data that occupies storage, consumes compute power, and feeds the analysis behind important decisions?
AWS has a solution: Amazon Kinesis Streams is the service for streaming this data.
Kinesis Streams collects data from the source and streams it to applications for further analysis. The data is replicated across Availability Zones for high availability and reliability, and the service scales with the incoming data, from megabytes to terabytes. Data can be loaded into a stream over HTTPS or with the Kinesis Producer Library, the Kinesis Client Library, or the Kinesis Agent. In Kinesis Streams, data is available for 24 hours by default, and retention can be extended up to 7 days.
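As a minimal sketch of loading data into a stream, the snippet below packages an event as the arguments for the Kinesis `PutRecord` API using boto3. The stream name `clickstream-demo` and the event fields are hypothetical examples, not values from this post.

```python
import json


def build_record(event, partition_key):
    """Package an event as keyword arguments for kinesis.put_record.

    The partition key decides which shard of the stream receives the record.
    """
    return {
        "StreamName": "clickstream-demo",  # hypothetical stream name
        "Data": json.dumps(event).encode("utf-8"),
        "PartitionKey": partition_key,
    }


# To actually send, with boto3 installed and AWS credentials configured:
#   import boto3
#   boto3.client("kinesis").put_record(**build_record({"temp_c": 21.4}, "sensor-101"))
```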
Kinesis Streams solves the problems of analysis, compute power, and decision making. But we still have the problem of storing the data, since a stream retains it for only 24 hours by default and 7 days at most.
What if we need to store the data for longer?
What if we need to access the data afterwards for another set of tasks?
This is where Kinesis Firehose comes into the picture: AWS introduced a new service called Kinesis Firehose.
It is the easiest way to stream data when compared with Kinesis Streams: it takes care of monitoring, scaling, and data management, and it provides data security. This blog will take you through Kinesis Firehose in and out.
Kinesis Firehose captures data from web apps, sensors, mobile applications, and various other sources, and streams it into Amazon S3, Amazon Redshift, and/or Amazon Elasticsearch.
It loads massive volumes of streaming data into Amazon S3 and Amazon Redshift.
It is a fully managed service that automatically scales the stream with the incoming data and needs no administration. It can also batch, compress, and encrypt data before loading it, which minimizes the storage used at the destination and increases security.
Kinesis Firehose Vs Kinesis Streams
With Kinesis Streams you manage the stream yourself and write consumer applications to read from it; Firehose is fully managed and delivers data directly to a destination. A few terms before we go further:
A delivery stream is a stream, or collection, of data records. Firehose first creates the delivery stream; data sent to it is then stored in either S3 or Redshift.
You can create a delivery stream from the Firehose console or with the CreateDeliveryStream API call.
Records are the data blobs (binary data) sent by a data producer; each record sent to a delivery stream can be at most 1,000 KB.
The destination is the data store to which the data is delivered; here, Amazon S3 and Amazon Redshift are the destinations.
Features of Kinesis Firehose
Kinesis Firehose takes care of the infrastructure, storage, networking, and configuration needed to load data into S3 and Redshift. There is no need to worry about provisioning, deploying, or maintaining hardware or software to manage the process.
Supports Multiple Destinations
You pay only for the amount of data transmitted through the service; there are no minimum fees or upfront commitments.
Kinesis Firehose buffers the incoming stream either for a certain period or up to a certain amount of buffered data; whichever condition is met first triggers delivery to the destination.
It provides a high level of data security, and Firehose can also encrypt data automatically before moving it to the destination.
Buffer size and Buffer Interval
Firehose buffers the incoming stream for a period of time before delivering it to the destination. Buffer size is measured in MBs and buffer interval in seconds.
Choose a buffer size (1 – 128 MB) and buffer interval (60 – 900 seconds) based on how quickly data should be delivered to Amazon S3.
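As a small sketch, the helper below builds the `BufferingHints` structure used in a Firehose destination configuration, enforcing the 1 – 128 MB and 60 – 900 second ranges described above. The default values here (5 MB, 300 seconds) are illustrative choices, not recommendations from AWS.

```python
def buffering_hints(size_mb=5, interval_s=300):
    """Return a Firehose BufferingHints dict, validating the documented ranges.

    Firehose flushes to the destination when EITHER limit is reached first.
    """
    if not 1 <= size_mb <= 128:
        raise ValueError("buffer size must be between 1 and 128 MB")
    if not 60 <= interval_s <= 900:
        raise ValueError("buffer interval must be between 60 and 900 seconds")
    return {"SizeInMBs": size_mb, "IntervalInSeconds": interval_s}
```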
Data compression reduces the number of bits needed to store the same amount of data. Three compression formats are supported, GZIP, ZIP, and Snappy, or you can choose no compression.
Choose whether to encrypt the data with a key from AWS Key Management Service, or to leave it unencrypted.
What does Kinesis Firehose make simpler?
Kinesis Firehose does not process or alter the raw data; you simply create a delivery stream and write data records to it.
- Client-side compression and server-side encryption are applied on request, and the data is then delivered into the specified bucket
- Control the buffer size and buffer interval for the stream
- Add a delimiter to separate record sets
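The points above can be sketched in code: the helper below packages an event as a Firehose record, appending a newline as the delimiter between record sets and checking the 1,000 KB record limit. The delivery stream name `app-logs-demo` is a hypothetical example.

```python
import json


def build_firehose_record(event):
    """Package an event as a Firehose Record (an opaque data blob).

    A trailing newline acts as the delimiter between record sets once the
    records are concatenated into objects at the destination.
    """
    data = (json.dumps(event) + "\n").encode("utf-8")
    if len(data) > 1000 * 1024:  # Firehose limit: 1,000 KB per record
        raise ValueError("record exceeds the 1,000 KB limit")
    return {"Data": data}


# To actually send, with boto3 installed and AWS credentials configured:
#   import boto3
#   boto3.client("firehose").put_record(
#       DeliveryStreamName="app-logs-demo",  # hypothetical stream name
#       Record=build_firehose_record({"level": "INFO", "msg": "started"}),
#   )
```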
Steps to Create a Delivery Stream
- Select the destination of the stream, Amazon S3 or Redshift, based on your requirements
- Provide a delivery stream name
- Select the bucket name from the list
- Create the Kinesis Firehose IAM role

Configure buffer and compression options:
- Select a buffer size in the range of 1 MB – 128 MB
- Select a buffer interval between 60 – 900 seconds
- Select the data compression format you need, or specify it as Uncompressed
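The steps above can also be sketched against the CreateDeliveryStream API. The helper below builds an S3 destination configuration with the buffering, compression, and optional KMS encryption settings discussed earlier; the ARNs and stream name in the comments are hypothetical placeholders.

```python
def s3_destination_config(role_arn, bucket_arn, kms_key_arn=None):
    """Build the S3DestinationConfiguration for a Firehose delivery stream."""
    cfg = {
        "RoleARN": role_arn,      # the Kinesis Firehose IAM role
        "BucketARN": bucket_arn,  # the destination S3 bucket
        "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 300},
        "CompressionFormat": "GZIP",  # or "ZIP", "Snappy", "UNCOMPRESSED"
    }
    if kms_key_arn:  # encrypt objects with a key from AWS KMS
        cfg["EncryptionConfiguration"] = {
            "KMSEncryptionConfig": {"AWSKMSKeyARN": kms_key_arn}
        }
    return cfg


# To actually create the stream, with boto3 and AWS credentials configured:
#   import boto3
#   boto3.client("firehose").create_delivery_stream(
#       DeliveryStreamName="app-logs-demo",  # hypothetical name
#       S3DestinationConfiguration=s3_destination_config(
#           "arn:aws:iam::123456789012:role/firehose-demo-role",
#           "arn:aws:s3:::my-demo-bucket",
#       ),
#   )
```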
In this blog, we saw what Kinesis Firehose is, why it is needed, where to use it, and how to configure it. In my next blog, we will see how to use Firehose to get analytics from application logs.