Amazon Kinesis Data Streams Overview

Amazon Kinesis Data Streams

Amazon Kinesis Data Streams (KDS) is a massively scalable and durable real-time data streaming service. KDS can continuously capture gigabytes of data per second from hundreds of thousands of sources such as website clickstreams, database event streams, financial transactions, social media feeds, IT logs, and location-tracking events.

Amazon Kinesis Data Streams enables users to build custom applications that process or analyze streaming data for specialized needs. Users can continuously add various types of data such as clickstreams, application logs, and social media to an Amazon Kinesis data stream from hundreds of thousands of sources.
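
As a rough illustration of how a source might add data to a stream, the sketch below serializes one event and sends it with the PutRecord API through the AWS SDK for Python (boto3). The helper names build_record and send_event, the user_id field used as the partition key, and the choice of JSON encoding are assumptions made for this example, not part of the service. The size check reflects the 1 MB data blob limit listed under quotas, read here as 1,048,576 bytes.

```python
import json

# Assumed reading of the 1 MB per-record data blob limit, in bytes.
MAX_BLOB_BYTES = 1024 * 1024


def build_record(event: dict, key_field: str = "user_id") -> dict:
    """Serialize one event into the Data/PartitionKey arguments for PutRecord.

    key_field is a hypothetical field name chosen for this example.
    """
    data = json.dumps(event).encode("utf-8")
    if len(data) > MAX_BLOB_BYTES:
        raise ValueError(f"payload is {len(data)} bytes; limit is {MAX_BLOB_BYTES}")
    # Records that share a partition key are routed to the same shard.
    return {"Data": data, "PartitionKey": str(event[key_field])}


def send_event(stream_name: str, event: dict) -> str:
    """Send one event and return the sequence number assigned by the service."""
    import boto3  # imported lazily so build_record can be used without the SDK

    client = boto3.client("kinesis")
    response = client.put_record(StreamName=stream_name, **build_record(event))
    return response["SequenceNumber"]
```

Calling send_event("my-stream", {"user_id": 42, "page": "/home"}) would require AWS credentials and an existing stream; "my-stream" is a placeholder name.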

Amazon Kinesis Data Streams manages the infrastructure, storage, networking, and configuration needed to stream data at the level of your data throughput. Users do not have to worry about provisioning, deployment, or ongoing maintenance of hardware, software, or other services for the data streams. In addition, Amazon Kinesis Data Streams synchronously replicates data across three Availability Zones, providing high availability and data durability.

Scenarios for using Amazon Kinesis Data Streams

  • Real-time metrics and reporting: An Amazon Kinesis application can compute metrics and generate reports from system and application logs as the data streams in, rather than waiting to receive batches of data.
  • Real-time data analytics: Users can add clickstreams to an Amazon Kinesis data stream and have an Amazon Kinesis application run analytics in real time, gaining insights from the data within minutes instead of hours or days.
  • Accelerated log intake: System and application logs can be continuously added to a data stream and be available for processing within seconds.

Quotas and Limits of Amazon Kinesis Data Streams

  • By default, records in a stream are accessible for up to 24 hours from the time they are added to the stream. You can raise this limit to up to 7 days by enabling extended data retention, or to up to 365 days by enabling long-term data retention.
  • The maximum size of a data blob (the data payload before Base64-encoding) within one record is 1 MB.
  • Each shard can support up to 1,000 PUT records per second.
  • There is no upper quota on the number of streams in an account.
  • The default shard quota is 500 shards per AWS account for the following AWS regions: US East (N. Virginia), US West (Oregon), and Europe (Ireland). For all other regions, the default shard quota is 200 shards per AWS account. However, this quota can be raised by submitting a quota increase request.
  • A single shard can ingest up to 1 MB of data per second (including partition keys) or 1,000 records per second for writes. If you need more ingest capacity, you can scale up the number of shards in the stream using the AWS Management Console or the UpdateShardCount API.
  • GetRecords can retrieve up to 10 MB of data per call from a single shard, and up to 10,000 records per call. Each call to GetRecords is counted as one read transaction.
  • Each shard can support up to five read transactions per second. Each read transaction can provide up to 10,000 records with an upper quota of 10 MB per transaction.
  • Each shard can support up to a maximum total data read rate of 2 MB per second via GetRecords. If a call to GetRecords returns 10 MB, subsequent calls made within the next 5 seconds throw an exception.
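
The per-shard limits above can be turned into a back-of-the-envelope sizing calculation. The sketch below is only an estimate under stated assumptions: the function names are made up for this example, traffic is assumed to be spread evenly across shards, and reads are assumed to go through GetRecords at the shared 2 MB per second rate. It also shows where the 5-second wait comes from: a 10 MB response uses up 10 / 2 = 5 seconds of a shard's read allowance.

```python
import math

# Per-shard limits taken from the list above.
WRITE_MB_PER_SHARD = 1.0        # MB per second ingested, including partition keys
WRITE_RECORDS_PER_SHARD = 1000  # records per second written
READ_MB_PER_SHARD = 2.0         # MB per second read via GetRecords


def shards_needed(write_mb_per_s: float,
                  write_records_per_s: float,
                  read_mb_per_s: float) -> int:
    """Estimate the smallest shard count that satisfies all three limits.

    Assumes partition keys distribute traffic evenly across shards.
    """
    by_write_bytes = math.ceil(write_mb_per_s / WRITE_MB_PER_SHARD)
    by_write_records = math.ceil(write_records_per_s / WRITE_RECORDS_PER_SHARD)
    by_read_bytes = math.ceil(read_mb_per_s / READ_MB_PER_SHARD)
    # A stream always has at least one shard.
    return max(1, by_write_bytes, by_write_records, by_read_bytes)


def read_cooldown_seconds(returned_mb: float) -> float:
    """Seconds of read allowance consumed by one GetRecords response."""
    return returned_mb / READ_MB_PER_SHARD
```

For example, shards_needed(3.5, 2500, 9.0) returns 5, because the read rate (9 MB per second at 2 MB per shard) is the binding limit, and read_cooldown_seconds(10) returns 5.0.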

Note:

AWS changes limits and quotas over time. For the latest information, refer to Kinesis Data Streams Quotas and Limits.