Kafka Streams
Kafka Streams Tutorial: This tutorial introduces the Streams API for Apache Kafka: how the Kafka Streams API evolved, its architecture, how it is used to build Kafka applications, and more.
Kafka Streams API is a part of the open-source Apache Kafka project.
How Streams API evolved
This section traces how the Streams API came about in Apache Kafka.
Imagine a robust, horizontally scalable, open-source messaging system that is deployed so broadly as to be ubiquitous. You do not have to build such a system yourself, because Apache Kafka is that system.
With Kafka deployed across the nodes of this horizontally scalable system, you have probably produced messages to topics and consumed messages from topics. Writing messages to Kafka Topics with Kafka Producers and reading them back with Kafka Consumers takes a lot of effort, because you are working with a fairly low-level Kafka API. Over time, recurring patterns emerged, and the Kafka Streams API is a notable one. It provides a higher level of abstraction than working with individual messages.
Compared with the low-level Kafka Consumer API, Kafka Streams provides a simpler way to consume records.
In the Kafka Streams API, data is referred to as a stream of records rather than as messages.
Record
In the Kafka Streams API, each record is a key-value pair. Under the hood, the key and the value may be byte arrays or anything else, but Kafka Streams presents them to you as a key-value pair.
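The idea of a typed key-value pair sitting on top of raw bytes can be sketched in plain Java. This is only an illustration built on the standard library: `KeyValueBytesSketch` and its methods are hypothetical names, not part of the Kafka Streams API, where serializers and deserializers (serdes) play this role. The sketch also assumes that both key and value are UTF-8 strings.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

// Hypothetical helper, not a Kafka class: a typed key-value view over raw bytes.
class KeyValueBytesSketch {

    // Turn a String into the byte array that would travel over the wire.
    static byte[] toBytes(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Rebuild the typed key-value pair that application code works with.
    static Map.Entry<String, String> toRecord(byte[] keyBytes, byte[] valueBytes) {
        String key = new String(keyBytes, StandardCharsets.UTF_8);
        String value = new String(valueBytes, StandardCharsets.UTF_8);
        return new SimpleImmutableEntry<>(key, value);
    }

    public static void main(String[] args) {
        byte[] rawKey = toBytes("user-42");
        byte[] rawValue = toBytes("clicked:home");
        Map.Entry<String, String> pair = toRecord(rawKey, rawValue);
        System.out.println(pair.getKey() + " -> " + pair.getValue());
    }
}
```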
Stream
A stream is a continuous flow of records generated in real time. A stream is unbounded: it has no definite point in the past at which it started and no definite point in the future at which it will end. You can expect that there is always another record about to arrive.
There is no need to request a record from the source of the stream. Delivery happens implicitly. From your point of view, you simply receive the records.
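As a rough analogy for "you just receive the records", the plain-Java sketch below registers a handler once and then has records pushed to it. `PushStreamSketch`, `onRecord`, and `emit` are hypothetical names invented for this illustration. They are not Kafka classes or methods, and the sketch says nothing about how Kafka Streams fetches records internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical stand-in for a stream source; not a Kafka class.
class PushStreamSketch {
    private final List<BiConsumer<String, String>> handlers = new ArrayList<>();

    // Application code registers, once, what to do with each record.
    void onRecord(BiConsumer<String, String> handler) {
        handlers.add(handler);
    }

    // The source side calls this whenever a new record arrives.
    void emit(String key, String value) {
        for (BiConsumer<String, String> handler : handlers) {
            handler.accept(key, value);
        }
    }

    public static void main(String[] args) {
        PushStreamSketch stream = new PushStreamSketch();
        stream.onRecord((key, value) -> System.out.println(key + " -> " + value));
        // Records keep arriving; the handler never asks for them.
        stream.emit("sensor-1", "21.5");
        stream.emit("sensor-2", "19.8");
    }
}
```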
Characteristics of Kafka Streams API
- Kafka Streams supports stateless and stateful processing. It also supports windowing operations.
- Kafka Streams is just a library, so it can be integrated into your application as an ordinary JAR dependency.
- Kafka Streams is masterless. There is no master node, so there is no election or re-election of a master when a node fails.
- To provide scalability, fault tolerance, and failover, Kafka Streams uses Kafka’s built-in coordination mechanism.
- It is not tied to a specific deployment architecture, so you can use any modern application deployment framework, such as Kubernetes.
- Kafka Streams is fully integrated with Kafka Security.
Kafka Streams Tutorial
Application Development Environment with Kafka Streams API
You can develop your application with the Kafka Streams API on whichever operating system you prefer: Mac, Linux, or Windows.
Integrating Kafka Streams API into your application
Kafka Streams is a Java library. You can add it to your application like any other JAR file. There is no constraint on how you run an application built with Kafka Streams: the same code runs against a single-node Kafka cluster on your development machine and against a full cluster in production. Kafka Streams is a modern stream processing library and is elastically scalable.
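One way to picture "just the same code" is that only the configuration changes between environments. The sketch below uses nothing but `java.util.Properties`. The keys `application.id` and `bootstrap.servers` are Kafka Streams configuration keys, while the class name, the application id, and the broker addresses are hypothetical values chosen for illustration.

```java
import java.util.Properties;

// Sketch: only the broker address differs between local development and production.
class StreamsConfigSketch {

    static Properties configFor(String bootstrapServers) {
        Properties props = new Properties();
        // The keys are Kafka Streams configuration keys; the values are hypothetical.
        props.put("application.id", "example-streams-app");
        props.put("bootstrap.servers", bootstrapServers);
        return props;
    }

    public static void main(String[] args) {
        Properties local = configFor("localhost:9092");
        Properties production =
                configFor("broker-1.example.com:9092,broker-2.example.com:9092");
        System.out.println(local);
        System.out.println(production);
    }
}
```

In a real application, properties like these would be handed to the Kafka Streams client together with the processing logic, so the processing code itself stays the same in both environments.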
Kafka Streams Use Cases
Some Kafka Streams use cases are:
- Stateless Record Processing – The processing of a record depends neither on past or future records nor on the time of processing.
- Stateful Record Processing – A simple example of stateful record processing is a word-count program.
- Window Processing – For building analytics or reports over a period of time, or for customers belonging to a region, you can use Kafka Streams.
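The difference between the three styles above can be sketched in plain Java. This is a conceptual illustration only and does not use the Kafka Streams API. The class and method names are hypothetical, the "stream" is a small in-memory list, timestamps are assumed to be non-negative milliseconds, and the windows are assumed to be fixed-size and non-overlapping.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of three processing styles; not Kafka Streams code.
class ProcessingStylesSketch {

    // Stateless: the result depends only on the record being processed.
    static String toUpperCase(String value) {
        return value.toUpperCase(Locale.ROOT);
    }

    // Stateful: the counts map is state carried from one record to the next.
    static Map<String, Long> wordCount(List<String> lines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1L, Long::sum);
                }
            }
        }
        return counts;
    }

    // Windowed: records are grouped into fixed, non-overlapping time windows,
    // keyed by the window start time in milliseconds (timestamps assumed >= 0).
    static Map<Long, Long> countPerWindow(long[] timestampsMs, long windowSizeMs) {
        Map<Long, Long> counts = new TreeMap<>();
        for (long ts : timestampsMs) {
            long windowStart = ts - (ts % windowSizeMs);
            counts.merge(windowStart, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(toUpperCase("hello"));
        System.out.println(wordCount(Arrays.asList("hello kafka", "hello streams")));
        System.out.println(countPerWindow(new long[] {1_000L, 4_000L, 61_000L}, 60_000L));
    }
}
```

The stateless method could process records in any order and on any machine, whereas the word count and the window count both need somewhere to keep their running totals. That requirement is what makes stateful and windowed processing the harder cases.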
Deployment of Kafka Streams Application
An application built with Kafka Streams can be deployed in the cloud, in containers such as Docker, on virtual machines, on bare-metal servers, or on on-premises computers.
Microservices with Kafka Streams
You can build microservices that use the Kafka Streams API. As with any other microservice, you can run multiple instances of it. Kafka’s coordination mechanism takes care of distributing the computation among the instances.
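To get a feel for how work could be spread over several instances, the sketch below divides topic partitions among instances in round-robin order. This is a deliberately simplified illustration: Kafka's actual assignment logic is more involved, and `PartitionAssignmentSketch`, `assign`, and the instance names are hypothetical. The sketch assumes at least one instance is running.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration only; not how Kafka actually computes assignments.
class PartitionAssignmentSketch {

    // Spread partitions 0..partitionCount-1 across instances in round-robin order.
    // Assumes the instances list is not empty.
    static Map<String, List<Integer>> assign(List<String> instances, int partitionCount) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String instance : instances) {
            assignment.put(instance, new ArrayList<>());
        }
        for (int partition = 0; partition < partitionCount; partition++) {
            String owner = instances.get(partition % instances.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Two instances share six partitions.
        System.out.println(assign(Arrays.asList("instance-a", "instance-b"), 6));
        // A third instance joins and takes over part of the work.
        System.out.println(
                assign(Arrays.asList("instance-a", "instance-b", "instance-c"), 6));
    }
}
```

Starting another instance of the microservice changes only the input to the assignment, not the application code, which is the sense in which the application scales elastically.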
Kafka Streams Example Application
A step-by-step process for building a basic application with Kafka Streams is provided in the following tutorial.
Assumptions
If you are building an application with Kafka Streams, the only assumption is that you are building a distributed system that is elastically scalable and does some stream processing.
Conclusion
In this Apache Kafka Tutorial – Kafka Streams Tutorial, we have learnt about Kafka Streams, its characteristics, the assumptions it makes, how to integrate Kafka Streams into Java applications, and its use cases.