Course – LS – All

Get started with Spring and Spring Boot, through the Learn Spring course:

>> CHECK OUT THE COURSE

1. Introduction

Apache Kafka is an open-source, distributed event streaming platform that can publish, subscribe to, store, and process streams of records.

Kafka provides a high-throughput and low-latency platform for real-time data processing. It implements a publish-subscribe model in which producer applications publish events to Kafka, while consumer applications subscribe to these events.

In this tutorial, we’ll learn how we can read data from the beginning of a Kafka topic using the Kafka Consumer API.

2. Setup

Before we begin, let’s set up the dependencies, initialize the Kafka cluster connection, and publish a few messages to Kafka.

Kafka provides a convenient Java client library that we can use to perform various operations on the Kafka cluster.

2.1. Dependencies

Firstly, let’s add the Kafka Clients Java library’s Maven dependency to our project’s pom.xml file:

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>3.4.0</version>
</dependency>

2.2. Cluster and Topic Initialization

Throughout the guide, we’ll assume that a Kafka cluster is running on our local system with the default configurations.

Secondly, we need to create a Kafka topic that we can use to publish and consume messages. Let’s create a Kafka topic named “baeldung” by referring to our Kafka Topic Creation guide.
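Alternatively, if we prefer to create the topic from code rather than from the command line, we can use Kafka’s AdminClient API. Here’s a minimal sketch, assuming a single-broker local cluster listening on localhost:9092:

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

Properties adminProperties = new Properties();
adminProperties.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

try (AdminClient admin = AdminClient.create(adminProperties)) {
    // one partition and a replication factor of 1 are enough for a local, single-broker setup
    NewTopic topic = new NewTopic("baeldung", 1, (short) 1);
    admin.createTopics(Collections.singleton(topic)).all().get();
}
```

The createTopics() call is asynchronous, so we wait on the returned future with all().get() to make sure the topic exists before we start publishing.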

Now that we have the Kafka cluster up and running with a topic created, let’s publish some messages to Kafka.

2.3. Publishing Messages

Lastly, let’s publish a few dummy messages to the Kafka topic “baeldung“.

To publish messages, let’s create an instance of KafkaProducer with a basic configuration defined by a Properties instance:

Properties producerProperties = new Properties();
producerProperties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
producerProperties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
producerProperties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

KafkaProducer<String, String> producer = new KafkaProducer<>(producerProperties);

We use the KafkaProducer.send(ProducerRecord) method to publish messages to the Kafka topic “baeldung“:

for (int i = 1; i <= 10; i++) {
    ProducerRecord<String, String> record = new ProducerRecord<>("baeldung", String.valueOf(i));
    producer.send(record);
}
producer.flush();

Here, we published ten messages to our Kafka cluster. Since send() is asynchronous, we also call flush() to make sure the messages reach the broker before we start consuming. We’ll use these messages to demonstrate our consumer implementations.

3. Consuming Messages From the Beginning

Until now, we have initialized our Kafka cluster and published a few sample messages to the Kafka topic. Next, let’s see how we can read messages from the beginning.

To demonstrate this, we first initialize an instance of KafkaConsumer with a specific set of consumer properties defined by a Properties instance. Then, we use the created KafkaConsumer instance to consume messages and seek back again to the start of the partition offsets.

Let’s take a look at each of these steps in detail.

3.1. Consumer Properties

To consume messages from the beginning of a Kafka topic, we create an instance of KafkaConsumer with a randomly generated consumer group id. We do so by setting the “group.id” property of the consumer to a randomly generated UUID:

Properties consumerProperties = new Properties();
consumerProperties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
consumerProperties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
consumerProperties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
consumerProperties.put(ConsumerConfig.GROUP_ID_CONFIG, UUID.randomUUID().toString());

When we generate a new consumer group id for the consumer, the consumer will always belong to a new consumer group identified by the “group.id” property. A new consumer group won’t have any offset associated with it. In such cases, Kafka provides a property “auto.offset.reset” that indicates what should be done when there’s no initial offset in Kafka or if the current offset doesn’t exist anymore on the server.

The “auto.offset.reset” property accepts the following values:

  • earliest: This value automatically resets the offset to the earliest offset
  • latest: This value automatically resets the offset to the latest offset
  • none: This value throws an exception to the consumer if no previous offset is found for the consumer’s group
  • anything else: Any other value causes an exception to be thrown to the consumer

Since we want to read from the beginning of the Kafka topic, we set the value of the “auto.offset.reset” property to “earliest”:

consumerProperties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

Let’s now create an instance of KafkaConsumer using the consumer properties:

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProperties);

We use this KafkaConsumer instance to consume messages from the beginning of the topic.

3.2. Consuming Messages

To consume messages, we first subscribe our consumer to the topic “baeldung”:

consumer.subscribe(Arrays.asList("baeldung"));

Next, we use the KafkaConsumer.poll(Duration duration) method to fetch messages from the topic “baeldung”, waiting up to the time specified by the Duration parameter:

ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(10));

for (ConsumerRecord<String, String> record : records) {
    logger.info(record.value());
}

With this, we have read all the messages from the beginning of the “baeldung” topic.

Additionally, to reset an existing consumer to read from the beginning of the topic, we use the KafkaConsumer.seekToBeginning(Collection&lt;TopicPartition&gt; partitions) method. This method accepts a collection of TopicPartition instances and points the consumer’s offset to the beginning of each of those partitions:

consumer.seekToBeginning(consumer.assignment());

Here, we pass the value of KafkaConsumer.assignment() to the seekToBeginning() method. The KafkaConsumer.assignment() method returns the set of partitions currently assigned to the consumer.
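One detail to keep in mind: when we use subscribe() rather than assigning partitions manually, assignment() remains empty until a first poll() completes the group’s partition assignment. So the seek only takes effect once the consumer has already polled at least once, as in the flow above:

```java
// the first poll() joins the consumer group and triggers partition assignment
consumer.poll(Duration.ofSeconds(10));

// assignment() is now populated, so the seek applies to the consumer's actual partitions
consumer.seekToBeginning(consumer.assignment());
```

Calling seekToBeginning() before any partitions are assigned is effectively a no-op, which is a common source of confusion.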

Finally, polling the same consumer again for messages now reads all the messages from the beginning of the partition:

ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(10));

for (ConsumerRecord<String, String> record : records) {
    logger.info(record.value());
}
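Finally, once we’re done reading, it’s good practice to close the consumer so that it leaves the consumer group cleanly and releases its network resources:

```java
// leave the consumer group and free the underlying network resources
consumer.close();
```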

4. Conclusion

In this article, we’ve learned how to read messages from the beginning of a Kafka topic using the Kafka Consumer API.

We first looked at how a new consumer can read messages from the beginning of a Kafka topic, along with its implementation. We then saw how a consumer that’s already consuming can seek its offset to read messages from the beginning.

As always, the complete code for all the examples is available over on GitHub.
