During one of my professional adventures I observed a lot of errors like the infamous CommitFailedException shown below. Many of them, or similar ones, appeared in the logs generated by one of the micro-services.
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
At the same time I observed that the system was not consuming new messages from the Kafka topic… yet the logs showed it was consuming constantly. In fact, it looked like the micro-service was receiving the same messages from Kafka over and over again.
The exception message is very verbose, and it suggests tweaking Kafka client parameters such as the “session timeout”, max.poll.interval.ms and max.poll.records.
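I investigated the default values of the above-mentioned Kafka client parameters. It turned out the default value of max.poll.records is 500. As processing these messages was rather heavy, it was quite likely that all 500 messages would not be done within 5 minutes (the default value of max.poll.interval.ms); that budget leaves only 600 ms per message on average. Whenever the deadline was missed, the group rebalanced, the offset commit failed, and the same uncommitted messages were delivered again, which explains the endless redelivery. Setting the max.poll.records parameter to 5 solved the issue. When using Spring Boot 2 this can be done by setting the spring.kafka.consumer.max-poll-records application property.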
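Assuming the service relies on Spring Boot's Kafka auto-configuration and keeps its settings in an application.properties file (both are assumptions, the setup is not described here), the setting would look roughly like this. The value 5 is simply the one that worked in this case, not a general recommendation:

```properties
# Limit each poll() to at most 5 records (the Kafka client default is 500)
spring.kafka.consumer.max-poll-records=5
```

With a plain Kafka consumer, without Spring Boot, the same limit is the max.poll.records entry in the consumer configuration.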
In my opinion the default value of 500 for the max.poll.records parameter is rather crazy for general-purpose usage. This story is a good lesson for Kafka lovers who see it as a general-purpose messaging technology. The truth is that Kafka client configuration is complex, and without a good understanding of its configuration parameters one can easily shoot oneself in the foot.