Skip to content

Clarify support for manual ack/nack with batch mode + DLQ; potential bug in acknowledge(index) path #3189

@ferblaca

Description

@ferblaca

Describe the issue
We found potential issues when using Spring Cloud Stream Kafka Binder with batch consumers + DLQ + manual acknowledgments.

According to the docs, in batch mode with DLQ enabled, all records from the previous poll are sent to DLQ. That matches current behavior.
However, in manual-ack scenarios we observed:

  1. A likely bug with partial commits using acknowledge(index) + DLQ.
  2. A general improvement opportunity: DLQ handling appears to ignore already acknowledged/committed progress and republishes records that were already processed successfully.

Repro project (public): https://github.com/ferblaca/demoSCSKafka/tree/batch-consumer-dlq

To Reproduce

  1. Clone and open the repro project:
git clone -b batch-consumer-dlq https://github.com/ferblaca/demoSCSKafka.git
cd demoSCSKafka
  1. Start Kafka/Zookeeper/Kafka UI using the provided compose file:
docker compose -f src/main/resources/compose/docker-compose.yml up -d
  1. In src/main/resources/application.yml, set one function definition at a time:

    • batchConsumerAckManual (scenario A)
    • batchConsumerNackManual (scenario B)
  2. Run the application.

  3. Produce 100 records to input-topic-batch (the project includes the producer binding foo-out-0 to that topic, per config).

  4. In the batch consumer logic, force an exception at index 20.

  5. Inspect topic/DLQ behavior in Kafka UI:

    • http://localhost:9090

Scenario A: acknowledge(index) (partial commit) + DLQ

Observed:

  • After calling acknowledgment.acknowledge(19), the internal partial state is set (partial=19).
  • During DLQ handling for the failed batch, acknowledgments continue over the full polled batch.
  • The failure starts when processing approximately record 80/100, because the effective index reaches 100 (80 + partial(19) + 1 = 100), which exceeds the batch bounds.
  • Then it throws:
    IllegalArgumentException: index (100) is out of range (0-99)
  • The batch is retried multiple times.
  • Final DLQ count becomes much larger than input (observed: 810 (10 retries x 81 messages) DLQ records for 100 input records).

Consumer behavior:

  • Partial ack every 20 records: acknowledgment.acknowledge(i)
  • Forced exception at i == 20

Observed:

  • After partial ack (acknowledge(19)), internal partial state is set.
  • During DLQ flow for the failed batch, an index overflow occurs:
    IllegalArgumentException: index (100) is out of range (0-99)
  • Batch is retried multiple times.
  • Final DLQ count is much larger than input (observed: 810 DLQ records for 100 input records).

This looks like a bug in the interaction between partial batch ack state and DLQ processing:

Caused by: java.lang.IllegalArgumentException: index (100) is out of range (0-99)

Scenario B: nack(index, Duration) + DLQ

Same batch/DLQ/manual setup, but using:

  • acknowledgment.nack(i, Duration.ofMillis(100)) on failure at index 20

Observed:

  • No index out-of-range exception in this path.
  • But DLQ still receives records in repeated waves as re-seek/retry progresses.
  • Final DLQ count is also amplified (observed: 280 DLQ records for 100 input records).

Expected behavior

  1. Clarify/document whether manual ack (acknowledge/nack) is officially supported with DLQ in batch mode.
  2. If supported, fix the apparent bug for acknowledge(index) + DLQ (index out-of-range).
  3. Improve DLQ error handling so that records already acknowledged/committed are not resent to DLQ (or provide a configurable strategy), reducing duplicate DLQ traffic and unnecessary reprocessing.

Additional context

  • We understand Kafka processing is at-least-once and consumers must handle duplicates.
  • Even so, current behavior appears to produce avoidable DLQ amplification in manual-ack batch use cases.
  • We also ask for guidance on recommended patterns when using:
    • acknowledge(index)
    • nack(index, Duration)
    • BatchListenerFailedException(index)
      together with DLQ in batch mode.

Version of the framework

  • Spring Cloud: 2025.1.1
  • Spring Boot: 4.0.2

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions