@confluentinc/kafka-javascript
Node.js bindings for librdkafka
# librdkafka v2.11.0
librdkafka v2.11.0 is a feature release:
* [KIP-1102](https://cwiki.apache.org/confluence/display/KAFKA/KIP-1102%3A+Enable+clients+to+rebootstrap+based+on+timeout+or+error+code) Enable clients to rebootstrap based on timeout or error code (#4981).
* [KIP-1139](https://cwiki.apache.org/confluence/display/KAFKA/KIP-1139%3A+Add+support+for+OAuth+jwt-bearer+grant+type) Add support for OAuth jwt-bearer grant type (#4978).
* Fix for poll ratio calculation in case the queues are forwarded (#5017).
* Fix for a data race when emptying buffer queues, solved by resetting their
atomic counters instead of reinitializing them (#4718).
* Features BROKER_BALANCED_CONSUMER and SASL_GSSAPI no longer depend on
JoinGroup v0, which is missing in AK 4.0 and CP 8.0 (#5131).
* Improve HTTPS CA certificates configuration by probing several paths
when OpenSSL is statically linked and providing a way to customize their location
or value (#).
## Fixes
### General fixes
* Issues: #4522.
A data race occurred when the buffers of a failing broker were emptied in the
broker thread while the statistics callback in the main thread was gathering
the buffer counts.
Solved by resetting the atomic counters instead of reinitializing them.
Happening since 1.x (#4718).
* Issues: #4948
Features BROKER_BALANCED_CONSUMER and SASL_GSSAPI no longer depend on
JoinGroup v0, which is missing in AK 4.0 and CP 8.0. This PR partially
fixes the linked issue; a complete fix for all features will follow.
The remaining fixes are only necessary for a subsequent Apache Kafka major
version (e.g. AK 5.x).
Happening since 1.x (#5131).
### Telemetry fixes
* Issues: #5109
Fix for the poll ratio calculation in case the queues are forwarded.
The poll ratio is now calculated per queue instead of per instance,
avoiding calculation problems caused by sharing the same field.
Happens since 2.6.0 (#5017).
# librdkafka v2.10.1
librdkafka v2.10.1 is a maintenance release:
* Fix to add locks when updating the metadata cache for the consumer
after no broker connection is available (@marcin-krystianc, #5066).
* Fix to the re-bootstrap case when `bootstrap.servers` is `NULL` and
brokers were added manually through `rd_kafka_brokers_add` (#5067).
* Fix an issue where the first message to any topic produced via `producev` or
`produceva` was delivered late (by up to 1 second) (#5032).
* Fix for a loop of re-bootstrap sequences in case the client reaches the
`all brokers down` state (#5086).
* Fix for frequent disconnections on push telemetry requests
with particular metric configurations (#4912).
* Avoid an out-of-bounds copy when reading metric names in a telemetry
subscription (#5105).
* Metrics are no longer duplicated when multiple prefixes match them (#5104).
## Fixes
### General fixes
* Issues: #5088.
Fix for a loop of re-bootstrap sequences in case the client reaches the
`all brokers down` state. The client kept selecting the
bootstrap brokers, given they had no connection attempt yet, and didn't
reconnect to the learned ones. When this happens, a broker restart
can break the loop for clients using the affected version.
Fixed by giving a higher chance to connect to the learned brokers
even if there are new brokers that never attempted a connection.
Happens since 2.10.0 (#5086).
* Issues: #5057.
Fix to the re-bootstrap case when `bootstrap.servers` is `NULL` and
brokers were added manually through `rd_kafka_brokers_add`.
Avoids a segmentation fault in this case.
Happens since 2.10.0 (#5067).
### Producer fixes
* In case of `producev` or `produceva`, the producer did not enqueue a leader
query metadata request immediately, instead waiting for the 1 second
timer to kick in. This could delay the sending of the first message
by up to 1 second.
Happens since 1.x (#5032).
### Consumer fixes
* Issues: #5051.
Fix to add locks when updating the metadata cache for the consumer.
Missing locks could cause memory corruption or a use-after-free when
there's no broker connection and the consumer
group metadata needs to be updated.
Happens since 2.10.0 (#5066).
### Telemetry fixes
* Issues: #5106.
Fix for frequent disconnections on push telemetry requests
with particular metric configurations.
A `NULL` payload was sent in a push telemetry request when
an empty one was needed. This caused a disconnection every time the
push was sent, but only when metrics were requested and
some matched the producer while none matched the consumer,
or the other way around.
Happens since 2.5.0 (#4912).
* Issues: #5102.
Avoid an out-of-bounds copy when reading metric names in a telemetry
subscription. It could cause some metrics not to be matched.
Happens since 2.5.0 (#5105).
* Issues: #5103.
Telemetry metrics aren't duplicated when multiple prefixes match them.
Fixed by keeping track of the metrics that already matched.
Happens since 2.5.0 (#5104).
# librdkafka v2.10.0
librdkafka v2.10.0 is a feature release:
> [!WARNING]
> It's suggested to upgrade to 2.10.1 or later
> because of the possibly critical bug #5088.
## [KIP-848](https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol) – Now in **Preview**
- [KIP-848](https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol) has transitioned from *Early Access* to *Preview*.
- Added support for **regex-based subscriptions**.
- Implemented client-side member ID generation as per [KIP-1082](https://cwiki.apache.org/confluence/display/KAFKA/KIP-1082%3A+Require+Client-Generated+IDs+over+the+ConsumerGroupHeartbeat+RPC).
- `rd_kafka_DescribeConsumerGroups()` now supports KIP-848-style `consumer` groups. Two new fields have been added:
- **Group type** – Indicates whether the group is `classic` or `consumer`.
- **Target assignment** – Applicable only to `consumer` protocol groups (defaults to `NULL`).
- Group configuration is now supported in `AlterConfigs`, `IncrementalAlterConfigs`, and `DescribeConfigs`. ([#4939](https://github.com/confluentinc/librdkafka/pull/4939))
- Added **Topic Authorization Error** support in the `ConsumerGroupHeartbeat` response.
- Removed usage of the `partition.assignment.strategy` property for the `consumer` group protocol. An error will be raised if this is set with `group.protocol=consumer`.
- Deprecated and disallowed the following properties for the `consumer` group protocol:
- `session.timeout.ms`
- `heartbeat.interval.ms`
- `group.protocol.type`
Attempting to set any of these will result in an error.
- Enhanced handling for `subscribe()` and `unsubscribe()` edge cases.
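As an illustration, a minimal configuration fragment for opting into the new protocol might look like the following (these are librdkafka configuration property names; the group id and broker address are placeholders):

```ini
# Opt in to the KIP-848 consumer group protocol (Preview).
group.protocol=consumer
# Placeholder values:
group.id=my-group
bootstrap.servers=localhost:9092
# Do not set partition.assignment.strategy, session.timeout.ms,
# heartbeat.interval.ms or group.protocol.type together with
# group.protocol=consumer: doing so results in an error.
```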
> [!Note]
> The [KIP-848](https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol) consumer is currently in **Preview** and should not be used in production environments. The implementation is feature complete, but the contract could have minor changes before General Availability.
## Upgrade considerations
Starting from this version, brokers not reported in a Metadata RPC response are
removed along with their threads. Brokers and their threads are added back
when they appear in a Metadata RPC response again. When no brokers are left
or they're not reachable, the client starts a re-bootstrap sequence
by default. This is controlled by `metadata.recovery.strategy`,
which defaults to `rebootstrap`.
Setting `metadata.recovery.strategy` to `none` avoids any re-bootstrapping and
leaves only the brokers received in the last successful metadata response.
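As a hedged illustration, the relevant configuration fragment (showing the default and the opt-out value described above):

```ini
# Default: start a re-bootstrap sequence when no brokers are left
# or none are reachable.
metadata.recovery.strategy=rebootstrap
# Opt out: never re-bootstrap; keep only the brokers received in the
# last successful metadata response.
#metadata.recovery.strategy=none
```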
## Enhancements and Fixes
* [KIP-899](https://cwiki.apache.org/confluence/display/KAFKA/KIP-899%3A+Allow+producer+and+consumer+clients+to+rebootstrap) Allow producer and consumer clients to rebootstrap
* Identify brokers only by broker id (#4557, @mfleming)
* Remove unavailable brokers and their thread (#4557, @mfleming)
* Commits during a cooperative incremental rebalance no longer cause a
lost assignment if the generation id was bumped in between (#4908).
* Fix for librdkafka yielding before timeouts had been reached (#4970)
* Removed a 500ms latency when a consumer partition switches to a different
leader (#4970)
* The mock cluster implementation removes brokers from the Metadata response
when they're not available, better simulating the actual behavior of
a cluster that is using KRaft (#4970).
* Topics are no longer removed from the cache on temporary Metadata errors,
only on metadata cache expiry (#4970).
* The topic is no longer marked as unknown if it had been marked as existent
earlier and `topic.metadata.propagation.max.ms` hasn't passed yet (@marcin-krystianc, #4970).
* Partition leaders aren't updated if the topic in the metadata
response has errors (#4970).
* Only topic authorization errors in a metadata response are considered
permanent and are returned to the user (#4970).
* The function `rd_kafka_offsets_for_times` refreshes leader information
if the error requires it, allowing it to succeed on
subsequent manual retries (#4970).
* Deprecated `api.version.request`, `api.version.fallback.ms` and
`broker.version.fallback` configuration properties (#4970).
* When consumer is closed before destroying the client, the operations queue
isn't purged anymore as it contains operations
unrelated to the consumer group (#4970).
* When making multiple changes to the consumer subscription in a short time,
no unknown topic error is returned for topics that are in the new subscription but weren't in previous one (#4970).
* Prevent metadata cache corruption when topic id changes
(@kwdubuc, @marcin-krystianc, @GerKr, #4970).
* Fix for the case where a metadata refresh enqueued on an unreachable broker
prevents refreshing the controller or the coordinator until that broker
becomes reachable again (#4970).
* Remove a one second wait after a partition fetch is restarted following a
leader change and offset validation (#4970).
* The Nagle algorithm is now disabled by default (`TCP_NODELAY` is set on
broker sockets) (#4986).
## Fixes
### General fixes
* Issues: #4212
Identify brokers only by broker id, as the Java client does, instead of
finding the broker with the same hostname and reusing the same thread
and connection.
Happens since 1.x (#4557, @mfleming).
* Issues: #4557
Remove brokers not reported in a metadata call, along with their threads.
Avoids selecting unavailable brokers for a new connection when none
are available. We cannot tell if a broker was removed
temporarily or permanently, so we always remove it; it'll be added back when
it becomes available again.
Happens since 1.x (#4557, @mfleming).
* Issues: #4970
librdkafka code using `cnd_timedwait` was yielding before a timeout occurred
without the condition being fulfilled because of spurious wake-ups.
Solved by verifying with a monotonic clock that the expected point in time
was reached and calling the function again if needed.
Happens since 1.x (#4970).
* Issues: #4970
Topics are no longer removed from the cache on temporary Metadata errors,
only on metadata cache expiry. This allows the client to continue working
in case of temporary problems in the Kafka metadata plane.
Happens since 1.x (#4970).
* Issues: #4970
The topic is no longer marked as unknown if it had been marked as existent
earlier and `topic.metadata.propagation.max.ms` hasn't passed yet. This
achieves the property's expected effect even if a different broker had
previously reported the topic as existent.
Happens since 1.x (@marcin-krystianc, #4970).
* Issues: #4907
Partition leaders aren't updated if the topic in the metadata
response has errors. This is in line with the Java client's behavior and
avoids segmentation faults for unknown partitions.
Happens since 1.x (#4970).
* Issues: #4970
Only topic authorization errors in a metadata response are considered
permanent and are returned to the user. This is in line with the Java
client's behavior and avoids returning to the user an error that wasn't
meant to be permanent.
Happens since 1.x (#4970).
* Issues: #4964, #4778
Prevent metadata cache corruption when topic id for the same topic name
changes. Solved by correctly removing the entry with the old topic id from metadata cache
to prevent subsequent use-after-free.
Happens since 2.4.0 (@kwdubuc, @marcin-krystianc, @GerKr, #4970).
* Issues: #4970
Fix for the case where a metadata refresh enqueued on an unreachable broker
prevents refreshing the controller or the coordinator until that broker
becomes reachable again. Given the request continues to be retried on that
broker, the counter for refreshing complete broker metadata doesn't reach
zero and prevents the client from obtaining the new controller or group or transactional coordinator.
It causes a series of debug messages like:
"Skipping metadata request: ... full request already in-transit", until
the broker the request is enqueued on is up again.
Solved by not retrying these kinds of metadata requests.
Happens since 1.x (#4970).
* The Nagle algorithm is now disabled by default (`TCP_NODELAY` is set on
broker sockets). It caused a large increase in latency for some use cases,
for example, when using an SSL connection.
For efficient batching, the application should use `linger.ms`,
`batch.size` etc.
Happens since: 0.x (#4986).
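The `cnd_timedwait` fix described above boils down to a standard pattern: compute a monotonic-clock deadline, and on every wake-up re-check both the condition and the remaining time. A minimal sketch of that pattern in Python (illustrative only; `wait_until` is a hypothetical helper, not part of librdkafka):

```python
import threading
import time

def wait_until(cond: threading.Condition, predicate, timeout_s: float) -> bool:
    """Wait on `cond` until `predicate()` holds or `timeout_s` elapses.

    Spurious wake-ups are handled by re-checking a monotonic-clock
    deadline and waiting again for the remaining time.
    """
    deadline = time.monotonic() + timeout_s
    with cond:
        while not predicate():
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return predicate()  # timed out: report final state
            cond.wait(remaining)    # may wake spuriously; loop re-checks
        return True
```

Python's own `Condition.wait_for` implements the same re-check loop internally; the sketch only makes the monotonic deadline explicit.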
### Consumer fixes
* Issues: #4059
Commits during a cooperative incremental rebalance could cause a
lost assignment if the generation id was bumped by a second join
group request.
Solved by not rejoining the group in case an illegal generation error happens
during a rebalance.
Happening since v1.6.0 (#4908)
* Issues: #4970
When switching to a different leader a consumer could wait 500ms
(`fetch.error.backoff.ms`) before starting to fetch again. The fetch backoff wasn't reset when switching to the new broker.
Solved by resetting it, since there's no need to back off
the first fetch on a different node. This enables faster leader
switches.
Happens since 1.x (#4970).
* Issues: #4970
The function `rd_kafka_offsets_for_times` refreshes leader information
if the error requires it, allowing it to succeed on
subsequent manual retries. Similar to the fix done in 2.3.0 in
`rd_kafka_query_watermark_offsets`. Additionally, the partition
current leader epoch is taken from metadata cache instead of
from passed partitions.
Happens since 1.x (#4970).
* Issues: #4970
When consumer is closed before destroying the client, the operations queue
isn't purged anymore as it contains operations
unrelated to the consumer group.
Happens since 1.x (#4970).
* Issues: #4970
When making multiple changes to the consumer subscription in a short time,
no unknown topic error is returned for topics that are in the new subscription
but weren't in the previous one. This was caused by the metadata request
corresponding to the previous subscription.
Happens since 1.x (#4970).
* Issues: #4970
Remove a one second wait after a partition fetch is restarted following a
leader change and offset validation. This is done by resetting the fetch
error backoff and waking up the delegated broker if present.
Happens since 2.1.0 (#4970).
*Note: there was no v2.9.0 librdkafka release,
it was a dependent clients release only*
# librdkafka v2.8.0
librdkafka v2.8.0 is a maintenance release:
* Socket options are now all set before connection (#4893).
* Client certificate chain is now sent when using `ssl.certificate.pem`
or `ssl_certificate` or `ssl.keystore.location` (#4894).
* Avoid sending client certificates whose chain doesn't match with broker
trusted root certificates (#4900).
* Fixes to allow migrating partitions to leaders with the same leader epoch,
or a NULL leader epoch (#4901).
* Support versions of OpenSSL without the ENGINE component (Chris Novakovic, #3535
and @remicollet, #4911).
## Fixes
### General fixes
* Socket options are now all set before connection, as the [documentation](https://man7.org/linux/man-pages/man7/tcp.7.html)
says this is needed for socket buffers to take effect, even if in some
cases they could take effect after connection too.
Happening since v0.9.0 (#4893).
* Issues: #3225.
Client certificate chain is now sent when using `ssl.certificate.pem`
or `ssl_certificate` or `ssl.keystore.location`.
Without it, the broker must explicitly add any intermediate certificate
authority certificate to its truststore to be able to accept the client
certificate.
Happens since: 1.x (#4894).
### Consumer fixes
* Issues: #4796.
Fix to allow migrating partitions to leaders with a NULL leader epoch.
A NULL leader epoch can happen during a cluster roll with an upgrade to a
version supporting KIP-320.
Happening since v2.1.0 (#4901).
* Issues: #4804.
Fix to allow migrating partitions to leaders with the same leader epoch.
The same leader epoch can happen when a partition is
temporarily migrated to the internal broker (#4804), or if the broker
implementation never bumps it, as it's not needed to validate the offsets.
Happening since v2.4.0 (#4901).
*Note: there was no v2.7.0 librdkafka release*
# librdkafka v2.6.1
librdkafka v2.6.1 is a maintenance release:
* Fix for a Fetch regression when connecting to Apache Kafka < 2.7 (#4871).
* Fix for an infinite loop happening with cooperative-sticky assignor
under some particular conditions (#4800).
* Fix for retrieving offset commit metadata when it contains
zeros and librdkafka is configured with `strndup` (#4876).
* Fix for a loop of ListOffset requests, happening in a Fetch From Follower
scenario, if such request is made to the follower (#4616, #4754, @kphelps).
* Fix to remove fetch queue messages that blocked the destroy of rdkafka
instances (#4724)
* Upgrade Linux dependencies: OpenSSL 3.0.15, CURL 8.10.1 (#4875).
* Upgrade Windows dependencies: MSVC runtime to 14.40.338160.0,
zstd 1.5.6, zlib 1.3.1, OpenSSL 3.3.2, CURL 8.10.1 (#4872).
* SASL/SCRAM authentication fix: avoid concatenating the client-side nonce
once more, as it's already prepended to the server-sent nonce (#4895).
* Allow retrying for status code 429 ('Too Many Requests') in HTTP requests for
OAUTHBEARER OIDC (#4902).
## Fixes
### General fixes
* SASL/SCRAM authentication fix: avoid concatenating the
client-side nonce once more, as it's already prepended to the
server-sent nonce.
librdkafka was incorrectly concatenating the client-side nonce again, which led to [this fix](https://github.com/apache/kafka/commit/0a004562b8475d48a9961d6dab3a6aa24021c47f) on the AK side, released with 3.8.1, using `endsWith` instead of `equals`.
Happening since v0.0.99 (#4895).
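The nonce rule behind this fix can be sketched as follows (illustrative Python assuming RFC 5802 semantics; the function names are hypothetical and this is not the librdkafka code):

```python
import secrets

def server_first_nonce(client_nonce: str) -> str:
    # Per RFC 5802, the server-sent nonce is the client nonce with a
    # server-generated suffix appended.
    return client_nonce + secrets.token_urlsafe(16)

def client_final_nonce(client_nonce: str, server_sent_nonce: str) -> str:
    # Correct behavior: echo the server-sent nonce verbatim, since it
    # already starts with the client nonce.
    if not server_sent_nonce.startswith(client_nonce):
        raise ValueError("server nonce must start with client nonce")
    return server_sent_nonce
    # The pre-fix bug was equivalent to returning
    # client_nonce + server_sent_nonce, repeating the client nonce.
```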
### Consumer fixes
* Issues: #4870
Fix for a Fetch regression when connecting to Apache Kafka < 2.7, causing
fetches to fail.
Happening since v2.6.0 (#4871)
* Issues: #4783.
A consumer configured with the `cooperative-sticky` partition assignment
strategy could get stuck in an infinite loop, with corresponding spike of
main thread CPU usage.
That happened with some particular orders of members and potential
assignable partitions.
Solved by removing the infinite loop cause.
Happening since: 1.6.0 (#4800).
* Issues: #4649.
When retrieving offset metadata, if the binary value contained zeros
and librdkafka was configured with `strndup`, the part of
the buffer after the first zero contained uninitialized data
instead of the rest of the metadata. Solved by avoiding the use of
`strndup` for copying metadata.
Happening since: 0.9.0 (#4876).
* Issues: #4616
When an out-of-range error on a follower caused an offset reset, the
corresponding ListOffsets request was made to the follower, causing a
repeated "Not leader for partition" error. Fixed by always sending the
request to the leader.
Happening since 1.5.0 (earliest tested version) or earlier (#4616, #4754, @kphelps).
* Issues:
Fix to remove fetch queue messages that blocked the destroy of rdkafka
instances. A circular dependency from a partition fetch queue message to
the same partition blocked the destroy of an instance; this happened
when the partition was removed from the cluster while it was being
consumed. Solved by purging the internal partition queue, after the
partition is stopped and removed, to allow the reference count to reach
zero and trigger a destroy.
Happening since 2.0.2 (#4724).
# librdkafka v2.6.0
librdkafka v2.6.0 is a feature release:
* [KIP-460](https://cwiki.apache.org/confluence/display/KAFKA/KIP-460%3A+Admin+Leader+Election+RPC) Admin Leader Election RPC (#4845)
* [KIP-714] Complete consumer metrics support (#4808).
* [KIP-714] Produce latency average and maximum metrics support for parity with Java client (#4847).
* [KIP-848] ListConsumerGroups Admin API now has an optional filter to return only groups
of given types.
* Added Transactional id resource type for ACL operations (@JohnPreston, #4856).
* Fix for permanent fetch errors when using a newer Fetch RPC version with an older
inter broker protocol (#4806).
## Fixes
### Consumer fixes
* Issues: #4806
Fix for permanent fetch errors when brokers support a Fetch RPC version greater than 12
but the cluster is configured to use an inter-broker protocol lower than 2.8.
In this case the returned topic ids are zero-valued and Fetch has to fall back
to version 12, using topic names.
Happening since v2.5.0 (#4806)
# librdkafka v2.5.3
librdkafka v2.5.3 is a feature release.
* Fix an assert being triggered during push telemetry call when no metrics matched on the client side. (#4826)
## Fixes
### Telemetry fixes
* Issue: #4833
Fix a regression introduced with [KIP-714](https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability) support in which an assert is triggered during **PushTelemetry** call. This happens when no metric is matched on the client side among those requested by broker subscription.
Happening since 2.5.0 (#4826).
*Note: there were no v2.5.1 and v2.5.2 librdkafka releases*
# librdkafka v2.5.0
> [!WARNING]
> This version has introduced a regression in which an assert is triggered during **PushTelemetry** call. This happens when no metric is matched on the client side among those requested by broker subscription.
>
> You won't face any problem if:
> * Broker doesn't support [KIP-714](https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability).
> * [KIP-714](https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability) feature is disabled on the broker side.
> * [KIP-714](https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability) feature is disabled on the client side (it's enabled by default; set configuration `enable.metrics.push` to `false` to disable it).
> * If [KIP-714](https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability) is enabled on the broker side and there is no subscription configured there.
> * If [KIP-714](https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability) is enabled on the broker side with subscriptions that match the [KIP-714](https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability) metrics defined on the client.
>
> Having said this, we strongly recommend using `v2.5.3` and above to not face this regression at all.
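For the client-side opt-out mentioned above, disabling the feature is a one-line change (shown here as an illustrative configuration fragment):

```ini
# Disable KIP-714 client metrics push (enabled by default).
enable.metrics.push=false
```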
librdkafka v2.5.0 is a feature release.
* [KIP-951](https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client)
Leader discovery optimisations for the client (#4756, #4767).
* Fix for a segfault when using a long client id, caused by an erased buffer segment when using flexver (#4689).
* Fix for an idempotent producer error, with a message batch not reconstructed
identically when retried (#4750)
* Removed support for CentOS 6 and CentOS 7 (#4775).
* [KIP-714](https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability) Client
metrics and observability (#4721).
## Upgrade considerations
* CentOS 6 and CentOS 7 support was removed as they reached EOL
and security patches aren't publicly available anymore.
ABI compatibility from CentOS 8 on is maintained through pypa/manylinux,
AlmaLinux based.
See also [Confluent supported OSs page](https://docs.confluent.io/platform/current/installation/versions-interoperability.html#operating-systems) (#4775).
## Enhancements
* Update bundled lz4 (used when `./configure --disable-lz4-ext`) to
[v1.9.4](https://github.com/lz4/lz4/releases/tag/v1.9.4), which contains
bugfixes and performance improvements (#4726).
* [KIP-951](https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client)
With this KIP, leader updates are received through Produce and Fetch responses
in case of errors corresponding to leader changes, and the partition migration
happens before refreshing the metadata cache (#4756, #4767).
## Fixes
### General fixes
* Issues: [confluentinc/confluent-kafka-dotnet#2084](https://github.com/confluentinc/confluent-kafka-dotnet/issues/2084)
Fix segfault when a segment is erased and more data is written to the buffer.
Happens since 1.x when a portion of the buffer (segment) is erased for flexver or compression.
More likely to happen since 2.1.0, because of the upgrades to flexver, with certain string sizes like a long client id (#4689).
### Idempotent producer fixes
* Issues: #4736
Fix for an idempotent producer error, with a message batch not reconstructed
identically when retried. It caused the error message "Local: Inconsistent state: Unable to reconstruct MessageSet"
and happened with large batches. Solved by using the same backoff baseline for all messages
in the batch.
Happens since 2.2.0 (#4750).
# librdkafka v2.4.0
librdkafka v2.4.0 is a feature release:
* [KIP-848](https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol): The Next Generation of the Consumer Rebalance Protocol.
**Early Access**: This should be used only for evaluation and must not be used in production. Features and contract of this KIP might change in the future (#4610).
* [KIP-467](https://cwiki.apache.org/confluence/display/KAFKA/KIP-467%3A+Augment+ProduceResponse+error+messaging+for+specific+culprit+records): Augment ProduceResponse error messaging for specific culprit records (#4583).
* [KIP-516](https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers)
Continue partial implementation by adding a metadata cache by topic id
and updating the topic id corresponding to the partition name (#4676)
* Upgrade OpenSSL to v3.0.12 (while building from source) with various security fixes,
check the [release notes](https://www.openssl.org/news/cl30.txt).
* Integration tests can be started in KRaft mode and run against any
GitHub Kafka branch other than the released versions.
* Fix pipeline inclusion of static binaries (#4666)
* Fix to main loop timeout calculation leading to a tight loop for a
max period of 1 ms (#4671).
* Fixed a bug causing duplicate message consumption from a stale
fetch start offset in some particular cases (#4636)
* Fix to metadata cache expiration on full metadata refresh (#4677).
* Fix for a wrong error returned on full metadata refresh before joining
a consumer group (#4678).
* Fix to metadata refresh interruption (#4679).
* Fix for an undesired partition migration with stale leader epoch (#4680).
* Fix hang in cooperative consumer mode if an assignment is processed
while closing the consumer (#4528).
* Upgrade OpenSSL to v3.0.13 (while building from source) with various security fixes,
check the [release notes](https://www.openssl.org/news/cl30.txt)
(@janjwerner-confluent, #4690).
* Upgrade zstd to v1.5.6, zlib to v1.3.1, and curl to v8.8.0 (@janjwerner-confluent, #4690).
## Upgrade considerations
* With KIP-467, INVALID_MSG (Java: CorruptRecordException) will
be retried automatically. INVALID_RECORD (Java: InvalidRecordException) instead
is not retriable and will be set only on the records that caused the
error. The rest of the records in the batch will fail with the new error code
_INVALID_DIFFERENT_RECORD (Java: KafkaException) and can be retried manually,
depending on the application logic (#4583).
## Early Access
### [KIP-848](https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol): The Next Generation of the Consumer Rebalance Protocol
* With this new protocol the role of the Group Leader (a member) is removed and
the assignment is calculated by the Group Coordinator (a broker) and sent
to each member through heartbeats.
The feature is still _not production-ready_.
It's possible to try it in a non-production environment.
A [guide](INTRODUCTION.md#next-generation-of-the-consumer-group-protocol-kip-848) is available
with considerations and steps to follow to test it (#4610).
## Fixes
### General fixes
* Issues: [confluentinc/confluent-kafka-go#981](https://github.com/confluentinc/confluent-kafka-go/issues/981).
In the librdkafka release pipeline, a static build containing libsasl2
could be chosen instead of the alternative one without it.
That caused the libsasl2 dependency to be required in confluent-kafka-go
v2.1.0-linux-musl-arm64 and v2.3.0-linux-musl-arm64.
Solved by correctly excluding the binary configured with that library
when targeting a static build.
Happening since v2.0.2, for the specified platforms,
when using static binaries (#4666).
* Issues: #4684.
When the main thread loop was awakened less than 1 ms
before the expiration of a timeout, it was served with a zero timeout,
leading to increased CPU usage until the timeout was reached.
Happening since 1.x (#4671).
* Issues: #4685.
Metadata cache was cleared on full metadata refresh, leading to unnecessary
refreshes and occasional `UNKNOWN_TOPIC_OR_PART` errors. Solved by updating
cache for existing or hinted entries instead of clearing them.
Happening since 2.1.0 (#4677).
* Issues: #4589.
A metadata call before the member joins the consumer group
could lead to an `UNKNOWN_TOPIC_OR_PART` error. Solved by updating
the consumer group following a metadata refresh only in safe states.
Happening since 2.1.0 (#4678).
* Issues: #4577.
Metadata refreshes without partition leader change could lead to a loop of
metadata calls at fixed intervals. Solved by stopping metadata refresh when
all existing metadata is non-stale. Happening since 2.3.0 (#4679).
* Issues: #4687.
A partition migration could happen, using stale metadata, when the partition
was undergoing a validation and being retried because of an error.
Solved by doing a partition migration only with a non-stale leader epoch.
Happening since 2.1.0 (#4680).
### Consumer fixes
* Issues: #4686.
In case of subscription change with a consumer using the cooperative assignor
it could resume fetching from a previous position.
That could also happen if resuming a partition that wasn't paused.
Fixed by ensuring that a resume operation is completely a no-op when
the partition isn't paused.
Happening since 1.x (#4636).
* Issues: #4527.
While using the cooperative assignor, if an assignment is received while closing the consumer,
it's possible that it gets stuck in state WAIT_ASSIGN_CALL while the method is converted to
a full unassign. Solved by changing the state from WAIT_ASSIGN_CALL to WAIT_UNASSIGN_CALL
when doing this conversion.
Happening since 1.x (#4528).
# librdkafka v2.3.0
librdkafka v2.3.0 is a feature release:
* [KIP-516](https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers)
Partial support of topic identifiers. Topic identifiers in metadata response
available through the new `rd_kafka_DescribeTopics` function (#4300, #4451).
* [KIP-117](https://cwiki.apache.org/confluence/display/KAFKA/KIP-117%3A+Add+a+public+AdminClient+API+for+Kafka+admin+operations) Add support for AdminAPI `DescribeCluster()` and `DescribeTopics()`
(#4240, @jainruchir).
* [KIP-430](https://cwiki.apache.org/confluence/display/KAFKA/KIP-430+-+Return+Authorized+Operations+in+Describe+Responses):
Return authorized operations in Describe Responses.
(#4240, @jainruchir).
* [KIP-580](https://cwiki.apache.org/confluence/display/KAFKA/KIP-580%3A+Exponential+Backoff+for+Kafka+Clients): Added Exponential Backoff mechanism for
retriable requests with `retry.backoff.ms` as minimum backoff and `retry.backoff.max.ms` as the
maximum backoff, with 20% jitter (#4422).
* [KIP-396](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=97551484): completed the implementation with
the addition of ListOffsets (#4225).
* Fixed ListConsumerGroupOffsets not fetching offsets for all the topics in a group with Apache Kafka version below 2.4.0.
* Add a missing destroy that led to leaked partition structure memory when there
were partition leader changes and a stale leader epoch was received (#4429).
* Fix a segmentation fault when closing a consumer using the
cooperative-sticky assignor before the first assignment (#4381).
* Fix for insufficient buffer allocation when allocating rack information (@wolfchimneyrock, #4449).
* Fix for an infinite loop of OffsetForLeaderEpoch requests on quick leader changes (#4433).
* Fix to add leader epoch to control messages, to make sure they're stored
for committing even without a subsequent fetch message (#4434).
* Fix for stored offsets not being committed if they lacked the leader epoch (#4442).
* Upgrade OpenSSL to v3.0.11 (while building from source) with various security fixes,
check the [release notes](https://www.openssl.org/news/cl30.txt)
(#4454, started by @migarc1).
* Fix to ensure permanent errors during offset validation continue being retried and
don't cause an offset reset (#4447).
* Fix to ensure `max.poll.interval.ms` is reset when `rd_kafka_poll` is called with
a consume callback (#4431).
* Fix for idempotent producer fatal errors, triggered after a possibly persisted message state (#4438).
* Fix `rd_kafka_query_watermark_offsets` continuing beyond timeout expiry (#4460).
* Fix `rd_kafka_query_watermark_offsets` not refreshing the partition leader
after a leader change and subsequent `NOT_LEADER_OR_FOLLOWER` error (#4225).
## Upgrade considerations
* `retry.backoff.ms`:
If it is set greater than `retry.backoff.max.ms`, which has a default value of 1000 ms, then it assumes the value of `retry.backoff.max.ms`.
To change this behaviour make sure that `retry.backoff.ms` is always less than `retry.backoff.max.ms`.
If they are equal then the backoff will be linear instead of exponential.
* `topic.metadata.refresh.fast.interval.ms`:
If it is set greater than `retry.backoff.max.ms`, which has a default value of 1000 ms, then it assumes the value of `retry.backoff.max.ms`.
To change this behaviour make sure that `topic.metadata.refresh.fast.interval.ms` is always less than `retry.backoff.max.ms`.
If they are equal then the backoff will be linear instead of exponential.
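As a sketch, the two properties above can be set programmatically; the values below are illustrative, not recommendations:

```c
#include <librdkafka/rdkafka.h>
#include <stdio.h>

int main(void) {
        char errstr[512];
        rd_kafka_conf_t *conf = rd_kafka_conf_new();

        /* Keep retry.backoff.ms strictly below retry.backoff.max.ms,
         * otherwise the backoff degrades to linear (if equal) or is
         * capped at retry.backoff.max.ms (if greater). */
        if (rd_kafka_conf_set(conf, "retry.backoff.ms", "100",
                              errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK ||
            rd_kafka_conf_set(conf, "retry.backoff.max.ms", "1000",
                              errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK) {
                fprintf(stderr, "%s\n", errstr);
                return 1;
        }
        rd_kafka_conf_destroy(conf);
        return 0;
}
```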
## Fixes
### General fixes
* An assertion failed with insufficient buffer size when allocating
rack information on 32bit architectures.
Solved by aligning all allocations to the maximum allowed word size (#4449).
* The timeout for `rd_kafka_query_watermark_offsets` was not enforced after
making the necessary ListOffsets requests, and thus, it never timed out in
case of broker/network issues. Fixed by setting an absolute timeout (#4460).
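A minimal sketch of calling `rd_kafka_query_watermark_offsets` with a bounded timeout; the handle, topic name and timeout value are illustrative:

```c
#include <inttypes.h>
#include <stdio.h>
#include <librdkafka/rdkafka.h>

/* `rk` must be an initialized client handle. */
static void print_watermarks(rd_kafka_t *rk, const char *topic,
                             int32_t partition) {
        int64_t low = 0, high = 0;
        /* Since this fix, the 5000 ms timeout is enforced as an absolute
         * deadline across the underlying ListOffsets requests. */
        rd_kafka_resp_err_t err = rd_kafka_query_watermark_offsets(
                rk, topic, partition, &low, &high, 5000);
        if (err)
                fprintf(stderr, "query failed: %s\n", rd_kafka_err2str(err));
        else
                printf("low=%" PRId64 " high=%" PRId64 "\n", low, high);
}
```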
### Idempotent producer fixes
* After a possibly persisted error, such as a disconnection or a timeout, the next expected sequence
used to increase, leading to a fatal error if the message wasn't persisted and
the second one in the queue failed with an `OUT_OF_ORDER_SEQUENCE_NUMBER`.
The error could contain the message "sequence desynchronization" with
just one possibly persisted error, or "rewound sequence number" in case of
multiple errored messages.
Solved by treating the possibly persisted message as _not_ persisted,
and expecting a `DUPLICATE_SEQUENCE_NUMBER` error in case it was, or
`NO_ERROR` in case it wasn't; in both cases the message will be considered
delivered (#4438).
### Consumer fixes
* Stored offsets were excluded from the commit if the leader epoch was
less than the committed epoch, which is possible when the leader epoch is the default -1.
This didn't happen in the Python, Go and .NET bindings when the stored position was
taken from the message.
Solved by checking only that the stored offset is greater
than the committed one, if either the stored or the committed leader epoch is -1 (#4442).
* If an OffsetForLeaderEpoch request was being retried, and the leader changed
while the retry was in-flight, an infinite loop of requests was triggered,
because we weren't updating the leader epoch correctly.
Fixed by updating the leader epoch before sending the request (#4433).
* During offset validation a permanent error like host resolution failure
would cause an offset reset.
This isn't what's expected or what the Java implementation does.
Solved by retrying even in case of permanent errors (#4447).
* When using `rd_kafka_poll_set_consumer` along with a consume callback,
calling `rd_kafka_poll` to service the callbacks would not reset
`max.poll.interval.ms`. This was because we were only checking `rk_rep` for
consumer messages, while the method that services the queue internally also
services the queue that `rk_rep` forwards to, which is `rkcg_q`.
Solved by moving the `max.poll.interval.ms` check into `rd_kafka_q_serve` (#4431).
* After a leader change a `rd_kafka_query_watermark_offsets` call would continue
trying to call ListOffsets on the old leader, if the topic wasn't included in
the subscription set, so it started querying the new leader only after
`topic.metadata.refresh.interval.ms` (#4225).
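The `max.poll.interval.ms` fix (#4431) concerns the following usage pattern, sketched here with illustrative identifiers:

```c
#include <librdkafka/rdkafka.h>

/* Consume callback registered with rd_kafka_conf_set_consume_cb(). */
static void my_consume_cb(rd_kafka_message_t *rkmessage, void *opaque) {
        /* Process rkmessage here; librdkafka owns it in this callback. */
}

static void poll_loop(rd_kafka_t *rk, volatile int *run) {
        /* Redirect the main poll queue to the consumer's queue... */
        rd_kafka_poll_set_consumer(rk);
        while (*run) {
                /* ...so rd_kafka_poll() services the consume callback;
                 * with the fix this also resets max.poll.interval.ms. */
                rd_kafka_poll(rk, 100);
        }
}
```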
# librdkafka v2.2.0
librdkafka v2.2.0 is a feature release:
* Fix a segmentation fault when subscribing to non-existent topics and
using the consume batch functions (#4273).
* Store offset commit metadata in `rd_kafka_offsets_store` (@mathispesch, #4084).
* Fix a bug that happens when skipping tags, causing buffer underflow in
MetadataResponse (#4278).
* Fix a bug where topic leader is not refreshed in the same metadata call even if the leader is
present.
* [KIP-881](https://cwiki.apache.org/confluence/display/KAFKA/KIP-881%3A+Rack-aware+Partition+Assignment+for+Kafka+Consumers):
Add support for rack-aware partition assignment for consumers
(#4184, #4291, #4252).
* Fix several bugs with sticky assignor in case of partition ownership
changing between members of the consumer group (#4252).
* [KIP-368](https://cwiki.apache.org/confluence/display/KAFKA/KIP-368%3A+Allow+SASL+Connections+to+Periodically+Re-Authenticate):
Allow SASL Connections to Periodically Re-Authenticate
(#4301, started by @vctoriawu).
* Avoid treating an OpenSSL error as a permanent error and treat unclean SSL
closes as normal ones (#4294).
* Added `fetch.queue.backoff.ms` to the consumer to control how long
the consumer backs off the next fetch attempt (@bitemyapp, @edenhill, #2879).
* [KIP-235](https://cwiki.apache.org/confluence/display/KAFKA/KIP-235%3A+Add+DNS+alias+support+for+secured+connection):
Add DNS alias support for secured connection (#4292).
* [KIP-339](https://cwiki.apache.org/confluence/display/KAFKA/KIP-339%3A+Create+a+new+IncrementalAlterConfigs+API):
IncrementalAlterConfigs API (started by @PrasanthV454, #4110).
* [KIP-554](https://cwiki.apache.org/confluence/display/KAFKA/KIP-554%3A+Add+Broker-side+SCRAM+Config+API): Add Broker-side SCRAM Config API (#4241).
## Enhancements
* Added `fetch.queue.backoff.ms` to the consumer to control how long
the consumer backs off the next fetch attempt. When the pre-fetch queue
has exceeded its queuing thresholds, `queued.min.messages` and
`queued.max.messages.kbytes`, it backs off for 1 second.
If those parameters have to be set too high to hold 1 s of data,
this new parameter allows backing off the fetch earlier, reducing memory
requirements.
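A hedged configuration sketch (the 500 ms value is illustrative only):

```c
char errstr[512];
rd_kafka_conf_t *conf = rd_kafka_conf_new();

/* Back off the next fetch for 500 ms instead of the default 1 s once
 * queued.min.messages or queued.max.messages.kbytes is exceeded. */
rd_kafka_conf_set(conf, "fetch.queue.backoff.ms", "500",
                  errstr, sizeof(errstr));
```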
## Fixes
### General fixes
* Fix a bug that happens when skipping tags, causing buffer underflow in
MetadataResponse. This is triggered since RPC version 9 (v2.1.0),
when using Confluent Platform, only when racks are set,
observers are activated and there is more than one partition.
Fixed by skipping the correct amount of bytes when tags are received.
* Avoid treating an OpenSSL error as a permanent error and treat unclean SSL
closes as normal ones. When SSL connections are closed without `close_notify`,
in OpenSSL 3.x a new type of error is set, and it was interpreted as permanent
in librdkafka. It can cause a different issue depending on the RPC.
If received when waiting for an OffsetForLeaderEpoch response, it triggers
an offset reset following the configured policy.
Solved by treating SSL errors as transport errors and
by setting an OpenSSL flag that allows treating unclean SSL closes as normal
ones. These types of errors can happen if the other side doesn't support
`close_notify` or if there's a TCP connection reset.
### Consumer fixes
* In case of multiple owners of a partition with different generations, the
sticky assignor would pick the earliest (lowest generation) member as the
current owner, which would lead to stickiness violations. Fixed by
choosing the latest (highest generation) member.
* The case where the same partition is owned by two members with the same
generation indicates an issue. The sticky assignor had some code to
handle this, but it was non-functional and did not have parity with the
Java assignor. Fixed by invalidating any such partition from the current
assignment completely.
# librdkafka v2.1.1
librdkafka v2.1.1 is a maintenance release:
* Avoid duplicate messages when a fetch response is received
in the middle of an offset validation request (#4261).
* Fix segmentation fault when subscribing to a non-existent topic and
calling `rd_kafka_message_leader_epoch()` on the polled `rkmessage` (#4245).
* Fix a segmentation fault when fetching from follower and the partition lease
expires while waiting for the result of a list offsets operation (#4254).
* Fix documentation for the admin request timeout, incorrectly stating -1 for infinite
timeout. That timeout can't be infinite.
* Fix CMake pkg-config cURL require and use
pkg-config `Requires.private` field (@FantasqueX, @stertingen, #4180).
* Fixes certain cases where polling would not keep the consumer
in the group or make it rejoin it (#4256).
* Fix to the C++ set_leader_epoch method of TopicPartitionImpl,
that wasn't storing the passed value (@pavel-pimenov, #4267).
## Fixes
### Consumer fixes
* Duplicate messages can be emitted when a fetch response is received
in the middle of an offset validation request. Solved by avoiding
a restart from last application offset when offset validation succeeds.
* When fetching from follower, if the partition lease expires after 5 minutes,
and a list offsets operation was requested to retrieve the earliest
or latest offset, it resulted in segmentation fault. This was fixed by
allowing threads different from the main one to call
the `rd_kafka_toppar_set_fetch_state` function, given they hold
the lock on the `rktp`.
* In v2.1.0, a bug was fixed which caused polling any queue to reset
`max.poll.interval.ms`. Only certain functions were made to reset the timer,
but it is possible for the user to obtain the queue with messages from
the broker, skipping these functions. This was fixed by encoding in the
queue itself whether polling it resets the timer.
# librdkafka v2.1.0
librdkafka v2.1.0 is a feature release:
* [KIP-320](https://cwiki.apache.org/confluence/display/KAFKA/KIP-320%3A+Allow+fetchers+to+detect+and+handle+log+truncation)
Allow fetchers to detect and handle log truncation (#4122).
* Fix a reference count issue blocking the consumer from closing (#4187).
* Fix a protocol issue with ListGroups API, where an extra
field was appended for API Versions greater than or equal to 3 (#4207).
* Fix an issue with `max.poll.interval.ms`, where polling any queue would cause
the timeout to be reset (#4176).
* Fix seek partition timeout, which was one thousand times lower than the passed
value (#4230).
* Fix multiple inconsistent behaviour in batch APIs during **pause** or **resume** operations (#4208).
See **Consumer fixes** section below for more information.
* Update lz4.c from upstream. Fixes [CVE-2021-3520](https://github.com/advisories/GHSA-gmc7-pqv9-966m)
(by @filimonov, #4232).
* Upgrade OpenSSL to v3.0.8 with various security fixes,
check the [release notes](https://www.openssl.org/news/cl30.txt) (#4215).
## Enhancements
* Added partition leader epoch APIs:
- `rd_kafka_topic_partition_get_leader_epoch()` (and `set..()`)
- `rd_kafka_message_leader_epoch()`
- `rd_kafka_*assign()` and `rd_kafka_seek_partitions()` now support
partitions with a leader epoch set.
- `rd_kafka_offsets_for_times()` will return per-partition leader-epochs.
- `leader_epoch`, `stored_leader_epoch`, and `committed_leader_epoch`
added to per-partition statistics.
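A minimal sketch of reading the leader epoch from a polled message (identifiers are illustrative; `-1` means the epoch is unknown):

```c
#include <inttypes.h>
#include <stdio.h>
#include <librdkafka/rdkafka.h>

static void on_message(const rd_kafka_message_t *rkm) {
        int32_t epoch = rd_kafka_message_leader_epoch(rkm);
        printf("partition %" PRId32 " offset %" PRId64
               " leader epoch %" PRId32 "\n",
               rkm->partition, rkm->offset, epoch);
}
```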
## Fixes
### OpenSSL fixes
* Fixed OpenSSL static build not able to use external modules like FIPS
provider module.
### Consumer fixes
* A reference count issue was blocking the consumer from closing.
The problem would happen when a partition is lost, because forcibly
unassigned from the consumer or if the corresponding topic is deleted.
* When using `rd_kafka_seek_partitions`, the remaining timeout was
converted from microseconds to milliseconds but the expected unit
for that parameter is microseconds.
* Fixed known issues related to Batch Consume APIs mentioned in v2.0.0
release notes.
* Fixed `rd_kafka_consume_batch()` and `rd_kafka_consume_batch_queue()`
intermittently updating `app_offset` and `store_offset` incorrectly when
**pause** or **resume** was being used for a partition.
* Fixed `rd_kafka_consume_batch()` and `rd_kafka_consume_batch_queue()`
intermittently skipping offsets when **pause** or **resume** was being
used for a partition.
## Known Issues
### Consume Batch API
* When the `rd_kafka_consume_batch()` and `rd_kafka_consume_batch_queue()` APIs are used with
any of the **seek**, **pause**, **resume** or **rebalancing** operations, `on_consume`
interceptors might be called incorrectly (possibly multiple times) for messages that were not consumed.
### Consume API
* Duplicate messages can be emitted when a fetch response is received
in the middle of an offset validation request.
* Segmentation fault when subscribing to a non-existent topic and
calling `rd_kafka_message_leader_epoch()` on the polled `rkmessage`.
# librdkafka v2.0.2
librdkafka v2.0.2 is a maintenance release:
* Fix OpenSSL version in Win32 nuget package (#4152).
# librdkafka v2.0.1
librdkafka v2.0.1 is a maintenance release:
* Fixed nuget package for Linux ARM64 release (#4150).
# librdkafka v2.0.0
librdkafka v2.0.0 is a feature release:
* [KIP-88](https://cwiki.apache.org/confluence/display/KAFKA/KIP-88%3A+OffsetFetch+Protocol+Update)
OffsetFetch Protocol Update (#3995).
* [KIP-222](https://cwiki.apache.org/confluence/display/KAFKA/KIP-222+-+Add+Consumer+Group+operations+to+Admin+API)
Add Consumer Group operations to Admin API (started by @lesterfan, #3995).
* [KIP-518](https://cwiki.apache.org/confluence/display/KAFKA/KIP-518%3A+Allow+listing+consumer+groups+per+state)
Allow listing consumer groups per state (#3995).
* [KIP-396](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=97551484)
Partially implemented: support for AlterConsumerGroupOffsets
(started by @lesterfan, #3995).
* OpenSSL 3.0.x support - the maximum bundled OpenSSL version is now 3.0.7 (previously 1.1.1q).
* Fixes to the transactional and idempotent producer.
## Upgrade considerations
### OpenSSL 3.0.x
#### OpenSSL default ciphers
The introduction of OpenSSL 3.0.x in the self-contained librdkafka bundles
changes the default set of available ciphers, in particular all obsolete
or insecure ciphers and algorithms as listed in the
OpenSSL [legacy](https://www.openssl.org/docs/man3.0/man7/OSSL_PROVIDER-legacy.html)
manual page are now disabled by default.
**WARNING**: These ciphers are disabled for security reasons and it is
highly recommended NOT to use them.
Should you need to use any of these old ciphers you'll need to explicitly
enable the `legacy` provider by configuring `ssl.providers=default,legacy`
on the librdkafka client.
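For example, enabling the legacy provider could look like this sketch (only do this if obsolete ciphers are genuinely required):

```c
char errstr[512];
rd_kafka_conf_t *conf = rd_kafka_conf_new();

/* Re-enable obsolete ciphers via the OpenSSL legacy provider.
 * Not recommended; prefer the default provider set. */
rd_kafka_conf_set(conf, "ssl.providers", "default,legacy",
                  errstr, sizeof(errstr));
```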
#### OpenSSL engines and providers
OpenSSL 3.0.x deprecates the use of engines, which are being replaced by
providers. As such, librdkafka will emit a deprecation warning if
`ssl.engine.location` is configured.
OpenSSL providers may be configured with the new `ssl.providers`
configuration property.
### Broker TLS certificate hostname verification
The default value for `ssl.endpoint.identification.algorithm` has been
changed from `none` (no hostname verification) to `https`, which enables
broker hostname verification (to counter man-in-the-middle
impersonation attacks) by default.
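Clients relying on the previous behaviour can restore it explicitly; a sketch (disabling verification is not recommended outside test environments):

```c
char errstr[512];
rd_kafka_conf_t *conf = rd_kafka_conf_new();

/* Restore the pre-v2.0.0 default of no broker hostname verification. */
rd_kafka_conf_set(conf, "ssl.endpoint.identification.algorithm", "none",
                  errstr, sizeof(errstr));
```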