Blog
Here you will find technical articles, presentations and news about architecture and system development. Stay up to date, follow us on LinkedIn
In this blog post we will look at one way to encrypt your Kafka records using Kroxylicious. The post is supported by a working example with source code in this GitHub repository.
It is highly likely that your Kafka platform is handling large volumes of very sensitive data. Encryption in transit (TLS) and encryption at rest (encrypted disks) will protect your data from interception. Well-structured Access Control Lists (ACLs) will limit who has access to the data. Using binary schemas like Avro and Protobuf may help obfuscate your data, further limiting access (but schemas can be reverse engineered). This may sufficiently reduce your attack surface so that you feel your Kafka platform is secure.
However, there may be a nagging worry that some form of super-user account exists and that your well-constructed ACLs can be compromised. If you are using a cloud provider you may want to protect your data from conflicting legal requirements, for example the CLOUD Act. In this case you may want to consider encrypting the data you send to Kafka.
The Apache Kafka platform itself provides no support for the encryption of records and is not likely to in the near future.
One solution is to ask all client applications to assume responsibility for encrypting their own records. For example, you could choose to add encryption functionality to custom Serializers and Deserializers. As we will see below, the encryption process is quite complicated and may significantly increase the burden on each client application. If you have a diverse portfolio of client applications this strategy may be impractical or even impossible.
A proxy solution provides a way to bridge this gap. The proxy intercepts traffic between the client application and the broker, and may choose to extend the functionality, in this case by encrypting and decrypting the data. To the client, the proxy is indistinguishable from the broker, and to the broker the proxy is indistinguishable from the client application. This gives us a relatively seamless way of introducing encryption.
Kroxylicious is a Red Hat project that aims to deliver a Kafka proxy. At the time of writing Kroxylicious is at version 0.9.0 and is not considered ready for production environments. However, it certainly is ready for use in a demo!
In this demo we will focus solely on using Kroxylicious to provide encryption, although this is not the only use case. The Kroxylicious documentation provides a rich description of the use of filter chains that:
"...implements some logic for intercepting, inspecting and/or manipulating Kafka protocol messages"
The concepts of filters and filter chains will be familiar to most Java developers and it is tempting to consider writing your own, for example an audit filter that logs who is accessing which topics. A word of caution though: it is no small task to master the Apache Kafka protocol.
The Record Encryption Filter is provided as part of the Kroxylicious package.
The filter relies on an external Key Management System (KMS) that provides the keys required to perform encryption and decryption.
The filter uses envelope encryption: a Data Encryption Key (DEK) is used to encrypt the record and a Key Encryption Key (KEK) is used to encrypt the DEK. The intention is to rotate the DEK frequently and the KEK less frequently. The KEK should never leave the KMS; only an alias is shared with the filter. Encryption of the record is performed by Kroxylicious to reduce the load on the KMS.
Every record on a topic consists of a key and a value. When a client application writes a record via the Kroxylicious proxy the record value is swapped: the “new” value contains the data encrypted using the DEK, the encrypted DEK, and a reference to the KEK. When reading a record the Kroxylicious proxy decrypts the encrypted DEK and then decrypts the data. This process is described in more detail in the Record Encryption Filter documentation.
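To make the envelope idea concrete, here is a minimal sketch that uses only the JDK's javax.crypto classes. It is not the filter's actual code. The class and method names are hypothetical, the KEK is held in a local variable (in a real setup it stays inside the KMS, which does the wrapping and unwrapping), the choice of AES-GCM and AESWrap is an assumption, and a fresh DEK is generated on every call purely to keep the example short.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

class EnvelopeEncryptionSketch {

    /** Hypothetical stand-in for the "new" record value: encrypted data, encrypted DEK, KEK reference. */
    static final class Envelope {
        final String kekRef;
        final byte[] wrappedDek;
        final byte[] iv;
        final byte[] ciphertext;

        Envelope(String kekRef, byte[] wrappedDek, byte[] iv, byte[] ciphertext) {
            this.kekRef = kekRef;
            this.wrappedDek = wrappedDek;
            this.iv = iv;
            this.ciphertext = ciphertext;
        }
    }

    static SecretKey newAesKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static Envelope encrypt(String kekRef, SecretKey kek, byte[] recordValue) {
        try {
            // 1. Encrypt the record value with a DEK (assumption: AES-GCM with a random 12-byte IV).
            SecretKey dek = newAesKey();
            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);
            Cipher dataCipher = Cipher.getInstance("AES/GCM/NoPadding");
            dataCipher.init(Cipher.ENCRYPT_MODE, dek, new GCMParameterSpec(128, iv));
            byte[] ciphertext = dataCipher.doFinal(recordValue);

            // 2. Encrypt (wrap) the DEK with the KEK. In a real deployment the KMS does this.
            Cipher wrapCipher = Cipher.getInstance("AESWrap");
            wrapCipher.init(Cipher.WRAP_MODE, kek);
            return new Envelope(kekRef, wrapCipher.wrap(dek), iv, ciphertext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] decrypt(SecretKey kek, Envelope envelope) {
        try {
            // 1. Recover the DEK from the envelope using the KEK.
            Cipher unwrapCipher = Cipher.getInstance("AESWrap");
            unwrapCipher.init(Cipher.UNWRAP_MODE, kek);
            SecretKey dek = (SecretKey) unwrapCipher.unwrap(envelope.wrappedDek, "AES", Cipher.SECRET_KEY);

            // 2. Decrypt the record value with the recovered DEK.
            Cipher dataCipher = Cipher.getInstance("AES/GCM/NoPadding");
            dataCipher.init(Cipher.DECRYPT_MODE, dek, new GCMParameterSpec(128, envelope.iv));
            return dataCipher.doFinal(envelope.ciphertext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey kek = newAesKey(); // stand-in for a key that would live in the KMS
        byte[] value = "{\"temperature\":21.5}".getBytes(StandardCharsets.UTF_8);
        Envelope envelope = encrypt("demo-kek", kek, value);
        System.out.println(new String(decrypt(kek, envelope), StandardCharsets.UTF_8));
    }
}
```

Note that only the small DEK is ever handed to the KEK operation; the bulk of the work (encrypting the record itself) happens locally, which is the reason the filter can keep the load on the KMS low.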
Note that the record key is not encrypted. As the key is often used in partitioning strategies it may not make sense to encrypt it (rotating a DEK could assign records to different partitions) and it may be a better strategy to avoid storing sensitive data in the key.
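The partitioning concern can be illustrated with a small sketch. It encrypts the same record key twice, each time with a new DEK, and shows that the resulting bytes differ, so a hash-based partitioner would have no reason to pick the same partition. Everything here is hypothetical: the key name, the partition count, and in particular the hash function, which is a simple stand-in and not the murmur2 hash that Kafka's default partitioner uses.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

class KeyPartitioningSketch {

    // Stand-in for a key-hash partitioner (assumption: NOT Kafka's murmur2 implementation).
    static int partitionFor(byte[] keyBytes, int numPartitions) {
        return Math.floorMod(Arrays.hashCode(keyBytes), numPartitions);
    }

    static SecretKey newDek() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypts with AES-GCM and returns the random IV followed by the ciphertext.
    static byte[] encrypt(SecretKey dek, byte[] plain) {
        try {
            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, dek, new GCMParameterSpec(128, iv));
            byte[] ciphertext = cipher.doFinal(plain);
            byte[] out = Arrays.copyOf(iv, iv.length + ciphertext.length);
            System.arraycopy(ciphertext, 0, out, iv.length, ciphertext.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int numPartitions = 12; // hypothetical partition count
        byte[] recordKey = "station-42".getBytes(StandardCharsets.UTF_8); // hypothetical record key

        // The plain key always hashes to the same partition.
        System.out.println("plain key         -> partition " + partitionFor(recordKey, numPartitions));

        // The same key encrypted under two different DEKs yields unrelated bytes.
        byte[] beforeRotation = encrypt(newDek(), recordKey);
        byte[] afterRotation = encrypt(newDek(), recordKey);
        System.out.println("encrypted (DEK 1) -> partition " + partitionFor(beforeRotation, numPartitions));
        System.out.println("encrypted (DEK 2) -> partition " + partitionFor(afterRotation, numPartitions));
    }
}
```

The two encrypted keys may occasionally land on the same partition by chance, but nothing guarantees it, which would break per-key ordering and compaction assumptions.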
I will reuse the concept from my last blog post of a weather reporting scenario. Once again I will use Quarkus to play the role of a weather station reporting weather information (a Kafka producer) and the role of the central weather collecting service (a Kafka consumer). These weather reports are randomly generated and are published as JSON. Let’s assume that this data is considered sensitive enough to warrant encryption.
The final destination of these weather reports is a Confluent Kafka broker. Between the Quarkus application and the broker a Kroxylicious implementation has been deployed. A quick look at the Kroxylicious configuration shows a Virtual Cluster exposing a listener for the Quarkus app on kroxylicious:9093 and a Target Cluster that uses the broker listener on broker:29094:
targetCluster:
  bootstrap_servers: broker:29094
clusterNetworkAddressConfigProvider:
  type: PortPerBrokerClusterNetworkAddressConfigProvider
  config:
    bootstrapAddress: kroxylicious:9093
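From the client's point of view the only thing that changes is the bootstrap address. The sketch below builds plain Kafka client properties to show this; it is an illustration only, since the demo itself configures its clients through Quarkus, and the class name, method name and choice of String serializers are assumptions.

```java
import java.util.Properties;

class ProxyClientConfigSketch {

    // Builds minimal producer properties; only the bootstrap address differs between setups.
    static Properties producerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Assumption: String keys and JSON-as-String values.
        props.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        // Through the proxy: records are encrypted before they reach the broker.
        Properties viaProxy = producerProps("kroxylicious:9093");
        // Directly to the broker: no encryption is applied.
        Properties direct = producerProps("broker:29094");

        System.out.println("via proxy: " + viaProxy.getProperty("bootstrap.servers"));
        System.out.println("direct:    " + direct.getProperty("bootstrap.servers"));
    }
}
```

No encryption-related setting appears on the client side, which is the point of the proxy approach.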
Once the weather application is up and running the logs will show that messages are happily being published and consumed, i.e. the data is being encrypted and decrypted successfully:
2025-01-02 13:13:10,002 INFO [se.mar.wea.kaf.pro.WeatherScheduler] (executor-thread-1) Producing message with id d62b0eea-9956-47ef-9eeb-f9667605d2cd
2025-01-02 13:13:10,013 INFO [se.mar.wea.kaf.con.WeatherConsumer] (vert.x-eventloop-thread-0) Reading message from topic weather-topic partition 7 and offset 136 with id d62b0eea-9956-47ef-9eeb-f9667605d2cd
Confluent Control Center has been added to read data from a topic. For this demo Control Center is connected directly to the broker, not via the proxy. If we take a peek at the record on partition 7, offset 136 in Control Center we can see that the message is encrypted. In JSON format:
{
"topic":"weather-topic",
"partition":7,
"offset":136,
"timestamp":1735823590003,
"timestampType":"CREATE_TIME",
"headers":[{"key":"kroxylicious.io/encryption","stringValue":"\u0002"}],
"key":"i_am_a_little_teapot_-1467935660",
"value":"\u0000y\u001fKEK_kroxylicious_encryption_keyvault:v1:GrH4zpnO..."
}
(Note: the “value” field has been shortened for brevity).
There we have it! Data is being encrypted and decrypted, the client is unaware of the proxy, and our data is secured!
This demo uses HashiCorp Vault as the KMS. The Record Encryption filter has limited support for other KMS products. However, there is nothing to stop you adding your own by implementing the KmsService interface:
public interface KmsService<C, K, E> extends AutoCloseable {
    void initialize(C config);
    Kms<K, E> buildKms() throws IllegalStateException;
    default void close() {
    }
}
…and the Kms interface:
public interface Kms<K, E> {
    CompletionStage<DekPair<E>> generateDekPair(@NonNull K kekRef);
    CompletionStage<SecretKey> decryptEdek(@NonNull E edek);
    Serde<E> edekSerde();
    CompletionStage<K> resolveAlias(@NonNull String alias);
}
…and adding your classes to the Classpath using the Java Service Provider Interface.
However, before you do that you may want to consider whether your code is production-ready and whether you can support it within your SLAs.
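To give a feel for what such an implementation involves, here is a toy in-memory sketch. It deliberately does not depend on Kroxylicious: SimpleKms, DekPair and WrappedDek are simplified local stand-ins that mirror the method shapes shown above (without the Serde), the alias and KEK names are hypothetical, and holding KEKs in a map is exactly what a real KMS exists to avoid. Treat it as an illustration of the contract, not as something to deploy.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ConcurrentHashMap;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

class InMemoryKmsSketch {

    // Simplified local stand-ins, NOT the Kroxylicious types.
    static final class DekPair<E> {
        final E edek;
        final SecretKey dek;
        DekPair(E edek, SecretKey dek) { this.edek = edek; this.dek = dek; }
    }

    interface SimpleKms<K, E> {
        CompletionStage<DekPair<E>> generateDekPair(K kekRef);
        CompletionStage<SecretKey> decryptEdek(E edek);
        CompletionStage<K> resolveAlias(String alias);
    }

    // The encrypted DEK together with the id of the KEK that wrapped it.
    static final class WrappedDek {
        final String kekId;
        final byte[] bytes;
        WrappedDek(String kekId, byte[] bytes) { this.kekId = kekId; this.bytes = bytes; }
    }

    static final class InMemoryKms implements SimpleKms<String, WrappedDek> {
        private final Map<String, String> aliasToKekId = new ConcurrentHashMap<>();
        private final Map<String, SecretKey> keks = new ConcurrentHashMap<>();

        void createKek(String alias, String kekId) {
            keks.put(kekId, newAesKey());
            aliasToKekId.put(alias, kekId);
        }

        @Override
        public CompletionStage<String> resolveAlias(String alias) {
            String kekId = aliasToKekId.get(alias);
            return kekId == null
                    ? CompletableFuture.failedFuture(new IllegalArgumentException("Unknown alias: " + alias))
                    : CompletableFuture.completedFuture(kekId);
        }

        @Override
        public CompletionStage<DekPair<WrappedDek>> generateDekPair(String kekId) {
            return CompletableFuture.supplyAsync(() -> {
                SecretKey dek = newAesKey();
                WrappedDek edek = new WrappedDek(kekId, crypt(Cipher.WRAP_MODE, keks.get(kekId)).wrapKey(dek));
                return new DekPair<WrappedDek>(edek, dek);
            });
        }

        @Override
        public CompletionStage<SecretKey> decryptEdek(WrappedDek edek) {
            return CompletableFuture.supplyAsync(
                    () -> crypt(Cipher.UNWRAP_MODE, keks.get(edek.kekId)).unwrapKey(edek.bytes));
        }
    }

    // Small helper around the JDK's AESWrap cipher that converts checked exceptions.
    static final class KeyWrapper {
        private final Cipher cipher;
        KeyWrapper(Cipher cipher) { this.cipher = cipher; }
        byte[] wrapKey(SecretKey dek) {
            try { return cipher.wrap(dek); }
            catch (GeneralSecurityException e) { throw new IllegalStateException(e); }
        }
        SecretKey unwrapKey(byte[] wrapped) {
            try { return (SecretKey) cipher.unwrap(wrapped, "AES", Cipher.SECRET_KEY); }
            catch (GeneralSecurityException e) { throw new IllegalStateException(e); }
        }
    }

    static KeyWrapper crypt(int mode, SecretKey kek) {
        try {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(mode, kek);
            return new KeyWrapper(cipher);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static SecretKey newAesKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        InMemoryKms kms = new InMemoryKms();
        kms.createKek("weather-kek-alias", "kek-1");
        String kekId = kms.resolveAlias("weather-kek-alias").toCompletableFuture().join();
        DekPair<WrappedDek> pair = kms.generateDekPair(kekId).toCompletableFuture().join();
        SecretKey recovered = kms.decryptEdek(pair.edek).toCompletableFuture().join();
        System.out.println("DEK recovered: " + Arrays.equals(pair.dek.getEncoded(), recovered.getEncoded()));
    }
}
```

Even in this toy form the questions that matter in production are visible: what happens when an alias is unknown, how failures surface through the CompletionStage, and where the KEK material actually lives.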
This quick demo shows you how Kroxylicious could be used to encrypt your data in Kafka with minimal consequences for your consumers and producers.
This demo uses a version of Kroxylicious that is not recommended for production. As you can see in the demo code there was a need to get acquainted with the Kroxylicious source code, to assemble the Kroxylicious implementation, and to build a Docker image, but this was certainly no major effort (and has been getting noticeably easier as the project matures).
There are plenty more things to discuss that are out of scope of this demo: deployment strategies for your Kroxylicious cluster, key rotation strategies, etc.
One aspect I would recommend focussing on is how adding a proxy and a KMS will affect the SLA of your Kafka platform. Remember that the SLA of your Kafka platform as a whole cannot be better than the worst SLA of any component in the platform. If the KMS cannot provide a key or if the proxy is down, all traffic to all brokers could be stopped and your entire Kafka platform will be down. Proceed with care!
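A quick back-of-the-envelope calculation shows why. If every request has to pass through the proxy, the KMS and the brokers in series, and failures are assumed to be independent, the availabilities multiply, so the combined figure is actually lower than that of the weakest component. The numbers below are purely hypothetical and the class is only a sketch of the arithmetic.

```java
class AvailabilitySketch {

    // Availability of components in series, assuming independent failures.
    static double serialAvailability(double... components) {
        double total = 1.0;
        for (double availability : components) {
            total *= availability;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical figures: brokers 99.9%, proxy 99.9%, KMS 99.95%.
        double platform = serialAvailability(0.999, 0.999, 0.9995);
        System.out.println("Combined availability: " + platform);
    }
}
```

With these made-up figures the platform ends up at roughly 99.75%, which is below each of the three individual values.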
I believe that a production-ready version of Kroxylicious will be available in early 2025 and look forward to giving it a trial in a real-life scenario!