How Elastio Protects Your Data In Streaming Architectures

Elastio is a data protection-as-code (DPaC) company that protects clients’ data during the unforeseeable events modern applications may experience, such as downtime caused by ransomware, deployment mistakes, data errors, and security vulnerabilities. Elastio provides cloud-native solutions that are deployed in your own cloud environment, giving you full visibility into and control over where your data goes and how it’s used.

Apache Kafka is a popular distributed event store and stream-processing platform that many companies use to process data.

To provide fault tolerance, Kafka deployments use replication and mirroring, but these technologies don’t protect against data loss caused by processing errors or application failures. Such failures can severely disrupt a company’s operations and leave data unrecoverable.

What makes streaming data difficult to protect is the need to persist data along the processing path (e.g. checkpoints) and to be able to rewind and reprocess past data. You hope these checkpoints are never needed, but you won’t know whether they are until a failure happens.
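To make the checkpoint-and-rewind idea concrete, here is a minimal sketch in Python. It is a toy model, not Kafka or Elastio code: the `CheckpointedProcessor` class, its in-memory `log`, and the one-output-per-input assumption are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CheckpointedProcessor:
    """Toy stream processor over an in-memory log (a stand-in for one topic partition)."""
    log: List[str]
    transform: Callable[[str], str]
    offset: int = 0       # index of the next record to process
    checkpoint: int = 0   # last offset recorded as safely processed
    output: List[str] = field(default_factory=list)

    def run(self, checkpoint_every: int = 2) -> None:
        """Process all remaining records, recording a checkpoint periodically."""
        while self.offset < len(self.log):
            self.output.append(self.transform(self.log[self.offset]))
            self.offset += 1
            if self.offset % checkpoint_every == 0:
                self.checkpoint = self.offset

    def rewind(self, to_offset: int) -> None:
        """Discard results from to_offset onward so they can be reprocessed."""
        del self.output[to_offset:]  # assumes one output per input record
        self.offset = to_offset
        self.checkpoint = min(self.checkpoint, to_offset)


# A deliberately buggy transform drops every value...
proc = CheckpointedProcessor(log=["a", "b", "c", "d", "e"], transform=lambda s: "")
proc.run()
# ...so after fixing the code we rewind and reprocess from the start.
proc.transform = str.upper
proc.rewind(0)
proc.run()
print(proc.output)  # ['A', 'B', 'C', 'D', 'E']
```

The rewind only works because the original records are still in `log`. If the source data is gone, there is nothing to replay, which is the gap a backup is meant to close.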

A few examples where data loss may occur:

Leader-follower:

Even with a high replication factor in the messaging platform, the leader broker may fail while the followers are still catching up with it, so messages that were not yet replicated can be lost.

No replication:

At the other end of the spectrum, companies may have replication disabled completely. A failure of the broker that holds the only copy of the data will then result in data loss.

All brokers failed:

In this extreme situation, or with a poor replication design, all brokers may fail at the same time (e.g. a zone goes down and all the replicas are in that zone).

Deletion of a topic:

A topic may be accidentally deleted, which removes the messages stored in it and may prevent the producer from sending new data to the queue for consumption.

Incorrect transformation/processing:

Developers may run incorrect code to transform data during consumption. This may give wrong results or drop important information.
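Two of the scenarios above, no replication and all replicas in one zone, can be spotted ahead of time from a topic's replica assignment. The following Python sketch shows one way to check for them. The function name `audit_partition`, the broker IDs, and the zone names are hypothetical, and in practice the assignment would come from your cluster's metadata rather than a hard-coded dictionary.

```python
from typing import Dict, List


def audit_partition(replicas: List[int], broker_zone: Dict[int, str]) -> List[str]:
    """Return the placement risks found for one partition's replica list."""
    risks = []
    if len(replicas) < 2:
        risks.append("no replication")
    else:
        zones = {broker_zone[broker] for broker in replicas}
        if len(zones) == 1:
            risks.append("all replicas in one zone")
    return risks


# Hypothetical cluster: broker id -> availability zone
broker_zone = {1: "us-east-1a", 2: "us-east-1a", 3: "us-east-1b"}
# Hypothetical assignment: partition -> broker ids holding a replica
assignment = {0: [1], 1: [1, 2], 2: [1, 3]}

report = {p: audit_partition(r, broker_zone) for p, r in assignment.items()}
print(report)
# {0: ['no replication'], 1: ['all replicas in one zone'], 2: []}
```

A check like this says nothing about the other scenarios. A leader failing mid-replication, a deleted topic, or a faulty transformation can still lose data on a well-placed cluster.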

Elastio’s data protection capabilities address all of the pain points above, providing business continuity for Kafka and protecting against data loss and downtime.

How we use Elastio to protect our customers from Kafka outages and application failures

At Elastio, the events stored in Kafka are crucial for our platform. Kafka directly influences the reliability and data consistency of our customer Tenants. Because of this, we need to protect these events to isolate our customers from Kafka outages and downtime.

The Elastio Tenant is built on the AWS cloud and uses Amazon Elastic Kubernetes Service (EKS), RDS, ElastiCache, and MSK.

The Tenant is built on a microservice architecture, and each of our services is bounded by its domain context. Our services communicate synchronously through internal API calls and asynchronously by passing messages over Amazon Managed Streaming for Apache Kafka (MSK). The Tenant also communicates with every customer’s Cloud Connector, synchronously by calling a Cloud Connector Lambda function and asynchronously by polling SQS messages from, and putting them into, the Cloud Connector.

Here is a diagram of how the Tenant works:

As a result, we generate a lot of external and internal asynchronous communication that relies on Kafka. All event messages polled from the customers’ Cloud Connectors are sent to a specific Kafka topic. Event messages from Cloud Connectors can include more detailed information such as backup metadata and security report details. The Tenant processes, stores, and visualizes that data for our customers.

Our team investigated the market for a Kafka backup solution and was surprised to find that no product or service was available for MSK, including from AWS.

Introducing Elastio. The Elastio CLI offers advanced backup options, and its stream backup capability became the key to solving the Kafka backup issue. We built a script around the Elastio CLI that creates a Kafka consumer with a unique consumer group ID and streams a Kafka topic to the Elastio vault. The data is encrypted, deduplicated, and cataloged as a recovery point for future use.
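The consumer side of such a script might look roughly like the Python sketch below. This is not Elastio's actual script. It assumes the `kafka-python` client, and the function names, the `elastio-backup` group ID prefix, and the JSON Lines record format are all illustrative choices. The script writes records to standard output so they can be piped into the Elastio CLI's stream backup command. The exact CLI invocation is not shown in this post, so it is not reproduced here.

```python
import base64
import json
import sys
import uuid
from typing import BinaryIO, Iterable, Optional


def unique_group_id(prefix: str = "elastio-backup") -> str:
    """Build a consumer group ID that no other consumer shares (prefix is hypothetical)."""
    return f"{prefix}-{uuid.uuid4()}"


def encode_record(partition: int, offset: int,
                  key: Optional[bytes], value: Optional[bytes]) -> bytes:
    """Serialize one record as a JSON line; raw bytes are base64-encoded."""
    def b64(data: Optional[bytes]) -> Optional[str]:
        return None if data is None else base64.b64encode(data).decode("ascii")

    line = json.dumps({"partition": partition, "offset": offset,
                       "key": b64(key), "value": b64(value)})
    return line.encode("utf-8") + b"\n"


def stream_topic(records: Iterable, out: BinaryIO) -> int:
    """Write every record to `out` and return how many were written."""
    count = 0
    for record in records:
        out.write(encode_record(record.partition, record.offset,
                                record.key, record.value))
        count += 1
    out.flush()
    return count


def main(topic: str, bootstrap_servers: str) -> None:
    # Imported here so the helpers above work without kafka-python installed.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=bootstrap_servers,
        group_id=unique_group_id(),     # fresh group: no previously committed offsets
        auto_offset_reset="earliest",   # so reading starts at the oldest retained record
        enable_auto_commit=False,
        consumer_timeout_ms=10_000,     # stop iterating after 10 s without new records
    )
    try:
        stream_topic(consumer, sys.stdout.buffer)
    finally:
        consumer.close()


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Because each run uses a fresh group ID with `auto_offset_reset="earliest"`, the backup consumer never shares offsets with application consumers and always reads the topic from the oldest record still retained.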

The script works as follows:

Next, we wanted to embed Elastio into the CloudOps workflow to protect Kafka. We included the Elastio service and Cloud Connector in our Tenant infrastructure. Then we wrapped the script in a Docker image and deployed the image to an ECS cluster to schedule regular backups of our Kafka topics.

Here is how that works in the Tenant now:
