Elastio Software

How Elastio Protects Your Data In Streaming Architectures

Author

Naj Husain

Amazon Managed Streaming for Apache Kafka (MSK) is a managed service for Apache Kafka, a popular distributed event store and stream-processing platform that many companies use to process data.

To ensure fault tolerance, replication and mirroring are used, but these technologies don’t protect against data loss caused by processing errors or application failures. Such failures may severely impact a company’s operations and leave data unrecoverable.

What makes streaming data difficult to protect is the need to persist data along the processing path (e.g., checkpoints) and to be able to rewind and reprocess past data. You hope these checkpoints are never needed, but a company won’t know whether they are until a failure happens.

A few examples where data loss may occur:

Leader-follower:

Even when a high replication factor is configured in the messaging platform, the leader broker may fail while the followers are still being brought in sync with it, so messages that have not yet been replicated are lost.

No replication:

At the other end of the spectrum, companies may have replication disabled completely. A failure of the leader broker will then result in data loss.

All brokers failed:

In an extreme situation, or with a poor replication design, all brokers may fail at the same time (e.g., a zone goes down and all the replicas are in that zone).

Deletion of a topic:

A topic may be accidentally deleted, which removes the messages stored in it and may prevent the producer from sending data to the queue for consumption.

Incorrect transformation/processing:

Developers may run incorrect code to transform data during consumption. This may give wrong results or drop important information.
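Some of the replication-related cases above can be checked mechanically. The following is a minimal sketch, not part of the original tooling: the `Replica` class and the `replication_risks` function are hypothetical names, and the replica placement is passed in by hand rather than read from a real cluster.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    broker_id: int
    zone: str  # availability zone of the broker hosting this replica


def replication_risks(replicas, min_insync_replicas):
    """Return short risk labels for one topic partition (hypothetical helper)."""
    risks = []
    zones = {r.zone for r in replicas}
    if len(replicas) <= 1:
        # "No replication": losing the only broker loses the data.
        risks.append("no-replication")
    else:
        if len(zones) == 1:
            # "All brokers failed": one zone outage takes every replica down.
            risks.append("single-zone")
        if min_insync_replicas < 2:
            # "Leader-follower": a write may be acknowledged by the leader alone.
            risks.append("leader-only-ack")
    return risks


if __name__ == "__main__":
    same_zone = [Replica(1, "zone-a"), Replica(2, "zone-a"), Replica(3, "zone-a")]
    print(replication_risks(same_zone, min_insync_replicas=1))
```

A partition that passes these checks is still exposed to topic deletion and incorrect processing, which replication does not address.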

Elastio’s data protection capabilities can help address all of the pain points above, providing Kafka business continuity and protecting against data loss and downtime.

How we use Elastio to protect our customers from Kafka outages and application failures

At Elastio, the events stored in Amazon MSK are crucial for our platform. Kafka directly influences the reliability and data consistency of our customer Tenants. Because of this, we need to protect these events to isolate our customers from Kafka outages and downtime.

The Elastio Tenant is built on the AWS cloud and uses Elastic Kubernetes Service (EKS), RDS, ElastiCache, and MSK.

The Tenant is built on a microservice architecture, and each of our services is bounded by its domain context. Our services communicate synchronously through internal API calls and asynchronously by passing messages over Amazon Managed Streaming for Apache Kafka (MSK). The Tenant also communicates synchronously with every customer’s Cloud Connector by calling a Cloud Connector Lambda function, and asynchronously by polling SQS messages from, and putting SQS messages into, a Cloud Connector.

Here is a diagram of how the Tenant works:

As a result, we generate a lot of external and internal asynchronous communication that relies on Kafka. All event messages polled from the customers’ Cloud Connectors are sent to a specific Kafka topic. Event messages from Cloud Connectors can include more detailed information, such as backup metadata and security report details. The Tenant processes, stores, and visualizes that data for our customers. Our team searched the market for a way to back up Kafka and was surprised to find no product or service available for Amazon MSK.

Introducing Elastio. The Elastio CLI offers advanced backup options, and its stream backup capability became the key to solving the Kafka backup issue. We built a script around the Elastio CLI that creates a Kafka consumer with a unique consumer group ID and streams a Kafka topic to the Elastio vault. The data is encrypted, deduplicated, and cataloged as a recovery point for future use.
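To make the streaming step concrete, here is a minimal sketch of two pieces such a script needs: a consumer group ID that no other consumer shares, and a way to write raw messages into a single byte stream so that they can be separated again later. The function names and the length-prefixed framing are assumptions made for illustration. They are not the Elastio CLI’s format or the original script, and the sketch does not call Kafka or the Elastio CLI.

```python
import io
import struct
import uuid


def unique_group_id(prefix="backup"):
    """Build a consumer group ID that is not shared with any other consumer."""
    return f"{prefix}-{uuid.uuid4()}"


def write_frame(stream, payload: bytes):
    """Write one raw message as a 4-byte big-endian length followed by its bytes."""
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)


def read_frames(stream):
    """Yield the raw messages back in the order they were written."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of stream
        if len(header) < 4:
            raise ValueError("truncated frame header")
        (length,) = struct.unpack(">I", header)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated frame payload")
        yield payload


if __name__ == "__main__":
    buffer = io.BytesIO()  # stands in for the pipe to the backup tool
    for raw in [b'{"event": "scan"}', b"\x00\xff", b""]:
        write_frame(buffer, raw)
    buffer.seek(0)
    print(unique_group_id(), list(read_frames(buffer)))
```

Because the framing treats every message as opaque bytes, it works the same for JSON, Avro, or any other payload.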

The script works as follows:

  • It captures the first message offset in the stream and the last one and stores these offsets as recovery point tags.
  • It is agnostic to the message structure and captures the RAW message from the topic.
  • On the next backup of the same topic in the MSK cluster, it reads the last message offset of the previous backup from the recovery point tags and starts the new message stream from that offset. In this way, it ensures that no Kafka message is backed up twice.

Avoiding duplicates is crucial when the topic’s messages have to be restored after a Kafka outage. The script can take a specified recovery point of the topic in the MSK cluster and produce the messages collected in that recovery point into the topic, in the same order in which they were originally stored in the cluster. The code for the consumer and producer is here.
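The offset bookkeeping and the ordered restore can be sketched as follows. This is a simplified, in-memory illustration under stated assumptions, not the original script: a Python list stands in for a single Kafka partition, the list index stands in for the offset, and `RecoveryPoint`, `backup`, and `restore` are hypothetical names. Real topics track offsets per partition. The sketch also assumes that the next backup resumes one past the stored last offset, so that the boundary message is not repeated.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryPoint:
    messages: list  # raw message bytes, in offset order
    tags: dict = field(default_factory=dict)


def backup(partition_log, previous=None):
    """Back up only the messages not covered by the previous recovery point."""
    # Assumption: resume one past the last offset that was already backed up.
    start = int(previous.tags["last_offset"]) + 1 if previous else 0
    new_messages = list(partition_log[start:])
    if not new_messages:
        return None  # nothing new since the previous backup
    last = start + len(new_messages) - 1
    return RecoveryPoint(
        messages=new_messages,
        tags={"first_offset": str(start), "last_offset": str(last)},
    )


def restore(recovery_points, produce):
    """Replay recovery points in offset order, calling produce() per message."""
    ordered = sorted(recovery_points, key=lambda rp: int(rp.tags["first_offset"]))
    for recovery_point in ordered:
        for raw in recovery_point.messages:
            produce(raw)


if __name__ == "__main__":
    log = [b"m0", b"m1", b"m2"]
    first = backup(log)
    log += [b"m3", b"m4"]
    second = backup(log, previous=first)
    restored = []
    restore([second, first], restored.append)
    print(first.tags, second.tags, restored == log)
```

Storing the offsets as tags on each recovery point means the next run needs no state of its own beyond the most recent recovery point.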

Next, we wanted to embed Elastio into the CloudOps workflow to protect Kafka. We included the Elastio service and Cloud Connector in our Tenant infrastructure. Then we wrapped the script in a Docker image and deployed the image to an ECS cluster to schedule regular backups of Kafka topics.

Here is how that works in the Tenant now:

About Elastio

Elastio detects and precisely identifies ransomware in your data and assures rapid post-attack recovery. Our data resilience platform protects against cyber attacks when traditional cloud security measures fail.

Elastio’s agentless deep file inspection continuously monitors business-critical data to identify threats and enable quick response to compromises and infected files. Elastio provides best-in-class application protection and recovery and delivers immediate time-to-value.
