Restore or Rebuild After Ransomware? A Recovery Decision Framework
Ransomware recovery has two jobs that pull against each other: bring services back and avoid bringing the attacker back with them.
That is why “restore from backup” is not always the right first move. A restore can return encrypted data, malware, unauthorized accounts, altered configuration, poisoned scripts, or compromised identity state. A rebuild can reduce that risk, but only if the team has protected application sources, infrastructure definitions, secrets handling, package repositories, and deployment steps.
The decision is not a purity test. It is an evidence test.
Use the decision table before the bridge call
| Decision factor | Restore when | Rebuild when |
|---|---|---|
| Recovery point | The clean point is verified and recent enough. | The clean point is unknown, suspect, or too old for the business. |
| Identity | Directory state and privileged access are trusted. | Directory state or privileged access may be compromised. |
| Configuration | Configuration baseline matches a known-good reference, with no unauthorized persistence detected. | Configuration drift, persistence, or tampering is suspected. |
| Dependencies | Service dependencies are mapped and tested. | Dependency state is unknown or stale. |
| Business tolerance | Speed matters and the evidence supports restore. | Risk tolerance requires a known-good build path. |
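One way to keep the table from being reinterpreted under pressure is to write it down as a rule the team agreed on in advance. The sketch below is illustrative only: the field names are hypothetical, and the rule that every row must favor restore before restore is chosen is an assumption (a conservative reading of the table, not a published standard).

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RecoveryEvidence:
    """Hypothetical yes/no answers to the decision table, one field per row.

    True means the 'Restore when' column is satisfied for that factor.
    """
    recovery_point_verified: bool   # clean point verified and recent enough
    identity_trusted: bool          # directory state and privileged access trusted
    config_matches_baseline: bool   # no drift, persistence, or tampering found
    dependencies_tested: bool       # dependencies mapped and tested
    business_accepts_restore: bool  # speed matters and the evidence supports restore


def decide(evidence: RecoveryEvidence) -> tuple[str, list[str]]:
    """Return ('restore' or 'rebuild', the factors that point toward rebuild).

    Assumed conservative rule: restore only when every factor supports it.
    """
    gaps = [f.name for f in fields(evidence) if not getattr(evidence, f.name)]
    return ("restore" if not gaps else "rebuild", gaps)


if __name__ == "__main__":
    decision, gaps = decide(RecoveryEvidence(True, False, True, True, True))
    print(decision, gaps)  # rebuild ['identity_trusted']
```

Returning the failing factors, not just the verdict, gives the bridge call something concrete to close: each gap is either evidence to gather or a risk someone has to accept.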
Do not invent the decision criteria during the incident
An incident bridge is a bad place to create a recovery policy. The pressure is real. Someone will ask for the fastest path, someone else will ask for the safest path, and the team may not have enough evidence to defend either one.
NIST’s contingency planning guidance makes one useful point: recovery needs plans, procedures, and technical measures put in place before the disruption, not improvised during it, as described on its contingency planning topic page. NIST CSF 2.0 makes the same point from the risk side: in the CSF 2.0 core, recovery depends on planning and testing across the broader cybersecurity program.
The practical takeaway is blunt: the restore-or-rebuild criteria belong in the recovery plan before encryption starts.
Restore when the evidence is good enough
Restore is the right path when speed does not require guesswork.
The team needs a recovery point that is recent enough for the business and validated for integrity. AWS Well-Architected recommends periodic recovery tests that verify restored data is available, uncorrupted, accessible, and inside RPO and RTO targets in REL09-BP04. CIS Control 11 anchors the same standard from the controls side: recovery is not done until assets and data are back in a documented, trusted state, as described in its data recovery control.
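The recovery-test criteria above (available, uncorrupted, accessible, inside RPO and RTO) can be checked mechanically after each drill. This is a minimal sketch under stated assumptions: the record fields are hypothetical, RPO is measured as the gap between the recovery point and the disruption, and RTO as the time from disruption to verified service.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RecoveryTestResult:
    """Hypothetical record of one periodic recovery test (field names are illustrative)."""
    recovery_point_time: datetime    # when the restored copy was captured
    disruption_time: datetime        # when the simulated disruption began
    service_verified_time: datetime  # when application-level checks passed
    checksums_match: bool            # restored data matches hashes recorded at backup time
    data_accessible: bool            # restored data could be mounted and read


def check_recovery_test(result: RecoveryTestResult, rpo: timedelta, rto: timedelta) -> list[str]:
    """Return the failed criteria; an empty list means the test passed."""
    failures = []
    data_loss = result.disruption_time - result.recovery_point_time
    downtime = result.service_verified_time - result.disruption_time
    if not result.data_accessible:
        failures.append("restored data was not accessible")
    if not result.checksums_match:
        failures.append("restored data failed integrity checks")
    if data_loss > rpo:
        failures.append(f"RPO missed: {data_loss} of data loss exceeds {rpo}")
    if downtime > rto:
        failures.append(f"RTO missed: {downtime} to verified service exceeds {rto}")
    return failures


if __name__ == "__main__":
    drill = RecoveryTestResult(
        recovery_point_time=datetime(2024, 5, 1, 9, 0),
        disruption_time=datetime(2024, 5, 1, 12, 0),
        service_verified_time=datetime(2024, 5, 1, 15, 30),
        checksums_match=True,
        data_accessible=True,
    )
    print(check_recovery_test(drill, rpo=timedelta(hours=4), rto=timedelta(hours=4)))  # []
```

Measuring RTO to verified service, rather than to "restore job finished", keeps the drill honest about when the business actually got the application back.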
For restore to be defensible, the team should be able to answer a few uncomfortable questions:
- Which recovery point is clean?
- How was that validated?
- Which identities, secrets, and privileged accounts will be used during recovery?
- Which tests prove the restored application works?
- What evidence will be kept for audit, insurance, legal, and post-incident review?
If those answers are current and rehearsed, restore can be the fastest path back. If they are being assembled from memory, the team is already late.
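"Current and rehearsed" can be tracked as data instead of memory. The sketch below records each question with a written answer and the date it was last rehearsed, then flags anything missing or stale. The storage format is hypothetical, and the `max_age` threshold is an assumption each team would set to match its own drill cadence.

```python
from datetime import date, timedelta

QUESTIONS = (
    "Which recovery point is clean?",
    "How was that validated?",
    "Which identities, secrets, and privileged accounts will be used during recovery?",
    "Which tests prove the restored application works?",
    "What evidence will be kept for audit, insurance, legal, and post-incident review?",
)


def unready_answers(
    answers: dict[str, tuple[str, date]], today: date, max_age: timedelta
) -> list[str]:
    """Return questions that are unanswered or were last rehearsed too long ago.

    `answers` maps a question to (written answer, date it was last rehearsed).
    """
    gaps = []
    for question in QUESTIONS:
        entry = answers.get(question)
        if entry is None or not entry[0].strip():
            gaps.append(question)        # never answered
        elif today - entry[1] > max_age:
            gaps.append(question)        # answered, but not rehearsed recently
    return gaps
```

Any question this returns is one the team would be answering from memory during the incident.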
Rebuild when trust is broken
Rebuild is the safer path when the system state is not trustworthy.
That usually means the operating system, application binaries, configuration, endpoint management tooling, identity layer, privileged access path, or deployment pipeline might be contaminated. In that situation, restoring a server image may restore the attacker’s working environment.
MITRE ATT&CK’s Data Encrypted for Impact technique notes that encryption malware may use valid accounts, credential dumping, and admin shares to propagate. MITRE’s Inhibit System Recovery technique lists behavior that deletes shadow copies, disables recovery, and removes backup catalogs. Those tactics turn a data restore problem into a trust problem.
Rebuild needs preparation, not heroics:
- Clean operating system media or protected base images.
- Known-good infrastructure definitions.
- Reproducible application deployment steps.
- A path to recover data without restoring compromised OS or application state.
- Tests that prove business function, not only host availability.
Rebuild is not slow by nature. It is slow when the organization has never treated it as a recovery path.
Settle identity before reconnecting applications
Identity is not another dependency on the list. It decides who can recover everything else.
Active Directory recovery is where a lot of tidy runbooks get exposed. A domain controller is not just another VM to roll back and reconnect. If the directory state or privileged access path is untrusted, restoring AD can recreate the control plane the attacker used.
Microsoft’s Active Directory forest recovery guidance says forest recovery requires restoring at least one domain controller in every domain from an available backup, and that the forest returns to the state of the last trusted backup. Microsoft also recommends a dedicated restore domain controller when planning forest recovery.
For virtualized domain controllers, Microsoft’s restore guidance says system state backups are required for disaster recovery, and that domain controllers should be backed up regularly, at least every 90 days. That 90-day minimum is not a ransomware recovery cadence. It is a floor for having a usable disaster recovery backup at all.
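Because forest recovery needs at least one restorable domain controller per domain, a useful pre-incident check is whether every domain has a DC with a system state backup inside that floor. This sketch assumes a hypothetical inventory of (domain, DC name, last backup time). It checks backup age only; it cannot tell whether a backup predates the compromise, which still needs its own evidence.

```python
from datetime import datetime, timedelta
from typing import Optional

# The 90-day floor cited above: a minimum, not a ransomware recovery cadence.
MAX_BACKUP_AGE = timedelta(days=90)


def domains_without_usable_backup(
    dcs: list[tuple[str, str, Optional[datetime]]], now: datetime
) -> list[str]:
    """Return domains with no DC whose system state backup is within MAX_BACKUP_AGE.

    `dcs` holds (domain, dc_name, last_system_state_backup or None).
    """
    all_domains = {domain for domain, _, _ in dcs}
    covered = {
        domain
        for domain, _, last_backup in dcs
        if last_backup is not None and now - last_backup <= MAX_BACKUP_AGE
    }
    return sorted(all_domains - covered)
```

An empty result means only that every domain clears the age floor. A non-empty result means forest recovery for those domains has no starting point today.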
Here is the operational detail teams often miss: a clean application restore can still sit idle if the directory path is untrusted, service accounts are suspect, privileged access has not been reset, or nobody can safely use the recovery credentials. Identity recovery needs its own drill, evidence set, and approval gate.
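A mapped dependency graph makes the "identity first" rule enforceable rather than aspirational. This sketch uses Python's standard `graphlib.TopologicalSorter` to order a hypothetical service map and holds every non-identity component until all identity components have passed their approval gate. The component names and the all-or-nothing gate are assumptions for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical service map: each component lists what must be recovered before it.
DEPENDENCIES = {
    "active_directory": set(),
    "privileged_access": {"active_directory"},
    "service_accounts": {"active_directory", "privileged_access"},
    "database": {"service_accounts"},
    "app_server": {"database", "service_accounts"},
    "web_frontend": {"app_server"},
}
IDENTITY = {"active_directory", "privileged_access", "service_accounts"}


def recovery_order(deps: dict[str, set[str]]) -> list[str]:
    """Order components so prerequisites come first (raises graphlib.CycleError on a cycle)."""
    return list(TopologicalSorter(deps).static_order())


def blocked_by_identity(order: list[str], approved: set[str]) -> list[str]:
    """Non-identity components that must wait because an identity gate is not yet approved."""
    if IDENTITY <= approved:
        return []
    return [component for component in order if component not in IDENTITY]


if __name__ == "__main__":
    order = recovery_order(DEPENDENCIES)
    print(order)
    print(blocked_by_identity(order, approved={"active_directory"}))
```

A `CycleError` here is itself a finding: it means the dependency map has no valid recovery order and needs to be fixed before the drill, not during the incident.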
Validate in isolation before production return
Production should not be the first place a restored system proves itself.
An isolated validation path lets the team inspect data, run malware and integrity checks, verify configuration, test application behavior, and confirm dependencies before reconnecting to production networks. Backup teams can prove the copy exists. Security teams can assess whether the recovery candidate is safe enough to advance. Application owners can prove the service actually works.
If one of those voices is missing, the recovery decision is under-evidenced.
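The three-voice rule can be encoded as a gate that refuses to advance a recovery candidate until every owner has signed off. The enum labels below paraphrase the roles described above; the structure itself is a hypothetical sketch, not a prescribed workflow.

```python
from enum import Enum


class Voice(Enum):
    """The three sign-offs described above (labels are illustrative)."""
    BACKUP = "backup team: the copy exists and can be restored"
    SECURITY = "security team: the candidate is safe enough to advance"
    APPLICATION = "application owner: the service actually works"


def missing_voices(signoffs: dict[Voice, bool]) -> list[Voice]:
    """Voices that have not signed off; any entry means the decision is under-evidenced."""
    return [voice for voice in Voice if not signoffs.get(voice, False)]


def may_return_to_production(signoffs: dict[Voice, bool]) -> bool:
    """True only when all three voices have signed off."""
    return not missing_voices(signoffs)


if __name__ == "__main__":
    pending = {Voice.BACKUP: True}
    for voice in missing_voices(pending):
        print("waiting on", voice.value)
```

Treating an absent sign-off the same as a "no" is deliberate: silence from one owner should block the gate, not pass it.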
Elastio’s role is to provide recovery-point evidence for that validation gate. Elastio’s Using iScan for Cloud and Local Resources documentation says iScan can scan AWS EBS volumes and snapshots, EC2 instances, AMIs, AWS Backup recovery points, EFS, S3 buckets, Azure VMs, Azure managed disks, snapshots, data protection recovery points, and local paths. The same article says AWS snapshot scan results are tagged as clean or infected.
When ransomware is suspected and Elastio has not been configured in advance, Elastio’s Ransomware 911 documentation describes a process to deploy Elastio, run threat hunts against backup recovery points, and identify the last clean backup across assets. Elastio’s 7 Layers of Ransomware Protection documentation also says its deterministic ransomware model detects ransomware encryption and identifies the ransomware family involved.
That kind of verdict belongs before production return, not after users report that the restored service looks wrong.
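Once recovery points carry clean or infected verdicts, picking the last clean point per asset is simple, but the selection rule matters. The sketch below assumes a hypothetical list of (timestamp, verdict) pairs per asset; it does not reflect Elastio's actual output format or API. Two design choices are assumptions worth debating in a drill: unscanned points are never treated as clean, and a "clean" point newer than the earliest infected point is not trusted, so the result is the newest clean point that predates any infection.

```python
from datetime import datetime
from typing import Optional

# Verdict is "clean", "infected", or None when the point has not been scanned.
Point = tuple[datetime, Optional[str]]


def last_clean_point(points: list[Point]) -> Optional[datetime]:
    """Newest point scanned clean that predates the earliest infected point."""
    infected = [t for t, verdict in points if verdict == "infected"]
    cutoff = min(infected) if infected else None
    clean = [
        t for t, verdict in points
        if verdict == "clean" and (cutoff is None or t < cutoff)
    ]
    return max(clean) if clean else None


def last_clean_by_asset(by_asset: dict[str, list[Point]]) -> dict[str, Optional[datetime]]:
    """Apply last_clean_point to every asset; None means no defensible clean point."""
    return {asset: last_clean_point(points) for asset, points in by_asset.items()}
```

Assets that come back as `None` are the ones where restore has no evidence behind it, which points back to the rebuild column of the decision table.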
Objections worth answering
Is rebuild always safer than restore?
No. Rebuild is the better option when there is no trust in the compromised system state, but it still needs clean data, trusted identity, validated dependencies, protected deployment sources, and tested steps. A poorly prepared rebuild path can be slower and less reliable than a well-tested restore.
Why not just restore the VM and let EDR clean it?
EDR can be part of validation, but it does not prove the restored data is clean, the directory is trusted, the service accounts are safe, or the recovery point predates the attacker’s changes. Treat it as one signal, not the recovery decision.
When should the incident commander stop waiting for more evidence?
When the business owner, security owner, and application owner can name the risk they are accepting. If the team cannot identify the clean point, identity path, dependency order, and validation result, the decision is not informed. It is a bet.
Run the next ransomware drill around one Tier 1 application. Force the team to choose restore or rebuild using only the evidence it has today.
Sources
[1] NIST, Contingency Planning
[2] NIST, Cybersecurity Framework (CSF) 2.0, 2024
[3] Amazon Web Services, AWS Well-Architected Framework, REL09-BP04: Perform periodic recovery of the data to verify backup integrity and processes
[4] Center for Internet Security, CIS Control 11: Data Recovery
[5] MITRE ATT&CK, T1486: Data Encrypted for Impact
[6] MITRE ATT&CK, T1490: Inhibit System Recovery
[7] Microsoft, AD Forest Recovery: Determine How to Recover the Forest
[8] Microsoft, Restore a Virtualized Domain Controller
[9] Elastio, Using iScan for Cloud and Local Resources
[10] Elastio, Ransomware 911
[11] Elastio, Elastio’s 7 Layers of Ransomware Protection