Protecting Applications From CI/CD Pipeline Failure Cases

Our pipeline consists of a Source step and a mock Deploy step, which is just about the most basic pipeline possible.

Normally, we would also have build steps, approval gates, or calls to other services, but we’ll work with what we have. We want to modify this process to incorporate Elastio asset protection and recovery into the pipeline. The goal is to ensure that we can recover the application in the event of a pipeline failure.

In this example, we have an EC2 instance hosting the application and our Postgres database with its precious data. We’ll need to add a few things: a step to back up the EC2 instance and the database, one to wait for those backups to finish, and one to restore if [read: when] our deployment fails. To do this, we will use a generic CodeBuild job to run CLI commands.

What we want is something like this:

In the added protection stage, the Elastio CLI starts an EC2 backup in the background. The backup is managed by the Elastio Job scheduler, and control returns to the build right away. While that backup is running, we perform a database dump with the pg_dump command and pipe its output to the Elastio CLI as a stream backup.
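The two moves in this stage, starting a job without blocking and streaming one command's output into another, can be sketched in Python as follows. This is a minimal illustration, not the code from our proof of concept: the helper names are hypothetical, and the command lists are supplied by the caller because the exact Elastio CLI arguments are not shown here and should be taken from the Elastio documentation.

```python
import subprocess


def start_in_background(cmd):
    """Launch a long-running command and return its handle without waiting."""
    return subprocess.Popen(cmd)


def pipe_commands(producer_cmd, consumer_cmd):
    """Stream the producer's stdout into the consumer's stdin.

    Returns (producer_exit_code, consumer_exit_code) so that a failed dump
    is not hidden behind a successful upload.
    """
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout)
    # Close our copy of the pipe so the producer is notified if the
    # consumer exits early.
    producer.stdout.close()
    consumer_rc = consumer.wait()
    producer_rc = producer.wait()
    return producer_rc, consumer_rc
```

In the pipeline, the producer would be the pg_dump invocation and the consumer the Elastio stream backup command. Checking both exit codes matters because a plain shell pipe reports only the status of the last command unless pipefail is set.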

Once that completes, the next stage of the pipeline waits for the background job to finish by pausing and then periodically polling the job status. Once the job has completed, the next stage is initiated. Our mock deployment fails and the CodeBuild job detects it. As configured, we move to a self-healing solution by initiating a restore of both the EC2 instance and the Postgres database. This leaves the environment in the same state it was in prior to the deployment.
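The wait-then-poll behaviour described above might look roughly like the sketch below. It assumes a caller-supplied `get_status` function that returns the job state as a string. The function name, the status values ("running", "succeeded", "failed") and the default timings are all hypothetical placeholders, not Elastio's actual job states.

```python
import time


def wait_for_job(get_status, initial_delay=10.0, interval=30.0, timeout=3600.0,
                 done=("succeeded",), failed=("failed",), sleep=time.sleep):
    """Wait once, then poll get_status() until the job finishes.

    Raises RuntimeError if the job reports a failed state and TimeoutError
    if it is still unfinished after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    sleep(initial_delay)  # give the job time to start before the first poll
    while True:
        status = get_status()
        if status in done:
            return status
        if status in failed:
            raise RuntimeError(f"backup job ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"backup job still {status!r} after {timeout}s")
        sleep(interval)
```

A timeout is worth including even in a proof of concept, since a stage that polls forever would hold the pipeline open until the build itself times out.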

Though the deployment in this example is a no-op, a setup like this can protect the code base and current running configuration (the EC2 instance) as well as the data, by quickly undoing any migrations that left the database in an inconsistent state when a deployment fails.
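The self-healing step, restoring the instance and the database when the deployment fails, reduces to a small control-flow pattern. The sketch below is an assumption about how one might structure it, not the code from our CodeBuild job: `deploy` and each restore step are hypothetical callables that raise an exception on failure.

```python
def deploy_with_rollback(deploy, restore_steps):
    """Run deploy(); if it raises, run every restore step in order.

    Returns True when the deployment succeeded and False when it failed
    and the restores were run. An exception raised by a restore step is
    not caught, so a failed recovery stays visible.
    """
    try:
        deploy()
    except Exception as exc:
        print(f"deployment failed ({exc}); starting restore")
        for step in restore_steps:
            step()
        return False
    return True
```

Here `restore_steps` would hold two callables, one restoring the EC2 instance and one restoring the Postgres database. Whether the pipeline should still be marked as failed after a successful restore is left to the caller.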

The entire proof of concept code can be downloaded here:

This approach has saved us hours of manual work and hardened our deployment process.