Stedi
Blog April 13, 2022

Parallel CDK stack deployments with GitHub Actions

Ken Winner

One deployment per hour is far too slow. Our initial deployment style starts with a paths filter to limit which stacks deploy, based on the files and folders changed in a commit. We start our project this way thinking it will give us quick deploys for infrequently changed stacks. As our project grows, though, deployments take too long. Based on our analysis, our fastest recorded deployment time is 13.5 minutes, but our slowest deployments take up to 40 minutes. We are confident we can get our p99 down to 20 minutes or less.

How

Here on the Billing team, our primary application is a single monorepo with 12 CDK stacks deployed using GitHub Actions. When we dive into the pipeline, we realize that a number of redundant steps are increasing deployment times. These duplicate steps are a result of the way CDK deploys dependent stacks.

[Figure: Stack diagram]

For instance, let’s take four of our stacks: Secrets, API, AsyncJobs, and Dashboard. The API stack relies on Secrets, while Dashboard relies on API and AsyncJobs. If we only need to update the Dashboard stack, CDK will still force a no-op deployment of Secrets, API, and AsyncJobs. This pipeline will always start from square one and run the full deployment graph for each stack.
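To make the redundancy concrete, here is a minimal sketch of how the set of deployed stacks expands when only Dashboard is requested. The dependency map is a hypothetical simplification of the four stacks described above (AsyncJobs is given no dependencies here for brevity), and deploymentClosure is an illustrative helper, not part of CDK.

```javascript
// Hypothetical dependency map mirroring the four stacks in the example.
const dependencies = {
  Secrets: [],
  API: ['Secrets'],
  AsyncJobs: [], // simplified: assumed to have no dependencies in this sketch
  Dashboard: ['API', 'AsyncJobs'],
};

// Collect a stack plus everything it transitively depends on,
// with dependencies listed before the stacks that need them.
function deploymentClosure(stack, deps, seen = new Set()) {
  for (const dep of deps[stack] ?? []) {
    if (!seen.has(dep)) {
      deploymentClosure(dep, deps, seen);
    }
  }
  seen.add(stack);
  return [...seen];
}

console.log(deploymentClosure('Dashboard', dependencies));
// prints [ 'Secrets', 'API', 'AsyncJobs', 'Dashboard' ]
```

Asking for one stack yields all four, which is why a Dashboard-only change still walks through three no-op deployments.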

We believe we can speed up our slowest deployment time by reducing these redundant deployment steps. Our first improvement is to modify the paths filter so that it triggers a custom “deploy all” CDK command if certain files (e.g. package.json) change. We don’t use the standard cdk deploy --all because some of the listed stacks are ones we don’t want to deploy. Instead, we use cdk deploy --exclusively Secrets-Stack API-Stack AsyncJobs-Stack Dashboard-Stack, which deploys only the named stacks without pulling in their dependencies. This decreases our median deployment time by a full minute and our slowest deployment time by 10 minutes.
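As a rough sketch of the “deploy all” decision, assuming the changed files for a commit are already available as repo-relative paths: the trigger list and the deployAllCommand helper are hypothetical names for illustration, and only the stack names and the package.json example come from the text above.

```javascript
// Stacks we actually want to deploy (names taken from the command above).
const DEPLOYABLE_STACKS = [
  'Secrets-Stack',
  'API-Stack',
  'AsyncJobs-Stack',
  'Dashboard-Stack',
];

// Hypothetical list of files whose change should redeploy every stack.
const DEPLOY_ALL_TRIGGERS = ['package.json'];

// Return the "deploy all" command when a trigger file changed, or null
// so the caller can fall back to the regular per-stack paths filter.
function deployAllCommand(changedFiles) {
  const triggered = changedFiles.some((file) => DEPLOY_ALL_TRIGGERS.includes(file));
  return triggered
    ? `cdk deploy --exclusively ${DEPLOYABLE_STACKS.join(' ')}`
    : null;
}

console.log(deployAllCommand(['package.json', 'README.md']));
```

Listing the stacks explicitly keeps the unwanted stacks out of the deployment, which cdk deploy --all would not.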

With our slowest times still clocking in around 30 minutes, we know we need a new idea to reach our goal of 20 minutes or less. We have reached the speed limit for serial deployments, so we attempt to parallelize the deployments. We use the CDK stack dependency graph, which gives us a simple mechanism to generate parallel jobs. We are now able to build the GitHub Actions jobs and link them together with the stack dependencies using each job’s needs: [...] option.

To generate the stack graph, we synthesize the stacks (cdk synth) for each stage we'd like to deploy and then parse the resulting manifest.json in the cdk.out directory.

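const { execSync } = require('child_process');
const fs = require('fs');
const path = require('path');
const { parseManifest } = require('./stackDeps');

// Stages to synthesize; each stage gets its own stack graph.
const stages = ['demo'];
const stackGraphs = {};

stages.forEach((stage) => {
  // Synthesize the stacks for this stage. Output is discarded because
  // only the manifest written to cdk.out is needed.
  execSync(`STAGE=${stage} npx cdk synth`, {
    stdio: ['ignore', 'ignore', 'ignore'],
  });
  // Parse the manifest.json that the synth step just wrote to cdk.out.
  stackGraphs[stage] = parseManifest();
});

// Write the graphs for all stages to generated/graph.json.
const data = JSON.stringify(stackGraphs, undefined, 2);
fs.writeFileSync(path.join(__dirname, '..', 'generated', 'graph.json'), data);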
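The parseManifest helper from ./stackDeps is not shown in this post. A minimal sketch of what such a function might look like follows. It assumes the manifest lists artifacts keyed by id, that stack artifacts have the type aws:cloudformation:stack, and that the environment string has the form aws://account/region. Unlike the call above, this version takes the already parsed manifest as an argument so it can run without a cdk.out directory, and sampleManifest is made-up data.

```javascript
// Sketch only: turn a parsed cdk.out/manifest.json into a stack graph.
function parseManifest(manifest) {
  const artifacts = manifest.artifacts ?? {};
  const isStack = (id) => artifacts[id]?.type === 'aws:cloudformation:stack';

  const stacks = Object.keys(artifacts)
    .filter(isStack)
    .map((id) => {
      const artifact = artifacts[id];
      return {
        id,
        // Fall back to the artifact id when no explicit stack name is set.
        name: artifact.properties?.stackName ?? id,
        // "aws://account/region" -> "region"
        region: artifact.environment.split('/').pop(),
        // Keep only dependencies on other stacks (drop asset manifests etc.).
        dependencies: (artifact.dependencies ?? []).filter(isStack),
      };
    });

  return { stacks };
}

// Made-up, trimmed manifest for illustration.
const sampleManifest = {
  artifacts: {
    'Secrets-demo.assets': { type: 'cdk:asset-manifest' },
    'Secrets-demo': {
      type: 'aws:cloudformation:stack',
      environment: 'aws://unknown-account/us-east-1',
      dependencies: ['Secrets-demo.assets'],
    },
    'Api-demo': {
      type: 'aws:cloudformation:stack',
      environment: 'aws://unknown-account/us-east-1',
      dependencies: ['Secrets-demo'],
    },
  },
};

console.log(JSON.stringify(parseManifest(sampleManifest), undefined, 2));
```

Filtering dependencies down to stack artifacts keeps the graph limited to edges that matter for job ordering.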

Our stack graph:

{
  "demo": {
    "stacks": [
      {
        "id": "Secrets-demo",
        "name": "Secrets-demo",
        "region": "us-east-1",
        "dependencies": []
      },
      {
        "id": "Datastore-demo",
        "name": "Datastore-demo",
        "region": "us-east-1",
        "dependencies": []
      },
      {
        "id": "AsyncJobs-demo",
        "name": "AsyncJobs-demo",
        "region": "us-east-1",
        "dependencies": [
          "Datastore-demo",
          "Secrets-demo"
        ]
      },
      {
        "id": "Api-demo",
        "name": "Api-demo",
        "region": "us-east-1",
        "dependencies": [
          "Datastore-demo",
          "Secrets-demo"
        ]
      },
      {
        "id": "Dashboards-demo",
        "name": "Dashboards-demo",
        "region": "us-east-1",
        "dependencies": [
          "AsyncJobs-demo",
          "Api-demo"
        ]
      }
    ]
  }
}
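To see how much parallelism this graph allows, the sketch below groups the stacks into "waves" in which every stack's dependencies have already been deployed. deploymentWaves is an illustrative helper of our own naming. GitHub Actions does not need this grouping, since each job starts as soon as the jobs it needs have finished, but it shows the shape of the schedule.

```javascript
// The demo stage from the stack graph, reduced to ids and dependencies.
const stacks = [
  { id: 'Secrets-demo', dependencies: [] },
  { id: 'Datastore-demo', dependencies: [] },
  { id: 'AsyncJobs-demo', dependencies: ['Datastore-demo', 'Secrets-demo'] },
  { id: 'Api-demo', dependencies: ['Datastore-demo', 'Secrets-demo'] },
  { id: 'Dashboards-demo', dependencies: ['AsyncJobs-demo', 'Api-demo'] },
];

// Repeatedly peel off the stacks whose dependencies are all satisfied.
function deploymentWaves(stackList) {
  const remaining = new Map(
    stackList.map((stack) => [stack.id, new Set(stack.dependencies)])
  );
  const waves = [];
  while (remaining.size > 0) {
    const ready = [...remaining.keys()].filter((id) => remaining.get(id).size === 0);
    if (ready.length === 0) {
      throw new Error('Dependency cycle or unknown dependency in stack graph');
    }
    for (const id of ready) remaining.delete(id);
    // Mark the stacks in this wave as satisfied for everything still waiting.
    for (const deps of remaining.values()) {
      for (const id of ready) deps.delete(id);
    }
    waves.push(ready);
  }
  return waves;
}

console.log(deploymentWaves(stacks));
```

For the demo graph this gives three waves: Secrets-demo and Datastore-demo, then AsyncJobs-demo and Api-demo, then Dashboards-demo. Five serial deployments collapse into three steps.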

The workflow below is easy to define programmatically from the stack graph, which allows GitHub Actions to do all the heavy lifting of orchestrating the jobs for us.

[Figure: GitHub Actions workflow]
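One way to define such a workflow from the stack graph is sketched below. It is not the generator from the sample repo. The job id scheme, the runner label, the checkout and install steps, and the choice to deploy each stack with --exclusively are all assumptions for illustration, and a real workflow would also need AWS credentials configured.

```javascript
// Job ids are kept to letters, digits, "-" and "_" to stay valid.
const jobId = (stackId) => `deploy-${stackId.replace(/[^A-Za-z0-9_-]/g, '-')}`;

// Build one job per stack and link jobs through `needs`.
function buildJobs(stage, stacks) {
  const jobs = {};
  for (const stack of stacks) {
    const job = {
      'runs-on': 'ubuntu-latest',
      steps: [
        { uses: 'actions/checkout@v3' },
        { run: 'npm ci' },
        // Upstream jobs deploy the dependencies, so deploy only this stack.
        { run: `npx cdk deploy --exclusively ${stack.name}`, env: { STAGE: stage } },
      ],
    };
    // Only add `needs` when the stack actually depends on other stacks.
    if (stack.dependencies.length > 0) {
      job.needs = stack.dependencies.map(jobId);
    }
    jobs[jobId(stack.id)] = job;
  }
  return jobs;
}

// Trimmed version of the demo stage from the stack graph.
const demoStacks = [
  { id: 'Secrets-demo', name: 'Secrets-demo', dependencies: [] },
  { id: 'Api-demo', name: 'Api-demo', dependencies: ['Secrets-demo'] },
  { id: 'AsyncJobs-demo', name: 'AsyncJobs-demo', dependencies: ['Secrets-demo'] },
  { id: 'Dashboards-demo', name: 'Dashboards-demo', dependencies: ['AsyncJobs-demo', 'Api-demo'] },
];

const workflow = { name: 'deploy-demo', on: { push: { branches: ['main'] } }, jobs: buildJobs('demo', demoStacks) };
console.log(JSON.stringify(workflow, undefined, 2));
```

Because the dependency edges map directly onto needs, GitHub Actions handles the scheduling: jobs without needs start immediately, and each remaining job starts once its upstream jobs succeed.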

Due to network latency and variance in the GitHub runner setup, this change sometimes causes our fastest deployments to slow down. However, the median deployment is 1 minute faster. Most importantly, our p99 deployment time is consistently 12 minutes faster than before: 18 minutes!

At last we achieved our goal! With everything sped up and working smoothly, we are able to add it to our team’s projen setup and make this available to all of our other services.

How to get started

To see how this works, you can find a full working sample on GitHub.

The sample repo provides an unopinionated example with just one stage to deploy. You could build on this in a few ways, such as specifying stage orders, implementing integration tests, or adding whatever else is needed in your stack.
