AWS at Scale #4: Building at Scale without Compromise

I’ve spent years doing AWS at scale. So here’s a blueprint for product hosting at scale, to support development at velocity, all on AWS.

Time to Read: 5 minutes, AI Content: Zero

I don’t use AI for writing content. I write how my brain works (I’m also neurodiverse), so sometimes my posts are structured a bit weird.

Introduction

Building on AWS at scale can be mind-boggling at times. There are so many experts out there offering their advice, opinions and best practices on the subject that it’s easy to get lost in the noise and forget about the core building blocks and decisions that will pay dividends in the future. With that said, here is my advice and opinion on the subject! 😆

How much of this lands will depend on where you are on your AWS/cloud architecture journey. I’m not expecting you to go and implement this concept; it’s a glimpse into how to host and develop product on AWS at an industrial scale.

Figure: Product development at scale on AWS

The Tech Stack

  • Platform: AWS Cloud

    • AWS Organizations

    • AWS Control Tower

    • Amazon EventBridge

  • DevOps foundation building blocks: Gruntwork

    • Enhanced AWS account vending

    • Battle tested consumer IaC modules

    • Consumer and Provider pipelines

  • Enhanced IaC, pipelines and repo functionality: Terragrunt

  • Zero trust perimeter access: Cloudflare

  • Code hosting, actions, security & vulnerability scanning: GitHub

So what’s going on here

1) The Provider IaC Module Directory

Hundreds of battle tested commodity IaC modules are wrapped, documented and made available for consumption. There’s no need to write your own IaC: all that’s needed to consume a module is a wrapper that references the remote module and provides the configuration parameters.

Key points:

  • Consumers browse the modules, then select and scaffold the ones they need for immediate consumption. Adding them to their consumer repo deploys AWS resources at scale across dev, stage or prod within minutes.

  • Centralised modules are version controlled and tagged.

  • Centralised modules are a trusted source (over external modules) for battle tested IaC.

  • Consumers are notified when consumed modules require updates (and are expected to update their code within a given timeframe).

  • Consumers do have the ability to bring their own IaC if needed.
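The update notification in the key points implies some way to compare the version a consumer has pinned against the latest tagged release. Here is a minimal sketch in Python, assuming modules are referenced with Terraform’s git source syntax (`...git//path?ref=vX.Y.Z`) and tagged with semantic versions. The repository URL, module path and function names are hypothetical and not taken from the Gruntwork tooling.

```python
import re
from typing import NamedTuple, Tuple

Version = Tuple[int, int, int]

# Assumed source format: git::<repo>.git//<module path>?ref=v<major>.<minor>.<patch>
SOURCE_RE = re.compile(
    r"\.git//(?P<module>[^?]+)\?ref=v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)$"
)


class Pin(NamedTuple):
    module: str       # path of the module inside the central repo
    version: Version  # pinned tag as (major, minor, patch)


def parse_source(source: str) -> Pin:
    """Extract the module path and pinned version from a module source string."""
    match = SOURCE_RE.search(source)
    if match is None:
        raise ValueError(f"unrecognised module source: {source}")
    version = (int(match["major"]), int(match["minor"]), int(match["patch"]))
    return Pin(match["module"], version)


def needs_update(source: str, latest: Version) -> bool:
    """True when the consumer's pinned tag is older than the latest release."""
    return parse_source(source).version < latest


# Hypothetical wrapper source pinned to tag v1.4.0
EXAMPLE_SOURCE = (
    "git::https://github.com/example-org/iac-modules.git//modules/s3-bucket?ref=v1.4.0"
)
```

A scheduled job could run `needs_update` against every wrapper in a consumer repo and raise an issue for each stale pin.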

2) The Consumer Repo

A single ‘consumer’ repo spans dev, stage and prod accounts. Each AWS account is represented as a folder within the repo.

IaC Consumer Repo - AWS at Scale
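Because each account is a top-level folder, a pipeline can work out which accounts a change touches from the changed file paths alone. This is a rough sketch of that idea, assuming the folders are literally named `dev`, `stage` and `prod`; the folder names and the function are illustrative, not part of the actual pipeline.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Assumed folder names, listed in promotion order
PROMOTION_ORDER = ["dev", "stage", "prod"]


def accounts_touched(changed_paths: Iterable[str]) -> List[str]:
    """Return the account folders affected by a change, in promotion order."""
    touched = set()
    for path in changed_paths:
        parts = PurePosixPath(path).parts
        if parts:
            touched.add(parts[0])  # top-level folder represents the account
    # Files outside the account folders (e.g. a README) are ignored
    return [account for account in PROMOTION_ORDER if account in touched]
```

For example, a change to `dev/vpc/terragrunt.hcl` and `prod/s3/terragrunt.hcl` would yield `dev` followed by `prod`, so dev is always planned first.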

OIDC connected GitHub Actions, combined with a pre-defined CI/CD pipeline and Terragrunt integration, provide promotional workflows that deploy AWS resources into the relevant account as code moves from dev to stage and finally into prod, all within a single repo.

Key points:

  • Consumers create clean Terraform HCL wrappers that reference remote modules, so resources can be deployed immediately without writing new IaC.

  • Access outside of the pipeline should be restricted to promote the standard of ‘everything as code’ over console access.
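The OIDC connection mentioned above comes down to an IAM role in each account whose trust policy only accepts tokens issued by GitHub for a specific repo. Below is a hedged sketch of that trust policy, built as a Python dict. It assumes each account maps to a GitHub environment of the same name; the account ID, repo name and function name are placeholders.

```python
import json

GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id: str, repo: str, environment: str) -> dict:
    """Trust policy letting one repo/environment assume a role via GitHub OIDC."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Token must be intended for AWS STS
                        f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com",
                        # Token must come from this repo and GitHub environment
                        f"{GITHUB_OIDC_HOST}:sub": f"repo:{repo}:environment:{environment}",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID and repo name
    policy = github_oidc_trust_policy("111111111111", "example-org/consumer-repo", "prod")
    print(json.dumps(policy, indent=2))
```

Scoping the `sub` claim per environment means a workflow running for dev cannot assume the prod role, even though both live in the same repo.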

3) SDLC AWS Account Patterns

For higher tiered workloads (stuff that’s pretty important to the business for various reasons), dev, stage and prod accounts are segregated at the top level, like a ship’s bulkheads.

Your workload and AWS account structure should be designed like a bulkhead

If Dev is compromised (more developers have access to it), the compromise doesn’t bleed into Stage or Prod. Cost codes are tagged at the account level, providing clear consumption costs across all environments for chargeback.

Key points:

  • Accounts and subsequent environments are all workload specific, no mixing of unrelated workloads or environments within the same account.

  • Prod is obviously prod: access is significantly restricted, and observability, security, threat modelling and other SRE tooling are deployed to support the criticality of the workload within this account.

  • Stage is a mirror of production, used for UAT/QA and load testing.

  • Dev is a place where new features can be tested against the code base.

  • Although Sandbox isn’t visible within this account pattern, it is a different use case from Dev and the two shouldn’t be confused.

  • Zero trust perimeter access is configured for external facing URIs in Dev & Stage.
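Tagging cost codes at the account level, as described earlier in this section, makes chargeback a simple roll-up: every pound or dollar spent in an account belongs to that account’s cost code. Here is a small sketch of the roll-up; the account names, cost codes and spend figures used with it are made up purely for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each record is (account name, cost code tagged on the account, spend)
AccountCost = Tuple[str, str, float]


def chargeback(account_costs: Iterable[AccountCost]) -> Dict[str, float]:
    """Sum spend per cost code across dev, stage and prod accounts."""
    totals: Dict[str, float] = defaultdict(float)
    for _account, cost_code, spend in account_costs:
        totals[cost_code] += spend
    return dict(totals)


# Illustrative figures only
EXAMPLE_COSTS = [
    ("checkout-dev", "CC-100", 120.0),
    ("checkout-stage", "CC-100", 300.0),
    ("checkout-prod", "CC-100", 900.0),
    ("search-prod", "CC-200", 450.0),
]
```

Because no account mixes unrelated workloads, there is no need to split a bill by resource tags before charging it back.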

4) The Provider Repo

Centrally governed resources such as reference architecture based VPCs, AWS Config, account security baselines and mandatory tagging are deployed through an OIDC connected GitHub Actions based ‘provider’ pipeline.

IaC Provider Repo - AWS at Scale

  • The provider pipeline and repo are attached when the account is vended.

  • Centrally governed resources are deployed through automation once the accounts successfully vend.

  • Once provisioned, the VPC configuration can’t be changed (SCP governed). It is designed to cater for all workload scenarios and requirements.

  • Mandatory tagging is deployed with values during the vending process and can only be updated centrally through automated approval workflows.

  • Consumers (workload builders) do not have access to this repo, they deploy AWS workload specific resources through their own consumer repos as outlined above.
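The SCP governed VPC lock described in the list above could look something like the following sketch: a service control policy that denies VPC changes to every principal except the provider pipeline role. The role ARN pattern and the exact list of denied actions are assumptions for illustration; a real policy would need a fuller action list.

```python
import json
from typing import List

# A partial, illustrative set of EC2 actions that alter VPC configuration
VPC_CHANGE_ACTIONS: List[str] = [
    "ec2:DeleteVpc",
    "ec2:ModifyVpcAttribute",
    "ec2:CreateSubnet",
    "ec2:DeleteSubnet",
    "ec2:CreateRoute",
    "ec2:DeleteRoute",
]


def vpc_lockdown_scp(provider_role_arn_pattern: str) -> dict:
    """SCP denying VPC changes unless made by the provider pipeline role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyVpcChangesOutsideProviderPipeline",
                "Effect": "Deny",
                "Action": VPC_CHANGE_ACTIONS,
                "Resource": "*",
                "Condition": {
                    # Everyone except the matching role is denied
                    "ArnNotLike": {"aws:PrincipalArn": provider_role_arn_pattern}
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical role name for the provider pipeline
    scp = vpc_lockdown_scp("arn:aws:iam::*:role/provider-pipeline")
    print(json.dumps(scp, indent=2))
```

Attaching a policy like this at the organisational unit level means consumers keep full control of their workload resources while the network baseline stays fixed.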

5) Blue Green Deployments

Container building, versioning and release to Dev, Stage and Production is done outside of the SDLC AWS account pattern, using a standalone account.

  • Further segregation and separation of concerns.

  • Images for Dev & Stage are not stored in Prod (or any other SDLC account).

  • Image building and orchestration is automated.

  • Images are scanned for vulnerabilities.

  • Allows for different permissions than Dev and Stage accounts.
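One way to picture the standalone image account: each environment gets its own ECR repository, and the repository policy grants pull-only access to the matching SDLC account. The sketch below builds those policies. The repository naming scheme and account IDs are hypothetical, and the post does not confirm that ECR repository policies are the mechanism used.

```python
from typing import Dict

# Actions needed to pull an image; nothing here allows pushing
PULL_ACTIONS = [
    "ecr:BatchCheckLayerAvailability",
    "ecr:BatchGetImage",
    "ecr:GetDownloadUrlForLayer",
]


def pull_only_policy(consumer_account_id: str) -> dict:
    """Repository policy allowing a single account to pull images."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowPullFromSdlcAccount",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{consumer_account_id}:root"},
                "Action": PULL_ACTIONS,
            }
        ],
    }


def repo_policies(accounts: Dict[str, str]) -> Dict[str, dict]:
    """Map each environment's repository (assumed name '<env>/images') to its policy."""
    return {f"{env}/images": pull_only_policy(acct) for env, acct in accounts.items()}
```

With this shape, the Dev account can only pull from the dev repository, and only the build pipeline inside the standalone account can push.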

6) Dev, Stage and Prod Users

Access to URIs in Dev and Stage is controlled by zero trust perimeter boundaries, providing a simple authentication strategy for UAT/QA and load testing.

7) AWS at Scale by Lee Wynne

I hope you’ve found this useful. If so, please share the link to the web version of our newsletter on your socials for anyone else to discover.

You can also find me on LinkedIn and on X

All the best, Lee
