Hello everyone! My name is David and I haven’t posted in like 6 weeks. Helping a customer recover from ransomware can be a consuming and exhausting task. Additionally, I was fortunate to be selected as a delegate for Cloud Field Day 5 in Silicon Valley a couple weeks ago and it too can be a consuming and exhausting task, but in a much, much, much better way than ransomware. I hope to write about CFD5 very soon but this topic has been on my mind for some time and it’s causing “blockage” in that I don’t believe I can write about CFD5 until I get this post done.
The idea for this post originated in some questions asked by a new AWS customer and the one we’ll look at today is, “How would I restore my VPC to another region?”
NOTE: For this post, I’m going to focus on CloudFormation as the IaC (Infrastructure-as-Code) mechanism though the concepts should be applicable to any IaC tool.
My First CloudFormation Template
Firsts are always special. When you sit back and daydream about days past, do you fondly remember your first Atari, your first dog, your first kiss, your first car, your first house... your first CloudFormation template? You can be sure I remember my first CloudFormation template like it was yesterday. I created a template to deploy a VPC with multiple subnets in multiple AZs, routing tables, an internet gateway, NAT gateways, Network ACLs, an S3 endpoint and so on. It was GLORIOUS!!! and I was beyond thrilled to become a part of the dev/ops engineer community.
So, how do I restore my VPC to another region? It’s easy. I just deploy my CloudFormation template into that region and, depending on the reason for the VPC restore, I’m ready to build a test/dev environment that mirrors my production environment or I can begin to progress through a DR plan to get resources up and running. But... not having completed my metamorphosis into a dev/ops engineer, I made some mistakes with regard to managing my AWS environment “as-code”.
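To make "just deploy my CloudFormation template into that region" concrete, here is a minimal Python sketch. It assumes the boto3 SDK and a template that doesn't hard-code region-specific values such as Availability Zone names. The function name, stack name, file name, and region in the usage comment are all hypothetical. The client is passed in so the same function works against any region.

```python
def deploy_vpc_stack(cfn_client, stack_name, template_body, parameters=None):
    """Create a stack from a template and wait until creation finishes.

    cfn_client is assumed to be a boto3 CloudFormation client that was
    created for the target region.
    """
    response = cfn_client.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        # CloudFormation expects parameters as a list of key/value dicts.
        Parameters=[
            {"ParameterKey": key, "ParameterValue": value}
            for key, value in (parameters or {}).items()
        ],
    )
    # Block until the stack reaches CREATE_COMPLETE (raises on failure).
    cfn_client.get_waiter("stack_create_complete").wait(StackName=stack_name)
    return response["StackId"]


# Hypothetical usage (needs boto3 and AWS credentials):
#
#   import boto3
#   cfn = boto3.client("cloudformation", region_name="us-west-2")
#   with open("vpc.yaml") as f:
#       deploy_vpc_stack(cfn, "vpc-restore", f.read())
```

Pointing the client at a different region is the only thing that changes between the original deployment and the "restore", which is the whole appeal of keeping the environment as code.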
A Common Mistake?
My mistake was that, though I built my initial AWS environment with a CloudFormation template (with code), I didn’t continue to manage it as code. Instead of making changes to my CloudFormation stack using a modified template, I reverted to my pre-dev/ops “just login and fix it” mentality. Outside of the CloudFormation stack, I made changes to subnets, routing tables, NACLs, security groups, you name it. So if I wanted to restore my VPC, as it exists today, to another region using my CloudFormation template, I couldn’t easily do it. To match my existing VPC configuration in a new region, there would be additional work beyond simply deploying my VPC stack, because my actual environment has “drifted” from the original CloudFormation template.
To be honest, I don’t know for certain if this is a common problem or not, but if I were forced to, I would bet this is a fairly common occurrence when what I’ll call “traditional deploy/break/fix data center engineers” begin transitioning to a dev/ops mentality. Before the rise of dev/ops, I troubleshot a problem, diagnosed the issue, and resolved it through a management console or script/CLI. For me personally, it’s taken some time to get used to the new cycle: troubleshoot, diagnose, update JSON/YAML code, update stack, and test.
Getting Back to the Original Question
So, restoring your VPC configuration from one region to another can be easy or hard, so consider the following:
- Restoring your VPC configuration to another region will be easy if you can ensure all changes have been made programmatically “as code” through CloudFormation stack updates (or whatever you may use to build your environments). In this case, you can duplicate your environment by simply deploying the VPC stack to another region. This would certainly be the ideal case.
- However, can you be 100% sure that your organization conforms to the ideal? Now, I imagine that you won’t wait for a disaster to test the restoration of a VPC, so you’ll probably have at least some awareness of how aligned your environment is with your CloudFormation stack. To gain insight into specific differences, you can now use CloudFormation Drift Detection. Though I love the idea, in my personal experience the actual results of drift detection have been somewhat of a mixed bag. For example, if I manually remove a resource created by a CF template, drift detection will inform me that my environment has drifted because a resource has been removed. But if I manually add a resource (i.e., without updating the CF stack) and run drift detection, I typically get one of two results. One, I’m told my environment is “IN SYNC” or two, I get a failure message because the “rate was exceeded”. I opened a case but haven’t received a really good answer as to why this happens; if I get one, I’ll update you. It’s not my intention to discourage you from using CF drift detection. I encourage you to use it. I just want to caution you that drift detection may find your environment to be IN SYNC even though it isn’t. Again, in my experience IN SYNC means all resources created by a CF stack still exist, but it does not take into account resources added to the environment manually, bypassing the CF stack update process.
- I’ll expand on this thought in a future post, but you can also use Veeam/N2WS (v2.4 and up) to capture and clone your VPC settings, allowing you to recover your VPC configuration in any region. N2WS will create a CloudFormation template based on an existing VPC (or VPCs) and use it to launch a new stack and create the VPC resources in another region. However, as great as this is, there are several resources, including but not limited to NAT gateways, Elastic IPs, and VPC Endpoints, that cannot be backed up/restored in this manner.
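The drift detection caveat above can be sketched in Python. This is an illustration under assumptions, not a definitive tool: it assumes boto3 CloudFormation and EC2 clients are passed in, both function names are hypothetical, and the second function only looks at subnets. The first function runs drift detection and reports resources that were modified or deleted. The second looks for the case drift detection didn't catch for me: subnets that exist in the VPC but were never created by the stack.

```python
import time


def run_drift_detection(cfn_client, stack_name, poll_seconds=5):
    """Run drift detection and return (stack status, drifted resources)."""
    detection_id = cfn_client.detect_stack_drift(StackName=stack_name)[
        "StackDriftDetectionId"
    ]
    # Detection is asynchronous, so poll until it leaves the in-progress state.
    while True:
        status = cfn_client.describe_stack_drift_detection_status(
            StackDriftDetectionId=detection_id
        )
        if status["DetectionStatus"] != "DETECTION_IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    if status["DetectionStatus"] == "DETECTION_FAILED":
        raise RuntimeError(
            status.get("DetectionStatusReason", "drift detection failed")
        )
    drifts = cfn_client.describe_stack_resource_drifts(
        StackName=stack_name,
        StackResourceDriftStatusFilters=["MODIFIED", "DELETED"],
    )["StackResourceDrifts"]
    return status["StackDriftStatus"], [
        (d["LogicalResourceId"], d["StackResourceDriftStatus"]) for d in drifts
    ]


def find_unmanaged_subnets(cfn_client, ec2_client, stack_name, vpc_id):
    """Return subnet IDs in the VPC that the stack did not create.

    Pagination is ignored here for brevity (an assumption that the stack
    and the VPC are small).
    """
    stack_resources = cfn_client.describe_stack_resources(StackName=stack_name)
    managed = {
        r.get("PhysicalResourceId")
        for r in stack_resources["StackResources"]
        if r["ResourceType"] == "AWS::EC2::Subnet"
    }
    subnets = ec2_client.describe_subnets(
        Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]
    )["Subnets"]
    return sorted({s["SubnetId"] for s in subnets} - managed)
```

If `find_unmanaged_subnets` returns anything while the stack reports IN_SYNC, the stack alone will not reproduce the VPC in another region. The same comparison could be repeated for route tables, NACLs, and security groups.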
How can we minimize environmental drift to ensure a successful VPC restore?
So, as I’ve already stated, restoring your VPC to another region can be easy or it can be hard. The ease of the restoration depends largely on the amount of manual resource creation, or drift, in the environment, so we must endeavor to keep drift to a minimum.
What have I learned about managing AWS resources in the two years since I deployed that first CloudFormation template? I attended the Carolina VMware User/Con this week and saw the following statement on a PowerPoint slide:
Modularized components can make for much better lifecycle and sustainability when done right!
Though this statement was not made with regard to managing CF templates/stacks, I agree with it wholeheartedly. My advice to you is not to deploy every single resource in a single CF template but to break up your environment into manageable silos (silos seems like a bad word) based on team responsibilities.
- Perhaps the data center/network team builds the foundational CF template that includes the VPC, subnets, internet gateway, and route tables.
- The security team builds an NACL template, CloudWatch modifications, and WAF configuration.
- The application team builds templates for their specific application servers including EC2 instances and security groups.
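One way the split above might be wired together is with CloudFormation stack exports: the foundation stack exports a value and a downstream stack imports it with `Fn::ImportValue`. The Python sketch below only builds two tiny template documents to show the shape. The function names, the export name `network-VpcId`, the CIDR block, and the resource names are all made up for illustration.

```python
import json


def network_template(export_name="network-VpcId"):
    """Foundation template: creates the VPC and exports its ID."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Foundation stack owned by the network team (illustrative)",
        "Resources": {
            "Vpc": {
                "Type": "AWS::EC2::VPC",
                "Properties": {"CidrBlock": "10.0.0.0/16"},
            }
        },
        "Outputs": {
            "VpcId": {
                "Value": {"Ref": "Vpc"},
                # Other stacks in the same region can import this name.
                "Export": {"Name": export_name},
            }
        },
    }


def application_template(export_name="network-VpcId"):
    """Application template: imports the VPC ID instead of hard-coding it."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Application stack owned by the app team (illustrative)",
        "Resources": {
            "AppSecurityGroup": {
                "Type": "AWS::EC2::SecurityGroup",
                "Properties": {
                    "GroupDescription": "App servers (illustrative)",
                    "VpcId": {"Fn::ImportValue": export_name},
                },
            }
        },
    }


if __name__ == "__main__":
    # Print both templates as JSON, ready to hand to CloudFormation.
    print(json.dumps(network_template(), indent=2))
    print(json.dumps(application_template(), indent=2))
```

Exports are scoped to a region, so restoring to another region means deploying the foundation stack there first and then the stacks that import from it.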
Modularize your environment to allow your teams to most effectively manage their sphere of influence and train them on the importance of maintaining that environment using dev/ops and IaC techniques. As an organization moves into the cloud, this can ensure as seamless a transition as possible to another region should a disaster occur. But as a cloud journey continues, and as the understanding of IaC grows within an organization, these techniques can be applied to deploy a global infrastructure hosting cloud-native applications that are not dependent on any one region for sustainability and availability.