“There’s no compression algorithm for experience. You can’t learn certain lessons without going through the curve.” Andy Jassy, CEO AWS
There’s no substitute for experience. I’m sure this is a lesson we’ve all learned many times, and I had a refresher course in it recently while refining a CloudFormation template I’ve been working on. I was experimenting with adding an S3 VPC Endpoint and it wasn’t working as I had anticipated. (It was working as it should; the problem was really “user error,” but let’s move on.) To resolve the S3 VPC Endpoint problem, I decided to do the following:
- Delete the S3 VPC Endpoint manually
- Tweak my CloudFormation template to fix the S3 VPC Endpoint creation statement
- Perform an update of my CloudFormation stack to deploy the new S3 VPC Endpoint
So I did that, and when I ran the stack update I received an UPDATE_FAILED message because the VPC Endpoint I had created with the previous stack update no longer existed.
This is a clean blog, so I’m not going to share the first word that came to my mind. Suffice it to say, I immediately thought about worst-case scenarios: that I had ruined my stack and everything in it, that I had hosed my EC2 instances, and so on. Granted, this isn’t a production environment, so it wouldn’t be the end of the road if I had to start from scratch, but still, who wants to do that?
The first lesson I learned is that resources created as part of an AWS CloudFormation stack must be managed and modified through stack updates. Some resources, like an IAM role that is tracked by name, can be re-created by hand with exactly the same name after a manual deletion, which gets stack updates working again. Other resources, like VPC Endpoints, are created with a generated unique ID. A replacement you create by hand gets a new ID, so it will not match the one the stack is tracking.
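One way to catch this before deleting anything by hand is to look at the resource’s tags. For resource types that support tagging, CloudFormation adds tags such as aws:cloudformation:stack-name automatically. The sketch below is only an illustration of that check: the function name and the sample tag values are made up, and a missing tag does not prove a resource is unmanaged, since not every resource type gets tagged.

```python
# Prefix of the tags CloudFormation adds to resources it creates
# (only for resource types that support tagging).
CFN_TAG_PREFIX = "aws:cloudformation:"


def owning_stack(tags):
    """Return the name of the stack that created a resource, or None.

    `tags` is a list of {"Key": ..., "Value": ...} dicts, the shape that
    most AWS describe calls return. None means "no stack tag found",
    not "definitely safe to delete".
    """
    for tag in tags or []:
        if tag.get("Key") == CFN_TAG_PREFIX + "stack-name":
            return tag.get("Value")
    return None


# Hypothetical tag sets, for illustration only.
endpoint_tags = [
    {"Key": "aws:cloudformation:stack-name", "Value": "my-network-stack"},
    {"Key": "aws:cloudformation:logical-id", "Value": "S3Endpoint"},
]
scratch_tags = [{"Key": "Owner", "Value": "me"}]

for name, tags in (("endpoint", endpoint_tags), ("scratch", scratch_tags)):
    stack = owning_stack(tags)
    if stack:
        print(f"{name}: managed by stack {stack}; change it with a stack update")
    else:
        print(f"{name}: no CloudFormation stack tag found")
```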
Fortunately, I didn’t totally ruin my infrastructure and this failure was pretty easy to recover from.
- Remove references to the VPC Endpoint in your CloudFormation template and run a stack update.
With each new feature added to my CloudFormation template, I saved the template with a new “version” number by simply adding v## to the end of the file name. In this case, I had added the VPC Endpoint in v9 of my template, so I ran a stack update using v8 of my CloudFormation template. Performing this update would remove the VPC Endpoint resource from the stack.
Second Lesson Learned: Keep “versions” of your CloudFormation templates.
- When the update completes, you should see a DELETE_COMPLETE event showing the removal of the VPC Endpoint.
- With the VPC Endpoint properly removed, edit your CloudFormation template (in this case I edited my v9 template) to set the correct VPC Endpoint statements/options and save it. Update the CloudFormation stack with the v9 template; this update should now add a VPC Endpoint.
- When the stack update completes, a new VPC Endpoint should be created and available.
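The recovery steps above boil down to two stack updates run one after the other. Here is a minimal sketch of that sequence, assuming a boto3 CloudFormation client is passed in. The function name and its parameters are hypothetical; update_stack and the stack_update_complete waiter are the boto3 calls it relies on.

```python
def recover_deleted_resource(cfn, stack_name, template_without, template_fixed):
    """Run two stack updates: drop the missing resource, then add it back.

    `cfn` is expected to behave like a boto3 CloudFormation client, for
    example boto3.client("cloudformation"). Both templates are template
    bodies passed as strings: the first has no reference to the deleted
    resource, the second contains the corrected definition.
    """
    waiter = cfn.get_waiter("stack_update_complete")
    for body in (template_without, template_fixed):
        cfn.update_stack(StackName=stack_name, TemplateBody=body)
        # Wait for this update to finish before starting the next one.
        waiter.wait(StackName=stack_name)
```

With boto3 installed and credentials configured, a call might look like recover_deleted_resource(boto3.client("cloudformation"), "my-stack", open("template-v8.json").read(), open("template-v9.json").read()), where the stack name and file names are placeholders. A stack that contains IAM resources would also need a Capabilities argument on update_stack, which this sketch leaves out.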
In the wake of my mistake, I suggest the following when using CloudFormation to deploy AWS resources:
- Document your CloudFormation templates and share with your team
- Don’t develop and deploy CloudFormation templates without communicating to your team their purpose and what they create. Maybe you are the type who would never delete a resource created by CloudFormation, but you may have team members who will. In this example, when I manually deleted the VPC Endpoint, I wasn’t warned that it was a resource created by CloudFormation and that deleting it could have consequences, so you’ve got to let people know what you’re doing and creating with CloudFormation templates
- This may not even need to be said, but remember to manage and/or modify resources created by CloudFormation with stack updates, not manually
- Use some method of versioning your templates, noting the changes in each version, to make it easier to recover from manual deletions of resources.
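Keeping versions pays off most when you can quickly see what changed between two of them. As a rough sketch, the comparison below lists which logical resources one template adds, removes, or modifies relative to another. The v8 and v9 templates are hypothetical stand-ins, and this is a local comparison of the Resources sections only, not what CloudFormation itself computes for a stack update.

```python
def resource_changes(old_template, new_template):
    """Compare the Resources sections of two templates by logical ID."""
    old = old_template.get("Resources", {})
    new = new_template.get("Resources", {})
    return {
        "Add": sorted(new.keys() - old.keys()),
        "Remove": sorted(old.keys() - new.keys()),
        "Modify": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Hypothetical v8 template: a VPC only.
v8 = {
    "Resources": {
        "Vpc": {
            "Type": "AWS::EC2::VPC",
            "Properties": {"CidrBlock": "10.0.0.0/16"},
        }
    }
}

# Hypothetical v9 template: v8 plus an S3 VPC Endpoint.
v9 = {
    "Resources": {
        **v8["Resources"],
        "S3Endpoint": {
            "Type": "AWS::EC2::VPCEndpoint",
            "Properties": {
                "VpcId": {"Ref": "Vpc"},
                "ServiceName": "com.amazonaws.us-east-1.s3",
            },
        },
    }
}

# Going back from v9 to v8 drops the endpoint from the stack.
print(resource_changes(v9, v8))
# → {'Add': [], 'Remove': ['S3Endpoint'], 'Modify': []}
```

In practice the two dictionaries would come from json.load on the saved template files (or a YAML parser for YAML templates) rather than being written inline.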
If you manually delete resources created by CloudFormation, don’t immediately jump to despair and the conclusion that you have just completed a résumé-generating event. Take a breath, evaluate the situation, and then execute the solution. In this example, I was able to perform multiple CloudFormation updates without affecting the availability of my EC2 instances.
I created a template to create an S3 bucket. I manually deleted the bucket, and now when I try to update the Cloud Formation Stack with a renamed file, I get the error message: “No Updates are to be performed.” Any ideas about this?
Hey Amir, about the template file itself: though it’s renamed, are the contents the same? Are you trying to re-create an S3 bucket with the same name as in the first template file?
Yes. Same exact bucket
Hey Amir, I saw the same thing you did and, to this point, haven’t been able to re-create the bucket using the same CloudFormation stack. I tried deleting the bucket and then adding a retention policy attribute to the bucket, and though that stack update completed successfully, it did not re-create the bucket. You’ve probably done the same, but all I’ve been able to do is delete the current stack and deploy a new one to re-create an S3 bucket with the same name. I have been looking at how to use CF to delete a bucket, so that maybe you could update the stack with a template that begins by deleting the bucket and CF is more “aware” of the deletion, but to this point I’ve not been able to do that successfully. Thanks for the comments and please keep in touch.
Thanks! you saved my time 😀
Thanks!! This saved me a lot of trouble. I had deleted a role manually and then my stack update wouldn’t fix it no matter what I tried. Following your example, I fixed it by changing the name of the resource (from “LambdaRole” to “MyLambdaRole”) and it basically did exactly the same thing – first deleted “LambdaRole” and then created “MyLambdaRole”.
T’Challa, I’m glad this post helped you out. Thanks for visiting, reading, and sharing a comment!
It saved my day. I was literally sweating once I accidentally manually deleted my TargetGroup. Removing this and its references got it working. Thanks man!
Thank God I found this 😀
Ideally, AWS will make it so CloudFormation isn’t so brittle. This is a bad design problem where you have to be extra careful with what you do, lest you corrupt your stack. Assigning these special logical IDs to everything is a mistake. It should reference generic tags that anyone can update or modify.
That way, when something goes wrong, you can fix it without pulling your hair out. Production changes become much less stressful.