I must be getting tired. Honestly, I thought posting my session notes would be a pretty straightforward task, but because I love documentation so much, I’ve made it harder than it needs to be. My note-taking problem is that I try to jot down everything as opposed to what may ultimately matter. Anyway, in this post I’m going to give you my “raw” notes…unedited, unfiltered, and unspellchecked (is that even a word?). I thought this was one of the best sessions I attended and hope you will gain at least a little insight into AWS best practices by reading these notes.
- an anti-pattern can be successful in the short term but turn into a fault; anti-patterns can also lead to best practices
- best practices come from the investigation of anti-patterns
- best practices are learned and often earned
- explain the value of a particular best practice
- we can learn from the behavior of others
- we don’t invent best practices sitting around and thinking
Anti-Pattern #1: Loss of Control / Poor IAM Access Key Controls
- APs (anti-patterns) lead to real outages
- loss of control of an AWS account
- AWS reference architecture (https://github.com/awslabs/aws-refarch-wordpress)
- using the API, you can create multiple well-architected infrastructures / it’s easy
- you can operate the AWS API from anywhere, authenticated with IAM
- we create accounts for humans to administer the account and give them out (broad permission sets, use of the root account, etc.)
- credentials can end up leaving our control because they are given to users
- can lose control because you don’t know what all of the various accounts do
- many accounts become persistent when they don’t need to be
- temporary credentials can be intercepted by a user, who can then use those credentials
- an intruder can shut down the entire infrastructure; a single IAM user can hold permissions for everything
- even with backups, CloudTrail, etc., we can understand the scope of the event but can still be locked out, e.g., the intruder removes access to my backups
- ways to mitigate this AP
- create multiple AWS accounts / credentials are scoped to an account
- don’t put your prod and your backups under the same user
- account A can write to another AWS acct. but can’t delete from it
- if acct. A is deleted, use B to back up to C
- cost neutral
- IAM best practices http://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html
- establish separate administrative domains
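The “account A can write but can’t delete” idea can be sketched as an S3 bucket policy on a bucket owned by the separate backup account. This is a minimal sketch, assuming the backups land in S3; the helper name, bucket name, and account ID are hypothetical placeholders, and the exact action list is worth checking against the IAM documentation for your setup.

```python
import json


def backup_bucket_policy(bucket: str, writer_account_id: str) -> dict:
    """Build an S3 bucket policy letting a writer account add objects but never delete them.

    Hypothetical helper: `bucket` and `writer_account_id` are placeholders, and the
    policy is meant to be attached to a bucket owned by a separate backup account.
    """
    writer = {"AWS": f"arn:aws:iam::{writer_account_id}:root"}
    objects = f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # The production account may only add new backup objects.
                "Sid": "AllowWriterPut",
                "Effect": "Allow",
                "Principal": writer,
                "Action": ["s3:PutObject"],
                "Resource": objects,
            },
            {
                # An explicit Deny overrides any Allow, so delete requests from the
                # writer account are refused even if another statement grants them.
                "Sid": "DenyWriterDelete",
                "Effect": "Deny",
                "Principal": writer,
                "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
                "Resource": objects,
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder bucket name and account ID, for illustration only.
    print(json.dumps(backup_bucket_policy("example-backup-bucket", "111111111111"), indent=2))
```

PutObject can still overwrite an existing key, so this pairs naturally with versioning on the backup bucket: an overwrite then keeps the prior version, and permanently removing versions needs s3:DeleteObjectVersion, which the Deny statement covers.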
Anti-Pattern #2: Control Gaps
- AWS CloudTrail is awesome
- get user, IP, etc.
- AWS Config
- point-in-time snapshot of AWS inventory / by contrast, taking inventory of a traditional data center (DC) can span weeks and yields only a mostly accurate view of what’s in the DC
- what’s wrong with 167.55.180.10/0? The /0 prefix matches every IPv4 address, so the rule opens access to the entire internet, not a single host
- can look for /0 – detect and automate
- easy to see with compliance automation
- making S3 buckets public temporarily to get around security, but forgetting to change them back
- Is it meant to be public or should it not be?
- can be hard to tell
- try to find gaps in the automation
- things on the right-hand side happen after changes have happened (Config, CloudTrail, S3, etc.)
- augment AWS Config with AWS managed rules – the managed rules for Config include a rule that checks whether an S3 bucket is public
- consider sending change events to SQS and Lambda to investigate them; if a change isn’t compliant, Lambda reverts it to the original config
- Amazon Macie – looks in S3 buckets and uses machine learning to determine what’s there / helps you understand how data is being accessed and whether it’s being accessed the way you want
- consider change control (right-side stuff happens after a change)
- another pair of eyeballs looking at changes is essential
- use CloudFormation to automate stacks
- can actually automate change control / intercept changes on the front end
- don’t use the AWS mgmt. console to manage AWS resources / fantastic way to break automation / don’t use the console for read-only access
- utility of auditors / what is the next gap? There will always be a next gap
- continual audits key to identifying the next gap
- use partner applications to probe resources, make sure rules are enforced, and do active penetration testing
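The “look for /0, detect and automate” check can be sketched with Python’s standard `ipaddress` module. This is a minimal sketch, assuming the rules arrive in the shape of the IpPermissions list EC2 reports for a security group; the helper name is hypothetical, and in practice a check like this would sit inside the Config/SQS/Lambda flow described in the notes rather than run on its own.

```python
import ipaddress


def world_open_cidrs(ip_permissions: list[dict]) -> list[str]:
    """Return every CIDR in the given rules whose prefix length is 0.

    Hypothetical helper. `ip_permissions` is assumed to follow the shape of the
    IpPermissions list EC2 reports for a security group.
    """
    flagged = []
    for perm in ip_permissions:
        cidrs = [r["CidrIp"] for r in perm.get("IpRanges", [])]
        cidrs += [r["CidrIpv6"] for r in perm.get("Ipv6Ranges", [])]
        for cidr in cidrs:
            # strict=False tolerates host bits, so "167.55.180.10/0" parses as
            # 0.0.0.0/0 instead of raising ValueError.
            if ipaddress.ip_network(cidr, strict=False).prefixlen == 0:
                flagged.append(cidr)
    return flagged


if __name__ == "__main__":
    rules = [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "167.55.180.10/0"}]},
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "167.55.180.10/32"}],
         "Ipv6Ranges": [{"CidrIpv6": "::/0"}]},
    ]
    print(world_open_cidrs(rules))  # ['167.55.180.10/0', '::/0']
    # A /0 network covers the whole IPv4 address space.
    print(ipaddress.ip_network("167.55.180.10/0", strict=False).num_addresses)  # 4294967296
```

What happens to a flagged rule (revoke it, open a ticket, page someone) depends on your account setup and is deliberately left out of the sketch.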
Anti-Pattern #3: Automating Outages
- can easily automate a deployment using CloudFormation, Chef, etc.
- Terraform to build automation
- start thinking about blue/green deployments to push new code into production
- but if using CloudFormation, you could delete your database
- easy to get things out of step in AWS
- need to start using CloudFormation
- decouple the infrastructure according to how each part is used in production
- decouple web servers from DBs
- carve out functional components, especially the stateful components
- AWS Management Console
- limit interactive access to infrastructure
- don’t make a change you may forget to undo
- tags
- version number, like an app version number
- is the change associated with a valid version of my app?
- make multiple CloudFormation environments
- test changes to make sure they’ll work in prod
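One way to act on “if using CloudFormation, you could delete your database” is to check templates for stateful resources that lack a protective DeletionPolicy before they reach prod. DeletionPolicy (with values such as Retain and Snapshot) is a real CloudFormation resource attribute; the helper name and the set of “stateful” resource types below are assumptions for illustration. Carving the database out into its own stack is the structural fix from the notes; a check like this is only a safety net.

```python
# Which resource types count as "stateful" is an illustrative assumption.
STATEFUL_TYPES = {
    "AWS::RDS::DBInstance",
    "AWS::RDS::DBCluster",
    "AWS::DynamoDB::Table",
    "AWS::EC2::Volume",
}
PROTECTIVE_POLICIES = {"Retain", "Snapshot"}


def unprotected_stateful_resources(template: dict) -> list[str]:
    """Return logical IDs of stateful resources without an explicit protective DeletionPolicy.

    Hypothetical helper; `template` is a CloudFormation template already parsed
    into a dict (for example with json.load).
    """
    unprotected = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") not in STATEFUL_TYPES:
            continue
        # Require an explicit Retain or Snapshot rather than relying on the
        # resource type's default behavior when the stack is deleted.
        if resource.get("DeletionPolicy") not in PROTECTIVE_POLICIES:
            unprotected.append(logical_id)
    return unprotected


if __name__ == "__main__":
    template = {
        "Resources": {
            "WebServer": {"Type": "AWS::EC2::Instance"},
            "AppDatabase": {"Type": "AWS::RDS::DBInstance"},
            "SessionTable": {"Type": "AWS::DynamoDB::Table", "DeletionPolicy": "Retain"},
        }
    }
    print(unprotected_stateful_resources(template))  # ['AppDatabase']
```

Snapshot is only accepted for certain resource types (RDS among them), so it is worth checking the DeletionPolicy documentation before relying on it for a given type.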
Anti-Pattern #4: Schrödinger’s Backup
- “You don’t have backups if you don’t test them.”
- Schrödinger, 1935
- unless you measure what happens, how do you know what happens?
- my business is the data
- a 0KB backup is useless
- “a backup is just data until you test it”
- are files getting bigger every day? Backups rarely get smaller
- are EBS snapshots going up? are old ones being deleted?
- backup failures never happen on unimportant files
- today, you can write Lambda code to check backups or the size of the backups
- automate but monitor the backups
- EBS snapshots are not great for snapshotting a hot database, use the native tools
- replication is not a backup
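The “write Lambda code to check the size of the backups” idea can be sketched as a pure function that such a scheduled job might call. This is a minimal sketch, assuming you have already listed your backups (for example, S3 object sizes) oldest first; the `Backup` type, the function name, and the 0.5 threshold are hypothetical choices, not anything AWS provides.

```python
from dataclasses import dataclass


@dataclass
class Backup:
    name: str
    size_bytes: int


def suspicious_backups(history: list[Backup], min_ratio: float = 0.5) -> list[str]:
    """Flag empty backups and backups much smaller than the last non-empty one.

    `history` must be ordered oldest to newest. `min_ratio` is a hypothetical
    threshold: with 0.5, a backup under half the size of the previous
    non-empty backup is flagged.
    """
    flagged = []
    last_good = None
    for backup in history:
        if backup.size_bytes == 0:
            flagged.append(f"{backup.name}: empty (0 bytes)")
            continue  # don't use an empty backup as the comparison baseline
        if last_good is not None and backup.size_bytes < last_good.size_bytes * min_ratio:
            flagged.append(
                f"{backup.name}: shrank from {last_good.size_bytes} to {backup.size_bytes} bytes"
            )
        last_good = backup
    return flagged


if __name__ == "__main__":
    history = [
        Backup("backup-1", 1000),
        Backup("backup-2", 1100),
        Backup("backup-3", 0),
        Backup("backup-4", 400),
    ]
    for problem in suspicious_backups(history):
        print(problem)
```

A check like this only catches the obvious failures, like the 0KB backup; a backup is still just data until you test a restore, so it complements restore tests rather than replacing them.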
Establishing Best Practices
- want to learn from others’ failures
- learn as much as possible prior to putting things in production
- war gaming – sit down at a round table, toss out scenarios, and ask “what happens if we lose control of our root account?”
- do every quarter / paper exercise
- prioritize based on risk
- use AWS services / AWS Trusted Advisor
- use Security Partner Solutions
- can help you look at your CloudTrail logs / external penetration testing
- review the AWS Well-Architected Framework
- collection of published best practices
- 56 questions in the whitepaper, read and consider them
- Perform reviews to build a prioritized list
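The “prioritize based on risk” step after a war-gaming session can be as simple as scoring each scenario and sorting. This is a minimal sketch: likelihood × impact is a common risk-scoring convention, not something the session prescribed, and the 1 to 5 scales, the helper names, and the sample scores are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    description: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (expected)
    impact: int      # assumed scale: 1 (minor) to 5 (severe)


def prioritize(scenarios: list[Scenario]) -> list[tuple[int, str]]:
    """Rank scenarios by likelihood * impact, highest score first.

    sorted() is stable, so scenarios with equal scores keep their input order.
    """
    scored = [(s.likelihood * s.impact, s.description) for s in scenarios]
    return sorted(scored, key=lambda item: item[0], reverse=True)


if __name__ == "__main__":
    # Illustrative scores only; real ones come out of the quarterly exercise.
    scenarios = [
        Scenario("We lose control of our root account", 2, 5),
        Scenario("An S3 bucket is left public after a 'temporary' change", 4, 4),
        Scenario("Nightly backups silently write 0KB files", 3, 5),
    ]
    for score, description in prioritize(scenarios):
        print(score, description)
```

The resulting ordered list is the kind of prioritized list the Well-Architected review step asks for; the scoring model itself is the part most worth debating at the table.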
For more information, check out the Amazon Web Services YouTube channel….