As promised on Twitter, here are my notes from my second Thursday morning session at last week’s AWS Public Sector Summit in Washington, DC. This session on AWS cost optimization, led by Matt Johnson, was packed! Honestly, I was a little surprised by that at first, though I’m not sure why. I mean, I have my AWS SA and SysOps Associate certifications, so I obviously know everything there is to know about AWS cost optimization and didn’t need to be there, but I attended hoping to glean a little nugget here or there. I’m kidding (really, I am), but what was abundantly obvious is that many people are hungry to learn and implement AWS cost optimization best practices. I believe Matt did a great job of giving us information to “chew on” (even as I write this I’m still chewing on this session) with regard to optimizing our AWS costs.
Say your organization has just completed a “lift and shift” of its virtual machines from an on-prem datacenter to AWS. What comes next? Many times what comes next is the realization that you are either not saving as much money as you thought you would OR you are spending more than you had planned. What do you do now? Though Matt spoke for a few minutes on comparative TCO analysis between on-prem and AWS environments, I want to focus on his “Pillars of Cost Optimization”. If an organization does the work and really sinks its teeth into these considerations, it may be possible to significantly reduce the monthly AWS bill.
Pillars of AWS Cost Optimization
- Right-size EC2 Instances
“Choose the cheapest instance available while meeting performance requirements”
We’ve all probably seen it: a new application calls for a high-powered SQL server with extraordinary CPU and RAM requirements, so on the VMware infrastructure the VM admin creates a hefty VM with 8 vCPUs and 32 GB of RAM. Let’s assume that when the VM is migrated to AWS, it’s configured as an m4.2xlarge EC2 instance. An m4.2xlarge on-demand EC2 instance running Windows will cost $562.18 per month.
Does the SQL server really need to be an m4.2xlarge instance? If you could scale it down even one level to an m4.xlarge on-demand EC2 instance, you’d cut the AWS bill for this machine by roughly 50%!
That’s significant! And if you have dozens or hundreds or thousands of instances running in AWS, right-sizing the instances based on actual performance data/metrics has the potential to save an organization a lot of money.
Now, to save money, should an organization simply go into the AWS console and downsize all EC2 instances by one size? Certainly it could, but it would run the risk of negatively impacting performance. The better way to determine which EC2 instances are candidates for downsizing is to leverage CloudWatch metrics and alarms. CloudWatch will prove invaluable when evaluating and right-sizing EC2 instances.
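To make the "use metrics, not guesswork" idea a bit more concrete, here is a minimal sketch of how you might flag downsize candidates from CPU utilization samples. The instance IDs, sample values, and the 40%/20% thresholds are all my own assumptions for illustration, not anything Matt prescribed. In a real environment the samples would come from the CloudWatch `CPUUtilization` metric, and you would want to look at memory, disk, and network as well before resizing anything.

```python
from statistics import mean


def is_downsize_candidate(cpu_samples, peak_threshold=40.0, avg_threshold=20.0):
    """Return True if an instance looks over-provisioned on CPU alone.

    cpu_samples: CPU utilization percentages over the review window.
    The thresholds are illustrative assumptions, not AWS guidance.
    """
    if not cpu_samples:
        return False  # no data, so make no recommendation
    return max(cpu_samples) < peak_threshold and mean(cpu_samples) < avg_threshold


# Hypothetical utilization summaries keyed by made-up instance IDs.
fleet = {
    "i-sql-01": [12.0, 18.5, 9.3, 25.1, 14.8],
    "i-web-01": [35.0, 72.4, 64.9, 41.2, 58.0],
}

candidates = [iid for iid, samples in fleet.items() if is_downsize_candidate(samples)]
print(candidates)
```

Only the instance whose peak and average both stay low gets flagged; the busy web server is left alone.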
- Strive for Elasticity
Typically, on-prem datacenters are massively underutilized because organizations plan for peak loads; nobody wants to be the IT manager responsible for hitting a capacity wall, causing downtime and losing the organization money. When this is the primary concern, servers with dozens of CPU cores and GBs and GBs of RAM are ordered along with the latest flash array filled to capacity. This environment may never reach its resource capacity limit, which eases the immediate concerns of the IT manager, but a limit that is never reached means a resource surplus exists. Though this waste may be hard to quantify as a cost, it is waste nonetheless.
What if the environment could be configured in such a way as to meet demand “on-demand”? AWS provides organizations elasticity, the ability to monitor resource demand and automatically increase or decrease deployed resources accordingly, through the use of auto-scaling groups. The most commonly used example for auto-scaling groups is web servers. As an example, an auto-scaling group that starts with a single web server can be configured to deploy a second web server when CPU utilization of the first instance reaches 60%. The auto-scaling group could also be configured to terminate the second instance should CPU utilization fall back below 60% (in practice the scale-in threshold is usually set somewhat lower than the scale-out threshold so the group does not flap between sizes). Auto-scaling groups thus provide a means of eliminating resource surplus and waste, so use them whenever possible.
Additionally, when building auto-scaling groups, the idea is to use the smallest instance possible that meets the minimal demand. This is perfectly acceptable since auto-scaling can accommodate changing demand. As an example, if the on-prem servers hosting an organization’s website are Windows VMs running IIS with 2 vCPUs and 8 GB of RAM, these can equate to m5.large EC2 instances on AWS. An m5.large on-demand Windows instance will cost you $137.62 per month. But does an organization need an m5.large instance to serve as the base of an auto-scaling group? Certainly “it depends”, but it may be more cost effective to support more smaller instances as opposed to fewer larger instances. As the table below shows, (2) Windows m1.small instances and (4) Linux t2.medium instances are cheaper per month than a single Windows m5.large instance. The point is simply to consider smaller instances in auto-scaling groups since they cost less AND can dynamically meet the environment’s resource demand.
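The scale-out/scale-in behavior described above can be sketched as a tiny decision function. This is only a toy model of the idea, not how the Auto Scaling service is implemented or configured. The 60% scale-out threshold comes from the example above; the 40% scale-in threshold, the group limits, and the CPU readings are assumptions I made up for illustration.

```python
def desired_capacity(current, avg_cpu, scale_out_at=60.0, scale_in_at=40.0,
                     min_size=1, max_size=4):
    """Return the instance count a simple threshold policy would ask for.

    scale_in_at is deliberately lower than scale_out_at (an assumption)
    so the group does not bounce between sizes around a single threshold.
    """
    if avg_cpu >= scale_out_at:
        target = current + 1      # demand is high: add an instance
    elif avg_cpu < scale_in_at:
        target = current - 1      # demand is low: remove an instance
    else:
        target = current          # in between: hold steady
    return max(min_size, min(max_size, target))


# Hypothetical average CPU readings over successive evaluation periods.
capacity = 1
history = []
for cpu in [35.0, 65.0, 72.0, 50.0, 30.0, 20.0]:
    capacity = desired_capacity(capacity, cpu)
    history.append(capacity)

print(history)  # [1, 2, 3, 3, 2, 1]
```

The group grows while demand is high, holds steady in the middle band, and shrinks back to a single instance once demand drops, which is exactly the surplus you would otherwise be paying for around the clock.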
- Pick the right pricing model
The idea here is to consider workload scheduling: do your instances need to run 24/7? If they do, then you should consider purchasing reserved instances to save a significant amount of money.
A Windows-based m5.large instance running on-demand 100% of the time will cost $275.24 per month. A 1yr, all upfront reserved instance for this type would cost $2615, making the effective monthly cost $217.92. A 3yr, all upfront reserved instance for this type would cost $6772, making the effective monthly cost $188.11. So, reserved instances can save you money. Shouldn’t you buy reserved instances for all EC2 resources? The answer is simply ‘NO, NO, NO!!!’, and go find some friends who like to do math.
The decision to purchase a reserved instance is a no-brainer if you know the instance will stay on 24/7 for the entirety of 1 year. But what if you only need this instance to run 24/7 for 6 months, OR maybe it’ll run all year, but only from 8am-5pm (whatever)? Use the AWS cost calculator to determine the various costs for each scenario prior to purchasing reserved instances. Using the 24/7 for 6 months example, in an on-demand pricing model an m5.large Windows instance would cost $1,651.44, almost $1,000 cheaper than the reserved instance price.
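If you don’t have friends who like to do math, the comparison is easy enough to sketch yourself. The snippet below uses only the on-demand and 1yr all upfront figures quoted above and works out the break-even point plus the two scenarios mentioned. The helper name and the assumption that "8am-5pm" means 9 hours every day of the week are mine, and this is rough arithmetic, not a substitute for the AWS calculator.

```python
ON_DEMAND_MONTHLY = 275.24   # on-demand, running 24/7 (figure quoted above)
RI_1YR_UPFRONT = 2615.00     # 1yr, all upfront reserved instance (quoted above)


def on_demand_cost(months, fraction_of_hours=1.0):
    """On-demand cost for a number of months at a given share of hours."""
    return round(ON_DEMAND_MONTHLY * months * fraction_of_hours, 2)


# Months of 24/7 on-demand usage at which the 1yr RI becomes the cheaper option.
break_even_months = RI_1YR_UPFRONT / ON_DEMAND_MONTHLY

six_months = on_demand_cost(6)                                # 24/7 for half a year
office_hours_year = on_demand_cost(12, fraction_of_hours=9 / 24)  # 8am-5pm daily

print(f"Break-even after about {break_even_months:.1f} months")
print(f"6 months on-demand: ${six_months:,.2f}")
print(f"Full year, 9h/day:  ${office_hours_year:,.2f}")
```

With these figures the reservation only pays off if the instance runs around the clock for roughly nine and a half months or more; both of the partial-usage scenarios come in well under the $2615 upfront price.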
“Right size and then reserve”
I cannot overstate the importance of right-sizing your instances prior to purchasing reserved instances. Do NOT purchase an m5.large reserved instance if your workload has no performance issues when deployed using m1.small instances. As stated above, a 1yr, all upfront RI for an m5.large is $2615, whereas the same RI for an m1.small is $419.
- Match Storage Needs to Storage Class
AWS provides its customers many storage options such as S3, Glacier, EBS (block storage), Storage Gateways, EFS (managed NFS), and CloudFront. Though a detailed analysis of the storage options and their use cases is beyond my intended scope for this post, know that to optimize AWS costs, an organization must rightly discern its storage needs.
For example, when you build an EC2 instance, there are several storage options to choose from. Two of them are General Purpose SSD and Provisioned IOPS. Provisioned IOPS provides faster performance, but it’s more expensive, and odds are that you would not need its capabilities for every EBS volume you create.
Also, if an organization will be storing backup data to an S3 bucket via a storage gateway, use backup retention policies or S3 lifecycle management rules to move older data to Glacier storage to save on storage costs.
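As a rough illustration of what such a lifecycle rule looks like, here is a small sketch that builds one as a Python dictionary. The `backups/` prefix, the 90-day cutoff, and the helper name are hypothetical choices of mine. As I understand it, this is the shape boto3’s S3 client expects as the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration`, but check the current documentation before relying on it.

```python
import json


def glacier_lifecycle(prefix, days):
    """Build a lifecycle configuration that moves objects under `prefix`
    to Glacier once they are `days` old. Prefix and days are up to you."""
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/')}-after-{days}d",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


# Hypothetical example: archive anything under backups/ after 90 days.
config = glacier_lifecycle("backups/", 90)
print(json.dumps(config, indent=2))
```

Nothing here touches AWS; it only prints the rule so you can review it before applying it to a bucket.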
- Use Serverless Architecture wherever possible
Serverless...everybody is talking about serverless, and for good reason. Instead of spinning up servers to host applications, “serverless” allows you to execute code as a service. You input your code, the function is performed, output is generated, and then the process shuts down. With serverless, an organization does not pay for underutilized resources but only for those resources used during the execution of its code, so serverless can have a dramatic impact on an organization’s AWS costs; serverless functions have the capability to “massively reduce your AWS cost”, to paraphrase Matt.
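For anyone who hasn’t seen one, the "input, function, output, shut down" flow looks roughly like this as an AWS Lambda function in Python. This is a minimal sketch: the greeting logic and the `name` field in the event are made up, and the `statusCode`/`body` response shape assumes the function sits behind API Gateway’s proxy integration.

```python
import json


def lambda_handler(event, context):
    """Entry point Lambda invokes for each request.

    `event` carries the input; `context` carries runtime metadata.
    The `name` key is a hypothetical input field for this example.
    """
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


# Local smoke test: call the handler directly with a fake event.
response = lambda_handler({"name": "Matt"}, None)
print(response)
```

There is no server to size, patch, or leave idling here; you are billed for the invocations and the time the function actually runs.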
In addition to saving you a ton of money, serverless provides the following benefits:
- Flexible scaling
- No instances to manage
- The API Gateway costs you nothing if nobody uses it
- High Availability by design
- Great for bursty workloads
- No Idle capacity
- Measure and Monitor to ensure costs are optimized
Basically, an organization needs to understand that AWS cost optimizations aren’t a set of best practices that are set up once and then forgotten. To ensure your AWS costs are optimized, you must measure and monitor the AWS environment to ensure your instances are right-sized, that you are using the best pricing model for your instances/resources, that you are using the best storage for your storage needs/use cases, etc., and this takes work.
Fortunately, AWS provides you with tools to help. As an organization builds its cloud environment, it can use the following tools to help with cost optimization:
- Amazon CloudWatch
  - monitor AWS resources (EC2, Lambda, API GW, RDS, etc.)
  - set alarms and determine how to proactively react to them
  - define custom performance/usage metrics
  - view graphs and statistics
- AWS Trusted Advisor
  - monitors an AWS account and makes recommendations based on best practices
- AWS Cost Explorer
  - dig deep into the AWS bill
- AWS Instance Scheduler
  - can be used to specify custom start/stop schedules for EC2 instances
Conclusion
Understanding and optimizing your AWS costs will take effort; there’s no getting around that. It’s not a one-time, set-and-forget task; it requires continuous evaluation, but your work will not be in vain. If followed, these 6 cost optimization pillars have the potential to dramatically reduce the monthly AWS bill. If you don’t know where to start, consider the following questions:
- Are my EC2 instances right-sized?
- Can I deploy smaller instances by using auto-scaling groups?
- How many instances need to be running 24/7/365?
- Will I save costs by utilizing reserved instances?
- Can I start taking advantage of serverless architecture?
I just checked YouTube to see if Matt’s session has been posted but could not find it. I suggest checking periodically so you can hear it from him firsthand.
I’d love to hear any comments you may have on this topic so please leave a comment or send us an email through our Contact page.