At their conception, most software projects start with a single environment in terms of infrastructure (usually a staging/QA one), and most of the effort focuses on developing functionality, user experience, and the ability to scale in order to move to a production environment as soon as possible. In this phase, the expected tasks are provisioning the cloud components that will be part of the software/platform architecture and producing an initial cost estimation.
After the software/platform reaches production and becomes stable, it is generally good practice to do a full end-to-end review of the cloud resources used in production and check whether any further optimization is required, now with a real user base and real usage stats.
In this article we are going to review some small, easy-to-apply tips that can help reduce infrastructure costs. Of course, the impact of what is discussed here will vary depending on each project's characteristics. The important outcome is that these small adjustments can be configured in little time, achieving an immediate impact on current and future costs. At Ensolvers, we've collected a lot of know-how throughout years of experience working with AWS in several projects.
In AWS, the de-facto solution for log storage and search is CloudWatch. CloudWatch can store logs from practically all AWS services: ECS clusters, Lambdas, CI/CD pipelines, etc. While logs are key for engineering teams as a troubleshooting and analysis tool, in most cases these logs become irrelevant after a certain time, so the key is to identify the window of time for which we want to keep them.
In AWS CloudWatch Logs, a retention rule can be configured for each log group to automatically delete logs that are no longer needed:
- Go to the CloudWatch service in the AWS console
- Go to the Logs -> Log groups menu
- Search for the log group to change and click the link in the Retention column
- Select the new retention policy and save the changes
Below is a visual example of setting a retention policy that preserves only the logs from the last 12 months.
In this case, if we apply this to all the log groups of a project that has been running for more than 2 years, an immediate cost reduction of around 50% is achieved, and the cost of storing logs is prevented from growing in the future as long as the system keeps behaving similarly. It is important to remember that a project can have several dozen log groups, so instead of saving a few GBs we could be talking about several TBs of information.
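When a project has that many log groups, the same change can also be scripted instead of done by hand in the console. Below is a minimal sketch in Python with boto3 that applies a 365-day retention to every log group in a region; the region name and the retention value are assumptions to adapt to each project.

```python
import boto3

# Assumed region; adjust to where the log groups live
logs = boto3.client("logs", region_name="us-east-1")

# put_retention_policy only accepts specific values; 365 days ~ 12 months
RETENTION_DAYS = 365

paginator = logs.get_paginator("describe_log_groups")
for page in paginator.paginate():
    for group in page["logGroups"]:
        # Skip groups that already have the desired retention configured
        if group.get("retentionInDays") == RETENTION_DAYS:
            continue
        logs.put_retention_policy(
            logGroupName=group["logGroupName"],
            retentionInDays=RETENTION_DAYS,
        )
        print(f"Set {RETENTION_DAYS}-day retention on {group['logGroupName']}")
```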
The most relevant log groups in one of our projects generated around 581 GB per year, so the monthly cost of storing one year of logs is 581 * 0.03 = 17.43 USD. Since the project had been running for several years, reducing the retention policy to 12 months let us drop roughly two years' worth of accumulated logs, saving 2 * 17.43 = 34.86 USD monthly. Not only is this an immediate reduction in spending (about 418.32 USD annualized), it also prevents the cost from continuing to grow over time.
EC2 is the core computing service that AWS offers, and we can apply the same kind of practice to it, specifically for the snapshots of VM instance volumes. EC2 does not enforce snapshot retention or generation by itself: snapshot creation can be a manual process or automated externally, for example through scripts or Lambdas. Automatic snapshots are usually created when volumes are created, and in some other specific situations, for example when Elastic Beanstalk is used.
Snapshot automation will not be covered in depth in this article, but it is important to note that AWS only charges for the differences between one snapshot and the next. For instance, if we have a 100 GB storage device with 100 snapshots but only 1 GB changes from one snapshot to another, we would be charged for the 100 GB of the original snapshot plus 1 GB for each of the 100 snapshots, that is, 200 GB billed.
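Although a full treatment is out of scope here, a minimal sketch of what such an external snapshot-retention script could look like with boto3 follows; the `automated-backup` tag and the 30-day window are hypothetical choices of ours, not an AWS convention.

```python
import boto3
from datetime import datetime, timedelta, timezone

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region
RETENTION = timedelta(days=30)                      # hypothetical window

def create_snapshot(volume_id: str):
    # Tag the snapshot so the cleanup step only touches our own backups
    ec2.create_snapshot(
        VolumeId=volume_id,
        Description="automated backup",
        TagSpecifications=[{
            "ResourceType": "snapshot",
            "Tags": [{"Key": "automated-backup", "Value": "true"}],
        }],
    )

def prune_snapshots():
    # Delete tagged snapshots that are older than the retention window
    cutoff = datetime.now(timezone.utc) - RETENTION
    pages = ec2.get_paginator("describe_snapshots").paginate(
        OwnerIds=["self"],
        Filters=[{"Name": "tag:automated-backup", "Values": ["true"]}],
    )
    for page in pages:
        for snap in page["Snapshots"]:
            if snap["StartTime"] < cutoff:
                ec2.delete_snapshot(SnapshotId=snap["SnapshotId"])
```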
S3 (Simple Storage Service) is the main AWS service used to store information in a key/value fashion. In some cases it is used internally by AWS for storing information from other services, like CloudWatch logs.
Over time, S3 buckets (the most basic object grouping that the service offers) tend to grow in size, increasing the bill. In many situations it is possible to reduce S3 costs using the features that the service provides, like lifecycle rules, versioning, and the different storage classes.
A lifecycle configuration is a series of rules that an admin can define and that describe actions Amazon S3 applies to a group of objects. With these rules it is possible to configure transition actions (moving objects to a cheaper storage class after a given time) and expiration actions (deleting objects, or noncurrent versions of them, after a given time).
To configure the S3 lifecycle:
- Go to the S3 service
- Open the bucket
- Click on the Management tab
- Click on the Create lifecycle rule button
- Configure the rules
- Save the changes
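The same configuration can also be applied programmatically. Here is a minimal boto3 sketch, where the bucket name, the prefix, and the 30-day/365-day thresholds are placeholders to adapt:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and thresholds; adjust to your own data
s3.put_bucket_lifecycle_configuration(
    Bucket="my-project-artifacts",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "expire-old-artifacts",
            "Status": "Enabled",
            "Filter": {"Prefix": "builds/"},  # apply only to this prefix
            # Move objects to a cheaper storage class after 30 days...
            "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
            # ...and delete them entirely after a year
            "Expiration": {"Days": 365},
        }]
    },
)
```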
Using these rules it is possible to drastically reduce S3 costs; in our case, we reduced the storage used by around 90%. After applying the lifecycle rules, service usage dropped to 10% of the original, consuming less than 400 GB. We decided to remove all old content, like legacy build binaries and old build cache versions. The final cost was as described below:
Annual cost = $9.20 * 12 = $110.40
A reduction of more than $1,100 annually.
Another service to which we can apply a retention policy is RDS, in particular for database backups. In every project it is essential to have a database backup policy in order to be able to recover from a possible critical issue. However, these backups can become very large and, depending on the frequency with which they are generated, we could end up with hundreds or thousands of copies.
If you use AWS Aurora, as we do in most of our projects, there is no additional charge for backup storage of up to 100% of your total database storage for each DB cluster. There is also no additional charge if your backup retention period is 1 day and you don't keep any snapshots beyond the retention period (although this is not a very realistic scenario).
In one of our projects we implemented a custom way to generate DB backups: a script in a Lambda that is triggered every day. This script makes the dump, compresses it, and moves it to S3. To remove old backups we use the S3 lifecycle policies explained earlier in this article. This allows us to satisfy a customer requirement of keeping at least 1 year of DB backups.
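The script itself is beyond the scope of this article, but a simplified sketch of the idea looks like the following. It assumes a PostgreSQL database, a `pg_dump` binary available to the Lambda (e.g. via a layer), and connection details plus the destination bucket passed in environment variables; all of these names are placeholders.

```python
import gzip
import os
import subprocess
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # Dump the database (pg_dump is assumed to be bundled, e.g. via a layer)
    dump = subprocess.run(
        ["pg_dump", os.environ["DATABASE_URL"]],
        capture_output=True,
        check=True,
    ).stdout

    # Compress the dump and upload it under a date-based key
    key = datetime.now(timezone.utc).strftime("backups/%Y-%m-%d.sql.gz")
    s3.put_object(
        Bucket=os.environ["BACKUP_BUCKET"],
        Key=key,
        Body=gzip.compress(dump),
    )

    # Old backups are expired by the S3 lifecycle rules shown earlier
    return {"uploaded": key}
```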
However, if we want to use the classic RDS approach (automated backups are limited to a 35-day timespan), we can configure the backup strategy from the RDS service homepage by doing the following:
- Go to the Databases menu
- Select the database cluster/instance and click on the Modify button
- Scroll to Additional configuration and configure the backup period
- Save changes
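The same setting can be changed through the API as well. A minimal boto3 sketch, assuming an Aurora cluster whose identifier is a placeholder here:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")  # assumed region

# For an Aurora cluster; use modify_db_instance for a standalone instance
rds.modify_db_cluster(
    DBClusterIdentifier="my-aurora-cluster",  # placeholder identifier
    BackupRetentionPeriod=30,                 # automated backups allow up to 35 days
    ApplyImmediately=True,
)
```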
This is indirectly another successful use of the S3 lifecycle feature. For one of our customers, total DB storage used is 17 GB; exporting a snapshot every day means 17 GB * 30 = 510 GB of new snapshot data in S3 per month.
Those 510 GB exported to S3 every month, at $0.023 per GB stored, add $11.73 of cost each month. Since old snapshots are never deleted, costs accumulate over time: considered annually, as we show in the table and chart below, our customer would pay more and more, since consumed storage grows at a constant pace.
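To make the growth concrete, here is a quick back-of-the-envelope calculation with the same figures (510 GB of new snapshot data per month at $0.023 per GB-month):

```python
GB_PER_MONTH = 510  # 17 GB DB exported daily, ~30 exports per month
PRICE = 0.023       # USD per GB-month in S3 Standard

# Without an expiration rule, each month pays for every snapshot kept so far
first_year = sum(GB_PER_MONTH * month * PRICE for month in range(1, 13))
print(f"First-year S3 cost: ${first_year:.2f}")    # ~ $914.94

# A second year without cleanup keeps accumulating on top of that
second_year = sum(GB_PER_MONTH * month * PRICE for month in range(13, 25))
print(f"Second-year S3 cost: ${second_year:.2f}")  # ~ $2604.06
```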
By setting up an S3 lifecycle policy in this case so that snapshots are retained for no more than a year, we limited the cost to roughly $915 for the year, avoiding storing snapshots for more time than needed.
In AWS, with a few small adjustments that take no more than a few minutes, we can apply configurations and policies that achieve immediate cost reductions, and these remain in effect from the moment they are applied. These policies and configurations apply to most domains and contexts, so we were able to apply them to many of our customers' projects.
In this post we are not considering other aspects, like the nature and usage of particular environments that have custom infrastructure. For instance, the rules applied in a Staging environment vs. those applied in a Production one can be very different: in Production it might be required to keep logs for six months to be able to audit past periods, while in Staging one month would be more than enough, since that information is only used for bug fixing or troubleshooting.
In the next article in this series we will address other techniques that can be applied to improve cost reduction in AWS.