AWS EC2 Spot Instances are a neat way to reduce compute spend on AWS by up to 90%. As always, there are numerous options available for using Spot Instances. Understanding the implications of using them can help customers apply Spot Instances to more workloads than is common today and greatly reduce costs without compromising on availability.
What are Spot Instances and why are they discounted so much?
Spot Instances are spare compute capacity offered at massively discounted prices compared to On-Demand Instances. They are a way for AWS to monetize spare capacity.
In terms of capabilities, the following table summarizes the key differences between Spot Instances and On-Demand Instances.
| |Spot Instances|On-Demand Instances|
|---|---|---|
|Available capacity|If capacity is not available immediately, a Spot request can keep looking for Spot Instances until it is fulfilled.|If capacity is not available when you make a launch request, you get an insufficient capacity error (ICE).|
|Price|Varies based on demand per Availability Zone|Constant per Region|
|Instance interruption|Only the EC2 Spot service can stop or start the instance. It interrupts instances, with a 2-minute notice, when capacity is no longer available, the Spot price exceeds your maximum bid price, or demand for Spot Instances increases.|User controlled, from the console or via CLI/SDK. AWS will not terminate instances irrespective of demand.|
|Launch time|As of Nov 2017, these launch immediately and are similar to On-Demand Instances except as noted in this table.|Launch immediately|
So what does this mean?
1. Spot Instances are a natural fit for compute pools that handle stateless workloads such as big data, distributed, HPC and batch computing, because the software frameworks involved are inherently fault tolerant.
2. Spot Instances have exactly the same performance characteristics as On-Demand Instances, which means…
3. Spot Instances can be a great fit for many types of applications, provided they handle Spot interruptions effectively. This puts dev/test instances and production workloads that keep no state in instance memory or cache within the domain of Spot Instances.
4. Since the price varies independently per Availability Zone and instance type, it is very helpful to build some automation that analyzes prices and provisions the Spot Instance with the best price/performance available across multiple Availability Zones. For example, the same instance type can show 5 different prices over a period of time.
5. Spot Instance termination notices are available as CloudWatch Events and as instance metadata that can be queried, so suitable actions can be configured in response. Typically these include removing the Spot Instance from the instance pool and draining its connections, taking snapshots of the instance volumes, preserving the private IP, cloning a new machine, restoring the snapshots and private IP, and adding the new machine back to the instance pool.
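The price analysis described in point 4 can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: it assumes price records shaped like the `SpotPriceHistory` entries returned by the EC2 `DescribeSpotPriceHistory` API (`InstanceType`, `AvailabilityZone`, `SpotPrice` as a string, `Timestamp`), and the function names, sample prices and the idea of ranking by price per vCPU are assumptions made for this example.

```python
from datetime import datetime

# Made-up sample records, shaped like DescribeSpotPriceHistory entries.
SAMPLE = [
    {"InstanceType": "m4.large", "AvailabilityZone": "us-east-1a",
     "SpotPrice": "0.0310", "Timestamp": datetime(2017, 11, 1, 10, 0)},
    {"InstanceType": "m4.large", "AvailabilityZone": "us-east-1a",
     "SpotPrice": "0.0290", "Timestamp": datetime(2017, 11, 1, 11, 0)},
    {"InstanceType": "m4.large", "AvailabilityZone": "us-east-1b",
     "SpotPrice": "0.0335", "Timestamp": datetime(2017, 11, 1, 11, 0)},
    {"InstanceType": "c4.xlarge", "AvailabilityZone": "us-east-1a",
     "SpotPrice": "0.0620", "Timestamp": datetime(2017, 11, 1, 11, 0)},
    {"InstanceType": "c4.xlarge", "AvailabilityZone": "us-east-1b",
     "SpotPrice": "0.0540", "Timestamp": datetime(2017, 11, 1, 11, 0)},
]

# vCPU counts used to normalize prices across instance types.
VCPUS = {"m4.large": 2, "c4.xlarge": 4}


def latest_prices(records):
    """Keep only the most recent record per (instance type, Availability Zone)."""
    latest = {}
    for rec in records:
        key = (rec["InstanceType"], rec["AvailabilityZone"])
        if key not in latest or rec["Timestamp"] > latest[key]["Timestamp"]:
            latest[key] = rec
    return latest


def rank_pools(records, vcpus):
    """Return (price_per_vcpu, instance_type, az, price) tuples, cheapest first."""
    ranked = []
    for (itype, az), rec in latest_prices(records).items():
        price = float(rec["SpotPrice"])
        ranked.append((price / vcpus[itype], itype, az, price))
    return sorted(ranked)


if __name__ == "__main__":
    for per_vcpu, itype, az, price in rank_pools(SAMPLE, VCPUS):
        print(f"{itype:10s} {az}  ${price:.4f}/h  ${per_vcpu:.5f} per vCPU-hour")
```

In practice the records would come from the price history API rather than a hard-coded list, and the ranking metric could just as well be price per GiB of memory or a benchmark score.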
Best Practices using Spot Instances
It is essential to address the issue of instance termination and improve the fault tolerance of the infrastructure by following these best practices:
1. Go Multi-AZ: Provision instances in multiple Availability Zones. This reduces the risk of losing multiple instances because of sudden demand spikes.
2. Use multiple instance types in each Availability Zone. This further reduces the risk of losing compute capacity due to a sudden demand spike for a particular instance type in one Availability Zone.
3. Monitor for Spot termination notices to improve fault tolerance and reprovisioning. AWS has published a great sample for this.
4. Use the Spot Instance Advisor to understand the likelihood of termination for a particular instance type. Typically, the newer instance types are in higher demand. The frequency of interruption can be as low as < 5% for certain instance types.
5. Critically evaluate whether your workloads can live with a small risk of data loss during Spot Instance terminations, especially in dev/test environments and after incorporating these best practices.
6. Automate the provisioning of infrastructure to make effective use of Spot pricing data, to monitor for Spot interruption notices and to clone instances.
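Monitoring for interruption notices from inside an instance can be sketched as follows. This is a minimal example under stated assumptions, not the AWS sample mentioned above: it polls the instance metadata path `spot/instance-action`, which returns HTTP 404 until a notice is issued and then a small JSON document with `action` and `time` fields. The function names, the polling interval and the `on_notice` callback are hypothetical; the callback is where draining connections or taking snapshots would go.

```python
import json
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone

IMDS = "http://169.254.169.254/latest"


def fetch_instance_action(timeout=1.0):
    """Return the raw interruption notice from instance metadata, or None if absent."""
    # Request a short-lived session token first (IMDSv2).
    token_req = urllib.request.Request(
        f"{IMDS}/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"})
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no interruption scheduled
            return None
        raise


def parse_notice(body, now):
    """Turn a notice body into a dict with the action and the seconds remaining."""
    if body is None:
        return None
    notice = json.loads(body)
    when = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    when = when.replace(tzinfo=timezone.utc)
    return {"action": notice["action"],
            "seconds_left": (when - now).total_seconds()}


def watch(on_notice, fetch=fetch_instance_action, poll_seconds=5, sleep=time.sleep):
    """Poll until a notice appears, pass it to on_notice once, then return it."""
    while True:
        notice = parse_notice(fetch(), datetime.now(timezone.utc))
        if notice is not None:
            on_notice(notice)
            return notice
        sleep(poll_seconds)
```

Calling `watch(handler)` on a Spot Instance blocks until a notice arrives. The `fetch` and `sleep` parameters are injectable so the loop can be exercised off-instance with canned responses.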
Have your cake and eat it too – Defined duration workloads.
Still not convinced Spot Instances can be useful for your workloads? AWS provides an additional option, Spot blocks, for workloads that run for less than 6 hours: they are guaranteed not to be interrupted and are available at up to 50% off! It was found that nearly 50% of EMR jobs run for less than 6 hours, which makes such jobs ideal candidates for Spot blocks.
Application Patterns for Guaranteed Availability
While Spot Instances provide great savings, the application should not suffer a complete outage if all Spot Instances are lost at the same time. A good pattern for provisioning your cluster is therefore a mix: On-Demand Instances that serve the minimum required compute capacity, and Spot Instances for the scalable demand.
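The arithmetic behind this pattern is simple enough to show directly. The sketch below is illustrative only: `split_capacity` is a hypothetical helper, its parameter names loosely mirror the `OnDemandBaseCapacity` and `OnDemandPercentageAboveBaseCapacity` settings of an Auto Scaling mixed instances policy, and the choice to round the On-Demand share upward is an assumption made here rather than a statement of how AWS rounds.

```python
def split_capacity(desired, on_demand_base, on_demand_pct_above_base=0):
    """Split a desired instance count into (on_demand, spot).

    The first `on_demand_base` instances are always On-Demand. Of the
    remainder, `on_demand_pct_above_base` percent (rounded up) are On-Demand
    and the rest are Spot.
    """
    if desired < 0 or on_demand_base < 0:
        raise ValueError("capacities must be non-negative")
    if not 0 <= on_demand_pct_above_base <= 100:
        raise ValueError("percentage must be between 0 and 100")
    base = min(desired, on_demand_base)
    above_base = desired - base
    # Integer ceiling of above_base * pct / 100, avoiding float rounding.
    extra_on_demand = (above_base * on_demand_pct_above_base + 99) // 100
    on_demand = base + extra_on_demand
    return on_demand, desired - on_demand


if __name__ == "__main__":
    # A cluster that needs 4 instances at minimum and scales to 10 at peak.
    print(split_capacity(4, on_demand_base=4))
    print(split_capacity(10, on_demand_base=4))
```

With a base of 4, losing every Spot Instance at once still leaves the minimum required capacity running on On-Demand Instances.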
Another option is to use Spot blocks for tasks that are known to take less than 6 hours, e.g. many CI/CD jobs, automated build verification tests and unit/perf test suites (provided they take less than 6 hours).
Beyond the options mentioned above, Spot Fleets, EC2 Fleets and persistent Spot requests cover more varied use cases. Spot Instances are integrated with EC2, EMR, ECS, Batch, Auto Scaling groups and Application Auto Scaling.
We will cover them in greater detail in future posts and understand how best to use Spot with each of the services. Stay Tuned!
Spot Instances can become a valuable tool in your arsenal on the cost optimization journey. Once you understand their nuances, they can provide ongoing cost reductions. Many enterprises have successfully used Spot Instances to gain sizable cost savings and give wings to their digital aspirations.