  • Author: Shivaranjani Ramakrishnan

Deciphering the arcane AWS Price List API

AWS released their Price List API around four years ago, yet it still feels esoteric. With SKU values, product codes, and a hundred variants of the same product, it isn't exactly easy to use. Since my company deals with AWS cost optimization, understanding the Price List API is essential for us.

For me personally, and judging by what I read online, for a lot of others too, the API just seems to have too much going on to be used easily and effectively. For starters, the output is absolute JSON hell. Despite all this, the API comes in quite handy when used correctly.

Say you're a data scientist who wants to know the cost incurred by all the EC2 instances in your EMR cluster, or a DevOps engineer who wants a ballpark figure for all the EC2 instances being provisioned by your development team; either way, this API is a handy addition to your toolkit.

Initially, I tried looking up some examples of how to use the API and stumbled upon this pricing module written by Lyft, but it wasn't all that helpful. After tinkering with the API for about two weeks, I realized how much time I could have saved had I known a more systematic way to get the prices. Since I didn't find much useful content online, I thought I'd write a post about it in the hope that it helps someone else.

Now, time to get our hands dirty, but not too dirty, because we just want a quick way to get the price of a product.

Getting the price of an EC2 instance

Creating the pricing client

The Price List API is served only from the us-east-1 and ap-south-1 endpoints, so the client has to be created against one of those regions regardless of where your own resources run.

import boto3

pricing = boto3.client('pricing', region_name='us-east-1')

Filters

The filters I used were:

  1. Operating System
  2. Region Name
  3. Instance Type
  4. Tenancy
  5. Product Family
  6. Usage Type

OS: The operating system filter takes only the values 'SUSE', 'RHEL', 'Windows', and 'Linux'.

Region: The region filter must use the full region name. For example, for N. Virginia, the filter value should be 'US East (N. Virginia)' and not 'us-east-1'. I created this handy dictionary that maps region codes to their full names.

region_mapping_dict = {'us-east-2': 'US East (Ohio)',
                       'us-east-1': 'US East (N. Virginia)',
                       'us-west-1': 'US West (N. California)',
                       'us-west-2': 'US West (Oregon)',
                       'ap-south-1': 'Asia Pacific (Mumbai)',
                       'ap-northeast-2': 'Asia Pacific (Seoul)',
                       'ap-southeast-1': 'Asia Pacific (Singapore)',
                       'ap-southeast-2': 'Asia Pacific (Sydney)',
                       'ap-northeast-1': 'Asia Pacific (Tokyo)',
                       'ca-central-1': 'Canada (Central)',
                       'cn-north-1': 'China (Beijing)',
                       'cn-northwest-1': 'China (Ningxia)',
                       'eu-central-1': 'EU (Frankfurt)',
                       'eu-west-1': 'EU (Ireland)',
                       'eu-west-2': 'EU (London)',
                       'eu-west-3': 'EU (Paris)',
                       'eu-north-1': 'EU (Stockholm)',
                       'sa-east-1': 'South America (Sao Paulo)'}

Instance Type: The instance type is mentioned in this field. For example, t2.micro, t2.large, etc.

Tenancy: AWS offers Shared tenancy by default unless the user explicitly specifies Dedicated.

Product Family: The product family in the case of EC2 is 'Compute Instance'.

Usage Type: The usage type filter makes the query more specific about the kind of instance you want the price of. Usage types are the units each service uses to measure the consumption of a specific type of resource. For example, the BoxUsage:t2.micro (Hrs) usage type measures the running hours of Amazon EC2 t2.micro instances. The reason to add this filter is to ensure that only on-demand instances show up in the output. It might seem redundant once you see the JSON output, since you get this by default without the filter, but if AWS changes its pricing or adds something new, this filter still guarantees that you get on-demand instances. As of now, the field can be skipped and the right data would still come back, but skipping it isn't advisable. The filter value is built as follows:

usage_type = box_usage[region] + instance_type

Again, this field varies by region. After looking at the pricing data, I came up with this simple dictionary that solves the region issue for this filter.

box_usage = {'us-east-2': 'USE2-BoxUsage:',
             'us-east-1': 'BoxUsage:',
             'us-west-1': 'USW1-BoxUsage:',
             'us-west-2': 'USW2-BoxUsage:',
             'ap-south-1': 'APS3-BoxUsage:',
             'ap-northeast-3': 'APN3-BoxUsage:',
             'ap-northeast-2': 'APN2-BoxUsage:',
             'ap-southeast-1': 'APS1-BoxUsage:',
             'ap-southeast-2': 'APS2-BoxUsage:',
             'ap-northeast-1': 'APN1-BoxUsage:',
             'ca-central-1': 'CAN1-BoxUsage:',
             'eu-central-1': 'EUC1-BoxUsage:',
             'eu-west-1': 'EUW1-BoxUsage:',
             'eu-west-2': 'EUW2-BoxUsage:',
             'eu-west-3': 'EUW3-BoxUsage:',
             'eu-north-1': 'EUN1-BoxUsage:',
             'sa-east-1': 'SAE1-BoxUsage:'}
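
As a quick sanity check, here is what the constructed usage type looks like for a couple of regions, using the dictionary above with a hypothetical t2.micro lookup:

instance_type = 't2.micro'
print(box_usage['ap-south-1'] + instance_type)  # -> 'APS3-BoxUsage:t2.micro'
print(box_usage['us-east-1'] + instance_type)   # -> 'BoxUsage:t2.micro'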

Getting the data

Applying the above-mentioned filters to the get_products API, we get the JSON data for all the variants that qualify.

data = pricing.get_products(
    ServiceCode='AmazonEC2',
    Filters=[
        {'Type': 'TERM_MATCH', 'Field': 'operatingSystem', 'Value': 'RHEL'},
        {'Type': 'TERM_MATCH', 'Field': 'location', 'Value': 'Asia Pacific (Mumbai)'},
        {'Type': 'TERM_MATCH', 'Field': 'instanceType', 'Value': 't2.micro'},
        {'Type': 'TERM_MATCH', 'Field': 'tenancy', 'Value': 'Shared'},
        {'Type': 'TERM_MATCH', 'Field': 'preInstalledSw', 'Value': 'NA'},
        {'Type': 'TERM_MATCH', 'Field': 'productFamily', 'Value': 'Compute Instance'}
    ])

The above boto3 code gives us the price of a t2.micro Red Hat Enterprise Linux instance without any pre-installed software (the 'preInstalledSw' field is 'NA') for the Mumbai region.
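
With filters this specific you will usually get a single product back, but get_products responses are paginated. Here is a minimal sketch of paging through the results, assuming a filters variable holds the same filter list as above:

products = []
response = pricing.get_products(ServiceCode='AmazonEC2', Filters=filters)
products.extend(response['PriceList'])  # 'PriceList' is a list of JSON-encoded strings
while 'NextToken' in response:
    response = pricing.get_products(ServiceCode='AmazonEC2', Filters=filters,
                                    NextToken=response['NextToken'])
    products.extend(response['PriceList'])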

Getting the price from JSON data

Now, the output of this is not pretty, to say the least. Parsing through the complex JSON alone took me a long time, until I stumbled upon this extremely elegant solution to the problem.

def extract_values(obj, key):
    """Pull all values of specified key from nested JSON."""
    arr = []

    def extract(obj, arr, key):
        """Recursively search for values of key in JSON tree."""
        if isinstance(obj, dict):
            for k, v in obj.items():
                if isinstance(v, (dict, list)):
                    extract(v, arr, key)
                elif k == key:
                    arr.append(v)
        elif isinstance(obj, list):
            for item in obj:
                extract(item, arr, key)
        return arr

    results = extract(obj, arr, key)
    return results

The above code snippet is taken from https://hackersandslackers.com/extract-data-from-complex-json-python/

The above code parses through the JSON and gives us the price of the instance without any of the additional attributes present in the JSON result. Here, 'obj' is the JSON data and 'key' is 'USD'.
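
To tie this back to the earlier call, here is a minimal sketch of running the get_products response through this helper; it assumes the data variable from above and an import json at the top:

price_item = json.loads(data['PriceList'][0])  # each entry in 'PriceList' is a JSON string
usd_values = extract_values(price_item, 'USD')
print(usd_values)  # the on-demand hourly rate (in USD) is among these values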

Putting it all together

Here is the sample code I wrote, which puts all the above pieces of the puzzle together and gives us a view of the bigger picture.

import boto3
import json


class CostEstimation:

    def __init__(self):
        self.region_mapping_dict = {'us-east-2': 'US East (Ohio)',
                                    'us-east-1': 'US East (N. Virginia)',
                                    'us-west-1': 'US West (N. California)',
                                    'us-west-2': 'US West (Oregon)',
                                    'ap-south-1': 'Asia Pacific (Mumbai)',
                                    'ap-northeast-3': 'Asia Pacific (Osaka-Local)',
                                    'ap-northeast-2': 'Asia Pacific (Seoul)',
                                    'ap-southeast-1': 'Asia Pacific (Singapore)',
                                    'ap-southeast-2': 'Asia Pacific (Sydney)',
                                    'ap-northeast-1': 'Asia Pacific (Tokyo)',
                                    'ca-central-1': 'Canada (Central)',
                                    'cn-north-1': 'China (Beijing)',
                                    'cn-northwest-1': 'China (Ningxia)',
                                    'eu-central-1': 'EU (Frankfurt)',
                                    'eu-west-1': 'EU (Ireland)',
                                    'eu-west-2': 'EU (London)',
                                    'eu-west-3': 'EU (Paris)',
                                    'eu-north-1': 'EU (Stockholm)',
                                    'sa-east-1': 'South America (Sao Paulo)'}

        self.pricing = boto3.client('pricing', region_name='us-east-1')

    def extract_values(self, obj, key):
        """Pull all values of specified key from nested JSON."""
        arr = []

        def extract(obj, arr, key):
            """Recursively search for values of key in JSON tree."""
            if isinstance(obj, dict):
                for k, v in obj.items():
                    if arr != []:
                        break
                    if isinstance(v, (dict, list)):
                        extract(v, arr, key)
                    elif k == key:
                        arr.append(v)
            elif isinstance(obj, list):
                for item in obj:
                    extract(item, arr, key)
            return arr

        results = extract(obj, arr, key)
        return results

    def get_instance_price(self, os, instance_type, region):

        # Look up the full region name used by the Price List API.
        region_name = self.region_mapping_dict[region]
        ec2_price_per_hour = 0
        try:

            box_usage = {'us-east-2': 'USE2-BoxUsage:',
                         'us-east-1': 'BoxUsage:',
                         'us-west-1': 'USW1-BoxUsage:',
                         'us-west-2': 'USW2-BoxUsage:',
                         'ap-south-1': 'APS3-BoxUsage:',
                         'ap-northeast-3': 'APN3-BoxUsage:',
                         'ap-northeast-2': 'APN2-BoxUsage:',
                         'ap-southeast-1': 'APS1-BoxUsage:',
                         'ap-southeast-2': 'APS2-BoxUsage:',
                         'ap-northeast-1': 'APN1-BoxUsage:',
                         'ca-central-1': 'CAN1-BoxUsage:',
                         'cn-north-1': 'BoxUsage:',
                         'cn-northwest-1': 'BoxUsage:',
                         'eu-central-1': 'EUC1-BoxUsage:',
                         'eu-west-1': 'EUW1-BoxUsage:',
                         'eu-west-2': 'EUW2-BoxUsage:',
                         'eu-west-3': 'EUW3-BoxUsage:',
                         'eu-north-1': 'EUN1-BoxUsage:',
                         'sa-east-1': 'SAE1-BoxUsage:'}

            usage_type = box_usage[region] + instance_type

            # Reuse the pricing client created in __init__.
            data = self.pricing.get_products(
                ServiceCode='AmazonEC2',
                Filters=[
                    {'Type': 'TERM_MATCH', 'Field': 'operatingSystem', 'Value': os},
                    {'Type': 'TERM_MATCH', 'Field': 'location', 'Value': region_name},
                    {'Type': 'TERM_MATCH', 'Field': 'instanceType', 'Value': instance_type},
                    {'Type': 'TERM_MATCH', 'Field': 'tenancy', 'Value': 'Shared'},
                    {'Type': 'TERM_MATCH', 'Field': 'preInstalledSw', 'Value': 'NA'},
                    {'Type': 'TERM_MATCH', 'Field': 'capacitystatus', 'Value': 'used'},
                    {'Type': 'TERM_MATCH', 'Field': 'usagetype', 'Value': usage_type},
                    {'Type': 'TERM_MATCH', 'Field': 'productFamily', 'Value': 'Compute Instance'}
                ])

            for value in data['PriceList']:
                json_value = json.loads(value)
            price = self.extract_values(json_value, 'USD')
            ec2_price_per_hour = price[0]

        except Exception as e:
            print(str(e))

        return ec2_price_per_hour


def main():
    os = 'Linux'
    instance_type = 't2.micro'
    region = 'us-west-1'

    test = CostEstimation()
    value = test.get_instance_price(os, instance_type, region)
    cost_for_a_month = float(value) * 24 * 30
    print(cost_for_a_month)


main()

The above code gives the cost of running an EC2 instance of type t2.micro in the N. California region for a month (assuming 30 days of 24-hour usage).

The above method of programmatically getting the cost of a product can also be applied to other services like load balancers, volumes, etc., with a couple of modifications.

For example, to get the price of a load balancer, you would just have to change the filters to the following:

Filters=[
    {'Type': 'TERM_MATCH', 'Field': 'productFamily',
     'Value': 'Load Balancer'},
    {'Type': 'TERM_MATCH', 'Field': 'location',
     'Value': region_name},
    {'Type': 'TERM_MATCH', 'Field': 'groupDescription',
     'Value': 'Standard Elastic Load Balancer'}
]

Once the above filters are applied, you can get the price of the load balancer by following the same steps as mentioned above.
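
For instance, a minimal sketch of the full call could look like the following. It assumes the Classic Load Balancer products are listed under the same 'AmazonEC2' service code as the instances above, and that region_name holds the full region name from the mapping dictionary:

lb_data = pricing.get_products(
    ServiceCode='AmazonEC2',
    Filters=[
        {'Type': 'TERM_MATCH', 'Field': 'productFamily', 'Value': 'Load Balancer'},
        {'Type': 'TERM_MATCH', 'Field': 'location', 'Value': region_name},
        {'Type': 'TERM_MATCH', 'Field': 'groupDescription', 'Value': 'Standard Elastic Load Balancer'}
    ])
lb_price_per_hour = extract_values(json.loads(lb_data['PriceList'][0]), 'USD')[0]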


The new A1 EC2 Instance

At re:Invent 2018, AWS introduced A1 instances. Compared to the other instance types, they cost 40% less per core. Given that they use the Nitro hypervisor, they also offer better performance than traditional Xen-hypervisor-based instances.

However, before you go and move all your workloads, it is good to know when these instance types are a fit and how much effort is required to move your workloads onto them.


A1 instances have ARM-based processors. If your workloads compile to native code for the x86 architecture, you will need to recompile them for the ARM platform before you can run them on A1.

For scripting languages, the effort could be negligible (as long as there are no native modules in the dependency chain).

If you use Docker containers, the move is relatively quick, as described in https://blog.boltops.com/2018/12/16/ec2-a1-instance-with-aws-homegrown-arm-processor-easy-way-to-save-40.

Amazon Linux, Ubuntu Linux, and Red Hat Enterprise Linux are the initial operating systems with ARMv8 support on EC2.

A1 instances use slightly dated ARM Cortex-A72 cores (released in 2015; the current generation is the A76), which are aimed at high-end smartphones and tablets. So they aren't meant for the same server workloads as the Xeon E5 series powering the Cx-series instance types. In the benchmarks by Phoronix (https://www.phoronix.com/scan.php?page=article&item=ec2-graviton-performance&num=1), both the Intel- and AMD-based instances far outperform the current-generation A1 instances.

What is interesting, though, is the performance per dollar:


Courtesy: Phoronix benchmark

In a 'real world' test of hosting a website, they still underperform by about 3.5x, albeit at a lower cost (https://www.theregister.co.uk/2018/11/27/amazon_aws_graviton_specs/).

Currently, A1 instances are not meant for general-purpose workloads. However, owing to the Nitro-based hypervisor, they will be very useful for scale-out workloads, lightweight web servers, containerized micro-services, caching fleets and the like.

There is, however, a larger trend at play that will benefit customers in the long run: Amazon bringing its own processor into the mix alongside Intel and AMD will improve choice and, hopefully, reduce costs.

Amazon's new and Intelligent S3 storage class

What is S3?

Amazon Simple Storage Service (Amazon S3) is a global Infrastructure as a Service (IaaS) offering that provides highly scalable, secure, low-latency data storage in the cloud.

Yet another storage class?

Prior to this, S3 objects had the following storage classes:

  1. Standard – For frequently accessed data.
  2. Standard-IA – For long-lived, infrequently accessed data.
  3. One Zone-IA – For long-lived, infrequently accessed, non-critical data.
  4. Glacier – For long-lived, infrequently accessed, archived critical data.

By default, S3 objects are stored in the Standard class unless another class is explicitly specified. Standard-Infrequent Access comes into the picture when your objects are infrequently accessed, and it has a lower storage cost. This may seem ideal at first: Standard charges $0.023 per GB per month for the first 50 TB stored, while Standard-IA charges a meager $0.0125 per GB per month for all storage.

When juxtaposed, the choices seem fairly obvious, right?

Wrong.

The caveat with storing objects in the Standard-IA class is its considerably higher data retrieval and request prices. Initially, you might store an object in Standard-IA under the assumption that you would barely access it, but as time progresses the need to use the object increases, and with it come the high data request charges tied to Standard-IA. This would presumably lead to a spike in your bill. It is also not feasible to keep tracking the usage patterns of objects and switching them between storage classes manually. Hence, using Standard-IA might end up being more than what you bargained for.
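
As a rough, hypothetical illustration using the N. Virginia prices above: storing 1,024 GB costs about 1,024 x $0.023 = $23.55 per month in Standard versus 1,024 x $0.0125 = $12.80 in Standard-IA, but if that data is read back in full just twice in a month, a retrieval fee of roughly $0.01 per GB adds about $20.48 and wipes out the savings.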

This is where the S3 Intelligent-Tiering storage class comes into play.

What is S3 Intelligent-Tiering Storage?

At re:Invent 2018, AWS introduced the S3 Intelligent-Tiering storage class.

The S3 Intelligent-Tiering storage class is designed for customers who want to optimize their storage costs automatically when data access patterns change, without performance impact or operational overhead. The new storage class offers the same high durability, low latency, and high throughput of S3 Standard or Standard-IA and inherits all of the existing S3 features including security and access management, data life-cycle policies, cross-region replication, and event notifications.

How does it work?

Objects uploaded to this class start out in the Frequent Access tier. The service monitors data access patterns, and objects that have not been accessed for 30 consecutive days are moved to the Infrequent Access tier. Once accessed again, they are moved back to the Frequent Access tier.

How are Life-Cycle Policies affected by Intelligent-Tiering?

Source: https://docs.aws.amazon.com/AmazonS3/latest/dev/lifecycle-transition-general-considerations.html

  • The waterfall model at the link above shows the storage class transitions supported by S3 life-cycle policies (a minimal sketch of such a rule follows after this list).
  • Transitions from Intelligent-Tiering are supported only to Glacier and One Zone-IA.
  • AWS does not transition objects smaller than 128 KB to Intelligent-Tiering storage, as it is not cost-effective.
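
The snippet below is a minimal, hypothetical sketch of a life-cycle rule that moves objects under a given prefix into Intelligent-Tiering after 30 days; the bucket name and prefix are placeholders.

import boto3

s3 = boto3.client('s3')
s3.put_bucket_lifecycle_configuration(
    Bucket='my-example-bucket',          # placeholder bucket name
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'move-to-intelligent-tiering',
            'Filter': {'Prefix': 'logs/'},   # placeholder prefix
            'Status': 'Enabled',
            'Transitions': [{'Days': 30, 'StorageClass': 'INTELLIGENT_TIERING'}]
        }]
    })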

Is this right for you?

  • If your data retrieval patterns are erratic and cannot be predetermined, then the Intelligent-Tiering storage class is for you; if not, one of the other four classes might be better suited.

What should you keep in mind before using Intelligent-Tiering?

  • This class should contain objects that are present for a minimum of 30 days. Hence, it is not suitable for short-lived objects. Rather, it is more suited for objects whose access patterns change over a longer period of time.
  • If objects are deleted, overwritten or transitioned to a different class before the minimum storage duration of 30 days, there is a prorated charge per GB for the remaining days.
  • Objects smaller than 128 KB can be stored in this class, but it is not advisable, because:
    1. Auto-tiering applies only to objects larger than 128 KB. Hence, a smaller object will always stay in the Frequent Access tier regardless of changes in its access patterns.
    2. The pricing for such objects will always correspond to the Frequent Access tier, since they never move to the Infrequent Access tier.

How to choose the Intelligent-Tiering storage class?

While uploading an object through the console, you can choose the Intelligent-Tiering storage class in the Set Properties step.
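
If you upload programmatically instead, the storage class can be set on the request itself. A minimal sketch with boto3 (bucket name, key, and body are placeholders):

import boto3

s3 = boto3.client('s3')
s3.put_object(
    Bucket='my-example-bucket',          # placeholder bucket name
    Key='data/report.csv',               # placeholder object key
    Body=b'some,data\n',
    StorageClass='INTELLIGENT_TIERING'   # store the object in the Intelligent-Tiering class
)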

How is it priced?

(Pricing for the US East (N. Virginia) region is considered here; pricing varies by region.)

From the published pricing, we can see that:

  • The billing for storage in the Frequent Access tier is the same as S3 Standard.
  • The billing for storage in the Infrequent Access tier is the same as S3 Standard-Infrequent Access.
  • There is a monthly monitoring and automation fee of $0.0025 per 1,000 objects when using this storage class.
  • There is no cost for moving objects from the Infrequent tier to the Frequent tier and vice versa.
  • The request pricing is the same as that of S3 Standard.
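
To put the monitoring fee in perspective with a hypothetical example: keeping 1,000,000 objects in Intelligent-Tiering costs 1,000,000 / 1,000 x $0.0025 = $2.50 per month in monitoring charges on top of the storage itself, which is only worthwhile if the automatic tiering saves you more than that.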