Deciphering the arcane AWS Price List API

AWS released its Price List API around four years ago, yet it still feels esoteric. With SKU values, product codes, and what seems like a hundred variants of the same product, it isn’t exactly easy to use. Since my company deals with AWS cost optimization, understanding the Price List API is essential for us.

For me, and judging by what I read online for a lot of others too, the API simply has too much going on to be used easily and effectively. For starters, the output is absolute JSON hell. Despite all this, the API comes in quite handy when used correctly.

Say you’re a data scientist who wants to know the cost incurred by all the EC2 instances in your EMR cluster, or a DevOps engineer who wants a ballpark figure for all the EC2 instances being provisioned by your development team. In either case, this API is a handy addition to your toolkit.

Initially, I looked for examples of how to use the API and stumbled upon a pricing module written by Lyft, but it wasn’t all that helpful. After tinkering with the API for about two weeks, I realized how much time I could have saved had I known a more systematic way to get the prices. Since I didn’t find much useful content online, I thought I’d write a post about it in the hope that it helps someone else.

Now, time to get our hands dirty, but not too dirty, because we just want a quick way to get the price of a product.

Getting the price of an EC2 instance

Creating the pricing client

import boto3
pricing = boto3.client('pricing', region_name='us-east-1')

Filters

The filters I used were:

  1. Operating System
  2. Region Name
  3. Instance Type
  4. Tenancy
  5. Product Family
  6. Usage Type

Operating System: The operatingSystem filter takes values such as ‘SUSE’, ‘RHEL’, ‘Windows’ and ‘Linux’.
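One way to check which values a filter field accepts is to ask the API itself. The sketch below is only a minimal illustration: `list_attribute_values` is a helper name made up for this post, and it assumes you pass in a boto3 pricing client. It relies on the client's `get_attribute_values` call and keeps following `NextToken` until no more pages are returned.

```python
def list_attribute_values(client, service_code, attribute_name):
    """Collect every value the Price List API reports for one attribute.

    `client` is assumed to be a boto3 'pricing' client, or any object
    exposing a compatible get_attribute_values method.
    """
    values = []
    kwargs = {'ServiceCode': service_code, 'AttributeName': attribute_name}
    while True:
        page = client.get_attribute_values(**kwargs)
        values.extend(item['Value'] for item in page.get('AttributeValues', []))
        token = page.get('NextToken')
        if not token:
            return values
        # Ask for the next page on the following iteration.
        kwargs['NextToken'] = token
```

With a real client, this could be called as list_attribute_values(boto3.client('pricing', region_name='us-east-1'), 'AmazonEC2', 'operatingSystem'). The same call should work for the other fields used here, such as 'location' or 'tenancy'.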

Region: The location filter must use the full region name rather than the region code. For example, for N. Virginia the value should be ‘US East (N. Virginia)’ and not ‘us-east-1’. I created this handy dictionary that maps region codes to their full names.

region_mapping_dict = {'us-east-2': 'US East (Ohio)',
                       'us-east-1': 'US East (N. Virginia)',
                       'us-west-1': 'US West (N. California)',
                       'us-west-2': 'US West (Oregon)',
                       'ap-south-1': 'Asia Pacific (Mumbai)',
                       'ap-northeast-3': 'Asia Pacific (Osaka-Local)',
                       'ap-northeast-2': 'Asia Pacific (Seoul)',
                       'ap-southeast-1': 'Asia Pacific (Singapore)',
                       'ap-southeast-2': 'Asia Pacific (Sydney)',
                       'ap-northeast-1': 'Asia Pacific (Tokyo)',
                       'ca-central-1': 'Canada (Central)',
                       'cn-north-1': 'China (Beijing)',
                       'cn-northwest-1': 'China (Ningxia)',
                       'eu-central-1': 'EU (Frankfurt)',
                       'eu-west-1': 'EU (Ireland)',
                       'eu-west-2': 'EU (London)',
                       'eu-west-3': 'EU (Paris)',
                       'eu-north-1': 'EU (Stockholm)',
                       'sa-east-1': 'South America (Sao Paulo)'}

Instance Type: The instance type goes in this field, for example t2.micro or t2.large.

Tenancy: AWS offers Shared tenancy by default unless the user specifies Dedicated.

Product Family: For EC2, the product family is ‘Compute Instance’.

Usage Type: To be more specific about the kind of instance you want the price of, add the usage type filter. Usage types are the units that each service uses to measure the usage of a specific type of resource. For example, the BoxUsage:t2.micro(Hrs) usage type measures the running hours of Amazon EC2 t2.micro instances. The reason to add this filter is to ensure that only on-demand instances appear in the output. This might seem redundant when you look at the JSON output, since you currently get the same result without it. But if AWS changes the pricing data or adds something new, this filter still ensures that you get on-demand instances. So although the field can be skipped for now and still retrieve the right data, I wouldn’t advise it. The filter value is built as follows:

usage_type = box_usage[region] + instance_type

Again, the value of this field varies by region. After looking at the pricing API output, I came up with this simple dictionary that handles the region prefix for this filter.

box_usage = {'us-east-2': 'USE2-BoxUsage:',
             'us-east-1': 'BoxUsage:',
             'us-west-1': 'USW1-BoxUsage:',
             'us-west-2': 'USW2-BoxUsage:',
             'ap-south-1': 'APS3-BoxUsage:',
             'ap-northeast-3': 'APN3-BoxUsage:',
             'ap-northeast-2': 'APN2-BoxUsage:',
             'ap-southeast-1': 'APS1-BoxUsage:',
             'ap-southeast-2': 'APS2-BoxUsage:',
             'ap-northeast-1': 'APN1-BoxUsage:',
             'ca-central-1': 'CAN1-BoxUsage:',
             'eu-central-1': 'EUC1-BoxUsage:',
             'eu-west-1': 'EUW1-BoxUsage:',
             'eu-west-2': 'EUW2-BoxUsage:',
             'eu-west-3': 'EUW3-BoxUsage:',
             'eu-north-1': 'EUN1-BoxUsage:',
             'sa-east-1': 'SAE1-BoxUsage:'}
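To make the prefix lookup concrete, here is a small sketch. It repeats only three entries of the dictionary so that it runs on its own, and `usage_type_for` is a hypothetical helper name, not part of boto3. It raises a readable error for a region that has no known prefix instead of a bare KeyError.

```python
# Subset of the prefix table, repeated so the sketch is self-contained.
BOX_USAGE_PREFIX = {
    'us-east-1': 'BoxUsage:',
    'us-west-1': 'USW1-BoxUsage:',
    'ap-south-1': 'APS3-BoxUsage:',
}


def usage_type_for(region, instance_type, prefixes=BOX_USAGE_PREFIX):
    """Build the usagetype filter value, e.g. 'APS3-BoxUsage:t2.micro'."""
    try:
        return prefixes[region] + instance_type
    except KeyError:
        raise ValueError('no usage-type prefix known for region %r' % region) from None


print(usage_type_for('ap-south-1', 't2.micro'))  # APS3-BoxUsage:t2.micro
print(usage_type_for('us-east-1', 't2.large'))   # BoxUsage:t2.large
```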

Getting the data

Applying the above-mentioned filters to the get_products API, we get the JSON data for all the variants that qualify.

data = pricing.get_products(ServiceCode='AmazonEC2', Filters=
[
    {'Type': 'TERM_MATCH', 'Field': 'operatingSystem', 'Value': 'RHEL'},
    {'Type': 'TERM_MATCH', 'Field': 'location', 'Value': 'Asia Pacific (Mumbai)'},
    {'Type': 'TERM_MATCH', 'Field': 'instanceType', 'Value': 't2.micro'},
    {'Type': 'TERM_MATCH', 'Field': 'tenancy', 'Value': 'Shared'},
    {'Type': 'TERM_MATCH', 'Field': 'preInstalledSw', 'Value': 'NA'},
    {'Type': 'TERM_MATCH', 'Field': 'productFamily', 'Value': 'Compute Instance'}
])

The above boto3 code gives us the price of a t2.micro Red Hat Enterprise Linux instance without any pre-installed software (‘preInstalledSw’ is ‘NA’) in the Mumbai region.
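Writing out every TERM_MATCH entry by hand gets repetitive, and a loosely filtered query can return more than one page of results. The following is a rough sketch of one way to handle both. The helper names `term_match_filters` and `fetch_price_list` are made up for illustration, and the code assumes a boto3 pricing client is passed in. It follows the `NextToken` field of the `get_products` response until no further page is reported.

```python
def term_match_filters(fields):
    """Turn {'field': 'value'} pairs into the TERM_MATCH filter list."""
    return [{'Type': 'TERM_MATCH', 'Field': field, 'Value': value}
            for field, value in fields.items()]


def fetch_price_list(client, service_code, fields):
    """Return all PriceList entries (JSON strings) across result pages."""
    kwargs = {'ServiceCode': service_code,
              'Filters': term_match_filters(fields)}
    entries = []
    while True:
        page = client.get_products(**kwargs)
        entries.extend(page.get('PriceList', []))
        token = page.get('NextToken')
        if not token:
            return entries
        # Request the next page on the following iteration.
        kwargs['NextToken'] = token
```

With the client from earlier, the query above could then be written as fetch_price_list(pricing, 'AmazonEC2', {'operatingSystem': 'RHEL', 'location': 'Asia Pacific (Mumbai)', 'instanceType': 't2.micro', 'tenancy': 'Shared', 'preInstalledSw': 'NA', 'productFamily': 'Compute Instance'}).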

Getting the price from JSON data

Now, the output of this is not pretty, to say the least. Parsing the complex JSON alone took me a long time, until I stumbled upon this extremely elegant solution to the problem.

def extract_values(obj, key):
    """Pull all values of specified key from nested JSON."""
    arr = []

    def extract(obj, arr, key):
        """Recursively search for values of key in JSON tree."""
        if isinstance(obj, dict):
            for k, v in obj.items():
                # Stop once the first match has been found.
                if arr:
                    break
                if isinstance(v, (dict, list)):
                    extract(v, arr, key)
                elif k == key:
                    arr.append(v)
        elif isinstance(obj, list):
            for item in obj:
                extract(item, arr, key)
        return arr

    results = extract(obj, arr, key)
    return results

The above code snippet is taken from https://hackersandslackers.com/extract-data-from-complex-json-python/

The above code walks the JSON and returns the price of the instance, ignoring the other details present in the result. Here, ‘obj’ is the JSON data returned by the API and ‘key’ is ‘USD’.
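If you would rather not depend on which ‘USD’ key happens to be found first, the on-demand price can also be read by following the nesting explicitly. The sketch below assumes the usual layout of a price list entry (terms, then OnDemand, then priceDimensions, then pricePerUnit). The sample entry is heavily trimmed, and its SKU, codes and price are invented for illustration. `on_demand_usd_prices` is a hypothetical helper name.

```python
import json

# Trimmed, made-up example of a single PriceList entry (a JSON string).
SAMPLE_ENTRY = json.dumps({
    'product': {'productFamily': 'Compute Instance',
                'attributes': {'instanceType': 't2.micro'}},
    'terms': {
        'OnDemand': {
            'EXAMPLESKU.TERMCODE': {
                'priceDimensions': {
                    'EXAMPLESKU.TERMCODE.RATECODE': {
                        'unit': 'Hrs',
                        'pricePerUnit': {'USD': '0.0100000000'},
                    }
                }
            }
        }
    },
})


def on_demand_usd_prices(entry):
    """Return every USD pricePerUnit found under terms -> OnDemand."""
    doc = json.loads(entry)
    prices = []
    for term in doc.get('terms', {}).get('OnDemand', {}).values():
        for dimension in term.get('priceDimensions', {}).values():
            usd = dimension.get('pricePerUnit', {}).get('USD')
            if usd is not None:
                prices.append(usd)
    return prices


print(on_demand_usd_prices(SAMPLE_ENTRY))  # ['0.0100000000']
```

Note that the price comes back as a string, so it needs converting before any arithmetic.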

Putting it all together

Here is the sample code I wrote which puts all the above pieces of the puzzle together and gives us a view of the bigger picture.

import boto3
import json


class CostEstimation:

    def __init__(self):
        self.region_mapping_dict = {'us-east-2': 'US East (Ohio)',
                                    'us-east-1': 'US East (N. Virginia)',
                                    'us-west-1': 'US West (N. California)',
                                    'us-west-2': 'US West (Oregon)',
                                    'ap-south-1': 'Asia Pacific (Mumbai)',
                                    'ap-northeast-3': 'Asia Pacific (Osaka-Local)',
                                    'ap-northeast-2': 'Asia Pacific (Seoul)',
                                    'ap-southeast-1': 'Asia Pacific (Singapore)',
                                    'ap-southeast-2': 'Asia Pacific (Sydney)',
                                    'ap-northeast-1': 'Asia Pacific (Tokyo)',
                                    'ca-central-1': 'Canada (Central)',
                                    'cn-north-1': 'China (Beijing)',
                                    'cn-northwest-1': 'China (Ningxia)',
                                    'eu-central-1': 'EU (Frankfurt)',
                                    'eu-west-1': 'EU (Ireland)',
                                    'eu-west-2': 'EU (London)',
                                    'eu-west-3': 'EU (Paris)',
                                    'eu-north-1': 'EU (Stockholm)',
                                    'sa-east-1': 'South America (Sao Paulo)'}

        self.pricing = boto3.client('pricing', region_name='us-east-1')

    def extract_values(self, obj, key):
        """Pull all values of specified key from nested JSON."""
        arr = []

        def extract(obj, arr, key):
            """Recursively search for values of key in JSON tree."""
            if isinstance(obj, dict):
                for k, v in obj.items():
                    if arr != []:
                        break
                    if isinstance(v, (dict, list)):
                        extract(v, arr, key)
                    elif k == key:
                        arr.append(v)
            elif isinstance(obj, list):
                for item in obj:
                    extract(item, arr, key)
            return arr

        results = extract(obj, arr, key)
        return results

    def get_instance_price(self, os, instance_type, region):
        """Return the hourly on-demand USD price as a string, or None on failure."""
        # Usage-type prefixes vary by region; us-east-1 has no prefix.
        box_usage = {'us-east-2': 'USE2-BoxUsage:',
                     'us-east-1': 'BoxUsage:',
                     'us-west-1': 'USW1-BoxUsage:',
                     'us-west-2': 'USW2-BoxUsage:',
                     'ap-south-1': 'APS3-BoxUsage:',
                     'ap-northeast-3': 'APN3-BoxUsage:',
                     'ap-northeast-2': 'APN2-BoxUsage:',
                     'ap-southeast-1': 'APS1-BoxUsage:',
                     'ap-southeast-2': 'APS2-BoxUsage:',
                     'ap-northeast-1': 'APN1-BoxUsage:',
                     'ca-central-1': 'CAN1-BoxUsage:',
                     'cn-north-1': 'BoxUsage:',
                     'cn-northwest-1': 'BoxUsage:',
                     'eu-central-1': 'EUC1-BoxUsage:',
                     'eu-west-1': 'EUW1-BoxUsage:',
                     'eu-west-2': 'EUW2-BoxUsage:',
                     'eu-west-3': 'EUW3-BoxUsage:',
                     'eu-north-1': 'EUN1-BoxUsage:',
                     'sa-east-1': 'SAE1-BoxUsage:'}

        ec2_price_per_hour = None
        try:
            # An unknown region raises KeyError, which is reported below.
            region_name = self.region_mapping_dict[region]
            usage_type = box_usage[region] + instance_type

            # Reuse the client created in __init__ instead of building a new one.
            data = self.pricing.get_products(ServiceCode='AmazonEC2', Filters=[
                {'Type': 'TERM_MATCH', 'Field': 'operatingSystem', 'Value': os},
                {'Type': 'TERM_MATCH', 'Field': 'location', 'Value': region_name},
                {'Type': 'TERM_MATCH', 'Field': 'instanceType', 'Value': instance_type},
                {'Type': 'TERM_MATCH', 'Field': 'tenancy', 'Value': 'Shared'},
                {'Type': 'TERM_MATCH', 'Field': 'preInstalledSw', 'Value': 'NA'},
                {'Type': 'TERM_MATCH', 'Field': 'capacitystatus', 'Value': 'used'},
                {'Type': 'TERM_MATCH', 'Field': 'usagetype', 'Value': usage_type},
                {'Type': 'TERM_MATCH', 'Field': 'productFamily', 'Value': 'Compute Instance'}
            ])

            # Each PriceList entry is a JSON string; take the first price found.
            for value in data['PriceList']:
                price = self.extract_values(json.loads(value), 'USD')
                if price:
                    ec2_price_per_hour = price[0]
                    break

        except Exception as e:
            print(str(e))

        return ec2_price_per_hour


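def main():
    os = 'Linux'
    instance_type = 't2.micro'
    region = 'us-west-1'

    test = CostEstimation()
    value = test.get_instance_price(os, instance_type, region)
    if value is None:
        print('No price found')
        return
    # The API returns an hourly price; 24 hours * 30 days approximates a month.
    cost_for_a_month = float(value) * 24 * 30
    print(cost_for_a_month)


if __name__ == '__main__':
    main()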

The above code gives the cost of running a t2.micro EC2 instance in the N. California region for a 30-day month.
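The monthly figure is just the hourly price multiplied by 24 hours and 30 days. As a small sketch of that arithmetic, the hypothetical helper below uses Decimal, since the API hands the price back as a string and floats can introduce rounding noise. The hourly price of 0.0138 is a made-up value used only to show the calculation.

```python
from decimal import Decimal


def estimate_cost(price_per_hour, hours_per_day=24, days=30):
    """Multiply an hourly price (string or number) out to a longer period."""
    return Decimal(str(price_per_hour)) * hours_per_day * days


print(estimate_cost('0.0138'))          # 9.9360
print(estimate_cost('0.0138', days=1))  # 0.3312
```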

The same approach to getting the cost of a product programmatically can also be applied to other services such as load balancers and volumes, with a couple of modifications.

For example, to get the price of a load balancer, you would just have to change the filters to the following:

Filters=[
    {'Type': 'TERM_MATCH', 'Field': 'productFamily',
     'Value': 'Load Balancer'},
    {'Type': 'TERM_MATCH', 'Field': 'location',
     'Value': region_name},
    {'Type': 'TERM_MATCH', 'Field': 'groupDescription',
     'Value': 'Standard Elastic Load Balancer'}
]

Once the above filters are applied, you can get the price of the load balancer by following the same steps as mentioned above.
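Put together, those steps might look roughly like the sketch below. It is an illustration under a few assumptions: `load_balancer_price` and `find_first` are names invented here, a boto3 pricing client is passed in, and the service code defaults to 'AmazonEC2' as in the EC2 example, which may need changing for your case. More than one product may match these filters, and the sketch simply returns the first USD value it finds.

```python
import json


def find_first(obj, key):
    """Depth-first search for the first scalar value stored under `key`."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if isinstance(v, (dict, list)):
                found = find_first(v, key)
                if found is not None:
                    return found
            elif k == key:
                return v
    elif isinstance(obj, list):
        for item in obj:
            found = find_first(item, key)
            if found is not None:
                return found
    return None


def load_balancer_price(client, region_name, service_code='AmazonEC2'):
    """Return the first USD price for a Standard Elastic Load Balancer, or None."""
    response = client.get_products(ServiceCode=service_code, Filters=[
        {'Type': 'TERM_MATCH', 'Field': 'productFamily', 'Value': 'Load Balancer'},
        {'Type': 'TERM_MATCH', 'Field': 'location', 'Value': region_name},
        {'Type': 'TERM_MATCH', 'Field': 'groupDescription',
         'Value': 'Standard Elastic Load Balancer'},
    ])
    # Each PriceList entry is a JSON string that has to be decoded first.
    for entry in response.get('PriceList', []):
        price = find_first(json.loads(entry), 'USD')
        if price is not None:
            return price
    return None
```

A call would then look like load_balancer_price(pricing, 'US East (N. Virginia)'), where pricing is the client created at the start of the post.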