
Deciphering the arcane AWS Price List API

AWS released its Price List API around four years ago, yet it still seems esoteric in nature. With SKU values, product codes, and a hundred variants of the same product, it isn’t exactly easy to use. Since my company deals with AWS cost optimization, understanding the Price List API is imperative for us.

Personally, and from what I’ve read online this holds for a lot of others too, the API just seems to have too much going on for people to use it easily and effectively. For starters, the output is absolute JSON hell. Despite all this, the API comes in quite handy when used correctly.

Say you’re a data scientist who wants to know the cost incurred by all the EC2 instances in your EMR cluster, or a DevOps engineer who wants a ballpark figure for all the EC2 instances being provisioned by your development team; either way, this API is a handy addition to your toolkit.

Initially, I tried looking up some examples of how to use the API and stumbled upon this pricing module written by Lyft, but it wasn’t all that helpful. After tinkering with the API for about two weeks, I realized how much time I could have saved had I known a more systematic way to get the prices. Since I didn’t find much useful content online, I thought I’d write a post about it in the hope that it helps someone else.

Now, time to get our hands dirty, but not too dirty, because we just want a quick way to get the price of a product.

Getting the price of an EC2 instance

Creating the pricing client

import boto3

# The Pricing API is only served from the us-east-1 and ap-south-1 endpoints
pricing = boto3.client('pricing', region_name='us-east-1')

Filters

The filters I used were:

  1. Operating System
  2. Region Name
  3. Instance Type
  4. Tenancy
  5. Product Family
  6. Usage Type

OS: The operating system filter takes only the values ‘SUSE’, ‘RHEL’, ‘Windows’, and ‘Linux’.

Region: The region filter must use the full region name. For example, for N. Virginia, the filter should be ‘US East (N. Virginia)’ and not ‘us-east-1’. I created this handy dictionary that maps region codes to their full names.

region_mapping_dict = {'us-east-2': 'US East (Ohio)',
                       'us-east-1': 'US East (N. Virginia)',
                       'us-west-1': 'US West (N. California)',
                       'us-west-2': 'US West (Oregon)',
                       'ap-south-1': 'Asia Pacific (Mumbai)',
                       'ap-northeast-2': 'Asia Pacific (Seoul)',
                       'ap-southeast-1': 'Asia Pacific (Singapore)',
                       'ap-southeast-2': 'Asia Pacific (Sydney)',
                       'ap-northeast-1': 'Asia Pacific (Tokyo)',
                       'ca-central-1': 'Canada (Central)',
                       'cn-north-1': 'China (Beijing)',
                       'cn-northwest-1': 'China (Ningxia)',
                       'eu-central-1': 'EU (Frankfurt)',
                       'eu-west-1': 'EU (Ireland)',
                       'eu-west-2': 'EU (London)',
                       'eu-west-3': 'EU (Paris)',
                       'eu-north-1': 'EU (Stockholm)',
                       'sa-east-1': 'South America (Sao Paulo)'}

Instance Type: The instance type is mentioned in this field. For example, t2.micro, t2.large, etc.

Tenancy: AWS offers Shared tenancy by default, unless the user explicitly asks for Dedicated.

Product Family: The product family in case of EC2 is ‘Compute Instance’.

Usage Type: To be more specific about the kind of instance you want to price, the usage type filter is required. Usage types are the units each service uses to measure the consumption of a specific type of resource. For example, the BoxUsage:t2.micro (Hrs) usage type measures the running hours of Amazon EC2 t2.micro instances. The reason to add this filter is to ensure that you get only On-Demand instances in the output. This might seem redundant when you look at the JSON output, since you get this by default without the filter, but if AWS changes its pricing or adds something new, this filter still guarantees that you get On-Demand instances. As of now, the field can be skipped and you would still retrieve the right data, but I wouldn’t advise it. The filter is as follows:

usage_type = box_usage[region] + instance_type

Again, this field varies by region. After looking at the pricing API, I came up with this simple dictionary that solves the region issue for this filter.

box_usage = {'us-east-2': 'USE2-BoxUsage:',
             'us-east-1': 'BoxUsage:',
             'us-west-1': 'USW1-BoxUsage:',
             'us-west-2': 'USW2-BoxUsage:',
             'ap-south-1': 'APS3-BoxUsage:',
             'ap-northeast-3': 'APN3-BoxUsage:',
             'ap-northeast-2': 'APN2-BoxUsage:',
             'ap-southeast-1': 'APS1-BoxUsage:',
             'ap-southeast-2': 'APS2-BoxUsage:',
             'ap-northeast-1': 'APN1-BoxUsage:',
             'ca-central-1': 'CAN1-BoxUsage:',
             'eu-central-1': 'EUC1-BoxUsage:',
             'eu-west-1': 'EU-BoxUsage:',
             'eu-west-2': 'EUW2-BoxUsage:',
             'eu-west-3': 'EUW3-BoxUsage:',
             'eu-north-1': 'EUN1-BoxUsage:',
             'sa-east-1': 'SAE1-BoxUsage:'}

Getting the data

Applying the above-mentioned filters to the get_products API, we get the JSON data for all the variants that qualify.

data = pricing.get_products(ServiceCode='AmazonEC2', Filters=[
    {'Type': 'TERM_MATCH', 'Field': 'operatingSystem', 'Value': 'RHEL'},
    {'Type': 'TERM_MATCH', 'Field': 'location', 'Value': 'Asia Pacific (Mumbai)'},
    {'Type': 'TERM_MATCH', 'Field': 'instanceType', 'Value': 't2.micro'},
    {'Type': 'TERM_MATCH', 'Field': 'tenancy', 'Value': 'Shared'},
    {'Type': 'TERM_MATCH', 'Field': 'preInstalledSw', 'Value': 'NA'},
    {'Type': 'TERM_MATCH', 'Field': 'productFamily', 'Value': 'Compute Instance'}
])

The above boto3 code gives us the price of a t2.micro Red Hat Enterprise Linux instance without any pre-installed software (the 'preInstalledSw' field is 'NA') for the Mumbai region.

Getting the price from JSON data

Now, the output of this is not pretty, to say the least. Parsing the complex JSON alone took me a lot of time, until I stumbled upon this extremely elegant solution.

def extract_values(obj, key):
    """Pull all values of specified key from nested JSON."""
    arr = []

    def extract(obj, arr, key):
        """Recursively search for values of key in JSON tree."""
        if isinstance(obj, dict):
            for k, v in obj.items():
                if arr:  # stop once the first match has been found
                    break
                if isinstance(v, (dict, list)):
                    extract(v, arr, key)
                elif k == key:
                    arr.append(v)
        elif isinstance(obj, list):
            for item in obj:
                extract(item, arr, key)
        return arr

    results = extract(obj, arr, key)
    return results

The above code snippet is taken from https://hackersandslackers.com/extract-data-from-complex-json-python/

The above code parses the JSON and gives us just the price of the instance, stripping out all the extra detail present in the JSON result. Here, ‘obj’ is the JSON data and ‘key’ is ‘USD’.
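To connect the pieces, here is a quick sketch of the lookup, assuming data is the get_products response from earlier. Each entry in data['PriceList'] is a JSON-encoded string, so it has to be loaded before it can be searched; the printed figure is only an example, as the actual price varies by region and instance type.

import json

# Each entry in data['PriceList'] is a JSON-encoded string
price_item = json.loads(data['PriceList'][0])

price = extract_values(price_item, 'USD')
print(price)  # e.g. ['0.0169'] -- the hourly On-Demand price in USD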

Putting it all together

Here is the sample code I wrote which puts all the above pieces of the puzzle together and gives us a view of the bigger picture.

import boto3
import json


class CostEstimation:

    def __init__(self):
        self.region_mapping_dict = {'us-east-2': 'US East (Ohio)',
                                    'us-east-1': 'US East (N. Virginia)',
                                    'us-west-1': 'US West (N. California)',
                                    'us-west-2': 'US West (Oregon)',
                                    'ap-south-1': 'Asia Pacific (Mumbai)',
                                    'ap-northeast-3': 'Asia Pacific (Osaka-Local)',
                                    'ap-northeast-2': 'Asia Pacific (Seoul)',
                                    'ap-southeast-1': 'Asia Pacific (Singapore)',
                                    'ap-southeast-2': 'Asia Pacific (Sydney)',
                                    'ap-northeast-1': 'Asia Pacific (Tokyo)',
                                    'ca-central-1': 'Canada (Central)',
                                    'cn-north-1': 'China (Beijing)',
                                    'cn-northwest-1': 'China (Ningxia)',
                                    'eu-central-1': 'EU (Frankfurt)',
                                    'eu-west-1': 'EU (Ireland)',
                                    'eu-west-2': 'EU (London)',
                                    'eu-west-3': 'EU (Paris)',
                                    'eu-north-1': 'EU (Stockholm)',
                                    'sa-east-1': 'South America (Sao Paulo)'}

        self.pricing = boto3.client('pricing', region_name='us-east-1')

    def extract_values(self, obj, key):
        """Pull all values of specified key from nested JSON."""
        arr = []

        def extract(obj, arr, key):
            """Recursively search for values of key in JSON tree."""
            if isinstance(obj, dict):
                for k, v in obj.items():
                    if arr != []:
                        break
                    if isinstance(v, (dict, list)):
                        extract(v, arr, key)
                    elif k == key:
                        arr.append(v)
            elif isinstance(obj, list):
                for item in obj:
                    extract(item, arr, key)
            return arr

        results = extract(obj, arr, key)
        return results

    def get_instance_price(self, os, instance_type, region):

        region_name = self.region_mapping_dict[region]
        ec2_price_per_hour = 0
        try:

            box_usage = {'us-east-2': 'USE2-BoxUsage:',
                         'us-east-1': 'BoxUsage:',
                         'us-west-1': 'USW1-BoxUsage:',
                         'us-west-2': 'USW2-BoxUsage:',
                         'ap-south-1': 'APS3-BoxUsage:',
                         'ap-northeast-3': 'APN3-BoxUsage:',
                         'ap-northeast-2': 'APN2-BoxUsage:',
                         'ap-southeast-1': 'APS1-BoxUsage:',
                         'ap-southeast-2': 'APS2-BoxUsage:',
                         'ap-northeast-1': 'APN1-BoxUsage:',
                         'ca-central-1': 'CAN1-BoxUsage:',
                         'cn-north-1': 'BoxUsage:',
                         'cn-northwest-1': 'BoxUsage:',
                         'eu-central-1': 'EUC1-BoxUsage:',
                         'eu-west-1': 'EU-BoxUsage:',
                         'eu-west-2': 'EUW2-BoxUsage:',
                         'eu-west-3': 'EUW3-BoxUsage:',
                         'eu-north-1': 'EUN1-BoxUsage:',
                         'sa-east-1': 'SAE1-BoxUsage:'}

            usage_type = box_usage[region] + instance_type

            data = self.pricing.get_products(ServiceCode='AmazonEC2', Filters=[
                {'Type': 'TERM_MATCH', 'Field': 'operatingSystem', 'Value': os},
                {'Type': 'TERM_MATCH', 'Field': 'location', 'Value': region_name},
                {'Type': 'TERM_MATCH', 'Field': 'instanceType', 'Value': instance_type},
                {'Type': 'TERM_MATCH', 'Field': 'tenancy', 'Value': 'Shared'},
                {'Type': 'TERM_MATCH', 'Field': 'preInstalledSw', 'Value': 'NA'},
                {'Type': 'TERM_MATCH', 'Field': 'capacitystatus', 'Value': 'used'},
                {'Type': 'TERM_MATCH', 'Field': 'usagetype', 'Value': usage_type},
                {'Type': 'TERM_MATCH', 'Field': 'productFamily', 'Value': 'Compute Instance'}
            ])

            # The filters above narrow the results to a single product,
            # so take the price from the first entry
            for value in data['PriceList']:
                json_value = json.loads(value)
                price = self.extract_values(json_value, 'USD')
                ec2_price_per_hour = price[0]
                break

        except Exception as e:
            print(str(e))

        return ec2_price_per_hour


def main():
    os = 'Linux'
    instance_type = 't2.micro'
    region = 'us-west-1'

    test = CostEstimation()
    value = test.get_instance_price(os, instance_type, region)
    cost_for_a_month = float(value) * 24 * 30
    print(cost_for_a_month)


main()

The above code gives the cost of running an EC2 instance of type t2.micro in the N. California region for a month (30 days at 24 hours a day).

The above method of programmatically getting the cost of a product can also be applied to other services like load balancers, volumes, etc., with a couple of modifications.

For example, to get the price of a load balancer, you would just have to change the filters to the following:

Filters=[
    {'Type': 'TERM_MATCH', 'Field': 'productFamily',
     'Value': 'Load Balancer'},
    {'Type': 'TERM_MATCH', 'Field': 'location',
     'Value': region_name},
    {'Type': 'TERM_MATCH', 'Field': 'groupDescription',
     'Value': 'Standard Elastic Load Balancer'}
]

Once the above filters are applied, you can get the price of the load balancer by following the same steps as mentioned above.
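For instance, here is a minimal sketch of that lookup. It reuses the pricing client, region_mapping_dict, and extract_values from earlier, and assumes classic load balancer prices live under the 'AmazonEC2' service code:

import json

region_name = region_mapping_dict['us-west-1']

lb_data = pricing.get_products(ServiceCode='AmazonEC2', Filters=[
    {'Type': 'TERM_MATCH', 'Field': 'productFamily',
     'Value': 'Load Balancer'},
    {'Type': 'TERM_MATCH', 'Field': 'location',
     'Value': region_name},
    {'Type': 'TERM_MATCH', 'Field': 'groupDescription',
     'Value': 'Standard Elastic Load Balancer'}
])

# Same parsing trick as before: load the first product and pull out 'USD'
lb_price_per_hour = extract_values(json.loads(lb_data['PriceList'][0]), 'USD')[0]
print(lb_price_per_hour)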

 

Configure AWS Lambda programmatically with Boto3

Serverless computing has generated a lot of interest in the development community, and AWS Lambda is AWS’s take on serverless. In this post, we will look at configuring AWS Lambda programmatically with boto3. We will configure a Lambda function that connects to a Postgres DB on an EC2 instance in a private VPC using SQLAlchemy, which requires packaging SQLAlchemy’s dependencies along with the Lambda function.

By default, the instance where an AWS Lambda function runs has boto3 (the AWS Python SDK) pre-installed, so “import boto3” works in your code without any extra packaging. But importing a package like sqlalchemy (“import sqlalchemy”) leads to the Python error “Unable to import module sqlalchemy”. This post shows how to work around that.

Step 1: Write the lambda function

First, we need to write the Python code that will run as the Lambda. The code below creates a Python file and zips it. Note that the code we want to execute must contain a method called lambda_handler, which serves as the entry point where Lambda starts execution.
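Here is a minimal sketch of that step. The handler body, database host, and credentials are placeholders for illustration:

import zipfile

# Hypothetical handler that connects to Postgres via SQLAlchemy.
# The host, credentials, and database name are placeholders.
LAMBDA_CODE = '''\
import sqlalchemy


def lambda_handler(event, context):
    # "postgresql+psycopg2" pins the dialect and driver explicitly
    engine = sqlalchemy.create_engine(
        'postgresql+psycopg2://user:password@10.0.0.10:5432/mydb')
    with engine.connect() as conn:
        rows = conn.execute('SELECT version()')
        return [row[0] for row in rows]
'''

# Write the handler to lambda_function.py and zip it for upload
with open('lambda_function.py', 'w') as f:
    f.write(LAMBDA_CODE)

with zipfile.ZipFile('lambda_function.zip', 'w') as z:
    z.write('lambda_function.py')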

One quick aside: SQLAlchemy complains about not being able to find the right provider when connecting to Postgres from Lambda, so you might have to specify postgresql+psycopg2 as part of the connection string. Don’t forget to open the port in the security group and allow incoming connections in the Postgres server’s host-based authentication file (pg_hba.conf). This can be a cause of a lot of heartache.
Also, remember that if the system on which you package the code is Windows, you are in some trouble, as pip will pick up Windows-related binaries. You need to run this on a Lambda-compatible OS (read: Amazon Linux or a compatible OS); otherwise you will end up with a lot of weird runtime errors.

Step 2:  Create the lambda function

Now use the boto3 client API create_function to map the Lambda to the above code. If you take a close look, Handler='lambda_function.lambda_handler' means that Lambda will look for the Python file lambda_function.py and, inside it, for the lambda_handler function. You also need to pass the right role ARN for this API to work.
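A minimal sketch of that call follows. The function name, role ARN, subnet and security group IDs are placeholders, and the runtime is pinned to python3.7 only as an example:

import boto3

lambda_client = boto3.client('lambda')

with open('lambda_function.zip', 'rb') as f:
    zipped_code = f.read()

response = lambda_client.create_function(
    FunctionName='pg-poller',  # hypothetical name
    Runtime='python3.7',
    Role='arn:aws:iam::123456789012:role/lambda-vpc-role',  # placeholder ARN
    Handler='lambda_function.lambda_handler',
    Code={'ZipFile': zipped_code},
    Timeout=30,
    # The VPC config is what lets the function reach the private Postgres box
    VpcConfig={
        'SubnetIds': ['subnet-0abc1234'],
        'SecurityGroupIds': ['sg-0abc1234'],
    },
)
function_arn = response['FunctionArn']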

Step 3: Set the frequency of lambda 

Set the rule and the frequency at which you want the Lambda to run; it can be a simple expression like rate(1 minute) or rate(5 minutes), or a cron expression. Also set the rule target and grant permission for CloudWatch Events to invoke the Lambda. In this case, we are configuring the rule to run every minute.
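A sketch of the three calls involved, assuming the function_arn returned by create_function in Step 2 and a hypothetical rule name:

import boto3

events = boto3.client('events')
lambda_client = boto3.client('lambda')

function_arn = 'arn:aws:lambda:us-east-1:123456789012:function:pg-poller'  # from Step 2

# 1. Create (or update) the schedule rule
rule = events.put_rule(
    Name='run-every-minute',  # hypothetical rule name
    ScheduleExpression='rate(1 minute)',
    State='ENABLED',
)

# 2. Grant CloudWatch Events permission to invoke the function
lambda_client.add_permission(
    FunctionName='pg-poller',
    StatementId='run-every-minute-event',
    Action='lambda:InvokeFunction',
    Principal='events.amazonaws.com',
    SourceArn=rule['RuleArn'],
)

# 3. Point the rule at the Lambda function
events.put_targets(
    Rule='run-every-minute',
    Targets=[{'Id': '1', 'Arn': function_arn}],
)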

Step 4: Additional Package Deployment

Do this step if you need other packages for your Python code to run. For creating deployment folders, you can use a script like create_deployment.py along with a requirements.txt. Basically, when the script runs, it picks the packages to install from requirements.txt, does a pip install -t into a folder, pulls in all the packages, and zips them. If you need a particular version you can pin it, e.g. sseclient==0.0.11, or you can omit the version to pull the latest.
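A sketch of what such a create_deployment.py might look like; the build folder, zip name, and handler file name are assumptions:

import os
import shutil
import subprocess
import zipfile

BUILD_DIR = 'build'  # assumed working folder

# Start from a clean build folder
shutil.rmtree(BUILD_DIR, ignore_errors=True)
os.makedirs(BUILD_DIR)

# pip install -t drops the packages from requirements.txt into BUILD_DIR
subprocess.check_call(
    ['pip', 'install', '-r', 'requirements.txt', '-t', BUILD_DIR])

# Ship the handler alongside its dependencies
shutil.copy('lambda_function.py', BUILD_DIR)

# Zip everything at the archive root, which is where Lambda expects it
with zipfile.ZipFile('deployment.zip', 'w') as z:
    for root, _, files in os.walk(BUILD_DIR):
        for name in files:
            path = os.path.join(root, name)
            z.write(path, os.path.relpath(path, BUILD_DIR))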

Final Step: Putting it together

Now, putting everything together, the function runs cleanly and the AWS CloudWatch logs show no import module error this time. This setup can be modified to use whichever packages you need for your development purposes.

Hope this saves you a few hours jumping through hoops that weren’t really apparent at the outset!

References:

AWS Lambda Functions Made Easy

AWS Lambda: Programmatically scheduling a CloudWatchEvent