Cleaning up orphaned AWS EBS volumes with Lambda, Events and Python.

March 23, 2017   

Every once in a while, service managers will come hunting, complaining about overspend in AWS. In particular, perhaps they’ve noticed that some environments (usually a legacy one that still has regular deployments) have a lot of orphaned EBS volumes. Given the sheer number and size of these volumes, they calculate that around £2,000 per month is being wasted simply by not doing the admin work to clean them up. Gasps.

Having done a lot of work with Lambda recently, I decided f*ck it, let’s stop being awkward and entertain the request. Let’s not pretend: it is, quite frankly, a GREAT BIG waste of money.

Let’s tackle the problem in the following three main focus areas:

  • The AWS event trigger
  • AWS IAM permissions (role)
  • The lambda function

Up-front caveats:

  • You’ll need to replace the dummy account ID 999999999999 in all request and response ARNs yourself.
  • Pay special attention to the names of the events and Lambda functions given in the API calls, as these names are what tie the separate AWS API requests together.

The AWS event trigger

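Lambda handily comes with the ability to respond to CloudWatch Events, which in turn can be scheduled using cron() or rate() expressions.

For testing I chose rate(1 minute), but for production I would recommend a cron expression such as cron(0 0 ? * SUN *), which fires at midnight UTC every Sunday. Note that AWS cron expressions take six fields (minutes, hours, day-of-month, month, day-of-week, year), and one of the two day fields must be ?.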

aws events put-rule \
--name test-event \
--schedule-expression 'rate(1 minute)'

The rule is now available as a trigger for any Lambda function in the same account and region.
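The singular/plural rule in rate() is easy to trip over: a value of 1 takes the singular unit, and anything larger takes the plural. A minimal sketch of a helper that builds the expression; the name rate_expression is hypothetical and not part of any AWS SDK.

```python
def rate_expression(value, unit):
    """Build a CloudWatch Events rate() schedule expression.

    `unit` is given in its singular form: 'minute', 'hour' or 'day'.
    """
    if unit not in ("minute", "hour", "day"):
        raise ValueError("unit must be 'minute', 'hour' or 'day'")
    if value < 1:
        raise ValueError("value must be a positive integer")
    # A value of 1 needs the singular unit; anything larger needs the plural.
    suffix = "" if value == 1 else "s"
    return "rate({} {}{})".format(value, unit, suffix)


print(rate_expression(1, "minute"))  # rate(1 minute)
print(rate_expression(7, "day"))     # rate(7 days)
```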

When this rule triggers, your Python handler will receive an event, for example:

{
    "version": "0",
    "id": "53dc4d37-cffa-4f76-80c9-8b7d4a4d2eaa",
    "detail-type": "Scheduled Event",
    "source": "aws.events",
    "account": "999999999999",
    "time": "2015-10-08T16:53:06Z",
    "region": "us-east-1",
    "resources": [
        "arn:aws:events:us-east-1:999999999999:rule/my-scheduled-rule"
    ],
    "detail": {}
}
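The handler rarely needs most of these fields, but the resources list does tell you which rule fired. A minimal sketch, assuming the event has the shape shown above; rule_name_from_event is a hypothetical helper.

```python
def rule_name_from_event(event):
    """Return the name of the rule behind a scheduled event, or None."""
    if event.get("detail-type") != "Scheduled Event":
        return None
    resources = event.get("resources") or []
    if not resources:
        return None
    # Rule ARNs end in ':rule/<rule-name>'.
    return resources[0].split(":rule/", 1)[-1]


sample_event = {
    "detail-type": "Scheduled Event",
    "source": "aws.events",
    "resources": [
        "arn:aws:events:us-east-1:999999999999:rule/my-scheduled-rule"
    ],
    "detail": {},
}
print(rule_name_from_event(sample_event))  # my-scheduled-rule
```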

AWS IAM permissions (role)

In order for our Lambda function to be able to make calls against the AWS EC2 API, we must grant it access via a role with the correct policy documents.

First, the trust policy that lets the Lambda service assume the role. Copy this JSON to a local file named ec2-volume-assume-role-policy.json.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Next we’ll create the role.

aws iam create-role \
--role-name EC2-Volume-Cleaner \
--assume-role-policy-document file://ec2-volume-assume-role-policy.json

The response should look something like this:

{
    "Role": {
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "sts:AssumeRole",
                    "Effect": "Allow",
                    "Principal": {
                        "Service": "lambda.amazonaws.com"
                    }
                }
            ]
        },
        "RoleId": "AROAIYVU6PREEAMD6ASSW",
        "CreateDate": "2017-04-24T16:22:45.837Z",
        "RoleName": "EC2-Volume-Cleaner",
        "Path": "/",
        "Arn": "arn:aws:iam::999999999999:role/EC2-Volume-Cleaner"
    }
}

We’ll then need to create a policy document granting the ec2:DescribeVolumes and ec2:DeleteVolume permissions.

Write a file named ec2-volume-policy.json with the JSON contents below.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "ec2:DescribeVolumes",
                "ec2:DeleteVolume"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}

Then create the policy from that file.

aws iam create-policy \
--policy-name ec2-volume-describe-delete \
--policy-document file://ec2-volume-policy.json

If successful you will get a response like below.

{
    "Policy": {
        "PolicyName": "ec2-volume-describe-delete",
        "CreateDate": "2017-03-30T10:24:41.549Z",
        "AttachmentCount": 0,
        "IsAttachable": true,
        "PolicyId": "ANPAIG4KVDVJP7HAWLMJU",
        "DefaultVersionId": "v1",
        "Path": "/",
        "Arn": "arn:aws:iam::999999999999:policy/ec2-volume-describe-delete",
        "UpdateDate": "2017-03-30T10:24:41.549Z"
    }
}

Grab the Arn from the response: arn:aws:iam::999999999999:policy/ec2-volume-describe-delete

Next, attach the policy to the IAM role.

aws iam attach-role-policy \
--role-name EC2-Volume-Cleaner \
--policy-arn arn:aws:iam::999999999999:policy/ec2-volume-describe-delete
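For anyone scripting this rather than typing CLI commands, the three IAM steps map onto boto3 calls of the same names. This is a sketch only: setup_cleaner_role is a hypothetical wrapper, and its iam argument is assumed to be a boto3 IAM client such as boto3.client('iam').

```python
import json

# Trust policy: lets the Lambda service assume the role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# Permissions policy: what the function may do once it has the role.
VOLUME_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["ec2:DescribeVolumes", "ec2:DeleteVolume"],
        "Resource": "*",
    }],
}


def setup_cleaner_role(iam, role_name="EC2-Volume-Cleaner",
                       policy_name="ec2-volume-describe-delete"):
    """Create the role and policy, attach one to the other, return the role ARN."""
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(TRUST_POLICY),
    )
    policy = iam.create_policy(
        PolicyName=policy_name,
        PolicyDocument=json.dumps(VOLUME_POLICY),
    )
    # The policy ARN comes straight from the response, so nothing is retyped.
    iam.attach_role_policy(
        RoleName=role_name,
        PolicyArn=policy["Policy"]["Arn"],
    )
    return role["Role"]["Arn"]
```

Passing the client in as an argument keeps the function easy to try out against a stub before pointing it at a real account.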

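We’ll pass the role’s ARN to Lambda later, so that the function runs with these permissions. Note that this policy only covers the EC2 calls; for the function’s output to show up in CloudWatch Logs, the role also needs logging permissions, for example by attaching the AWS managed policy AWSLambdaBasicExecutionRole in the same way.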

The lambda function

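We’ve chosen to do this with Python, so you will need to create a Python Lambda function.

The API call below creates the Lambda function for you from the code in orphans.zip. Download the zip first. (AWS has since retired the python2.7 runtime used here; the code also runs unchanged on Python 3, so substitute a currently supported runtime identifier.)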

Download orphans.zip

aws lambda create-function \
--function-name "test_function" \
--runtime python2.7 \
--handler orphans.lambda_handler \
--zip-file fileb://static/files/orphans.zip \
--description "Demo function for AWS Lambda orphaned EC2 volume cleaner." \
--role "arn:aws:iam::999999999999:role/EC2-Volume-Cleaner"

Upon success, AWS should return something like:

{
    "CodeSha256": "OSRaNa2YYoOQUa4RgD7mXewdj7941XHahsqIjkc8POw=",
    "FunctionName": "test_function",
    "CodeSize": 897,
    "MemorySize": 192,
    "FunctionArn": "arn:aws:lambda:eu-west-1:999999999999:function:test_function",
    "Version": "$LATEST",
    "Role": "arn:aws:iam::999999999999:role/EC2-Volume-Cleaner",
    "Timeout": 20,
    "LastModified": "2017-03-28T07:44:18.843+0000",
    "Handler": "orphans.lambda_handler",
    "Runtime": "python2.7",
    "Description": "Demo function for AWS Lambda orphaned EC2 volume cleaner."
}
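If you would rather build orphans.zip yourself than download it, the archive only needs orphans.py at its root, since the handler is given as orphans.lambda_handler. A minimal sketch using the standard library; make_deployment_zip is a hypothetical helper name, and the one-line handler is a stand-in for the real code.

```python
import io
import zipfile


def make_deployment_zip(source_code, filename="orphans.py"):
    """Return the bytes of a zip archive holding one Python module at its root."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        info = zipfile.ZipInfo(filename)
        # Mark the file world-readable (0644) so the Lambda runtime can import it.
        info.external_attr = 0o644 << 16
        info.compress_type = zipfile.ZIP_DEFLATED
        archive.writestr(info, source_code)
    return buffer.getvalue()


zip_bytes = make_deployment_zip(
    "def lambda_handler(event, context):\n    return 'ok'\n"
)
print(zipfile.ZipFile(io.BytesIO(zip_bytes)).namelist())  # ['orphans.py']
```

The resulting bytes can be written to a file and passed to the CLI with --zip-file fileb://, as in the command above.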

Lambda add permissions

Remember the event rule from the very first section? It needs permission to invoke the function, which we grant with aws lambda add-permission, passing the rule’s ARN as the source.

aws lambda add-permission \
--function-name test_function \
--statement-id a-schedule-event \
--action 'lambda:InvokeFunction' \
--principal events.amazonaws.com \
--source-arn "arn:aws:events:eu-west-1:999999999999:rule/test-event"

You should get a response like below.

{
    "Statement": "{\"Sid\":\"a-schedule-event\",\"Resource\":\"arn:aws:lambda:eu-west-1:999999999999:function:test_function\",\"Effect\":\"Allow\",\"Principal\":{\"Service\":\"events.amazonaws.com\"},\"Action\":[\"lambda:InvokeFunction\"],\"Condition\":{\"ArnLike\":{\"AWS:SourceArn\":\"arn:aws:events:eu-west-1:999999999999:rule/test-event\"}}}"
}

Add the AWS event targets

In order for your event to invoke the new Lambda function, you must now add the function’s ARN as a target of the rule. This is easily done with the following API call.

aws events put-targets \
--rule "test-event" \
--targets "Id=1,Arn=arn:aws:lambda:eu-west-1:999999999999:function:test_function"

The response should read:

{
    "FailedEntryCount": 0,
    "FailedEntries": []
}
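The rule, permission and target steps depend on each other’s names and ARNs, which is where typos creep in. Here is a sketch of the same three calls through boto3, passing the ARNs along rather than retyping them. schedule_function is a hypothetical wrapper, and its first two arguments are assumed to be boto3 'events' and 'lambda' clients.

```python
def schedule_function(events, lambda_client, rule_name, function_arn,
                      schedule="rate(1 minute)"):
    """Create a scheduled rule and point it at an existing Lambda function."""
    # 1. Create (or update) the rule; the response carries its ARN.
    rule_arn = events.put_rule(
        Name=rule_name,
        ScheduleExpression=schedule,
    )["RuleArn"]

    # 2. Allow that specific rule to invoke the function.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId="a-schedule-event",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )

    # 3. Register the function as the rule's target.
    result = events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "1", "Arn": function_arn}],
    )
    # True when every target was accepted.
    return result["FailedEntryCount"] == 0
```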

Done

That’s all that’s required.

You can now head to your Lambda service page, follow the link to CloudWatch from the Monitoring tab and see the output from the Lambda function.

As you may have noticed, I’ve left the delete_volume call commented out, as you should be using it at your own peril. Not mine :)
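One way to make the delete a little less perilous is to leave it in the code but gate it behind an explicit opt-in. A minimal sketch, assuming a hypothetical environment variable named DELETE_ORPHANS set on the function; maybe_delete is likewise a made-up name, and ec2 is assumed to be the boto3 EC2 client from the handler.

```python
import os


def maybe_delete(ec2, volume_id, environ=os.environ):
    """Delete the volume only if DELETE_ORPHANS is explicitly set to 'true'.

    Returns True if a delete call was made, False if it was skipped.
    """
    if environ.get("DELETE_ORPHANS", "").lower() != "true":
        # Default behaviour: report what would happen and touch nothing.
        print("Dry run, would delete {}".format(volume_id))
        return False
    ec2.delete_volume(VolumeId=volume_id)
    return True
```

With this in place the function stays report-only until someone deliberately sets the variable.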

You could build on this by, for example, looking at how long a given volume has been orphaned. For now I’ll leave it at this and add links to any future follow-up posts.
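As a starting point for that idea, note that describe_volumes does not report when a volume was detached, so the sketch below falls back on CreateTime as a rough proxy for age. That substitution is an assumption, as are the helper name older_than and the 30-day default. It needs Python 3 for datetime.timezone.

```python
from datetime import datetime, timedelta, timezone


def older_than(volumes, days=30, now=None):
    """Return only the volumes whose CreateTime is more than `days` days ago."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    # boto3 returns CreateTime as a timezone-aware datetime, so compare like with like.
    return [v for v in volumes if v["CreateTime"] < cutoff]


now = datetime(2017, 3, 23, tzinfo=timezone.utc)
volumes = [
    {"VolumeId": "vol-old", "CreateTime": now - timedelta(days=90)},
    {"VolumeId": "vol-new", "CreateTime": now - timedelta(days=2)},
]
print([v["VolumeId"] for v in older_than(volumes, days=30, now=now)])  # ['vol-old']
```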

Lambda Python function code (for reference)

Just in case you missed it and want to take a peek at it yourself, here’s the Python code inside orphans.zip.

Please note: the boto3 session is hard-coded here to use eu-west-1.

import logging

import boto3

# Set up simple logging at INFO level
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Get a boto3 session ready (the region is hard-coded)
session = boto3.session.Session(region_name="eu-west-1")
ec2 = session.client('ec2')


def lambda_handler(event, context):

    # Report header.
    report = "The following volumes were found as orphaned:\n"
    volume_count = 0

    # Start a pagination object for describe_volumes
    paginator = ec2.get_paginator('describe_volumes')

    # Filter for 'available' volumes only, i.e. volumes not attached to
    # any instance and therefore deemed 'orphaned'.
    filters = [
        {
            'Name': 'status',
            'Values': ['available']
        },
    ]
    operation_parameters = {
        'Filters': filters,
    }

    # Unpack operation parameters with the filters
    page_iterator = paginator.paginate(**operation_parameters)

    # Loop each page of results
    for page in page_iterator:
        # Loop each volume in each page.
        for volume in page['Volumes']:
            if volume['State'] == 'available':
                # Register with the counter
                volume_count += 1
                # Report addition. Not every volume type reports Iops, so
                # look it up per volume and fall back to an empty string.
                report += "VolumeId: {} | State: {} | Size: {} | VolumeType: {} | Iops: {} | CreateTime: {}\n".format(
                    str(volume['VolumeId']),
                    str(volume['State']),
                    str(volume['Size']),
                    str(volume['VolumeType']),
                    str(volume.get('Iops', '')),
                    str(volume['CreateTime'])
                )
                # Take some action?
                # ec2.delete_volume(
                #     VolumeId=volume['VolumeId']
                # )

    if volume_count == 0:
        print("Nothing to report")
    else:
        print(report)

And that’s it. Beyond being quite useful in itself, it shows a potentially useful serverless architecture for admin-related tasks, which is quite nice!

