Frequently terminating EC2 via AWS Lambda

Oct 7, 2019

In today’s distributed, highly available systems, we often see chaos-monkey-style tooling that randomly kills or stops services, processes, or nodes to verify the overall system’s stability and resilience.

For the impatient, here is the GitHub repo :D

This is a two-part post. This first part covers the Lambda function itself and the environment. The second part focuses on deploying the Lambda function to AWS with Terraform, following Infrastructure-as-Code principles.
This small series doesn’t explain the basics of AWS Lambda or Terraform; it focuses on how to actually use them.

For starters, here are some good intros:
* AWS Lambda : run your code without dealing with underlying resources (“serverless”). AWS takes care of the required infrastructure to execute your function
* Terraform : declarative way to provision infrastructure, on various providers (AWS, GCloud, Vagrant, …)

Environment
Recently I had to implement some “kind of chaos monkey” for an AWS environment consisting of:

* an AWS (classic) load balancer
* 3 ASGs (Auto Scaling groups), configured as targets of the load balancer
* a bunch of EC2 instances in each ASG

The EC2 instances include a user_data section, which sets up an application at initial launch.

Requirement(s)
To prove resilience, I wanted to regularly terminate the “oldest running” EC2 instance that is part of an ASG behind a classic ELB, but only if the full desired number of instances across the Auto Scaling groups is “in service”. This is a safety measure: no instance gets terminated while others are already out of service.

How?
Since I didn’t want to launch an additional EC2 instance just to run a cron job that terminates other instances, I went for AWS Lambda, which is a perfect fit for my needs. I chose Python as the runtime, since it is the one I am most comfortable with.
To keep usage as simple as possible, I only want to provide the name of the load balancer as input to the Lambda function. The function then determines which EC2 instance gets terminated.
In addition, since I am highly addicted to Infrastructure-as-Code, I wanted the deployment to be declarative and automated via Terraform (as already mentioned, that is the content of Part II of this blog series).

The Lambda function
The first challenge was to dive into the Boto3 docs to figure out how to gather information from the different AWS resources I needed. Boto3 is well documented and largely self-explanatory, so the function quickly took on the following structure:

Calculating the sum of the desired number of instances across all involved ASGs was not as straightforward as I initially thought:
1. iterate over all ASGs,
2. filter the ones that are assigned to the load balancer, then
3. sum up the desired count of EC2 instances.
The load balancer to consider is passed as a parameter to this function.
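
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the original implementation: the function name `sum_desired_capacity` is hypothetical, and it assumes a Boto3 Auto Scaling client is passed in rather than created inside the function.

```python
def sum_desired_capacity(autoscaling_client, lb_name):
    """Sum DesiredCapacity over all ASGs attached to the classic ELB `lb_name`.

    `autoscaling_client` is assumed to be a Boto3 client for the
    "autoscaling" service (hypothetical wiring, see the note below).
    """
    total = 0
    # describe_auto_scaling_groups is paginated, so walk every page.
    paginator = autoscaling_client.get_paginator("describe_auto_scaling_groups")
    for page in paginator.paginate():
        for asg in page["AutoScalingGroups"]:
            # Classic load balancers are listed by name on each ASG.
            if lb_name in asg.get("LoadBalancerNames", []):
                total += asg["DesiredCapacity"]
    return total
```

In a Lambda function the client would typically come from `boto3.client("autoscaling")`. Passing it in as an argument keeps the sketch easy to try out with a stub.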

The entry point for the Lambda execution is the function lambda_handler, which discovers the EC2 instance that will be terminated.
The function
- checks the parameter DRY_RUN to determine whether the terminate command is really executed or just simulated
- loops over the EC2 instances behind the load balancer given by the parameter “LB_NAME”
- checks whether each EC2 instance’s status is “InService”, and determines the oldest of those instances based on “launch_time”
- terminates the oldest EC2 instance if enough healthy instances are available
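
The selection logic described in this list could look roughly like the sketch below. It is an approximation under stated assumptions, not the author’s code: the helper names `pick_oldest_in_service` and `terminate_oldest` are hypothetical, the Boto3 clients are passed in as arguments, it uses the low-level client API (where the launch time appears as `LaunchTime`), and a dry run simply skips the terminate call.

```python
def pick_oldest_in_service(elb_client, ec2_client, lb_name):
    """Return (oldest_instance_id, in_service_count) for a classic ELB."""
    states = elb_client.describe_instance_health(
        LoadBalancerName=lb_name)["InstanceStates"]
    in_service = [s["InstanceId"] for s in states if s["State"] == "InService"]
    if not in_service:
        return None, 0
    reservations = ec2_client.describe_instances(
        InstanceIds=in_service)["Reservations"]
    instances = [i for r in reservations for i in r["Instances"]]
    # The earliest LaunchTime belongs to the oldest running instance.
    oldest = min(instances, key=lambda i: i["LaunchTime"])
    return oldest["InstanceId"], len(in_service)


def terminate_oldest(elb_client, ec2_client, lb_name, desired_total, dry_run=True):
    """Terminate the oldest in-service instance if enough instances are healthy.

    Returns the selected instance id, or None if nothing was selected.
    """
    instance_id, healthy = pick_oldest_in_service(elb_client, ec2_client, lb_name)
    if instance_id is None or healthy < desired_total:
        print(f"Only {healthy}/{desired_total} instances in service, skipping")
        return None
    if dry_run:
        # Assumption: a dry run only logs the candidate and changes nothing.
        print(f"DRY_RUN: would terminate {instance_id}")
    else:
        print(f"Terminating {instance_id}")
        ec2_client.terminate_instances(InstanceIds=[instance_id])
    return instance_id
```

A handler would then create the clients with `boto3.client("elb")` and `boto3.client("ec2")`, read LB_NAME and DRY_RUN from its input, and pass in the summed desired capacity of the ASGs as `desired_total`.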

The full implementation of the Lambda function (see the GitHub repo mentioned above) can be pasted into the AWS Lambda management console and triggered for testing purposes.

To test the function, you can simply use the AWS Lambda console: provide the parameter manually and click the “Test” button to trigger the execution.

All log output (print statements) is sent to CloudWatch. You can view it by navigating to the CloudWatch console and opening the related log group, by default /aws/lambda/ec-terminator.
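
If you prefer to read those logs from a script instead of the console, something along these lines should work. It is only a sketch: the function name `recent_log_messages` and the 15-minute default window are assumptions, and a Boto3 CloudWatch Logs client is expected as an argument.

```python
import time


def recent_log_messages(logs_client, log_group, minutes=15):
    """Collect log messages from `log_group` written in the last `minutes`."""
    # CloudWatch Logs expects timestamps in milliseconds since the epoch.
    start_ms = int((time.time() - minutes * 60) * 1000)
    paginator = logs_client.get_paginator("filter_log_events")
    messages = []
    for page in paginator.paginate(logGroupName=log_group, startTime=start_ms):
        messages.extend(event["message"] for event in page["events"])
    return messages
```

With a client created via `boto3.client("logs")`, calling `recent_log_messages(client, "/aws/lambda/ec-terminator")` would return the recent print output of the function.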

This was part one of the story. The next step is to use Terraform to deploy our Lambda function to AWS, since we want to apply Infrastructure-as-Code best practices ;)

Part II of this series outlines how this is implemented.

Have fun playing around with Lambda and Terraform. I hope you enjoyed reading!


Written by Gerd Koenig