In our previous blogs, we introduced Amazon Elastic File System (EFS) and Amazon Elastic Load Balancer (ELB), and then set up our first EFS and ELB with EC2 instances. The purpose of an Elastic Load Balancer is to distribute incoming traffic across multiple targets so that our application keeps up under high-traffic scenarios; ELB thus provides high availability and fault tolerance. EFS, in turn, acts as a centralized file system that can be accessed over the network by hundreds or thousands of EC2 instances or containers. It lets the targets behind our load balancer (EC2 instances, in our case) share the same filesystem, which ensures data consistency and persistence even if individual instances fail. But although these services provide high availability and data consistency, building this architecture on a fixed number of instances is not a cost-effective approach.
Let us assume that we have set up 3 EC2 instances in different Availability Zones within a region, each with EFS mounted as a centralized file system. Suppose we are in a high-traffic phase in which all three instances are serving requests from around the globe via the load balancer. Now, what happens when the incoming traffic drops? Two instances would be enough to handle the requests, the third instance would sit idle, and we would be paying for unnecessary, unused resources. Conversely, if a massive traffic spike arrives that even all three instances cannot handle, our architecture will fall over. What we need here is flexibility, or rather, automated scaling of our EC2 instances based on demand.
Amazon Auto Scaling
Amazon Web Services provides Auto Scaling, which ensures that you have the correct number of Amazon EC2 instances available to handle the load on your application. It is a web service designed to launch or terminate EC2 instances automatically based on user-defined policies, schedules, and health checks, scaling in and scaling out as those policies dictate. You create an Auto Scaling group with thresholds for the minimum and maximum number of instances in the group. The group never drops below the minimum instance count when scaling in, and it stops launching further instances during scale-out once the maximum limit is reached. Auto Scaling also lets you attach a load balancer so that traffic is distributed evenly across the instances in the group, and since newly launched instances need access to the same content, we can provision the mounting of EFS on them. Using Auto Scaling in our architecture, we can achieve:
- Better Fault Tolerance
- High Availability
- Better Cost Management
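To make the minimum/maximum thresholds concrete, here is a sketch of creating such a group with the AWS CLI. Every name here (my-asg, my-launch-config, the subnet IDs, and the load balancer name my-elb) is a placeholder for illustration, not a value from this series:

```shell
# Create an Auto Scaling group that always keeps between 2 and 5
# instances running across two subnets, and registers each instance
# with a classic load balancer so traffic is spread across the group.
aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name my-asg \
    --launch-configuration-name my-launch-config \
    --min-size 2 \
    --max-size 5 \
    --desired-capacity 3 \
    --vpc-zone-identifier "subnet-aaaa1111,subnet-bbbb2222" \
    --load-balancer-names my-elb
```

With these values, the group will scale in no further than 2 instances and scale out no further than 5, whatever the scaling policies say.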
An Auto Scaling group can contain EC2 instances in one or more Availability Zones within the same region, but it cannot span multiple regions. Every Auto Scaling group has a launch configuration associated with it. A launch configuration is the template that the group uses to launch EC2 instances: it specifies the AMI ID, instance type, one or more security groups, subnet(s), key pair, user data, and so on. User data is the set of commands that we want executed on each instance the group launches. This launch configuration defines the attributes of your EC2 instances. We can also define metrics and custom scaling policies on our Auto Scaling groups. Scaling policies define how instances get scaled out or scaled in, but before deploying them, we need to define the conditions under which scaling will occur.
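As a rough illustration, a launch configuration could be created with the AWS CLI as below. The AMI ID, security group, key pair, and the user-data file name are all placeholder values:

```shell
# Create a launch configuration; every ID here is a placeholder.
# The user-data file contains the commands (for example, mounting EFS)
# that run on each instance the Auto Scaling group launches.
aws autoscaling create-launch-configuration \
    --launch-configuration-name my-launch-config \
    --image-id ami-0123456789abcdef0 \
    --instance-type t2.micro \
    --security-groups sg-0123456789abcdef0 \
    --key-name my-key-pair \
    --user-data file://mount-efs.sh
```

Here mount-efs.sh would be a small script along the lines of `mkdir -p /mnt/efs && mount -t efs fs-12345678:/ /mnt/efs` (with amazon-efs-utils installed and fs-12345678 standing in for your EFS filesystem ID), so that each newly launched instance attaches the shared filesystem at boot.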
This is where Amazon CloudWatch comes into action. CloudWatch monitors operational and performance metrics for AWS cloud resources and applications. Using it, we can define thresholds that trigger an alarm when crossed, and scaling policies use these CloudWatch alarms to drive their scaling actions. Alarms can be set on any of the available metrics (EC2 metrics, ELB metrics, RDS metrics, and so on), with a threshold on a specific metric such as CPU utilization, memory utilization, HealthyHostCount, or Latency. When an alarm is triggered, the Auto Scaling group executes the associated scaling policy. For example, suppose our Auto Scaling group checks the CPU utilization percentage at a fixed interval. If it stays above 80% for a certain amount of time, the alarm fires and the group launches a new instance. Likewise, if CPU utilization remains below 70% for a fixed period, another alarm fires and one instance is terminated. Note that, regardless of the scaling policies, the Auto Scaling group will never drop below the minimum instance count or exceed the maximum instance limit.
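The scale-out half of that example could be wired up with the AWS CLI roughly as follows; the group name, policy name, and alarm name are placeholders, and the alarm action must be filled in with the real PolicyARN returned by the first command:

```shell
# 1. A simple scale-out policy: add one instance when triggered.
#    This command prints a PolicyARN, which the alarm below needs.
aws autoscaling put-scaling-policy \
    --auto-scaling-group-name my-asg \
    --policy-name scale-out-on-high-cpu \
    --adjustment-type ChangeInCapacity \
    --scaling-adjustment 1

# 2. A CloudWatch alarm that fires when the group's average CPU
#    utilization stays above 80% for two consecutive 5-minute periods.
aws cloudwatch put-metric-alarm \
    --alarm-name asg-cpu-high \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions Name=AutoScalingGroupName,Value=my-asg \
    --statistic Average \
    --period 300 \
    --evaluation-periods 2 \
    --threshold 80 \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions <PolicyARN-from-step-1>
```

A matching scale-in pair (scaling adjustment of -1, threshold below 70% with LessThanThreshold) would complete the picture.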
EC2 Instance Lifecycle
EC2 instances in an Auto Scaling group follow a lifecycle that differs from that of other EC2 instances. The lifecycle starts when the Auto Scaling group launches an instance and puts it into service, and it ends when you terminate the instance, or the group takes the instance out of service and terminates it. We can also add lifecycle hooks to an Auto Scaling group, which let us perform custom actions whenever an instance is launched or terminated. The following flowchart from the AWS docs illustrates the lifecycle of EC2 instances.
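For instance, a launch-time lifecycle hook could be registered like this; the hook name and group name are illustrative placeholders:

```shell
# Register a lifecycle hook that holds each newly launched instance in
# the Pending:Wait state for up to 300 seconds, giving us a window to
# run custom setup before the instance is put into service.
aws autoscaling put-lifecycle-hook \
    --lifecycle-hook-name on-launch-hook \
    --auto-scaling-group-name my-asg \
    --lifecycle-transition autoscaling:EC2_INSTANCE_LAUNCHING \
    --heartbeat-timeout 300 \
    --default-result CONTINUE
```

The same mechanism works at termination time with the `autoscaling:EC2_INSTANCE_TERMINATING` transition, for example to drain connections or copy logs before an instance is destroyed.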
Auto Scaling has several other useful features. We can receive notifications via Amazon Simple Notification Service (SNS) whenever CloudWatch alarms initiate scaling actions, or when Auto Scaling completes an action. We can also run On-Demand or Spot Instances, including instances inside a VPC or high-performance computing clusters.
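As a sketch, hooking such notifications to an SNS topic looks roughly like this; the topic ARN and group name are placeholders:

```shell
# Publish a message to an SNS topic whenever the group launches or
# terminates an instance. The topic ARN below is a placeholder.
aws autoscaling put-notification-configuration \
    --auto-scaling-group-name my-asg \
    --topic-arn arn:aws:sns:us-east-1:123456789012:my-asg-events \
    --notification-types \
        "autoscaling:EC2_INSTANCE_LAUNCH" \
        "autoscaling:EC2_INSTANCE_TERMINATE"
```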
This has been a brief introduction to AWS Auto Scaling, its features, and its applicability. In our next blog, we will begin setting up Auto Scaling and understanding its various terminologies.