Using Azure Availability Set to Increase SLA of Your VMs
Today’s users expect every application to be operational all the time. What is fairly easy to achieve for a desktop or mobile application can be a challenge for web or SaaS applications. Most Azure services implement high availability by design. For instance, Azure SQL keeps 3 replicas in one datacenter by default; if one replica dies, 2 are still available and a 3rd is created immediately. But how can the same be achieved for VMs in the IaaS model?
For this purpose Azure has a feature called Availability Set. In this article I will describe what it is and how to use it. You will find that it is not difficult to set up, and you can even add the feature to already existing VMs.
What is Availability Set
Detailed information about Availability Set can be found in the Azure documentation, e.g. Manage the availability of virtual machines. As the title of that article suggests, the feature is designed to support high availability for Azure IaaS. But what exactly does that mean?
If you run only 1 instance of your VM, you risk that the application running on the server becomes unreachable if the VM dies for any reason. There are situations where this is acceptable, for instance if the server runs a background processing tool such as a message queue listener, but for most applications it is a disaster. To cover such VM failures you have to run 2 or more instances of the same VM. But is that enough?
Each datacenter is organized into a number of clusters, each cluster contains a number of racks, and each rack contains a number of servers. Finally, each server runs a number of VMs, depending on the size of each VM.
VM availability has to take into account all components on the path to your server, including network routers and switches. From the picture above it is obvious that the datacenter routers, aggregation routers and load balancers are deployed in several instances at each level. If one fails, another takes over the work.
Within an Azure rack, however, there is a risk that hardware components such as Top of Rack switches or Power Distribution Units fail. And because you have no control over where your VMs are created, all of them could end up in the same rack. This is what Availability Set solves.
Another situation is the periodic patching and updating of the underlying Azure platform. This usually has no impact on your virtual machines, but some patches require a reboot of the VMs to apply the updates. Availability Set manages these reboots to minimize the impact on your application.
Availability Set has 2 configuration settings:
- Fault Domains – VMs in different fault domains are located in different racks, so if one rack fails, the VMs in the other racks are likely to keep working until the affected VMs are recreated in a different rack. The current setting allows 1-3 FDs.
- Update Domains – group your VMs for reboots caused by platform updates. The VMs in one update domain are restarted at the same time, and Azure restarts only one update domain at a time. The current setting allows 1-20 UDs.
VMs are placed into FDs and UDs automatically in round-robin order: the 1st VM goes to the 1st FD, the 2nd to the 2nd FD, the 3rd to the 3rd FD, the 4th VM goes back to the 1st FD, and so forth. The same applies to UDs.
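The round-robin placement described above can be illustrated with a small Python sketch. It is only a model of the rule as stated in this article, not a call into any Azure API: the function names are hypothetical, domains are numbered from 0, and the defaults of 3 FDs and 5 UDs are assumed.

```python
from collections import defaultdict


def place_vms(vm_count, fault_domains=3, update_domains=5):
    """Assign each VM an FD and a UD in round-robin order (illustrative model only)."""
    return [
        {"vm": i + 1, "fd": i % fault_domains, "ud": i % update_domains}
        for i in range(vm_count)
    ]


def group_by(placement, key):
    """Group VM numbers by the given domain key ("fd" or "ud")."""
    groups = defaultdict(list)
    for entry in placement:
        groups[entry[key]].append(entry["vm"])
    return dict(groups)


def worst_case_offline(placement, key):
    """Largest number of VMs that share a single domain of the given kind."""
    return max(len(vms) for vms in group_by(placement, key).values())


# Hypothetical deployment of 7 VMs with the assumed defaults.
placement = place_vms(7)
by_fd = group_by(placement, "fd")
by_ud = group_by(placement, "ud")

print("By fault domain: ", by_fd)
print("By update domain:", by_ud)
print("Worst case on a rack failure:   ", worst_case_offline(placement, "fd"), "VMs")
print("Worst case on a platform update:", worst_case_offline(placement, "ud"), "VMs")
```

With 7 VMs in this model, a single rack failure takes down at most 3 VMs and a platform update restarts at most 2 at a time, so the remaining instances keep serving requests.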
Creating Availability Set
In the Portal, browse to the Availability Sets view and press the Create button. Define the name of the Availability Set resource and select the Resource Group and Location. You can keep the default values of 3 FDs and 5 UDs, which are sufficient for most applications.
Using the Portal to create a number of identical VMs is tedious, but it only costs time. Browse to the Virtual Machines view and press the Create button, then select the operating system on the next view (Ubuntu Server 14.04 LTS in my case). On the 1st page give the VM a unique name, set the user credentials and select the Resource Group and Location. On the 2nd page select the VM size, and on the 3rd page set up networking and, at the very bottom, select the Availability Set. Repeat for each VM you want to create.
Now you can switch into your Availability Set and see how your VMs are placed into FDs and UDs.
Of course, creating a large deployment through the Portal is inefficient, so I suggest defining Resource Manager template(s) to do the whole job. A template takes a little time to create (for instance using a Visual Studio project), and running the deployment script then creates all the resources.
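As a rough illustration of what such a template might contain, the following Python sketch builds a minimal Resource Manager template with a single Availability Set resource and prints it as JSON. The resource name "web-avset", the function name and the chosen apiVersion are assumptions made for this example; check the template reference for the version that fits your subscription. The VMs themselves are left out.

```python
import json


def availability_set_template(name, fault_domains=3, update_domains=5):
    """Build a minimal template dict holding one Availability Set (a sketch)."""
    # Limits as stated in this article: 1-3 fault domains, 1-20 update domains.
    if not 1 <= fault_domains <= 3:
        raise ValueError("fault_domains must be between 1 and 3")
    if not 1 <= update_domains <= 20:
        raise ValueError("update_domains must be between 1 and 20")
    return {
        "$schema": "https://schema.management.azure.com/schemas/"
                   "2015-01-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": [
            {
                "type": "Microsoft.Compute/availabilitySets",
                "apiVersion": "2015-06-15",  # assumed version, adjust as needed
                "name": name,
                "location": "[resourceGroup().location]",
                "properties": {
                    "platformFaultDomainCount": fault_domains,
                    "platformUpdateDomainCount": update_domains,
                },
            }
        ],
    }


# "web-avset" is a hypothetical name used only for this example.
template = availability_set_template("web-avset")
print(json.dumps(template, indent=2))
```

The printed JSON can be saved to a file and extended with the VM resources, each of which would then reference the Availability Set in its own properties.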
While running several instances of your VM covers software failures of the operating system or application, Availability Set covers most hardware failures and minimizes their impact on your application. It is really simple to create, and its Fault and Update Domains cover common issues with the Azure platform and help your application remain healthy.