CloudMonix can automatically scale Azure Cloud Roles, Azure Web Apps, SQL Azure Databases and SQL Data Warehouses by adjusting the number of instances or the tier level to the actual or expected demand. Auto-scaling is configured in the Scale Adjustments and Scale Ranges tabs of the Resource Configuration dialog.


It’s also possible to automatically scale VMs vertically, but the mechanism differs from the one described in this article: VMs are scaled UP/DOWN by executing resizing Actions.


Auto-scaling is available in the Ultimate plan or during the initial trial period.


If you have a scenario not covered by this article, please open a support ticket with the CloudMonix team at http://support.cloudmonix.com. We frequently assist customers with their specific, nuanced scenarios.




How does auto-scaling work in CloudMonix?


Auto-scaling is configured in the Scale Adjustments and Scale Ranges tabs of the Resource Configuration dialog.



Scale Adjustments define how rapidly CloudMonix should change the number of instances (or tier levels), e.g. by 1 or by 5 at a time. They also specify under what conditions the number of instances or the tier level should be changed.


Scale Ranges complement Scale Adjustments by defining scaling boundaries, i.e. the minimum and maximum number of instances that can run at once, or the lower and upper boundary for tier levels. The minimum and maximum can be equal, in which case the range defines the precise number of instances that will run or the exact tier that will be allocated.


Often multiple Ranges are defined for a single resource, because scale boundaries might vary depending on the time of day, the day of the week or current metric values. For that reason, each Scale Range can also specify the conditions under which it should be applied.


For example, many websites have higher traffic during the day and few visitors at night. Two Scale Ranges might be defined, “Day” and “Night”. While the “Day” range is active, CloudMonix keeps the instance count between 3 and 7; while the “Night” range is active, it keeps the count between 1 and 3. The exact number of running instances is adjusted automatically by Scale Adjustments based on the actual load, but it always respects the boundaries defined by the active Scale Range.
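The Day/Night example can be sketched in Python as follows. This is only an illustration of how a range bounds the instance count, not CloudMonix's implementation. The 08:00 to 20:00 window for "Day" and the names `active_range` and `clamp_instances` are assumptions made for the sketch; the article does not state the hours.

```python
from datetime import time

# Hypothetical "Day" window; the article gives the limits (3 to 7) but not the hours.
DAY_START, DAY_END = time(8, 0), time(20, 0)


def active_range(now):
    """Return (name, minimum, maximum) of the range active at time `now`."""
    if DAY_START <= now < DAY_END:
        return "Day", 3, 7
    return "Night", 1, 3


def clamp_instances(requested, now):
    """Keep the requested instance count inside the active range."""
    _, lowest, highest = active_range(now)
    return max(lowest, min(highest, requested))
```

For instance, a request for 10 instances at noon would be limited to 7, and a request for 5 instances at 23:00 would be limited to 3.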



See the “Rule evaluation in detail” section for more information.




What can you auto-scale by?


Metrics


Scale Adjustments and Scale Ranges can be based on expressions made up of metrics. Any metric available in any resource tracked by CloudMonix can be used in an auto-scaling condition, including:

  • Standard metrics tracked for the scaled resource, including Aggregated and Derived metrics.

  • Metrics tracked for other resources within the same Account, such as Azure Service Bus Queues and Topics, Storage Queues, and other custom data from SQL Azure, JSON APIs, etc. Metrics tracked by other resources need to be imported into the scaled resource using Linked Metrics.


Auto-scaling can be based on either reactive or proactive indicators, depending on which metrics are used to define the conditions.


Auto-scaling based on reactive indicators


With reactive indicators, the number of instances is adjusted to the actual, observed load. The most popular metrics for determining current load include CPU usage, free memory and the number of Requests/sec.


For example, a Scale Adjustment can be configured to automatically spin up new instances when the 15-min. average CPU usage is higher than 50%.



Typically, each rule for adding instances has a corresponding rule for removing them. In this example there should be another rule that removes an instance, e.g. when the 15-min. average CPU usage drops below 25% and stays at that level for 10 min.
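The pair of rules above can be illustrated with a small Python sketch. It is not how CloudMonix expresses conditions; the function name `cpu_adjustment`, its parameters and the step size of one instance are assumptions made for the example. Only the 50% and 25% thresholds and the 10-minute period come from the text.

```python
def cpu_adjustment(avg_cpu_15min, minutes_below_25=0):
    """Return the change in instance count suggested by the two example rules.

    avg_cpu_15min    -- 15-min. average CPU usage in percent
    minutes_below_25 -- how long the average has stayed below 25% (hypothetical input)
    """
    if avg_cpu_15min > 50:
        return +1  # scale-up rule: average CPU above 50%
    if avg_cpu_15min < 25 and minutes_below_25 >= 10:
        return -1  # scale-down rule: below 25% for at least 10 min.
    return 0       # between the thresholds: leave the count unchanged
```

Leaving a gap between the two thresholds (25% to 50%) means that adding an instance does not immediately trigger the rule that removes one.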



Auto-scaling based on proactive indicators


With proactive indicators, the number of instances is adjusted to the expected, predicted load. The most popular leading indicators used by CloudMonix customers include Azure Service Bus queue lengths and custom data obtained from SQL Azure using SQL scripts.


Refer to the article “Can I auto-scale my Azure Cloud Services based on Azure Queues or Service Bus queues or topics?” to learn more.


Suitable metrics can be found by analyzing the system and determining which signal is the first to show increasing demand. In many cases it helps to look at historical data from the system, e.g. using the CloudMonix Historical Dashboard.




Rule evaluation in detail


Rules execution order


During evaluation of auto-scaling rules, CloudMonix first evaluates the auto-scaling ranges (in order from first to last) to determine which auto-scaling range qualifies. The first qualifying auto-scaling range determines the boundaries for instance counts. Afterwards, CloudMonix evaluates conditions for auto-scaling adjustments (in order from first to last) to determine which adjustment needs to be applied.


In both cases, CloudMonix only picks the first qualifying scaling range and adjustment. The subsequent entries are ignored.


Users can manually adjust the order of their ranges and adjustments by clicking and holding the drag icon.


Once the auto-scaling range and adjustment are found, CloudMonix evaluates how many instances the resource should have. If that number differs from the current number of instances, a scaling action occurs and an alert with the defined severity is published.
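The evaluation order described in this section can be sketched in Python as follows. This is a simplified model, not CloudMonix's code: the dictionary layout, the names `first_match` and `target_instances`, the sample conditions, and the choice to bring the count back inside the range even when no adjustment qualifies are all assumptions made for illustration.

```python
def first_match(rules, metrics):
    """Return the first rule (in list order) whose condition holds, else None."""
    for rule in rules:
        if rule["condition"](metrics):
            return rule
    return None


def target_instances(current, ranges, adjustments, metrics):
    """Compute how many instances the resource should have."""
    scale_range = first_match(ranges, metrics)     # ranges are evaluated first
    adjustment = first_match(adjustments, metrics)  # then adjustments
    target = current + (adjustment["delta"] if adjustment else 0)
    if scale_range is not None:
        # The active range always bounds the result (assumed behaviour).
        target = max(scale_range["min"], min(scale_range["max"], target))
    return target


# Hypothetical configuration; order matters because only the first match is used.
ranges = [
    {"name": "Day", "condition": lambda m: 8 <= m["hour"] < 20, "min": 3, "max": 7},
    {"name": "Night", "condition": lambda m: True, "min": 1, "max": 3},
]
adjustments = [
    {"condition": lambda m: m["cpu"] > 50, "delta": +1},
    {"condition": lambda m: m["cpu"] < 25, "delta": -1},
]
```

A scaling action would then be needed only when `target_instances(...)` differs from the current count. Because "Night" is listed last with a condition that is always true, it acts as the fallback range in this sketch.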



Cooling period


Additionally, the user can control how rapidly the number of instances can change by specifying a Cooling period.


It is defined in the Definition tab of the Resource Configuration dialog by providing Scale-DOWN and Scale-UP cooling periods. Those values represent the minimum wait period between successive scaling operations.


With a Scale-UP cooling period of 15 min. and a Scale-DOWN cooling period of 30 min., CloudMonix won’t add instances more often than every 15 min. and won’t remove them more often than every 30 min. These are the default values provided by CloudMonix for all resource templates. The number of instances added or removed at once is specified in Scale Adjustments.

The cooling period is honored even when the condition specified by a Scale Adjustment is fulfilled, which allows the system to stabilize after the most recent scaling adjustment before CloudMonix decides whether further changes are needed. If both the Scale-DOWN and Scale-UP cooling periods were set to zero, the scaling conditions would be evaluated in every monitoring cycle (every 1 min. for Professional and Ultimate plans). In most scenarios that interval is too short, because the impact of the previous adjustment would not yet be reflected in metric values.
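The effect of the cooling periods can be illustrated with a short Python sketch. It uses the 15-min. and 30-min. values mentioned above; the function name `may_scale` and the assumption that the wait is measured from the most recent scaling operation in either direction are illustrative choices, not documented CloudMonix behaviour.

```python
from datetime import datetime, timedelta

SCALE_UP_COOLING = timedelta(minutes=15)
SCALE_DOWN_COOLING = timedelta(minutes=30)


def may_scale(delta, last_scaled_at, now):
    """Return True if a change of `delta` instances is allowed at `now`.

    delta          -- positive to add instances, negative to remove them
    last_scaled_at -- datetime of the previous scaling operation, or None
    """
    if delta == 0:
        return False  # nothing to do
    if last_scaled_at is None:
        return True   # no previous scaling operation to cool down from
    cooling = SCALE_UP_COOLING if delta > 0 else SCALE_DOWN_COOLING
    return now - last_scaled_at >= cooling
```

With these values, a scale-up requested 20 minutes after the previous operation would be allowed, while a scale-down at the same moment would still have to wait.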


Sustained period field


The Sustained period specifies how long the condition must remain true before the action is executed. This prevents premature scaling adjustments caused by temporary spikes in metric values.


For scaling, a shorter sustained period is typically used for scale-up operations, because it’s safer to have too many instances running than too few to handle the load, and a longer sustained period is used for removing instances.
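A sustained period can be modelled with a small Python sketch like the one below. The class name `SustainedCondition` and the use of plain minute counters are assumptions made for the example; the sketch only shows the idea that a condition must hold continuously before it triggers.

```python
class SustainedCondition:
    """Track whether a condition has been true for a required number of minutes."""

    def __init__(self, sustained_minutes):
        self.sustained_minutes = sustained_minutes
        self.true_since = None  # minute at which the condition last became true

    def update(self, condition_is_true, minute):
        """Record one evaluation; return True once the condition has held long enough."""
        if not condition_is_true:
            self.true_since = None  # a single false reading resets the timer
            return False
        if self.true_since is None:
            self.true_since = minute
        return minute - self.true_since >= self.sustained_minutes
```

A short spike that disappears before the sustained period elapses resets the timer, so no scaling action would be triggered by it.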