Overview

This article describes how to use CloudMonix to monitor Azure Windows VM Scale Sets. Because Azure Service Fabric and Azure Batch run on top of VM Scale Sets, they can also be monitored through the VM Scale Set monitoring functionality.


The article covers the following topics:

  • common use cases where CloudMonix can help with monitoring and automation

  • what is needed to connect to and monitor an Azure Windows VM Scale Set

  • what metrics CloudMonix tracks, visualizes and monitors

  • what automated actions can be executed by CloudMonix


Why use CloudMonix for Azure Windows VMSS and Service Fabric?

Popular usages of CloudMonix include the following examples:

  • Track server key performance metrics based on any Windows Performance Counters and System/Application Event Logs

  • Automatically adjust the number of VMs based on the actual demand or according to schedule

  • Restart all VMs in a set once per day, one at a time to keep them fresh

  • Shut down Scale Sets during off hours

  • Ensure VM availability

  • Reboot individual VMs if they run low on memory

Configuration

Azure VM Scale Set monitoring can be configured either via Setup Wizard or by using the “Add New” button in the dashboard. It’s highly recommended to use Setup Wizard when configuring permissions for the first time, as that will simplify authorization. Learn more about authorizing with Setup Wizard here.




During configuration, it’s necessary to specify the Resource Group, the Resource Name and, if available, the Deployment Id of the monitored resource. If the Storage Account is not populated automatically, it’s also necessary to select the account that should store data from the Diagnostics Extensions.


CloudMonix will automatically configure Azure Diagnostics to start tracking the metrics.


Do NOT modify Diagnostics configuration checkbox: If required, CloudMonix can be prevented from modifying the Diagnostics configuration. In that scenario, users are fully responsible for managing the configuration and for updating it on all nodes every time it changes. Learn more here.


Do NOT auto-update my computer nodes checkbox: Azure doesn't automatically deploy configuration changes to all nodes in the Scale Set, therefore CloudMonix will ensure that all nodes use the same configuration by propagating the changes. If required, it's possible to prevent CloudMonix automatically propagating configuration changes to all nodes in the Scale Set, but it's important to understand the potential consequences of doing so. Learn more here.


To use CloudMonix’s auto-scaling feature, users should disable native Azure auto-scaling in the Azure portal. It is also highly recommended to disable Azure’s Over-Provision feature if the user intends to use auto-scaling from CloudMonix. The Over-Provision feature deploys extra VMs during a scale event and then removes the unneeded ones; learn more about it here. The extra VM counts can conflict with the tracking of current instance quantities in CloudMonix.




Metrics

Every diagnostic data point that CloudMonix retrieves from the monitored resource is considered a metric in CloudMonix. Refer to the Metrics article to learn more about metrics in general.


CloudMonix provides the following default templates for monitoring an Azure VM Scale Set:


  • Sample configuration for basic Windows Azure VM Scale Set

  • Sample configuration for IIS farm on Azure VM Scale Set



The metrics can be added, removed and customized in the Metrics tab in the resource configuration dialog.


Built-in Metrics

ResourceStatus

Tracks the overall running status of the monitored instances within the Scale Set. This is a critical metric that is captured for most types of resources that CloudMonix tracks. It is used for Uptime reports and should not be removed.

  • Data Type: string

  • Possible values: Ready, Down, Unknown

  • Included in sample profile: yes, in both profiles tracked as a metric called Status

  • Included in default alerts: yes, in an alert:

    • ResourceOutage (Error): Raises an alert when the monitored server is reported as not-Ready by Azure, or if no metrics come through from the diagnostic agents, for at least 5 min.

Statuses are determined according to the following rules:


  • Ready - successfully connected to the resource

  • Down - there was an error when trying to retrieve data from the resource

  • Unknown - can’t connect to the resource (e.g. because of invalid credentials)


WindowsPerformanceCounter

Windows Performance Counter is one of the most popular metric types. Windows OS and applications running on it publish a large number of performance counters that highlight various aspects of performance indicators, health, uptime, etc. In order to learn more about the most popular counters refer to the Monitor Windows Server with Performance Counters article. The Performance Counter class documentation explains how to consume and define custom counters, should there be a need for CloudMonix to track user-generated diagnostic data. 

CloudMonix can track any published performance counter. Each performance counter that CloudMonix should track must be defined as an individual metric in the Resource Configuration dialog.


  • Data Type: double

  • Included in sample profile: yes:

  • Performance Counter Metrics included in both sample templates:

    • CPUTime: Processor(_Total)\% Processor Time

    • CpuTime30MinAverage: CPUTime aggregated over 30 min.

    • DiskFreeSpaceTotal: LogicalDisk(_Total)\Free Megabytes

    • DiskIdleTime: PhysicalDisk(_Total)\% Idle Time

    • DiskReadSpeed: PhysicalDisk(_Total)\Avg. Disk sec/Read 

    • DiskWriteSpeed: PhysicalDisk(_Total)\Avg. Disk sec/Write

    • MemoryCommittedPct: Memory\% Committed Bytes In Use

    • MemoryFree: Memory\Available MBytes

  • Metrics included in the Sample configuration for IIS farm on Azure VM ScaleSet template:

    • AspNetApplicationRestarts: ASP.NET\Application Restarts

    • AspNetBytesOut: ASP.NET Applications(__Total__)\Request Bytes Out Total

    • AspNetErrors: ASP.NET Applications(__Total__)\Errors Total/Sec

    • AspNetRequests: ASP.NET Applications(__Total__)\Requests/Sec

    • AspNetRequestsQueued: ASP.NET\Requests Queued

    • AspNetRequestsRejected: ASP.NET\Requests Rejected

    • AspNetRequestWaitTime: ASP.NET\Request Wait Time

  • Included in default alerts: yes:

  • Alerts included in both sample templates:

    • High CPU (Warning): Raises an alert when CPU utilization is over 70% for the last 5 minutes sustained

    • Low Memory (Warning): Raises an alert if the amount of available physical memory on a specific instance falls below 100MB for the last 2 monitoring cycles sustained

    • Low Disk Space (Warning): Raises an alert when any of the disks has less than 1GB of free space left

  • Alerts included in the Sample configuration for IIS farm on Azure VM ScaleSet template:

    • Requests are Queueing Up (Warning): Raises an alert when the number of queued requests exceeds 10 for 5 minutes sustained. Queued requests indicate that IIS or backend processes are not able to process the requests quickly enough
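Several of the alerts above use a "sustained" condition, for example CPU utilization over 70% for the last 5 minutes. The following is a minimal Python sketch of how such a condition could be evaluated against a series of samples. It is an illustration only, not CloudMonix's implementation: the `Sample` type and the `is_sustained` function are hypothetical names, and the rule that the history must cover the whole window is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Sample:
    """One metric reading (hypothetical type, for illustration only)."""
    timestamp: datetime
    value: float


def is_sustained(samples: List[Sample], threshold: float,
                 window: timedelta, now: datetime) -> bool:
    """Return True if every sample in the trailing window exceeds the threshold.

    Assumption: the condition only counts as sustained when the history
    reaches back at least as far as the start of the window.
    """
    start = now - window
    if not samples or min(s.timestamp for s in samples) > start:
        return False  # not enough history to cover the whole window
    recent = [s for s in samples if s.timestamp >= start]
    return bool(recent) and all(s.value > threshold for s in recent)


# Example: one CPU reading per minute over the last 6 minutes, all at 85%.
now = datetime(2024, 1, 1, 12, 0)
cpu = [Sample(now - timedelta(minutes=m), 85.0) for m in range(7)]
high_cpu = is_sustained(cpu, 70.0, timedelta(minutes=5), now)
```

A single reading at or below the threshold inside the window resets the condition, which is what keeps short spikes from raising an alert.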



WindowsPerformanceCounterMultiInstance

  • Data Type: double

  • Included in sample profile: yes, tracked as a metric:

    • DiskFreeSpace: LogicalDisk\Free Megabytes

  • Included in default alerts: yes, included in both profiles:

    • Low Disk Space (Warning): Raises an alert when any of the disks has less than 1GB of free space left


AzureVirtualMachineOperations

  • Data Type: array of objects with the following properties:

    • Name (string): Operation name.

    • Category (string): Event category.

    • Description (string): Event description.

    • Caller (string): Caller.

    • EventName (string): The event name. This value should not be confused with the operation name.

    • Level (string): Event level.

    • Status (string): The event status. Possible values include: Started, Succeeded, Failed.

    • SubStatus (string): The event sub status. Most of the time, when included, this captures the HTTP status code.

    • ExtendedInfo (string): The values of all properties of the EventData object displayed as Key-Value pairs, where keys are property names.

    • EventTimestamp (DateTime): The occurrence time of an event.

  • Can be accessed only through aggregation, using the Expressions described in the “Evaluating data in sets\arrays (advanced)” section of the Working with Expressions article.

  • Included in sample profile: no

  • Included in default alerts: no
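Because this metric is an array of objects, it has to be reduced to a single value before it can drive an alert or action. The Python sketch below illustrates the idea of such an aggregation by counting recent failed operations. It is not CloudMonix expression syntax (see the Working with Expressions article for that); the `Operation` type and `count_failed` function are hypothetical, and only a few of the properties listed above are modeled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Operation:
    """Subset of the properties listed above (hypothetical type)."""
    name: str
    status: str               # "Started", "Succeeded" or "Failed"
    event_timestamp: datetime


def count_failed(operations: List[Operation], since: datetime) -> int:
    """Aggregate the array into one number: failed operations since a given time."""
    return sum(
        1 for op in operations
        if op.status == "Failed" and op.event_timestamp >= since
    )


# Example data; the operation names are made up for illustration.
now = datetime(2024, 1, 1, 12, 0)
ops = [
    Operation("restart", "Succeeded", now - timedelta(minutes=50)),
    Operation("restart", "Failed", now - timedelta(minutes=20)),
    Operation("reimage", "Failed", now - timedelta(hours=3)),
]
failed_last_hour = count_failed(ops, now - timedelta(hours=1))
```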


AzureVmssInstanceDetails

Tracks detailed information about Azure VM instances as a list.

  • Data Type: an array of objects with the following properties:

    • Instance (string)

    • Size (string)

    • ProvisioningState  (string)

    • PowerState  (string)

    • AgentState (string)

    • StateDetails (string)

  • Included in sample profile: no

  • Included in default alerts: no
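As with other array metrics, the instance list is typically summarized before it is evaluated. The following Python sketch shows one possible summary, a count of instances per PowerState. The function name and the sample property values ("running", "deallocated", "Standard_D2s_v3") are assumptions made for illustration; the actual values reported by CloudMonix may differ.

```python
from collections import Counter
from typing import Dict, List


def count_by_power_state(instances: List[Dict[str, str]]) -> Dict[str, int]:
    """Group the instance list by its PowerState property and count each group."""
    return dict(Counter(instance["PowerState"] for instance in instances))


# Hypothetical sample data using the property names listed above.
instances = [
    {"Instance": "0", "Size": "Standard_D2s_v3", "PowerState": "running"},
    {"Instance": "1", "Size": "Standard_D2s_v3", "PowerState": "running"},
    {"Instance": "2", "Size": "Standard_D2s_v3", "PowerState": "deallocated"},
]
summary = count_by_power_state(instances)
```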


ResourceInstanceCount

Tracks the current number of VMs in the scale set.

  • Data Type: double

  • Included in sample profile: no

  • Included in default alerts: no


WindowsEventLogEntry

  • Data Type: double

  • Included in sample profile: yes, in both profiles tracked as metrics called ApplicationsEventLogs, SystemEventLogs

  • Included in default alerts: no



Alerts

Users can create alerts based on changes in any value tracked by CloudMonix (including custom metrics). Each resource template includes alerts which are suitable for a given resource.

Refer to the Alerts article to learn more. The predefined alerts for Azure Windows VM Scale Set are listed in the Metrics section.


Alerts are available during the Trial period or in Professional and Ultimate plans only.



Automation

Automation features (Actions) allow users to set up powerful reactive, proactive and scheduled actions. CloudMonix can execute actions when a specific monitoring condition occurs or according to a schedule. Refer to the Actions article to learn more about automating VM Scale Sets reboots.


Automation features are available during the Trial period or in the Ultimate plan only.


As a general rule, every new action should specify appropriate Suspended period and Sustained period values. See the Automating Actions article to learn more about those settings.



Built-in Actions


AzureVmScaleSetInstanceReboot

CloudMonix will request Azure to reboot the specified VM. 


Evaluated and executed on an individual VM level. Available when “Evaluate this condition by individual instance?” is set to true.


  • Included in the default profiles in the following actions, which have to be explicitly enabled:

    • Daily reboot (Warning): Reboots VMSS instances once per day, one instance at a time.

    • Low Ram Reboot (Warning): Reboots a VMSS instance if available memory drops below 100MB for 5 minutes sustained. This action will not be executed more than once per hour due to the Suspended period setting.
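The Suspended period acts as a cool-down: once an action has fired, it is not fired again until the period has elapsed, even if the condition is still true. The Python sketch below illustrates that behavior for the one-hour limit on the Low Ram Reboot action. It is an illustration under assumptions, not CloudMonix's implementation; the `SuspendedGate` class and its method are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import Optional


class SuspendedGate:
    """Allows an action to fire at most once per suspended period (hypothetical)."""

    def __init__(self, suspended: timedelta) -> None:
        self.suspended = suspended
        self.last_fired: Optional[datetime] = None

    def try_fire(self, condition_met: bool, now: datetime) -> bool:
        """Return True if the action should run now, and record the firing time."""
        if not condition_met:
            return False
        if self.last_fired is not None and now - self.last_fired < self.suspended:
            return False  # still inside the suspended period
        self.last_fired = now
        return True


# Example: low memory is reported at 12:00, 12:30 and 13:00.
gate = SuspendedGate(timedelta(hours=1))
start = datetime(2024, 1, 1, 12, 0)
first = gate.try_fire(True, start)                           # action runs
second = gate.try_fire(True, start + timedelta(minutes=30))  # suppressed
third = gate.try_fire(True, start + timedelta(hours=1))      # action runs again
```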


AzureVmScaleSetInstanceReimage

CloudMonix will request Azure to re-image the specified VM. 


Evaluated and executed on an individual VM level. Available when “Evaluate this condition by individual instance?” is set to true.



AzureVmScaleSetStart

CloudMonix will request that a particular VM Scale Set is started.


Evaluated and executed on a Scale Set level. Available when “Evaluate this condition by individual instance?” is set to false.


AzureVmScaleSetStopDeallocate

CloudMonix will request that a particular VM Scale Set is shut down and deallocated (i.e. resources are released). Deallocating VMs helps to lower costs, as Azure doesn’t charge for deallocated resources.


Evaluated and executed on a Scale Set level. Available when “Evaluate this condition by individual instance?” is set to false.


Auto-scaling

Auto-scaling allows users to set up powerful reactive, proactive and scheduled auto-scaling rules. CloudMonix can execute scale adjustments when a specific monitoring condition occurs or according to a schedule. See the Auto-scaling article to learn more about auto-scaling VM Scale Sets.


To use CloudMonix’s auto-scaling feature, users should disable the Azure auto-scaling flag in the Azure portal. Every resource that uses CloudMonix auto-scaling should define Scale-down cooling period and Scale-up cooling period or Sustained period values.


Auto-scaling features are available during the Trial period or in the Ultimate plan only.


Built-in Auto-scaling rules


ScaleDown Action

Removes the specified number of instances from the Scale Set.

  • Included in the default profiles in the following Scale Adjustments rule, which has to be explicitly enabled:

    • Scale Down (CPU) (Warning): Scales down VMSS when 30-minute average CPU utilization across all instances has been under 20% for 10 minutes sustained.


ScaleUp Action

Adds the specified number of instances to the Scale Set.

  • Included in the default profiles in the following Scale Adjustments rule, which has to be explicitly enabled:

    • Scale Up (CPU) (Warning): Scales up the number of instances by 1 when 30-minute CPU utilization average across all instances within monitored VMSS exceeds 70%.


Scale Ranges 

Allow defining the overall limits on the number of instances.

  • Included in the default profiles in the following Scale Adjustments rule, which has to be explicitly enabled:

    • Overall Scaling Limit (Warning): Basic scaling range that enforces a minimum of 3 instances for the monitored VMSS.
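The default Scale Up (CPU), Scale Down (CPU) and Overall Scaling Limit rules can be read together as one decision: adjust the instance count based on the 30-minute CPU average, then clamp the result to the scale range. The Python sketch below illustrates that combination. It is not CloudMonix's implementation: the function name is hypothetical, the maximum of 10 instances and the scale-down step of 1 are assumptions (the article states only the minimum of 3 and a scale-up step of 1), and the sustained and cooling periods are left out for brevity.

```python
def next_instance_count(current: int, cpu_avg_30min: float,
                        minimum: int = 3, maximum: int = 10,
                        up_threshold: float = 70.0,
                        down_threshold: float = 20.0,
                        step: int = 1) -> int:
    """Return the target instance count for a Scale Set.

    minimum=3 and the 70% / 20% thresholds follow the default rules above;
    maximum=10 and the scale-down step are assumptions for illustration.
    """
    if cpu_avg_30min > up_threshold:
        target = current + step      # Scale Up (CPU)
    elif cpu_avg_30min < down_threshold:
        target = current - step      # Scale Down (CPU)
    else:
        target = current             # inside the comfortable band
    # The scale range always wins over the individual scale adjustments.
    return max(minimum, min(maximum, target))


busy = next_instance_count(current=3, cpu_avg_30min=85.0)
idle_at_minimum = next_instance_count(current=3, cpu_avg_30min=10.0)
```

In this sketch an idle Scale Set that is already at the minimum stays at 3 instances, because the range is applied after the adjustment.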