CloudMonix works by gathering metrics from monitored resources, pre-aggregating them if needed, evaluating them against the conditions of alerts and actions, and showing them on dashboards. CloudMonix supports a wide variety of data types output by different resources. Learn more here about the specific metrics supported by each resource.
Metrics are defined on the Metrics tab of a particular resource. A resource is usually created from a template with default settings, so when you edit an existing resource, some metrics have most likely already been defined.
When defining a metric, provide the following configuration so that CloudMonix can capture and surface it:
| Field | Description |
|---|---|
| Name | A metric's name is important. It is used in alert and action expressions, and it is how the metric is displayed in charts, dashboards, and reports. Renaming a metric that is referenced in alert or action expressions can cause the affected alerts or actions to malfunction. Always "Test" the resource before saving to ensure that all expressions work and all metrics are being captured. |
| Enabled | A metric can be disabled when it no longer needs to be captured but its configuration should be preserved for future use. |
| Method | The HTTP method used to ping the website. Some URLs may require a POST or a PUT instead of the traditional GET. |
| Category | Every monitored resource supports metrics from several categories. The Category field tells CloudMonix what type of metric to capture. |
| Metric Specific Fields | After the category is specified, certain metric types may require additional parameters. For example, Performance Counters require Category and Counter names, SQL Query metrics require a SQL script, and so on. |
| Units | While not required, specifying Units allows CloudMonix to display the metric in a friendlier way on dashboards and reports. |
| Dashboard Highlight | When requested, CloudMonix highlights metrics on dashboards. Up to 3 numeric metrics can be highlighted as gauges, and any number of non-numeric metrics can be highlighted as additional views on an individual resource dashlet. |
| Best and Worst possible values | When a numeric metric is highlighted on a dashboard, Best and Worst values are needed so that the gauge displays the proper colors. CloudMonix adapts the display whether Best is bigger than Worst or Worst is bigger than Best. |
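To illustrate how a gauge can cope with both orientations of Best and Worst, here is a minimal Python sketch, assuming a linear scale between the two values. `gauge_fraction`, `gauge_color`, and the red/amber/green thresholds are hypothetical and are not CloudMonix's actual rendering logic.

```python
def gauge_fraction(value: float, best: float, worst: float) -> float:
    """Map value onto 0.0 (Worst) .. 1.0 (Best), clamped to that range.

    The sign of (best - worst) flips the scale, so the same formula works
    when Best > Worst (e.g. free memory) and when Worst > Best (e.g. CPU %).
    """
    if best == worst:
        raise ValueError("best and worst must differ")
    fraction = (value - worst) / (best - worst)
    return max(0.0, min(1.0, fraction))


def gauge_color(value: float, best: float, worst: float) -> str:
    # Hypothetical thresholds: lower third red, middle third amber, upper third green.
    f = gauge_fraction(value, best, worst)
    if f < 1 / 3:
        return "red"
    if f < 2 / 3:
        return "amber"
    return "green"


# CPU %: lower is better, so Worst (100) is bigger than Best (0).
print(gauge_color(90, best=0, worst=100))      # red
# Free memory in MB: higher is better, so Best is bigger than Worst.
print(gauge_color(3000, best=4000, worst=0))   # green
```

Clamping keeps out-of-range readings (for example a value past Worst) pinned to the end of the gauge instead of producing a fraction outside 0..1.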
Metric Data Types
In order to work with metrics in expressions of alerts and actions, it is important to understand their data types. CloudMonix understands metrics of the following data types: numeric values, dates, strings, booleans, complex objects, and lists/arrays of any of these types. It does not, however, understand hierarchical metrics (complex objects that contain nested objects).
For example, a CPU utilization reading is a numeric metric. The date of the oldest message in a queue is a date metric. The availability status of a resource is a string metric. An event log entry is a complex (but flat) object. Finally, a list of processes is an array of process metrics.
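The example metrics above can be sketched as plain Python values. The sample values and field names (such as `EntryType` and `CpuPercent`) are hypothetical, and `is_supported` is only an illustration of the "flat, no hierarchy" rule, not a CloudMonix API.

```python
from datetime import datetime, timezone
from typing import Any

# Hypothetical sample values mirroring the examples above.
cpu_utilization = 37.5                                              # numeric
oldest_message_date = datetime(2024, 1, 1, tzinfo=timezone.utc)     # date
availability_status = "Ready"                                       # string
event_log_entry = {"EntryType": "Error", "Source": "SomeApp", "EventId": 1000}  # flat complex object
process_list = [{"Name": "w3wp", "CpuPercent": 12.0},
                {"Name": "sqlservr", "CpuPercent": 40.0}]            # array of complex objects

SCALARS = (bool, int, float, str, datetime)


def is_supported(value: Any) -> bool:
    """Scalars, flat objects, and lists of either are accepted; objects whose
    properties are themselves objects or lists (a hierarchy) are rejected."""
    if isinstance(value, SCALARS):
        return True
    if isinstance(value, dict):
        return all(isinstance(v, SCALARS) for v in value.values())
    if isinstance(value, list):
        return all(not isinstance(item, list) and is_supported(item) for item in value)
    return False


print(is_supported(process_list))                    # True
print(is_supported({"Process": {"Name": "w3wp"}}))   # False: nested object
```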
When evaluating metrics in expressions, it is important to know which data types are being evaluated, so that proper expressions can be built. Learn more about building expressions here.
CloudMonix can surface metrics of any type on the dashboard, either as charts (numeric metrics) or as separate views (all other types). Certain common non-numeric metrics, such as event logs, process lists, and connection lists, are displayed with their own icon on the dashboard, while others are displayed with a generic star icon.
Linked Metrics
When evaluating expressions within the context of a particular resource, it may be helpful to know the state of a different resource. For example, when deciding whether to scale up an Azure Worker Role, it may be important to evaluate not only the performance characteristics of the individual workers inside the Role, but also the depth of the Queue that distributes work to them. Since queue depth metrics are typically captured by queue-specific resources (such as Azure Storage or Azure Service Bus), CloudMonix allows metrics to be linked from one resource to another via the LinkedMetric metric type.
Linking metrics is a powerful CloudMonix feature that allows conditions to be evaluated holistically across the entire environment.
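The Worker Role scenario can be sketched in Python, assuming each resource exposes its latest metric values by name. `Resource`, `LinkedMetric.resolve`, `should_scale_up`, the resource name `orders-storage`, and the 75% CPU / 1000-message thresholds are all hypothetical; in CloudMonix the link itself is defined through the LinkedMetric metric type.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Resource:
    name: str
    metrics: Dict[str, float] = field(default_factory=dict)  # latest value per metric name


@dataclass
class LinkedMetric:
    """Hypothetical stand-in: a reference to a metric captured by another resource."""
    resource: str
    metric: str

    def resolve(self, resources: Dict[str, Resource]) -> float:
        return resources[self.resource].metrics[self.metric]


def should_scale_up(worker_cpu: List[float], queue_depth: LinkedMetric,
                    resources: Dict[str, Resource]) -> bool:
    # Assumed rule: scale up only when workers are busy AND work is piling up in the queue.
    avg_cpu = sum(worker_cpu) / len(worker_cpu)
    return avg_cpu > 75 and queue_depth.resolve(resources) > 1000


resources = {"orders-storage": Resource("orders-storage", {"QueueDepth": 2500})}
link = LinkedMetric("orders-storage", "QueueDepth")
print(should_scale_up([80.0, 90.0, 70.0], link, resources))  # True
```

Requiring both conditions avoids scaling up busy workers that have nothing left to process, which is exactly what looking at the Role's own metrics alone cannot tell you.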
Metric Aggregation
When evaluating conditions for alerts or actions, it may be important to evaluate metrics over a period of time, so that alerts or actions do not overreact to intermittent fluctuations but instead trigger on sustained values. One way to do this is to use the Sustained Timeout setting on alerts and actions. Another way is to aggregate metric values across a period of time and, for resources such as Azure Cloud Role, Azure VM Availability Set, and Server Farm, across multiple instances of the same resource.
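One plausible reading of a sustained condition is sketched below in Python: the condition must hold continuously for the whole timeout before it counts, and any dip resets the clock. `sustained` and its parameters are hypothetical; CloudMonix's actual Sustained Timeout semantics may differ.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Sample = Tuple[datetime, float]  # (capture time, metric value)


def sustained(samples: List[Sample], threshold: float, timeout: timedelta) -> bool:
    """True only if value > threshold has held continuously for at least `timeout`
    up to the most recent sample; any sample at or below the threshold resets the clock."""
    breach_started: Optional[datetime] = None
    for ts, value in sorted(samples):
        if value > threshold:
            if breach_started is None:
                breach_started = ts
        else:
            breach_started = None
    if breach_started is None:
        return False
    latest = max(ts for ts, _ in samples)
    return latest - breach_started >= timeout


start = datetime(2024, 1, 1, 12, 0)
spike = [(start + timedelta(minutes=i), v) for i, v in enumerate([50, 95, 50, 50, 50])]
steady = [(start + timedelta(minutes=i), v) for i, v in enumerate([50, 95, 95, 95, 95])]
print(sustained(spike, 90, timedelta(minutes=3)))   # False: a single spike
print(sustained(steady, 90, timedelta(minutes=3)))  # True: above 90 for 3 minutes
```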
The maximum period for aggregation is 60 minutes. Aggregation methods depend on the data type of the metric. Because metrics can be text, dates, or even arrays of complex objects in addition to numbers, CloudMonix has a sophisticated aggregation engine that helps get at and evaluate the right data for alert and action execution.
There are four core components to metric aggregation:
- Source metric - the raw metric being aggregated
- Time period - the period over which data is analyzed, maximum of 60 minutes
- Filter - an optional expression that pre-filters gathered data before aggregation takes place
- Aggregation method - the approach to aggregation (e.g. average, max, sum, min, last, previous, count)
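The four components can be captured in a small data structure. This Python sketch is hypothetical (`AggregationSpec`, `filter_fn`, and the method-name set are illustrative, not a CloudMonix API); it only enforces the 60-minute maximum stated above and rejects unknown method names.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Any, Callable, Optional

MAX_PERIOD = timedelta(minutes=60)
METHODS = frozenset({"Average", "Sum", "Min", "Max", "Last", "Previous",
                     "Count", "AnyMatch", "AllMatch", "NoneMatch"})


@dataclass(frozen=True)
class AggregationSpec:
    source_metric: str                                 # raw metric being aggregated
    period: timedelta                                  # window to analyze, at most 60 minutes
    method: str                                        # one of METHODS
    filter_fn: Optional[Callable[[Any], bool]] = None  # optional filter expression

    def __post_init__(self) -> None:
        # Validation only; nothing is reassigned, so this is safe on a frozen dataclass.
        if not timedelta(0) < self.period <= MAX_PERIOD:
            raise ValueError("period must be greater than 0 and at most 60 minutes")
        if self.method not in METHODS:
            raise ValueError(f"unknown aggregation method: {self.method}")


# Example: "how many times was Status not Ready in the last 10 minutes?"
not_ready = AggregationSpec("Status", timedelta(minutes=10), "Count",
                            lambda status: status != "Ready")
```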
There are a number of possible ways to aggregate:
- Average - calculates the average of the metric values collected over the time period, after optional pre-filtering. Works on numeric metrics or numeric properties of complex metrics. Returns a numeric value.
- Sum - totals the metric values collected over the time period, after optional pre-filtering. Works on numeric metrics or numeric properties of complex metrics. Returns a numeric value.
- Min - finds the smallest metric value collected over the time period, after optional pre-filtering. Works on numeric metrics or numeric properties of complex metrics. Returns a numeric value.
- Max - finds the largest metric value collected over the time period, after optional pre-filtering. Works on numeric metrics or numeric properties of complex metrics. Returns a numeric value.
- Last - finds the last metric value collected over the time period, after optional pre-filtering. Works on any metric type. Returns a single instance of the aggregated metric.
- Previous - finds the one-before-last metric value collected over the time period, after optional pre-filtering. Works on any metric type. Returns a single instance of the aggregated metric.
- Count - counts the metric values collected over the time period, after optional filtering. Works on any metric type. Returns an integer.
- AnyMatch - determines whether any metric value within the time period matches the filter expression. Works on any metric type. Returns a boolean.
- AllMatch - determines whether all metric values within the time period match the filter expression. Works on any metric type. Returns a boolean.
- NoneMatch - determines whether no metric value within the time period matches the filter expression. Works on any metric type. Returns a boolean.
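The methods above can be sketched as a single Python function. This is an illustration under stated assumptions, not CloudMonix's aggregation engine: samples are assumed to be `(timestamp, value)` pairs, `filter_fn` plays the role of the filter expression, `selector` picks a numeric property of a complex metric, and returning `None` for an empty window is an assumed behaviour.

```python
from datetime import datetime, timedelta
from typing import Any, Callable, List, Optional, Tuple

Sample = Tuple[datetime, Any]  # (capture time, metric value)


def aggregate(samples: List[Sample], method: str, period: timedelta, now: datetime,
              filter_fn: Optional[Callable[[Any], bool]] = None,
              selector: Optional[Callable[[Any], float]] = None) -> Any:
    """Hypothetical sketch of the aggregation methods listed above."""
    # Keep only values captured inside the time period, oldest first.
    window = [v for ts, v in sorted(samples, key=lambda s: s[0])
              if now - period <= ts <= now]
    match = filter_fn or (lambda v: True)

    # Match methods use the filter as the condition being tested.
    if method == "AnyMatch":
        return any(match(v) for v in window)
    if method == "AllMatch":
        return all(match(v) for v in window)  # True for an empty window
    if method == "NoneMatch":
        return not any(match(v) for v in window)

    # All other methods apply the filter first, then aggregate what is left.
    kept = [v for v in window if match(v)]
    if method == "Count":
        return len(kept)
    if method == "Last":
        return kept[-1] if kept else None
    if method == "Previous":
        return kept[-2] if len(kept) >= 2 else None

    # Numeric methods: use the value itself or a numeric property picked by `selector`.
    if method not in ("Average", "Sum", "Min", "Max"):
        raise ValueError(f"unknown aggregation method: {method}")
    nums = [selector(v) if selector else v for v in kept]
    if not nums:
        return None  # assumed behaviour when nothing was collected
    if method == "Average":
        return sum(nums) / len(nums)
    if method == "Sum":
        return sum(nums)
    return min(nums) if method == "Min" else max(nums)


now = datetime(2024, 1, 1, 12, 0)
cpu = [(now - timedelta(minutes=m), v) for m, v in [(4, 10.0), (3, 30.0), (2, 20.0), (1, 40.0)]]
print(aggregate(cpu, "Average", timedelta(minutes=5), now))   # 25.0
print(aggregate(cpu, "Previous", timedelta(minutes=5), now))  # 20.0
```

Note the split in how the filter is used: for Count, Last, Previous, and the numeric methods it narrows the data first, while for AnyMatch, AllMatch, and NoneMatch it is the very condition being checked.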
Examples of questions that aggregation can answer:
- How many times in the last X minutes was my resource not Ready? (aggregate metric: Status, aggregation method: Count, filter: Status != "Ready", time period: X minutes)
- What is the previous value of my metric, so I can compare it to the current value and see if it has changed? (aggregation method: Previous, filter: none, time period: 5-10 minutes)
- Were there any event logs in the last few minutes that came from a particular application and had a severity of Error? (aggregate metric: EventLogs, aggregation method: AnyMatch, filter: metric.EntryType == "Error" && metric.Source == "SomeApp")
- Is process X running? More precisely, is process X present in the ProcessList metric? (aggregate metric: ProcessList, time period: Current Cycle Only, filter: metric.Name == "X")
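The example questions can be sketched in Python as follows. All data, field values, and helper names (`values`, `count`, `any_match`, `ago`) are hypothetical. The sketch also assumes that for array metrics such as EventLogs and ProcessList the filter is applied to each item of the array, and that "Current Cycle Only" means only the most recent sample is considered.

```python
from datetime import datetime, timedelta
from typing import Any, Callable, List, Tuple

Sample = Tuple[datetime, Any]  # (capture time, metric value)


def values(samples: List[Sample], period: timedelta, now: datetime) -> List[Any]:
    """Values inside the period, oldest first. Array metrics are flattened
    (an assumption) so that the filter sees each event log entry or process."""
    out: List[Any] = []
    for ts, v in sorted(samples, key=lambda s: s[0]):
        if now - period <= ts <= now:
            out.extend(v if isinstance(v, list) else [v])
    return out


def count(samples: List[Sample], period: timedelta, now: datetime,
          pred: Callable[[Any], bool]) -> int:
    return sum(1 for v in values(samples, period, now) if pred(v))


def any_match(samples: List[Sample], period: timedelta, now: datetime,
              pred: Callable[[Any], bool]) -> bool:
    return any(pred(v) for v in values(samples, period, now))


now = datetime(2024, 1, 1, 12, 0)


def ago(minutes: int) -> datetime:
    return now - timedelta(minutes=minutes)


# 1. How many times in the last 5 minutes was the resource not Ready? (Count)
status = [(ago(4), "Ready"), (ago(3), "Busy"), (ago(2), "Ready"), (ago(1), "Unknown")]
not_ready = count(status, timedelta(minutes=5), now, lambda s: s != "Ready")

# 2. Has the value changed since the previous capture? (Last vs Previous)
recent = values(status, timedelta(minutes=5), now)
changed = len(recent) >= 2 and recent[-1] != recent[-2]

# 3. Any Error entries from SomeApp in the last 5 minutes? (AnyMatch)
event_logs = [(ago(2), [{"EntryType": "Information", "Source": "SomeApp"},
                        {"EntryType": "Error", "Source": "OtherApp"}]),
              (ago(1), [{"EntryType": "Error", "Source": "SomeApp"}])]
app_errors = any_match(event_logs, timedelta(minutes=5), now,
                       lambda e: e["EntryType"] == "Error" and e["Source"] == "SomeApp")

# 4. Is process X present in the latest ProcessList sample only?
process_list = [(ago(2), [{"Name": "w3wp"}]), (ago(1), [{"Name": "w3wp"}, {"Name": "X"}])]
latest_processes = max(process_list, key=lambda s: s[0])[1]
x_running = any(p["Name"] == "X" for p in latest_processes)
```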