Netreo Essentials works by gathering metrics from monitored resources, pre-aggregating them and evaluating them against the conditions in alerts and actions where needed, and showing them on dashboards. It supports a wide variety of data types output by different resources. Learn more here about specific metrics supported by specific resources.
Metrics are defined on the Metrics tab of a monitored resource. During initial setup of a resource, a template with various defaults was most likely used, so some metrics have probably already been defined by the time you edit a particular resource.
When defining metrics, some configuration needs to be provided for Netreo Essentials to effectively capture or surface them.
|Field|Description|
|---|---|
|Name|A metric's name is important. It is used in the expressions for Alerts and Actions, and it is how the metric is labeled in charts, dashboards and reports. Renaming a metric that is used in the expressions of an Alert or Action can cause the affected alerts or actions to malfunction. Always test a resource using the "Test" button before saving to ensure that all expressions are working and that metrics are being captured.|
|Enabled|Metrics can be disabled when they no longer need to be captured but their configuration details are still worth preserving for future use.|
|Method|The HTTP method used to ping the website. Certain URLs may require a POST or a PUT instead of the traditional GET.|
|Category|Every monitored resource supports metrics from multiple categories. The Category field tells Netreo Essentials what type of metric needs to be captured.|
|Metric Specific Fields|After a category is specified, certain metric types may require additional parameters. For example, Performance Counters require Category and Counter names, and SQL Query metrics require a SQL script.|
|Units|While not required, specifying Units allows Netreo Essentials to visualize metrics in a friendlier way on dashboards and reports.|
|Dashboard Highlight|Netreo Essentials can, when requested, highlight metrics on dashboards. Up to 3 numeric metrics can be highlighted as gauges, and any number of non-numeric metrics can be highlighted as additional views on an individual resource dashlet.|
|Best and Worst possible values|When a numeric metric is highlighted on a dashboard, best and worst values are needed so that the gauge can be colored properly. Netreo Essentials adapts the display whether Best is bigger than Worst or Worst is bigger than Best.|
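How a gauge can adapt to either ordering of Best and Worst can be sketched as a simple normalization. This is a minimal illustration only, not Netreo Essentials' actual rendering logic; the function name `gauge_fraction` and the 0-to-1 scale are assumptions made for the example.

```python
def gauge_fraction(value: float, best: float, worst: float) -> float:
    """Map a metric value onto a 0.0 (worst) to 1.0 (best) scale.

    Hypothetical helper: it works whether best > worst (e.g. free disk
    space) or worst > best (e.g. CPU utilization).
    """
    if best == worst:
        raise ValueError("best and worst must differ")
    fraction = (value - worst) / (best - worst)
    # Clamp readings that fall outside the configured range.
    return max(0.0, min(1.0, fraction))


# CPU utilization in percent: 0 is best, 100 is worst.
print(gauge_fraction(25, best=0, worst=100))   # 0.75
# Free disk space in percent: 100 is best, 0 is worst.
print(gauge_fraction(25, best=100, worst=0))   # 0.25
```

Because the sign of `best - worst` flips with the ordering, the same formula covers both "higher is better" and "lower is better" metrics.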
Metric Data Types
In order to work with metrics in the expressions of alerts and actions, it is important to understand their data types. Netreo Essentials understands metrics of the following data types:
- Numeric values
- String (text) values
- Date values
- Complex objects
- Lists/arrays of the previously mentioned types
It does not, however, understand hierarchical (nested) metrics.
For example, a CPU utilization reading is a numeric metric. The date of the oldest message in a queue is a date metric. Availability status of a resource is a string metric. An event log entry is a complex (but flat) object. And finally, a list of processes is an array of process metrics.
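The distinction between these data types can be sketched in code. The following is an illustration under stated assumptions, not the product's internal model: the function `metric_kind`, the use of Python dictionaries for complex objects, and the rule that a "hierarchy" means a property holding another object or list are all hypothetical.

```python
from datetime import datetime


def metric_kind(value) -> str:
    """Classify a sample value into one of the data types described above."""
    if isinstance(value, (int, float)):
        return "numeric"
    if isinstance(value, str):
        return "string"
    if isinstance(value, datetime):
        return "date"
    if isinstance(value, dict):
        # Complex objects must be flat: no property may hold a nested
        # object or list (assumed meaning of "hierarchy").
        for prop, item in value.items():
            if isinstance(item, (dict, list)):
                raise ValueError(f"property {prop!r} is nested; hierarchies are not supported")
        return "complex"
    if isinstance(value, list):
        for item in value:
            if isinstance(item, list):
                raise ValueError("lists of lists are not supported")
            metric_kind(item)  # validates each element
        return "list"
    raise TypeError(f"unsupported metric value: {value!r}")


print(metric_kind(42.5))                                          # numeric
print(metric_kind(datetime(2024, 1, 1, 12, 0)))                   # date
print(metric_kind("Ready"))                                       # string
print(metric_kind({"EntryType": "Error", "Source": "SomeApp"}))   # complex
print(metric_kind([{"Name": "X"}, {"Name": "Y"}]))                # list
```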
When evaluating metrics in expressions it is important to know which data types are being evaluated so that proper expressions can be built. Learn more about building expressions here.
Netreo Essentials is able to surface metrics of any type as either charts (numeric) or separate views (all other) on the dashboard. Certain common non-numeric metrics, such as event logs, process lists, connection lists etc. are displayed with their own icon on the dashboard, while others are displayed with a generic star icon.
When evaluating expressions within the context of a particular resource, it may be helpful to know the state of a different resource. For example, when evaluating the need to scale up an Azure Worker Role, it may be important to evaluate not only performance characteristics of individual workers inside the role, but also the depth of the processing queue that is distributing work. Since queue depth metrics are typically captured in queue-specific resources (such as Azure Storage or Azure Service Bus), Netreo Essentials allows for linking of metrics from one resource to another via the "LinkedMetric" metric type.
Linking metrics is a powerful Netreo Essentials feature that allows you to evaluate conditions across entire environments holistically.
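The scale-up scenario above can be sketched as follows. This is a conceptual illustration only: the `LinkedMetric` class, the resource and metric names, and the thresholds are hypothetical and do not reflect how Netreo Essentials stores or resolves linked metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkedMetric:
    """Points at a metric captured by a different resource (hypothetical model)."""
    source_resource: str
    source_metric: str


def resolve(link: LinkedMetric, latest: dict):
    """Look up the most recent value of the linked metric."""
    return latest[link.source_resource][link.source_metric]


# Latest values per resource; all names and numbers are made up.
latest = {
    "orders-queue": {"QueueDepth": 1200},
    "worker-role": {"CPUAverage": 85.0},
}

queue_depth = LinkedMetric("orders-queue", "QueueDepth")


def should_scale_up(latest: dict) -> bool:
    """Combine the worker role's own metric with the linked queue depth."""
    cpu = latest["worker-role"]["CPUAverage"]
    depth = resolve(queue_depth, latest)
    return cpu > 80 and depth > 1000


print(should_scale_up(latest))  # True
```

The point of the sketch is that the worker role's expression reads the queue depth as if it were one of its own metrics, even though a separate resource captures it.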
When evaluating conditions for alerts or actions, it may be important to evaluate metrics over a period of time, so that alerts or actions do not react to intermittent fluctuations but do trigger on sustained values. One way to do this is to use the Sustained Timeout setting on alerts and actions. Another way is to aggregate metric values across a period of time, and possibly across multiple instances of the same resource, for resources such as Azure Cloud Role, Azure VM Availability Set and Server Farm.
The maximum period for aggregation is 60 minutes. Aggregation methods depend on the data type of the metric. Because metrics, in addition to being numeric, can also be text, dates or even arrays of complex objects, Netreo Essentials has a sophisticated aggregation engine to help get at and evaluate the right data for the purposes of alert and action execution.
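The idea behind triggering only on sustained values can be sketched as below. The exact semantics of the Sustained Timeout setting are not described here, so this is one plausible interpretation: the condition must have held on every sample for at least the timeout. The function `is_sustained` and the sample data are hypothetical.

```python
from datetime import datetime, timedelta


def is_sustained(samples, condition, timeout, now):
    """Return True if `condition` has held continuously for at least `timeout`.

    samples: list of (timestamp, value) tuples, oldest first.
    """
    held_since = None
    for ts, value in samples:
        if condition(value):
            if held_since is None:
                held_since = ts  # condition just became true
        else:
            held_since = None    # any dip resets the clock
    return held_since is not None and now - held_since >= timeout


start = datetime(2024, 1, 1, 12, 0)
minute = timedelta(minutes=1)
high_cpu = lambda v: v > 90

steady = [(start + i * minute, v) for i, v in enumerate([50, 95, 96, 97, 98, 99])]
spiky = [(start + i * minute, v) for i, v in enumerate([50, 95, 50, 95])]

print(is_sustained(steady, high_cpu, timedelta(minutes=3), now=steady[-1][0]))  # True
print(is_sustained(spiky, high_cpu, timedelta(minutes=3), now=spiky[-1][0]))    # False
```

The spiky series crosses the threshold twice but never stays there, so it does not trigger.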
There are four core components to metric aggregation:
- Source metric - the raw metric that is being aggregated.
- Time period - the period over which the data is analyzed (maximum of 60 minutes).
- Filter - an optional expression that pre-filters gathered data before aggregation takes place.
- Aggregation method - the approach to aggregation (e.g., average, max, sum, min, last, previous, count).
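The four components can be pictured as a small configuration object plus a step that selects the data to aggregate. This is a sketch under assumptions, not the product's data model: `AggregationSpec`, `window_values` and the sample data are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Any, Callable, Optional

MAX_PERIOD = timedelta(minutes=60)  # documented maximum aggregation period


@dataclass
class AggregationSpec:
    source_metric: str                               # raw metric being aggregated
    period: timedelta                                # time period to analyze
    method: str                                      # e.g. "Average", "Count"
    filter: Optional[Callable[[Any], bool]] = None   # optional pre-filter

    def __post_init__(self):
        if self.period > MAX_PERIOD:
            raise ValueError("aggregation period cannot exceed 60 minutes")


def window_values(samples, spec, now):
    """Pick the values inside the time period, then apply the optional filter.

    samples: list of (timestamp, value) tuples, oldest first.
    """
    start = now - spec.period
    values = [v for ts, v in samples if start <= ts <= now]
    if spec.filter is not None:
        values = [v for v in values if spec.filter(v)]
    return values


now = datetime(2024, 1, 1, 12, 0)
samples = [
    (now - timedelta(minutes=15), "Stopped"),  # outside the 10-minute window
    (now - timedelta(minutes=8), "Busy"),
    (now - timedelta(minutes=4), "Ready"),
    (now - timedelta(minutes=1), "Stopped"),
]
spec = AggregationSpec("Status", timedelta(minutes=10), "Count",
                       filter=lambda status: status != "Ready")

print(window_values(samples, spec, now))  # ['Busy', 'Stopped']
```

The aggregation method is then applied to whatever `window_values` returns; for `Count` in this example that would be 2.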
There are a number of possible ways to aggregate:
- Average - Calculates the average of the metric values collected over the time period, after any optional pre-filtering. Works on numeric metrics or the numeric properties of complex metrics. Returns a numeric value.
- Sum - Totals the metric values collected over the time period, after any optional pre-filtering. Works on numeric metrics or the numeric properties of complex metrics. Returns a numeric value.
- Min - Finds the smallest of the metric values collected over the time period, after any optional pre-filtering. Works on numeric metrics or the numeric properties of complex metrics. Returns a numeric value.
- Max - Finds the largest of the metric values collected over the time period, after any optional pre-filtering. Works on numeric metrics or the numeric properties of complex metrics. Returns a numeric value.
- Last - Finds the last of the metric values collected over the time period, after any optional pre-filtering. Works on any metric type. Returns a single instance of an aggregated metric.
- Previous - Finds the one-before-last of the metric values collected over the time period, after any optional pre-filtering. Works on any metric type. Returns a single instance of an aggregated metric.
- Count - Counts the metric values collected over the time period, after any optional filtering. Works on any metric type. Returns an integer.
- AnyMatch - Detects whether any metric value collected within the time period satisfies the filter expression. Works on any metric type. Returns a boolean.
- AllMatch - Detects whether all metric values collected within the time period satisfy the filter expression. Works on any metric type. Returns a boolean.
- NoneMatch - Detects whether no metric value collected within the time period satisfies the filter expression. Works on any metric type. Returns a boolean.
Aggregation can answer questions such as the following:
- How many times in the last x minutes was my resource not Ready? (aggregate metric: Status, aggregation method: Count, filter: Status != "Ready", time period: x minutes)
- What is the previous value of my metric, so that I can compare it with the current value to see whether it has changed? (aggregation method: PreviousValue, filter: none, time period: 5-10 minutes)
- Were there any event logs in the last few minutes that came from a particular application and had a severity of Error? (aggregate metric: EventLogs, aggregation method: AnyMatch, filter: metric.EntryType == "Error" && metric.Source == "SomeApp")
- Is process x running? More precisely, is process x present in the ProcessList metric? (aggregate metric: ProcessList, time period: Current Cycle Only, filter: metric.Name == "X")
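The aggregation methods and example questions above can be approximated in a few lines of Python. This is a hedged sketch, not the product's aggregation engine: the `aggregate` function is hypothetical, filters are written as Python lambdas rather than in the product's expression syntax, empty inputs return `None` by assumption, and the last example assumes AnyMatch because the original does not name a method for it.

```python
from statistics import mean

NUMERIC_METHODS = {"Average": mean, "Sum": sum, "Min": min, "Max": max}
MATCH_METHODS = ("AnyMatch", "AllMatch", "NoneMatch")


def aggregate(values, method, flt=None):
    """Aggregate values (oldest first) collected within the time period."""
    if method in MATCH_METHODS:
        if flt is None:
            raise ValueError(f"{method} needs a filter expression")
        hits = [bool(flt(v)) for v in values]
        if method == "AnyMatch":
            return any(hits)
        if method == "AllMatch":
            return all(hits)
        return not any(hits)  # NoneMatch

    # All other methods treat the filter as an optional pre-filter.
    kept = [v for v in values if flt is None or flt(v)]
    if method in NUMERIC_METHODS:
        return NUMERIC_METHODS[method](kept) if kept else None
    if method == "Count":
        return len(kept)
    if method == "Last":
        return kept[-1] if kept else None
    if method == "Previous":
        return kept[-2] if len(kept) >= 2 else None
    raise ValueError(f"unknown aggregation method: {method}")


# How many times was the resource not Ready?
statuses = ["Ready", "Stopped", "Ready", "Busy"]
print(aggregate(statuses, "Count", lambda s: s != "Ready"))  # 2

# What was the previous value of the metric?
print(aggregate([10, 12, 15], "Previous"))  # 12

# Any Error entries from SomeApp?
event_logs = [
    {"EntryType": "Error", "Source": "SomeApp"},
    {"EntryType": "Information", "Source": "OtherApp"},
]
is_app_error = lambda m: m["EntryType"] == "Error" and m["Source"] == "SomeApp"
print(aggregate(event_logs, "AnyMatch", is_app_error))  # True

# Is process X present in the process list? (AnyMatch is an assumption here.)
process_list = [{"Name": "X"}, {"Name": "Y"}]
print(aggregate(process_list, "AnyMatch", lambda m: m["Name"] == "X"))  # True
```

Note that with no collected values, this sketch's AllMatch and NoneMatch both return `True`; how the product treats an empty window is not stated in this document.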