One of the biggest challenges network operators face is detecting and diagnosing transient issues: short-lived, sporadic degradations in network performance that often go unnoticed until a customer complains. Microbursts are especially problematic. A microburst is a very short spike of packets that arrives within a very small time interval, at a rate much higher than the bandwidth configured for a given queue.
Microbursts typically occur when there is a significant mismatch in speed between ingress and egress interfaces, or when multiple sources send packets to a single queue that is shaped at a lower rate. There is, however, a fine balance to strike: buffers must be large enough to absorb microbursts, but oversized buffers incur performance penalties of their own, such as additional latency.
Traditional network monitoring techniques, such as polling via SNMP or CLI, are not adequate for detecting microbursts. Even with SNMP polling at a typical maximum frequency of once every 5 minutes, microbursts are so short-lived that they are quite often missed. Microbursts can be costly for network operators if SLA penalties are tied to the affected services, or if operators over-engineer capacity in order to mitigate the problem. Operations time is also wasted when teams try to troubleshoot an issue they can’t detect.
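To make the polling problem concrete, here is a rough back-of-the-envelope sketch in Python. Every number in it (queue rate, background load, burst rate, burst duration) is an assumption chosen for illustration, not a measurement, and it simplifies by treating the polled counter as a measure of offered load.

```python
# All values below are hypothetical and chosen only for illustration.
QUEUE_RATE_BPS = 1_000_000_000     # 1 Gbps configured queue rate (assumed)
POLL_INTERVAL_S = 300              # 5-minute polling interval
BASELINE_BPS = 200_000_000         # steady background load (assumed)
BURST_BPS = 10_000_000_000         # arrival rate during the burst (assumed)
BURST_DURATION_S = 0.005           # a 5 ms microburst (assumed)


def average_rate_bps(baseline_bps, burst_bps, burst_s, interval_s):
    """Average offered load over one polling interval containing one burst."""
    total_bits = baseline_bps * (interval_s - burst_s) + burst_bps * burst_s
    return total_bits / interval_s


avg = average_rate_bps(BASELINE_BPS, BURST_BPS, BURST_DURATION_S, POLL_INTERVAL_S)
utilization_pct = 100 * avg / QUEUE_RATE_BPS

# Bytes that arrive faster than the queue can drain and so must be buffered.
excess_bytes = (BURST_BPS - QUEUE_RATE_BPS) * BURST_DURATION_S / 8

print(f"5-minute average utilization: {utilization_pct:.2f}%")
print(f"data to buffer during the burst: {excess_bytes / 1e6:.2f} MB")
```

Under these assumptions the 5-minute average barely moves from the 20% baseline, while the burst itself requires several megabytes of buffering for a few milliseconds. That gap is what an averaged counter cannot show.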
This is where the Junos Telemetry Interface (JTI) “QMON” sensor can help. Introduced in Junos 17.1R1 on MX Series routers with MPC7E, MPC8E, and MPC9E line cards, it provides millisecond granularity and high-watermark measurements of peak buffer occupancy, which makes it possible to detect microbursts accurately.
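As a minimal sketch of how a collector might use high-watermark data, the Python below flags any sample whose peak buffer occupancy crosses a threshold. The `QueueSample` record, its field names, the 80% threshold, and the sample values are all hypothetical; they do not reflect the actual QMON sensor schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueueSample:
    """Hypothetical per-queue record; not the actual QMON sensor schema."""
    timestamp_ms: int
    queue_id: int
    peak_occupancy_pct: float  # high watermark since the previous sample


def find_microbursts(samples: List[QueueSample],
                     threshold_pct: float = 80.0) -> List[QueueSample]:
    """Return samples whose peak buffer occupancy reached the threshold."""
    return [s for s in samples if s.peak_occupancy_pct >= threshold_pct]


# Made-up samples for one queue, for illustration only.
samples = [
    QueueSample(timestamp_ms=0, queue_id=3, peak_occupancy_pct=4.0),
    QueueSample(timestamp_ms=1000, queue_id=3, peak_occupancy_pct=92.5),
    QueueSample(timestamp_ms=2000, queue_id=3, peak_occupancy_pct=6.0),
]

bursts = find_microbursts(samples)
for s in bursts:
    print(f"queue {s.queue_id}: peak occupancy "
          f"{s.peak_occupancy_pct}% at t={s.timestamp_ms} ms")
```

Because each sample carries the peak reached since the previous report rather than an instantaneous or averaged value, a burst that lasted only a few milliseconds still leaves a visible mark in the stream.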
The following video walks through a simple demo setup that shows how to stream and collect QMON sensor statistics in order to detect these elusive microbursts.