Using OpenNTI As A Collector For Streaming Telemetry From Juniper Devices: Part 3

This is the third part of a series of blog posts that are meant to serve as a Quick Start Guide for getting up and running with streaming real-time telemetry data from Juniper devices and collecting it using an open-source tool called OpenNTI.

In Part 1 of this blog series we covered the two types of telemetry formats supported by Junos, namely Native and OpenConfig Streaming, and discussed how to install OpenNTI and verify that it’s up and running.  In Part 2, we looked at the Junos configuration required to support Native Streaming, the verification checks in InfluxDB to ensure data is being collected, and the basic steps for getting a Grafana dashboard up and running to visualize the telemetry data.  In this third and final part of the series, we will focus on the OpenConfig flavor of telemetry streaming.

OpenConfig Format (gRPC Streaming)

Unlike Native Streaming, the OpenConfig format for Juniper telemetry does not require that the collector (OpenNTI) be reachable via in-band connectivity.  Instead, OpenConfig gRPC sensor subscriptions and telemetry data are transmitted to and from the Routing Engine (RE) over out-of-band connectivity, i.e. the router’s management interface (e.g. fxp0).  This is depicted in Figure 1 below.

Figure 1:  Streaming OpenConfig Telemetry via Out-of-Band

Prerequisites:  OpenConfig & Network Agent Packages

The first step in enabling gRPC telemetry streaming from the Junos device is to ensure that a couple of prerequisite packages have been installed:

  1. OpenConfig package:  OpenConfig is an industry effort to express network configuration and operational data (including telemetry) using vendor-neutral data models written in YANG.  The OpenConfig package includes a full set of these YANG data models, as well as scripts to translate OpenConfig schemas to Junos OS schemas for each supported release.  The package, named “OpenConfig Package (JUNOS)”, can be found on the Juniper software download site.
  2. Network Agent package:  The Network Agent is a component independent of Junos that implements the gRPC server with which external clients (e.g. the OpenNTI Telegraf collector) communicate and subscribe to various sensors.  In addition, the Agent implements a backend configuration channel that allows the gRPC server to communicate with Junos via NETCONF, as well as an internal message bus onto which Junos publishes all telemetry data.  The package, named “JUNOS Network Agent”, can also be found on the Juniper software download site, listed under the specific platform (for example, the MX960).

To check if the device already has the OpenConfig package installed, issue the following command, where the output should display the package name and version.

root@techmocha1> show version | grep openconfig
JUNOS Openconfig [0.0.0.3]

root@techmocha1>

Similarly, to check if the device already has the Network Agent package installed, issue the command below.

root@techmocha1> show version | grep "na telemetry"
JUNOS na telemetry [20161109.201405_builder_junos_161_r3]

root@techmocha1>

If one or both of the prerequisite packages are not installed, the first step is to download them from the Juniper software download site as described above and copy them onto the device filesystem in the /var/tmp directory, as shown below:

root@techmocha1:~ # cd /var/tmp
root@techmocha1:/var/tmp # ls -l *.tgz
-rw-r--r-- 1 root wheel 495918 Aug 20 12:05 junos-openconfig-x86-32-0.0.0.3.tgz
-rw-r--r-- 1 root wheel 2018717 Aug 20 11:48 network-agent-x86-32-16.1R3.10-C1.tgz
root@techmocha1:/var/tmp #

Once the packages have been copied over, simply issue the “request system software add” command for each package.  Below is the screen capture from installing the Network Agent package:

root@techmocha1> request system software add network-agent-x86-32-16.1R3.10-C1.tgz 
NOTICE: Validating configuration against network-agent-x86-32-16.1R3.10-C1.tgz.
NOTICE: Use the 'no-validate' option to skip this if desired.
Verified network-agent-x86-32-16.1R3.10-C1 signed by PackageProductionEc_2016
Adding na-telemetry-x86-32-20161109.201405_builder_junos_161_r3 ...
Initializing...
Mounting os-libs-10-x86-64-20160927.337663_builder_stable_10
[... CONTENT OMITTED FOR BREVITY ...]
Mounting fips-mode-x86-32-20161109.020516_builder_junos_161_r3
Hardware Database regeneration succeeded
Validating against /config/juniper.conf.gz
mgd: commit complete
Validation succeeded
Mounting na-telemetry-x86-32-20161109.201405_builder_junos_161_r3
Rebuilding schema and Activating configuration...
mgd: commit complete
Restarting MGD ...

WARNING: cli has been replaced by an updated version:
CLI release 16.1R3.10 built by builder on 2016-11-09 04:29:34 UTC
Restart cli using the new version ? [yes,no] (yes) yes 

Restarting cli ...
root@techmocha1>

Below is the screen capture from installing the OpenConfig package:

root@techmocha1> request system software add junos-openconfig-x86-32-0.0.0.3.tgz
NOTICE: Validating configuration against junos-openconfig-x86-32-0.0.0.3.tgz.
NOTICE: Use the 'no-validate' option to skip this if desired.
Verified junos-openconfig-x86-32-0.0.0.3 signed by PackageDevelopmentEc_2017
Adding junos-openconfig-x86-32-0.0.0.3 ...
Initializing...
[... CONTENT OMITTED FOR BREVITY ...]
Mounting junos-openconfig-x86-32-0.0.0.3
Scripts syntax validation : START
openconfig-bgp.slax: script check succeeds
openconfig-interface.slax: script check succeeds
openconfig-lldp.slax: script check succeeds
openconfig-local-routing.slax: script check succeeds
openconfig-mpls.slax: script check succeeds
openconfig-policy.slax: script check succeeds
Scripts syntax validation : SUCCESS
[... CONTENT OMITTED FOR BREVITY ...]
Mounting fips-mode-x86-32-20161109.020516_builder_junos_161_r3
Hardware Database regeneration succeeded
Validating against /config/juniper.conf.gz
mgd: commit complete
Validation succeeded
Mounting junos-openconfig-x86-32-0.0.0.3
WARNING: invalid option: -norender
TLV generation: START
TLV generation: SUCCESS
Rebuilding schema and Activating configuration...
mgd: commit complete
Restarting MGD ...

WARNING: cli has been replaced by an updated version:
CLI release 16.1R3.10 built by builder on 2016-11-09 04:29:34 UTC
Restart cli using the new version ? [yes,no] (yes) yes 

Restarting cli ...
root@techmocha1>

Junos Configuration

Configuring OpenConfig-format telemetry on a Juniper router is even simpler than Native format, in that it requires only the same three lines of Junos configuration (to enable the gRPC server) every time:

set system services extension-service request-response grpc clear-text port 50051
set system services extension-service request-response grpc skip-authentication
set system services extension-service notification allow-clients address 0.0.0.0/0
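
As a quick sanity check after committing these statements, the same configuration should appear in the hierarchical view roughly as sketched below (a sketch only; exact ordering may vary by release):

root@techmocha1> show configuration system services extension-service
request-response {
    grpc {
        clear-text {
            port 50051;
        }
        skip-authentication;
    }
}
notification {
    allow-clients {
        address 0.0.0.0/0;
    }
}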

NOTE:  Unlike Native format, where all aspects of telemetry streaming (collector server, export profile, sensors) are configured via the Junos CLI, OpenConfig format only requires the gRPC server parameters to be configured.  The actual sensor subscriptions are performed via gRPC subscription requests made by external client applications (e.g. the OpenNTI Telegraf Collector).

Telegraf Collector Configuration

Recall from Part 1 of this blog series, in the section titled “Verifying Installation“, that OpenNTI is actually a collection of Docker containers.  One of these containers, called “opennti_input_oc“, is dedicated to the Telegraf collector which supports gRPC streaming.  In order to subscribe to sensors on a Junos device via gRPC and enable telemetry collection, there is a Telegraf configuration file within this opennti_input_oc container which must first be modified with the appropriate device and sensor related parameters.

The first step is to verify that the opennti_input_oc container is actually up and running, as shown below:

jag@techmocha1:~$ sudo docker ps --filter "name=opennti_input_oc"

CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                      NAMES
40e9ac626ffc        opennti_input-oc    "/source/start-input-"   3 days ago          Up 26 minutes       0.0.0.0:50051->50051/udp   opennti_input_oc
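
If the container is not listed or shows an exited status, it can usually be brought back up with a simple docker start (the container name below matches the default OpenNTI install):

jag@techmocha1:~$ sudo docker start opennti_input_oc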

Next, we locate the Telegraf configuration file called “telegraf.tmpl“, which can be found in the “open-nti/plugins/input-oc” directory, as shown below:

jag@techmocha1:~$ cd ~/open-nti/plugins/input-oc
jag@techmocha1:~$ pwd
/root/open-nti/plugins/input-oc

jag@techmocha1:~$ ls -l
-rw-r--r-- 1 root root  574 Feb  6 08:37 Dockerfile
-rw-r--r-- 1 root root  334 Feb  6 08:37 start-input-oc.sh
-rw-r--r-- 1 root root 6138 Feb  6 08:37 telegraf.tmpl

Telegraf is a plugin-driven application, where input plugins are used to ingest data from external sources and output plugins are used to write/persist the data to various destinations. Within the Telegraf configuration file, there are a couple of sections to take note of:

  1. A section called “OUTPUT PLUGINS”, where we define the specifics of the InfluxDB time-series database to which we will persist our telemetry data. This section can be left untouched with the default values for the various parameters (an illustrative sketch of this stanza appears after this list).
  2. A section called “INPUT PLUGINS“, where we define the specifics about the “jti_openconfig_telemetry” plugin used to initiate telemetry subscription requests and ingest the incoming data via gRPC. It is in this section where we need to make some changes.
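
For orientation, the OUTPUT PLUGINS stanza is based on Telegraf’s standard InfluxDB output plugin.  In the actual telegraf.tmpl the values are filled in from environment variables when the container starts, so the snippet below is only an illustrative sketch with assumed values:

[[outputs.influxdb]]
  ## InfluxDB endpoint to write telemetry to (assumed value; the template substitutes this at startup)
  urls = ["http://localhost:8086"]
  ## Database used for Juniper telemetry data (queried later in this post)
  database = "juniper"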

Open the telegraf.tmpl configuration file, using a text editor of your preference, and scroll down to the “INPUT PLUGINS” section, as shown below.  


###############################################################################
#                            INPUT PLUGINS                                    #
###############################################################################

# Read OpenConfig Telemetry from listed sensors

[[inputs.jti_openconfig_telemetry]]

  ## List of device addresses to collect telemetry from.
  servers = ["10.102.183.59:50051"]

  ## Frequency to get data, eg. "5s" or "5000ms".
  sample_frequency = "5000ms"

  ## Sensors to subscribe for
  ## A identifier for each sensor can be provided in path by separating with space
  ## Else sensor path will be used as identifier
  ## When identifier is used, we can provide a list of space separated sensors.
  ## A single subscription will be created with all these sensors and data will
  ## be saved to measurement with this identifier name
  ## We allow specifying sensor group level reporting rate. To do this, specify the
  ## reporting rate in Duration at the beginning of sensor paths / collection
  ## name. For entries without reporting rate, we use configured sample frequency
  sensors = [
   "/junos/system/linecard/interface/",
   "10000ms /junos/system/linecard/cpu/memory/"
  ]

  ## Each data format has it's own unique set of configuration options, read
  ## more about them here:
  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
  ##data_format = "influx"

The four parameters that need to be configured are:

  1.  servers:  Here, we specify the management IP address and gRPC port of each target Junos device we will be monitoring (50051 in our example, matching the port configured on the device). Although a single device is shown in the example, multiple devices can be specified simply by listing them in a comma-separated list (e.g. ["1.1.1.1:50051", "2.2.2.2:50051"]).
  2.  sample_frequency:  This parameter specifies the default frequency at which telemetry data is sampled, in either seconds (e.g. "5s") or milliseconds (e.g. "5000ms").
  3.  sensors:  This is where we list, as a string array, all the sensors we wish to subscribe to.  In the example above, we are subscribing to two sensors: "/junos/system/linecard/interface/" and "/junos/system/linecard/cpu/memory/".  If desired, an individual sampling frequency can be set for each sensor by prefixing the sensor path with a reporting rate (in seconds or milliseconds) followed by a space.  In the example above, the "/junos/system/linecard/cpu/memory/" sensor is sampled every 10s, while the "/junos/system/linecard/interface/" sensor falls back to the default 5s specified by the "sample_frequency" parameter.
  4.  data_format:  This is a deprecated parameter used in older versions of the plugin. As shown in the example above, make sure to comment it out (using "##") to prevent errors when the OpenNTI "input-oc" container is started.

It is important to note that if you want to subscribe to different telemetry sensors across different baskets of devices, then each of these baskets needs to be configured in its own dedicated “[[inputs.jti_openconfig_telemetry]]” section within the Telegraf configuration file.
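
As an illustrative sketch (the device addresses below are documentation placeholders), two such sections might look like this:

[[inputs.jti_openconfig_telemetry]]
  ## First group: interface counters from two core routers
  servers = ["192.0.2.1:50051", "192.0.2.2:50051"]
  sample_frequency = "5000ms"
  sensors = [
   "/junos/system/linecard/interface/"
  ]

[[inputs.jti_openconfig_telemetry]]
  ## Second group: CPU/memory statistics from an edge router, at a slower cadence
  servers = ["198.51.100.1:50051"]
  sample_frequency = "10000ms"
  sensors = [
   "/junos/system/linecard/cpu/memory/"
  ]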

The final step in the configuration process is to restart the opennti_input_oc Docker container in order for the changes to take effect, either by issuing a docker restart command (as shown below) or via the equivalent make restart-oc target from the root OpenNTI directory:

jag@techmocha1:~$ sudo docker restart opennti_input_oc
opennti_input_oc
jag@techmocha1:~$
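
Before moving on, it is also worth tailing the container logs to confirm that Telegraf parsed the modified configuration without errors (the exact log contents will vary; this is just a quick sanity check):

jag@techmocha1:~$ sudo docker logs --tail 50 opennti_input_oc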

Verifying Telemetry Data Is Being Streamed

Once the above Telegraf configuration file has been modified and the opennti_input_oc Docker container has been restarted, there are a few quick checks we can perform to verify that the telemetry data is being streamed to the collector.

The first check is to issue the “show agent sensors” command that was introduced in Part 2.  As shown below, we can verify that we have indeed subscribed to the “/junos/system/linecard/interface/” and “/junos/system/linecard/cpu/memory/” sensors.

root@techmocha1> show agent sensors 

Sensor Information : 
    Name                                    : sensor_1000           
    Resource                                : /junos/system/linecard/interface/ 
    Version                                 : 1.1                  
    Sensor-id                               : 2657203               
    Subscription-ID                         : 1000                 
    Parent-Sensor-Name                      : Not applicable       
    Component(s)                            : PFE                   

    Profile Information : 
        Name                                : export_1000           
        Reporting-interval                  : 5                     
        Payload-size                        : 5000                  
        Format                              : GPB                   

Sensor Information : 
    Name                                    : sensor_1001           
    Resource                                : /junos/system/linecard/cpu/memory/ 
    Version                                 : 1.0                  
    Sensor-id                               : 2657202               
    Subscription-ID                         : 1001                 
    Parent-Sensor-Name                      : Not applicable       
    Component(s)                            : PFE                   

    Profile Information : 
        Name                                : export_1001           
        Reporting-interval                  : 5                     
        Payload-size                        : 5000                  
        Format                              : GPB              

A second check we can perform on the router is to inspect the gRPC subscriptions in the “ephemeral configuration”, i.e. the configuration that has been committed to the default instance of the ephemeral configuration database, as shown below:

root@techmocha1> show ephemeral-configuration 

## Last changed: 2017-08-20 18:38:43 PDT
services {
    analytics {
        export-profile export_1000 {
            transport grpc;
        }
        export-profile export_1001 {
            transport grpc;
        }
        sensor sensor_1000 {
            export-name export_1000;
            resource /junos/system/linecard/interface/;
            reporting-rate 5;
            subscription-id 1000;
        }
        sensor sensor_1001 {
            export-name export_1001;
            resource /junos/system/linecard/cpu/memory/;
            reporting-rate 5;
            subscription-id 1001;
        }
    }
}

Querying The Data In InfluxDB

Recall from Part 2 of this blog series that, when running a “show measurements” query against the “juniper” database, the root measurement for all subsequent queries is “jnpr.jvision”.  It is important to note that this is only the case for Native format; for OpenConfig format, sensor subscriptions are listed sequentially without a root container, as shown in Figure 2 below.
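
As in Part 2, this can be verified from the influx CLI inside the main OpenNTI container (the container name opennti_con below is an assumption based on a default install; adjust it to match your environment):

jag@techmocha1:~$ sudo docker exec -it opennti_con influx
> USE juniper
> SHOW MEASUREMENTS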

Figure 2:  No Root Container For OpenConfig Sensors

We can drill down into specific telemetry measurements that are part of the sensor(s) we subscribe to.  For example, let’s say we want to list all the “in-unicast-pkts” measurements in the last 5 seconds for a particular device and interface; we would use the following query, the results of which are shown in Figure 3 below.

SELECT "/oc-path/interfaces/interface/counters/in-unicast-pkts","/oc-path/interfaces/interface/@name" FROM "/junos/system/linecard/interface/" WHERE device='10.164.1.82' AND time > now() - 5s

Figure 3: Selecting All In-Unicast-Pkts Measurements in the Last 5 Seconds for a Device & Interface
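
Because in-unicast-pkts is a monotonically increasing counter, a rate is usually more useful for graphing than the raw value.  A query along the following lines (the field name is assumed to match the example above) converts the counter into a per-second packet rate averaged over 10-second buckets:

SELECT non_negative_derivative(mean("/oc-path/interfaces/interface/counters/in-unicast-pkts"), 1s) FROM "/junos/system/linecard/interface/" WHERE device='10.164.1.82' AND time > now() - 1h GROUP BY time(10s)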

This completes this blog series on using OpenNTI as a collector for streaming telemetry from Juniper devices.
