
Introduction to Prometheus Toolkit in DevOps

Prometheus is an open-source systems monitoring and alerting toolkit originally developed at SoundCloud. It is now a standalone open-source project. Prometheus joined the Cloud Native Computing Foundation in 2016 as the second hosted project, after Kubernetes.

Features of Prometheus Toolkit

  1. A multi-dimensional data model with time series data identified by metric name and key/value pairs (see the example after this list)
  2. PromQL, a flexible query language to leverage this dimensionality
  3. No reliance on distributed storage; individual server nodes are autonomous
  4. Time series collection happens via a pull model over HTTP
  5. Pushing time series is supported via an intermediary gateway
  6. Targets are discovered via service discovery or static configuration
  7. Multiple modes of graphing and dashboarding support
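
For instance, a single time series in this data model is identified by a metric name plus a set of label key/value pairs, as in this illustrative (not built-in) metric:

api_http_requests_total{method="POST", handler="/messages"}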

Components of Prometheus Toolkit

The Prometheus ecosystem consists of several components, many of which are optional:

  1. The main Prometheus server, which scrapes and stores time-series data
  2. Client libraries for instrumenting application code
  3. A push gateway for supporting short-lived jobs
  4. Special-purpose exporters for services such as HAProxy, StatsD, Graphite, and so on
  5. An Alertmanager for handling alerts
  6. Various support tools

The architecture of Prometheus Toolkit

Prometheus Toolkit - architecture

Prometheus scrapes metrics from instrumented jobs, either directly or via an intermediary push gateway for short-lived jobs. It stores all scraped samples locally and runs rules over this data to either aggregate and record new time series from existing data or to generate alerts. Grafana or other API consumers can be used to visualize the collected data.

When does Prometheus Toolkit fit?

  • Prometheus is suitable for recording purely numerical time series.
  • It fits both machine-centric monitoring and the monitoring of highly dynamic service-oriented architectures.
  • In a world of microservices, its support for multi-dimensional data collection and querying is a particular strength.
  • Prometheus is designed for reliability, to be the system you turn to during an outage so you can quickly diagnose problems.
  • Each Prometheus server is standalone, not depending on network storage or other remote services.
  • You can rely on it when other parts of your infrastructure are broken, and you do not need to set up extensive infrastructure to use it.

When does Prometheus Toolkit not fit?

  • Prometheus values reliability.
  • You can always view what statistics are available about your system, even under failure conditions.
  • If you need 100% accuracy, such as for per-request billing, Prometheus is not a good choice as the collected data will likely not be detailed and complete enough. In such a case you would be best off using some other system to collect and analyze the data for billing, and Prometheus for the rest of your monitoring.

Prometheus Toolkit Installation

Prometheus is a monitoring platform that collects metrics from monitored targets by scraping metrics HTTP endpoints on these targets.
Download and extract the latest version of Prometheus for your platform:

https://prometheus.io/download/

Download Prometheus:

curl -LO https://github.com/prometheus/prometheus/releases/download/v2.9.2/prometheus-2.9.2.linux-amd64.tar.gz

Then use the sha256sum command to generate a checksum for the downloaded file.

$ sha256sum prometheus-2.9.2.linux-amd64.tar.gz
Download Prometheus Toolkit

Then extract the tar file:

$ tar xvf prometheus-2.9.2.linux-amd64.tar.gz

This creates a directory named prometheus-2.9.2.linux-amd64 containing two binaries (prometheus and promtool), the consoles and console_libraries directories with the web interface files, a license, a notice, and some example files.


Copy the two binaries to the /usr/local/bin directory.

$ sudo cp prometheus-2.9.2.linux-amd64/prometheus /usr/local/bin/
$ sudo cp prometheus-2.9.2.linux-amd64/promtool /usr/local/bin/

Configuring Prometheus Toolkit

Open the prometheus.yml file, which already contains enough configuration to run Prometheus for the first time.

Prometheus Toolkit configuration
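
For reference, the default configuration shipped with the tarball looks roughly like this (a sketch; your copy may include additional comments and an alerting block):

global:
  scrape_interval:     15s
  evaluation_interval: 15s

rule_files:
  # - "first_rules.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']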

The global settings define the default interval for scraping metrics. Note that Prometheus applies these settings to every exporter unless an individual exporter’s own settings override the global settings.

The scrape_interval value tells Prometheus to collect metrics from its exporters every 15 seconds, which is a reasonable default for most exporters.
The evaluation_interval option controls how often Prometheus will evaluate rules. Prometheus
uses rules to create new time series and to generate alerts.
The rule_files block specifies the location of any rules we want the Prometheus server to load. For now, we’ve got no rules.

The final block, scrape_configs, controls the resources that Prometheus monitors. Since Prometheus also exposes data about itself as an HTTP endpoint, it can scrape and monitor its own health. In the default configuration there is a single job, called prometheus, which scrapes the time-series data exposed by the Prometheus server. The job contains a single, statically configured target: localhost on port 9090. Prometheus expects metrics to be available on targets on the path /metrics, so this default job scrapes the URL http://localhost:9090/metrics.

The time-series data returned shows the health and performance of the Prometheus server.
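
Before starting the server, you can optionally validate the configuration with promtool, one of the two binaries copied earlier (the exact output wording may differ between versions):

$ promtool check config prometheus.yml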

Starting Prometheus Toolkit

~/prometheus-2.9.2.linux-amd64# ./prometheus --config.file=prometheus.yml
Starting Prometheus Toolkit

Prometheus should start up. You should also be able to browse to its status page at http://localhost:9090. Give it about 30 seconds to collect data about itself from its own HTTP metrics endpoint.

You can also verify that Prometheus is serving metrics about itself by navigating to its own metrics endpoint: http://localhost:9090/metrics.

Using the expression browser

As you can gather from http://localhost:9090/metrics, one metric that Prometheus exports
about itself is called promhttp_metric_handler_requests_total (the total number of /metrics
requests the Prometheus server has served). Go ahead and enter this into the expression
console

promhttp_metric_handler_requests_total

If you are only interested in requests that returned HTTP code 200, you can use this query to retrieve that information:

promhttp_metric_handler_requests_total{code="200"}
To count the number of returned time series, you could write:
count(promhttp_metric_handler_requests_total)

Using the graphing interface

To graph the formula, go to http://localhost:9090/graph and use the “Graph” tab.

For example, enter the following expression to graph the per-second HTTP request rate for self-scraped requests that returned status code 200:

rate(promhttp_metric_handler_requests_total{code="200"}[1m])
Prometheus Toolkit using GUI

MONITORING LINUX HOST METRICS WITH THE NODE EXPORTER

The Prometheus Node Exporter exposes a variety of hardware- and kernel-related metrics.

Note: the Prometheus Node Exporter targets *nix systems; the WMI exporter serves a similar purpose on Windows.

Installing and running the Node Exporter

Prometheus Node Exporter is a single static binary that can be installed via a tarball.

Download it from this link: https://prometheus.io/download/#node_exporter

$ wget https://github.com/prometheus/node_exporter/releases/download/v0.17.0/node_exporter-0.17.0.linux-amd64.tar.gz

Then extract and run it:

$ tar xvzf node_exporter-0.17.0.linux-amd64.tar.gz
$ cd node_exporter-0.17.0.linux-amd64/
$ ./node_exporter &

The Node Exporter is now running, and you should see output similar to the following, indicating that it is exposing metrics on port 9100.

Node Exporter

Node Exporter metrics

After installing and running Node Exporter, you can verify that the metrics have been exported by cURLing the /metrics endpoint.

$ curl http://localhost:9100/metrics

You will see output like the one below.

Node Exporter Metrics in Prometheus Toolkit

Finished! The Node Exporter has been installed successfully and now exposes metrics that Prometheus can scrape, including a wide range of system metrics (all prefixed with node_). Here’s how to view these metrics along with their help and type information:

$ curl http://localhost:9100/metrics | grep "node_"

Configuring your Prometheus instances

Your locally running Prometheus instance must be configured to access the Node Exporter metrics. The following scrape_configs block (in the prometheus.yml configuration file) tells the Prometheus instance to scrape the Node Exporter at localhost:9100, as shown in the sketch below.

Configuring Prometheus Toolkit instance
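
As a sketch (the job name and target address are examples; adjust them to your environment), the entry looks like this:

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']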

Go to the Prometheus installation directory, modify prometheus.yml, and run Prometheus again:

$ ./prometheus --config.file=./prometheus.yml &

Exploring Node Exporter metrics through the Prometheus expression browser

Prometheus is now scraping metrics from the running Node Exporter instance, so you can examine those metrics using the Prometheus UI (also known as the expression browser). Navigate to http://localhost:9090/graph in your browser and use the main expression bar at the top of the page to enter expressions. The expression bar looks like this:

Exploring Node Exporter

Metrics specific to the Node Exporter are prefixed with node_ and include metrics like
node_cpu_seconds_total and node_exporter_build_info.

  • rate(node_cpu_seconds_total{mode="system"}[1m]) -> The average amount of CPU time spent in system mode, per second, over the last minute (in seconds)
  • node_filesystem_avail_bytes -> The filesystem space available to non-root users (in bytes)
  • rate(node_network_receive_bytes_total[1m]) -> The average network traffic received, per second, over the last minute (in bytes)
  • node_load15 -> The 15-minute load average
Prometheus Toolkit Metrics

If you want to monitor multiple hosts, you can extend prometheus.yml as shown below.

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'node'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['192.168.x.x:9100']
      - targets: ['192.168.x.y:9100']
      - targets: ['192.168.x.z:9100']

GRAFANA

Grafana supports querying Prometheus. The Prometheus data source for Grafana has been included since Grafana 2.5.0 (2015-10-28).

Installing Grafana

Download and unzip Grafana from the binary tar.

$ wget https://dl.grafana.com/oss/release/grafana-6.1.6.linux-amd64.tar.gz
$ tar -zxvf grafana-6.1.6.linux-amd64.tar.gz

Start Grafana.

cd grafana-6.1.6/
./bin/grafana-server web

By default, Grafana will be listening on http://localhost:3000. The default login is “admin” /
“admin”.

Creating a Prometheus data source

To create a Prometheus data source:

  1. Click on the Grafana logo to open the sidebar menu.
  2. Then, Click on “Data Sources” in the sidebar.
  3. Click on “Add New”.
  4. Select “Prometheus” as the type.
  5. Set the appropriate Prometheus server URL (for example, http://localhost:9090/)
  6. Adjust other data source settings as desired (for example, turning the proxy access off).
  7. Click “Add” to save the new data source.

The following shows an example data source configuration:

Creating Prometheus Toolkit Data source

After selecting Prometheus, you are redirected to its configuration page. Add the required details and click Save & Test.

Prometheus Toolkit Configuration

Creating a Prometheus graph

Follow the standard way to add a new Grafana graph:

  1. In the side menu, click the + icon and choose Dashboard to create a new dashboard.
  2. Click Add Query.
  3. Enter a PromQL expression in the query panel.
  4. Under the "Metrics" tab, select your Prometheus data source (bottom right).
  5. To format the legend names of time series, use the "Legend format" input. For example,
    to show only the method and status labels of a returned query result, separated by a
    dash, you could use the legend format string {{method}} - {{status}}.
  6. Tune other graph settings until you have a working graph.
Prometheus Toolkit Configuration
Prometheus Toolkit adding query
Prometheus Toolkit - dashboard

Then save the dashboard.

Prometheus Toolkit saving dashboard


If you want to find a Node Exporter dashboard for Grafana, search Google for “grafana node exporter metrics”. You’ll see some ready-made dashboards like the ones below.

https://grafana.com/dashboards/405
https://grafana.com/dashboards/1860
https://grafana.com/dashboards/5174
https://grafana.com/dashboards/9096

If you want to add a Node Exporter dashboard to Grafana, go to the dashboard area, click the + icon, and click Import.

Import query

Then add the dashboard ID or URL and click Load.

Upload File

On the new page, select the Prometheus data source and click Import.

Select Prometheus Toolkit

You should then see a dashboard like the one below.

Dashboard

To test the dashboard, create a large file and watch the graph change.
Run this command on your system; it creates a 10 GB file:
fallocate -l 10G gentoo_root.img

Set the graph’s time range to the last 5 minutes, go to the Disk Capacity panel in the basic net/disk information section, and you will see the change reflected in the Disk Capacity graph.

Capacity Graph
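
When you are done testing, you can remove the test file to reclaim the space:

$ rm gentoo_root.img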

Alerting

When an alert changes state, it sends out notifications. Each alert rule can have multiple
notifications. But in order to add a notification to an alert rule, you first need to add and
configure a notification channel (this can be email, PagerDuty, or another integration). This is done
from the Notification Channels page.

Supported Notification Types.

  • Email
  • Slack
  • PagerDuty
  • Webhook

Other Supported Notification Channels

Grafana also supports the following notification channels:

  • HipChat
  • VictorOps
  • Sensu
  • OpsGenie
  • Threema
  • Pushover
  • Telegram
  • LINE
  • DingDing

Notification Channel Setup

On the Notification Channels page, click the New Channel button to go to the next page,
where you can configure and set up a new notification channel.

Alerting

If you select Email as the notification type, you need to configure the SMTP server details in the Grafana conf folder.

Go to the conf directory of your Grafana distribution and open the configuration file. Since this setup uses the defaults, edit “defaults.ini”. Go to the SMTP / Emailing settings and update your SMTP details. In this example a test SMTP server runs on localhost, port 25. The SMTP section of “defaults.ini” looks like this:

#################################### SMTP / Emailing #####################
[smtp]
enabled = true
host = localhost:25
user =
# If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
password =
cert_file =
key_file =
skip_verify = false
from_address = admin@grafana.localhost
from_name = Grafana
ehlo_identity =

Add the SMTP credentials (user and password) and the host configuration. When you’re done,
you’ll need to restart your Grafana server.
If you’re using Gmail as your SMTP server, use the corresponding Gmail settings, as sketched below.

SMTP
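
As a sketch, a Gmail-based [smtp] section could look like this (the smtp.gmail.com host/port and the use of an app password are assumptions; substitute your own account details):

[smtp]
enabled = true
host = smtp.gmail.com:587
user = yourname@gmail.com
# Use an app password if your account has two-factor authentication enabled.
password = <your app password>
skip_verify = true
from_address = yourname@gmail.com
from_name = Grafana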
$ ./bin/grafana-server web

Finally, you are now ready to configure the email alert type.

Notification Settings

Enter the appropriate values in the Name and Type fields, and add your email addresses
separated by ';'. Finally, click the Send Test button and confirm that you received the email.

Enable images in notifications

Grafana can render the panel associated with the alert rule and include the image in the notification. Most notification channels require this image to be publicly accessible (Slack and PagerDuty, for example). To include the image in an alert, Grafana can upload it to an image store; Amazon S3 and WebDAV are currently supported. To set this up, configure the external image uploader in the Grafana server ini configuration file.

Currently, if no external image storage is specified, only the email channel will attach images. If you want to include images in alerts for other channels, you need to set up external image storage.
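
For reference, a sketch of the relevant ini section for S3 (section and key names follow Grafana’s ini layout; the bucket, region, and credentials are placeholders you must replace):

[external_image_storage]
provider = s3

[external_image_storage.s3]
bucket = my-grafana-alert-images
region = us-east-1
path =
access_key = <your access key>
secret_key = <your secret key>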

Now let’s test an email alert with a real example.

Go to the dashboard, select the graph panel you now need to add an alert to, and open it for editing.

Edit

Clicking Edit will redirect you to the Configuration tab as shown below.

Alert

Click the bell icon to add an alert setting. Then the Configuration tab is displayed as follows:

Configuration Tab

Now we need to set the rule Name, the Evaluate every interval, and the For duration that controls how long the condition must hold before mail is sent (for example 15s, 30s, or 1m), and add a condition with a threshold value.
Name: Give a suitable name to this alert.

Evaluate every: The time interval at which to evaluate this rule. “h” means hours, “m” means minutes, and “s” means seconds.

I use 5s (5 seconds). Production environments should set this to a higher value.

In the Conditions section, click the function name to change it to the function you want. We are using “Avg()” to validate the rule against average disk usage.

The function takes the parameters query(A, 5m, now).

The first argument (A) is the ID of the query to use, chosen from the queries configured on the
“Metrics” tab; our disk-usage average query is in section “A”.

The second and third arguments define the time range: the query looks back 5 minutes from now.

You can validate the rule by clicking Test Rule. The rule displays an error if it contains one. My final setup is as follows:
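
Rendered as text, the condition in this setup reads roughly as follows (the threshold of 50 is an assumption matching the 50% disk-usage example later in this section):

WHEN avg () OF query(A, 5m, now) IS ABOVE 50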

Write a meaningful message in the message box and add the appropriate channel to send the alert notification to.

To be warned when disk usage crosses a specific threshold, configure it as follows:

Disk Usage

Save when the configuration is complete.
In this example, use the following command to create a large file on the file system.

$ fallocate -l 25G gentoo_root.img
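
You can confirm the increased usage from the shell as well (df reports usage for the filesystem containing the current directory):

$ df -h .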

Finally, you have now configured an alert at 50% disk usage. When usage goes beyond that threshold, an alert will be triggered and shown as a notification on the graph.

Graphic Notification