#088 Prometheus
Learning the basics of Prometheus and running a demo with Docker and a Sinatra Ruby target application.
Notes
Prometheus is an open-source systems monitoring and alerting toolkit.
The following was written with Prometheus 2.13.0, compiled with Go 1.13.1.
Architecture
- the main Prometheus server scrapes and stores time series data
- metrics are pulled from instrumented jobs, either directly or via an intermediary push gateway
Configuration
Configuring Prometheus is done with a YAML file. A minimal configuration will usually include:
- global: global configuration options
- scrape_configs: defines targets/service discovery
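A minimal sketch of such a file, matching the demo setup below (file-based discovery for a job named demo_targets; the scrape interval is an assumption):
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'demo_targets'
    file_sd_configs:
      - files:
          - 'demo_targets.json'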
Service Discovery
The Service Discovery component of Prometheus implements the mechanisms for configuring scrape targets.
The basic file_sd_config is most commonly used, unless you are working with a system that has a supported Service Discovery component such as kubernetes_sd_config or ec2_sd_config.
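The file referenced by a file_sd_config is a JSON (or YAML) list of target groups; the demo's demo_targets.json (shown in full in the demo section below) follows this shape:
[
  {
    "targets": ["192.168.1.144:4567"]
  }
]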
Storage
Sizing: on average, Prometheus uses only around 1-2 bytes per sample. Rough formula:
needed_disk_space = retention_time_seconds * ingested_samples_per_second * bytes_per_sample
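For example, 15 days of retention (1,296,000 seconds) at 10,000 ingested samples per second and 2 bytes per sample works out to 1,296,000 × 10,000 × 2 = 25,920,000,000 bytes, i.e. roughly 26 GB of disk.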
The default storage is a local file-based time series database, so it is limited by the space and reliability of local storage.
The remote write and remote read features of Prometheus allow transparently sending and receiving samples. This is primarily intended for long-term storage. It is recommended that you carefully evaluate any solution in this space to confirm it can handle your data volumes.
Remote storage integrations allow Prometheus to:
- write samples that it ingests to a remote URL in a standardized format
- read (back) sample data from a remote URL in a standardized format
Remote endpoint and storage read/write options include:
- Azure Data Explorer: read and write
- Cortex: read and write
- CrateDB: read and write
- InfluxDB: read and write
- IRONdb: read and write
- M3DB: read and write
- PostgreSQL/TimescaleDB: read and write
- Splunk: read and write
- TiKV: read and write
Write-only options add interesting capabilities like global query view, downsampling, and compaction:
- Thanos: global query, unlimited retention, downsampling and compaction (open source)
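Enabling remote write is a single stanza in prometheus.yml; a sketch (the URL is a placeholder for whatever remote endpoint you run, e.g. a Thanos receiver):
remote_write:
  - url: "http://remote-storage.example.com/api/v1/receive"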
Metrics
Exposition Formats describes the text format for metrics. Usually these will be generated by an exporter, but it is possible to create them by hand.
Each metric is of the form:
# HELP help text to go with the metric
# TYPE metric_name ( counter | gauge | histogram | summary | untyped)
metric_name [
"{" label_name "=" `"` label_value `"` { "," label_name "=" `"` label_value `"` } [ "," ] "}"
] value [ timestamp ]
Minimalist:
metric_without_timestamp_and_labels 12.47
metric_with_labels_and_timestamp{method="get",code="200"} 1027 1395066363000
Types:
- counter: cumulative metric; value can only increase or be reset to zero on restart
- gauge: single numerical value that can arbitrarily go up and down
- histogram: samples observations (e.g. request durations or response sizes) and counts them in configurable buckets. It also provides a sum and count of all observed values
- summary: Similar to a histogram, it also calculates configurable quantiles over a sliding time window
Counters
Prometheus will not allow counters to decrement. The likely reason for this constraint: functions like rate() and increase() interpret any decrease in a counter as a reset (e.g. a process restart), so arbitrary decrements would be indistinguishable from restarts and would corrupt rate calculations.
Visualization
Options:
- built-in expression browser with basic graphing capability - more for ad-hoc queries
- create console templates of pre-defined metrics and graphs
- run a Grafana front-end
- send data to an external service that supports visualization, like wavefront.com
- use the API to query data for use in a custom solution
PromQL Queries
The built-in expression browser uses PromQL for querying Prometheus.
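For example, this expression-browser URL encodes the query http_requests_total{job="demo_targets"} over a 1h range: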
http://0.0.0.0:9090/graph?g0.range_input=1h&g0.expr=http_requests_total%7Bjob%3D%22demo_targets%22%7D&g0.tab=0
HTTP API
See API docs
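For example, instant queries can be run directly against the /api/v1/query endpoint:
$ curl 'http://0.0.0.0:9090/api/v1/query?query=up'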
Demo
The demonstration setup has:
- Prometheus running in a Docker container, with file-based service discovery
- a “fake” target application built with Sinatra, also running in Docker
Prometheus Demoserver Configuration
- demoserver/prometheus.yml: configuration file
- demoserver/demo_targets.json: defines service targets, mapped from the local file system into the Docker container
Configuring Targets
The Sinatra Fake Target (see below) will run on the host IP, and this
needs to be manually updated in demo_targets.json
(I haven’t bothered automating this).
First, find your IP e.g.
$ ipconfig getifaddr en0
192.168.1.144
Then update demo_targets.json
(the Sinatra app runs on port 4567 by default):
...
"targets": [
"192.168.1.144:4567"
],
...
Prometheus Demoserver Control
The prometheus_control.sh
script is a simple wrapper for the main operations:
Startup:
$ ./prometheus_control.sh start
Stopping/removing any previous docker container..
demoserver
demoserver
Starting prometheus container with web access on port 9090..
dd324bb7306a45a0b7d59b6ce4da7416984140188efdff9ec42acdd272029ec9
Shutdown:
$ ./prometheus_control.sh stop
Stopping/removing any previous docker container..
demoserver
demoserver
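For reference, a minimal sketch of the docker commands the script likely wraps (the image and mount path are assumptions; the official prom/prometheus image reads its config from /etc/prometheus):
# stop/remove any previous container..
docker stop demoserver
docker rm demoserver
# ..then start, publishing the web UI on 9090 and mounting the config folder
docker run -d --name demoserver \
  -p 9090:9090 \
  -v "$(pwd)/demoserver:/etc/prometheus" \
  prom/prometheus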
Sinatra Fake Target
The sinatra_target
folder has a simple Sinatra (Ruby) application that acts as a fake target for Prometheus.
It has a /metrics endpoint that emits some counters and gauges based on GC stats.
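The gist of such an endpoint is small; a minimal sketch in Sinatra (the real app.rb also counts HTTP requests and exposes more GC stats; metric names follow the curl output shown below):
require 'sinatra'

get '/metrics' do
  stats = GC.stat
  ts = (Time.now.to_f * 1000).to_i # exposition timestamps are in milliseconds
  content_type 'text/plain'
  <<~METRICS
    # HELP gc_objects_total Total GC object operations.
    # TYPE gc_objects_total counter
    gc_objects_total{app="sinatra_target",status="allocated"} #{stats[:total_allocated_objects]} #{ts}
    gc_objects_total{app="sinatra_target",status="freed"} #{stats[:total_freed_objects]} #{ts}
  METRICS
end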
It's OK to run the app locally if you have a working Ruby/rvm environment:
$ cd sinatra_target
$ gem install bundler
$ bundle install
$ ruby app.rb
== Sinatra (v2.0.1) has taken the stage on 4567 for development with backup from Thin
Thin web server (v1.7.2 codename Bachmanity)
Maximum connections set to 1024
Listening on 0.0.0.0:4567, CTRL+C to stop
However for the demo, I’m running it in Docker.
Build the image (needs to be repeated if the application code is changed):
$ docker-compose build
(...etc...)
Successfully built 6ed5b0733dc3
Successfully tagged sinatra_target_web:latest
Startup:
$ docker-compose up
Creating network "sinatra_target_default" with the default driver
Creating sinatra_target_web_1 ... done
Attaching to sinatra_target_web_1
web_1 | == Sinatra (v2.0.1) has taken the stage on 4567 for development with backup from Thin
..
Or to run in the background:
$ docker-compose up -d
Starting sinatra_target_web_1 ... done
$ docker-compose logs
Attaching to sinatra_target_web_1
web_1 | == Sinatra (v2.0.1) has taken the stage on 4567 for development with backup from Thin
$ docker-compose down
Stopping sinatra_target_web_1 ... done
Removing sinatra_target_web_1 ... done
Removing network sinatra_target_default
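For reference, a sketch of the docker-compose.yml these commands assume (inferred from the service name web and port 4567 in the logs above; the real file may differ):
version: '3'
services:
  web:
    build: .
    ports:
      - "4567:4567"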
Testing the Sinatra App
It should be responding with metrics on port 4567:
$ curl http://0.0.0.0:4567/metrics
# HELP http_requests_total The total number of HTTP requests.
# TYPE http_requests_total counter
http_requests_total{app="sinatra_target",method="get",code="200"} 2 1573708711850
# HELP gc_heap_slots_count Current GC heap slots.
# TYPE gc_heap_slots_count gauge
gc_heap_slots_count{app="sinatra_target",status="available"} 96191 1573708711850
gc_heap_slots_count{app="sinatra_target",status="live"} 51556 1573708711850
gc_heap_slots_count{app="sinatra_target",status="free"} 44635 1573708711850
gc_heap_slots_count{app="sinatra_target",status="final"} 0 1573708711850
gc_heap_slots_count{app="sinatra_target",status="marked"} 41649 1573708711850
# HELP gc_objects_total Total GC object operations.
# TYPE gc_objects_total counter
gc_objects_total{app="sinatra_target",status="allocated"} 309466 1573708711850
gc_objects_total{app="sinatra_target",status="freed"} 257910 1573708711850
# HELP gc_pages_total Total GC page operations.
# TYPE gc_pages_total counter
gc_pages_total{app="sinatra_target",status="allocated"} 236 1573708711850
gc_pages_total{app="sinatra_target",status="freed"} 0 1573708711850
Browse to http://0.0.0.0:4567 for a simple web interface to the app.
I used the sinatra_target/fake_load.sh script to send some load at the server and make the GC do some work.
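The essence of such a script is just hitting the app in a loop; a rough sketch (the real fake_load.sh may differ):
while true; do curl -s http://0.0.0.0:4567/ > /dev/null; sleep 0.1; done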
Examining Results in Prometheus
The Prometheus web interface runs on http://0.0.0.0:9090/ by default.
Two targets show up: the Prometheus server itself, and the fake Sinatra app.
GC Stats - Heap Slots
Heap slots are collected from the Sinatra application as a gauge with a status
dimension to distinguish (available | live | free | final | marked).
The gauge metric type is used, since these values represent the current heap state and can go up and down.
Here are some calculations we can get Prometheus to perform on gauge data:
The delta function calculates the difference between the first and last value in each time period, for example over 5m time blocks:
delta(gc_heap_slots_count[5m])
The idelta function calculates the difference between the last two samples, for example over 5m time blocks:
idelta(gc_heap_slots_count[5m])
The deriv function is the per-second derivative of the time series, for example over 5m time blocks:
deriv(gc_heap_slots_count[5m])
The holt_winters function produces a smoothed value for time series, for example over 5m time blocks with smoothing factor 0.2 and trend factor 0.8:
holt_winters(gc_heap_slots_count[5m], 0.2, 0.8)
GC Stats - Objects Total
GC objects are collected from the Sinatra application as a counter with a status
dimension to distinguish (allocated | freed).
The counter metric is used, since these values represent the total over time and only go up (at least, for the life of the current process).
Here are some calculations we can get Prometheus to perform on counter data:
The increase function calculates the increase in the time series in the range vector, for example over 5m time blocks:
increase(gc_objects_total[5m])
The rate function calculates the per-second average rate of increase of the time series, for example over 5m time blocks:
rate(gc_objects_total[5m])
The irate function calculates the per-second instant rate of increase of the time series, for example over 5m time blocks:
irate(gc_objects_total[5m])