Prometheus is an open-source monitoring and alerting toolkit originally built by SoundCloud in 2012. It has since become a popular choice for monitoring applications and infrastructure in the cloud native ecosystem.
Key Features:
- Multi-dimensional data model: Prometheus stores metrics as time series identified by a metric name and key/value label pairs, so the same metric can be filtered and aggregated along any label.
- Flexible query language: PromQL (Prometheus Query Language) lets users select, filter, and aggregate time series in real time.
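- Scalable and efficient: A single server handles large volumes of time series from local storage; larger setups scale by sharding scrape targets across servers and combining them through federation.
- Active community: Prometheus is a graduated Cloud Native Computing Foundation (CNCF) project with active development and community support.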
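To make the PromQL bullet concrete, here is a minimal sketch of running an instant query through the server's HTTP API from the shell. It assumes a server on the default address localhost:9090; the PROM_URL and CURL variables and the prom_query function name are illustrative and not part of Prometheus.

```shell
# Base URL of the Prometheus server (assumption: default port on localhost).
PROM_URL="${PROM_URL:-http://localhost:9090}"
# HTTP client to call; overridable so the function can be dry-run.
CURL="${CURL:-curl}"

# prom_query EXPR: run an instant PromQL query through /api/v1/query.
# -G sends the data as a GET query string; --data-urlencode escapes EXPR.
prom_query() {
    "$CURL" -sG "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}

# Examples (these need a running server):
#   prom_query 'up'
#   prom_query 'sum by (job) (up)'
```

The response is JSON; the `up` series is 1 for every target whose last scrape succeeded and 0 otherwise.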
Common Use Cases:
- Monitoring application performance and latency
- Tracking system resource utilization (e.g., CPU, memory, disk usage)
- Alerting and notification based on custom thresholds and conditions
Architecture Components:
- Prometheus Server: scrapes and stores metrics
- Exporters: expose metrics from applications and systems
- Alertmanager: handles alerts and notifications
1) Install Prometheus
cd /opt/
sudo wget https://github.com/prometheus/prometheus/releases/download/v2.2.1/prometheus-2.2.1.linux-amd64.tar.gz
sudo tar xvfz prometheus-2.2.1.linux-amd64.tar.gz
cd prometheus-2.2.1.linux-amd64/
Quick start for testing:
sudo ./prometheus --web.enable-admin-api --web.enable-lifecycle &
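Because the quick-start command passes --web.enable-lifecycle, the running server can be checked and shut down over HTTP as well as by PID. The following is a sketch, assuming the default address localhost:9090; the function names and the PROM_URL and CURL variables are illustrative.

```shell
PROM_URL="${PROM_URL:-http://localhost:9090}"
CURL="${CURL:-curl}"

# Exit status 0 when the server answers its health endpoint with HTTP 2xx.
# -f makes curl fail on HTTP errors; -o /dev/null discards the body.
prom_healthy() {
    "$CURL" -sf -o /dev/null "$PROM_URL/-/healthy"
}

# Ask the server to shut down gracefully.
# Only honoured when the server was started with --web.enable-lifecycle.
prom_quit() {
    "$CURL" -s -X POST "$PROM_URL/-/quit"
}

# Example (needs a running server):
#   prom_healthy && echo "Prometheus is up"
#   prom_quit
```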
We will set it up as a systemd service later, so stop the process for now:
ps -ae | grep "prometheus" # Get PID
sudo kill <pid> # SIGTERM lets Prometheus shut down cleanly
2) Install Blackbox Exporter
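cd /opt/prometheus-2.2.1.linux-amd64/ # The systemd unit below expects the exporter inside this directory
sudo wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.12.0/blackbox_exporter-0.12.0.linux-amd64.tar.gz
sudo tar xvfz blackbox_exporter-0.12.0.linux-amd64.tar.gz
cd blackbox_exporter-0.12.0.linux-amd64/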
Quick start for testing:
sudo ./blackbox_exporter &
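With the exporter running, a probe can be triggered by hand before Prometheus is involved. The sketch below builds the same kind of /probe URL that Prometheus requests on each scrape. It assumes the exporter's default port 9115 on localhost; BLACKBOX_URL and probe_url are illustrative names, and the target is passed through without URL encoding, which is sufficient for simple hostnames and URLs.

```shell
BLACKBOX_URL="${BLACKBOX_URL:-http://localhost:9115}"

# probe_url MODULE TARGET: print the URL that asks the exporter to probe
# TARGET using MODULE (a module name defined in blackbox.yml).
probe_url() {
    printf '%s/probe?module=%s&target=%s\n' "$BLACKBOX_URL" "$1" "$2"
}

# Example (needs a running exporter): fetch the probe result and show
# whether it succeeded (probe_success 1) or failed (probe_success 0).
#   curl -s "$(probe_url http_2xx https://xyz.com)" | grep '^probe_success'
```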
We will set it up as a systemd service later, so stop the process for now:
ps -ae | grep "blackbox" # Get PID
sudo kill <pid>
3) Install Alertmanager
cd /opt/prometheus-2.2.1.linux-amd64/ # The systemd unit below expects Alertmanager inside this directory
sudo wget https://github.com/prometheus/alertmanager/releases/download/v0.15.3/alertmanager-0.15.3.linux-amd64.tar.gz
sudo tar xvfz alertmanager-0.15.3.linux-amd64.tar.gz
cd alertmanager-0.15.3.linux-amd64/
Quick start for testing:
sudo ./alertmanager --config.file=alertmanager.yml --web.external-url=http://insights.xyz.com:9093 &
So that links in alert notifications point back to the right host, start Prometheus with a matching external URL (stop any instance that is still running first):
ps -ae | grep "prometheus" # Get PID
sudo kill <pid>
cd /opt/prometheus-2.2.1.linux-amd64/
sudo ./prometheus --web.enable-admin-api --web.enable-lifecycle --web.external-url=http://insights.xyz.com:9090 &
We will set both up as systemd services later, so stop the processes for now:
ps -ae | grep -E "alertmanager|prometheus" # Get PIDs
sudo kill <pid>
Sample Config Files
blackbox.yml
modules:
  http_2xx:
    prober: http
    http:
  http_post_2xx:
    prober: http
    http:
      method: POST
  tcp_connect:
    prober: tcp
  pop3s_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^+OK"
      tls: true
      tls_config:
        insecure_skip_verify: false
  ssh_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^SSH-2.0-"
  irc_banner:
    prober: tcp
    tcp:
      query_response:
      - send: "NICK prober"
      - send: "USER prober prober prober :prober"
      - expect: "PING :([^ ]+)"
        send: "PONG ${1}"
      - expect: "^:[^ ]+ 001"
  icmp:
    prober: icmp
alert.rules.yml
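groups:
- name: alert.rules
  rules:
  - alert: EndpointDown
    # probe_success is 1 when a probe succeeds and 0 when it fails.
    # Without the comparison the alert would fire for every probed target.
    expr: probe_success == 0
    for: 10m
    labels:
      severity: "critical"
    annotations:
      summary: "Endpoint {{ $labels.instance }} down"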
alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_smarthost: '<ip-address-or-domain>:25'
  smtp_from: '<sender-email-address>'
  smtp_require_tls: false
route:
  group_by: ['severity']
  group_wait: 5m
  group_interval: 5m
  repeat_interval: 12h
  receiver: xyz-notify
receivers:
- name: 'xyz-notify'
  email_configs:
  - to: '<recipient-email-address>'
    send_resolved: true
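Once Alertmanager is running with this configuration, the route and the email receiver can be exercised without waiting for a real outage by posting a hand-made alert. This is a sketch, assuming Alertmanager listens on localhost:9093 and exposes the v1 alerts API that shipped with the 0.15 series (later releases expose /api/v2/alerts instead). The alert name TestAlert, the AM_URL and CURL variables, and both function names are made up for illustration.

```shell
AM_URL="${AM_URL:-http://localhost:9093}"
CURL="${CURL:-curl}"

# test_alert_payload SEVERITY: print a one-element JSON array describing a
# synthetic alert. The severity label is what the route above groups by.
test_alert_payload() {
    printf '[{"labels":{"alertname":"TestAlert","severity":"%s"}}]\n' "$1"
}

# send_test_alert SEVERITY: post the synthetic alert to Alertmanager.
send_test_alert() {
    "$CURL" -s -X POST -H 'Content-Type: application/json' \
        -d "$(test_alert_payload "$1")" "$AM_URL/api/v1/alerts"
}

# Example (needs a running Alertmanager):
#   send_test_alert critical
```

With group_wait set to 5m as above, the notification email should arrive roughly five minutes after the alert is posted.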
prometheus.yml
# my global config
global:
  scrape_interval: 30s # Scrape targets every 30 seconds. The default is every 1 minute.
  evaluation_interval: 30s # Evaluate rules every 30 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - xyz.com:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
  - alert.rules.yml

# Scrape configurations: Prometheus itself, plus one job per blackbox module.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'blackbox_http'
    scrape_interval: 30s
    metrics_path: /probe
    params:
      module: [http_2xx] # Look for an HTTP 2xx response.
    static_configs:
      - targets:
        - https://xyz.com
    relabel_configs:
      # Pass the listed target to the exporter as the ?target= parameter.
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed target as the instance label.
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115 # The blackbox exporter's real hostname:port.

  - job_name: 'blackbox_icmp'
    scrape_interval: 30s
    metrics_path: /probe
    params:
      module: [icmp] # ICMP.
    static_configs:
      - targets:
        - xyz.com
        #- subdomain1.xyz.com
        #- <ip-address>
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115 # The blackbox exporter's real hostname:port.

  - job_name: 'blackbox_tcp_connect'
    scrape_interval: 30s
    metrics_path: /probe
    params:
      module: [tcp_connect] # TCP Ping
    static_configs:
      - targets:
        #- mysql.xyz.com:3306 # MySQL
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115 # The blackbox exporter's real hostname:port.
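Since indentation mistakes in these files are easy to make, it can help to validate the configuration before the server reads it. The sketch below uses promtool, which ships in the Prometheus tarball, and then triggers a reload through the lifecycle endpoint. It assumes the install path used in this guide and a server started with --web.enable-lifecycle; the variable names and the check_and_reload function are illustrative.

```shell
PROM_DIR="${PROM_DIR:-/opt/prometheus-2.2.1.linux-amd64}"
PROM_URL="${PROM_URL:-http://localhost:9090}"
PROMTOOL="${PROMTOOL:-$PROM_DIR/promtool}"
CURL="${CURL:-curl}"

# Validate prometheus.yml (promtool also checks the rule files it lists),
# then ask the running server to reload its configuration.
# The reload is skipped when validation fails.
check_and_reload() {
    "$PROMTOOL" check config "$PROM_DIR/prometheus.yml" &&
        "$CURL" -s -X POST "$PROM_URL/-/reload"
}

# Example (needs the files in place and a running server):
#   check_and_reload || echo "configuration rejected, server left untouched"
```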
Setting up Prometheus as a systemd service – to enable automatic startup on system boot
Prometheus-Blackbox Exporter
sudo vim /etc/systemd/system/prometheus-blackbox.service
[Unit]
Description=Prometheus Blackbox Exporter
Wants=network-online.target data.mount prometheus.service
After=prometheus.service
[Service]
WorkingDirectory=/opt/prometheus-2.2.1.linux-amd64/blackbox_exporter-0.12.0.linux-amd64
ExecStart=/opt/prometheus-2.2.1.linux-amd64/blackbox_exporter-0.12.0.linux-amd64/blackbox_exporter
Type=simple
User=root
[Install]
WantedBy=multi-user.target
# Reload systemd so it picks up the new unit file
sudo systemctl daemon-reload
# Start the service
sudo systemctl start prometheus-blackbox.service
# Check the service status
systemctl status prometheus-blackbox.service
# Enable the service to start automatically at boot
sudo systemctl enable prometheus-blackbox.service
Prometheus-Alert Manager
sudo vim /etc/systemd/system/prometheus-alertmanager.service
[Unit]
Description=Prometheus Alert Manager
Wants=network-online.target data.mount
After=network-online.target data.mount
[Service]
WorkingDirectory=/opt/prometheus-2.2.1.linux-amd64/alertmanager-0.15.3.linux-amd64
ExecStart=/opt/prometheus-2.2.1.linux-amd64/alertmanager-0.15.3.linux-amd64/alertmanager --config.file=alertmanager.yml --web.external-url=http://insights.xyz.com:9093
Type=simple
User=root
[Install]
WantedBy=multi-user.target
# Reload systemd so it picks up the new unit file
sudo systemctl daemon-reload
# Start the service
sudo systemctl start prometheus-alertmanager.service
# Check the service status
systemctl status prometheus-alertmanager.service
# Enable the service to start automatically at boot
sudo systemctl enable prometheus-alertmanager.service
Prometheus
sudo vim /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Server
Wants=network-online.target data.mount prometheus-alertmanager.service
After=network-online.target data.mount prometheus-alertmanager.service
[Service]
WorkingDirectory=/opt/prometheus-2.2.1.linux-amd64
ExecStart=/opt/prometheus-2.2.1.linux-amd64/prometheus --web.enable-admin-api --web.enable-lifecycle --web.external-url=http://insights.xyz.com:9090
Type=simple
User=root
[Install]
WantedBy=multi-user.target
# Reload systemd so it picks up the new unit file
sudo systemctl daemon-reload
# Start the service
sudo systemctl start prometheus.service
# Check the service status
systemctl status prometheus.service
# Enable the service to start automatically at boot
sudo systemctl enable prometheus.service
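After all three units are in place, a quick loop can confirm that each one is running now and will start at boot. This is a sketch that assumes the unit names used in this guide; the SYSTEMCTL and SERVICES variables and the check_services function are illustrative.

```shell
SYSTEMCTL="${SYSTEMCTL:-systemctl}"
SERVICES="prometheus-alertmanager.service prometheus.service prometheus-blackbox.service"

# Print one line per unit: its runtime state (is-active) followed by its
# boot-time state (is-enabled).
check_services() {
    for unit in $SERVICES; do
        active=$("$SYSTEMCTL" is-active "$unit")
        enabled=$("$SYSTEMCTL" is-enabled "$unit")
        printf '%s: %s, %s\n' "$unit" "$active" "$enabled"
    done
}

# Example:
#   check_services
# If a unit is not active, its recent log lines usually explain why:
#   journalctl -u prometheus.service -n 50
```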