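Installing and Configuring Prometheus + Grafana Monitoring
Preface
This is a long post typed entirely by hand; if you spot any mistakes, please leave a comment!
I. Introduction to Prometheus
Official site: https://prometheus.io/docs/introduction/overview/
1. What is Prometheus?
Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. Since its inception in 2012, many companies and organizations have adopted it, and the project has a very active developer and user community. It is now a standalone open-source project maintained independently of any company. To emphasize this and to clarify the project's governance structure, Prometheus joined the Cloud Native Computing Foundation in 2016 as the second hosted project, after Kubernetes.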
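2. Main features of Prometheus
- A multi-dimensional data model, with time series identified by a metric name and key/value labels
- A flexible query language (PromQL)
- No reliance on distributed storage; single server nodes are autonomous
- Time series collection via a pull model over HTTP
- Pushing time series is also supported via an intermediary gateway
- Targets are discovered via service discovery or static configuration
- Multiple modes of graphing and dashboarding support, including Grafana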
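To make the multi-dimensional data model a bit more concrete: each time series is identified by a metric name plus key/value labels, and one sample in the text exposition format looks like `name{label="value"} 1027`. The sketch below uses a made-up `http_requests_total` series (the names and numbers are illustrative assumptions, not taken from this setup) and filters it by label with awk, which is roughly what the PromQL selector `http_requests_total{method="GET"}` does.

```shell
#!/bin/sh
# Hypothetical samples in the Prometheus text exposition format.
samples() {
cat <<'EOF'
http_requests_total{method="GET",code="200"} 1027
http_requests_total{method="POST",code="200"} 3
http_requests_total{method="GET",code="500"} 12
EOF
}

# select_series NAME LABEL VALUE
# Print the samples of metric NAME whose LABEL equals VALUE.
# Plain substring matching: good enough for a sketch, not a real parser.
select_series() {
    awk -v name="$1" -v pair="$2=\"$3\"" '
        index($0, name "{") == 1 && index($0, pair) > 0 { print }
    '
}

samples | select_series http_requests_total method GET
```

The two GET series are printed and the POST series is dropped; the remaining `code` label is what still distinguishes them.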
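3. Components
The Prometheus ecosystem consists of multiple components, many of which are optional:
- The main Prometheus server, which scrapes and stores time series data
- Client libraries for instrumenting application code
- A push gateway for supporting short-lived jobs
- A GUI dashboard builder based on Rails/SQL
- Special-purpose exporters that expose metrics from services such as HAProxy, StatsD, and Graphite in the Prometheus format
- An Alertmanager to handle alerts
- A command-line querying tool
- Various other support tools
Most Prometheus components are written in Go, which makes them easy to build and deploy.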
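4. Architecture
Prometheus scrapes metrics from its targets either directly or, for short-lived jobs, through an intermediary push gateway. It stores all scraped samples locally and runs rules over this data, either to aggregate and record new time series from existing data or to generate alerts. The collected data can then be queried with PromQL and visualized through Grafana or other API consumers.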
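II. Installation and Configuration
Because of environment restrictions, Prometheus can only be reached through an nginx proxy here, so an extra flag is added to the Prometheus start command and an nginx reverse proxy has to be configured. Details follow.
1. Install Prometheus
[root@master supp_app]# wget https://github.com/prometheus/prometheus/releases/download/v3.4.1/prometheus-3.4.1.linux-amd64.tar.gz
[root@master supp_app]# tar zxf prometheus-3.4.1.linux-amd64.tar.gz
[root@master supp_app]# cd prometheus-3.4.1.linux-amd64/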
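Before unpacking a downloaded tarball it can be worth checking its SHA-256 digest against the value published with the release. A minimal sketch, assuming `sha256sum` is available and that you copy the expected digest from the release page yourself; the helper name `verify_sha256` is my own:

```shell
#!/bin/sh
# verify_sha256 FILE EXPECTED_HEX
# Succeed only if the SHA-256 digest of FILE equals EXPECTED_HEX.
verify_sha256() {
    actual="$(sha256sum "$1" | awk '{print $1}')"
    [ "$actual" = "$2" ]
}

# Usage (the digest placeholder must be filled in by hand):
# verify_sha256 prometheus-3.4.1.linux-amd64.tar.gz "<digest from the release page>" \
#     && echo "checksum OK" || echo "checksum MISMATCH"
```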
2. File overview
[root@master prometheus-3.4.1.linux-amd64]# ls -lrt
total 294972
-rwxr-xr-x 1 1001 docker 154802412 May 31 18:46 prometheus ## Prometheus server binary
-rwxr-xr-x 1 1001 docker 146211128 May 31 18:46 promtool ## Prometheus utility tool
-rw-r--r-- 1 1001 docker 3773 May 31 18:58 NOTICE ## Notices
-rw-r--r-- 1 1001 docker 11357 May 31 18:58 LICENSE ## License
-rw-r--r-- 1 1001 docker 1877 Aug 22 17:47 prometheus.yml ## Prometheus configuration file
drwxr-xr-x 28 root root 4096 Aug 27 15:00 data ## Data directory of Prometheus's built-in TSDB
3. Configuration walkthrough
[root@master prometheus-3.4.1.linux-amd64]# cat prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]
        # The label name is added as a label `label_name=<label_value>` to any timeseries scraped from this config.
        labels:
          app: "prometheus"
For the full configuration reference, see: https://prometheus.io/docs/prometheus/latest/configuration/configuration/
4. A custom start script
Prometheus does not ship with a start script, and typing a long command for every start and stop is tedious, so here is a simple script to make later operations easier.
[root@master prometheus-3.4.1.linux-amd64]# vim start.sh
#!/bin/bash
# Author:
# Prometheus start script

PRO_HOME=$(cd "$(dirname "$0")"; pwd)

usage() {
    echo "Usage: /bin/bash start.sh OPTION"
    echo "OPTIONS: start|stop|restart"
}

log_time() {
    local STRING="${1:-"No message provided"}"
    # %3N (GNU date) prints zero-padded milliseconds
    local TIME="$(date "+%Y-%m-%d %H:%M:%S.%3N")"
    printf "%s--%s\n" "$TIME" "$STRING"
}

start() {
    # The pid file may not exist on the first run
    PID=$(cat "$PRO_HOME/pro.pid" 2>/dev/null)
    if [[ -n "$PID" ]]; then
        log_time "Prometheus service has been start!"
        exit 1
    fi
    ## --web.external-url=prometheus is added because the machine only exposes port 80
    ## and nginx acts as a reverse proxy; the flag is explained below the script.
    nohup "$PRO_HOME/prometheus" --web.external-url=prometheus --config.file="$PRO_HOME/prometheus.yml" >> "$PRO_HOME/pro.log" 2>&1 &
    # Record the PID of the background job instead of grepping the process list
    echo $! > "$PRO_HOME/pro.pid"
    sleep 3
    log_time "Prometheus service started!"
    PID1=$(cat "$PRO_HOME/pro.pid")
    log_time "Prometheus process pid: $PID1"
}

stop() {
    PID=$(cat "$PRO_HOME/pro.pid" 2>/dev/null)
    if [[ -z "$PID" ]]; then
        log_time "Prometheus service don't start!"
    else
        kill "$PID"
        # Empty the pid file so the next start is not blocked
        echo -n > "$PRO_HOME/pro.pid"
        log_time "Prometheus service stoped!"
    fi
}

case $1 in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        stop
        start
        ;;
    *)
        usage
        ;;
esac
--web.external-url=<URL>: The URL under which Prometheus is externally reachable (for example, if Prometheus is served via a reverse proxy). Used for generating relative and absolute links back to Prometheus itself. If the URL has a path portion, it will be used to prefix all HTTP endpoints served by Prometheus. If omitted, relevant URL components will be derived automatically.
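To see what the "path portion" means in practice, here is a small sketch that derives the prefix from an external URL value. It is my own illustration of the behavior described above, not Prometheus code; it treats a value without a scheme (such as the `prometheus` used in the start script) as a bare path, which matches what this setup shows, where the endpoints end up under `/prometheus/`.

```shell
#!/bin/sh
# path_prefix URL
# Print the path portion of URL without trailing slashes, always starting
# with "/" (prints an empty line if the URL has no path).
path_prefix() {
    case "$1" in
        *://*)
            rest="${1#*://}"                     # drop "scheme://"
            case "$rest" in
                */*) path="/${rest#*/}" ;;       # keep what follows host[:port]
                *)   path="" ;;
            esac
            ;;
        *)  path="$1" ;;                         # no scheme: treat the value as a path
    esac
    while [ "${path%/}" != "$path" ]; do         # trim trailing slashes
        path="${path%/}"
    done
    case "$path" in
        ""|/*) ;;
        *)     path="/$path" ;;
    esac
    printf '%s\n' "$path"
}

prefix="$(path_prefix prometheus)"
echo "metrics: ${prefix}/metrics"
echo "health:  ${prefix}/-/healthy"
```

With the prefix in place, anything that talks to Prometheus directly on port 9090 (scrape jobs, health checks, data sources) has to include `/prometheus` in its path as well.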
5. Start the service
[root@master prometheus-3.4.1.linux-amd64]# sh start.sh start
2025-08-27 16:30:42.942--Prometheus service started!
2025-08-27 16:30:42.946--Prometheus process pid: 3585681
[root@master prometheus-3.4.1.linux-amd64]# ps -ef | grep 3585681
root 3585681 1 1 16:30 pts/1 00:00:00 /opt/supp_app/prometheus-3.4.1.linux-amd64/prometheus --web.external-url=prometheus --config.file=/opt/supp_app/prometheus-3.4.1.linux-amd64/prometheus.yml
root 3585850 3444503 0 16:31 pts/1 00:00:00 grep --color=auto 3585681
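Checking with `ps -ef | grep` works, but a pid file can also go stale, for example after a crash or a reboot. A small sketch of a liveness check that a start script could use instead of only testing whether the pid file is non-empty; `is_running` is a hypothetical helper name, and it assumes the file holds a single PID owned by the current user:

```shell
#!/bin/sh
# is_running PIDFILE
# Succeed if PIDFILE exists, is non-empty, and names a live process.
is_running() {
    [ -s "$1" ] || return 1            # missing or empty pid file
    pid="$(cat "$1")"
    kill -0 "$pid" 2>/dev/null         # signal 0 only tests whether the process exists
}

# Usage:
# is_running /opt/supp_app/prometheus-3.4.1.linux-amd64/pro.pid \
#     && echo "Prometheus is running" || echo "Prometheus is not running"
```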
6. Configure nginx
[root@master prometheus-3.4.1.linux-amd64]# egrep -v '#|^$' /opt/web_app/nginx-1.16.1/conf/nginx.conf
worker_processes 2;
events {
    worker_connections 60000;
}
http {
    include mime.types;
    default_type application/octet-stream;
    sendfile on;
    keepalive_timeout 65;
    server {
        listen 80;
        server_name 127.0.0.1;
        ## Add the following location block
        location /prometheus/ {
            proxy_pass http://localhost:9090/prometheus/;
        }
        location /jenkins {
            proxy_pass http://localhost:8080;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }
        location /status {
            stub_status;
            allow 127.0.0.1;
            deny all;
        }
        error_page 500 502 503 504 /50x.html;
        location = /50x.html {
            root html;
        }
    }
}
Reload nginx
[root@master prometheus-3.4.1.linux-amd64]# /opt/web_app/nginx-1.16.1/sbin/nginx -t
nginx: the configuration file /opt/web_app/nginx-1.16.1/conf/nginx.conf syntax is ok
nginx: configuration file /opt/web_app/nginx-1.16.1/conf/nginx.conf test is successful
[root@master prometheus-3.4.1.linux-amd64]# /opt/web_app/nginx-1.16.1/sbin/nginx -s reload
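It may help to spell out why the `proxy_pass` target repeats the `/prometheus/` prefix. When `proxy_pass` carries a URI part, nginx replaces the portion of the request URI that matched the location prefix with that URI part. The sketch below only imitates that substitution with string operations (`upstream_url` is a made-up helper, not an nginx tool):

```shell
#!/bin/sh
# upstream_url LOCATION PROXY_PASS REQUEST_URI
# Replace the LOCATION prefix of REQUEST_URI with PROXY_PASS, the way nginx
# does for a prefix location whose proxy_pass includes a URI part.
upstream_url() {
    case "$3" in
        "$1"*) printf '%s%s\n' "$2" "${3#"$1"}" ;;
        *)     return 1 ;;             # request does not fall under this location
    esac
}

# As configured in this post: the prefix is preserved for Prometheus.
upstream_url /prometheus/ http://localhost:9090/prometheus/ /prometheus/graph

# Without the prefix in proxy_pass, the upstream would receive /graph instead.
upstream_url /prometheus/ http://localhost:9090/ /prometheus/graph
```

Since Prometheus was started with `--web.external-url=prometheus` and serves everything under `/prometheus/`, the first form is the one that lines up.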
7. Open Prometheus in a browser
III. Installing the exporters
This setup needs redis_exporter, node_exporter, and nginx_exporter, all of which can be found on the Prometheus website.
Official list: https://prometheus.io/docs/instrumenting/exporters/
1. Download the tarballs
[root@master supp_app]# wget https://github.com/prometheus/node_exporter/releases/download/v1.9.1/node_exporter-1.9.1.linux-amd64.tar.gz
[root@master supp_app]# wget https://github.com/oliver006/redis_exporter/releases/download/v1.69.0/redis_exporter-v1.69.0.linux-amd64.tar.gz
[root@master supp_app]# wget https://github.com/nginx/nginx-prometheus-exporter/releases/download/v1.3.0/nginx-prometheus-exporter_1.3.0_linux_amd64.tar.gz
[root@master supp_app]# tar zxf nginx-prometheus-exporter_1.3.0_linux_amd64.tar.gz
[root@master supp_app]# tar zxf node_exporter-1.9.1.linux-amd64.tar.gz
[root@master supp_app]# tar zxf redis_exporter-v1.69.0.linux-amd64.tar.gz
2. Start the exporters
Start nginx_exporter
[root@master nginx_exporter]# nohup ./nginx-prometheus-exporter -nginx.scrape-uri http://127.0.0.1/status &
[root@master nginx_exporter]# ps -ef | grep nginx-pro
root 481411 1 0 Jul08 ? 00:07:07 /opt/supp_app/nginx_exporter/nginx-prometheus-exporter -nginx.scrape-uri http://127.0.0.1/status
root 3605013 3444503 0 17:23 pts/1 00:00:00 grep --color=auto nginx-pro
Start redis_exporter
[root@master redis_exporter-v1.69.0.linux-amd64]# nohup ./redis_exporter --web.listen-address=localhost:9121 --exclude-latency-histogram-metrics &
[root@master redis_exporter-v1.69.0.linux-amd64]# ps -ef | grep redis_expor
root 3081020 1 0 Aug22 ? 00:13:33 ./redis_exporter --web.listen-address=localhost:9121 --exclude-latency-histogram-metrics
root 3614478 3444503 0 17:49 pts/1 00:00:00 grep --color=auto redis_expor
Start node_exporter
[root@master node_exporter-1.9.1.linux-amd64]# nohup /opt/supp_app/node_exporter-1.9.1.linux-amd64/node_exporter >> node_exporter.log 2>&1 &
[root@master node_exporter-1.9.1.linux-amd64]# ps -ef | grep node_exporter
root 3229621 1 0 Jun19 ? 01:15:01 /opt/supp_app/node_exporter-1.9.1.linux-amd64/node_exporter
root 3616041 3444503 0 17:53 pts/1 00:00:00 grep --color=auto node_exporter
IV. Configuring Prometheus to scrape the exporters
1. Edit the configuration
Add the new scrape jobs to prometheus.yml so that it looks like this:
[root@master prometheus-3.4.1.linux-amd64]# cat prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    metrics_path: '/prometheus/metrics'
    static_configs:
      - targets: ['localhost:9090']
        # The label name is added as a label `label_name=<label_value>` to any timeseries scraped from this config.
        labels:
          app: "prometheus"
  - job_name: "node"
    static_configs:
      - targets: ['localhost:9100']
  - job_name: "nginx_exporter"
    static_configs:
      - targets: ["localhost:9113"]
  - job_name: 'redis_exporter_targets'
    file_sd_configs:
      - files:
          - targets-redis-instances.json
    metrics_path: /scrape
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: "localhost:9121"
  - job_name: redis_exporter
    static_configs:
      - targets: ['localhost:9121']
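The `redis_exporter_targets` job reads its targets from `targets-redis-instances.json`, which is not shown in this post. A file_sd file is a JSON list of objects with `targets` and `labels`; the sketch below prints one. The Redis host names are placeholders I made up, and I am assuming the file sits next to prometheus.yml since the config references it by a relative path.

```shell
#!/bin/sh
# render_file_sd TARGET...
# Print a file_sd JSON document that lists the given targets in one group.
render_file_sd() {
    printf '[\n  {\n    "targets": ['
    sep=""
    for t in "$@"; do
        printf '%s"%s"' "$sep" "$t"
        sep=", "
    done
    printf '],\n    "labels": {}\n  }\n]\n'
}

# Hypothetical Redis instances; replace them with real addresses and
# redirect the output to targets-redis-instances.json.
render_file_sd redis://redis-host-01:6379 redis://redis-host-02:6379
```

The relabel rules in the job then copy each listed address into the `target` query parameter and the `instance` label, and point the actual scrape at the exporter on `localhost:9121`.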
2. Restart Prometheus
Restart it with the start script written earlier:
[root@master prometheus-3.4.1.linux-amd64]# sh start.sh restart
2025-08-27 17:57:45.830--Prometheus service stoped!
2025-08-27 17:57:48.860--Prometheus service started!
2025-08-27 17:57:48.865--Prometheus process pid: 3617592
3. Check the exporter status in the browser
If every target's State is UP, everything is working.
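The same check can be done from the command line through the HTTP API, where `/api/v1/targets` reports a `health` field for every active target. A rough sketch that counts targets by state with grep rather than a real JSON parser; `count_health` is a hypothetical helper, and the URL in the usage comment assumes the `/prometheus` prefix configured earlier:

```shell
#!/bin/sh
# count_health STATE
# Read a /api/v1/targets JSON response on stdin and print how many
# targets report "health" equal to STATE (for example up or down).
count_health() {
    grep -o "\"health\":\"$1\"" | wc -l | tr -d ' '
}

# Usage on the server:
# curl -s http://localhost:9090/prometheus/api/v1/targets | count_health up
```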
4. View the metrics
On the server, curl can be used to look at each exporter's endpoint:
[root@master web_app]# curl http://localhost:9100/metrics > nginx-metric
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 67496    0 67496    0     0  5992k      0 --:--:-- --:--:-- --:--:-- 5992k
[root@master web_app]# head -20 nginx-metric
# HELP go_gc_duration_seconds A summary of the wall-time pause (stop-the-world) duration in garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 3.0135e-05
go_gc_duration_seconds{quantile="0.25"} 3.8797e-05
go_gc_duration_seconds{quantile="0.5"} 4.7997e-05
go_gc_duration_seconds{quantile="0.75"} 5.0852e-05
go_gc_duration_seconds{quantile="1"} 8.6899e-05
go_gc_duration_seconds_sum 12.599998827
go_gc_duration_seconds_count 284197
# HELP go_gc_gogc_percent Heap size target percentage configured by the user, otherwise 100. This value is set by the GOGC environment variable, and the runtime/debug.SetGCPercent function. Sourced from /gc/gogc:percent
# TYPE go_gc_gogc_percent gauge
go_gc_gogc_percent 100
# HELP go_gc_gomemlimit_bytes Go runtime memory limit configured by the user, otherwise math.MaxInt64. This value is set by the GOMEMLIMIT environment variable, and the runtime/debug.SetMemoryLimit function. Sourced from /gc/gomemlimit:bytes
# TYPE go_gc_gomemlimit_bytes gauge
go_gc_gomemlimit_bytes 9.223372036854776e+18
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 8
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
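When only one value is of interest, the exposition text is easy to filter: lines starting with `#` are HELP and TYPE comments, and every other line is a series followed by its value. A minimal sketch with awk, assuming label values contain no spaces; `metric_value` is a made-up helper name:

```shell
#!/bin/sh
# metric_value SERIES
# Read exposition text on stdin and print the value of the first sample
# whose series (metric name plus optional labels) equals SERIES.
metric_value() {
    awk -v want="$1" '
        /^#/ { next }                  # skip HELP and TYPE comment lines
        $1 == want { print $2; exit }
    '
}

# Usage against a saved file or a live endpoint:
# metric_value go_goroutines < nginx-metric
# curl -s http://localhost:9100/metrics | metric_value 'go_gc_duration_seconds{quantile="0.5"}'
```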
You can also pick a metric in the browser to view its data.
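V. Introduction to Grafana
Official site: https://grafana.com/docs/grafana/latest/introduction/grafana-enterprise/
1. What is Grafana?
Grafana is an open-source data visualization and monitoring analytics platform. It connects to many data sources (such as Prometheus and InfluxDB) and displays and analyzes time series data in real time through interactive dashboards. It is widely used in IT operations, cloud service monitoring, and similar areas.
2. Core features
Multiple data sources:
- Supports 30+ data sources, including Prometheus, InfluxDB, Elasticsearch, and MySQL, so data can be analyzed in one place without migrating it.
- Compatibility can be extended through plugins, for example to integrate monitoring data from cloud services such as AWS and Tencent Cloud.
Dynamic dashboards:
- A drag-and-drop editor for building panels from 10+ chart types such as line charts and heatmaps, with interactive features like time range filtering and threshold alerts.
- Dashboards can be shared across teams, with fine-grained permission control.
Alerting and automation:
- Alerts can be triggered by configurable conditions, delivered through channels such as email and Slack, and integrated with operations tools such as PagerDuty.
3. Main uses
- System monitoring: combined with Prometheus, monitor servers and container clusters and display metrics such as CPU and memory in real time.
- Business analytics: connect to MySQL or a logging service to visualize business metrics such as user growth and transaction data.
- Cloud-native integration: a commonly used observability tool for cloud-native technologies such as Kubernetes and OpenTelemetry.
VI. Installing Grafana
1. Download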
[root@master grafana]# wget https://dl.grafana.com/enterprise/release/grafana-enterprise-12.0.2.linux-amd64.tar.gz
[root@master grafana]# tar zxf grafana-enterprise-12.0.2.linux-amd64.tar.gz
2. Edit the configuration file
Because of the environment restrictions, nginx has to act as a reverse proxy, so edit grafana.ini at around line 58:
49
50 # The public facing domain name used to access grafana from a browser
51 domain = localhost
52
53 # Redirect to correct domain if host header does not match domain
54 # Prevents DNS rebinding attacks
55 enforce_domain = false
56
57 # The full public facing url
58 root_url = %(protocol)s://%(domain)s:%(http_port)s/grafana/
59
60 # Serve Grafana from subpath specified in `root_url` setting. By default it is set to `false` for compatibility reasons.
61 serve_from_sub_path = true
62
63 # Log web requests
64 router_logging = false
65
66 # the path relative working path
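The `%(name)s` placeholders in `root_url` are filled in from the other settings of the same section. As a quick sanity check, the sketch below performs that substitution by hand with sed. It assumes Grafana's default `protocol` (http) and `http_port` (3000) together with the `domain` shown above; `expand_root_url` is only an illustration, not how Grafana itself parses the file.

```shell
#!/bin/sh
# expand_root_url TEMPLATE PROTOCOL DOMAIN HTTP_PORT
# Substitute the %(protocol)s, %(domain)s and %(http_port)s placeholders.
expand_root_url() {
    printf '%s\n' "$1" | sed \
        -e "s|%(protocol)s|$2|g" \
        -e "s|%(domain)s|$3|g" \
        -e "s|%(http_port)s|$4|g"
}

expand_root_url '%(protocol)s://%(domain)s:%(http_port)s/grafana/' http localhost 3000
```

The trailing `/grafana/` is the part that matters here: together with `serve_from_sub_path = true` it makes Grafana answer under the same sub-path that nginx forwards.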
2. Configure the service unit
Here Grafana is registered as a systemd service.
[root@master grafana]# cp -r grafana-v12.0.2/ /usr/local/grafana/
[root@master grafana]# vim /etc/systemd/system/grafana-server.service
[Unit]
Description=Grafana Server
After=network.target

[Service]
Type=simple
User=grafana
Group=users
ExecStart=/usr/local/grafana/bin/grafana server --config=/usr/local/grafana/conf/grafana.ini --homepath=/usr/local/grafana
Restart=on-failure

[Install]
WantedBy=multi-user.target
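The unit runs as user `grafana` in group `users`, so that account has to exist and own the install directory, and systemd has to re-read its unit files before the service can be started. The post does not show these steps, so the following is a sketch under assumptions: the account does not exist yet, and the service should also start at boot. It only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# run CMD...
# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

setup_grafana_service() {
    # Assumption: no grafana account exists yet; skip this line if it does.
    run useradd --system --no-create-home --shell /sbin/nologin -g users grafana
    run chown -R grafana:users /usr/local/grafana
    run systemctl daemon-reload
    run systemctl enable grafana-server
}

# Dry run by default; set DRY_RUN=0 and run as root to apply the steps.
setup_grafana_service
```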
3. Start Grafana
[root@master grafana]# systemctl start grafana-server
[root@master grafana]# systemctl status grafana-server
● grafana-server.service - Grafana Server
   Loaded: loaded (/etc/systemd/system/grafana-server.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2025-07-07 17:37:12 CST; 1 months 20 days ago
 Main PID: 291119 (grafana)
    Tasks: 23 (limit: 11715)
   Memory: 173.1M
   CGroup: /system.slice/grafana-server.service
           └─291119 /usr/local/grafana/bin/grafana server --config=/usr/local/grafana/conf/grafana.ini --homepath=/usr/local/grafana

Aug 27 18:29:24 master grafana[291119]: logger=context userId=1 orgId=1 uname=admin t=2025-08-27T18:29:24.9597876+08:00 level=info msg="Request >
Aug 27 18:29:39 master grafana[291119]: logger=context userId=1 orgId=1 uname=admin t=2025-08-27T18:29:39.987415511+08:00 level=info msg="Reques>
Aug 27 18:29:42 master grafana[291119]: logger=context userId=1 orgId=1 uname=admin t=2025-08-27T18:29:42.96207908+08:00 level=info msg="Request>
Aug 27 18:29:52 master grafana[291119]: logger=context userId=1 orgId=1 uname=admin t=2025-08-27T18:29:52.907149699+08:00 level=info msg="Reques>
Aug 27 18:30:10 master grafana[291119]: logger=context userId=1 orgId=1 uname=admin t=2025-08-27T18:30:10.90623326+08:00 level=info msg="Request>
Aug 27 18:30:13 master grafana[291119]: logger=context userId=1 orgId=1 uname=admin t=2025-08-27T18:30:13.919338485+08:00 level=info msg="Reques>
Aug 27 18:30:19 master grafana[291119]: logger=context userId=1 orgId=1 uname=admin t=2025-08-27T18:30:19.91992532+08:00 level=info msg="Request>
Aug 27 18:30:25 master grafana[291119]: logger=context userId=1 orgId=1 uname=admin t=2025-08-27T18:30:25.917424907+08:00 level=info msg="Reques>
4. Update the nginx configuration
As mentioned earlier, Grafana also needs a reverse proxy; it uses the same nginx instance as Prometheus.
[root@master conf]# !egrep
egrep -v '#|^$' /opt/web_app/nginx-1.16.1/conf/nginx.conf
worker_processes 2;
events {
    worker_connections 60000;
}
http {
    include mime.types;
    default_type application/octet-stream;
    sendfile on;
    keepalive_timeout 65;
    server {
        listen 80;
        server_name 127.0.0.1;
        ## Add the following location block
        location /grafana/ {
            proxy_pass http://localhost:3000/grafana/;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }
        location /prometheus/ {
            proxy_pass http://localhost:9090/prometheus/;
        }
        location /jenkins {
            proxy_pass http://localhost:8080;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }
        location /status {
            stub_status;
            allow 127.0.0.1;
            deny all;
        }
        error_page 500 502 503 504 /50x.html;
        location = /50x.html {
            root html;
        }
    }
}
Reload nginx
[root@master conf]# /opt/web_app/nginx-1.16.1/sbin/nginx -t
nginx: the configuration file /opt/web_app/nginx-1.16.1/conf/nginx.conf syntax is ok
nginx: configuration file /opt/web_app/nginx-1.16.1/conf/nginx.conf test is successful
[root@master conf]# /opt/web_app/nginx-1.16.1/sbin/nginx -s reload
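VII. Configuring Grafana dashboards
1. Log in to Grafana
Open http://IP/grafana in a browser.
Username: admin
Password: admin
The credentials can be changed after logging in.
2. Add a data source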
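In the UI this step comes down to creating a Prometheus data source whose URL includes the `/prometheus` prefix set by `--web.external-url`. As an alternative to clicking through the form, Grafana can also load data sources from provisioning files at startup. The sketch below writes such a file; the directory in the usage comment is an assumption based on this tarball install under /usr/local/grafana, and `write_datasource` is a made-up helper name.

```shell
#!/bin/sh
# write_datasource DIR URL
# Write a provisioning file into DIR that defines a Prometheus data source.
write_datasource() {
    mkdir -p "$1"
    cat > "$1/prometheus.yaml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $2
    isDefault: true
EOF
}

# Usage (assumed provisioning directory; restart grafana-server afterwards):
# write_datasource /usr/local/grafana/conf/provisioning/datasources http://localhost:9090/prometheus
```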
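3. Add dashboards
The Grafana website provides dashboard templates that can be used as a starting point.
Dashboards: https://grafana.com/grafana/dashboards/?plcmt=oss-nav
The template IDs used in this test are:
node dashboard ID: 1860
redis dashboard ID: 18345
nginx dashboard ID: 14900
Import each dashboard by its ID; if the network does not allow that, download the dashboard from the website and import it from a local file.
4. View the overall result
The steps for adding the redis and nginx dashboards are the same: just enter the corresponding ID. The dashboards for each monitored item are shown below.
node dashboard
redis dashboard
nginx dashboard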