
Setting up Prometheus + Grafana

ops 2025/9/4 11:51:41

Deploying Prometheus

Installation

# 1. Download (the release tag must match the version in the archive name)
wget https://github.com/prometheus/prometheus/releases/download/v3.5.0/prometheus-3.5.0.linux-amd64.tar.gz

# 2. Unpack and deploy
tar -zxvf prometheus-3.5.0.linux-amd64.tar.gz -C /opt/
cd /opt/
mv ./prometheus-3.5.0.linux-amd64 prometheus

# 3. Verify
[root@prometheus prometheus]# cd /opt/prometheus
[root@prometheus prometheus]# ./prometheus --version
prometheus, version 3.5.0 (branch: HEAD, revision: 8be3a9560fbdd18a94dedec4b747c35178177202)
  build user:       root@4451b64cb451
  build date:       20250714-16:15:23
  go version:       go1.24.5
  platform:         linux/amd64
  tags:             netgo,builtinassets

# 4. Create the service user
groupadd prometheus
useradd -g prometheus -s /sbin/nologin prometheus
chown -R prometheus:prometheus /opt/prometheus/

# 5. Create the Prometheus data directory
mkdir -p /opt/prometheus/data
chown -R prometheus:prometheus /opt/prometheus/data
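
The release tag and the archive name both contain the version number, so the two can drift apart when the URL is edited by hand. A minimal sketch that derives both from a single value; the function name prometheus_release_url is made up for this example, and the URL pattern is assumed from the GitHub release link used in the download step:

```shell
# Hypothetical helper: build the download URL from one version string,
# so the tag (v3.5.0) and the file name (prometheus-3.5.0...) always agree.
prometheus_release_url() {
  local version="$1"               # e.g. 3.5.0
  local arch="${2:-linux-amd64}"   # platform suffix used in the archive name
  printf 'https://github.com/prometheus/prometheus/releases/download/v%s/prometheus-%s.%s.tar.gz\n' \
    "$version" "$version" "$arch"
}
```

It could then be used as: wget "$(prometheus_release_url 3.5.0)"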

Configuration file

[root@prometheus prometheus]# cat prometheus.yml
# my global config
global:
  scrape_interval: 15s # Global interval between scrapes, set to 15s here (the built-in default is 1 minute).
  evaluation_interval: 15s # Rules are evaluated every 15 seconds; if omitted, the default is 1 minute.
  # scrape_timeout is set to the global default (10s). This is the scrape timeout.

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs: # default job
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]
        # The label name is added as a label `label_name=<label_value>` to any timeseries scraped from this config.
        labels:
          app: "prometheus"
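
YAML indentation mistakes in this file are a common reason for Prometheus failing to start, so it can help to validate the file before (re)starting the service. The release archive ships a promtool binary next to prometheus; the wrapper below is only a sketch, and the function name check_prometheus_config and its default paths are assumptions based on the /opt/prometheus layout used here:

```shell
# Hypothetical wrapper around "promtool check config".
check_prometheus_config() {
  local promtool="${1:-/opt/prometheus/promtool}"
  local config="${2:-/opt/prometheus/prometheus.yml}"
  if [ ! -x "$promtool" ]; then
    # Return a distinct code so callers can tell "tool missing" from "config invalid".
    echo "promtool not found at $promtool" >&2
    return 2
  fi
  "$promtool" check config "$config"
}
```

Calling check_prometheus_config with no arguments checks /opt/prometheus/prometheus.yml; a non-zero exit status means the file should be fixed before restarting.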

Creating a systemd unit to manage the service

vim /usr/lib/systemd/system/prometheus.service

[Unit]
Description=Prometheus
After=network.target

[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/opt/prometheus/prometheus \
--config.file=/opt/prometheus/prometheus.yml \
--storage.tsdb.path=/opt/prometheus/data \
--storage.tsdb.retention.time=15d \
--web.console.templates=/opt/prometheus/consoles \
--web.console.libraries=/opt/prometheus/console_libraries \
--web.max-connections=512 \
--web.external-url "http://YOUR_SERVER_IP:9090" \
--web.listen-address "0.0.0.0:9090" \
--web.enable-admin-api \
--web.enable-lifecycle
Restart=on-failure

[Install]
WantedBy=multi-user.target

Starting and verifying

systemctl daemon-reload
systemctl enable prometheus
systemctl start  prometheus
systemctl status prometheus
# Check that the service port is listening
ss -tunlp | grep 9090
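
A listening port does not yet prove that Prometheus has finished starting. Prometheus exposes a readiness endpoint at /-/ready, so a small polling loop can confirm it. This is a sketch only; the function name wait_for_prometheus and the retry count are arbitrary choices for this example:

```shell
# Hypothetical helper: poll /-/ready until it answers or the attempts run out.
wait_for_prometheus() {
  local addr="${1:-localhost:9090}"
  local attempts="${2:-10}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP errors; --max-time bounds each attempt.
    if curl -fsS --max-time 5 -o /dev/null "http://${addr}/-/ready" 2>/dev/null; then
      echo "Prometheus at ${addr} is ready"
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep 1   # wait before the next attempt
    fi
    i=$((i + 1))
  done
  echo "Prometheus at ${addr} did not become ready" >&2
  return 1
}
```

For example, wait_for_prometheus localhost:9090 30 waits up to roughly half a minute after systemctl start.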

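Open http://YOUR_SERVER_IP:9090 in a browser.

Click the Endpoint value of the target to see the metrics that are actually being scraped from the exporter. Copy any one of them, for example go_gc_pauses_seconds_count, to verify that data is being collected.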

Deploying node_exporter

node_exporter collects system-level metrics from the machine. This guide uses the official exporter provided by the Prometheus project.

Installation

# Install node_exporter
wget https://github.com/prometheus/node_exporter/releases/download/v1.6.1/node_exporter-1.6.1.linux-amd64.tar.gz
tar -zxvf node_exporter-1.6.1.linux-amd64.tar.gz -C /opt/
cd /opt/
mv node_exporter-1.6.1.linux-amd64/ node_exporter

# Add the user (skip these two commands if the prometheus user already exists on this host)
groupadd prometheus
useradd -g prometheus -s /sbin/nologin prometheus
chown -R prometheus:prometheus /opt/node_exporter

# Set up start on boot
vim /lib/systemd/system/node_exporter.service

[Unit]
Description=Prometheus Node_exporter
After=network.target prometheus.service

[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/opt/node_exporter/node_exporter --web.listen-address=0.0.0.0:9101
Restart=on-failure

[Install]
WantedBy=multi-user.target

Enabling and starting the service

systemctl daemon-reload
systemctl enable node_exporter
systemctl start node_exporter

Adding node_exporter to the configuration file

cat >> /opt/prometheus/prometheus.yml <<EOF
  - job_name: 'node'
    static_configs:
      - targets: ['IP-ADDRESS:9101']
EOF

Note: the job must be added to /opt/prometheus/prometheus.yml and nowhere else. Otherwise Prometheus will not show the node target later, and the Grafana dashboard will have no data.
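
Appending with a here-document is not idempotent: running it twice adds the job twice, and Prometheus rejects duplicate job names. A minimal sketch of a guarded append; the function name add_node_job is hypothetical, and 192.0.2.10 in the usage line is a placeholder address:

```shell
# Hypothetical helper: append the 'node' job only if the file does not define it yet.
add_node_job() {
  local config="$1"   # e.g. /opt/prometheus/prometheus.yml
  local target="$2"   # host:port of node_exporter
  # Match the job name with either single or double quotes.
  if grep -q "job_name: *['\"]node['\"]" "$config"; then
    echo "job 'node' already present in $config"
    return 0
  fi
  # Two-space indentation keeps the entry inside the scrape_configs list.
  cat >> "$config" <<EOF
  - job_name: 'node'
    static_configs:
      - targets: ['${target}']
EOF
}
```

Usage: add_node_job /opt/prometheus/prometheus.yml 192.0.2.10:9101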

Restarting the Prometheus service

systemctl restart prometheus.service
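
Since the unit file starts Prometheus with --web.enable-lifecycle, the configuration can also be reloaded over HTTP without restarting the process. A sketch, with both function names invented for this example:

```shell
# Hypothetical helpers: reload the configuration via the lifecycle endpoint.
prometheus_reload_url() {
  printf 'http://%s/-/reload\n' "${1:-localhost:9090}"
}

reload_prometheus() {
  # The endpoint expects POST (or PUT) and is only available
  # when Prometheus runs with --web.enable-lifecycle.
  curl -fsS -X POST "$(prometheus_reload_url "${1:-}")"
}
```

Running reload_prometheus on the Prometheus host targets localhost:9090; pass host:port to reach another instance.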

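Verification

View the exported metrics at http://IP-ADDRESS:9101/metrics

As in the Prometheus start-up verification, pick any metric at random and check that it returns data.

Checking the targets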
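
The target list is also available from the HTTP API at /api/v1/targets, where each active target carries a health field. The sketch below counts healthy targets in such a response read from standard input; the function name count_up_targets is hypothetical, and the match assumes the compact JSON (no spaces after colons) that Prometheus emits:

```shell
# Hypothetical helper: count targets reported as "up" in a /api/v1/targets response.
count_up_targets() {
  # grep -o prints each match on its own line, so wc -l counts matches.
  grep -o '"health":"up"' | wc -l | tr -d ' '
}
```

Usage: curl -fsS http://localhost:9090/api/v1/targets | count_up_targets. With both the prometheus and node jobs healthy, the count would be expected to be 2.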

Deploying Grafana

Installation and configuration

# Install
wget https://dl.grafana.com/enterprise/release/grafana-enterprise-10.2.0-1.x86_64.rpm
yum -y install grafana-enterprise-10.2.0-1.x86_64.rpm

# Enable start on boot and start the service
systemctl enable grafana-server
systemctl start grafana-server

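Logging in

Open http://IP-ADDRESS:3000. The default username/password is admin/admin; the default administrator password must be changed on first login.

Adding a data source

For the URL, either http://localhost:9090/ or http://IP-ADDRESS:9090/ works.

Then click Save.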
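
As an alternative to clicking through the UI, Grafana can load data sources from provisioning files at start-up. The following is a sketch under assumptions, not the method used in this guide: the function name write_prometheus_datasource and the file name prometheus.yaml are invented here, and the default directory is the usual location for RPM installs:

```shell
# Hypothetical helper: write a Grafana data source provisioning file.
write_prometheus_datasource() {
  local dir="${1:-/etc/grafana/provisioning/datasources}"
  local url="${2:-http://localhost:9090}"
  mkdir -p "$dir"
  cat > "$dir/prometheus.yaml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${url}
    isDefault: true
EOF
}
```

After writing the file, Grafana needs a restart (systemctl restart grafana-server) to pick it up.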

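Importing a dashboard

New -> Import

Field 1 of the import form takes the ID of a published dashboard; either 11074 or 16098 works.

The name can be anything. Leave field 2 at its default. Click field 3 and select the entry with the Prometheus logo that appears. Then click Import.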

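Problems you may run into

Xshell cannot transfer files to the remote host

Cause: the remote server does not have the required file-transfer tool (lrzsz) installed.

Solution: install it on the server.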

yum install lrzsz

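Download times out

Cause: the server's access to GitHub is slow, or GitHub is unreachable from it.

Solution: adjust the server's network configuration, or download the file on another machine and upload it to the server with Xshell or a similar remote tool.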

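Grafana dashboard shows no data

Check whether the node target appeared during the verification step of the node_exporter deployment. If it did not, check that prometheus.yml was edited correctly in the step that adds node_exporter to the configuration file. Both jobs must sit under the scrape_configs: key:

  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["IP-ADDRESS:9090"]
        # The label name is added as a label `label_name=<label_value>` to any timeseries scraped from this config.
        labels:
          app: "prometheus"
  - job_name: 'node'
    static_configs:
      - targets: ['IP-ADDRESS:9101']

Also check that the Prometheus URL configured in Grafana is correct.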