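Zabbix High-Availability Deployment Guide (v2, latest)
Zabbix High-Availability Deployment Guide (MySQL + Dual VIP + HAProxy + Nginx)
Overview: MySQL is the database, two virtual IPs are used (10.0.0.100 for the web front end and 10.0.0.200 for the database), HAProxy load-balances database connections, and Nginx is the web entry point.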
1. Architecture
Server1 (10.0.0.12): primary Zabbix Server + MySQL source + HAProxy (primary) + Keepalived
Server2 (10.0.0.15): standby Zabbix Server + MySQL replica + HAProxy (standby) + Keepalived
Server3 (10.0.0.18): Nginx load balancer
2. Environment preparation
Run on all servers:
# Update the system (optional; may take a while)
yum update -y
# Disable the firewall and SELinux (in production, configure rules instead)
systemctl disable --now firewalld
setenforce 0
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
# Configure host name resolution
cat > /etc/hosts << EOF
127.0.0.1 localhost localhost.localdomain
10.0.0.12 server1 zabbix-master
10.0.0.15 server2 zabbix-backup
10.0.0.18 server3 zabbix-lb
10.0.0.100 zabbix-web
10.0.0.200 zabbix-db
EOF
# Install basic tools (optional)
yum install -y vim wget net-tools
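3. Install MySQL
Run on Server2 (10.0.0.15). On Server1, MySQL is best installed together with Zabbix.
# Install MySQL
yum install mysql-server -y
# Start the service and enable it at boot (the unit is named mysqld)
systemctl enable --now mysqld
# Run the security setup
mysql_secure_installation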
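4. Configure MySQL source-replica replication (with several setbacks)
Source (server1) configuration. It is best to do this step after Zabbix has been installed following the official Zabbix instructions.
cat > /etc/my.cnf.d/mysql-server.cnf << EOF
[mysqld]
server-id=1
log-bin=mysql-bin
binlog-do-db=zabbix
expire-logs-days=10
max-binlog-size=100M
binlog-format=ROW
innodb_flush_log_at_trx_commit=1
sync_binlog=1
EOF
# Restart MySQL
systemctl restart mysqld
# Create the replication user in the mysql client (the zabbix user was already created during the Zabbix installation, so it is not repeated here)
CREATE USER 'repl'@'%' IDENTIFIED BY 'ReplicationPassword';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';
SHOW MASTER STATUS;
Record the File and Position values from the SHOW MASTER STATUS output and use them in the replica configuration.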
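Copying File and Position by hand is easy to get wrong. As a rough sketch, the helper below reads SHOW MASTER STATUS\G output on standard input and prints the matching CHANGE MASTER TO statement. The function name change_master_sql is made up for this example, and MASTER_PASSWORD is left out on purpose so that it has to be added by hand before the statement is run on the replica.

```shell
# Hypothetical helper: build a CHANGE MASTER TO statement from
# `SHOW MASTER STATUS\G` output read on stdin.
# $1 = source host, $2 = replication user
change_master_sql() {
    awk -v host="$1" -v user="$2" -v q="'" '
        $1 == "File:"     { file = $2 }
        $1 == "Position:" { pos = $2 }
        END {
            # Fail if either value is missing from the input.
            if (file == "" || pos == "") exit 1
            printf "CHANGE MASTER TO MASTER_HOST=%s, MASTER_USER=%s, MASTER_LOG_FILE=%s, MASTER_LOG_POS=%s;\n", q host q, q user q, q file q, pos
        }'
}
```

On Server1 it could be used as: mysql -u root -p -e 'SHOW MASTER STATUS\G' | change_master_sql 10.0.0.12 repl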
Replica (server2) configuration
cat > /etc/my.cnf.d/mysql-server.cnf << EOF
[mysqld]
server-id=2
log-bin=mysql-bin
binlog-do-db=zabbix
expire-logs-days=10
max-binlog-size=100M
binlog-format=ROW
relay-log=mysql-relay-bin
read-only=1
innodb_flush_log_at_trx_commit=1
sync_binlog=1
EOF
# Restart MySQL
systemctl restart mysqld
# Point the replica at the source in the mysql client (substitute your own File and Position values)
CHANGE MASTER TO MASTER_HOST='10.0.0.12', MASTER_USER='repl', MASTER_PASSWORD='ReplicationPassword', MASTER_LOG_FILE='mysql-bin.000006', MASTER_LOG_POS=1117065;
START SLAVE;
SHOW SLAVE STATUS\G
Make sure Slave_IO_Running and Slave_SQL_Running are both Yes.
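The two-field check above is easy to script. The following is a minimal sketch, not part of the original procedure: a function (the name replication_ok is an assumption) that reads SHOW SLAVE STATUS\G output on standard input and succeeds only when both threads report Yes.

```shell
# Hypothetical helper: exit status 0 only if both replication threads are running.
# Input: `SHOW SLAVE STATUS\G` output on stdin.
replication_ok() {
    awk '
        $1 == "Slave_IO_Running:"  { io  = $2 }
        $1 == "Slave_SQL_Running:" { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }'
}
```

On the replica it could be used as: mysql -u root -p -e 'SHOW SLAVE STATUS\G' | replication_ok && echo "replication running"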
Setback 1: replication failed with an error several times.
Fix:
On the source:
- Change the replication user's authentication plugin to mysql_native_password (widely compatible; commonly used in MySQL 5.7 and earlier):
ALTER USER 'repl'@'%' IDENTIFIED WITH mysql_native_password BY 'ReplicationPassword'; FLUSH PRIVILEGES;
On the replica:
Reconfigure the connection to the source (no SSL, for simple setups):
CHANGE MASTER TO MASTER_HOST='10.0.0.12', MASTER_USER='repl', MASTER_PASSWORD='ReplicationPassword', MASTER_LOG_FILE='mysql-bin.xxxxxx', MASTER_LOG_POS=xxxxxx;
START SLAVE;
# Check whether replication has recovered
SHOW SLAVE STATUS\G
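Setback 2: after that change, a new error appeared.
Cause: the source and the replica both had server_id set to 1.
Fix:
Change server_id=xxx on either the source or the replica so that the two values differ (in /etc/my.cnf.d/mysql-server.cnf), then restart mysqld.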
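To spot a duplicate ID before starting replication, the configured value can be read straight from the option file on each host. This is only a sketch: the function name server_id_of is invented here, and it assumes the ID is set in the file passed as the argument rather than somewhere else in the MySQL configuration.

```shell
# Hypothetical helper: print the last server-id / server_id value set in an option file.
# $1 = path to the option file
server_id_of() {
    sed -n 's/^[[:space:]]*server[-_]id[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1
}
```

Running server_id_of /etc/my.cnf.d/mysql-server.cnf on Server1 and on Server2 should print two different numbers.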
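Setback 3: after that fix, yet another error appeared.
Check performance_schema: log in to MySQL and query the performance_schema.replication_applier_status_by_worker table to get the worker threads' detailed errors.
SELECT * FROM performance_schema.replication_applier_status_by_worker\G
Look at the LAST_ERROR_MESSAGE field, which shows the specific SQL or reason the transaction failed.
Fix:
(1) Export the data on the source:
mysqldump -u root -p zabbix > zabbix_db.sql
(2) Copy the zabbix dump from the source to the replica:
scp zabbix_db.sql <replica-user>@<replica-ip>:/tmp/
(3) Import the zabbix database on the replica:
CREATE DATABASE zabbix;
mysql -u root -p zabbix < /tmp/zabbix_db.sql
(4) Stop replication on the replica:
STOP SLAVE;
(5) Skip the failed transaction.
Non-GTID mode, use one of the following:
CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000006', MASTER_LOG_POS=75510; -- position just after the failed event
SET GLOBAL sql_slave_skip_counter = 1;
GTID mode: look up the current GTID set, find the GTID of the failed transaction, and skip it by committing an empty transaction in its place (assuming the GTID is xxx:123):
SET GTID_NEXT='xxx:123'; BEGIN; COMMIT; SET GTID_NEXT='AUTOMATIC';
(6) Start replication on the replica:
START SLAVE;
(7) Verify the replication status:
SHOW SLAVE STATUS\G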
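Setback 4: after all of the above, a new error appeared.
Fix:
Run STOP SLAVE; RESET SLAVE; and then CHANGE MASTER TO ... again (the same statement as before), followed by START SLAVE;. Checking the status afterward showed that replication was back to normal.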
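5. Import the Zabbix database schema
Run on the source (Server1). For downloads, see the installation instructions on the official Zabbix site.
# Add the Zabbix repository
rpm -Uvh https://repo.zabbix.com/zabbix/7.0/rocky/9/x86_64/zabbix-release-latest-7.0.el9.noarch.rpm
dnf clean all
# Import the Zabbix database schema (needs the zabbix-sql-scripts package from step 6 and an existing zabbix database and user)
zcat /usr/share/zabbix-sql-scripts/mysql/server.sql.gz | mysql --default-character-set=utf8mb4 -uzabbix -p zabbix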
6. Install Zabbix Server
Run on both Server1 and Server2:
# Install Zabbix Server, the web front end, and the agent
dnf install -y zabbix-server-mysql zabbix-web-mysql zabbix-nginx-conf zabbix-sql-scripts zabbix-agent
# Configure the Zabbix Server database connection (back up the original file first, then create a new one)
cp -a /etc/zabbix/zabbix_server.conf /etc/zabbix/zabbix_server.conf.bak
cat > /etc/zabbix/zabbix_server.conf << EOF
LogFile=/var/log/zabbix/zabbix_server.log
LogFileSize=0
PidFile=/var/run/zabbix/zabbix_server.pid
DBHost=10.0.0.200
DBName=zabbix
DBUser=zabbix
DBPassword=ZabbixPassword
DBPort=3306
SNMPTrapperFile=/var/log/snmptrap/snmptrap.log
Timeout=4
AlertScriptsPath=/usr/lib/zabbix/alertscripts
ExternalScripts=/usr/lib/zabbix/externalscripts
LogSlowQueries=3000
StartPollers=15
StartPollersUnreachable=5
StartTrappers=5
StartPingers=1
StartDiscoverers=1
CacheSize=128M
HistoryCacheSize=64M
TrendCacheSize=64M
ValueCacheSize=256M
EOF
# Set the time zone for the web front end
sed -i 's/;date.timezone =/date.timezone = Asia\/Shanghai/' /etc/php.ini
# Start the services
systemctl enable --now zabbix-server zabbix-agent nginx php-fpm
7. Configure HAProxy (database load balancing)
Run on both Server1 and Server2:
# Install HAProxy
dnf install -y haproxy
# Configure HAProxy
cat > /etc/haproxy/haproxy.cfg << EOF
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /var/lib/haproxy/stats
    user haproxy
    group haproxy
    daemon

defaults
    log global
    mode tcp
    option tcplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000

listen mysql-cluster
    bind 10.0.0.200:3306
    mode tcp
    balance source
    option mysql-check user haproxy_check
    server mysql-master 10.0.0.12:3306 check weight 100
    server mysql-slave 10.0.0.15:3306 check weight 50 backup

listen stats
    bind *:9000
    mode http
    stats enable
    stats uri /stats
    stats realm HAProxy\ Statistics
    stats auth admin:password
EOF
# Create the health-check user (option mysql-check cannot send a password, so the account has none and is given no privileges)
mysql -u root -p << EOF
CREATE USER 'haproxy_check'@'%' IDENTIFIED WITH mysql_native_password;
FLUSH PRIVILEGES;
EOF
# Start HAProxy
systemctl enable --now haproxy
Problem encountered: HAProxy failed to restart.
Cause: an error in the configuration file.
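A configuration error like this can be caught before the service is restarted, because haproxy has a check mode (-c) that only parses the file named with -f and reports problems. The wrapper below is a small sketch; the function name check_haproxy_cfg and the HAPROXY_BIN override variable are assumptions made for this example.

```shell
# Hypothetical wrapper: validate an HAProxy configuration file without starting the service.
# $1 = config path (defaults to /etc/haproxy/haproxy.cfg)
# HAPROXY_BIN can point at a different haproxy binary.
check_haproxy_cfg() {
    # -c parses the configuration and exits; -f names the file to check.
    "${HAPROXY_BIN:-haproxy}" -c -f "${1:-/etc/haproxy/haproxy.cfg}"
}
```

A restart can then be made conditional on the check: check_haproxy_cfg && systemctl restart haproxy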
8. Configure Keepalived for the two VIPs
Primary server (10.0.0.12) configuration:
# Install Keepalived
dnf install -y keepalived
# Configure Keepalived
cat > /etc/keepalived/keepalived.conf << EOF
! Configuration File for keepalived
global_defs {
    router_id ZABBIX_MASTER
}

# Health-check scripts (step 6 serves the web front end with nginx, so nginx is checked)
vrrp_script chk_nginx {
    script "systemctl is-active nginx"
    interval 2
    weight -20
}
vrrp_script chk_haproxy {
    script "systemctl is-active haproxy"
    interval 2
    weight -20
}

# Web VIP (10.0.0.100)
vrrp_instance VI_WEB {
    state MASTER
    interface eth0
    virtual_router_id 101
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        10.0.0.100/24
    }
    track_script {
        chk_nginx
    }
}

# DB VIP (10.0.0.200)
vrrp_instance VI_DB {
    state MASTER
    interface eth0
    virtual_router_id 201
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 2222
    }
    virtual_ipaddress {
        10.0.0.200/24
    }
    track_script {
        chk_haproxy
    }
}
EOF
# Start Keepalived
systemctl enable --now keepalived
Standby server (10.0.0.15) configuration:
# Install Keepalived
dnf install -y keepalived
# Configure Keepalived
cat > /etc/keepalived/keepalived.conf << EOF
! Configuration File for keepalived
global_defs {
    router_id ZABBIX_BACKUP
}

# Health-check scripts (step 6 serves the web front end with nginx, so nginx is checked)
vrrp_script chk_nginx {
    script "systemctl is-active nginx"
    interval 2
    weight -20
}
vrrp_script chk_haproxy {
    script "systemctl is-active haproxy"
    interval 2
    weight -20
}

# Web VIP (10.0.0.100)
vrrp_instance VI_WEB {
    state BACKUP
    interface eth0
    virtual_router_id 101
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        10.0.0.100/24
    }
    track_script {
        chk_nginx
    }
}

# DB VIP (10.0.0.200)
vrrp_instance VI_DB {
    state BACKUP
    interface eth0
    virtual_router_id 201
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 2222
    }
    virtual_ipaddress {
        10.0.0.200/24
    }
    track_script {
        chk_haproxy
    }
}
EOF
# Start Keepalived
systemctl enable --now keepalived
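The priorities (101 on the primary, 100 on the standby) and the script weight of -20 work together: while a tracked script fails, Keepalived lowers that node's priority by 20, which drops the primary below the standby and moves the VIP. The arithmetic can be sketched as follows; the function name effective_priority is made up, and the sketch only models the negative-weight case used in this configuration.

```shell
# Hypothetical illustration of Keepalived's priority adjustment for a negative script weight.
# $1 = configured priority, $2 = script weight (negative), $3 = check exit status (0 = healthy)
effective_priority() {
    if [ "$3" -eq 0 ]; then
        echo "$1"               # healthy: priority unchanged
    else
        echo $(( $1 + $2 ))     # failing: the negative weight is added to the priority
    fi
}
```

With these settings, a failing check on the primary gives 101 - 20 = 81, which is lower than the standby's 100, so the standby takes over the VIP.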
9. Configure Nginx load balancing (Server3)
# Install Nginx
dnf install -y nginx
# Configure Nginx to proxy the Zabbix web front end (the quoted 'EOF' keeps the shell from expanding $host and the other Nginx variables)
cat > /etc/nginx/conf.d/zabbix.conf << 'EOF'
upstream zabbix_backend {
    server 10.0.0.100:80 weight=10 max_fails=3 fail_timeout=30s;
}
server {
    listen 80;
    server_name zabbix.example.com;
    location / {
        proxy_pass http://zabbix_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_connect_timeout 150;
        proxy_send_timeout 100;
        proxy_read_timeout 100;
        proxy_buffers 4 32k;
        client_max_body_size 8m;
        client_body_buffer_size 128k;
        # Zabbix web tuning
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_cache_bypass $http_upgrade;
    }
}
EOF
# Start Nginx
systemctl enable --now nginx
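10. Verify high availability
Open http://10.0.0.18/zabbix and complete the initial web setup.
Verify MySQL replication (run on Server2, the replica; a connection through the DB VIP would land on the source, which has no replica status to show):
mysql -u root -p -e "SHOW SLAVE STATUS\G"
Test failover:
Stop the Keepalived service on Server1 and verify that the VIPs move to Server2 automatically.
Open http://10.0.0.18/zabbix and confirm that the service still works.
Start the Keepalived service on Server1 again and verify that the VIPs move back automatically.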
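Whether a node currently holds a VIP can be checked from the one-line-per-address output of ip -4 -o addr show, where the fourth field is the address with its prefix length. The helper below is a sketch under that assumption; the name has_vip is invented for this example.

```shell
# Hypothetical helper: succeed if the given address is assigned on this host.
# Input: `ip -4 -o addr show` output on stdin. $1 = address to look for.
has_vip() {
    awk -v vip="$1" '
        { split($4, a, "/"); if (a[1] == vip) found = 1 }   # strip the /prefix and compare exactly
        END { exit !found }'
}
```

During the failover test it could be run on each server as: ip -4 -o addr show | has_vip 10.0.0.100 && echo "holds the web VIP"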
11. Firewall configuration (production)
# Server1 and Server2
firewall-cmd --permanent --add-service=http
firewall-cmd --permanent --add-port=10051/tcp
firewall-cmd --permanent --add-port=3306/tcp
firewall-cmd --permanent --add-port=9000/tcp # HAProxy statistics page
firewall-cmd --permanent --add-protocol=vrrp # Keepalived
firewall-cmd --reload
# Server3
firewall-cmd --permanent --add-service=http
firewall-cmd --reload
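12. Monitoring and maintenance
MySQL replication status: check replication lag regularly.
HAProxy status: open http://10.0.0.12:9000/stats (or the same path on 10.0.0.15); HAProxy runs on Server1 and Server2, not on the Nginx host.
Keepalived status: check that the VIPs are working properly.
Zabbix self-monitoring: configure Zabbix to monitor the state of its own components.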
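The regular replication lag check can be automated by reading the Seconds_Behind_Master field of SHOW SLAVE STATUS\G. The function below is a minimal sketch; its name, replication_lag_ok, and whatever threshold is passed to it are assumptions, not values from this guide.

```shell
# Hypothetical helper: succeed if replication lag is known and within a limit.
# Input: `SHOW SLAVE STATUS\G` output on stdin. $1 = maximum acceptable lag in seconds.
replication_lag_ok() {
    awk -v max="$1" '
        $1 == "Seconds_Behind_Master:" { lag = $2; seen = 1 }
        # NULL means replication is not running, which counts as a failure.
        END { exit !(seen && lag != "NULL" && lag + 0 <= max + 0) }'
}
```

From cron or a Zabbix user parameter on the replica it could be called as: mysql -u root -p -e 'SHOW SLAVE STATUS\G' | replication_lag_ok 30 || echo "replication lag too high"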