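nginx Log Analysis Notes
nginx Log Analysis Basics
nginx logs are usually stored in the /var/log/nginx/ directory, which by default contains two log files: access.log and error.log. access.log records client requests, and error.log records server errors.
The log format can be customized in the nginx configuration file. The common default format is: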
log_format combined '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';
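As a minimal sketch of how the combined format maps onto individual fields, the Python snippet below parses one line with a regular expression. The group names simply mirror the nginx variables, and the sample line is made up for illustration; a custom log_format would need a different pattern.

```python
import re

# Regex for the "combined" format; group names mirror the nginx variables.
COMBINED_RE = re.compile(
    r'(?P<remote_addr>\S+) \S+ (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one combined-format line, or None if it does not match."""
    match = COMBINED_RE.match(line)
    return match.groupdict() if match else None


# Hypothetical sample line, not taken from a real log.
sample = (
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] '
    '"GET /index.html HTTP/1.1" 200 612 "-" "curl/8.0.1"'
)
fields = parse_line(sample)
print(fields["status"], fields["request"])  # 200 GET /index.html HTTP/1.1
```

Returning None for lines that do not match keeps truncated or malformed lines from aborting a longer analysis run.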
Common Analysis Tools
Command-Line Analysis with AWK
Quickly count the distribution of HTTP status codes:
awk '{print $9}' access.log | sort | uniq -c | sort -rn
Real-Time Analysis with GoAccess
Once GoAccess is installed, generate an HTML report:
goaccess access.log -o report.html --log-format=COMBINED
ELK Stack
Use Elasticsearch, Logstash, and Kibana to build a visual analysis platform. Example Logstash configuration:
input {
  file {
    path => "/var/log/nginx/access.log"
    start_position => "beginning"
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
Python Analysis Example
Traffic analysis with pandas:
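import pandas as pd

# Split on whitespace that is outside double quotes. [$time_local] contains
# a space, so it produces two columns: 'time' and 'tz'.
logs = pd.read_csv(
    'access.log',
    sep=r'\s(?=(?:[^"]*"[^"]*")*[^"]*$)',
    engine='python',
    header=None,
    names=['ip', 'ident', 'user', 'time', 'tz', 'request', 'status', 'size', 'referer', 'agent'],
)
# Ten client IPs with the most requests
top_ips = logs['ip'].value_counts().head(10)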
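Key Metrics Analysis
Traffic Anomaly Detection
List the ten busiest minutes by request count as a starting point for spotting sudden surges (MySQL syntax; assumes the logs have been loaded into a table named nginx_logs with a datetime column named time):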
SELECT DATE_FORMAT(time, '%Y-%m-%d %H:%i') AS minute,
       COUNT(*) AS requests
FROM nginx_logs
GROUP BY minute
ORDER BY requests DESC
LIMIT 10;
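The query above ranks minutes by volume but does not say which of them are unusual. One possible way to flag surges, sketched here in Python, is to compare each minute's count against the median across all minutes. The function names, the factor of 3, and the sample lines are assumptions for illustration, not part of the original notes, and the threshold would need tuning for real traffic.

```python
from collections import Counter
from datetime import datetime
from statistics import median


def requests_per_minute(lines):
    """Count requests per minute from combined-format log lines."""
    counts = Counter()
    for line in lines:
        start = line.find("[")
        end = line.find("]", start)
        if start == -1 or end == -1:
            continue  # no [time_local] field on this line
        try:
            # %b expects English month abbreviations (C/English locale).
            ts = datetime.strptime(line[start + 1:end], "%d/%b/%Y:%H:%M:%S %z")
        except ValueError:
            continue  # timestamp not in the expected format
        counts[ts.strftime("%Y-%m-%d %H:%M")] += 1
    return counts


def find_spikes(counts, factor=3.0):
    """Return sorted (minute, count) pairs whose count exceeds factor * median."""
    if not counts:
        return []
    baseline = median(counts.values())
    return sorted(
        (minute, n) for minute, n in counts.items() if n > factor * baseline
    )


def make_line(minute):
    """Build a hypothetical log line for the given minute (illustration only)."""
    return (
        f'203.0.113.7 - - [10/Oct/2023:13:{minute:02d}:00 +0000] '
        '"GET / HTTP/1.1" 200 612 "-" "-"'
    )


# Three quiet minutes followed by one minute with five requests.
sample = [make_line(m) for m in (55, 56, 57)] + [make_line(58)] * 5
print(find_spikes(requests_per_minute(sample)))  # [('2023-10-10 13:58', 5)]
```

The median is used as the baseline because a single burst barely moves it, whereas a mean would be pulled upward by the very spike being detected.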
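Slow Request Analysis
Filter requests that took longer than 5 seconds. This requires a log format that includes the $request_time field; the command below assumes it is the last field on each line: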
awk '$NF > 5 {print $7}' access.log | sort | uniq -c | sort -nr
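The same filter can be sketched in Python when the slow times themselves are of interest, not only how often each path was slow. This assumes, like the awk command, that $request_time has been appended as the last field of each line; the function name slow_paths and the sample lines are hypothetical.

```python
from collections import defaultdict


def slow_paths(lines, threshold=5.0):
    """Map request path -> list of request times (seconds) above threshold."""
    slow = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) < 8:
            continue  # too short to hold a path and a trailing request time
        try:
            elapsed = float(parts[-1])  # assumes $request_time is the last field
        except ValueError:
            continue  # last field is not a number
        if elapsed > threshold:
            slow[parts[6]].append(elapsed)  # field 7 is the request path
    return slow


# Hypothetical lines: combined format with $request_time appended.
prefix = '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] '
sample = [
    prefix + '"GET /report HTTP/1.1" 200 612 "-" "curl/8.0.1" 7.250',
    prefix + '"GET /report HTTP/1.1" 200 612 "-" "curl/8.0.1" 6.100',
    prefix + '"GET /index.html HTTP/1.1" 200 612 "-" "curl/8.0.1" 0.012',
    prefix + '"GET /export HTTP/1.1" 200 612 "-" "curl/8.0.1" 9.500',
]
slow = slow_paths(sample)

# Most frequently slow paths first, with the worst time seen for each.
for path, times in sorted(slow.items(), key=lambda kv: len(kv[1]), reverse=True):
    print(path, len(times), max(times))
```

Keeping the individual times per path makes it easy to go on to averages or percentiles without rereading the log.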
Security Analysis
Detect possible malicious scanning:
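from collections import Counter

with open('access.log') as f:
    # Field 7 is the request path; skip malformed lines with fewer fields
    urls = [parts[6] for parts in map(str.split, f) if len(parts) > 6]

# Substrings commonly probed by automated scanners
suspect_patterns = ['wp-admin', 'etc/passwd', '.git/']
hits = [url for url in urls if any(p in url for p in suspect_patterns)]
print(Counter(hits).most_common(10))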