Integrating OpenTelemetry + Grafana: End-to-End Observability for ABP VNext
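In modern microservice architectures, observability is a core capability for keeping systems stable and performant. This article targets a production-grade configuration: it sets up OpenTelemetry, the Collector, and Grafana in an ABP VNext application in one pass, covering end-to-end tracing, metrics, log correlation, and alerting.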
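📚 Contents
- Integrating OpenTelemetry + Grafana: End-to-End Observability for ABP VNext
- 🧠 Background
- 🏗️ Overall Architecture
- Deployment topology
- 🔧 Environment Setup
- NuGet package dependencies
- Environment configuration files
- appsettings.Development.json
- appsettings.Production.json
- Docker Compose (with Collector)
- otel-collector-config.yaml
- 🚀 Integration Steps
- 1️⃣ Register everything in the ABP module
- Flowchart: initialization and registration order
- 2️⃣ Custom traces and metrics
- 3️⃣ Expose the metrics endpoint in Program.cs
- 📊 Grafana Visualization and Alerting
- Flowchart: alert triggering
- 📦 One-Command Startup and Project Notes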
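🧠 Background
- OpenTelemetry: an open-source observability framework that unifies the collection and export of distributed traces, metrics, and logs.
- Grafana: a visualization platform that supports many data sources (Prometheus, Tempo, Elasticsearch, and others) and provides dashboards, alert rules, and a Service Map.
- ABP VNext: a modular application framework built on ASP.NET Core. Its middleware extensibility and built-in dependency injection make it a good fit for OpenTelemetry.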
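To make "distributed tracing" concrete, here is a minimal sketch that uses only the .NET base class library (`System.Diagnostics`), which is the API the OpenTelemetry .NET SDK builds on. The source name `Demo.Tracing` and the span names are hypothetical. It shows that a child span inherits the trace ID of its parent, which is what lets a backend stitch spans into a single trace.

```csharp
using System;
using System.Diagnostics;

// Use W3C trace IDs regardless of the runtime default.
Activity.DefaultIdFormat = ActivityIdFormat.W3C;
Activity.ForceDefaultIdFormat = true;

// "Demo.Tracing" is a hypothetical source name used only in this sketch.
var source = new ActivitySource("Demo.Tracing");

// StartActivity returns null unless a listener samples the source,
// so register one that records everything.
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Demo.Tracing",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllDataAndRecorded,
};
ActivitySource.AddActivityListener(listener);

string? parentTraceId, parentSpanId, childTraceId, childParentSpanId;

using (var parent = source.StartActivity("HandleRequest", ActivityKind.Server))
{
    parentTraceId = parent?.TraceId.ToString();
    parentSpanId = parent?.SpanId.ToString();

    // Activity.Current is now the parent, so the child links to it automatically.
    using (var child = source.StartActivity("QueryDatabase", ActivityKind.Client))
    {
        childTraceId = child?.TraceId.ToString();
        childParentSpanId = child?.ParentSpanId.ToString();
    }
}

Console.WriteLine($"parent: trace={parentTraceId} span={parentSpanId}");
Console.WriteLine($"child:  trace={childTraceId} parent-span={childParentSpanId}");
```

In a real application the OpenTelemetry SDK plays the role of the listener and exports the finished activities.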
🏗️ Overall Architecture
Deployment topology
🔧 Environment Setup
- .NET SDK
- ABP vNext
- Docker & Docker Compose
NuGet package dependencies
dotnet add package OpenTelemetry.Extensions.Hosting
dotnet add package OpenTelemetry.Instrumentation.AspNetCore
dotnet add package OpenTelemetry.Instrumentation.Http
dotnet add package OpenTelemetry.Instrumentation.Runtime
dotnet add package OpenTelemetry.Instrumentation.EntityFrameworkCore --prerelease
dotnet add package OpenTelemetry.Exporter.OpenTelemetryProtocol
dotnet add package OpenTelemetry.Exporter.Prometheus.AspNetCore --prerelease
dotnet add package OpenTelemetry.Extensions.Logging
Environment configuration files
appsettings.Development.json
{
  "OpenTelemetry": {
    "OtlpEndpoint": "http://collector:4317",
    "SamplerRatio": 1.0
  }
}
appsettings.Production.json
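{
  "OpenTelemetry": {
    "OtlpEndpoint": "https://otel-collector.internal:4317",
    "SamplerRatio": 0.1,
    "AuthToken": "${OTEL_EXPORTER_OTLP_HEADERS}"
  }
}
Note: the .NET configuration system does not expand ${...} placeholders in JSON files. Substitute the value at deploy time, or override the key with the environment variable OpenTelemetry__AuthToken.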
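Because a wrong endpoint or an out-of-range sampling ratio only shows up at runtime as missing telemetry, it can help to validate these settings at startup. The following is a minimal sketch using `System.Text.Json`. `ReadOtelSettings` is a hypothetical helper, not part of ABP or OpenTelemetry, and it reads the JSON directly instead of going through `IConfiguration` so that it stays self-contained.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: parse and validate the "OpenTelemetry" section shown above.
static (Uri Endpoint, double SamplerRatio) ReadOtelSettings(string json)
{
    using var doc = JsonDocument.Parse(json);
    var section = doc.RootElement.GetProperty("OpenTelemetry");

    var endpointText = section.GetProperty("OtlpEndpoint").GetString();
    if (!Uri.TryCreate(endpointText, UriKind.Absolute, out var endpoint) ||
        (endpoint.Scheme != Uri.UriSchemeHttp && endpoint.Scheme != Uri.UriSchemeHttps))
    {
        throw new FormatException($"OtlpEndpoint is not an absolute http(s) URI: '{endpointText}'");
    }

    // GetDouble parses the JSON number directly, so the host culture cannot affect it.
    var ratio = section.GetProperty("SamplerRatio").GetDouble();
    if (ratio < 0.0 || ratio > 1.0)
    {
        throw new ArgumentOutOfRangeException(nameof(json), ratio, "SamplerRatio must be within [0, 1].");
    }

    return (endpoint, ratio);
}

var dev = ReadOtelSettings(
    "{\"OpenTelemetry\":{\"OtlpEndpoint\":\"http://collector:4317\",\"SamplerRatio\":1.0}}");
Console.WriteLine($"endpoint={dev.Endpoint} ratio={dev.SamplerRatio}");
```

Failing fast here is a design choice: a misconfigured exporter otherwise fails silently.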
Docker Compose (with Collector)
version: '3.8'
services:
  collector:
    image: otel/opentelemetry-collector:0.76.0
    volumes:
      - ./otel-collector-config.yaml:/etc/otel/config.yaml
    command: ["--config", "/etc/otel/config.yaml"]
    ports:
      - "4317:4317"
      - "8888:8888"
  prometheus:
    image: prom/prometheus:v2.45.0
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
  tempo:
    image: grafana/tempo:2.4.1
    ports:
      - "3200:3200"
  grafana:
    image: grafana/grafana:10.0.3
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
volumes:
  grafana-data:
otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:
exporters:
  prometheus:
    endpoint: "0.0.0.0:8888"
  otlp:
    endpoint: "${OTEL_EXPORTER_OTLP_ENDPOINT}"
    headers:
      "Authorization": "Bearer ${OTEL_EXPORTER_OTLP_HEADERS}"
    tls:
      ca_file: "/etc/otel/ca.crt"
      cert_file: "/etc/otel/client.crt"
      key_file: "/etc/otel/client.key"
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
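Security recommendations:
- In Kubernetes, expose the Collector Service as ClusterIP and have client services reach it over mTLS or with a token.
- Keep ports 4317 and 8888 on the internal network; do not expose them publicly.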
🚀 Integration Steps
1️⃣ Register everything in the ABP module
// Requires: using System.Globalization; using OpenTelemetry.Logs; using OpenTelemetry.Metrics;
//           using OpenTelemetry.Resources; using OpenTelemetry.Trace;
public override void PreConfigureServices(ServiceConfigurationContext context)
{
    var services = context.Services;
    var configuration = services.GetConfiguration();

    var otlpEndpoint = new Uri(configuration["OpenTelemetry:OtlpEndpoint"]!);
    var authToken = configuration["OpenTelemetry:AuthToken"];
    // Parse with the invariant culture so "0.1" reads the same on every host locale.
    var samplerRatio = Convert.ToDouble(
        configuration["OpenTelemetry:SamplerRatio"], CultureInfo.InvariantCulture);

    // AddOpenTelemetryTracing/AddOpenTelemetryMetrics were removed in
    // OpenTelemetry.Extensions.Hosting 1.4; AddOpenTelemetry() replaces them.
    services.AddOpenTelemetry()
        .ConfigureResource(resource => resource.AddService("MyApp", serviceVersion: "1.0.0"))
        // Tracing
        .WithTracing(builder => builder
            .AddAspNetCoreInstrumentation(opts => opts.RecordException = true)
            .AddHttpClientInstrumentation()
            .AddEntityFrameworkCoreInstrumentation()
            .AddSource("MyApp")
            .SetSampler(new TraceIdRatioBasedSampler(samplerRatio))
            .AddOtlpExporter(opt =>
            {
                opt.Endpoint = otlpEndpoint;
                // Headers is a string of comma-separated key=value pairs.
                if (!string.IsNullOrEmpty(authToken))
                    opt.Headers = $"Authorization=Bearer {authToken}";
            }))
        // Metrics (runtime instrumentation is a metrics-only extension)
        .WithMetrics(builder => builder
            .AddAspNetCoreInstrumentation()
            .AddHttpClientInstrumentation()
            .AddRuntimeInstrumentation()
            .AddMeter("MyAppMetrics")
            .AddPrometheusExporter());

    // Logs
    services.AddLogging(logging => logging.AddOpenTelemetry(options =>
    {
        options.IncludeFormattedMessage = true;
        options.ParseStateValues = true;
        options.AddOtlpExporter(opt => opt.Endpoint = otlpEndpoint);
    }));
}
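The `SamplerRatio` setting controls what fraction of traces is kept. A ratio-based sampler decides from the trace ID itself, so every service that sees the same trace ID reaches the same decision. The sketch below illustrates that idea with BCL types only. `ShouldSample` is a hypothetical, simplified function and is not the exact algorithm used by `TraceIdRatioBasedSampler`.

```csharp
using System;
using System.Buffers.Binary;
using System.Diagnostics;

// Hypothetical, simplified ratio sampler: keep a trace when the first
// 8 bytes of its ID, read as an unsigned integer, fall below ratio * 2^64.
static bool ShouldSample(ActivityTraceId traceId, double ratio)
{
    if (ratio >= 1.0) return true;
    if (ratio <= 0.0) return false;

    Span<byte> bytes = stackalloc byte[16];
    traceId.CopyTo(bytes);
    ulong prefix = BinaryPrimitives.ReadUInt64BigEndian(bytes);
    return prefix < (ulong)(ratio * ulong.MaxValue);
}

// Random trace IDs are uniformly distributed, so roughly 10% should be kept.
int sampled = 0;
const int total = 10_000;
for (int i = 0; i < total; i++)
{
    if (ShouldSample(ActivityTraceId.CreateRandom(), 0.1)) sampled++;
}
Console.WriteLine($"sampled {sampled} of {total} traces at ratio 0.1");
```

Because the decision depends only on the trace ID and the ratio, it is deterministic: re-evaluating the same trace never flips the outcome.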
Flowchart: initialization and registration order
2️⃣ Custom traces and metrics
// Register the ActivitySource; its name must match AddSource("MyApp")
services.AddSingleton<ActivitySource>(_ => new ActivitySource("MyApp"));

// Use it in a business service
public class OrderService
{
    private readonly ActivitySource _activitySource;

    public OrderService(ActivitySource activitySource) => _activitySource = activitySource;

    public async Task ProcessAsync()
    {
        // StartActivity returns null when the trace is not sampled, hence the "?."
        using var activity = _activitySource.StartActivity("Order.Process", ActivityKind.Server);
        activity?.SetTag("order.id", Guid.NewGuid().ToString());
        // …business logic…
    }
}

// Custom metric example; the meter name must match AddMeter("MyAppMetrics")
var meter = new Meter("MyAppMetrics", "1.0.0");
var orderCounter = meter.CreateCounter<long>("order.processed.count");
orderCounter.Add(1, new KeyValuePair<string, object?>("status", "success"));
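To check that a custom counter actually emits measurements before wiring up an exporter, the BCL `MeterListener` can observe it in-process. This is a minimal sketch for local verification. The `totals` dictionary and the "failed" status value are illustrative assumptions, and in production the OpenTelemetry SDK takes the listener's place.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

using var meter = new Meter("MyAppMetrics", "1.0.0");
var orderCounter = meter.CreateCounter<long>("order.processed.count");

// Running totals per "status" tag value (illustrative only).
var totals = new Dictionary<string, long>();

using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    // Subscribe only to instruments from our own meter.
    if (instrument.Meter.Name == "MyAppMetrics") l.EnableMeasurementEvents(instrument);
};
listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) =>
{
    string status = "unknown";
    foreach (var tag in tags)
    {
        if (tag.Key == "status") status = tag.Value?.ToString() ?? "unknown";
    }
    totals[status] = totals.GetValueOrDefault(status) + value;
});
listener.Start();

orderCounter.Add(1, new KeyValuePair<string, object?>("status", "success"));
orderCounter.Add(1, new KeyValuePair<string, object?>("status", "success"));
orderCounter.Add(1, new KeyValuePair<string, object?>("status", "failed"));

foreach (var (status, count) in totals)
{
    Console.WriteLine($"{status}: {count}");
}
```

Counter callbacks run synchronously on the thread that calls `Add`, so the totals are complete as soon as the last `Add` returns.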
3️⃣ Expose the metrics endpoint in Program.cs
app.UseRouting();
app.UseOpenTelemetryPrometheusScrapingEndpoint("/metrics");
app.UseAuthentication();
app.UseAuthorization();
app.UseEndpoints(endpoints =>
{
    endpoints.MapControllers();
});
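Instrument names such as `order.processed.count` do not appear verbatim on the `/metrics` endpoint, because Prometheus metric names only allow letters, digits, `_`, and `:`. The sketch below approximates the usual OpenTelemetry-to-Prometheus mapping: invalid characters become underscores and monotonic counters gain a `_total` suffix. `ToPrometheusName` is a hypothetical helper. The exact output depends on the exporter version (for example, unit suffixes), so treat this as an assumption and confirm the real names against `/metrics`.

```csharp
using System;
using System.Text;

// Hypothetical helper approximating how an OpenTelemetry instrument name
// is turned into a Prometheus metric name.
static string ToPrometheusName(string otelName, bool isMonotonicCounter)
{
    var sb = new StringBuilder(otelName.Length + 6);
    foreach (char c in otelName)
    {
        bool valid = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                     (c >= '0' && c <= '9') || c == '_' || c == ':';
        sb.Append(valid ? c : '_');
    }

    var name = sb.ToString();
    if (isMonotonicCounter && !name.EndsWith("_total", StringComparison.Ordinal))
    {
        name += "_total";
    }
    return name;
}

Console.WriteLine(ToPrometheusName("order.processed.count", isMonotonicCounter: true));
// prints "order_processed_count_total"
```

Knowing the mapped name matters when writing PromQL for dashboards and alert rules.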
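📊 Grafana Visualization and Alerting
- Data sources
  - Tempo (traces), Prometheus (metrics)
- Recommended dashboards
  - HTTP latency histogram
  - Service Map (Trace Explorer)
  - Custom business metrics (order processing rate, success rate)
- Alert examples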
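# Metric and label names depend on the instrumentation version; verify them against /metrics.

# 95th-percentile request latency above 500 ms
histogram_quantile(0.95, sum(rate(request_duration_seconds_bucket[5m])) by (le)) > 0.5

# 5xx error rate above 1%
sum(rate(request_duration_seconds_count{status=~"5.."}[5m]))
  / sum(rate(request_duration_seconds_count[5m])) > 0.01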
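The two alert expressions are easier to tune once the arithmetic behind them is clear. The sketch below reproduces, in simplified form, how `histogram_quantile` interpolates linearly inside cumulative buckets, and how the error-rate ratio is formed. The bucket bounds and counts are made-up sample data, and `HistogramQuantile` is a hypothetical helper that skips edge cases the real Prometheus function handles.

```csharp
using System;

// Buckets are (upper bound "le", cumulative count), sorted by bound and ending with +Inf.
static double HistogramQuantile(double q, (double Le, double Count)[] buckets)
{
    double total = buckets[^1].Count;
    if (total == 0) return double.NaN;
    double rank = q * total;

    for (int i = 0; i < buckets.Length; i++)
    {
        if (buckets[i].Count < rank) continue;

        // Rank falls into the +Inf bucket: report the highest finite bound.
        if (double.IsPositiveInfinity(buckets[i].Le))
            return i > 0 ? buckets[i - 1].Le : double.NaN;

        double lower = i == 0 ? 0.0 : buckets[i - 1].Le;
        double below = i == 0 ? 0.0 : buckets[i - 1].Count;
        double inBucket = buckets[i].Count - below;
        if (inBucket == 0) return lower;

        // Linear interpolation between the bucket's lower and upper bounds.
        return lower + (buckets[i].Le - lower) * (rank - below) / inBucket;
    }
    return double.NaN;
}

// Made-up sample: 100 requests observed in the rate window.
var buckets = new (double Le, double Count)[]
{
    (0.1, 50), (0.25, 80), (0.5, 90), (1.0, 98), (double.PositiveInfinity, 100),
};

// rank = 95 lands in the (0.5, 1.0] bucket: 0.5 + 0.5 * (95 - 90) / (98 - 90) = 0.8125
double p95 = HistogramQuantile(0.95, buckets);
bool latencyAlert = p95 > 0.5;

// Made-up sample: 3 responses with a 5xx status out of 200 requests.
double errorRate = 3.0 / 200.0;
bool errorAlert = errorRate > 0.01;

Console.WriteLine($"p95={p95}s latencyAlert={latencyAlert} errorRate={errorRate} errorAlert={errorAlert}");
```

Since the result is interpolated, its accuracy is bounded by the bucket layout: a 500 ms threshold is only sharp if a bucket boundary sits at or near 0.5.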
Flowchart: alert triggering
📦 One-Command Startup and Project Notes
- Startup script start.sh (in the project root)
#!/usr/bin/env bash
cp appsettings.Development.json appsettings.json
docker-compose up -d
- README.md
  - Switching environments: edit appsettings.{Environment}.json
  - Collector configuration: otel-collector-config.yaml
  - Dashboards: grafana/dashboards/*.json
  - Load-test examples:
siege -c10 -t30S http://localhost:5000/api/health
wrk -c10 -t2 -d30s http://localhost:5000/api/health