
Comparison of K8s Metric Collection Solutions

Table of Contents

    • 1. Background
    • 2. Solutions
      • 2.1 MetricBeat
      • 2.2 Telegraf
      • 2.3 MetricServer
      • 2.4 Kubelet cAdvisor
    • 3. Comparison

1. Background

Megacloud Portal needs to add monitoring for K8S. The current requirement is:

Obtain the CPU/Memory metrics of the Nodes and Pods in K8S, and display the TopN after processing.

To implement this, the server needs to do the following:

  • Collect the resource metrics of K8S Nodes and Pods
  • Run ETL processing on the collected data and store it
  • Implement an API for the front end to acquire the data

The key is the collection and processing of metric data. The following is a brief introduction to, and comparison of, the related collection solutions.
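As a rough illustration of the "display TopN after processing" requirement, the sketch below sorts already-collected Pod records by CPU usage and keeps the first N. The `PodUsage` type and the `TopNByCPU` function are hypothetical names introduced for this example, not part of any of the tools discussed here.

```go
package main

import (
	"fmt"
	"sort"
)

// PodUsage is a hypothetical flattened record produced by the ETL step.
type PodUsage struct {
	Namespace     string
	Name          string
	CPUMilliCores int64
	MemoryBytes   int64
}

// TopNByCPU returns the n pods with the highest CPU usage, highest first.
// It works on a copy, so the caller's slice is left untouched.
func TopNByCPU(pods []PodUsage, n int) []PodUsage {
	sorted := make([]PodUsage, len(pods))
	copy(sorted, pods)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].CPUMilliCores > sorted[j].CPUMilliCores
	})
	// Clamp n to the valid range [0, len(sorted)].
	if n < 0 {
		n = 0
	}
	if n > len(sorted) {
		n = len(sorted)
	}
	return sorted[:n]
}

func main() {
	pods := []PodUsage{
		{Namespace: "kube-system", Name: "coredns-f9fd979d6-jzv8q", CPUMilliCores: 4, MemoryBytes: 10 << 20},
		{Namespace: "kube-system", Name: "etcd-tk01", CPUMilliCores: 14, MemoryBytes: 50 << 20},
		{Namespace: "kube-system", Name: "kube-apiserver-tk01", CPUMilliCores: 31, MemoryBytes: 293 << 20},
	}
	for _, p := range TopNByCPU(pods, 2) {
		fmt.Printf("%s/%s %dm\n", p.Namespace, p.Name, p.CPUMilliCores)
	}
}
```

A TopN by memory would follow the same pattern with a different comparison function.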

2. Solutions

2.1 MetricBeat

MetricBeat is a metric collection tool provided by Elastic. It can collect metrics from many open source software products, including Kubernetes, and can send the data to Elasticsearch, Kafka, Redis, or Logstash for processing or storage.

The following shows the format of the data that MetricBeat collects for K8S Nodes, Pods, and containers.

  • Node Data Format (information other than CPU/Memory has been omitted)

{
  "@timestamp": "2017-04-06T15:29:27.150Z",
  "beat": {"hostname": "beathost", "name": "beathost", "version": "6.0.0-alpha1"},
  "kubernetes": {
    "node": {
      "cpu": {
        "usage": {"core": {"ns": 7247863769557035}, "nanocores": 1662117892}
      },
      "memory": {
        "available": {"bytes": 134202847232},
        "majorpagefaults": 1044,
        "pagefaults": 83482928,
        "rss": {"bytes": 178053120},
        "usage": {"bytes": 67062091776},
        "workingset": {"bytes": 51496206336}
      },
      "name": "localhost",
      "start_time": "2017-02-08T10:33:38Z"
    }
  },
  "metricset": {"host": "localhost:10255", "module": "kubernetes", "name": "node", "rtt": 650741},
  "type": "metricsets"
}

{
  "beat": {"hostname": "X1", "name": "X1", "version": "6.0.0-alpha1"},
  "kubernetes": {
    "node": {
      "cpu": {"allocatable": {"cores": 2}, "capacity": {"cores": 2}},
      "memory": {"allocatable": {"bytes": 2097786880}, "capacity": {"bytes": 2097786880}},
      "name": "minikube",
      "pod": {"allocatable": {"total": 110}, "capacity": {"total": 110}},
      "status": {"ready": "true", "unschedulable": false}
    }
  },
  "metricset": {"host": "192.168.99.100:18080"}
}

  • Pod Data Format

{
  "@timestamp": "2017-04-06T15:29:27.150Z",
  "beat": {"hostname": "beathost", "name": "beathost", "version": "6.0.0-alpha1"},
  "kubernetes": {
    "namespace": "ns",
    "node": {"name": "localhost"},
    "pod": {
      "name": "nginx-3137573019-pcfzh",
      "uid": "b89a812e-18cd-11e9-b333-080027190d51",
      "network": {
        "rx": {"bytes": 18999261, "errors": 0},
        "tx": {"bytes": 28580621, "errors": 0}
      },
      "start_time": "2017-04-06T12:09:05Z"
    }
  },
  "metricset": {"host": "localhost:10255", "module": "kubernetes", "name": "pod", "rtt": 636230},
  "type": "metricsets"
}

  • Container Data Format

{
  "@timestamp": "2017-04-06T15:29:27.150Z",
  "beat": {"hostname": "beathost", "name": "beathost", "version": "6.0.0-alpha1"},
  "kubernetes": {
    "container": {
      "cpu": {
        "usage": {"core": {"ns": 3305756719}, "nanocores": 5992}
      },
      "memory": {
        "available": {"bytes": 0},
        "majorpagefaults": 47,
        "pagefaults": 2298,
        "rss": {"bytes": 1441792},
        "usage": {"bytes": 7643136},
        "workingset": {"bytes": 1466368}
      },
      "name": "nginx"
    },
    "namespace": "ns",
    "node": {"name": "localhost"},
    "pod": {"name": "nginx-3137573019-pcfzh"}
  },
  "metricset": {"host": "localhost:10255", "module": "kubernetes", "name": "container", "rtt": 650739},
  "type": "metricsets"
}
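To give an idea of what an ETL step over these events might involve, here is a minimal Go sketch that decodes a container event and flattens it into one record keyed by namespace, Pod, and container. The `ContainerRecord` type and `ParseContainerEvent` function are hypothetical, only the fields needed for CPU/Memory are mapped, and the choice of the working-set value as the memory figure is an assumption. The input must be valid JSON, so a stray trailing comma in an event makes decoding fail.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// containerEvent mirrors only the parts of a MetricBeat container event
// that are needed for CPU/Memory reporting.
type containerEvent struct {
	Kubernetes struct {
		Namespace string `json:"namespace"`
		Pod       struct {
			Name string `json:"name"`
		} `json:"pod"`
		Container struct {
			Name string `json:"name"`
			CPU  struct {
				Usage struct {
					NanoCores int64 `json:"nanocores"`
				} `json:"usage"`
			} `json:"cpu"`
			Memory struct {
				WorkingSet struct {
					Bytes int64 `json:"bytes"`
				} `json:"workingset"`
			} `json:"memory"`
		} `json:"container"`
	} `json:"kubernetes"`
}

// ContainerRecord is a hypothetical flat row, ready for storage.
type ContainerRecord struct {
	Namespace       string
	Pod             string
	Container       string
	CPUNanoCores    int64
	WorkingSetBytes int64
}

// ParseContainerEvent decodes one event and flattens it into a record.
func ParseContainerEvent(data []byte) (ContainerRecord, error) {
	var ev containerEvent
	if err := json.Unmarshal(data, &ev); err != nil {
		return ContainerRecord{}, err
	}
	k := ev.Kubernetes
	return ContainerRecord{
		Namespace:       k.Namespace,
		Pod:             k.Pod.Name,
		Container:       k.Container.Name,
		CPUNanoCores:    k.Container.CPU.Usage.NanoCores,
		WorkingSetBytes: k.Container.Memory.WorkingSet.Bytes,
	}, nil
}

func main() {
	// A trimmed version of the container event, reduced to the mapped fields.
	raw := []byte(`{"kubernetes": {"container": {"cpu": {"usage": {"nanocores": 5992}},
		"memory": {"workingset": {"bytes": 1466368}}, "name": "nginx"},
		"namespace": "ns", "pod": {"name": "nginx-3137573019-pcfzh"}}}`)
	rec, err := ParseContainerEvent(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", rec)
}
```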

MetricBeat provides CPU/Memory metrics only for Nodes and containers, not for Pods. If we use MetricBeat for collection, we need to do the following:

  • Deploy MetricBeat on K8S, which may require many manual operations
  • We ne

2.2 Telegraf

Telegraf is open source software written in Go for metric collection. Like MetricBeat, it provides numerous plugins to collect data from multiple sources.

For Kubernetes, Telegraf provides a Kubernetes plugin that gets its data through Kubelet’s /stats/summary API. It can also be used together with the Prometheus plugin to collect more metric data.

The following are some of its metric data formats. Like MetricBeat, it does not provide Pod-level CPU/Memory statistics, so these have to be aggregated from the container data.

type NodeMetrics struct {
    NodeName         string             `json:"nodeName"`
    SystemContainers []ContainerMetrics `json:"systemContainers"`
    StartTime        time.Time          `json:"startTime"`
    CPU              CPUMetrics         `json:"cpu"`
    Memory           MemoryMetrics      `json:"memory"`
    Network          NetworkMetrics     `json:"network"`
    FileSystem       FileSystemMetrics  `json:"fs"`
    Runtime          RuntimeMetrics     `json:"runtime"`
}

// PodMetrics contains metric data on a given pod
type PodMetrics struct {
    PodRef     PodReference       `json:"podRef"`
    StartTime  *time.Time         `json:"startTime"`
    Containers []ContainerMetrics `json:"containers"`
    Network    NetworkMetrics     `json:"network"`
    Volumes    []VolumeMetrics    `json:"volume"`
}

// ContainerMetrics represents the metric data collect about a container from the kubelet
type ContainerMetrics struct {
    Name      string            `json:"name"`
    StartTime time.Time         `json:"startTime"`
    CPU       CPUMetrics        `json:"cpu"`
    Memory    MemoryMetrics     `json:"memory"`
    RootFS    FileSystemMetrics `json:"rootfs"`
    LogsFS    FileSystemMetrics `json:"logs"`
}
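The aggregation from container data to Pod-level figures could look roughly like the sketch below. The types are trimmed stand-ins for the Telegraf structs above, keeping only the fields used here. The `usageNanoCores` and `workingSetBytes` field names are taken from the Kubelet response in the Appendix, while `PodTotals` and `SumPod` are hypothetical names. Note that a plain sum over containers is an approximation and may differ slightly from what the Pod's own cgroup reports.

```go
package main

import "fmt"

// Trimmed stand-ins for the Telegraf types; only the fields used below.
type CPUMetrics struct {
	UsageNanoCores int64 `json:"usageNanoCores"`
}

type MemoryMetrics struct {
	WorkingSetBytes int64 `json:"workingSetBytes"`
}

type ContainerMetrics struct {
	Name   string        `json:"name"`
	CPU    CPUMetrics    `json:"cpu"`
	Memory MemoryMetrics `json:"memory"`
}

type PodReference struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
}

type PodMetrics struct {
	PodRef     PodReference       `json:"podRef"`
	Containers []ContainerMetrics `json:"containers"`
}

// PodTotals is a hypothetical Pod-level result row.
type PodTotals struct {
	Namespace       string
	Name            string
	CPUNanoCores    int64
	WorkingSetBytes int64
}

// SumPod adds up the CPU and memory usage of all containers in a Pod.
func SumPod(p PodMetrics) PodTotals {
	t := PodTotals{Namespace: p.PodRef.Namespace, Name: p.PodRef.Name}
	for _, c := range p.Containers {
		t.CPUNanoCores += c.CPU.UsageNanoCores
		t.WorkingSetBytes += c.Memory.WorkingSetBytes
	}
	return t
}

func main() {
	pod := PodMetrics{
		PodRef: PodReference{Name: "nginx-3137573019-pcfzh", Namespace: "ns"},
		Containers: []ContainerMetrics{
			{Name: "nginx", CPU: CPUMetrics{UsageNanoCores: 5992}, Memory: MemoryMetrics{WorkingSetBytes: 1466368}},
			{Name: "sidecar", CPU: CPUMetrics{UsageNanoCores: 1000}, Memory: MemoryMetrics{WorkingSetBytes: 2048}},
		},
	}
	fmt.Printf("%+v\n", SumPod(pod))
}
```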

2.3 MetricServer

MetricServer also obtains metric data through the /stats/summary API provided by Kubelet. It stores the data in memory and exposes it to external clients through an API registered via the kube-aggregator mechanism.

The fixed URL prefix of the API provided by MetricServer is /apis/metrics/v1alpha1/ (the prefix used in the Metrics API design; released Metrics Server versions serve the API under /apis/metrics.k8s.io/v1beta1/). The following endpoints are appended to this prefix to access metric data, and all of them support only the GET method:

  • /nodes: get the metric data of all Nodes.
  • /nodes/(node): get the metric data of the specified Node.
  • /namespaces/(namespace)/pods: get the metric data of all Pods in a given namespace.
  • /namespaces/(namespace)/pods/(pod): get the metric data of the specified Pod.
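As a small illustration of how a client might assemble these endpoints, the sketch below joins a configurable prefix with the four path patterns listed above. The `MetricsAPI` type and its methods are hypothetical helpers written for this example. The prefix is kept as a field because it depends on the API version served by the cluster.

```go
package main

import (
	"fmt"
	"path"
)

// MetricsAPI builds request paths for the four GET endpoints.
// It is a hypothetical helper, not part of any Kubernetes library.
type MetricsAPI struct {
	Prefix string // e.g. "/apis/metrics/v1alpha1/"
}

// Nodes returns the path for the metrics of all Nodes.
func (a MetricsAPI) Nodes() string {
	return path.Join(a.Prefix, "nodes")
}

// Node returns the path for the metrics of one Node.
func (a MetricsAPI) Node(node string) string {
	return path.Join(a.Prefix, "nodes", node)
}

// Pods returns the path for the metrics of all Pods in a namespace.
func (a MetricsAPI) Pods(namespace string) string {
	return path.Join(a.Prefix, "namespaces", namespace, "pods")
}

// Pod returns the path for the metrics of one Pod.
func (a MetricsAPI) Pod(namespace, pod string) string {
	return path.Join(a.Prefix, "namespaces", namespace, "pods", pod)
}

func main() {
	api := MetricsAPI{Prefix: "/apis/metrics/v1alpha1/"}
	fmt.Println(api.Nodes())
	fmt.Println(api.Node("tk01"))
	fmt.Println(api.Pods("kube-system"))
	fmt.Println(api.Pod("kube-system", "etcd-tk01"))
}
```

The resulting paths would be requested against the API server address with the usual cluster credentials, which are left out of this sketch.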

In addition, we can view the CPU/Memory metrics of Nodes and Pods with the kubectl top command in a terminal.

$ kubectl top nodes
NAME            CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
tk01            217m         10%    5296Mi          68%
vm-0-2-ubuntu   84m          4%     1189Mi          32%

$ kubectl top pods --all-namespaces
NAMESPACE     NAME                              CPU(cores)   MEMORY(bytes)
kube-system   coredns-f9fd979d6-jzv8q           4m           10Mi
kube-system   coredns-f9fd979d6-tx9m4           4m           10Mi
kube-system   etcd-tk01                         14m          50Mi
kube-system   kube-apiserver-tk01               31m          293Mi

K8S provides libraries to access the above APIs. We have implemented the first version of metric-server-collector, which uses the MetricServer API to obtain the CPU/Memory metrics of Nodes and Pods and converts them into the data format we need.
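Part of that conversion is turning quantity strings such as 217m or 5296Mi into plain numbers. The following is a simplified, standard-library-only sketch of that step and not the actual metric-server-collector code. `ParseCPUMilli` and `ParseMemoryBytes` are hypothetical names, and only a handful of suffixes are handled. A real collector would more likely rely on the quantity parser in k8s.io/apimachinery (`resource.ParseQuantity`).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ParseCPUMilli converts a CPU quantity such as "217m", "2" or
// "156958000n" into millicores, rounding down.
func ParseCPUMilli(q string) (int64, error) {
	switch {
	case strings.HasSuffix(q, "n"): // nanocores
		v, err := strconv.ParseInt(strings.TrimSuffix(q, "n"), 10, 64)
		return v / 1000000, err
	case strings.HasSuffix(q, "u"): // microcores
		v, err := strconv.ParseInt(strings.TrimSuffix(q, "u"), 10, 64)
		return v / 1000, err
	case strings.HasSuffix(q, "m"): // millicores
		return strconv.ParseInt(strings.TrimSuffix(q, "m"), 10, 64)
	default: // whole or fractional cores
		v, err := strconv.ParseFloat(q, 64)
		return int64(v * 1000), err
	}
}

// binarySuffixes lists the binary memory suffixes this sketch understands.
var binarySuffixes = []struct {
	suffix string
	factor int64
}{
	{"Ki", 1 << 10},
	{"Mi", 1 << 20},
	{"Gi", 1 << 30},
	{"Ti", 1 << 40},
}

// ParseMemoryBytes converts a memory quantity such as "5296Mi" into bytes.
// A value without a suffix is treated as a byte count.
func ParseMemoryBytes(q string) (int64, error) {
	for _, u := range binarySuffixes {
		if strings.HasSuffix(q, u.suffix) {
			v, err := strconv.ParseInt(strings.TrimSuffix(q, u.suffix), 10, 64)
			return v * u.factor, err
		}
	}
	return strconv.ParseInt(q, 10, 64)
}

func main() {
	cpu, err := ParseCPUMilli("217m")
	if err != nil {
		panic(err)
	}
	mem, err := ParseMemoryBytes("5296Mi")
	if err != nil {
		panic(err)
	}
	fmt.Println(cpu, mem)
}
```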

2.4 Kubelet cAdvisor

Kubelet integrates cAdvisor to collect statistics on the CPU, memory, file system, and network usage of the containers on a node. It exposes these metrics through the /stats/summary API, which both the Telegraf and MetricServer solutions described above rely on. We can therefore call Kubelet’s API directly to obtain metric data instead of using those tools.

For this solution, the following changes need to be made to the current metric-server-collector:

  • Rewrite the scraper code so that it calls the Kubelet API instead of the MetricServer API to obtain metric data.
  • Convert the data returned by Kubelet into the required data format.
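The conversion step might be sketched as follows: decode the /stats/summary response, then emit one row for the Node and one per Pod. This is an illustration under stated assumptions rather than the collector's real code. The `Sample` type and `Convert` function are hypothetical, the JSON field names follow the response shown in the Appendix, and reporting working-set bytes as the memory figure is an assumption. Fetching the response from Kubelet over HTTP, including authentication, is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type CPUStats struct {
	UsageNanoCores int64 `json:"usageNanoCores"`
}

type MemoryStats struct {
	WorkingSetBytes int64 `json:"workingSetBytes"`
}

// Summary maps only the parts of the Kubelet response used below.
type Summary struct {
	Node struct {
		NodeName string      `json:"nodeName"`
		CPU      CPUStats    `json:"cpu"`
		Memory   MemoryStats `json:"memory"`
	} `json:"node"`
	Pods []struct {
		PodRef struct {
			Name      string `json:"name"`
			Namespace string `json:"namespace"`
		} `json:"podRef"`
		CPU    CPUStats    `json:"cpu"`
		Memory MemoryStats `json:"memory"`
	} `json:"pods"`
}

// Sample is a hypothetical output row of the collector.
type Sample struct {
	Kind          string // "node" or "pod"
	Namespace     string // empty for nodes
	Name          string
	CPUMilliCores int64
	MemoryBytes   int64
}

// Convert turns a /stats/summary response into Node and Pod samples.
// Nanocores are converted to millicores, rounding down.
func Convert(data []byte) ([]Sample, error) {
	var s Summary
	if err := json.Unmarshal(data, &s); err != nil {
		return nil, err
	}
	out := []Sample{{
		Kind:          "node",
		Name:          s.Node.NodeName,
		CPUMilliCores: s.Node.CPU.UsageNanoCores / 1000000,
		MemoryBytes:   s.Node.Memory.WorkingSetBytes,
	}}
	for _, p := range s.Pods {
		out = append(out, Sample{
			Kind:          "pod",
			Namespace:     p.PodRef.Namespace,
			Name:          p.PodRef.Name,
			CPUMilliCores: p.CPU.UsageNanoCores / 1000000,
			MemoryBytes:   p.Memory.WorkingSetBytes,
		})
	}
	return out, nil
}

func main() {
	// Values trimmed from the response in the Appendix.
	raw := []byte(`{
		"node": {"nodeName": "tk01",
			"cpu": {"usageNanoCores": 1078216675},
			"memory": {"workingSetBytes": 5632106496}},
		"pods": [{"podRef": {"name": "etcd-tk01", "namespace": "kube-system"},
			"cpu": {"usageNanoCores": 161540656},
			"memory": {"workingSetBytes": 67014656}}]}`)
	samples, err := Convert(raw)
	if err != nil {
		panic(err)
	}
	for _, s := range samples {
		fmt.Printf("%+v\n", s)
	}
}
```

Because the response already carries Pod-level cpu and memory entries, no per-container aggregation is needed in this variant.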

3. Comparison

Based on the above information, the development and operations effort of each solution compares as follows:

| Solution | Development task | Development complexity | Deployment operation | Deployment complexity | Others |
|---|---|---|---|---|---|
| MetricBeat | Develop ETL tools to process data | ★ ★ ★ | MetricBeat + ETL tools + data transmission & storage components | ★ ★ ★ | |
| Telegraf | Develop Telegraf processor or related ETL tools to process data | ★ ★ ★ | Deploy Telegraf + ETL tools | ★ ★ ★ | |
| MetricServer | No additional development required | | MetricServer + collector | | The data is stored in memory, which consumes resources when the amount of data is large |
| Kubelet cAdvisor | Re-implement the collector, expected one week | ★ ★ | Only need to deploy collector | | |

Based on the above comparison, the preliminary conclusions are as follows:

  • The data format of MetricBeat does not meet the requirements and needs considerable additional processing, so it is not considered.

  • If only the CPU/Memory metrics of Nodes and Pods in K8S are collected, Telegraf is overkill and requires additional development and operations work. Unless more K8S metrics need to be collected, it can be set aside for now.

  • Compared with the above two solutions, deploying MetricServer on K8S is very simple, and combined with the metric-server-collector it meets the requirements.

  • The collector implementation based on the Kubelet API can be regarded as an optimization of the MetricServer-based implementation: it removes an unnecessary component and the resources that component consumes, and it gives us the raw data to convert on demand.

Appendix

  • Kubelet /stats/summary API response data

{
  "node": {
    "nodeName": "tk01",
    "systemContainers": [
      {
        "name": "kubelet",
        "startTime": "2021-01-26T05:10:01Z",
        "cpu": {"time": "2021-01-26T05:10:22Z", "usageNanoCores": 108726826, "usageCoreNanoSeconds": 2009799168},
        "memory": {"time": "2021-01-26T05:10:22Z", "usageBytes": 61022208, "workingSetBytes": 52633600, "rssBytes": 32174080, "pageFaults": 45899, "majorPageFaults": 253}
      }
    ],
    "startTime": "2020-09-12T12:43:58Z",
    "cpu": {"time": "2021-01-26T05:10:27Z", "usageNanoCores": 1078216675, "usageCoreNanoSeconds": 1758203828797878},
    "memory": {"time": "2021-01-26T05:10:27Z", "availableBytes": 2564227072, "usageBytes": 7543758848, "workingSetBytes": 5632106496, "rssBytes": 3926122496, "pageFaults": 2304667, "majorPageFaults": 859},
    "network": {
      "time": "2021-01-26T05:10:27Z",
      "name": "eth0",
      "rxBytes": 166587714846,
      "rxErrors": 0,
      "txBytes": 192097080030,
      "txErrors": 0,
      "interfaces": [
        {"name": "eth0", "rxBytes": 166587714846, "rxErrors": 0, "txBytes": 192097080030, "txErrors": 0}
      ]
    },
    "fs": {"time": "2021-01-26T05:10:27Z", "availableBytes": 23964233728, "capacityBytes": 52776349696, "usedBytes": 26562187264, "inodesFree": 2916648, "inodes": 3276800, "inodesUsed": 360152},
    "runtime": {
      "imageFs": {"time": "2021-01-26T05:10:27Z", "availableBytes": 23964233728, "capacityBytes": 52776349696, "usedBytes": 1377648552, "inodesFree": 2916648, "inodes": 3276800, "inodesUsed": 360152}
    },
    "rlimit": {"time": "2021-01-26T05:10:32Z", "maxpid": 32768, "curproc": 905}
  },
  "pods": [
    {
      "podRef": {"name": "etcd-tk01", "namespace": "kube-system", "uid": "2e8885329cb9c936db545fcd71666003"},
      "startTime": "2021-01-26T05:10:08Z",
      "containers": [
        {
          "name": "etcd",
          "startTime": "2021-01-26T05:10:09Z",
          "cpu": {"time": "2021-01-26T05:10:22Z", "usageNanoCores": 208950881, "usageCoreNanoSeconds": 2582358831},
          "memory": {"time": "2021-01-26T05:10:22Z", "usageBytes": 37478400, "workingSetBytes": 37081088, "rssBytes": 36110336, "pageFaults": 11531, "majorPageFaults": 10},
          "rootfs": {"time": "2021-01-26T05:10:22Z", "availableBytes": 23964233728, "capacityBytes": 52776349696, "usedBytes": 36864, "inodesFree": 2916648, "inodes": 3276800, "inodesUsed": 8},
          "logs": {"time": "2021-01-26T05:10:22Z", "availableBytes": 23964233728, "capacityBytes": 52776349696, "usedBytes": 28672, "inodesFree": 2916648, "inodes": 3276800, "inodesUsed": 360152}
        }
      ],
      "cpu": {"time": "2021-01-26T05:10:24Z", "usageNanoCores": 161540656, "usageCoreNanoSeconds": 34928899852771},
      "memory": {"time": "2021-01-26T05:10:24Z", "usageBytes": 71917568, "workingSetBytes": 67014656, "rssBytes": 36155392, "pageFaults": 0, "majorPageFaults": 0},
      "network": {},
      "ephemeral-storage": {"time": "2021-01-26T05:10:27Z", "availableBytes": 23964233728, "capacityBytes": 52776349696, "usedBytes": 65536, "inodesFree": 2916648, "inodes": 3276800, "inodesUsed": 8},
      "process_stats": {"process_count": 0}
    }
  ]
}