
III. The complete Kafka consumption flow

ai 2025/7/13 22:23:34

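V. Thread safety

1. What thread safety means

A resource is thread-safe if it always behaves correctly when multiple threads access it.

"Correctly" here means regardless of the runtime environment and of how the threads' accesses interleave, and without the calling code adding any extra synchronization or coordination.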
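To make the definition concrete, here is a minimal sketch that is not part of the original demo. The class names UnsafeCounter and SafeCounter are made up for illustration. Both counters receive the same number of concurrent increments; only the one built on AtomicInteger is guaranteed to end at the expected value.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadSafetySketch {

    /** Not thread-safe: count++ is a read, an add and a write, and threads can interleave between them. */
    static class UnsafeCounter {
        private int count = 0;
        void increment() { count++; }
        int get() { return count; }
    }

    /** Thread-safe: incrementAndGet() performs the whole update atomically. */
    static class SafeCounter {
        private final AtomicInteger count = new AtomicInteger();
        void increment() { count.incrementAndGet(); }
        int get() { return count.get(); }
    }

    /** Runs the task the given number of times on a fixed pool and waits for all runs to finish. */
    static void runConcurrently(int threads, int times, Runnable task) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < times; i++) {
            pool.submit(task);
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        UnsafeCounter unsafe = new UnsafeCounter();
        SafeCounter safe = new SafeCounter();
        runConcurrently(8, 100_000, unsafe::increment);
        runConcurrently(8, 100_000, safe::increment);
        // The safe counter always reaches 100000; the unsafe one may fall short.
        System.out.println("unsafe=" + unsafe.get() + ", safe=" + safe.get());
    }
}
```

The unsafe result varies from run to run, which is exactly the "affected by the runtime environment" behaviour the definition rules out.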

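2. A thread-safe producer in Java

The demo below only simulates several threads sending messages through one producer. It adds no thread-safety measures of its own: the producer is treated as a shared resource that every thread can use.

The producer shipped with the standard Kafka client (KafkaProducer) is itself thread-safe, so a single instance can be shared across threads. From the caller's side, sending is a single step: send() appends the record to the producer's internal buffer, and a background I/O thread delivers the buffered records to the broker. Sharing the producer between threads therefore does not, by itself, cause a message to be sent twice.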

package com.allwe.client.concurrent;

import com.allwe.client.partitioner.MyPartitioner;
import lombok.Data;
import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Thread-safe producer - test demo
 *
 * @Author: AllWe
 * @Date: 2024/09/27/9:30
 */
@Data
@Slf4j
public class ConcurrentProducerWorker {

    /** Number of messages */
    private static final int RECORD_COUNT = 1000;

    /** Fixed thread pool - the thread count equals the number of CPU cores */
    private static final ExecutorService executorService = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    /** Latch - lets main wait until every send task has run */
    private static final CountDownLatch countDownLatch = new CountDownLatch(RECORD_COUNT);

    /** Producer - all threads share this single instance */
    private static KafkaProducer<String, String> kafkaProducer;

    /** Create the producer instance when the class is initialized */
    static {
        // Set the properties
        Properties properties = new Properties();
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092");
        properties.put("key.serializer", StringSerializer.class);
        properties.put("value.serializer", StringSerializer.class);
        properties.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, MyPartitioner.class);
        kafkaProducer = new KafkaProducer<>(properties);
    }

    /** Entry point */
    public static void main(String[] args) {
        try {
            // Create the messages in a loop and hand each one to the pool
            for (int count = 0; count < RECORD_COUNT; count++) {
                ProducerRecord<String, String> record = new ProducerRecord<>("topic_6", "allwe", "allwe_" + count);
                executorService.submit(new ConcurrentProducer(record, kafkaProducer, countDownLatch));
            }
            countDownLatch.await();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Close the producer; close() waits for buffered records to be sent
            kafkaProducer.close();
            // Release the thread pool
            executorService.shutdown();
        }
    }
}

package com.allwe.client.concurrent;

import lombok.Data;
import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.concurrent.CountDownLatch;

/**
 * Thread-safe producer - test demo
 *
 * @Author: AllWe
 * @Date: 2024/09/27/9:30
 */
@Data
@Slf4j
public class ConcurrentProducer implements Runnable {

    /** Message to send */
    private ProducerRecord<String, String> record;

    /** Producer (shared between all tasks) */
    private KafkaProducer<String, String> producer;

    /** Latch counted down once per task */
    private CountDownLatch countDownLatch;

    public ConcurrentProducer(ProducerRecord<String, String> record, KafkaProducer<String, String> producer, CountDownLatch countDownLatch) {
        this.record = record;
        this.producer = producer;
        this.countDownLatch = countDownLatch;
    }

    @Override
    public void run() {
        try {
            String name = Thread.currentThread().getName();
            // Asynchronous send; the callback reports the result
            producer.send(record, new ConcurrentCallBackImpl(name));
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Count down even if send() throws, otherwise main would wait forever
            countDownLatch.countDown();
        }
    }
}

package com.allwe.client.concurrent;

import cn.hutool.core.util.ObjectUtil;
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.RecordMetadata;

/**
 * Callback that handles the result of an asynchronous send
 *
 * @Author: AllWe
 * @Date: 2024/09/27/9:30
 */
public class ConcurrentCallBackImpl implements Callback {

    private String threadName;

    public ConcurrentCallBackImpl(String threadName) {
        this.threadName = threadName;
    }

    @Override
    public void onCompletion(RecordMetadata recordMetadata, Exception e) {
        if (ObjectUtil.isNull(e)) {
            // Success: print the metadata returned by the broker
            System.out.println(threadName + "|-offset:" + recordMetadata.offset() + ",partition:" + recordMetadata.partition());
        } else {
            e.printStackTrace();
        }
    }
}

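3. A thread-safe consumer in Java

The consumer shipped with the Kafka client (KafkaConsumer) is not thread-safe. Consuming takes two steps, fetching messages and acknowledging them (committing the offset), so with several threads on one consumer the following interleaving is possible:

① Thread 1 fetches a message but has not yet committed its offset.

② Thread 2 comes in and consumes the same message again.

③ Thread 2 commits the offset.

④ Thread 1 commits the offset.

The message is consumed twice. In practice KafkaConsumer does not even allow this: when it detects unsynchronized access from more than one thread, it throws ConcurrentModificationException. Either way, the consumer needs extra handling in a multithreaded program.

There are two options:

① Put a lock around the consume step, at the cost of throughput.

② Give each thread its own consumer, so that each one only consumes the partitions assigned to it.

My demo uses the second option.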

package com.allwe.client.concurrent;

import lombok.Data;
import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

/**
 * Thread-safe consumer - test demo
 *
 * @Author: AllWe
 * @Date: 2024/09/27/12:19
 */
@Data
@Slf4j
public class ConcurrentConsumer implements Runnable {

    /** Consumer configuration */
    private Properties properties;

    /** Group id */
    private String groupId;

    /** Topic to consume */
    private String topicName;

    /** Consumer instance */
    private KafkaConsumer<String, String> consumer;

    public ConcurrentConsumer(Properties properties, String groupId, String topicName) {
        this.properties = properties;
        this.groupId = groupId;
        this.topicName = topicName;
        // Add the group id to the configuration
        properties.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        // Each task creates its own consumer, so no consumer instance is shared between threads
        consumer = new KafkaConsumer<>(properties);
        // Subscribe to the topic
        consumer.subscribe(Collections.singleton(topicName));
    }

    @Override
    public void run() {
        try {
            String threadName = Thread.currentThread().getName();
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    StringBuilder stringBuilder = new StringBuilder(threadName).append("|-");
                    stringBuilder.append("partition:").append(record.partition());
                    stringBuilder.append(",offset:").append(record.offset());
                    stringBuilder.append(",key:").append(record.key());
                    stringBuilder.append(",value:").append(record.value());
                    System.out.println(stringBuilder);
                }
            }
        } finally {
            consumer.close();
        }
    }
}

package com.allwe.client.concurrent;

import lombok.Data;
import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Thread-safe consumer - test demo
 *
 * @Author: AllWe
 * @Date: 2024/09/27/12:34
 */
@Data
@Slf4j
public class ConcurrentConsumerWorker {

    /** Number of consumer threads */
    private static final Integer THREAD_COUNT = 2;

    /** Thread pool - 2 threads; do not exceed the partition count of the target topic */
    private static ExecutorService executorService = Executors.newFixedThreadPool(THREAD_COUNT);

    public static void main(String[] args) {
        // Consumer configuration
        Properties properties = new Properties();
        properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092");
        properties.put("key.deserializer", StringDeserializer.class);
        properties.put("value.deserializer", StringDeserializer.class);
        // Start from the earliest offset when the group has no committed offset
        properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        for (int i = 0; i < THREAD_COUNT; i++) {
            executorService.submit(new ConcurrentConsumer(properties, "allwe01", "topic_6"));
        }
    }
}
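For comparison, the first option (locking) can be sketched without a broker. FakeConsumer below is a hypothetical stand-in, not a Kafka class: it only models the fetch-then-commit pair from the scenario above. Holding a ReentrantLock across both steps keeps another thread from fetching the same offset before it is committed.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class LockedConsumerSketch {

    /** Hypothetical stand-in for a consumer: fetch and commit are two separate steps. */
    static class FakeConsumer {
        private final int total;
        private int committed = 0; // next offset to hand out; not thread-safe on its own

        FakeConsumer(int total) { this.total = total; }

        /** Returns the next uncommitted offset, or -1 when everything has been consumed. */
        int fetch() { return committed < total ? committed : -1; }

        void commit(int offset) { committed = offset + 1; }
    }

    /** Consumes all messages with several threads; returns {distinct offsets seen, duplicates}. */
    static int[] run(int total, int threads) {
        FakeConsumer consumer = new FakeConsumer(total);
        ReentrantLock lock = new ReentrantLock();
        Set<Integer> seen = ConcurrentHashMap.newKeySet();
        AtomicInteger duplicates = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                while (true) {
                    lock.lock(); // fetch + commit happen as one unit
                    try {
                        int offset = consumer.fetch();
                        if (offset < 0) {
                            return;
                        }
                        if (!seen.add(offset)) {
                            duplicates.incrementAndGet();
                        }
                        consumer.commit(offset);
                    } finally {
                        lock.unlock();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] {seen.size(), duplicates.get()};
    }

    public static void main(String[] args) {
        int[] result = run(1000, 4);
        System.out.println("processed=" + result[0] + ", duplicates=" + result[1]);
    }
}
```

The lock makes the result correct, but the threads now take turns. That serialization is the throughput cost mentioned for this option, and it is why the demo above gives each thread its own consumer instead.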

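VI. Group coordination

1. Group leader

Every consumer group has a group leader, usually the first consumer to join the group. The leader receives the list of group members and the topics they subscribe to, reads the partition information of those topics, and works out which consumer gets which partitions.

The leader sees more than an ordinary consumer: it knows the partition assignment of every consumer in the group, whereas an ordinary consumer only sees its own assignment.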

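2. Consumer coordinator

The consumer coordinator lives on the client side: every consumer instance has one. On the group leader it computes the partition assignment, that is, which consumer in the group handles which topics and partitions.

The leader's consumer coordinator then sends the finished assignment to the group coordinator, which passes each consumer its own part.

The consumer coordinator also takes part whenever a member of the group goes offline or comes online. Its main duties are:

(1) Send the join request (JoinGroup) to the group coordinator.

(2) Send the sync request (SyncGroup): the group leader computes the assignment according to the assignment strategy, decides how the partitions are divided among the consumers, and sends the result to the group coordinator.

(3) Keep the heartbeat going with the group coordinator.

(4) Commit offsets (send the request that records the offsets consumed so far).

(5) Send the leave request when the consumer leaves the group on its own initiative.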

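3. Group coordinator

The group coordinator runs on a Kafka broker. Its main duties are:

(1) Handle consumers' requests to join the group, and elect the group leader.

(2) Trigger a partition rebalance when the membership changes and, on receiving the sync requests, hand the new assignment to the members.

(3) Keep the heartbeat going with the clients; when it finds that a client has dropped out, trigger a partition rebalance.

(4) Manage the offsets that consumers have committed. They are stored in the topic __consumer_offsets, which has 50 partitions by default.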
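The heartbeat in item 3 is driven by client-side settings. The sketch below lists the three consumer properties involved, using plain string keys so it runs without the Kafka client on the classpath. The values are illustrative only, not recommendations, and the helper followsOneThirdGuideline is a made-up name for the common rule of thumb that the heartbeat interval should be at most one third of the session timeout.

```java
import java.util.Properties;

public class HeartbeatConfigSketch {

    /** Builds consumer properties with illustrative (not recommended) timeout values. */
    static Properties heartbeatProperties() {
        Properties properties = new Properties();
        // How often the consumer sends a heartbeat to the group coordinator.
        properties.put("heartbeat.interval.ms", "3000");
        // If no heartbeat arrives within this window, the coordinator treats the consumer as dead.
        properties.put("session.timeout.ms", "30000");
        // Maximum gap between two poll() calls before the consumer is removed from the group.
        properties.put("max.poll.interval.ms", "300000");
        return properties;
    }

    /** Checks the usual guideline: heartbeat interval at most one third of the session timeout. */
    static boolean followsOneThirdGuideline(Properties properties) {
        long heartbeat = Long.parseLong(properties.getProperty("heartbeat.interval.ms"));
        long session = Long.parseLong(properties.getProperty("session.timeout.ms"));
        return heartbeat * 3 <= session;
    }

    public static void main(String[] args) {
        Properties properties = heartbeatProperties();
        System.out.println("guideline satisfied: " + followsOneThirdGuideline(properties));
    }
}
```

These entries can be added to the same Properties object that is passed to the KafkaConsumer constructor.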

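4. What happens when a new consumer joins a group

(1) Whenever a consumer client starts or reconnects, it sends a join request (JoinGroup) to the group coordinator.

(2) Once JoinGroup completes, the consumer coordinator sends a sync request (SyncGroup) to the group coordinator to obtain the new assignment.

(3) After joining, the consumer keeps sending heartbeats. The client-side parameters are heartbeat.interval.ms and session.timeout.ms; max.poll.interval.ms is a separate limit on the time between two poll() calls, and exceeding it also removes the consumer from the group.

(4) When a consumer client drops out, the leave-group handling is triggered.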
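The join and sync steps can be pictured with a toy in-memory model. This is a simplified sketch under stated assumptions, not Kafka's wire protocol or broker code: ToyGroupCoordinator and all of its methods are hypothetical names, the first member to join is treated as leader, and the assignment is written by hand rather than computed by an assignment strategy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ToyGroupCoordinator {

    private final List<String> members = new ArrayList<>();
    private Map<String, List<Integer>> assignment = new LinkedHashMap<>();
    private int generation = 0;

    /** JoinGroup: register the member, start a new generation, and return the current leader. */
    public String joinGroup(String memberId) {
        if (!members.contains(memberId)) {
            members.add(memberId);
        }
        generation++;
        return leader();
    }

    /** In this toy model the leader is simply the longest-standing member. */
    public String leader() {
        return members.isEmpty() ? null : members.get(0);
    }

    public int generation() {
        return generation;
    }

    /** SyncGroup from the leader: carries the assignment it worked out for every member. */
    public void syncGroupFromLeader(String memberId, Map<String, List<Integer>> computed) {
        if (!memberId.equals(leader())) {
            throw new IllegalStateException("only the leader may submit an assignment");
        }
        assignment = new LinkedHashMap<>(computed);
    }

    /** SyncGroup from any member: the reply contains only that member's partitions. */
    public List<Integer> syncGroup(String memberId) {
        return assignment.getOrDefault(memberId, Collections.emptyList());
    }

    /** LeaveGroup: drop the member and invalidate the old assignment. */
    public void leaveGroup(String memberId) {
        members.remove(memberId);
        assignment = new LinkedHashMap<>();
        generation++;
    }

    public static void main(String[] args) {
        ToyGroupCoordinator coordinator = new ToyGroupCoordinator();
        coordinator.joinGroup("consumer-A");
        String leader = coordinator.joinGroup("consumer-B");

        // The leader decides the split; here it is hard-coded for three partitions.
        Map<String, List<Integer>> plan = new LinkedHashMap<>();
        plan.put("consumer-A", Arrays.asList(0, 1));
        plan.put("consumer-B", Arrays.asList(2));
        coordinator.syncGroupFromLeader(leader, plan);

        System.out.println("leader=" + leader + ", generation=" + coordinator.generation());
        System.out.println("consumer-A -> " + coordinator.syncGroup("consumer-A"));
        System.out.println("consumer-B -> " + coordinator.syncGroup("consumer-B"));
    }
}
```

The point of the model is the division of labour: the coordinator tracks membership and relays the plan, while the leader is the only member that sees and decides the whole assignment.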

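5. Where consumer group information is stored

It is stored in the __consumer_offsets topic. The partition that holds a group's data is Math.abs(groupId.hashCode()) % 50, where 50 is the topic's default partition count.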
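The formula can be tried out directly. The sketch below is a stand-alone approximation: the method name partitionFor is chosen for illustration, and the special case for Integer.MIN_VALUE is added because Math.abs returns a negative number for that single input.

```java
public class OffsetsPartitionSketch {

    /** Default partition count of __consumer_offsets. */
    static final int DEFAULT_OFFSETS_PARTITIONS = 50;

    /** Returns the index of the __consumer_offsets partition that would hold the given group. */
    static int partitionFor(String groupId, int partitionCount) {
        int hash = groupId.hashCode();
        // Math.abs(Integer.MIN_VALUE) is still negative, so map that one value to 0.
        int nonNegative = (hash == Integer.MIN_VALUE) ? 0 : Math.abs(hash);
        return nonNegative % partitionCount;
    }

    public static void main(String[] args) {
        String groupId = "allwe01"; // the group id used in the consumer demo
        System.out.println(groupId + " -> __consumer_offsets-"
                + partitionFor(groupId, DEFAULT_OFFSETS_PARTITIONS));
    }
}
```

Because the result depends only on the group id and the partition count, every request for the same group lands on the same partition, and the broker leading that partition acts as the group's coordinator.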

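VII. Partition rebalancing

1. What it does

Within a single consumer group, rebalancing redistributes the partitions among the group's consumers.

(1) Suppose topic α has three partitions: ①, ② and ③.

(2) Two consumers, A and B, join. A handles partition ①, B handles partitions ② and ③.

(3) A third consumer, C, joins, and the rebalance moves partition ③ to C.

(4) Consumer C drops out, and the rebalance gives partition ③ back to A or B.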
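The example above can be replayed with a toy assignment function. This is only a sketch: the method assign deals partitions out to the consumers in turn, which is an assumption made for illustration and not one of Kafka's built-in assignment strategies. The exact split for two consumers therefore differs from the one in the example, but the effect of C joining and leaving is the same.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RebalanceSketch {

    /** Toy strategy: deal partitions 1..partitionCount out to the consumers in turn. */
    static Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
        Map<String, List<Integer>> result = new TreeMap<>();
        for (String consumer : consumers) {
            result.put(consumer, new ArrayList<>());
        }
        for (int partition = 1; partition <= partitionCount; partition++) {
            String owner = consumers.get((partition - 1) % consumers.size());
            result.get(owner).add(partition);
        }
        return result;
    }

    public static void main(String[] args) {
        // Every membership change recomputes the whole assignment.
        System.out.println("A and B joined -> " + assign(Arrays.asList("A", "B"), 3));
        System.out.println("C joins        -> " + assign(Arrays.asList("A", "B", "C"), 3));
        System.out.println("C drops out    -> " + assign(Arrays.asList("A", "B"), 3));
    }
}
```

With three consumers each one owns exactly one partition; once C is gone, its partition returns to one of the remaining consumers.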

2. Verifying partition rebalancing with Java code

package com.allwe.client.reBalance;

import lombok.Data;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Partition rebalance handler
 *
 * @Author: AllWe
 * @Date: 2024/10/17/8:05
 */
@Data
public class ReBalanceHandler implements ConsumerRebalanceListener {

    // Records the consumed offset of each partition
    public final static ConcurrentHashMap<TopicPartition, Long> partitionOffsetMap = new ConcurrentHashMap<TopicPartition, Long>();

    private final Map<TopicPartition, OffsetAndMetadata> currOffsets;

    private final KafkaConsumer<String, String> consumer;

    public ReBalanceHandler(Map<TopicPartition, OffsetAndMetadata> currOffsets, KafkaConsumer<String, String> consumer) {
        this.currOffsets = currOffsets;
        this.consumer = consumer;
    }

    // Before the rebalance:
    // a consumer must commit the offsets it has consumed before it gives up its partitions
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> collection) {
        // Thread id
        final String id = Thread.currentThread().getId() + "";
        System.out.println(id + "-onPartitionsRevoked called with: " + collection);
        System.out.println(id + "-rebalance is about to start, committing offsets. Current offsets: " + currOffsets);
        // Instead of consumer.commitSync(currOffsets), which commits to Kafka, the offsets
        // could be maintained by us: open a transaction and write them to a database.
        System.out.println("Partition offset table: " + partitionOffsetMap);
        for (TopicPartition topicPartition : collection) {
            OffsetAndMetadata current = currOffsets.get(topicPartition);
            // Nothing has been consumed from this partition yet, so there is nothing to record
            if (current == null) continue;
            partitionOffsetMap.put(topicPartition, current.offset());
        }
        // Commit synchronously and continue only after the commit succeeds
        consumer.commitSync(currOffsets);
    }

    // After the rebalance:
    // the consumer that takes over a partition resumes from the last recorded offset
    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> collection) {
        // Thread id
        final String threadId = Thread.currentThread().getId() + "";
        System.out.println(threadId + "|-rebalance finished, onPartitionsAssigned called with: " + collection);
        System.out.println("Partition offset table: " + partitionOffsetMap);
        for (TopicPartition topicPartition : collection) {
            System.out.println(threadId + "-topicPartition" + topicPartition);
            // Offset recorded before this consumer took over the partition
            Long offset = partitionOffsetMap.get(topicPartition);
            if (offset == null) continue;
            consumer.seek(topicPartition, offset);
        }
    }

    @Override
    public void onPartitionsLost(Collection<TopicPartition> partitions) {
        ConsumerRebalanceListener.super.onPartitionsLost(partitions);
    }
}

package com.allwe.client.reBalance;

import lombok.Data;
import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;

import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/**
 * Thread-safe consumer - test demo
 *
 * @Author: AllWe
 * @Date: 2024/09/27/12:19
 */
@Data
@Slf4j
public class ConcurrentConsumer implements Runnable {

    /** Consumer configuration */
    private Properties properties;

    /** Group id */
    private String groupId;

    /** Topic to consume */
    private String topicName;

    /** Consumer instance */
    private KafkaConsumer<String, String> consumer;

    /** Offsets consumed so far, per partition */
    private final Map<TopicPartition, OffsetAndMetadata> currOffsets = new HashMap<>();

    public ConcurrentConsumer(Properties properties, String groupId, String topicName) {
        this.properties = properties;
        this.groupId = groupId;
        this.topicName = topicName;
        // Add the group id to the configuration
        properties.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        // Each task creates its own consumer, so no consumer instance is shared between threads
        consumer = new KafkaConsumer<>(properties);
        // Subscribe to the topic and register the rebalance listener
        consumer.subscribe(Collections.singleton(topicName), new ReBalanceHandler(currOffsets, consumer));
    }

    @Override
    public void run() {
        try {
            String threadName = Thread.currentThread().getName();
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    StringBuilder stringBuilder = new StringBuilder(threadName).append("|-");
                    stringBuilder.append("partition:").append(record.partition());
                    stringBuilder.append(",offset:").append(record.offset());
                    stringBuilder.append(",key:").append(record.key());
                    stringBuilder.append(",value:").append(record.value());
                    System.out.println(stringBuilder);
                    // The offset to commit is the position of the next record to read,
                    // i.e. this record's offset + 1, tracked separately for each partition
                    currOffsets.put(new TopicPartition(record.topic(), record.partition()), new OffsetAndMetadata(record.offset() + 1, "no"));
                }
            }
        } finally {
            consumer.close();
        }
    }
}

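To use it, define a custom rebalance listener and pass it to the consumer's subscribe call; the consumer then invokes the listener's callbacks automatically whenever a rebalance happens.

// Subscribe to the topic and register the rebalance listener
consumer.subscribe(Collections.singleton(topicName), new ReBalanceHandler(currOffsets, consumer));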
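The bookkeeping behind such a listener can be sketched without the Kafka client. Everything below is hypothetical naming for illustration: partitions are plain integers rather than TopicPartition objects, and offsetsToSave plays the role of the revoke callback by returning what should be saved for the partitions being given up. The sketch assumes the usual convention that the saved value is the offset of the next record to read, that is, the last processed offset plus one.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

public class OffsetBookkeepingSketch {

    /** partition number -> next offset to read (last processed offset + 1). */
    private final Map<Integer, Long> nextOffsets = new HashMap<>();

    /** Call after a record has been processed. */
    public void recordProcessed(int partition, long offset) {
        nextOffsets.put(partition, offset + 1);
    }

    /** Offsets to save for the partitions that are about to be taken away. */
    public Map<Integer, Long> offsetsToSave(Collection<Integer> revoked) {
        Map<Integer, Long> toSave = new HashMap<>();
        for (Integer partition : revoked) {
            Long next = nextOffsets.get(partition);
            if (next != null) { // skip partitions from which nothing was consumed
                toSave.put(partition, next);
            }
        }
        return toSave;
    }

    public static void main(String[] args) {
        OffsetBookkeepingSketch tracker = new OffsetBookkeepingSketch();
        tracker.recordProcessed(0, 41);
        tracker.recordProcessed(0, 42);
        tracker.recordProcessed(1, 7);
        // Partitions 0 and 2 are revoked; only partition 0 has anything to save.
        System.out.println(tracker.offsetsToSave(Arrays.asList(0, 2))); // prints {0=43}
    }
}
```

Saving the next offset rather than the last processed one means that the consumer taking over the partition starts with the first unprocessed record instead of repeating the last one.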
