Tongyi-Zhiwen Open-Sources QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

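🎉 News

May 26, 2025: 🔥 We release 🤗QwenLong-L1-32B, the first long-context large reasoning model (LRM) trained with reinforcement learning for long-context reasoning. On seven long-context document question answering (DocQA) benchmarks, QwenLong-L1-32B outperforms flagship LRMs such as OpenAI-o3-mini and Qwen3-235B-A22B and performs on par with Claude-3.7-Sonnet-Thinking, placing it among the leading state-of-the-art LRMs.

May 26, 2025: 🔥 We also release 🤗DocQA-RL-1.6K, a specialized RL training dataset of 1,600 DocQA problems covering mathematical, logical, and multi-hop reasoning.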

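📚 Introduction

In this work, we propose QwenLong-L1, a novel reinforcement learning (RL) framework designed to help LRMs move from short-context proficiency to robust long-context generalization. In our preliminary experiments, we illustrate how the RL training dynamics of short-context and long-context reasoning differ.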

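Our framework enhances short-context LRMs through progressive context scaling during RL training. It comprises three core components: a warm-up supervised fine-tuning (SFT) phase that initializes a robust policy; a curriculum-guided RL phase that enables stable adaptation from short to long contexts; and a difficulty-aware retrospective sampling mechanism that adjusts training complexity across phases to incentivize policy exploration. We build on recent RL algorithms, including GRPO and DAPO, and use a hybrid reward function that combines rule-based and model-based binary outcome rewards to balance precision and recall. By making strategic use of group-relative advantages during policy optimization, the framework guides LRMs toward the effective reasoning patterns needed for robust long-context grounding and strong reasoning capability.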
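As a rough illustration of two ideas in the paragraph above, the sketch below combines a rule-based and a model-based binary reward and then computes group-relative advantages in the style of GRPO, where each sampled response's reward is normalized against the other responses drawn for the same prompt. Taking the maximum of the two rewards, using the population standard deviation, the epsilon constant, and all function names are assumptions made for illustration; none of them is taken from the QwenLong-L1 code base.

```python
from statistics import mean, pstdev
from typing import List


def hybrid_reward(rule_reward: int, judge_reward: int) -> int:
    """Combine two binary outcome rewards (assumption: take the maximum)."""
    return max(rule_reward, judge_reward)


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against its own sampling group, GRPO-style."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population standard deviation
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt: (rule-based reward, judge reward).
group = [(1, 1), (0, 1), (0, 0), (0, 0)]
rewards = [hybrid_reward(rule, judge) for rule, judge in group]
advantages = group_relative_advantages(rewards)
print(rewards)
print([round(a, 3) for a in advantages])
```

In this toy group, the second response fails the strict rule-based check but is accepted by the judge, so the hybrid reward still counts it as correct. Responses above the group mean receive positive advantages and the rest receive negative ones.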

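🎯 Model Release

We release 🤗 QwenLong-L1-32B, the first long-context LRM trained with reinforcement learning for long-context reasoning. On seven long-context DocQA benchmarks, QwenLong-L1-32B outperforms flagship LRMs such as OpenAI-o3-mini and Qwen3-235B-A22B and performs on par with Claude-3.7-Sonnet-Thinking, placing it among the leading state-of-the-art LRMs.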

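The evaluation results are shown below.

[Figure: evaluation results of QwenLong-L1-32B on the seven long-context DocQA benchmarks]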

🛠️ Requirements

# Create the conda environment
conda create -n qwenlongl1 python==3.10
conda activate qwenlongl1

# Install requirements
pip3 install -r requirements.txt

# Install verl
cd verl
pip3 install -e .

# Install vLLM
pip3 install vllm==0.7.3

# Install flash-attn
pip3 install flash-attn --no-build-isolation

🚀 Quick Start

The following shows how to run the model with 🤗 Transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Tongyi-Zhiwen/QwenLong-L1-32B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

# prepare the model input
template = """Please read the following text and answer the question below.

<text>
$DOC$
</text>

$Q$

Format your response as follows: "Therefore, the answer is (insert answer here)"."""
context = "<YOUR_CONTEXT_HERE>"
question = "<YOUR_QUESTION_HERE>"
prompt = template.replace('$DOC$', context.strip()).replace('$Q$', question.strip())
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs, max_new_tokens=10000, temperature=0.7, top_p=0.95
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parsing thinking content
try:
    # rindex finding 151649 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151649)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
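The try/except at the end of the snippet above splits the generated token ids at the last occurrence of the `</think>` token. The same logic can be tried in isolation on toy data, as in the following minimal sketch. The helper name `split_at_last` and the toy ids are hypothetical; in the snippet above the marker id is 151649.

```python
from typing import List, Tuple


def split_at_last(output_ids: List[int], end_id: int) -> Tuple[List[int], List[int]]:
    """Split token ids right after the LAST occurrence of end_id."""
    try:
        # Searching the reversed list finds the last occurrence in the original.
        index = len(output_ids) - output_ids[::-1].index(end_id)
    except ValueError:
        # No end-of-thinking marker: treat everything as the final answer.
        index = 0
    return output_ids[:index], output_ids[index:]


# Toy ids; 9 stands in for the </think> token id.
thinking, answer = split_at_last([5, 9, 6, 9, 7, 8], end_id=9)
print(thinking, answer)
```

Splitting at the last marker rather than the first means that a marker mentioned inside the reasoning itself does not cut the thinking content short.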

🗂️ Dataset

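To build a challenging RL dataset for verifiable long-context reasoning, we developed 🤗DocQA-RL-1.6K, which contains 1,600 DocQA problems across three reasoning domains:

(1) Mathematical reasoning: we use 600 problems from the DocMath dataset, which require numerical reasoning over long, specialized documents such as financial reports. For DocMath, we sample 75% of the items from each subset of its validation split for training and 25% for evaluation;

(2) Logical reasoning: we use DeepSeek-R1 to synthesize 600 multiple-choice questions that require logical analysis of real-world documents from our curated collection, spanning the legal, financial, insurance, and production domains;

(3) Multi-hop reasoning: we sample 200 examples from MultiHopRAG and 200 examples from Musique, with an emphasis on cross-document reasoning.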
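The composition above adds up to 600 + 600 + 200 + 200 = 1,600 problems. The per-subset 75%/25% split described for DocMath could look roughly like the sketch below. This is only an illustration under assumptions: the subset names, item ids, fixed seed, and rounding rule are placeholders and do not reflect the authors' actual preprocessing.

```python
import random
from typing import Dict, List, Tuple


def split_per_subset(
    subsets: Dict[str, List[str]], train_ratio: float = 0.75, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Shuffle each subset independently, then cut it into train/eval parts."""
    rng = random.Random(seed)
    train, evaluation = [], []
    for name in sorted(subsets):
        items = list(subsets[name])
        rng.shuffle(items)
        cut = int(len(items) * train_ratio)  # assumption: round down
        train.extend(items[:cut])
        evaluation.extend(items[cut:])
    return train, evaluation


# Hypothetical subsets with made-up item ids.
toy = {
    "subset_a": [f"a{i}" for i in range(8)],
    "subset_b": [f"b{i}" for i in range(4)],
}
train_items, eval_items = split_per_subset(toy)
print(len(train_items), len(eval_items))

# Composition stated in the text: math + logic + MultiHopRAG + Musique.
composition = {"math": 600, "logic": 600, "multihop_rag": 200, "musique": 200}
print(sum(composition.values()))
```

Splitting inside each subset, rather than over the pooled data, keeps every subset represented in both the training and the evaluation portion.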

Please download the following datasets and place them in ./datasets/ for training and evaluation.

RL training data: 🤗DocQA-RL-1.6K

Evaluation data: 🤗docmath, 🤗frames, 🤗longbench

💻 Training

We provide basic demo code for single-stage RL training with DAPO.

First, start a local verifier.

export CUDA_VISIBLE_DEVICES=0

vllm serve "Qwen/Qwen2.5-1.5B-Instruct" \
    --host 0.0.0.0 \
    --port 23547
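Once the verifier is being served, it can be queried through vLLM's OpenAI-compatible chat completions endpoint. The sketch below shows one way a model-based binary reward might be obtained from it. The judge prompt, the YES/NO protocol, and the helper names are hypothetical and are not the prompt or code used by the project; the default host assumes the server runs on the same machine.

```python
import json
import urllib.request


def build_judge_payload(question: str, gold: str, prediction: str,
                        model: str = "Qwen/Qwen2.5-1.5B-Instruct") -> dict:
    """Build an OpenAI-style chat request asking for a YES/NO verdict."""
    # Hypothetical judge prompt; the project's real prompt may differ.
    prompt = (
        "Decide whether the predicted answer matches the reference answer.\n"
        f"Question: {question}\nReference: {gold}\nPrediction: {prediction}\n"
        "Reply with YES or NO only."
    )
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,
        "max_tokens": 4,
    }


def parse_verdict(reply: str) -> int:
    """Map the verifier's reply to a binary reward."""
    return 1 if reply.strip().upper().startswith("YES") else 0


def judge(payload: dict, host: str = "127.0.0.1", port: int = 23547) -> int:
    """Send the request to the local vLLM server and return a 0/1 reward."""
    request = urllib.request.Request(
        f"http://{host}:{port}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return parse_verdict(body["choices"][0]["message"]["content"])


payload = build_judge_payload("What is 2 + 2?", "4", "four")
print(json.dumps(payload, indent=2))
# With the verifier server running: reward = judge(payload)
```

Building the payload separately from sending it makes the request easy to inspect without a running server.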

Then, start RL training on 4 nodes.

export PROJ_DIR="<YOUR_PROJ_DIR_HERE>"
export MASTER_IP="<YOUR_MASTER_IP_HERE>" # ray master ip
export NNODES=4 # total GPU nodes
export NODE_RANK=${RANK} # rank of current node
export PORT=6382
export WANDB_API_KEY="<YOUR_WANDB_API_KEY_HERE>"
export WANDB_PROJECT="QwenLong-L1"
export LLM_JUDGE=Y # 'Y': LLM JUDGE, 'N': RULE BASED
export VLLM_ATTENTION_BACKEND=FLASH_ATTN
# verifier
export VERIFIER_PATH="Qwen/Qwen2.5-1.5B-Instruct"
export VERIFIER_HOST="<YOUR_VERIFIER_HOST_HERE>"
export VERIFIER_PORT="23547"

ray_start_retry() {
    while true; do
        ray start --address="${MASTER_IP}:${PORT}"
        if [ $? -eq 0 ]; then
            break
        fi
        echo "Failed to connect to master, retrying in 5 seconds..."
        sleep 5
    done
}

check_ray_status() {
    until ray status >/dev/null 2>&1; do
        echo "Waiting for Ray cluster to be ready..."
        sleep 5
    done
}

if [ "$RANK" == "0" ]; then
    echo "Starting HEAD node..."
    ray start --head --port=${PORT}
    check_ray_status
    echo "Ray head node started successfully"
else
    echo "Starting WORKER node..."
    ray_start_retry
    check_ray_status
    echo "Successfully joined Ray cluster"
fi

if [ "$RANK" == "0" ]; then
    bash ${PROJ_DIR}/scripts/rl_4nodes_dapo.sh 2>&1 | tee ${PROJ_DIR}/logs/rl_log_$(date +%Y%m%d_%H%M%S).txt &
else
    sleep 30d
fi

wait

Hands-On Demo

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "Tongyi-Zhiwen/QwenLong-L1-32B"
quantization_config = BitsAndBytesConfig(load_in_4bit=True)

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    quantization_config=quantization_config,
)

# prepare the model input
template = """Please read the following text and answer the question below.

<text>
$DOC$
</text>

$Q$

Format your response as follows: "Therefore, the answer is (insert answer here)"."""

# fill in the context and the question
context = """
Renewable energy sources are crucial for addressing climate change and reducing dependence on fossil fuels. Solar power is one of the most abundant and widely available renewable energy sources. It converts sunlight directly into electricity using photovoltaic (PV) cells or indirectly through concentrated solar power (CSP) systems.

Wind energy is another rapidly growing renewable source. Wind turbines capture the kinetic energy of moving air and convert it into electrical energy. Onshore wind farms are more common, but offshore wind farms are becoming increasingly popular due to stronger and more consistent wind resources.

Hydroelectric power is generated by harnessing the energy of flowing water in rivers or dams. It is one of the oldest and most established renewable energy technologies, providing a reliable and flexible source of electricity.

Biomass energy uses organic materials such as wood, agricultural waste, and dedicated energy crops to produce heat, electricity, or biofuels. It is considered renewable because the carbon dioxide released during combustion is offset by the carbon dioxide absorbed during the growth of the biomass feedstock.

Geothermal energy taps into the Earth's internal heat to generate electricity or provide direct heating and cooling. It is a reliable and consistent energy source, particularly in regions with high geothermal activity.

The transition to renewable energy is driven by several factors, including environmental concerns, energy security, and technological advancements. However, challenges remain, such as the intermittency of solar and wind power, high initial costs, and the need for energy storage solutions.

Despite these challenges, the global renewable energy market is expanding rapidly. Many countries have set ambitious renewable energy targets, and investments in renewable energy technologies continue to grow.
The International Renewable Energy Agency (IRENA) projects that renewable energy could account for up to 86% of global electricity generation by 2050 if current trends continue.
"""

question = """
What are the main types of renewable energy sources mentioned in the text, and what are their respective advantages?
"""

# build the prompt
prompt = template.replace('$DOC$', context.strip()).replace('$Q$', question.strip())
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs, max_new_tokens=10000, temperature=0.7, top_p=0.95
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parsing thinking content
try:
    # rindex finding 151649 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151649)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)

Output:

thinking content: Okay, let me try to figure this out. The question is asking for the main types of renewable energy sources mentioned in the text and their respective advantages. First, I need to go through the text again to identify each renewable energy source and note down their advantages.

Starting with the first paragraph, it mentions solar power. The text says solar power uses photovoltaic cells or CSP systems. The advantage here is that it's abundant and widely available. Then, wind energy is next. The text talks about onshore and offshore wind farms. The advantage for wind is that offshore farms have stronger and more consistent winds, which I think makes them more reliable.

Hydroelectric power is mentioned next. It's described as one of the oldest and most established, providing reliable and flexible electricity. So the advantage here is reliability and flexibility. Biomass energy uses organic materials and is considered renewable because the CO2 released is offset by the growth of the feedstock. The advantage here is that it's renewable in terms of carbon balance.

Geothermal energy is next, using the Earth's internal heat. The advantage is that it's reliable and consistent, especially in areas with high geothermal activity. Wait, the question is about the main types and their advantages. Let me list them out:

1. Solar power: Abundant and widely available.
2. Wind energy: Stronger and more consistent winds offshore.
3. Hydroelectric power: Reliable and flexible.
4. Biomass energy: Carbon neutrality (offset by growth).
5. Geothermal energy: Reliable and consistent.

I think that's all the main types mentioned. The text also mentions challenges like intermittency for solar and wind, but the question is about advantages, so I should focus on the positive aspects each has. I need to make sure I didn't miss any. Let me check the text again.

Yes, the text lists solar, wind, hydroelectric, biomass, and geothermal. Each has their specific advantages as I noted. So the answer should list each type with their respective advantages.
</think>
content: Therefore, the answer is Solar power (abundant and widely available), wind energy (stronger and consistent offshore winds), hydroelectric power (reliable and flexible), biomass energy (carbon neutrality through growth of feedstock), and geothermal energy (reliable and consistent internal heat).
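
Because the prompt template asks the model to end with "Therefore, the answer is ...", the final answer can be pulled out of the content string with simple string handling before any rule-based comparison. The following is a minimal sketch; the helper name `extract_answer` and the trailing-period cleanup are assumptions for illustration, not the project's own parsing code.

```python
from typing import Optional

MARKER = "Therefore, the answer is"


def extract_answer(content: str) -> Optional[str]:
    """Return the text after the last answer marker, or None if it is absent."""
    _, found, tail = content.rpartition(MARKER)
    if not found:
        return None
    answer = tail.strip()
    # Simplification: drop one trailing period left over from the sentence.
    if answer.endswith("."):
        answer = answer[:-1].rstrip()
    return answer or None


print(extract_answer("Some reasoning. Therefore, the answer is 42."))
print(extract_answer("No final answer was given."))
```

Using the last marker rather than the first keeps the final statement when the model restates its answer after revising it.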