
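Qwen3-30B-A3B Model Analysis

Model Introduction

Qwen3-30B-A3B is a Mixture-of-Experts (MoE) large language model in the Qwen3 series released by Alibaba's Tongyi Qianwen (Qwen) team. As the name suggests, it has roughly 30B parameters in total, of which about 3B are active for each token. It is positioned as a strong yet efficient open-weight foundation for complex reasoning, efficient dialogue, and agent tasks.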

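  1. Seamless switching between two modes (the core breakthrough)
    Qwen3-30B-A3B supports both a "thinking mode" and a "non-thinking mode" within a single model and can switch between them.

    • Thinking mode: the model performs more deliberate chain-of-thought reasoning. It is particularly good at hard math problems, code generation and debugging, and tasks that need multi-step logical reasoning, and it is reported to outperform the previous generation's dedicated reasoning models.
    • Non-thinking mode: the model responds quickly and fluently, focusing on efficient, natural general-purpose dialogue. It does well at creative writing, multi-turn chat, and instruction following.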
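  2. Strong alignment with human preferences
    Thanks to carefully designed post-training, the model aligns well with human preferences. It generates text that is more natural, more creative, and closer to user intent, which benefits role-play, creative writing, and immersive dialogue.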

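  3. Strong agent capabilities
    In both thinking and non-thinking modes, the model shows strong tool-calling and task-planning ability. It can interpret user instructions, choose and call external tools (such as APIs, calculators, and search engines) on its own, and complete complex multi-step automated tasks, placing it among the leading open-source models for agent use.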

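  4. Broad multilingual support
    The model supports more than 100 languages and dialects. Beyond understanding and generation, it performs strongly at cross-lingual translation and multilingual instruction following, which makes it suitable for global applications.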

Model Performance

[Figure: model performance results (image not available)]

Loading the Model

from modelscope import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-30B-A3B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
/home/six/Zhou/test_source/gpt-oss-unsloth/.venv/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
Downloading Model from https://www.modelscope.cn to directory: /home/six/.cache/modelscope/hub/models/Qwen/Qwen3-30B-A3B
Downloading Model from https://www.modelscope.cn to directory: /home/six/.cache/modelscope/hub/models/Qwen/Qwen3-30B-A3B
Loading checkpoint shards: 100%|██████████| 16/16 [00:51<00:00,  3.24s/it]
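
Before loading, it can help to estimate how much memory the weights alone will need, since that determines whether `device_map="auto"` has to spread the model across several devices. The sketch below is a back-of-the-envelope calculation only. The parameter count of about 30.5 billion is an assumption (consistent with the "30B" in the model name), `weight_memory_gib` is a hypothetical helper, and activations and the KV cache are ignored.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}

# Assumption: roughly 30.5 billion parameters in total.
ASSUMED_PARAMS = 30.5e9


def weight_memory_gib(num_params, dtype):
    """Memory for the weights alone, in GiB (no activations, no KV cache)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gib(ASSUMED_PARAMS, dtype):.1f} GiB")
```

With `torch_dtype="auto"` the weights are loaded in the dtype stored in the checkpoint configuration, which is bfloat16 for this model, so the bfloat16 row is the relevant one here.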

Model Structure

model
Qwen3MoeForCausalLM(
  (model): Qwen3MoeModel(
    (embed_tokens): Embedding(151936, 2048)
    (layers): ModuleList(
      (0-47): 48 x Qwen3MoeDecoderLayer(
        (self_attn): Qwen3MoeAttention(
          (q_proj): Linear(in_features=2048, out_features=4096, bias=False)
          (k_proj): Linear(in_features=2048, out_features=512, bias=False)
          (v_proj): Linear(in_features=2048, out_features=512, bias=False)
          (o_proj): Linear(in_features=4096, out_features=2048, bias=False)
          (q_norm): Qwen3MoeRMSNorm((128,), eps=1e-06)
          (k_norm): Qwen3MoeRMSNorm((128,), eps=1e-06)
        )
        (mlp): Qwen3MoeSparseMoeBlock(
          (gate): Linear(in_features=2048, out_features=128, bias=False)
          (experts): ModuleList(
            (0-127): 128 x Qwen3MoeMLP(
              (gate_proj): Linear(in_features=2048, out_features=768, bias=False)
              (up_proj): Linear(in_features=2048, out_features=768, bias=False)
              (down_proj): Linear(in_features=768, out_features=2048, bias=False)
              (act_fn): SiLU()
            )
          )
        )
        (input_layernorm): Qwen3MoeRMSNorm((2048,), eps=1e-06)
        (post_attention_layernorm): Qwen3MoeRMSNorm((2048,), eps=1e-06)
      )
    )
    (norm): Qwen3MoeRMSNorm((2048,), eps=1e-06)
    (rotary_emb): Qwen3MoeRotaryEmbedding()
  )
  (lm_head): Linear(in_features=2048, out_features=151936, bias=False)
)
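
The "30B" and "A3B" in the model name can be checked against the layer shapes in this printout. The sketch below simply multiplies out the weight shapes; it is an illustration rather than an official accounting. The helper names are hypothetical, the value of 8 experts per token comes from `num_experts_per_tok` in the model configuration, and the rotary embedding is assumed to hold no learned weights.

```python
# Shapes copied from the printed module tree.
VOCAB, HIDDEN, NUM_LAYERS = 151936, 2048, 48
Q_OUT, KV_OUT, HEAD_DIM = 4096, 512, 128
NUM_EXPERTS, EXPERT_DIM = 128, 768
EXPERTS_PER_TOKEN = 8  # from "num_experts_per_tok" in the model configuration


def linear(n_in, n_out):
    """Weight count of a bias-free Linear layer."""
    return n_in * n_out


attention = (
    linear(HIDDEN, Q_OUT)          # q_proj
    + 2 * linear(HIDDEN, KV_OUT)   # k_proj and v_proj
    + linear(Q_OUT, HIDDEN)        # o_proj
    + 2 * HEAD_DIM                 # q_norm and k_norm
)
router = linear(HIDDEN, NUM_EXPERTS)                                   # mlp.gate
expert = 2 * linear(HIDDEN, EXPERT_DIM) + linear(EXPERT_DIM, HIDDEN)   # one Qwen3MoeMLP
layer_norms = 2 * HIDDEN                                               # two RMSNorms per layer
# embed_tokens, lm_head, and the final norm sit outside the decoder layers.
outside_layers = VOCAB * HIDDEN + linear(HIDDEN, VOCAB) + HIDDEN


def model_params(experts_counted):
    """Parameter count when `experts_counted` experts per layer are included."""
    per_layer = attention + router + experts_counted * expert + layer_norms
    return NUM_LAYERS * per_layer + outside_layers


total_params = model_params(NUM_EXPERTS)          # every expert stored in the checkpoint
active_params = model_params(EXPERTS_PER_TOKEN)   # experts actually used for one token
print(f"total:  {total_params / 1e9:.2f}B")
print(f"active: {active_params / 1e9:.2f}B per token")
```

Under these assumptions the total comes to about 30.5B parameters, while a single token touches only about 3.35B of them, because each layer runs 8 of its 128 experts.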


Model Configuration

model.config
Qwen3MoeConfig {
  "architectures": [
    "Qwen3MoeForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "decoder_sparse_step": 1,
  "eos_token_id": 151645,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 6144,
  "max_position_embeddings": 40960,
  "max_window_layers": 48,
  "mlp_only_layers": [],
  "model_type": "qwen3_moe",
  "moe_intermediate_size": 768,
  "norm_topk_prob": true,
  "num_attention_heads": 32,
  "num_experts": 128,
  "num_experts_per_tok": 8,
  "num_hidden_layers": 48,
  "num_key_value_heads": 4,
  "output_router_logits": false,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "router_aux_loss_coef": 0.001,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.55.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}
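
The fields `num_experts`, `num_experts_per_tok`, and `norm_topk_prob` describe how each token is routed to experts. The pure-Python sketch below shows one common way such routing works: softmax over the router logits, keep the top 8 experts, then rescale their weights to sum to 1. It reflects my understanding of softmax top-k routing and is not the library's actual code. `route_token` is a hypothetical helper, and random numbers stand in for the real gate output.

```python
import math
import random

NUM_EXPERTS = 128        # "num_experts" in the config
EXPERTS_PER_TOKEN = 8    # "num_experts_per_tok"
NORM_TOPK_PROB = True    # "norm_topk_prob"


def route_token(router_logits, top_k=EXPERTS_PER_TOKEN, normalize=NORM_TOPK_PROB):
    """Return [(expert_index, weight), ...] for a single token."""
    # Softmax over all experts (subtract the max for numerical stability).
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the top_k most probable experts.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    weights = [probs[i] for i in chosen]

    # Optionally rescale so the kept weights sum to 1.
    if normalize:
        kept = sum(weights)
        weights = [w / kept for w in weights]
    return list(zip(chosen, weights))


random.seed(0)
# Stand-in for the output of the router's Linear(2048 -> 128) layer.
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
routing = route_token(logits)
for expert_index, weight in routing:
    print(f"expert {expert_index:3d}  weight {weight:.3f}")
```

The token's MLP output is then the weighted sum of the 8 selected experts' outputs; the other 120 experts in that layer are not evaluated for this token.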

Using the Model

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
thinking content: <think>
Okay, the user is asking for a short introduction to large language models. Let me start by defining what they are. I should mention that they're AI systems trained on vast amounts of text data. Then, I need to explain their purpose, like generating human-like text and understanding language.

I should highlight key features such as their size, with billions of parameters, and how they use deep learning, maybe mention transformers. It's important to note their applications—like chatbots, content creation, and data analysis. Also, touch on their capabilities like multilingual support and reasoning. But I shouldn't forget to mention the challenges, like computational costs and potential biases. Keep it concise but informative. Let me structure that into a few paragraphs without getting too technical. Make sure it's easy to understand for someone who might not be familiar with AI terms.
</think>
content: A **large language model (LLM)** is an advanced artificial intelligence system designed to understand, generate, and interact with human language. Trained on vast amounts of text data from the internet, books, and other sources, these models learn patterns, grammar, and context to produce coherent responses, answer questions, write essays, code, or even engage in conversations. Powered by deep learning architectures like transformers, LLMs can handle complex tasks such as translation, summarization, and reasoning. Their scale—often involving billions of parameters—enables them to capture nuanced linguistic structures and adapt to diverse topics. While they excel at mimicking human-like text, they lack true understanding and rely on statistical patterns. LLMs are widely used in applications like virtual assistants, content creation, and customer service, but they also raise ethical and technical challenges, such as bias, misinformation, and computational costs.
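
The parsing step in the snippet above splits the generated ids at the last `</think>` token. The same logic can be pulled into a small helper and tried on toy ids without loading the model. `split_thinking` is a hypothetical name, the token id 151668 is taken from the comment in the original code, and the small integers are placeholders rather than real token ids.

```python
END_THINK_ID = 151668  # id of </think>, as given in the comment of the original snippet


def split_thinking(output_ids, end_think_id=END_THINK_ID):
    """Split generated ids into (thinking_ids, answer_ids) at the last </think>."""
    try:
        # Position just past the last occurrence of the end-of-thinking id.
        cut = len(output_ids) - output_ids[::-1].index(end_think_id)
    except ValueError:
        # No </think> found: treat the whole output as the answer.
        cut = 0
    return output_ids[:cut], output_ids[cut:]


# Toy ids stand in for real tokens.
thinking_ids, answer_ids = split_thinking([11, 12, END_THINK_ID, 21, 22])
print(thinking_ids, answer_ids)

no_think_ids, plain_answer_ids = split_thinking([21, 22, 23])
print(no_think_ids, plain_answer_ids)
```

The end marker stays in the thinking part, which matches the `</think>` visible at the end of the printed thinking content. When no `</think>` is generated, the thinking part is empty and the whole output is returned as the answer.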