
Qwen3-Embedding-0.6B Model Structure

Model Introduction

Qwen3-Embedding-0.6B is a text embedding model developed by Alibaba's Tongyi Qianwen (Qwen) team on top of the Qwen3 foundation model, designed specifically for text representation, retrieval, and reranking tasks. It delivers strong multilingual text understanding while remaining computationally efficient.

Model Performance

[Figures: benchmark results for Qwen3-Embedding models]

Model Loading

import torch
import torch.nn.functional as F
from torch import Tensor
from modelscope import AutoTokenizer, AutoModel

# Left padding so that the final position of every padded sequence holds a
# real token, which the last-token pooling used below depends on.
tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen3-Embedding-0.6B', padding_side='left')
model = AutoModel.from_pretrained('Qwen/Qwen3-Embedding-0.6B')
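
If a CUDA GPU with the flash-attn package is available, the model card also suggests enabling flash_attention_2 and half precision to speed up inference and cut memory use. A minimal sketch; the kwargs below are standard transformers loading options, not specific to this post:

import torch
from modelscope import AutoModel

# Optional faster loading path; assumes a CUDA GPU and the flash-attn package.
model = AutoModel.from_pretrained(
    'Qwen/Qwen3-Embedding-0.6B',
    attn_implementation='flash_attention_2',
    torch_dtype=torch.float16,  # halves memory versus the default float32
).cuda()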

Model Structure

model
Qwen3Model(
  (embed_tokens): Embedding(151669, 1024)
  (layers): ModuleList(
    (0-27): 28 x Qwen3DecoderLayer(
      (self_attn): Qwen3Attention(
        (q_proj): Linear(in_features=1024, out_features=2048, bias=False)
        (k_proj): Linear(in_features=1024, out_features=1024, bias=False)
        (v_proj): Linear(in_features=1024, out_features=1024, bias=False)
        (o_proj): Linear(in_features=2048, out_features=1024, bias=False)
        (q_norm): Qwen3RMSNorm((128,), eps=1e-06)
        (k_norm): Qwen3RMSNorm((128,), eps=1e-06)
      )
      (mlp): Qwen3MLP(
        (gate_proj): Linear(in_features=1024, out_features=3072, bias=False)
        (up_proj): Linear(in_features=1024, out_features=3072, bias=False)
        (down_proj): Linear(in_features=3072, out_features=1024, bias=False)
        (act_fn): SiLU()
      )
      (input_layernorm): Qwen3RMSNorm((1024,), eps=1e-06)
      (post_attention_layernorm): Qwen3RMSNorm((1024,), eps=1e-06)
    )
  )
  (norm): Qwen3RMSNorm((1024,), eps=1e-06)
  (rotary_emb): Qwen3RotaryEmbedding()
)
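
A quick way to see where the "0.6B" in the name comes from is to count parameters directly from the loaded model: the embedding table alone is 151669 × 1024 ≈ 155M, and the 28 decoder layers contribute roughly 440M more. A minimal sketch, run against the model object loaded above:

# Break the parameter count into embedding table vs. decoder stack.
total = sum(p.numel() for p in model.parameters())
embed = model.embed_tokens.weight.numel()
print(f"total: {total / 1e6:.1f}M  embeddings: {embed / 1e6:.1f}M  "
      f"decoder stack: {(total - embed) / 1e6:.1f}M")

Because tie_word_embeddings is true (see the config below), there is no separate output head to add on top of this.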

Model Configuration

model.config
Qwen3Config {
  "architectures": ["Qwen3ForCausalLM"],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_types": ["full_attention", "full_attention", "full_attention", "full_attention",
                  "full_attention", "full_attention", "full_attention", "full_attention",
                  "full_attention", "full_attention", "full_attention", "full_attention",
                  "full_attention", "full_attention", "full_attention", "full_attention",
                  "full_attention", "full_attention", "full_attention", "full_attention",
                  "full_attention", "full_attention", "full_attention", "full_attention",
                  "full_attention", "full_attention", "full_attention", "full_attention"],
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen3",
  "num_attention_heads": 16,
  "num_hidden_layers": 28,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000,
  "sliding_window": null,
  "tie_word_embeddings": true,
  "torch_dtype": "float32",
  "transformers_version": "4.55.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151669
}
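
Two entries in this config explain the projection shapes printed above: num_attention_heads=16 with head_dim=128 yields the 2048-dimensional q_proj output, while num_key_value_heads=8 yields the 1024-dimensional k_proj/v_proj outputs. In other words, this is grouped-query attention with two query heads sharing each KV head. A quick check:

cfg = model.config
# q_proj packs one 128-dim slice per query head: 16 * 128 = 2048
assert cfg.num_attention_heads * cfg.head_dim == 2048
# k_proj / v_proj pack one slice per KV head: 8 * 128 = 1024 (GQA)
assert cfg.num_key_value_heads * cfg.head_dim == 1024
print(cfg.num_attention_heads // cfg.num_key_value_heads, "query heads per KV head")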

Model Usage

def last_token_pool(last_hidden_states: Tensor,
                    attention_mask: Tensor) -> Tensor:
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]

def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'

# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'
queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instruction for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents
max_length = 8192

# Tokenize the input texts
batch_dict = tokenizer(
    input_texts,
    padding=True,
    truncation=True,
    max_length=max_length,
    return_tensors="pt",
)
batch_dict.to(model.device)
outputs = model(**batch_dict)
embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])

# normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:2] @ embeddings[2:].T)
print(scores.tolist())

[[0.7645569443702698, 0.14142519235610962], [0.1354975402355194, 0.5999550819396973]]
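
Each query scores highest against its matching document. For comparison, the same pipeline can be run through the sentence-transformers wrapper, which handles the instruction template, left padding, last-token pooling, and normalization internally. A sketch, assuming a recent sentence-transformers release that supports this model's prompt configuration:

from sentence_transformers import SentenceTransformer

st_model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
# prompt_name="query" applies the query-side instruction template for us.
query_embeddings = st_model.encode(
    ["What is the capital of China?", "Explain gravity"],
    prompt_name="query",
)
document_embeddings = st_model.encode(documents)  # documents list from above
similarity = st_model.similarity(query_embeddings, document_embeddings)
print(similarity)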
