
Qwen3-Embedding-0.6B Model Structure

Model Introduction

Qwen3-Embedding-0.6B is a text embedding model developed by Alibaba's Tongyi Qianwen (Qwen) team on top of the Qwen3 foundation model, designed specifically for text representation, retrieval, and reranking tasks. It delivers strong multilingual text understanding while remaining computationally efficient.

Model Performance

[Figures: benchmark results for Qwen3-Embedding models]

Model Loading

import torch
import torch.nn.functional as F
from torch import Tensor
from modelscope import AutoTokenizer, AutoModel

# Left padding so that the final position of every padded sequence holds a
# real token, which the last-token pooling used below depends on.
tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen3-Embedding-0.6B', padding_side='left')
model = AutoModel.from_pretrained('Qwen/Qwen3-Embedding-0.6B')
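
If a CUDA GPU with the flash-attn package is available, the model card also suggests enabling flash_attention_2 and half precision to speed up inference and cut memory use. A minimal sketch; the kwargs below are standard transformers loading options, not specific to this post:

import torch
from modelscope import AutoModel

# Optional faster loading path; assumes a CUDA GPU and the flash-attn package.
model = AutoModel.from_pretrained(
    'Qwen/Qwen3-Embedding-0.6B',
    attn_implementation='flash_attention_2',
    torch_dtype=torch.float16,  # halves memory versus the default float32
).cuda()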

Model Structure

model
Qwen3Model(
  (embed_tokens): Embedding(151669, 1024)
  (layers): ModuleList(
    (0-27): 28 x Qwen3DecoderLayer(
      (self_attn): Qwen3Attention(
        (q_proj): Linear(in_features=1024, out_features=2048, bias=False)
        (k_proj): Linear(in_features=1024, out_features=1024, bias=False)
        (v_proj): Linear(in_features=1024, out_features=1024, bias=False)
        (o_proj): Linear(in_features=2048, out_features=1024, bias=False)
        (q_norm): Qwen3RMSNorm((128,), eps=1e-06)
        (k_norm): Qwen3RMSNorm((128,), eps=1e-06)
      )
      (mlp): Qwen3MLP(
        (gate_proj): Linear(in_features=1024, out_features=3072, bias=False)
        (up_proj): Linear(in_features=1024, out_features=3072, bias=False)
        (down_proj): Linear(in_features=3072, out_features=1024, bias=False)
        (act_fn): SiLU()
      )
      (input_layernorm): Qwen3RMSNorm((1024,), eps=1e-06)
      (post_attention_layernorm): Qwen3RMSNorm((1024,), eps=1e-06)
    )
  )
  (norm): Qwen3RMSNorm((1024,), eps=1e-06)
  (rotary_emb): Qwen3RotaryEmbedding()
)
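
A quick way to see where the "0.6B" in the name comes from is to count parameters directly from the loaded model: the embedding table alone is 151669 × 1024 ≈ 155M, and the 28 decoder layers contribute roughly 440M more. A minimal sketch, run against the model object loaded above:

# Break the parameter count into embedding table vs. decoder stack.
total = sum(p.numel() for p in model.parameters())
embed = model.embed_tokens.weight.numel()
print(f"total: {total / 1e6:.1f}M  embeddings: {embed / 1e6:.1f}M  "
      f"decoder stack: {(total - embed) / 1e6:.1f}M")

Because tie_word_embeddings is true (see the config below), there is no separate output head to add on top of this.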

Model Configuration

model.config
Qwen3Config {
  "architectures": ["Qwen3ForCausalLM"],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_types": ["full_attention", "full_attention", "full_attention", "full_attention",
                  "full_attention", "full_attention", "full_attention", "full_attention",
                  "full_attention", "full_attention", "full_attention", "full_attention",
                  "full_attention", "full_attention", "full_attention", "full_attention",
                  "full_attention", "full_attention", "full_attention", "full_attention",
                  "full_attention", "full_attention", "full_attention", "full_attention",
                  "full_attention", "full_attention", "full_attention", "full_attention"],
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen3",
  "num_attention_heads": 16,
  "num_hidden_layers": 28,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000,
  "sliding_window": null,
  "tie_word_embeddings": true,
  "torch_dtype": "float32",
  "transformers_version": "4.55.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151669
}
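
Two entries in this config explain the projection shapes printed above: num_attention_heads=16 with head_dim=128 yields the 2048-dimensional q_proj output, while num_key_value_heads=8 yields the 1024-dimensional k_proj/v_proj outputs. In other words, this is grouped-query attention with two query heads sharing each KV head. A quick check:

cfg = model.config
# q_proj packs one 128-dim slice per query head: 16 * 128 = 2048
assert cfg.num_attention_heads * cfg.head_dim == 2048
# k_proj / v_proj pack one slice per KV head: 8 * 128 = 1024 (GQA)
assert cfg.num_key_value_heads * cfg.head_dim == 1024
print(cfg.num_attention_heads // cfg.num_key_value_heads, "query heads per KV head")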

Model Usage

def last_token_pool(last_hidden_states: Tensor,
                    attention_mask: Tensor) -> Tensor:
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]

def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'

# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'
queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instruction for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents
max_length = 8192

# Tokenize the input texts
batch_dict = tokenizer(
    input_texts,
    padding=True,
    truncation=True,
    max_length=max_length,
    return_tensors="pt",
)
batch_dict.to(model.device)
outputs = model(**batch_dict)
embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])

# normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:2] @ embeddings[2:].T)
print(scores.tolist())

[[0.7645569443702698, 0.14142519235610962], [0.1354975402355194, 0.5999550819396973]]
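
Each query scores highest against its matching document. For comparison, the same pipeline can be run through the sentence-transformers wrapper, which handles the instruction template, left padding, last-token pooling, and normalization internally. A sketch, assuming a recent sentence-transformers release that supports this model's prompt configuration:

from sentence_transformers import SentenceTransformer

st_model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
# prompt_name="query" applies the query-side instruction template for us.
query_embeddings = st_model.encode(
    ["What is the capital of China?", "Explain gravity"],
    prompt_name="query",
)
document_embeddings = st_model.encode(documents)  # documents list from above
similarity = st_model.similarity(query_embeddings, document_embeddings)
print(similarity)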
