
Qwen3-Reranker-0.6B Model Structure

web 2025/9/3 15:36:55

Model Loading

import torch
from modelscope import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Reranker-0.6B", padding_side='left')
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-Reranker-0.6B").eval()
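The tokenizer is loaded with padding_side='left' because the scoring code in the usage section reads the logits at position -1 of every sequence in the batch. The toy example below uses plain tensors rather than the real tokenizer (the pad id 0 and the token ids are made up for illustration) to show why that matters: with left padding, the last column is always a real token; with right padding, shorter rows end in padding.

```python
import torch

PAD = 0  # hypothetical pad id, for illustration only
seqs = [[11, 12, 13, 14], [21, 22]]  # two made-up token-id sequences of different length
width = max(len(s) for s in seqs)

# Left padding: pads go in front, so index -1 is the last real token of every row
left = torch.tensor([[PAD] * (width - len(s)) + s for s in seqs])
# Right padding: pads go at the end, so index -1 of a short row is a pad token
right = torch.tensor([s + [PAD] * (width - len(s)) for s in seqs])

print(left[:, -1].tolist())   # [14, 22]
print(right[:, -1].tolist())  # [14, 0]
```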

Model Structure and Configuration

model

Qwen3ForCausalLM(
  (model): Qwen3Model(
    (embed_tokens): Embedding(151669, 1024)
    (layers): ModuleList(
      (0-27): 28 x Qwen3DecoderLayer(
        (self_attn): Qwen3Attention(
          (q_proj): Linear(in_features=1024, out_features=2048, bias=False)
          (k_proj): Linear(in_features=1024, out_features=1024, bias=False)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=False)
          (o_proj): Linear(in_features=2048, out_features=1024, bias=False)
          (q_norm): Qwen3RMSNorm((128,), eps=1e-06)
          (k_norm): Qwen3RMSNorm((128,), eps=1e-06)
        )
        (mlp): Qwen3MLP(
          (gate_proj): Linear(in_features=1024, out_features=3072, bias=False)
          (up_proj): Linear(in_features=1024, out_features=3072, bias=False)
          (down_proj): Linear(in_features=3072, out_features=1024, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): Qwen3RMSNorm((1024,), eps=1e-06)
        (post_attention_layernorm): Qwen3RMSNorm((1024,), eps=1e-06)
      )
    )
    (norm): Qwen3RMSNorm((1024,), eps=1e-06)
    (rotary_emb): Qwen3RotaryEmbedding()
  )
  (lm_head): Linear(in_features=1024, out_features=151669, bias=False)
)
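Each of the 28 decoder layers in the printout combines RMSNorm, an attention block with 16 query heads but only 8 key/value heads (hence q_proj outputs 2048 features while k_proj/v_proj output 1024), per-head q_norm/k_norm over head_dim=128, and a gated SiLU MLP (gate_proj, up_proj, down_proj). Below is a minimal PyTorch sketch of one such layer built only from those shapes, assuming the pre-norm residual layout used by Qwen3. It is not the transformers implementation: the class names RMSNorm and DecoderLayerSketch are hypothetical, weights are random, and rotary position embeddings are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sizes taken from the printed structure / config of Qwen3-Reranker-0.6B
HIDDEN, HEADS, KV_HEADS, HEAD_DIM, FFN, EPS = 1024, 16, 8, 128, 3072, 1e-6


class RMSNorm(nn.Module):
    """Hypothetical helper: y = x / sqrt(mean(x^2) + eps) * weight over the last dim."""

    def __init__(self, dim, eps=EPS):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        var = x.pow(2).mean(dim=-1, keepdim=True)
        return x * torch.rsqrt(var + self.eps) * self.weight


class DecoderLayerSketch(nn.Module):
    """Hypothetical sketch of one decoder layer (no RoPE, random weights)."""

    def __init__(self):
        super().__init__()
        self.input_layernorm = RMSNorm(HIDDEN)
        self.post_attention_layernorm = RMSNorm(HIDDEN)
        # Grouped-query attention projections
        self.q_proj = nn.Linear(HIDDEN, HEADS * HEAD_DIM, bias=False)     # 1024 -> 2048
        self.k_proj = nn.Linear(HIDDEN, KV_HEADS * HEAD_DIM, bias=False)  # 1024 -> 1024
        self.v_proj = nn.Linear(HIDDEN, KV_HEADS * HEAD_DIM, bias=False)  # 1024 -> 1024
        self.o_proj = nn.Linear(HEADS * HEAD_DIM, HIDDEN, bias=False)     # 2048 -> 1024
        self.q_norm = RMSNorm(HEAD_DIM)  # applied per head over head_dim=128
        self.k_norm = RMSNorm(HEAD_DIM)
        # Gated MLP with SiLU activation
        self.gate_proj = nn.Linear(HIDDEN, FFN, bias=False)
        self.up_proj = nn.Linear(HIDDEN, FFN, bias=False)
        self.down_proj = nn.Linear(FFN, HIDDEN, bias=False)

    def attention(self, x):
        b, t, _ = x.shape
        q = self.q_norm(self.q_proj(x).view(b, t, HEADS, HEAD_DIM)).transpose(1, 2)
        k = self.k_norm(self.k_proj(x).view(b, t, KV_HEADS, HEAD_DIM)).transpose(1, 2)
        v = self.v_proj(x).view(b, t, KV_HEADS, HEAD_DIM).transpose(1, 2)
        # The real model applies rotary embeddings to q and k here; omitted in this sketch
        rep = HEADS // KV_HEADS  # each key/value head is shared by 2 query heads
        k = k.repeat_interleave(rep, dim=1)
        v = v.repeat_interleave(rep, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)  # (b, 16, t, 128)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, HEADS * HEAD_DIM))

    def mlp(self, x):
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

    def forward(self, x):
        x = x + self.attention(self.input_layernorm(x))       # residual block 1 (pre-norm)
        return x + self.mlp(self.post_attention_layernorm(x))  # residual block 2 (pre-norm)


layer = DecoderLayerSketch()
hidden = torch.randn(2, 5, HIDDEN)
print(layer(hidden).shape)  # torch.Size([2, 5, 1024])
print(sum(p.numel() for p in layer.parameters()))  # 15730944 parameters per layer
```

The per-layer parameter count of this sketch (15,730,944) follows directly from the Linear and RMSNorm shapes in the printout, since RoPE adds no trainable parameters.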

model.config

Qwen3Config {
  "architectures": [
    "Qwen3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_types": [
    "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention",
    "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention",
    "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention",
    "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention"
  ],
  "max_position_embeddings": 40960,
  "max_window_layers": 28,
  "model_type": "qwen3",
  "num_attention_heads": 16,
  "num_hidden_layers": 28,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000,
  "sliding_window": null,
  "tie_word_embeddings": true,
  "torch_dtype": "float32",
  "transformers_version": "4.55.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151669
}
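The config values above are enough to recover the parameter count by hand. The sketch below assumes the layer composition shown in the printed structure (q/k/v/o projections, per-head q_norm/k_norm, three MLP projections, two RMSNorms per layer, a final norm) and, because "tie_word_embeddings" is true, counts the shared embedding/lm_head matrix only once. The helper name count_params is hypothetical.

```python
# Values copied from model.config above
config = {
    "vocab_size": 151669,
    "hidden_size": 1024,
    "intermediate_size": 3072,
    "num_hidden_layers": 28,
    "num_attention_heads": 16,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "tie_word_embeddings": True,
}


def count_params(c):
    """Hypothetical helper: count trainable parameters from Qwen3-style config fields."""
    h, d = c["hidden_size"], c["head_dim"]
    q_out = c["num_attention_heads"] * d   # 16 * 128 = 2048
    kv_out = c["num_key_value_heads"] * d  # 8 * 128 = 1024
    attn = h * q_out + 2 * h * kv_out + q_out * h + 2 * d  # q/k/v/o weights + q_norm/k_norm
    mlp = 3 * h * c["intermediate_size"]                  # gate/up/down projections
    norms = 2 * h                                         # input/post-attention RMSNorm
    per_layer = attn + mlp + norms
    embed = c["vocab_size"] * h
    lm_head = 0 if c["tie_word_embeddings"] else c["vocab_size"] * h  # tied: shared with embed
    return embed + c["num_hidden_layers"] * per_layer + h + lm_head   # + h for the final norm


total = count_params(config)
print(f"{total:,}")  # 595,776,512
```

The result, roughly 0.596B, is where the "0.6B" in the model name comes from; with untied embeddings the same formula would give 751,085,568. It should agree with sum(p.numel() for p in model.parameters()) on the loaded model, since PyTorch yields a tied parameter only once. At "torch_dtype": "float32" (4 bytes per parameter) that is about 2.4 GB of weights.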

Model Usage

# Build the "<Instruct> / <Query> / <Document>" text for one query-document pair
def format_instruction(instruction, query, doc):
    if instruction is None:
        instruction = 'Given a web search query, retrieve relevant passages that answer the query'
    output = "<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {doc}".format(
        instruction=instruction, query=query, doc=doc)
    return output

# Tokenize the pairs, wrap each one in the chat-template prefix/suffix, then pad into a batch
def process_inputs(pairs):
    inputs = tokenizer(
        pairs, padding=False, truncation='longest_first',
        return_attention_mask=False,
        max_length=max_length - len(prefix_tokens) - len(suffix_tokens))
    for i, ele in enumerate(inputs['input_ids']):
        inputs['input_ids'][i] = prefix_tokens + ele + suffix_tokens
    inputs = tokenizer.pad(inputs, padding=True, return_tensors="pt", max_length=max_length)
    for key in inputs:
        inputs[key] = inputs[key].to(model.device)
    return inputs

# Score = probability of "yes", renormalised over the "yes"/"no" logits at the last position
@torch.no_grad()
def compute_logits(inputs, **kwargs):
    batch_scores = model(**inputs).logits[:, -1, :]
    true_vector = batch_scores[:, token_true_id]
    false_vector = batch_scores[:, token_false_id]
    batch_scores = torch.stack([false_vector, true_vector], dim=1)
    batch_scores = torch.nn.functional.log_softmax(batch_scores, dim=1)
    scores = batch_scores[:, 1].exp().tolist()
    return scores

# We recommend enabling flash_attention_2 for better acceleration and memory saving.
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-Reranker-0.6B", torch_dtype=torch.float16, attn_implementation="flash_attention_2").cuda().eval()
token_false_id = tokenizer.convert_tokens_to_ids("no")
token_true_id = tokenizer.convert_tokens_to_ids("yes")
max_length = 8192

prefix = "<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be \"yes\" or \"no\".<|im_end|>\n<|im_start|>user\n"
suffix = "<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"
prefix_tokens = tokenizer.encode(prefix, add_special_tokens=False)
suffix_tokens = tokenizer.encode(suffix, add_special_tokens=False)

task = 'Given a web search query, retrieve relevant passages that answer the query'

queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

pairs = [format_instruction(task, query, doc) for query, doc in zip(queries, documents)]

# Tokenize the input texts
inputs = process_inputs(pairs)
scores = compute_logits(inputs)
print("scores: ", scores)

scores:  [0.9994982481002808, 0.9993619322776794]
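compute_logits discards every vocabulary entry except the "yes" and "no" logits at the last position and applies a two-way softmax to them, so each score equals P(yes) / (P(yes) + P(no)), which simplifies to sigmoid(z_yes minus z_no). The check below uses made-up logits (not model output) to confirm that the log_softmax/exp route in compute_logits and the sigmoid closed form agree.

```python
import torch

# Made-up last-position logits for the "no" and "yes" tokens of three pairs
z_no = torch.tensor([-3.0, 0.5, 2.0])
z_yes = torch.tensor([4.5, 0.5, -1.0])

# Same steps as compute_logits: stack [no, yes], log_softmax over that pair, exp the "yes" column
two_way = torch.stack([z_no, z_yes], dim=1)
score_softmax = torch.nn.functional.log_softmax(two_way, dim=1)[:, 1].exp()

# Closed form: sigmoid of the yes-minus-no logit gap
score_sigmoid = torch.sigmoid(z_yes - z_no)

print(score_softmax)
print(torch.allclose(score_softmax, score_sigmoid))  # True
```

Equal logits give exactly 0.5, so scores near 1.0 such as the two above mean the model's "yes" logit clearly exceeds its "no" logit for both pairs.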


http://www.xdnf.cn/news/19869.html
