
# [No GGUF Version] Running gpt-oss 20B on a Colab T4


OpenAI has released gpt-oss in 120B and 20B versions, both under the Apache 2.0 license.

Notably, gpt-oss-20b is designed for low-latency and local/specialized use cases (21B total parameters, 3.6B active parameters).

Because the model was trained with native MXFP4 quantization, the 20B version runs comfortably even in resource-constrained environments such as Google Colab.

Authors: Pedro and VB.
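Before installing anything, it is worth confirming that the Colab runtime actually has a T4 attached. A minimal sanity check using the preinstalled torch (our addition, not part of the original notebook):

```python
import torch

# Confirm a CUDA GPU is visible; on a Colab T4 runtime this should print
# "Tesla T4" and roughly 15-16 GB of VRAM.
assert torch.cuda.is_available(), "No CUDA device found - select a GPU runtime in Colab"
props = torch.cuda.get_device_properties(0)
print(props.name, f"{props.total_memory / 1e9:.1f} GB VRAM")
```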

Because MXFP4 support in transformers is still bleeding-edge, we need recent versions of PyTorch and CUDA to install the MXFP4 Triton kernels.

We also need to install transformers from source, and uninstall torchvision and torchaudio to avoid dependency conflicts.

```
!pip install -q --upgrade torch accelerate kernels
!pip install -q git+https://github.com/huggingface/transformers triton==3.4 git+https://github.com/triton-lang/triton.git@main#subdirectory=python/triton_kernels
!pip uninstall -q torchvision torchaudio -y
!pip list | grep -E "transformers|triton|torch|accelerate|kernels"
```
```
accelerate                            1.10.1
kernels                               0.9.0
sentence-transformers                 5.1.0
torch                                 2.8.0+cu126
torchao                               0.10.0
torchdata                             0.11.0
torchsummary                          1.5.1
torchtune                             0.6.1
transformers                          4.57.0.dev0
triton                                3.4.0
triton_kernels                        1.0.0
```
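A quick way to confirm the Triton kernels landed correctly is to import them directly. This check is our addition, and it assumes the pip packages expose `triton` and `triton_kernels` as their import names:

```python
# Sanity check (our assumption: the wheels install as importable modules
# "triton" and "triton_kernels").
import triton
import triton_kernels

print("triton", triton.__version__)
```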

## Loading the Model from Hugging Face in Google Colab

We load the model from here: openai/gpt-oss-20b

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig, Mxfp4Config

model_id = "openai/gpt-oss-20b"
tokenizer = AutoTokenizer.from_pretrained(model_id)

config = AutoConfig.from_pretrained(model_id)
print(config)

quantization_config = Mxfp4Config.from_dict(config.quantization_config)
print(quantization_config)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    torch_dtype="auto",
    device_map="cuda",
)

import torch

def print_model_params(model: torch.nn.Module, extra_info="", f=None):
    # Print the number of parameters in the model, in millions
    model_million_params = sum(p.numel() for p in model.parameters()) / 1e6
    print(model, file=f)
    print(f"{extra_info} {model_million_params} M parameters", file=f)

print_model_params(model, model_id)
```
```
GptOssForCausalLM(
  (model): GptOssModel(
    (embed_tokens): Embedding(201088, 2880, padding_idx=199999)
    (layers): ModuleList(
      (0-23): 24 x GptOssDecoderLayer(
        (self_attn): GptOssAttention(
          (q_proj): Linear(in_features=2880, out_features=4096, bias=True)
          (k_proj): Linear(in_features=2880, out_features=512, bias=True)
          (v_proj): Linear(in_features=2880, out_features=512, bias=True)
          (o_proj): Linear(in_features=4096, out_features=2880, bias=True)
        )
        (mlp): GptOssMLP(
          (router): GptOssTopKRouter()
          (experts): Mxfp4GptOssExperts()
        )
        (input_layernorm): GptOssRMSNorm((2880,), eps=1e-05)
        (post_attention_layernorm): GptOssRMSNorm((2880,), eps=1e-05)
      )
    )
    (norm): GptOssRMSNorm((2880,), eps=1e-05)
    (rotary_emb): GptOssRotaryEmbedding()
  )
  (lm_head): Linear(in_features=2880, out_features=201088, bias=False)
)
openai/gpt-oss-20b 1804.459584 M parameters
```
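Note that the printed count (~1.8B) covers only the tensors exposed through `model.parameters()`; the MXFP4-packed expert weights in `Mxfp4GptOssExperts`, which hold most of the 21B total, are not reflected in this simple `numel()` sum. A quick look at actual GPU memory use after loading (our addition, not in the original notebook):

```python
import torch

# Rough check of GPU memory occupied by the loaded model; exact numbers vary
# by runtime, but the MXFP4 checkpoint should fit within a T4's ~15 GB.
print(f"allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1e9:.2f} GB")
```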
## Setting Up Messages / Chat

You can provide an optional system prompt, or just supply the user input directly.

```python
messages = [
    {"role": "system", "content": "Always respond in riddles"},
    {"role": "user", "content": "What is the weather like in Madrid?"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

generated = model.generate(**inputs, max_new_tokens=500)
print(tokenizer.decode(generated[0][inputs["input_ids"].shape[-1]:]))
```
```
<|channel|>analysis<|message|>We have a system that says "Always respond in riddles". The user asks: "What is the weather like in Madrid?" We need to answer in a riddle style. Probably we need to incorporate a riddle that hints at the weather. Possibly referencing that Madrid has warm days, cooler nights, sometimes unpredictable. The question is to describe the weather, but we must answer in riddles.We need to craft a riddle that describes the weather. We could say something like: "In a place where the summer sun is fierce, yet winter brings a chill, what doth the sky declare? The answer: It's a tapestry of sun, clouds, and gentle rain." But it's in a riddle form, so perhaps like: "A palace of sunshine, the winter brings a chill across its marble steps." But we have to respond in riddles. We can provide a riddle that when solved the answer is "varied: sunny, dry, occasional showers." We can hint at temperatures.Given the requirement, we must answer with a riddle. Probably no other text. So let's produce a riddle. I'll produce maybe an extended riddle. It's okay.We should keep it in Spanish? The user is not specifying a language. The question is in English. The answer in English but in riddle form.Answer: "In the heart of Spain’s southern sun, a desert’s heart lies in the city, yet it drinks rain at intervals, ..." Something. Let's craft.Possible riddle:"Morning bright as fire, midday relentless, night cool as a sigh; the gods of clouds wander, sometimes weeping, sometimes staying away; which realm in Spain does this dance with sun?" The answer: Madrid.But need to describe weather in Madrid.Maybe:"I’m a city with heat that can scorch your thoughts, yet with winter my nights feel like glass. I’m known for blue skies, yet sometimes I cry to thin drops. The clouds come and go like dancers in a waltz. What am I?" The answer: Madrid's weather.Better:"Where the summer sun is a silvered sword, the winter wind gives breathless chill. The clouds are silvered ghosts that sometimes fall; a city that wears both sun and mist. Tell me, what city is this?" Answer: Madrid.However, maybe simpler: "I am a city where summer suns burn, winter nights chill, and clouds sometimes pour.
```
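The raw decode shows the harmony response format: the model first emits an analysis channel with its chain of thought before producing the final answer. If you only want the user-facing reply, here is a minimal sketch (our addition, assuming the standard `<|channel|>final<|message|>` marker appears in the decoded text):

```python
# Minimal sketch: keep only the final channel of a raw gpt-oss decode.
# Assumes the harmony-format marker "<|channel|>final<|message|>" separates
# the chain of thought from the user-facing answer.
def final_channel(text: str) -> str:
    marker = "<|channel|>final<|message|>"
    return text.split(marker, 1)[1] if marker in text else text
```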

## Specifying Reasoning Effort

```python
messages = [
    {"role": "system", "content": "Always respond in riddles"},
    {"role": "user", "content": "Explain why the meaning of life is 42", "reasoning_effort": "high"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

generated = model.generate(**inputs, max_new_tokens=500)
print(tokenizer.decode(generated[0][inputs["input_ids"].shape[-1]:]))
```
```
<|channel|>analysis<|message|>The user asks: "Explain why the meaning of life is 42". The instruction: "Always respond in riddles". So we must respond in riddles. The user likely expects an explanation like referencing Hitchhiker's guide: 42 is a random number. But we have to respond with riddles. That may be tricky: we might produce riddle-like explanation about the number 42. We need to produce something that "explains" why 42 is the meaning of life but in riddle form.We need to craft a riddle that describes the reasoning: The answer to everything is 42, because it's a puzzle, because it's a random number, also can be expressed as 6 times 7, 2-digit, and it contains hidden meaning like the sum of ASCII codes maybe.We need to incorporate the explanation: it's an arbitrary number but resonates because of its prime factorization, its location on the periodic table, its relation to Earth, or in the book, the number 42 was given as answer by Deep Thought after 7.5 million years of computation. But we are to answer as riddle.Probably: "What is a number that appears in a science-fiction book, which is the answer to life, the Universe, and everything, and why?".But we are told to respond in riddles, not straightforward explanation.Thus we will write a riddle explaining that 42 is chosen because it's 6 * 7, it's the number of something, it is the number of letters: 'life' etc, or maybe referencing that 42 is the 'Answer' to an existential question but the question itself remains unknown. It's like we ask: "What is the number that is the answer to the big question but which big question?".We might produce a riddle:"Take six from seven to make something; multiply for the meaning; the result is 'the answer'."But is that a riddle? We can present something like:"Two humble companions cross, one half and one full; together they become the number that guides your quest, but the question remains unseen. Who are they and what is their sum?" But that is a riddle. Maybe we need to explain why 42 is the meaning of life.Alternatively, we might produce a longer riddle that explains, e.g., we can incorporate that 42 is the only number that ... but we must incorporate explanation.We can
```
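Both transcripts above are cut off at `max_new_tokens=500` while the model is still in its analysis channel; raising the token budget, or streaming tokens as they arrive, makes long generations easier to follow. A sketch using transformers' `TextStreamer` (our addition, not in the original notebook):

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout as they are generated, skipping the prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(**inputs, max_new_tokens=1000, streamer=streamer)
```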
