
# [No GGUF Version] Running gpt-oss 20B on a Colab T4


OpenAI has released gpt-oss in 120B and 20B versions, both under the Apache 2.0 license.

Notably, gpt-oss-20b is designed for low-latency and local/specialized use cases (21B total parameters, 3.6B active parameters).

Because the model was trained with native MXFP4 quantization, the 20B version runs comfortably even in resource-constrained environments such as Google Colab.

Authors: Pedro and VB.
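Before installing anything, it is worth confirming that the Colab runtime actually has a T4 attached. A minimal sanity check using the preinstalled torch (our addition, not part of the original notebook):

```python
import torch

# Confirm a CUDA GPU is visible; on a Colab T4 runtime this should print
# "Tesla T4" and roughly 15-16 GB of VRAM.
assert torch.cuda.is_available(), "No CUDA device found - select a GPU runtime in Colab"
props = torch.cuda.get_device_properties(0)
print(props.name, f"{props.total_memory / 1e9:.1f} GB VRAM")
```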

Because MXFP4 support in transformers is still bleeding-edge, we need recent versions of PyTorch and CUDA to install the MXFP4 Triton kernels.

We also need to install transformers from source, and uninstall torchvision and torchaudio to avoid dependency conflicts.

```
!pip install -q --upgrade torch accelerate kernels
!pip install -q git+https://github.com/huggingface/transformers triton==3.4 git+https://github.com/triton-lang/triton.git@main#subdirectory=python/triton_kernels
!pip uninstall -q torchvision torchaudio -y
!pip list | grep -E "transformers|triton|torch|accelerate|kernels"
```
```
accelerate                            1.10.1
kernels                               0.9.0
sentence-transformers                 5.1.0
torch                                 2.8.0+cu126
torchao                               0.10.0
torchdata                             0.11.0
torchsummary                          1.5.1
torchtune                             0.6.1
transformers                          4.57.0.dev0
triton                                3.4.0
triton_kernels                        1.0.0
```
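A quick way to confirm the Triton kernels landed correctly is to import them directly. This check is our addition, and it assumes the pip packages expose `triton` and `triton_kernels` as their import names:

```python
# Sanity check (our assumption: the wheels install as importable modules
# "triton" and "triton_kernels").
import triton
import triton_kernels

print("triton", triton.__version__)
```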

## Loading the Model from Hugging Face in Google Colab

We load the model from here: openai/gpt-oss-20b

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig, Mxfp4Config

model_id = "openai/gpt-oss-20b"
tokenizer = AutoTokenizer.from_pretrained(model_id)

config = AutoConfig.from_pretrained(model_id)
print(config)

quantization_config = Mxfp4Config.from_dict(config.quantization_config)
print(quantization_config)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    torch_dtype="auto",
    device_map="cuda",
)

import torch

def print_model_params(model: torch.nn.Module, extra_info="", f=None):
    # Print the number of parameters in the model, in millions
    model_million_params = sum(p.numel() for p in model.parameters()) / 1e6
    print(model, file=f)
    print(f"{extra_info} {model_million_params} M parameters", file=f)

print_model_params(model, model_id)
```
```
GptOssForCausalLM(
  (model): GptOssModel(
    (embed_tokens): Embedding(201088, 2880, padding_idx=199999)
    (layers): ModuleList(
      (0-23): 24 x GptOssDecoderLayer(
        (self_attn): GptOssAttention(
          (q_proj): Linear(in_features=2880, out_features=4096, bias=True)
          (k_proj): Linear(in_features=2880, out_features=512, bias=True)
          (v_proj): Linear(in_features=2880, out_features=512, bias=True)
          (o_proj): Linear(in_features=4096, out_features=2880, bias=True)
        )
        (mlp): GptOssMLP(
          (router): GptOssTopKRouter()
          (experts): Mxfp4GptOssExperts()
        )
        (input_layernorm): GptOssRMSNorm((2880,), eps=1e-05)
        (post_attention_layernorm): GptOssRMSNorm((2880,), eps=1e-05)
      )
    )
    (norm): GptOssRMSNorm((2880,), eps=1e-05)
    (rotary_emb): GptOssRotaryEmbedding()
  )
  (lm_head): Linear(in_features=2880, out_features=201088, bias=False)
)
openai/gpt-oss-20b 1804.459584 M parameters
```
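Note that the printed count (~1.8B) covers only the tensors exposed through `model.parameters()`; the MXFP4-packed expert weights in `Mxfp4GptOssExperts`, which hold most of the 21B total, are not reflected in this simple `numel()` sum. A quick look at actual GPU memory use after loading (our addition, not in the original notebook):

```python
import torch

# Rough check of GPU memory occupied by the loaded model; exact numbers vary
# by runtime, but the MXFP4 checkpoint should fit within a T4's ~15 GB.
print(f"allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1e9:.2f} GB")
```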
## Setting Up Messages / Chat

You can provide an optional system prompt, or just supply the user input directly.

```python
messages = [
    {"role": "system", "content": "Always respond in riddles"},
    {"role": "user", "content": "What is the weather like in Madrid?"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

generated = model.generate(**inputs, max_new_tokens=500)
print(tokenizer.decode(generated[0][inputs["input_ids"].shape[-1]:]))
```
```
<|channel|>analysis<|message|>We have a system that says "Always respond in riddles". The user asks: "What is the weather like in Madrid?" We need to answer in a riddle style. Probably we need to incorporate a riddle that hints at the weather. Possibly referencing that Madrid has warm days, cooler nights, sometimes unpredictable. The question is to describe the weather, but we must answer in riddles.We need to craft a riddle that describes the weather. We could say something like: "In a place where the summer sun is fierce, yet winter brings a chill, what doth the sky declare? The answer: It's a tapestry of sun, clouds, and gentle rain." But it's in a riddle form, so perhaps like: "A palace of sunshine, the winter brings a chill across its marble steps." But we have to respond in riddles. We can provide a riddle that when solved the answer is "varied: sunny, dry, occasional showers." We can hint at temperatures.Given the requirement, we must answer with a riddle. Probably no other text. So let's produce a riddle. I'll produce maybe an extended riddle. It's okay.We should keep it in Spanish? The user is not specifying a language. The question is in English. The answer in English but in riddle form.Answer: "In the heart of Spain’s southern sun, a desert’s heart lies in the city, yet it drinks rain at intervals, ..." Something. Let's craft.Possible riddle:"Morning bright as fire, midday relentless, night cool as a sigh; the gods of clouds wander, sometimes weeping, sometimes staying away; which realm in Spain does this dance with sun?" The answer: Madrid.But need to describe weather in Madrid.Maybe:"I’m a city with heat that can scorch your thoughts, yet with winter my nights feel like glass. I’m known for blue skies, yet sometimes I cry to thin drops. The clouds come and go like dancers in a waltz. What am I?" The answer: Madrid's weather.Better:"Where the summer sun is a silvered sword, the winter wind gives breathless chill. The clouds are silvered ghosts that sometimes fall; a city that wears both sun and mist. Tell me, what city is this?" Answer: Madrid.However, maybe simpler: "I am a city where summer suns burn, winter nights chill, and clouds sometimes pour.
```
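The raw decode shows the harmony response format: the model first emits an analysis channel with its chain of thought before producing the final answer. If you only want the user-facing reply, here is a minimal sketch (our addition, assuming the standard `<|channel|>final<|message|>` marker appears in the decoded text):

```python
# Minimal sketch: keep only the final channel of a raw gpt-oss decode.
# Assumes the harmony-format marker "<|channel|>final<|message|>" separates
# the chain of thought from the user-facing answer.
def final_channel(text: str) -> str:
    marker = "<|channel|>final<|message|>"
    return text.split(marker, 1)[1] if marker in text else text
```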

## Specifying Reasoning Effort

```python
messages = [
    {"role": "system", "content": "Always respond in riddles"},
    {"role": "user", "content": "Explain why the meaning of life is 42", "reasoning_effort": "high"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

generated = model.generate(**inputs, max_new_tokens=500)
print(tokenizer.decode(generated[0][inputs["input_ids"].shape[-1]:]))
```
```
<|channel|>analysis<|message|>The user asks: "Explain why the meaning of life is 42". The instruction: "Always respond in riddles". So we must respond in riddles. The user likely expects an explanation like referencing Hitchhiker's guide: 42 is a random number. But we have to respond with riddles. That may be tricky: we might produce riddle-like explanation about the number 42. We need to produce something that "explains" why 42 is the meaning of life but in riddle form.We need to craft a riddle that describes the reasoning: The answer to everything is 42, because it's a puzzle, because it's a random number, also can be expressed as 6 times 7, 2-digit, and it contains hidden meaning like the sum of ASCII codes maybe.We need to incorporate the explanation: it's an arbitrary number but resonates because of its prime factorization, its location on the periodic table, its relation to Earth, or in the book, the number 42 was given as answer by Deep Thought after 7.5 million years of computation. But we are to answer as riddle.Probably: "What is a number that appears in a science-fiction book, which is the answer to life, the Universe, and everything, and why?".But we are told to respond in riddles, not straightforward explanation.Thus we will write a riddle explaining that 42 is chosen because it's 6 * 7, it's the number of something, it is the number of letters: 'life' etc, or maybe referencing that 42 is the 'Answer' to an existential question but the question itself remains unknown. It's like we ask: "What is the number that is the answer to the big question but which big question?".We might produce a riddle:"Take six from seven to make something; multiply for the meaning; the result is 'the answer'."But is that a riddle? We can present something like:"Two humble companions cross, one half and one full; together they become the number that guides your quest, but the question remains unseen. Who are they and what is their sum?" But that is a riddle. Maybe we need to explain why 42 is the meaning of life.Alternatively, we might produce a longer riddle that explains, e.g., we can incorporate that 42 is the only number that ... but we must incorporate explanation.We can
```
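Both transcripts above are cut off at `max_new_tokens=500` while the model is still in its analysis channel; raising the token budget, or streaming tokens as they arrive, makes long generations easier to follow. A sketch using transformers' `TextStreamer` (our addition, not in the original notebook):

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout as they are generated, skipping the prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(**inputs, max_new_tokens=1000, streamer=streamer)
```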
