SenseTime InternLM Releases Intern-S1: A State-of-the-Art Open-Source Multimodal Reasoning Model
Introduction
We introduce Intern-S1, our most advanced open-source multimodal reasoning model to date. Intern-S1 combines strong general-task capability with state-of-the-art performance on scientific tasks, rivaling leading closed-source commercial models.
Built on a 235B-parameter MoE language model and a 6B-parameter vision encoder, Intern-S1 was continually pretrained on 5 trillion tokens of multimodal data, including over 2.5 trillion tokens from scientific domains. As a result, while retaining strong general capability, the model excels at specialized scientific tasks such as interpreting chemical structures, understanding protein sequences, and planning compound synthesis routes, making it a capable assistant for real-world research workflows.
Features
- Strong performance across language and vision reasoning benchmarks, especially scientific tasks
- Continuously pretrained on a massive 5T-token dataset, with over 50% specialized scientific data, embedding deep domain expertise
- A dynamic tokenizer that natively parses specialized data formats such as molecular formulas, protein sequences, and seismic signals (see the sketch after this list)
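As a quick way to see the dynamic tokenizer at work, here is a minimal sketch that tokenizes a SMILES string. The caffeine SMILES is our own example, and the exact token split it produces is an assumption to verify against your installed version:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("internlm/Intern-S1", trust_remote_code=True)

# SMILES string for caffeine; a dynamic tokenizer should segment it into
# chemically meaningful units rather than generic subword pieces.
smiles = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"
print(tokenizer.tokenize(smiles))
```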
Performance
We evaluated Intern-S1 on a range of general and scientific benchmarks. The table below compares its performance against leading vision-language models and large language models.
| Benchmarks | Intern-S1 | InternVL3-78B | Qwen2.5-VL-72B | DS-R1-0528 | Qwen3-235B-A22B | Kimi-K2-Instruct | Gemini-2.5 Pro | o3 | Grok-4 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MMLU-Pro | 83.5 ✅ | 73.0 | 72.1 | 83.4 | 82.2 | 82.7 | 86.0 | 85.0 | 85.9 |
| MMMU | 77.7 ✅ | 72.2 | 70.2 | - | - | - | 81.9 | 80.8 | 77.9 |
| GPQA | 77.3 | 49.9 | 49.0 | 80.6 | 71.1 | 77.8 | 83.8 | 83.3 | 87.5 |
| MMStar | 74.9 ✅ | 72.5 | 70.8 | - | - | - | 79.3 | 75.1 | 69.6 |
| MathVista | 81.5 👑 | 79.0 | 74.8 | - | - | - | 80.3 | 77.5 | 72.5 |
| AIME2025 | 86.0 | 10.7 | 10.9 | 87.5 | 81.5 | 51.4 | 83.0 | 88.9 | 91.7 |
| MathVision | 62.5 ✅ | 43.1 | 38.1 | - | - | - | 73.0 | 67.7 | 67.3 |
| IFEval | 86.7 | 75.6 | 83.9 | 79.7 | 85.0 | 90.2 | 91.5 | 92.2 | 92.8 |
| SFE | 44.3 👑 | 36.2 | 30.5 | - | - | - | 43.0 | 37.7 | 31.2 |
| Physics | 44.0 ✅ | 23.1 | 15.7 | - | - | - | 40.0 | 47.9 | 42.8 |
| SmolInstruct | 51.0 👑 | 19.4 | 21.0 | 30.7 | 28.7 | 48.1 | 40.4 | 43.9 | 47.3 |
| ChemBench | 83.4 👑 | 61.3 | 61.6 | 75.6 | 75.8 | 75.3 | 82.8 | 81.6 | 83.3 |
| MatBench | 75.0 👑 | 49.3 | 51.5 | 57.7 | 52.1 | 61.7 | 61.7 | 61.6 | 67.9 |
| MicroVQA | 63.9 👑 | 59.1 | 53.0 | - | - | - | 63.1 | 58.3 | 59.5 |
| ProteinLMBench | 63.1 | 61.6 | 61.0 | 61.4 | 59.8 | 66.7 | 62.9 | 67.7 | 66.2 |
| MSEarthMCQ | 65.7 👑 | 57.2 | 37.6 | - | - | - | 59.9 | 61.0 | 58.0 |
| XLRS-Bench | 55.0 👑 | 49.3 | 50.9 | - | - | - | 45.2 | 43.6 | 45.4 |
Note: ✅ marks the best performance among open-source models; 👑 marks the best performance among all models.
We evaluate all models using OpenCompass and VLMEvalKit.
Quick Start
Sampling Parameters
We recommend the following hyperparameters for best results:
```python
top_p = 1.0
top_k = 50
min_p = 0.0
temperature = 0.7
```
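As a sketch of how these values plug into the Transformers examples below (packaging them in a `GenerationConfig` is our own choice, not a requirement of the model):

```python
from transformers import GenerationConfig

# Recommended sampling settings; pass as
# model.generate(**inputs, generation_config=sampling_config)
sampling_config = GenerationConfig(
    do_sample=True,
    top_p=1.0,
    top_k=50,
    min_p=0.0,
    temperature=0.7,
)
```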
Transformers
The demo code below shows how to generate from text and multimodal inputs.
Please use `transformers>=4.53.0` to ensure the model works correctly.
Text input
```python
from transformers import AutoProcessor, AutoModelForCausalLM
import torch

model_name = "internlm/Intern-S1"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto", trust_remote_code=True)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "tell me about an interesting physical phenomenon."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt"
).to(model.device, dtype=torch.bfloat16)

generate_ids = model.generate(**inputs, max_new_tokens=32768)
decoded_output = processor.decode(generate_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(decoded_output)
```
Image input
```python
from transformers import AutoProcessor, AutoModelForCausalLM
import torch

model_name = "internlm/Intern-S1"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto", trust_remote_code=True)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "http://images.cocodataset.org/val2017/000000039769.jpg"},
            {"type": "text", "text": "Please describe the image explicitly."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt"
).to(model.device, dtype=torch.bfloat16)

generate_ids = model.generate(**inputs, max_new_tokens=32768)
decoded_output = processor.decode(generate_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(decoded_output)
```
Video input
Please make sure the decord video decoding library is installed via `pip install decord`.
```python
from transformers import AutoProcessor, AutoModelForCausalLM
import torch

model_name = "internlm/Intern-S1"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto", trust_remote_code=True)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "video",
                "url": "https://huggingface.co/datasets/hf-internal-testing/fixtures_videos/resolve/main/tennis.mp4",
            },
            {"type": "text", "text": "What type of shot is the man performing?"},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True,
    video_load_backend="decord",
    tokenize=True,
    return_dict=True,
).to(model.device, dtype=torch.float16)

generate_ids = model.generate(**inputs, max_new_tokens=32768)
decoded_output = processor.decode(generate_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(decoded_output)
```
Serving
You can use any of the following LLM inference frameworks to create an OpenAI-compatible server:
lmdeploy (>=0.9.2)

```bash
lmdeploy serve api_server internlm/Intern-S1 --reasoning-parser intern-s1 --tool-call-parser intern-s1 --tp 8
```
vllm
Coming soon.
sglang
Support for Intern-S1 in sglang is still under development; please refer to this PR.

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
python3 -m sglang.launch_server \
  --model-path internlm/Intern-S1 \
  --trust-remote-code \
  --mem-fraction-static 0.85 \
  --tp 8 \
  --enable-multimodal \
  --grammar-backend none
```
ollama for local deployment:
```bash
# install ollama
curl -fsSL https://ollama.com/install.sh | sh
# fetch model
ollama pull internlm/interns1
# run model
ollama run internlm/interns1
# then use openai client to call on http://localhost:11434/v1
```
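As the last comment suggests, the local ollama endpoint speaks the OpenAI protocol. A minimal client sketch follows; the `api_key` value is a placeholder, since ollama only requires that it be non-empty:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
response = client.chat.completions.create(
    model="internlm/interns1",  # same tag as pulled above
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```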
Advanced Usage
Tool Calling
Many large language models (LLMs) now feature tool calling, a powerful capability that lets a model extend its functionality by interacting with external tools and APIs. This enables tasks such as fetching real-time information, running code, or calling functions in other applications.
A key advantage for developers is that a growing number of open-source LLMs are designed to be compatible with the OpenAI API. This means you can use the familiar syntax and structure of the OpenAI library to implement tool calling with these models. As a result, the code in this tutorial is versatile: it works not only with OpenAI models but with any model that follows the same interface standard.
To illustrate how this works, let's walk through a practical code example (based on an lmdeploy API server) that uses tool calling to fetch the latest weather forecast.
```python
from openai import OpenAI
import json


def get_current_temperature(location: str, unit: str = "celsius"):
    """Get current temperature at a location.

    Args:
        location: The location to get the temperature for, in the format "City, State, Country".
        unit: The unit to return the temperature in. Defaults to "celsius". (choices: ["celsius", "fahrenheit"])

    Returns:
        the temperature, the location, and the unit in a dict
    """
    return {
        "temperature": 26.1,
        "location": location,
        "unit": unit,
    }


def get_temperature_date(location: str, date: str, unit: str = "celsius"):
    """Get temperature at a location and date.

    Args:
        location: The location to get the temperature for, in the format "City, State, Country".
        date: The date to get the temperature for, in the format "Year-Month-Day".
        unit: The unit to return the temperature in. Defaults to "celsius". (choices: ["celsius", "fahrenheit"])

    Returns:
        the temperature, the location, the date and the unit in a dict
    """
    return {
        "temperature": 25.9,
        "location": location,
        "date": date,
        "unit": unit,
    }


def get_function_by_name(name):
    if name == "get_current_temperature":
        return get_current_temperature
    if name == "get_temperature_date":
        return get_temperature_date


tools = [{
    'type': 'function',
    'function': {
        'name': 'get_current_temperature',
        'description': 'Get current temperature at a location.',
        'parameters': {
            'type': 'object',
            'properties': {
                'location': {
                    'type': 'string',
                    'description': "The location to get the temperature for, in the format 'City, State, Country'."
                },
                'unit': {
                    'type': 'string',
                    'enum': ['celsius', 'fahrenheit'],
                    'description': "The unit to return the temperature in. Defaults to 'celsius'."
                }
            },
            'required': ['location']
        }
    }
}, {
    'type': 'function',
    'function': {
        'name': 'get_temperature_date',
        'description': 'Get temperature at a location and date.',
        'parameters': {
            'type': 'object',
            'properties': {
                'location': {
                    'type': 'string',
                    'description': "The location to get the temperature for, in the format 'City, State, Country'."
                },
                'date': {
                    'type': 'string',
                    'description': "The date to get the temperature for, in the format 'Year-Month-Day'."
                },
                'unit': {
                    'type': 'string',
                    'enum': ['celsius', 'fahrenheit'],
                    'description': "The unit to return the temperature in. Defaults to 'celsius'."
                }
            },
            'required': ['location', 'date']
        }
    }
}]

messages = [
    {'role': 'user', 'content': "Today is 2024-11-14, What's the temperature in San Francisco now? How about tomorrow?"}
]

openai_api_key = "EMPTY"
openai_api_base = "http://0.0.0.0:23333/v1"
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)
model_name = client.models.list().data[0].id

# First round: let the model decide which tools to call
response = client.chat.completions.create(
    model=model_name,
    messages=messages,
    max_tokens=32768,
    temperature=0.8,
    top_p=0.8,
    stream=False,
    extra_body=dict(spaces_between_special_tokens=False, enable_thinking=False),
    tools=tools)
print(response.choices[0].message)
messages.append(response.choices[0].message)

# Execute each requested tool call and append its result to the conversation
for tool_call in response.choices[0].message.tool_calls:
    tool_call_args = json.loads(tool_call.function.arguments)
    tool_call_result = get_function_by_name(tool_call.function.name)(**tool_call_args)
    tool_call_result = json.dumps(tool_call_result, ensure_ascii=False)
    messages.append({
        'role': 'tool',
        'name': tool_call.function.name,
        'content': tool_call_result,
        'tool_call_id': tool_call.id
    })

# Second round: the model composes a final answer from the tool results
response = client.chat.completions.create(
    model=model_name,
    messages=messages,
    temperature=0.8,
    top_p=0.8,
    stream=False,
    extra_body=dict(spaces_between_special_tokens=False, enable_thinking=False),
    tools=tools)
print(response.choices[0].message.content)
```
Switching Between Thinking and Non-Thinking Modes
Intern-S1 enables thinking mode by default, enhancing the model's reasoning capability to generate higher-quality responses. To disable it, set `enable_thinking=False` in `tokenizer.apply_chat_template`:
```python
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False  # think mode indicator
)
```
When serving Intern-S1 with LMDeploy, you can dynamically control thinking mode by adjusting the `enable_thinking` parameter in each request.
```python
from openai import OpenAI
import json

messages = [
    {
        'role': 'user',
        'content': 'who are you'
    }, {
        'role': 'assistant',
        'content': 'I am an AI'
    }, {
        'role': 'user',
        'content': 'AGI is?'
    }
]

openai_api_key = "EMPTY"
openai_api_base = "http://0.0.0.0:23333/v1"
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)
model_name = client.models.list().data[0].id

response = client.chat.completions.create(
    model=model_name,
    messages=messages,
    temperature=0.7,
    top_p=0.8,
    max_tokens=2048,
    extra_body={
        "enable_thinking": False,
    }
)
print(json.dumps(response.model_dump(), indent=2, ensure_ascii=False))
```
For vllm and sglang users, configure it as follows:
```python
extra_body={
    "chat_template_kwargs": {"enable_thinking": False}
}
```
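For example, a complete request might look like the sketch below; the endpoint URL and port are assumptions that depend on how you launched the server:

```python
from openai import OpenAI

# Assumes a vllm/sglang OpenAI-compatible server on port 8000
client = OpenAI(api_key="EMPTY", base_url="http://0.0.0.0:8000/v1")
response = client.chat.completions.create(
    model=client.models.list().data[0].id,
    messages=[{"role": "user", "content": "AGI is?"}],
    temperature=0.7,
    top_p=0.8,
    max_tokens=2048,
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(response.choices[0].message.content)
```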