
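Introduction to the Hunyuan-MT-7B Model

Model Overview

The Hunyuan translation family consists of a translation model, Hunyuan-MT-7B, and an ensemble model, Hunyuan-MT-Chimera. The translation model translates the source text into the target language; the ensemble model merges several candidate translations produced by the translation model into a single, better translation. The models focus on mutual translation among 33 languages and support 5 ethnic minority languages of China.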
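
The two-stage flow described above can be sketched in plain Python. This is only an illustration under stated assumptions: `build_ensemble_prompt` is a hypothetical helper, the candidate translations are made-up placeholders, and the prompt wording is invented for the example rather than taken from the official Hunyuan-MT-Chimera template. Consult the model card for the exact format.

```python
def build_ensemble_prompt(source_text, candidates, target_language):
    """Format a source segment and its candidate translations into one prompt.

    Hypothetical helper: the wording is illustrative, not the official
    Hunyuan-MT-Chimera prompt.
    """
    if not candidates:
        raise ValueError("at least one candidate translation is required")
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(candidates, start=1))
    return (
        f"Source segment:\n{source_text}\n\n"
        f"Candidate {target_language} translations:\n{numbered}\n\n"
        f"Combine the candidates into one refined {target_language} translation."
    )


# Stage 1 (not shown): sample several translations from the translation model.
# The candidates below are made-up placeholders for illustration.
candidates = ["把心里话说出来", "一吐为快"]

# Stage 2: the ensemble model would receive a prompt like this one.
prompt = build_ensemble_prompt("Get something off your chest", candidates, "Chinese")
print(prompt)
```

Keeping the two stages separate means the candidate generator and the ensemble step can be swapped or tuned independently.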

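Core Features and Advantages

  • Of the 31 language categories it entered in WMT25, the model took first place in 30. Hunyuan-MT-7B delivers the best results among industry models of the same size.
  • Hunyuan-MT-Chimera-7B is the industry's first open-source translation ensemble model and can further improve translation quality.
  • It proposes a complete training paradigm for translation models: pretraining -> CPT (continued pretraining) -> SFT (supervised fine-tuning) -> translation reinforcement -> ensemble reinforcement. The resulting translation quality is SOTA (state of the art) among models of the same size.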

Model Performance

Loading the Model

from modelscope import AutoModelForCausalLM, AutoTokenizer

# Model ID on ModelScope; the weights are downloaded on first use.
model_name_or_path = "Tencent-Hunyuan/Hunyuan-MT-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto")  # You may want to use bfloat16 and/or move to GPU here
/home/six/Zhou/test_source/gpt-oss-unsloth/.venv/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
Downloading Model from https://www.modelscope.cn to directory: /home/six/.cache/modelscope/hub/models/Tencent-Hunyuan/Hunyuan-MT-7B
Downloading Model from https://www.modelscope.cn to directory: /home/six/.cache/modelscope/hub/models/Tencent-Hunyuan/Hunyuan-MT-7B
Loading checkpoint shards: 100%|██████████| 4/4 [00:02<00:00,  1.99it/s]

Model Configuration

model.config
HunYuanDenseV1Config {
  "add_classification_head": false,
  "architectures": [
    "HunYuanDenseV1ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "attention_head_dim": 128,
  "bos_token_id": 1,
  "cla_share_factor": 2,
  "class_num": 0,
  "dense_list": [
    4096,
    0
  ],
  "dtype": "float32",
  "eos_token_id": 127960,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "im_end_id": 5,
  "im_newline_id": 11,
  "im_start_id": 4,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "mask_init_id": 12,
  "max_position_embeddings": 32768,
  "mlp_bias": false,
  "model_type": "hunyuan_v1_dense",
  "norm_type": "rms",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "org_vocab_size": 290943,
  "pad_id": 127961,
  "pad_token_id": 0,
  "pool_type": "last",
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": {
    "alpha": 100000.0,
    "beta_fast": 32,
    "beta_slow": 1,
    "factor": 1.0,
    "mscale": 1.0,
    "mscale_all_dim": 1.0,
    "type": "dynamic"
  },
  "rope_theta": 10000.0,
  "text_end_id": 7,
  "text_start_id": 6,
  "tie_word_embeddings": true,
  "transformers_version": "4.56.0",
  "use_cache": true,
  "use_cla": false,
  "use_qk_norm": true,
  "use_rotary_pos_emb": true,
  "vocab_size": 128256
}
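
The attention settings in this config describe grouped-query attention: 32 query heads share 8 key/value heads of dimension 128. As a rough back-of-the-envelope sketch, the snippet below estimates the KV-cache size from those values. It assumes a bfloat16 cache (2 bytes per element) and that every layer keeps its own cache, which is consistent with "use_cla": false; actual memory use depends on the runtime and is not measured here.

```python
# Values copied from the config printed above.
num_hidden_layers = 32
num_key_value_heads = 8
head_dim = 128
max_position_embeddings = 32768

BYTES_PER_ELEMENT = 2  # assumption: bfloat16 cache


def kv_cache_bytes(num_tokens, batch_size=1):
    """Estimate KV-cache bytes: one key and one value vector per layer, KV head, and token."""
    per_token = 2 * num_hidden_layers * num_key_value_heads * head_dim * BYTES_PER_ELEMENT
    return per_token * num_tokens * batch_size


print(kv_cache_bytes(1) / 1024, "KiB per token")
print(kv_cache_bytes(max_position_embeddings) / 1024**3, "GiB at full context")
```

With 32 key/value heads instead of 8, the same formula would give a cache four times larger, which is the main practical benefit of sharing key/value heads.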

Model Architecture

model
HunYuanDenseV1ForCausalLM(
  (model): HunYuanDenseV1Model(
    (embed_tokens): Embedding(128256, 4096, padding_idx=0)
    (layers): ModuleList(
      (0-31): 32 x HunYuanDenseV1DecoderLayer(
        (self_attn): HunYuanDenseV1Attention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (query_layernorm): HunYuanDenseV1RMSNorm((128,), eps=1e-05)
          (key_layernorm): HunYuanDenseV1RMSNorm((128,), eps=1e-05)
        )
        (mlp): HunYuanDenseV1MLP(
          (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): HunYuanDenseV1RMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): HunYuanDenseV1RMSNorm((4096,), eps=1e-05)
      )
    )
    (norm): HunYuanDenseV1RMSNorm((4096,), eps=1e-05)
    (rotary_emb): HunYuanDenseV1RotaryEmbedding()
  )
  (lm_head): Linear(in_features=4096, out_features=128256, bias=False)
)
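
The layer shapes in this listing are enough to check the "7B" in the model name. The sketch below adds up the weights by hand. It assumes each RMSNorm holds a single weight vector of the printed shape, that the rotary embedding has no trainable parameters, and that `lm_head` shares its matrix with `embed_tokens` (the config reports "tie_word_embeddings": true). The memory figures cover weights only, not activations or the KV cache.

```python
# Shapes copied from the config and module listing above.
vocab_size = 128256
hidden_size = 4096
intermediate_size = 14336
num_hidden_layers = 32
kv_proj_out = 1024  # num_key_value_heads (8) * head_dim (128)
head_dim = 128


def count_parameters(tied_embeddings=True):
    """Count weights from the printed module shapes (all Linear layers have bias=False)."""
    embed = vocab_size * hidden_size
    attention = (
        2 * hidden_size * hidden_size    # q_proj and o_proj
        + 2 * hidden_size * kv_proj_out  # k_proj and v_proj
        + 2 * head_dim                   # query_layernorm and key_layernorm
    )
    mlp = 3 * hidden_size * intermediate_size  # gate_proj, up_proj, down_proj
    norms = 2 * hidden_size                    # input and post-attention RMSNorm
    per_layer = attention + mlp + norms
    total = embed + num_hidden_layers * per_layer + hidden_size  # + final norm
    if not tied_embeddings:
        total += vocab_size * hidden_size  # separate lm_head matrix
    return total


total = count_parameters()
print(f"{total:,} parameters")
print(f"float32: {total * 4 / 1e9:.1f} GB, bfloat16: {total * 2 / 1e9:.1f} GB")
```

Because the config reports "dtype": "float32", loading in bfloat16 would roughly halve the weight memory, in line with the hint in the loading code's comment.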

Running the Model

messages = [
    {"role": "user", "content": "Translate the following segment into Chinese, without additional explanation.\n\nGet something off your chest"},
]
# Apply the model's chat template and tokenize the prompt.
tokenized_chat = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=False, return_tensors="pt"
)
outputs = model.generate(tokenized_chat.to(model.device), max_new_tokens=2048)
# Decode the full sequence (prompt plus generated tokens), keeping special tokens.
output_text = tokenizer.decode(outputs[0])
output_text
'<|startoftext|>Translate the following segment into Chinese, without additional explanation.\n\nGet something off your chest<|extra_0|>把心里的话说出来吧。<|eos|>'
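
The decoded string above contains the prompt and the special tokens as well as the translation. A minimal sketch of two small helpers follows: `build_messages` reproduces the prompt format used above for any target language, and `extract_translation` pulls out only the generated text. Both names are hypothetical, and the parsing assumes the markers seen in the sample output (`<|extra_0|>` ends the prompt, `<|eos|>` ends the generation).

```python
def build_messages(text, target_language):
    """Build a single-turn chat request in the prompt style shown above."""
    prompt = (
        f"Translate the following segment into {target_language}, "
        f"without additional explanation.\n\n{text}"
    )
    return [{"role": "user", "content": prompt}]


def extract_translation(decoded, sep="<|extra_0|>", eos="<|eos|>"):
    """Return only the generated translation from a fully decoded sequence.

    Assumes the markers seen in the sample output: the prompt ends at `sep`
    and the generation ends with `eos`.
    """
    _, found, generated = decoded.partition(sep)
    if not found:
        raise ValueError(f"separator {sep!r} not found in output")
    return generated.split(eos, 1)[0].strip()


decoded = (
    "<|startoftext|>Translate the following segment into Chinese, "
    "without additional explanation.\n\nGet something off your chest"
    "<|extra_0|>把心里的话说出来吧。<|eos|>"
)
print(extract_translation(decoded))
```

An alternative that avoids string parsing is to slice the generated ids past the prompt length, for example `outputs[0][tokenized_chat.shape[-1]:]`, and decode that slice with `skip_special_tokens=True`.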