
Eino: An Introduction to ByteDance's Golang Framework for LLM Application Development

web 2025/8/18 10:00:49

Eino

Eino ['aino] (pronounced similarly to "I know") aims to be the ultimate LLM application development framework in Golang. Drawing inspiration from many excellent open-source LLM application development frameworks such as LangChain and LlamaIndex, as well as from cutting-edge research and real-world applications, Eino offers an LLM application development framework that emphasizes simplicity, scalability, reliability and effectiveness, and that better aligns with Golang programming conventions.

similarly /ˈsɪmələli/ (UK), /ˈsɪmələrli/ (US): in a similar way; likewise, also
ultimate /ˈʌltɪmət/: final, last; most fundamental, basic; greatest possible, utmost
inspiration /ˌɪnspəˈreɪʃn/: a source of ideas or motivation ("drawing inspiration from" means taking ideas from)
drawing /ˈdrɔːɪŋ/: present participle of draw, here meaning "taking or obtaining"
emphasize /ˈemfəsaɪz/: to stress; to give something special importance or prominence
simplicity /sɪmˈplɪsəti/: the quality of being simple or easy; plainness
scalability /ˌskeɪləˈbɪləti/: the ability to grow or be extended; extensibility
conventions /kənˈvenʃnz/: customs, accepted practices; conferences; agreements (plural of convention)

What Eino provides:
a carefully curated list of component abstractions and implementations that can be easily reused and combined to build LLM applications.
a powerful composition (orchestration) framework that does the heavy lifting of strong type checking, stream processing, concurrency management, aspect injection, option assignment, etc. for the user.
a set of meticulously designed APIs that obsess over simplicity and clarity.
an ever-growing collection of best practices in the form of bundled flows and examples.
a useful set of (DevOps) tools that covers the entire development cycle, from visualized development and debugging to online tracing and evaluation.

curate /ˈkjʊərət; ˌkjʊˈreɪt/ (UK), /ˈkjʊrət/ (US): (noun) an assistant parish priest; (verb) to organize the display of a collection or exhibition, or the program of a music festival
curated: past tense and past participle of curate; (adjective) carefully selected and presented
composition /ˌkɒmpəˈzɪʃ(ə)n/ (UK), /ˌkɑːmpəˈzɪʃ(ə)n/ (US): make-up, constituents; a piece of music, art or poetry; the act of composing; the layout of a picture; a school essay
heavy lifting: literally, moving heavy objects; figuratively, the hardest part of a job
aspect /ˈæspekt/: facet, feature; (in programming) an aspect, as in aspect-oriented programming; orientation; appearance; (grammar) verbal aspect
meticulously /məˈtɪkjələsli/: with great attention to detail; scrupulously; fastidiously
clarity /ˈklærəti/: ease of understanding; clear thinking; sharpness of an image or sound; clearness, transparency
obsess /əbˈses/: to preoccupy; to trouble continually; to worry about constantly
obsess over (also obsess on): to be fixated on
bundled flows: flows packaged together with the framework
ever-growing collection: a collection that keeps expanding
visualized /ˈvɪʒuəlaɪzd/: intuitive, visual; past participle of visualize (to make visible; to imagine)
visualized development: development through a visual interface

With the above arsenal, Eino can standardize, simplify, and improve efficiency at different stages of the AI application development cycle.

arsenal /ˈɑːsənl/ (UK), /ˈɑːrsənl/ (US): an armory or weapons depot; a stock of weapons; figuratively, a large store of resources available for a purpose (also Arsenal, the football club)
standardize /ˈstændədaɪz/ (UK), /ˈstændərdaɪz/ (US): to make something conform to a standard; to set a standard; to determine the properties of something by comparison with a standard

Quick Walkthrough

Use a component directly:

model, err := openai.NewChatModel(ctx, config) // create an invokable LLM instance
if err != nil {
	return err
}
message, err := model.Generate(ctx, []*schema.Message{
	schema.SystemMessage("you are a helpful assistant."),
	schema.UserMessage("what does the future AI App look like?"),
})

Of course, you can do that: Eino provides lots of useful components that work out of the box. But you can do more by using orchestration, for three reasons:
orchestration encapsulates common patterns of LLM applications.
orchestration solves the difficult problem of processing the LLM's streaming responses.
orchestration handles type safety, concurrency management, aspect injection and option assignment for you.

invokable: able to be called or requested to run; in programming, typically used to describe a function or method.

Eino provides three sets of APIs for orchestration:
Chain: a simple chained directed graph that can only go forward.
Graph: a cyclic or acyclic directed graph. Powerful and flexible.
Workflow: an acyclic graph that supports data mapping at the struct-field level.

acyclic /ˌeɪˈsaɪklɪk/: containing no cycles; (physics) aperiodic
directed graph: a graph whose edges each point in one direction

Let’s create a simple chain: a ChatTemplate followed by a ChatModel.

chain, err := NewChain[map[string]any, *schema.Message]().
	AppendChatTemplate(prompt).
	AppendChatModel(model).
	Compile(ctx)
if err != nil {
	return err
}
out, err := chain.Invoke(ctx, map[string]any{"query": "what's your name?"})

Now let's create a graph that uses a ChatModel to generate an answer or tool calls, then uses a ToolsNode to execute those tools if tool calls were generated.

graph := NewGraph[map[string]any, *schema.Message]()

_ = graph.AddChatTemplateNode("node_template", chatTpl)
_ = graph.AddChatModelNode("node_model", chatModel)
_ = graph.AddToolsNode("node_tools", toolsNode)
_ = graph.AddLambdaNode("node_converter", takeOne)

_ = graph.AddEdge(START, "node_template")
_ = graph.AddEdge("node_template", "node_model")
_ = graph.AddBranch("node_model", branch)
_ = graph.AddEdge("node_tools", "node_converter")
_ = graph.AddEdge("node_converter", END)

compiledGraph, err := graph.Compile(ctx)
if err != nil {
	return err
}
out, err := compiledGraph.Invoke(ctx, map[string]any{"query": "Beijing's weather this weekend"})

flexibly /ˈfleksəbli/: in a flexible way; pliably; elastically

Now let's create a workflow that flexibly maps input and output at the field level:

type Input1 struct {
	Input string
}

type Output1 struct {
	Output string
}

type Input2 struct {
	Role schema.RoleType
}

type Output2 struct {
	Output string
}

type Input3 struct {
	Query    string
	MetaData string
}

var (
	ctx     context.Context
	m       model.BaseChatModel
	lambda1 func(context.Context, Input1) (Output1, error)
	lambda2 func(context.Context, Input2) (Output2, error)
	lambda3 func(context.Context, Input3) (*schema.Message, error)
)

wf := NewWorkflow[[]*schema.Message, *schema.Message]()
wf.AddChatModelNode("model", m).AddInput(START)
wf.AddLambdaNode("lambda1", InvokableLambda(lambda1)).AddInput("model", MapFields("Content", "Input"))
wf.AddLambdaNode("lambda2", InvokableLambda(lambda2)).AddInput("model", MapFields("Role", "Role"))
wf.AddLambdaNode("lambda3", InvokableLambda(lambda3)).
	AddInput("lambda1", MapFields("Output", "Query")).
	AddInput("lambda2", MapFields("Output", "MetaData"))
wf.End().AddInput("lambda3")

runnable, err := wf.Compile(ctx)
if err != nil {
	return err
}
out, err := runnable.Invoke(ctx, []*schema.Message{
	schema.UserMessage("kick start this workflow!"),
})
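Field-level mapping copies one named field of an upstream output into one named field of a downstream input, as `MapFields("Content", "Input")` does above. Below is a minimal sketch of that idea with the standard `reflect` package; `mapField` and the `Message` stand-in are hypothetical, and Eino's own mapping logic is not reproduced here.

```go
package main

import (
	"fmt"
	"reflect"
)

// Message stands in for a chat message produced by the "model" node.
type Message struct {
	Role    string
	Content string
}

// Input1 and Input3 mirror the downstream input structs in the snippet above.
type Input1 struct {
	Input string
}

type Input3 struct {
	Query    string
	MetaData string
}

// mapField copies the field named from in src into the field named to in dst.
// src may be a struct or a pointer to one; dst must be a pointer to a struct.
func mapField(src any, from string, dst any, to string) error {
	sv := reflect.Indirect(reflect.ValueOf(src))
	dv := reflect.ValueOf(dst)
	if sv.Kind() != reflect.Struct {
		return fmt.Errorf("src must be a struct or a pointer to one")
	}
	if dv.Kind() != reflect.Ptr || dv.Elem().Kind() != reflect.Struct {
		return fmt.Errorf("dst must be a pointer to a struct")
	}
	sf := sv.FieldByName(from)
	df := dv.Elem().FieldByName(to)
	if !sf.IsValid() || !df.IsValid() || !df.CanSet() {
		return fmt.Errorf("cannot map %q to %q: missing or unexported field", from, to)
	}
	if !sf.Type().AssignableTo(df.Type()) {
		return fmt.Errorf("cannot assign %s to %s", sf.Type(), df.Type())
	}
	df.Set(sf)
	return nil
}

func main() {
	msg := &Message{Role: "assistant", Content: "Beijing: sunny"}

	var in1 Input1
	if err := mapField(msg, "Content", &in1, "Input"); err != nil { // like MapFields("Content", "Input")
		panic(err)
	}

	var in3 Input3
	if err := mapField(struct{ Output string }{"summary"}, "Output", &in3, "Query"); err != nil {
		panic(err)
	}
	fmt.Printf("%+v %+v\n", in1, in3)
}
```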

Now let's create a "ReAct" agent: a ChatModel bound to Tools. It receives input Messages and decides on its own whether to call a Tool or output the final result. The Tool's execution result becomes an input Message for the ChatModel again and serves as the context for the next round of decision-making.
We provide a complete, out-of-the-box implementation of the ReAct Agent in Eino's flow package. Check out the code here: flow/agent/react
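The loop described above fits in a few lines. This is a hypothetical stdlib-only sketch, not the code in flow/agent/react: `fakeModel` plays a ChatModel bound to one tool, `get_weather` is an invented tool, and `maxSteps` is an added safeguard against a model that never produces a final answer.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Message is a simplified chat message; Tool is set when the model requests a call.
type Message struct {
	Role    string // "user", "assistant" or "tool"
	Content string
	Tool    string
	Args    string
}

// Tool is a hypothetical tool: it takes an argument string and returns a result.
type Tool func(args string) string

// fakeModel stands in for a ChatModel bound to tools: it requests the weather
// tool once, then turns the tool result into a final answer.
func fakeModel(history []Message) Message {
	last := history[len(history)-1]
	switch {
	case last.Role == "tool":
		return Message{Role: "assistant", Content: "Forecast: " + last.Content}
	case strings.Contains(last.Content, "weather"):
		return Message{Role: "assistant", Tool: "get_weather", Args: "Beijing"}
	default:
		return Message{Role: "assistant", Content: "I can only talk about the weather."}
	}
}

// runReAct alternates model turns and tool calls until the model answers
// without requesting a tool, or until maxSteps model turns have been used.
func runReAct(input string, tools map[string]Tool, maxSteps int) (string, error) {
	history := []Message{{Role: "user", Content: input}}
	for i := 0; i < maxSteps; i++ {
		reply := fakeModel(history)
		history = append(history, reply)
		if reply.Tool == "" {
			return reply.Content, nil // final answer
		}
		tool, ok := tools[reply.Tool]
		if !ok {
			return "", fmt.Errorf("unknown tool %q", reply.Tool)
		}
		// The tool result is fed back as the next input message for the model.
		history = append(history, Message{Role: "tool", Content: tool(reply.Args)})
	}
	return "", errors.New("step limit reached without a final answer")
}

func main() {
	tools := map[string]Tool{
		"get_weather": func(city string) string { return city + ": sunny" },
	}
	fmt.Println(runReAct("Beijing's weather this weekend", tools, 5))
}
```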

Our implementation of the ReAct Agent uses Eino's graph orchestration exclusively, which provides the following benefits out of the box:
Type checking: it makes sure that the input and output types of connected nodes match when the graph is compiled.
Stream processing: it concatenates the message stream before passing it to the ChatModel and ToolsNode if needed, and copies the stream into callback handlers.
Concurrency management: the shared state can be safely read and written because the StatePreHandler is concurrency-safe.
Aspect injection: it injects callback aspects before and after the execution of the ChatModel if the specified ChatModel implementation hasn't injected them itself.
Option assignment: call options can be assigned globally, to a specific component type, or to a specific node.

concatenate /kɒnˈkætɪˌneɪt/ (UK), /kənˈkæt(ə)nˌeɪt/ (US): to link together in a chain or series; (adjective) linked together

For example, you could easily extend the compiled graph with callbacks:

handler := NewHandlerBuilder().
	OnStartFn(func(ctx context.Context, info *RunInfo, input CallbackInput) context.Context {
		log.Printf("onStart, runInfo: %v, input: %v", info, input)
		return ctx
	}).
	OnEndFn(func(ctx context.Context, info *RunInfo, output CallbackOutput) context.Context {
		log.Printf("onEnd, runInfo: %v, out: %v", info, output)
		return ctx
	}).
	Build()

compiledGraph.Invoke(ctx, input, WithCallbacks(handler))

or you could easily assign options to different nodes:

// assign to all nodes
compiledGraph.Invoke(ctx, input, WithCallbacks(handler))

// assign only to ChatModel nodes
compiledGraph.Invoke(ctx, input, WithChatModelOption(WithTemperature(0.5)))

// assign only to node_1
compiledGraph.Invoke(ctx, input, WithCallbacks(handler).DesignateNode("node_1"))

Key Features

Rich Components

Encapsulates common building blocks into component abstractions, each of which has multiple component implementations that are ready to be used out of the box.

Component abstractions include ChatModel, Tool, ChatTemplate, Retriever, Document Loader, Lambda, etc.

Each component type has an interface of its own: defined Input & Output types, a defined Option type, and streaming paradigms that make sense.

Implementations are transparent. Abstractions are all you care about when orchestrating components together.

Implementations can be nested and capture complex business logic.

ReAct Agent, MultiQueryRetriever, Host MultiAgent, etc. consist of multiple components and non-trivial business logic.

They are still transparent from the outside. A MultiQueryRetriever can be used anywhere that accepts a Retriever.

Powerful Orchestration

instances /ˈɪnstənsɪz/: examples, cases; (in programming) concrete objects created from a type or class

Data flows from Retriever / Document Loaders / ChatTemplate to ChatModel, then flows to Tools and is parsed as the Final Answer. This directed, controlled flow of data through multiple components can be implemented through graph orchestration.

Component instances are graph nodes, and edges are data flow channels.

Graph orchestration is powerful and flexible enough to implement complex business logic:
type checking, stream processing, concurrency management, aspect injection and option assignment are handled by the framework.
you can branch out execution at runtime, read and write global state, or do field-level data mapping using Workflow (currently in alpha).

Complete Stream Processing

converge /kənˈvɜːdʒ/ (UK), /kənˈvɜːrdʒ/ (US): to come together at one point; (of views or goals) to become similar; (math) to converge
fan out /fæn aʊt/: to spread out from a single point, like the folds of a fan

Stream processing is important because ChatModel outputs chunks of messages in real time as it generates them. It's especially important with orchestration because more components need to handle streaming data.

Eino automatically concatenates stream chunks for downstream nodes that only accept non-stream input, such as ToolsNode.

Eino automatically boxes non-stream output into a stream when a stream is needed during graph execution.

Eino automatically merges multiple streams as they converge into a single downstream node.

Eino automatically copies a stream as it fans out to different downstream nodes or is passed to callback handlers.

Orchestration elements such as branches and state handlers are also stream-aware.

With these stream processing abilities, the streaming paradigms of the components themselves become transparent to the user.

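A compiled Graph can run with 4 different streaming paradigms:
Invoke: accepts non-stream input I and returns non-stream output O.
Stream: accepts non-stream input I and returns stream output StreamReader[O].
Collect: accepts stream input StreamReader[I] and returns non-stream output O.
Transform: accepts stream input StreamReader[I] and returns stream output StreamReader[O].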

Highly Extensible Aspects (Callbacks)

Aspects handle cross-cutting concerns such as logging, tracing, and metrics, and also expose internal details of component implementations.
Five aspects are supported: OnStart, OnEnd, OnError, OnStartWithStreamInput, OnEndWithStreamOutput.
Developers can easily create custom callback handlers and add them via options when running a graph; they are then invoked during the graph run.
Graph can also inject aspects into component implementations that do not support callbacks on their own.

Eino Framework Structure

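artifact /ˈɑːtɪfækt/: an object made by people, especially one of cultural or historical interest; (in software) a build output; a spurious effect (also spelled artefact)
paradigm /ˈpærədaɪm/: a model, pattern, or typical example; (linguistics) an inflection table; a paradigmatic relation
Pregel: the name of Google's large-scale graph-processing model (not a chemistry term); graph frameworks borrow the name for a step-by-step graph execution mode
canvas /ˈkænvəs/: heavy cloth; a surface for painting; (in software) a drawing area, e.g. a visual editor
prompt /prɒmpt/ (UK), /prɑːmpt/ (US): to cause or encourage; to cue a speaker or actor; a (computer) prompt; (in LLMs) the input text given to a model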

The Eino framework consists of several parts:
Eino (this repo): contains Eino's type definitions, streaming mechanism, component abstractions, orchestration capabilities, aspect mechanisms, etc.
EinoExt: component implementations, callback handler implementations, component usage examples, and various tools such as evaluators and prompt optimizers.
Eino Devops: visualized development, visualized debugging, etc.
EinoExamples: the repo containing example applications and best practices for Eino.

evaluator /ɪˈvæljʊeɪtə/: a person or program that assesses something; (electronics) a discriminator; (computing) an evaluation program
containing /kənˈteɪnɪŋ/: present participle of contain; having something inside

Detailed Documentation

For learning and using Eino, we provide a comprehensive Eino User Manual to help you quickly understand the concepts in Eino and master the skills of developing AI applications based on Eino. Start exploring through the Eino User Manual now!

For a quick introduction to building AI applications with Eino, we recommend starting with Eino: Quick Start.

comprehensive /ˌkɒmprɪˈhensɪv/ (UK), /ˌkɑːmprɪˈhensɪv/ (US): complete, all-inclusive; having good understanding; (noun) a comprehensive school; a comprehensive examination

Dependencies

Go 1.18 and above.
Eino relies on kin-openapi's OpenAPI JSONSchema implementation. To remain compatible with Go 1.18, we have pinned kin-openapi to v0.118.0.

Security

If you discover a potential security issue in this project, or think you may have discovered a security issue, we ask that you notify ByteDance Security via our security center or vulnerability reporting email.

vulnerability /ˌvʌlnərəˈbɪləti/: susceptibility to harm, weakness; (in security) a flaw that can be exploited; (in bridge) vulnerable status

Introduction to LangChain and LlamaIndex

LangChain /læŋ tʃeɪn/
LlamaIndex /ˈlɑːmə ˈɪndeks/

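LangChain and LlamaIndex (formerly GPT Index) are currently the two mainstream frameworks for building applications based on large language models (LLMs). Both focus on problems LLMs face in real-world use, such as limited knowledge, dependence on context, and connecting to data, but they differ in core positioning and functional emphasis.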

LangChain

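Core Positioning

A general-purpose framework for building LLM-powered applications. It emphasizes the concepts of "Chains" and "Agents", using modular components to connect LLMs with external data, tools, APIs and more so that complex task flows can be implemented. Its core idea is to "let the LLM think and act like a person": not only generating text, but also calling tools, processing data, and planning steps.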

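Core Features

Chains: combine multiple LLM calls or tool calls into an ordered flow, for example a retrieval-augmented generation (RAG) chain ("retrieve documents → generate answer") or a multi-step processing chain ("translate → summarize → analyze")
Agents: give the LLM autonomous decision-making, so it dynamically chooses tools (such as search engines, databases, calculators) and executes steps according to the task; for example, when answering a complex question it automatically searches for the latest data, organizes the results, and generates an answer
External data integration: Document Loaders connect to all kinds of data sources (PDFs, databases, web pages, etc.), Text Splitters process the data, and vector stores then enable retrieval
Tool ecosystem: supports integration with a large number of third-party tools, such as search engines (Google Search), code interpreters (Code Interpreter), database queries (SQL), and API calls
Memory: stores conversation history or intermediate results to support multi-turn dialogue, letting the LLM "remember" earlier interactions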

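Use Cases

Complex conversational systems (e.g. intelligent customer service, multi-turn Q&A bots)
Retrieval-augmented generation (RAG) applications (e.g. Q&A tools over private documents)
Automated task processing (e.g. generating data analysis reports, automatically classifying and replying to emails)
Intelligent agent applications (e.g. automatic trip planning, assisted academic paper writing)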

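Strengths

Highly modular: components can be combined flexibly and extended with custom ones
Rich ecosystem with very strong tool and integration support, compatible with mainstream LLMs (GPT, Claude, ERNIE Bot, etc.)
Supports designing complex logic flows, suitable for applications that require multi-step decision-making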

LlamaIndex

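Core Positioning

LlamaIndex is a framework focused on "data connection". Its core goal is to solve the problem of combining LLMs with private or domain-specific data, so that LLMs can efficiently "understand" and "retrieve" external data. It acts more like a "bridge" between LLMs and data, and performs especially well in retrieval-augmented generation (RAG) scenarios.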

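Core Features

Data ingestion: provides a unified interface for loading all kinds of data sources (documents, databases, APIs, etc.) and automatically handles format conversion, text extraction and similar issues
Data indexing (Indexing): structures the data and builds efficient indexes, chiefly vector indexes (Vector Index), which use an embedding model (Embedding Model) to convert text into vectors for fast similarity search
Query interface (Query Interface): provides simple APIs supporting several query modes (keyword search, semantic search, multi-step queries) and combines the retrieved context with an LLM to generate answers
Optimization tools: RAG-specific optimizations such as automatic text-splitting strategies, index optimization, and ranking of retrieval results, which improve answer accuracy
LLM integration: supports mainstream LLMs (OpenAI, Anthropic, open-source models, etc.) and integrates seamlessly with frameworks such as LangChain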

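Use Cases

Q&A systems over private documents (e.g. enterprise knowledge-base search, paper/report interpretation tools)
Applications that combine data retrieval with analysis (e.g. extracting key information from large document collections, generating summaries)
Scenarios that require efficient processing of large-scale unstructured data (e.g. legal document retrieval, medical record analysis)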

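Strengths

Focused on data processing and retrieval optimization; works out of the box in RAG scenarios, lowering the technical barrier
Efficient indexing that supports large-scale data storage and fast queries
Finer-grained handling of data loading, splitting and indexing, suitable for scenarios that demand high retrieval accuracy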

Comparison and Collaboration Between LangChain and LlamaIndex

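Collaboration

The two are not competitors but complements.
In RAG applications, LlamaIndex can handle data indexing and retrieval, while LangChain's Chains combine the retrieval results with the LLM, or its Agents implement more complex interaction logic.
LlamaIndex can serve as LangChain's "data retrieval module", while LangChain can extend LlamaIndex's application scenarios (e.g. by adding multi-turn dialogue and tool calling).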

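Summary

If you need to build an LLM application involving multi-step flows, tool calls or intelligent decision-making, prefer LangChain.
If you focus on letting an LLM answer questions efficiently with private data (e.g. RAG scenarios), LlamaIndex is the more convenient choice.
In practice, the two are often used together to leverage their respective strengths.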

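Introduction to RAG

RAG (Retrieval-Augmented Generation) is a technical framework that combines information retrieval with the generative capability of large language models (LLMs). It aims to address LLM problems such as outdated knowledge, lack of private data, and a tendency to hallucinate (fabricate information), making the model's output more accurate, more reliable, and better suited to the needs of specific scenarios.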

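Augmented /ɔːɡˈmentɪd/: enlarged, increased, supplemented; (music) augmented
Hallucination /həˌluːsɪˈneɪʃ(ə)n/: a false perception, illusion, delusion; (in LLMs) generated content that is fabricated or unsupported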

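Core Principle of RAG

The core logic of RAG can be summed up as "retrieve first, then generate". It enhances LLM output through the following steps:

Data preparation and index building
External data such as private documents, industry data and up-to-date material is processed (e.g. split and cleaned), converted into vectors (numeric representations) by an embedding model (Embedding Model), and stored in a vector database as a "knowledge index".
For example, internal company manuals, academic papers and historical conversation records are turned into searchable vector data.

User query and similarity search
When a user asks a question, the system first converts the question into a vector, then searches the vector database for the "relevant document chunks" (i.e. the context) that are semantically closest to the question.
This step resembles "smart search", but it is more precise than traditional keyword search because it captures relationships at the level of meaning.

Generating an answer from the retrieved results
The retrieved document chunks are fed into the LLM together with the user's question, so the model generates its answer from this "factual basis" instead of relying only on its own pre-trained knowledge.
For example, when a user asks "What is the company's 2024 leave policy?", the system first retrieves the internal leave policy document and then has the LLM explain the policy based on it, preventing the model from making up outdated information.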

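Core Value of RAG

Overcoming knowledge limitations
An LLM's pre-trained knowledge stops at a cutoff date that varies by model and version (for example, October 2023 for GPT-4o), so it cannot answer questions about recent events or private data it never saw. RAG fills this gap by retrieving real-time or private data.
Example: combining RAG with the latest financial reports lets an LLM analyze a company's most recent results.

Reducing "hallucination"
An LLM may fabricate information when it is uncertain. RAG grounds generation in retrieved real documents, which reduces fabrication and makes answers traceable and verifiable.
Example: in legal Q&A, RAG retrieves the specific statutes before generating an explanation, reducing the risk of incorrect legal advice.

Adapting to private or domain data
Enterprises do not have to put sensitive data into a model's training set: with RAG the data stays in their own stores and only relevant passages are retrieved at query time. Paired with a locally deployed model, the data need not leave the private environment, balancing security and practicality.
Example: in healthcare, RAG combines patient records with medical literature to support doctors' diagnostic suggestions.

Lowering model training costs
There is no need to retrain a large model on domain-specific data, which is very expensive; RAG connects data dynamically, so new scenarios can be supported quickly.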

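Key Components of RAG

A complete RAG system usually contains the following core modules:
Data Sources: the external data to connect, such as documents (PDF, Word), databases, APIs, and web pages
Document Loaders: convert data sources of different formats into a uniform text format (e.g. the loaders provided by LangChain and LlamaIndex)
Text Splitters: split long texts into short chunks, to avoid exceeding the LLM's context window and to improve retrieval precision
Embedding Models: convert text into vectors that capture semantic information (e.g. OpenAI's text-embedding-ada-002, open-source BERT-derived models)
Vector Databases: store vectors and support efficient similarity search, e.g. Pinecone, Milvus, Qdrant, Chroma
Large Language Model (LLM): generates answers based on the retrieved context, e.g. GPT-4, Claude, ERNIE Bot, Llama 2
Retrievers: the component that performs similarity search, recalling document chunks relevant to the question from the vector database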

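Typical RAG Use Cases

Enterprise knowledge-base Q&A: employees look up internal policies, product manuals, technical documents, etc. (e.g. "How do I file an expense claim?")
Customer support: support bots answer user questions in real time based on product manuals and troubleshooting guides
Legal/medical assistance: lawyers retrieve statutes and cases to draft legal opinions; doctors combine medical records and literature for diagnostic reference
Academic/research tools: quickly search paper repositories and literature to help write reviews or analyze experiments
Intelligent document analysis: retrieval-based summarization, key-information extraction, or compliance checks on contracts, reports, and other documents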