Deep Research is an emerging LLM-driven technique for multi-step information retrieval and report generation. A Deep Research agent autonomously searches and reads web pages, extracts key information, and organizes it through multiple rounds of reasoning into a comprehensive research report. It can automatically break a complex question down into sub-questions, repeatedly search authoritative sources, cross-analyze information from multiple sources, and finally produce a structured report with cited sources. The goal is to address the information fragmentation and AI-hallucination problems of traditional search and generic question answering.
'''
Define the planner Agent's prompt, which tells the LLM:
- You are a research assistant;
- Given a topic, produce 5-7 search suggestions;
- Each suggestion must include a search query plus the reason for running that search.
'''
PLANNER_INSTRUCTIONS = (
    "You are a helpful research assistant. Given a query, come up with 5-7 web searches "
    "to perform to best answer the query.\n"
    "Return **ONLY valid JSON** that follows this schema:\n"
    '{{"searches": [ {{"query": "example", "reason": "why"}} ]}}\n'
)
'''
A single search suggestion, with two fields:
- reason: why this query is worth searching (explains the search motivation)
- query: the search term itself
'''
class WebSearchItem(BaseModel):
    reason: str
    "Your reasoning for why this search is important to the query."
    query: str
    "The search term to use for the web search."

'''
The overall structure of the search plan: a list named searches, where each item is a WebSearchItem as defined above.
'''
class WebSearchPlan(BaseModel):
    searches: list[WebSearchItem]
    """A list of web searches to perform to best answer the query."""
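As a quick sanity check (a minimal sketch that is not part of the original script; the JSON string below is a made-up illustration), the JSON the planner is asked to emit can be validated directly against these Pydantic models:

raw_json = '{"searches": [{"query": "LangGraph deep research example", "reason": "find reference implementations"}]}'
plan = WebSearchPlan.model_validate_json(raw_json)  # Pydantic v2 parsing + validation
print(plan.searches[0].query, "|", plan.searches[0].reason)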
'''
with_structured_output constrains the output to a structured WebSearchPlan object containing multiple WebSearchItem entries.
'''
planner_prompt = ChatPromptTemplate.from_messages([
    ("system", PLANNER_INSTRUCTIONS),
    ("human", "{query}"),
])
planner_chain = planner_prompt | model.with_structured_output(WebSearchPlan, method="json_mode")
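A minimal invocation sketch (not in the original script; the query string is only an illustration, and the model and API keys are assumed to be configured). Because the schema is a Pydantic class, with_structured_output normally returns a WebSearchPlan instance, although planner_node in the full script below still guards the parse in case a plain dict comes back:

plan = planner_chain.invoke({"query": "How do LangGraph multi-agent workflows manage state?"})
for item in plan.searches:
    print(f"{item.query}  ->  {item.reason}")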
search_agent
The job of search_agent is to take one search query, call a web search tool, and turn the results into a concise summary (2-3 paragraphs, under 300 words) that keeps only the information itself, with no commentary. In other words, this is the component in the system that actually goes out and looks things up on the web.
model = ChatOpenAI(model="qwen-plus-latest", max_tokens=32000)

# --2 search_agent--
SEARCH_INSTRUCTIONS = (
    "You are a research assistant. Given a search term, search the Web and produce a "
    "concise 2-3 paragraph (<300 words) summary of the main points. Only return the summary."
)
'''
Use the Tavily search tool to perform the actual web search.
'''
search_tool = TavilySearch(max_results=5, topic="general")
search_agent = create_react_agent(
    model=model,
    prompt=SEARCH_INSTRUCTIONS,
    tools=[search_tool],
)
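A minimal invocation sketch (not part of the original script; the search term is only an illustration, and a TAVILY_API_KEY is assumed to be set in the environment). The ReAct agent returns its full message history, and the last message holds the summary:

run = search_agent.invoke({"messages": [HumanMessage(content="LangGraph create_react_agent tutorial")]})
print(run["messages"][-1].content)  # the agent's final 2-3 paragraph summary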
# --3 writer_chain--
WRITER_PROMPT = (
    "You are a senior researcher tasked with writing a cohesive report for a research query. "
    "You will be provided with the original query and some initial research.\n\n"
    "① 先给出完整的大纲;\n"
    "② 然后生成正式报告。\n\n"
    "**写作要求**:\n"
    "• 报告使用 Markdown 格式;\n"
    "• 章节清晰,层次分明;\n"
    "• markdown_report部分至少包含2000中文字 (注意需要用中文进行回复);\n"
    "• 内容丰富、论据充分,可加入引用和数据,允许分段、添加引用、表格等;\n"
    "• 最终仅返回 JSON:\n"
    '{{"short_summary": "...", "markdown_report": "...", "follow_up_questions": ["..."]}}'
)
'''
Define the ReportData class to constrain the structure of the model's output.
'''
class ReportData(BaseModel):
    short_summary: str
    """A short 2-3 sentence summary of the findings."""
    markdown_report: str
    """The final report, in Markdown."""
    follow_up_questions: list[str]
    """Suggested topics to research further."""

writer_prompt = ChatPromptTemplate.from_messages([
    ("system", WRITER_PROMPT),
    ("human", "{query}"),
])
writer_chain = writer_prompt | model.with_structured_output(ReportData)
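A minimal invocation sketch (not part of the original script; the input string is only an illustration of how writer_node later concatenates the original question with the combined search summaries):

writer_input = "原始问题:什么是 Deep Research?\n\n搜索摘要:\n## Deep Research\n\n(summary text produced by search_agent goes here)"
report = writer_chain.invoke({"query": writer_input})
print(report.short_summary)
print(report.follow_up_questions)

The full script below wires these three components (planner_chain, search_agent, writer_chain) into a LangGraph workflow.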
import json
import os
from typing import List
from dotenv import load_dotenv
from pydantic import BaseModel, ValidationError, parse_obj_as
from langchain.prompts import ChatPromptTemplate
from langchain_core.messages import AIMessage, HumanMessage, ToolMessage
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.types import Command
from langgraph.prebuilt import create_react_agent
from langchain_tavily import TavilySearch
from langchain_openai import ChatOpenAI

load_dotenv(override=True)

model = ChatOpenAI(model="qwen-plus-latest", max_tokens=32000)

# --1 planner_chain--
PLANNER_INSTRUCTIONS = (
    "You are a helpful research assistant. Given a query, come up with 5-7 web searches "
    "to perform to best answer the query.\n"
    "Return **ONLY valid JSON** that follows this schema:\n"
    '{{"searches": [ {{"query": "example", "reason": "why"}} ]}}\n'
)

class WebSearchItem(BaseModel):
    reason: str
    "Your reasoning for why this search is important to the query."
    query: str
    "The search term to use for the web search."

class WebSearchPlan(BaseModel):
    searches: list[WebSearchItem]
    """A list of web searches to perform to best answer the query."""

planner_prompt = ChatPromptTemplate.from_messages([
    ("system", PLANNER_INSTRUCTIONS),
    ("human", "{query}"),
])
planner_chain = planner_prompt | model.with_structured_output(WebSearchPlan, method="json_mode")

# --2 search_agent--
SEARCH_INSTRUCTIONS = (
    "You are a research assistant. Given a search term, search the Web and produce a "
    "concise 2-3 paragraph (<300 words) summary of the main points. Only return the summary."
)

search_tool = TavilySearch(max_results=5, topic="general")
search_agent = create_react_agent(
    model=model,
    prompt=SEARCH_INSTRUCTIONS,
    tools=[search_tool],
)

# --3 writer_chain--
WRITER_PROMPT = (
    "You are a senior researcher tasked with writing a cohesive report for a research query. "
    "You will be provided with the original query and some initial research.\n\n"
    "① 先给出完整的大纲;\n"
    "② 然后生成正式报告。\n\n"
    "**写作要求**:\n"
    "• 报告使用 Markdown 格式;\n"
    "• 章节清晰,层次分明;\n"
    "• markdown_report部分至少包含2000中文字 (注意需要用中文进行回复);\n"
    "• 内容丰富、论据充分,可加入引用和数据,允许分段、添加引用、表格等;\n"
    "• 最终仅返回 JSON:\n"
    '{{"short_summary": "...", "markdown_report": "...", "follow_up_questions": ["..."]}}'
)

class ReportData(BaseModel):
    short_summary: str
    """A short 2-3 sentence summary of the findings."""
    markdown_report: str
    """The final report, in Markdown."""
    follow_up_questions: list[str]
    """Suggested topics to research further."""

writer_prompt = ChatPromptTemplate.from_messages([
    ("system", WRITER_PROMPT),
    ("human", "{query}"),
])
writer_chain = writer_prompt | model.with_structured_output(ReportData)

# -------- planner_node --------
def planner_node(state: MessagesState) -> Command:
    user_query = state["messages"][-1].content
    raw = planner_chain.invoke({"query": user_query})
    # raw may already be a WebSearchPlan, or a dict that was parsed from JSON
    try:
        plan = parse_obj_as(WebSearchPlan, raw)
    except ValidationError:
        # In case the model only returns ["keyword1", ...]
        if isinstance(raw, dict) and isinstance(raw.get("searches"), list):
            plan = WebSearchPlan(searches=[WebSearchItem(query=q, reason="") for q in raw["searches"]])
        else:
            raise
    return Command(
        goto="search_node",
        update={
            "messages": [AIMessage(content=plan.model_dump_json())],  # JSON string
            "plan": plan,  # also keep the native object so it can be used directly later
        },
    )

# ---------- search_node ----------
def search_node(state: MessagesState) -> Command:
    plan_json = state["messages"][-1].content
    plan = WebSearchPlan.model_validate_json(plan_json)
    summaries = []
    for item in plan.searches:
        # 1. Wrap the search term in a HumanMessage
        run = search_agent.invoke({"messages": [HumanMessage(content=item.query)]})
        # 2. Take the readable content: the last ToolMessage or AIMessage
        msgs = run["messages"]
        readable = next((m for m in reversed(msgs) if isinstance(m, (ToolMessage, AIMessage))), msgs[-1])
        summaries.append(f"## {item.query}\n\n{readable.content}")
    combined = "\n\n".join(summaries)
    return Command(goto="writer_node", update={"messages": [AIMessage(content=combined)]})

# ---------- writer_node ----------
def writer_node(state: MessagesState) -> Command:
    original_query = state["messages"][0].content
    combined_summary = state["messages"][-1].content
    writer_input = (
        f"原始问题:{original_query}\n\n"
        f"搜索摘要:\n{combined_summary}"
    )
    report: ReportData = writer_chain.invoke({"query": writer_input})
    return Command(
        goto=END,
        update={"messages": [AIMessage(content=json.dumps(report.dict(), ensure_ascii=False, indent=2))]},
    )

# -------- Build & run the graph --------
builder = StateGraph(MessagesState)
builder.add_node("planner_node", planner_node)
builder.add_node("search_node", search_node)
builder.add_node("writer_node", writer_node)builder.add_edge(START,"planner_node")
builder.add_edge("planner_node","search_node")
builder.add_edge("search_node","writer_node")
builder.add_edge("writer_node", END)graph = builder.compile()