当前位置：首页 > news >正文

多Agent协作案例：用AutoGen实现“写代码+测Bug”的自动开发流程

news 2025/9/5 11:50:22

在大型语言模型（LLM）能力日益强大的今天，我们不再满足于让LLM仅仅完成单一的任务，而是希望它们能够像人类团队一样，通过协作来解决复杂的问题。多Agent协作应运而生，它将多个具有不同技能的AI Agent组织起来，共同完成一个宏大的目标。

本文将带您走进多Agent协作的世界，特别是使用微软出品的AutoGen框架，实现一个经典的“写代码+测Bug”的自动化开发流程。我们将深入了解AutoGen的核心概念，并给出一个清晰的代码示例，让您亲身体验AI Agent团队的强大之处。

一、为什么需要多Agent协作？

单个强大的LLM（如GPT-4）虽然能力出众，但面对复杂、多方面的任务时，往往会：

有限的上下文窗口：难以一次性处理所有信息和指令。

知识的局限性：即使是通用大模型，也可能在特定领域（如下一代代码自动生成、精确的单元测试生成）略显不足。

单一视角：难以进行批判性思考、多角度验证或发现潜在的逻辑漏洞。

缺乏分工：无法像人类团队那样，让“程序员”专心写代码，“测试员”专心找Bug，互相协作，并行推进。

多Agent协作的优势：

专业化分工：每个Agent可以被训练或Prompt化，专注于特定技能，如代码生成、测试、文档撰写、环境配置等。

增强鲁棒性：通过不同Agent的交叉验证和反馈，可以发现并修正错误，提高最终结果的质量。

解决复杂问题：将复杂任务分解为一系列更小、更易于管理的子任务，分配给不同的Agent，并行或顺序执行。

模拟真实世界工作流：模仿人类团队的协作模式，实现更自然的自动化流程。

二、 AutoGen 简介：构建 Agent 协作的利器

AutoGen 是一个由微软开发的开源框架，旨在简化用户创建、配置和协调LLM Agent的流程。它打破了传统LLM API一次交互只能得到一个响应的限制，允许Agent之间进行多轮、自动化的对话。

AutoGen的核心概念：

Agent: AutoGen中的Agent是一类可以与LLM模型进行交互，并能被自定义行为和能力的实体。AutoGen提供了多种内置Agent类型，同时也支持自定义Agent。

User Proxy Agent: 模拟真实用户的代理，负责发起对话、接收Agent的响应、执行工具（如运行Python代码）、并根据情况发送新的指令或反馈。

Assistant Agent: 通常由LLM驱动，负责执行具体任务，如代码生成、写博客、回答问题等。可以配置多个Assistant Agent，每个Agent可以有不同的LLM配置、Prompt或工具。

initiate_chat / a_initiate_chat: 这是启动Agent之间对话的函数。你可以指定发起者（通常是User Proxy Agent）和参与者（一个或多个Assistant Agent）。

Message: Agent之间的沟通内容，可以是“User Message”（用户角色发送）、“Assistant Message”（助手角色发送）等。

Tools: AutoGen支持Agent调用外部工具（如Python解释器、搜索引擎WebAPI等）。

Callbacks: 允许在Agent交互过程中执行自定义逻辑，如记录日志、收集指标、进行分析等。

三、案例：实现“写代码+测Bug”的自动开发流程

我们的目标是构建一个自动化流程：

用户 (User Proxy Agent) 提出一个编程需求（例如：“写一个Python函数，计算斐波那契数列”）。

编码Agent (Coder Agent) 接收需求，思考后编写Python代码。

测试Agent (Tester Agent) 接收到代码后，思考可能存在的Bug，并编写一个单元测试来验证代码的正确性。

执行Agent (User Proxy Agent) 接收到测试结果，将测试用例交由Python解释器执行。

结果反馈：根据测试结果，User Proxy Agent告知用户代码是否通过测试，或者如果测试失败，则将失败信息反馈给Coder Agent，进入下一轮的修正循环。

1. 准备环境

首先，安装AutoGen及相关依赖：

<BASH>

pip install "autogen[extended]" # extended includes more tools like python_interpreter

pip install python-dotenv # For managing API keys

然后，你需要一个LLM API密钥（如OpenAI）。创建一个.env文件，包含你的API密钥：

<TEXT>

OPENAI_API_KEY=sk-your_openai_api_key

2. 定义Agent

我们需要三个主要Agent：

User Proxy Agent: 负责启动流程，执行代码，并提供反馈。

Coder Agent: 能够理解需求并编写代码。

Tester Agent: 能够接收代码，编写测试用例，并评估代码表现。

代码示例 (main.py)：

import autogen

from autogen import config_list_from_json, UserProxyAgent, AssistantAgent, GroupChat, GroupChatManager

import os

from dotenv import load_dotenv

# 加载API密钥

load_dotenv()

# --- LLM 配置 ---

# 推荐使用 OpenAI 的 gpt-4 模型，因为它在代码理解和生成方面表现最佳。

# 如果没有 OpenAI API KEY, 你也可以配置其他模型，例如 Azure OpenAI, OLLAMA 等。

# 请确保你的 LLM 提供商配置正确。

config_list = [

{

"model": "gpt-4", # 或 "gpt-3.5-turbo"

"api_key": os.getenv("OPENAI_API_KEY")

}

]

llm_config = {

"config_list": config_list,

"cache_seed": 42, # For reproducibility

"temperature": 0.7 # Adjust for creativity vs. determinism

}

# --- Agent 定义 ---

# 1. User Proxy Agent (initiator, code executor)

# UserProxyAgent 负责与 LLM 交互，并执行代码。

# `human_input_mode="NEVER"` 表示它不会主动请求人类输入，而是完全依赖 Agent 之间的对话。

# `code_execution_config` 定义了代码执行的环境。

# `last_n_messages` 定义了记忆的窗口大小，即对话中保留的最近消息数量。

user_proxy = UserProxyAgent(

name="UserProxy",

llm_config=llm_config,

is_termination_msg=lambda x: "TERMINATE" in x.get("content", ""), # 定义对话结束的条件

human_input_mode="NEVER", # 不主动询问用户

code_execution_config={

"work_dir": "coding", # 设置代码执行的工作目录

"use_docker": False, # False 表示在本地运行 Python 代码 (如果需要隔离环境，True 会更安全)

max_consecutive_auto_reply=10, # Agent 之间连续自动回复的最大次数

# last_n_messages=5 # optional: memory window size

)

# 2. Coder Agent (writes code based on user request)

# AssistantAgent 是一个通用的助手 Agent，我们可以通过 `system_message`

# 来赋予它特定的角色和能力。

coder_agent = AssistantAgent(

name="Coder",

llm_config=llm_config,

system_message="""You are a Python expert programmer.

You can write python code to solve user requests.

Your code must be correct and runnable.

When you are done writing the code, you must say "TERMINATE" so that the tester agent can take over.

When you write code, wrap it in markdown code blocks like this:

```python

# your code here

""",

)

3. Tester Agent (tests the code written by Coder Agent)

tester_agent = AssistantAgent(

name="Tester",

llm_config=llm_config,

system_message="""You are a Python expert tester.

You receive code from the Coder Agent and you need to find bugs or potential improvements.

Write a unit test for the provided code. The unit test should be comprehensive, covering edge cases if possible.

If the code seems correct and no bugs are found, write a simple greeting message and say "TERMINATE".

If you find a potential bug or an edge case, write a clear description of the issue and then use the python_interpreter tool to run the unit tests.

When you request code execution, wrap the code in markdown code blocks like this:

# test code here

If the unit test passes, respond with "Test passed." and say "TERMINATE".

If the unit test fails, describe the failure, say "Test failed.", and then say "TERMINATE" to let the coder agent fix it.

""",

)

<TEXT>

#### 3. 创建 Group Chat 机器人

AutoGen 的 `GroupChat` 和 `GroupChatManager` 组件是实现多Agent协作的核心。

* **`GroupChat`:** 定义了参与对话的Agent以及它们之间的会话流程（谁可以说话）。

* **`GroupChatManager`:** 负责管理 `GroupChat` 的状态，决定哪个Agent下一个说话，并处理 Agent 之间的消息传递。

```python

# --- Group Chat 设置 ---

# 定义对话中的参与者

agents = [user_proxy, coder_agent, tester_agent]

# 创建 GroupChat

# `speaker_selection_method` 控制 Agent 之间的对话流。

# "auto" 表示由 LLM 决定下一个发言的 Agent。

# "round_robin" 表示按顺序轮流发言。

# "manual" 则需要你提供一个函数来决定下一个发言人。

# 我们这里用 "auto" 让 LLM 决定流程。

groupchat = GroupChat(

agents=agents,

messages=[],

max_round=20, # 最大对话轮数，防止无限循环

speaker_selection_method="auto"

)

# 创建 GroupChatManager

# `selector` 是一个LLM，用于从所有Agent中选择下一个发言者。

# `allow_delegation` 允许 Agent 将任务委托给其他 Agent (我们这里设置为 True，使得 tester 可以请求 coder 修正 Bug)

manager = GroupChatManager(

groupchat=groupchat,

llm_config=llm_config,

# 选填：指定某个Agent作为默认的Responder，如果 LLM 无法决定

# default_responder=user_proxy

)

4. 启动对话

现在，我们使用 UserProxyAgent 的 initiate_chat 方法来启动整个Agent团队的协作。

# --- 启动 Agent 之间的对话 ---

user_request = """

Write a Python function that calculates the nth Fibonacci number using recursion.

The function should handle edge cases like n=0 and negative inputs.

"""

print("--- Initiating conversation ---")

# UserProxyAgent 启动对话，指定参与者和初始消息

user_proxy.initiate_chat(

manager, # 对话的管理者

message=user_request, # 用户发起的请求

# Optional: specify which agent should start the conversation if not the user_proxy

# starter_agent=coder_agent

)

print("\n--- Conversation Finished ---")

# 示例：你可以进一步分析 groupchat.messages 来查看完整的对话历史

# for i, message in enumerate(groupchat.messages):

# print(f"Message {i}: {message['from']} ({message['role']}): {message['content']}")

# 示例：检查 user_proxy 是否执行了代码，并输出结果

# 我们可以获取到 coderAgent 最后一次生成的代码，并尝试自己调用或让 UserProxyAgent 再次执行

# 找到 coder_agent 的最后一条代码消息

coder_last_code_msg = None

for msg in reversed(groupchat.messages):

if msg['from'] == coder_agent.name and 'content' in msg and '```python' in msg['content']:

coder_last_code_msg = msg

break

if coder_last_code_msg:

code_block = coder_last_code_msg['content'].split('```python')[1].split('```')[0].strip()

print("\n--- Final Code Generated by Coder ---")

print(code_block)

# UserProxyAgent 已经执行了代码，我们可以参考其执行的输出。

# 或者，如果我们想手动触发执行 (如果 UserProxyAgent 没有正确执行的话):

# print("\n--- Manually executing the final code ---")

# result = user_proxy.execute_code(code_block)

# print("Execution Result:", result)

# 更好的方式是利用 UserProxyAgent 的 `execute_code` 方法来调用

# 可以在 manager 中定义一个 `run_code` 的工具，然后让 tester 使用

# 或者直接在 `initiate_chat` 中定义 `code_execution_config` 使得 UserProxyAgent

# 自动执行所有返回的代码块。

4. 运行与观察

保存上面的代码为 main.py，然后在终端运行：

<BASH>

python main.py

您将看到Agent之间一系列的对话：

UserProxy 提出需求。

Coder 收到需求，思考后生成斐波那契函数代码。

Tester 收到代码，开始思考如何测试，可能会写一些测试用例（例如，fib(0), fib(1), fib(5), fib(-1)）。

Tester 将测试代码和原始代码包装起来，通过 user_proxy.execute_code() 执行。

UserProxy 接收测试结果。

如果测试通过，UserProxy 会告知Coder和Tester，并最终结束对话。

如果测试失败（例如，原始代码没有正确处理负数输入），UserProxy 会将失败信息反馈给Coder。

Coder 收到失败信息，修正代码。

Tester 收到修正后的代码，再次进行测试。

这个过程会循环，直到代码通过测试或达到最大轮数。

可能的对话流程示例：

<TEXT>

User: Write a Python function that calculates the nth Fibonacci number using recursion.

The function should handle edge cases like n=0 and negative inputs.

Coder: ... (generates initial code with a potential bug) ...

```python

def fibonacci(n):

if n <= 0:

return "Input should be a positive integer" # Potential bug for n=0

elif n == 1:

return 1

else:

return fibonacci(n-1) + fibonacci(n-2)

TERMINATE

UserProxy: ... (executes the code with fibonacci(0)) ...

Test Result: The function returned "Input should be a positive integer" for n=0, which is not the expectedFibonacci(0) = 0.

Tester: ... (analyzes the result) ...

I found a potential issue with the base case for n=0 in the provided Fibonacci function. The expected output for fibonacci(0) is 0, but the current implementation returns an error message. Let's write a unit test to verify this.

# Test code for Fibonacci function

import unittest

class TestFibonacci(unittest.TestCase):

def test_fib_zero(self):

self.assertEqual(fibonacci(0), 0) # Expecting 0 for fib(0)

def test_fib_one(self):

self.assertEqual(fibonacci(1), 1)

def test_fib_positive(self):

self.assertEqual(fibonacci(5), 5)

def test_fib_negative(self):

self.assertEqual(fibonacci(-1), "Input should be a positive integer")

# Assuming the fibonacci function is defined above in the same scope

# self.assertEqual(autogen.execute_code(code="def fibonacci(...): ..."), "Test failed.") # Simplified representation

I have written unit tests. Running them now.

Test failed. (because fibonacci(0) returned an error message instead of 0)

TERMINATE

UserProxy: ... (feedback to Coder) ...

The tester found a failure in the test_fib_zero case. The function fibonacci(0) returned "Input should be a positive integer" but it should return 0. Please fix the code.

Coder: ... (corrects the code) ...

def fibonacci(n):

if n < 0:

return "Input must be a non-negative integer"

elif n == 0:

return 0 # Corrected base case for n=0

elif n == 1:

return 1

else:

return fibonacci(n - 1) + fibonacci(n - 2)

TERMINATE

UserProxy: ... (executes new code with tests) ...

Test Result: All tests passed.

Tester: All tests passed.

TERMINATE

UserProxy: The code has been successfully tested and corrected. The final Fibonacci function is ready.

FINAL ANSWER: I have written a Python function for calculating Fibonacci numbers and tested it. The function correctly handles n=0, n=1, positive integers, and negative inputs.

<TEXT>

### 五、进阶优化与注意事项

1. **LLM选择：** 对于代码相关的任务，建议使用能力更强、上下文窗口更大的模型（如GPT-4）。

2. **`system_message`精心设计：** Coder Agent和Tester Agent的`system_message`至关重要。明确它们的职责、行为规范、输出格式、工具使用规则。

3. **`speaker_selection_method`：** `auto`是便利的，但有时您可能需要更精确地控制对话流，这时可以自定义一个函数来决定下一个发言者。

4. **`max_consecutive_auto_reply`：** 设置合理的上限，防止Agent陷入无限循环。

5. **`max_round`：** 设定总的对话轮数上限，防止超出预期。

6. **工具的精确定义：** Agent能否正确使用工具，很大程度上取决于工具本身的描述是否清晰、参数定义是否准确。

7. **反思与修正机制：** 在`system_message`中鼓励Agent进行自我反思、检查，寻找代码的潜在问题，并根据tester的反馈进行修改。

8. **记忆管理 (`last_n_messages`)：** 对于多轮复杂的对话，适当增加记忆窗口，让Agent能够回顾更长的上下文。

9. **错误处理与重试：** 在Agent之间设计更精细的错误处理机制，例如，某次工具执行失败后，Agent不要立刻报错，而是尝试用另一种方式或工具。

10. **并发执行 (Parallel Execution):** AutoGen框架正在不断发展，未来可能支持更灵活的并行执行策略，进一步提高效率。