Developing an AI Agent in Python
Below is a simple Python implementation of an AI agent that uses reinforcement learning (the Q-Learning algorithm) to solve a maze-navigation problem. The example is meant to illustrate the core logic of an AI agent implementation.
---
### **1. Environment Definition (the Maze)**
```python
import numpy as np
# Maze layout: 0 = walkable cell, 1 = obstacle, 2 = goal
maze = np.array([
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 2, 1]
])
start_pos = (0, 0)  # starting position
```
---
### **2. The Q-Learning Agent Class**
```python
class QLearningAgent:
    def __init__(self, maze, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.maze = maze
        self.actions = ['up', 'down', 'left', 'right']  # available actions
        # Q-table: one value per (row, column, action) triple
        self.q_table = np.zeros((maze.shape[0], maze.shape[1], len(self.actions)))
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration rate

    def choose_action(self, state):
        # epsilon-greedy action selection
        if np.random.uniform(0, 1) < self.epsilon:
            return np.random.choice(self.actions)  # explore: random action
        else:
            x, y = state
            return self.actions[np.argmax(self.q_table[x, y])]  # exploit: best known action

    def update_q_table(self, state, action, reward, next_state):
        # Q-value update rule
        x, y = state
        next_x, next_y = next_state
        action_idx = self.actions.index(action)
        old_value = self.q_table[x, y, action_idx]
        next_max = np.max(self.q_table[next_x, next_y])
        new_value = (1 - self.alpha) * old_value + self.alpha * (reward + self.gamma * next_max)
        self.q_table[x, y, action_idx] = new_value

    def get_next_state(self, state, action):
        # Compute the next state for a given action
        x, y = state
        if action == 'up' and x > 0 and self.maze[x-1][y] != 1:
            return (x-1, y)
        elif action == 'down' and x < self.maze.shape[0]-1 and self.maze[x+1][y] != 1:
            return (x+1, y)
        elif action == 'left' and y > 0 and self.maze[x][y-1] != 1:
            return (x, y-1)
        elif action == 'right' and y < self.maze.shape[1]-1 and self.maze[x][y+1] != 1:
            return (x, y+1)
        return state  # invalid move: stay in place
```
---
### **3. Training Loop**
```python
def train_agent(episodes=500):
    agent = QLearningAgent(maze)
    for episode in range(episodes):
        state = start_pos
        total_reward = 0
        while True:
            action = agent.choose_action(state)
            next_state = agent.get_next_state(state, action)
            # Compute the reward
            if maze[next_state] == 2:
                reward = 10    # reached the goal
                done = True
            elif next_state == state:
                reward = -1    # bumped into a wall or the maze edge
                done = False
            else:
                reward = -0.1  # small per-step penalty
                done = False
            # Update the Q-table
            agent.update_q_table(state, action, reward, next_state)
            state = next_state
            total_reward += reward
            if done or total_reward < -20:  # cut off runaway episodes
                break
        if (episode + 1) % 100 == 0:
            print(f"Episode {episode+1}, Total Reward: {total_reward}")
    return agent

trained_agent = train_agent()
```
---
### **4. Testing the Agent**
```python
def test_agent(agent, max_steps=50):
    # Follow the greedy policy from the start position and record the path
    state = start_pos
    path = [state]
    for _ in range(max_steps):  # guard against a policy that loops forever
        action = agent.actions[np.argmax(agent.q_table[state])]
        next_state = agent.get_next_state(state, action)
        path.append(next_state)
        if maze[next_state] == 2:
            print("Goal Reached! Path:", path)
            break
        state = next_state

test_agent(trained_agent)
```
---
### **Code Walkthrough**
1. **Q-table structure**: `q_table[row][column][action]` stores the expected return of each state-action pair.
2. **Action selection**: an ε-greedy policy balances exploration and exploitation.
3. **Reward design** (a worked example of the resulting Q-update appears after this list):
   - Reaching the goal: +10
   - Hitting a wall: -1
   - Every other step: -0.1 (encourages reaching the goal quickly)
4. **Sample output** (training progress followed by the test run; exact values will vary):
```
Episode 100, Total Reward: -4.3
Episode 200, Total Reward: 6.2
Episode 500, Total Reward: 8.9
Goal Reached! Path: [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (3, 2)]
```
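To make the update rule concrete, here is a single Q-update computed by hand with the default hyperparameters (α = 0.1, γ = 0.9); the numbers are purely illustrative:

```python
# One Q-Learning update, computed by hand with alpha=0.1, gamma=0.9.
# Case 1: a normal step (reward -0.1) from an untrained cell (old Q = 0)
# into a cell whose best Q-value is still 0.
alpha, gamma = 0.1, 0.9
old_value, reward, next_max = 0.0, -0.1, 0.0
new_value = (1 - alpha) * old_value + alpha * (reward + gamma * next_max)
print(new_value)  # ≈ -0.01: the move is now slightly discouraged

# Case 2: later in training, the next cell's best Q-value has grown to 9.0,
# so the same move is pulled strongly toward the goal.
old_value, next_max = -0.01, 9.0
new_value = (1 - alpha) * old_value + alpha * (reward + gamma * next_max)
print(new_value)  # ≈ 0.791
```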
---
### **Extension Directions**
1. **Deep Q-Network (DQN)**: replace the Q-table with a neural network, which scales better to large state spaces (a usage sketch follows the class definition):
```python
import torch
import torch.nn as nn

class DQN(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_size, 64),
            nn.ReLU(),
            nn.Linear(64, output_size)
        )

    def forward(self, x):
        return self.net(x)
```
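A minimal usage sketch; the one-hot state encoding and the 16/4 layer sizes below are illustrative assumptions for this 4×4 maze, not part of the original example:

```python
import torch

# Hypothetical encoding: flatten the (row, col) position of the 4x4 maze
# into a one-hot vector of length 16.
def encode_state(state, rows=4, cols=4):
    vec = torch.zeros(rows * cols)
    vec[state[0] * cols + state[1]] = 1.0
    return vec

dqn = DQN(input_size=16, output_size=4)   # 4 actions: up/down/left/right
q_values = dqn(encode_state((0, 0)))      # predicted Q-values for the start cell
print(q_values)
```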
2. **Integrate OpenAI Gym**: use a standard environment such as `FrozenLake` (a minimal interaction loop is sketched after the snippet):
```python
import gym
env = gym.make('FrozenLake-v1')
```
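A minimal interaction-loop sketch, assuming a recent Gym/Gymnasium release where `reset()` returns `(observation, info)` and `step()` returns five values; older gym versions use a slightly different signature:

```python
import gym

env = gym.make('FrozenLake-v1')
obs, info = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # placeholder random policy
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()
```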
3. **Multi-agent collaboration**: use the `PettingZoo` library to build multi-agent systems (a minimal loop is sketched below).
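A minimal sketch of PettingZoo's agent-iteration loop; the environment module and version suffix (`simple_spread_v3`) are assumptions that may differ across PettingZoo releases:

```python
from pettingzoo.mpe import simple_spread_v3  # environment name/version may vary

env = simple_spread_v3.env()
env.reset(seed=42)
for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    # A real system would compute each agent's action from its observation;
    # here we sample randomly. Finished agents must be stepped with None.
    action = None if (termination or truncation) else env.action_space(agent).sample()
    env.step(action)
env.close()
```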
---
### **Key Debugging Tips**
1. **Visualize the Q-table**: `print(agent.q_table)` (a per-cell greedy-policy printout is sketched after this list)
2. **Tune hyperparameters**: try different values of `alpha` (learning rate) and `gamma` (discount factor for future rewards)
3. **Reward shaping**: adjust the reward function to speed up convergence
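As a complement to printing the raw Q-table, a small helper (not part of the original code, shown only as an illustration) can print the greedy action for every walkable cell:

```python
def print_policy(agent):
    # Print the greedy action per cell: '#' marks obstacles, 'G' marks the goal.
    # Useful for checking whether the learned policy actually leads to the goal.
    symbols = {'up': '^', 'down': 'v', 'left': '<', 'right': '>'}
    for x in range(agent.maze.shape[0]):
        row = []
        for y in range(agent.maze.shape[1]):
            if agent.maze[x, y] == 1:
                row.append('#')
            elif agent.maze[x, y] == 2:
                row.append('G')
            else:
                row.append(symbols[agent.actions[np.argmax(agent.q_table[x, y])]])
        print(' '.join(row))

print_policy(trained_agent)
```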
This example demonstrates the core elements of AI agent development: **environment interaction, a learning algorithm, and a reward mechanism**. In real projects you can combine PyTorch/TensorFlow, ROS (for robotics), or LangChain (for language models) to build more sophisticated systems.