Embodied AI Series: Implementing the CartPole Game with the Q-Learning Algorithm (Reinforcement Learning)
Full code: rl/qlearning_cartpole.py · 陈先生/ailib - Gitee.com
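For readers without the repository at hand, the core of tabular Q-learning on CartPole can be sketched roughly as below. This is not the code from rl/qlearning_cartpole.py: the bin count, the clipping ranges, the learning rate, the discount factor, and all function names are assumptions chosen for illustration. The only facts taken as given are that CartPole has a 4-dimensional continuous observation (cart position, cart velocity, pole angle, pole angular velocity) and two discrete actions.

```python
import random

N_BINS = 6      # hypothetical number of bins per observation dimension
N_ACTIONS = 2   # CartPole: 0 = push left, 1 = push right

# Hypothetical clipping ranges for (cart position, cart velocity,
# pole angle, pole angular velocity); not taken from the original code.
BOUNDS = [(-2.4, 2.4), (-3.0, 3.0), (-0.21, 0.21), (-3.5, 3.5)]


def discretize(obs):
    """Map a continuous 4-D observation to one row index of the Q-table."""
    index = 0
    for value, (low, high) in zip(obs, BOUNDS):
        clipped = min(max(value, low), high)
        bin_id = min(int((clipped - low) / (high - low) * N_BINS), N_BINS - 1)
        index = index * N_BINS + bin_id
    return index


def choose_action(q_table, state, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    row = q_table[state]
    return max(range(N_ACTIONS), key=lambda a: row[a])


def q_update(q_table, state, action, reward, next_state, done,
             alpha=0.1, gamma=0.99):
    """One Q-learning step: Q(s,a) += alpha * (target - Q(s,a))."""
    target = reward if done else reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (target - q_table[state][action])


# One row per discretized state, one column per action, initialised to zero.
q_table = [[0.0] * N_ACTIONS for _ in range(N_BINS ** 4)]
```

In a training loop, each environment step would call `discretize` on the observation, pick an action with `choose_action`, and then call `q_update` with the observed reward and next state.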
Partial training scores:
Episode 0 Reward: 19.0 Avg Reward: 19.00 Time: 0.00s
Episode 1 Reward: 17.0 Avg Reward: 18.98 Time: 0.00s
Episode 2 Reward: 10.0 Avg Reward: 18.89 Time: 0.00s
Episode 3 Reward: 30.0 Avg Reward: 19.00 Time: 0.00s
Episode 4 Reward: 23.0 Avg Reward: 19.04 Time: 0.00s
Episode 5 Reward: 10.0 Avg Reward: 18.95 Time: 0.00s
Episode 6 Reward: 12.0 Avg Reward: 18.88 Time: 0.00s
Episode 7 Reward: 27.0 Avg Reward: 18.96 Time: 0.00s
Episode 8 Reward: 24.0 Avg Reward: 19.01 Time: 0.00s
Episode 9 Reward: 16.0 Avg Reward: 18.98 Time: 0.00s
...
Episode 90 Reward: 10.0 Avg Reward: 22.67 Time: 0.00s
Episode 91 Reward: 11.0 Avg Reward: 22.55 Time: 0.00s
Episode 92 Reward: 9.0 Avg Reward: 22.42 Time: 0.00s
Episode 93 Reward: 22.0 Avg Reward: 22.41 Time: 0.00s
Episode 94 Reward: 30.0 Avg Reward: 22.49 Time: 0.00s
Episode 95 Reward: 26.0 Avg Reward: 22.52 Time: 0.00s
Episode 96 Reward: 11.0 Avg Reward: 22.41 Time: 0.00s
Episode 97 Reward: 10.0 Avg Reward: 22.29 Time: 0.00s
Episode 98 Reward: 9.0 Avg Reward: 22.15 Time: 0.00s
Episode 99 Reward: 23.0 Avg Reward: 22.16 Time: 0.00s
Training completed. Final Q-table:[[0. 0.][0. 0.][0. 0.]...[0. 0.][0. 0.][0. 0.]]
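The Avg Reward column in the log is clearly not a plain arithmetic mean: after rewards of 19 and 17 it reads 18.98, not 18.00. The printed values are consistent with an exponential moving average that keeps 99% of the previous average and adds 1% of the new reward. That is an inference from the numbers alone, not something confirmed by the source code; the function name and the `decay` parameter below are hypothetical.

```python
def running_average(rewards, decay=0.99):
    """Exponential moving average, seeded with the first reward."""
    averages = []
    avg = None
    for r in rewards:
        avg = r if avg is None else decay * avg + (1 - decay) * r
        averages.append(avg)
    return averages


# Rewards of episodes 0-9, copied from the log above.
first_ten = [19.0, 17.0, 10.0, 30.0, 23.0, 10.0, 12.0, 27.0, 24.0, 16.0]
formatted = [f"{a:.2f}" for a in running_average(first_ten)]
print(formatted)
```

With `decay=0.99` the formatted values match the Avg Reward entries logged for episodes 0 to 9. With such a slow-moving average, the column changes by at most 1% of the gap between the latest reward and the current average per episode, so it lags well behind the raw per-episode rewards.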