
Embodied AI Series: Implementing the CartPole Game with the Double DQN Algorithm (Reinforcement Learning)

Full code for reference: rl/ddqn_cartpole.py · 陈先生/ailib - Gitee.com
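
For orientation, what distinguishes Double DQN from plain DQN is how the bootstrap target is built: the online network picks the next action, and the target network scores that action. The sketch below is a minimal pure-Python illustration of that rule, not the code from rl/ddqn_cartpole.py. The function name `double_dqn_targets`, its arguments, and the default `gamma=0.99` are assumptions made for this example.

```python
def double_dqn_targets(rewards, dones, next_q_online, next_q_target, gamma=0.99):
    """Compute Double DQN targets for a batch of transitions.

    rewards        -- list of float rewards, one per transition
    dones          -- list of bools, True if the episode ended on that step
    next_q_online  -- per-transition list of Q-values for the next state
                      from the online network (used to SELECT the action)
    next_q_target  -- per-transition list of Q-values for the next state
                      from the target network (used to EVALUATE the action)
    gamma          -- discount factor (hypothetical default)
    """
    targets = []
    for r, done, q_on, q_tg in zip(rewards, dones, next_q_online, next_q_target):
        if done:
            # No bootstrapping past a terminal state.
            targets.append(r)
            continue
        # Action selection uses the online network ...
        best_action = max(range(len(q_on)), key=q_on.__getitem__)
        # ... while its value comes from the target network.
        targets.append(r + gamma * q_tg[best_action])
    return targets


if __name__ == "__main__":
    targets = double_dqn_targets(
        rewards=[1.0, 1.0],
        dones=[False, True],
        next_q_online=[[0.2, 0.9], [0.5, 0.1]],
        next_q_target=[[3.0, 2.0], [4.0, 4.0]],
        gamma=0.5,
    )
    print(targets)
```

In the first transition the online network prefers action 1, so the target uses the target network's value for action 1 (2.0) rather than its own maximum (3.0). Plain DQN would take the maximum, which is the source of the overestimation that Double DQN is meant to reduce.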

Partial training scores:

Model saved to ./output/best_model.pth
New best model saved with average reward: 9.6
Episode:   0 | Train Reward:  25.0 | Epsilon: 0.995 | Best Eval Avg: 9.6
Episode:   1 | Train Reward:  14.0 | Epsilon: 0.990 | Best Eval Avg: 9.6
Episode:   2 | Train Reward:  12.0 | Epsilon: 0.985 | Best Eval Avg: 9.6
Episode:   3 | Train Reward:  20.0 | Epsilon: 0.980 | Best Eval Avg: 9.6
Episode:   4 | Train Reward:  40.0 | Epsilon: 0.975 | Best Eval Avg: 9.6
Episode:   5 | Train Reward:  12.0 | Epsilon: 0.970 | Best Eval Avg: 9.6
Episode:   6 | Train Reward:  22.0 | Epsilon: 0.966 | Best Eval Avg: 9.6
Episode:   7 | Train Reward:  13.0 | Epsilon: 0.961 | Best Eval Avg: 9.6
Episode:   8 | Train Reward:  15.0 | Epsilon: 0.956 | Best Eval Avg: 9.6
Episode:   9 | Train Reward:  16.0 | Epsilon: 0.951 | Best Eval Avg: 9.6
……
Model saved to ./output/best_model.pth
New best model saved with average reward: 251.6
Episode:  90 | Train Reward:  25.0 | Epsilon: 0.634 | Best Eval Avg: 251.6
Episode:  91 | Train Reward: 150.0 | Epsilon: 0.631 | Best Eval Avg: 251.6
Episode:  92 | Train Reward:  44.0 | Epsilon: 0.627 | Best Eval Avg: 251.6
Episode:  93 | Train Reward:  23.0 | Epsilon: 0.624 | Best Eval Avg: 251.6
Episode:  94 | Train Reward:  27.0 | Epsilon: 0.621 | Best Eval Avg: 251.6
Episode:  95 | Train Reward:  79.0 | Epsilon: 0.618 | Best Eval Avg: 251.6
Episode:  96 | Train Reward: 103.0 | Epsilon: 0.615 | Best Eval Avg: 251.6
Episode:  97 | Train Reward:  45.0 | Epsilon: 0.612 | Best Eval Avg: 251.6
Episode:  98 | Train Reward:  33.0 | Epsilon: 0.609 | Best Eval Avg: 251.6
Episode:  99 | Train Reward:  33.0 | Epsilon: 0.606 | Best Eval Avg: 251.6
Test Episode: 0, Reward: 228.0
Test Episode: 1, Reward: 222.0
Test Episode: 2, Reward: 207.0
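
The Epsilon column in the log above is consistent with a multiplicative decay of 0.995 applied once per episode, starting from 1.0. The sketch below reproduces that schedule under this reading. The function name `epsilon_after_episode` and the floor `eps_min=0.01` are assumptions for illustration, since the log does not show whether the original code uses a lower bound.

```python
def epsilon_after_episode(episode, eps_start=1.0, decay=0.995, eps_min=0.01):
    """Exploration rate reported after `episode` (0-indexed) has finished.

    Assumes epsilon is multiplied by `decay` once at the end of every
    episode, which matches the values printed in the log. `eps_min` is a
    hypothetical floor that the log neither confirms nor rules out.
    """
    return max(eps_min, eps_start * decay ** (episode + 1))


if __name__ == "__main__":
    # Print in the same three-decimal format as the training log.
    for episode in (0, 9, 90, 99):
        print(f"Episode: {episode:3d} | Epsilon: {epsilon_after_episode(episode):.3f}")
```

At episode 99 epsilon is still about 0.6, so most training actions are random. This would explain why the Train Reward values stay well below the greedy evaluation average of 251.6 and the test rewards above 200.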

