EngineAI 1. Start/Resume Training

ops 2025/6/17 12:16:27

I'll tidy this up later; for the latest version, see the Docs.

Start/Resume Training

Args

--exp_name EXP_NAME: Experiment name.
--sub_exp_name SUB_EXP_NAME: Name of the sub-experiment to run or load, default is default.
--run_name RUN_NAME: Name of the run, default is current time %Y-%m-%d_%H-%M-%S.
--log_root LOG_ROOT: Path of the log root, default is engineai_rl_workspace/logs/{exp_name}/{sub_exp_name}.
--load_run LOAD_RUN: Name of the run to load when resume=True, default is -1. If -1, the last run is loaded.
--checkpoint CHECKPOINT: Saved model checkpoint number, default is -1. If -1, the last checkpoint is loaded.
--resume: Resume training from a checkpoint.
--run_exist: Run training from an existing run with its config.json.
--debug: In debug mode, no logs will be saved.
--num_envs NUM_ENVS: Number of environments to create.
--seed SEED: Random seed.
--max_iterations MAX_ITERATIONS: Maximum number of training iterations.
--logger LOGGER: Logger module to use. Choice: tensorboard, wandb, neptune.
--upload_model: Upload models to wandb or neptune.
--sim_device SIM_DEVICE: Device used by the simulator (cpu, gpu, cuda:0, cuda:1, etc.), default is cuda:0.
--rl_device RL_DEVICE: Device used by the RL algorithm (cpu, gpu, cuda:0, cuda:1, etc.), default is cuda:0.
--video: Record video during training. Headless mode also works.
--record_length RECORD_LENGTH: The number of steps to record for videos, default is 200.
--record_interval RECORD_INTERVAL: The number of steps between video recordings.
--fps FPS: The fps of recorded videos, default is 50.
--frame_size FRAME_SIZE: The size of recorded frame, default is (1280, 720).
--camera_offset CAMERA_OFFSET: The offset of the video filming camera, default is (0, -2, 0).
--camera_rotation CAMERA_ROTATION: The rotation of the video filming camera, default is (0, 0, 90).
--env_idx_record ENV_IDX_RECORD: The env idx to record, default is 0.
--actor_idx_record ACTOR_IDX_RECORD: The actor idx to record, default is 0.
--rigid_body_idx_record RIGID_BODY_IDX_RECORD: The rigid_body idx to record, default is 0.
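
To make the documented defaults concrete, here is a minimal sketch of how a subset of these flags could be declared with `argparse`. It is not the actual `train.py`: the function names `build_parser` and `resolve_defaults` are hypothetical, and only the flag names and default values are taken from the list above.

```python
import argparse
from datetime import datetime


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering a subset of the flags listed above."""
    parser = argparse.ArgumentParser(description="Sketch of a train.py CLI")
    parser.add_argument("--exp_name", required=True, help="Experiment name.")
    parser.add_argument("--sub_exp_name", default="default")
    parser.add_argument("--run_name", default=None)
    parser.add_argument("--log_root", default=None)
    parser.add_argument("--load_run", default="-1")
    parser.add_argument("--checkpoint", type=int, default=-1)
    parser.add_argument("--resume", action="store_true")
    parser.add_argument("--debug", action="store_true")
    parser.add_argument("--sim_device", default="cuda:0")
    parser.add_argument("--rl_device", default="cuda:0")
    return parser


def resolve_defaults(args: argparse.Namespace) -> argparse.Namespace:
    """Fill in the defaults that depend on other arguments or on the clock."""
    if args.run_name is None:
        # Documented default: current time as %Y-%m-%d_%H-%M-%S.
        args.run_name = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    if args.log_root is None:
        # Documented default: engineai_rl_workspace/logs/{exp_name}/{sub_exp_name}.
        args.log_root = (
            f"engineai_rl_workspace/logs/{args.exp_name}/{args.sub_exp_name}"
        )
    return args


args = resolve_defaults(build_parser().parse_args(["--exp_name", "pm01_rough_ppo"]))
print(args.log_root)  # engineai_rl_workspace/logs/pm01_rough_ppo/default
```

Resolving `run_name` and `log_root` after parsing, rather than in `add_argument`, keeps the timestamp tied to the moment the run starts and lets `log_root` depend on the other two names.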

Examples

From Scratch

Each run saves the files required to resume or play it later, so resuming works even after the code has changed.

# basic

python engineai_rl_workspace/scripts/train.py --exp_name pm01_rough_ppo

# headless

python engineai_rl_workspace/scripts/train.py --exp_name pm01_rough_ppo --headless

# use specific logger

python engineai_rl_workspace/scripts/train.py --exp_name pm01_rough_ppo --headless --logger wandb

# run with params overridden

python engineai_rl_workspace/scripts/train.py --exp_name pm01_rough_ppo --headless --num_envs 4096 --max_iterations 30000 --seed 1

Video Recording

# default setting

python engineai_rl_workspace/scripts/train.py --exp_name pm01_rough_ppo --headless --video

# custom setting

python engineai_rl_workspace/scripts/train.py --exp_name pm01_rough_ppo --headless --video --record_length 500 --record_interval 100 --fps 100 --frame_size=1920,1080 --camera_offset=-2,0,0 --camera_rotation=0,0,90 --env_idx_record 1 --actor_idx_record 1 --rigid_body_idx_record
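
The tuple-valued flags in the custom command are passed as comma-separated strings. Here is a minimal sketch of how such values could be parsed, assuming integer components. The helper `int_tuple` is hypothetical and the real script may parse these differently.

```python
import argparse


def int_tuple(text: str) -> tuple[int, ...]:
    """Parse a comma-separated string such as "1920,1080" into a tuple of ints."""
    return tuple(int(part) for part in text.split(","))


parser = argparse.ArgumentParser()
# Defaults mirror the values documented in the Args list.
parser.add_argument("--frame_size", type=int_tuple, default=(1280, 720))
parser.add_argument("--camera_offset", type=int_tuple, default=(0, -2, 0))
parser.add_argument("--camera_rotation", type=int_tuple, default=(0, 0, 90))

video_args = parser.parse_args(["--frame_size=1920,1080", "--camera_offset=-2,0,0"])
print(video_args.frame_size)       # (1920, 1080)
print(video_args.camera_offset)    # (-2, 0, 0)
print(video_args.camera_rotation)  # (0, 0, 90)
```

With `argparse`, a value like `-2,0,0` that begins with a minus sign is safest attached with `=`, which is likely why the example writes `--camera_offset=-2,0,0` rather than separating flag and value with a space.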

From `.json` Config from Scratch

A config is saved for each run. To start a new run from a modified version of an old run's .json config, create a new folder, copy the old config into it, edit the config, and start training from that folder.

The Algos config files will be converted to .py config files, and used for training.

# from a default sub_exp_name

python engineai_rl_workspace/scripts/train.py --exp_name pm01_rough_ppo --headless --run_exist --load_run 2025-06-03_12-00-00

# from a specific sub_exp_name

python engineai_rl_workspace/scripts/train.py --exp_name pm01_rough_ppo --headless --run_exist --sub_exp_name fixed_std --load_run 2025-06-03_12-00-00
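
The copy-and-modify step described in this section can be sketched as a small helper. Only the file name `config.json` comes from the `--run_exist` description. The function `fork_run_config`, the folder name `my_modified_run`, and the config keys are placeholders, not the workspace's real schema.

```python
import json
import tempfile
from pathlib import Path


def fork_run_config(old_run: Path, new_run: Path, overrides: dict) -> Path:
    """Copy config.json from old_run into new_run, applying top-level overrides."""
    config = json.loads((old_run / "config.json").read_text())
    config.update(overrides)
    new_run.mkdir(parents=True, exist_ok=True)
    new_config = new_run / "config.json"
    new_config.write_text(json.dumps(config, indent=2))
    return new_config


# Demo on a throwaway directory; the keys below are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    log_root = Path(tmp) / "pm01_rough_ppo" / "default"
    old_run = log_root / "2025-06-03_12-00-00"
    old_run.mkdir(parents=True)
    (old_run / "config.json").write_text(json.dumps({"seed": 1, "num_envs": 4096}))

    new_path = fork_run_config(old_run, log_root / "my_modified_run", {"seed": 2})
    forked = json.loads(new_path.read_text())
    print(forked)
```

The new folder's name would then presumably be passed to `--load_run` together with `--run_exist`. A nested config would need a recursive merge instead of the flat `dict.update` used here.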

Using a Specific Logger (Tensorboard, Wandb, Neptune)

python engineai_rl_workspace/scripts/train.py --exp_name pm01_rough_ppo --headless --logger wandb

Resume a Run

# resume from default log root

python engineai_rl_workspace/scripts/train.py --exp_name pm01_rough_ppo --headless --resume --load_run 2025-06-03_12-00-00

# resume from a specific log root

python engineai_rl_workspace/scripts/train.py --exp_name pm01_rough_ppo --headless --resume --log_root ~/server/engineai_rl_workspace/logs/pm01_rough_ppo/default --load_run 2025-06-03_12-00-00

# resume from a specific checkpoint

python engineai_rl_workspace/scripts/train.py --exp_name pm01_rough_ppo --headless --resume --load_run 2025-06-03_12-00-00 --checkpoint 30000
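
When `--load_run` or `--checkpoint` is left at -1, the last run or checkpoint is loaded. Here is a minimal sketch of one way that lookup could work, assuming run folders are named by timestamp (as in the `--run_name` default) and checkpoints are named `model_<N>.pt`. The checkpoint naming and both function names are assumptions, not the workspace's confirmed behaviour.

```python
import re
import tempfile
from pathlib import Path


def latest_run(log_root: Path) -> Path:
    """Return the run directory whose name sorts last (newest timestamp)."""
    runs = sorted(p for p in log_root.iterdir() if p.is_dir())
    if not runs:
        raise FileNotFoundError(f"no runs under {log_root}")
    return runs[-1]


def latest_checkpoint(run_dir: Path) -> Path:
    """Return the model_<N>.pt file with the largest N (hypothetical naming)."""
    numbered = []
    for path in run_dir.glob("model_*.pt"):
        match = re.fullmatch(r"model_(\d+)\.pt", path.name)
        if match:
            numbered.append((int(match.group(1)), path))
    if not numbered:
        raise FileNotFoundError(f"no checkpoints in {run_dir}")
    # Compare numerically: as strings, "model_500.pt" would sort after "model_30000.pt".
    return max(numbered, key=lambda item: item[0])[1]


# Demo on a throwaway directory with two runs and three checkpoints.
with tempfile.TemporaryDirectory() as tmp:
    log_root = Path(tmp)
    for run in ("2025-06-02_09-30-00", "2025-06-03_12-00-00"):
        (log_root / run).mkdir()
    for step in (500, 10000, 30000):
        (log_root / "2025-06-03_12-00-00" / f"model_{step}.pt").touch()

    run_dir = latest_run(log_root)
    run_name = run_dir.name
    ckpt_name = latest_checkpoint(run_dir).name
    print(run_name, ckpt_name)
```

Zero-padded `%Y-%m-%d_%H-%M-%S` names sort chronologically as plain strings, so no date parsing is needed for the run lookup. Custom run names would break that assumption.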

Debug Mode

In debug mode, training does not save log files, so you can keep the log directory clean.

# debug mode 
python engineai_rl_workspace/scripts/train.py --exp_name pm01_rough_ppo --headless --debug
