Embodied-AI Dataset Formats
Notes on parsing the data formats of common embodied-AI datasets.
1. robomimic
Parsed from HDF5: one file holds all demonstrations.
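As a minimal sketch of reading such a file, the helper below (a hypothetical name, not part of robomimic) walks the usual robomimic layout, where a top-level `data` group contains `demo_0`, `demo_1`, ... subgroups, each with an `actions` dataset alongside `obs/...`:

```python
import h5py


def summarize_robomimic(path):
    """Return {demo_name: num_steps} for a robomimic-style HDF5 file.

    Assumes the standard robomimic layout: a top-level `data` group with
    `demo_0`, `demo_1`, ... subgroups, each holding an `actions` dataset
    (plus `obs/...` and similar datasets, which are not read here).
    """
    lengths = {}
    with h5py.File(path, "r") as f:
        for demo in f["data"]:  # iterating a group yields child names
            lengths[demo] = f["data"][demo]["actions"].shape[0]
    return lengths
```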
2. serl
pkl (pickle) format; compared with robomimic it additionally stores a reward signal.
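A sketch of loading such a file with the standard library, assuming (this is not serl's actual loader, and the function names are hypothetical) that it pickles a list of transition dicts carrying a `rewards` entry, the extra field relative to robomimic-style logs:

```python
import pickle


def load_transitions(path):
    """Load a serl-style .pkl transition buffer.

    Assumption (sketch only): the file pickles a list of transition
    dicts, each including a 'rewards' entry.
    """
    with open(path, "rb") as fh:
        return pickle.load(fh)


def episode_return(transitions):
    """Sum the reward field over one episode's transitions."""
    return sum(float(t["rewards"]) for t in transitions)
```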
3. LeRobot
Data structure:
dataset attributes:
├ hf_dataset: a Hugging Face dataset (backed by Arrow/parquet). Typical features example:
│ ├ observation.images.cam_high (VideoFrame):
│ │ VideoFrame = {'path': path to a mp4 video, 'timestamp' (float32): timestamp in the video}
│ ├ observation.state (list of float32): position of the arm joints (for instance)
│ ... (more observations)
│ ├ action (list of float32): goal position of the arm joints (for instance)
│ ├ episode_index (int64): index of the episode for this sample
│ ├ frame_index (int64): index of the frame for this sample in the episode; starts at 0 for each episode
│ ├ timestamp (float32): timestamp in the episode
│ ├ next.done (bool): indicates the end of an episode; True for the last frame in each episode
│ └ index (int64): general index in the whole dataset
├ episode_data_index: contains 2 tensors with the start and end indices of each episode
│ ├ from (1D int64 tensor): first frame index for each episode — shape (num episodes,), starts with 0
│ └ to (1D int64 tensor): last frame index for each episode — shape (num episodes,)
├ stats: a dictionary of statistics (max, mean, min, std) for each feature in the dataset, for instance
│ ├ observation.images.cam_high: {'max': tensor with same number of dimensions (e.g. `(c, 1, 1)` for images, `(c,)` for states), etc.}
│ ...
├ info: a dictionary of metadata on the dataset
│ ├ codebase_version (str): keeps track of the codebase version the dataset was created with
│ ├ fps (float): frames per second the dataset is recorded/synchronized to
│ ├ video (bool): indicates if frames are encoded in mp4 video files to save space or stored as png files
│ └ encoding (dict): if video, documents the main options that were used with ffmpeg to encode the videos
├ videos_dir (Path): where the mp4 videos or png images are stored/accessed
└ camera_keys (list of string): the keys to access camera features in the item returned by the dataset (e.g. `["observation.images.cam_high", ...]`)
Example `info.json`:
{
  "codebase_version": "v2.0",
  "robot_type": "unknown",
  "total_episodes": 800,
  "total_frames": 20000,
  "total_tasks": 1,
  "total_videos": 0,
  "total_chunks": 1,
  "chunks_size": 1000,
  "fps": 15,
  "splits": {"train": "0:800"},
  "data_path": "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet",
  "video_path": null,
  "features": {
    "observation.image": {"dtype": "image", "shape": [84, 84, 3], "names": ["height", "width", "channel"]},
    "observation.state": {"dtype": "float32", "shape": [4], "names": {"motors": ["motor_0", "motor_1", "motor_2", "motor_3"]}},
    "action": {"dtype": "float32", "shape": [3], "names": {"motors": ["motor_0", "motor_1", "motor_2"]}},
    "episode_index": {"dtype": "int64", "shape": [1], "names": null},
    "frame_index": {"dtype": "int64", "shape": [1], "names": null},
    "timestamp": {"dtype": "float32", "shape": [1], "names": null},
    "next.reward": {"dtype": "float32", "shape": [1], "names": null},
    "next.done": {"dtype": "bool", "shape": [1], "names": null},
    "index": {"dtype": "int64", "shape": [1], "names": null},
    "task_index": {"dtype": "int64", "shape": [1], "names": null}
  }
}
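The `chunks_size` and `data_path` fields together locate an episode's parquet file on disk: episodes are packed into chunks of `chunks_size` episodes, and `data_path` is a format template over the chunk and episode numbers. A small sketch of that resolution (LeRobot's own `LeRobotDataset` class handles this internally; the helper name here is made up):

```python
def episode_parquet_path(info, episode_index):
    """Resolve an episode's parquet file from a LeRobot v2.0 info dict.

    `info` is the parsed info.json; episodes are grouped into chunks of
    `chunks_size`, and `data_path` is a format template over
    (episode_chunk, episode_index).
    """
    episode_chunk = episode_index // info["chunks_size"]
    return info["data_path"].format(
        episode_chunk=episode_chunk, episode_index=episode_index
    )
```

For the example above, episode 1234 would resolve to `data/chunk-001/episode_001234.parquet`.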
4. AgiBot (智元)
Overall structure:
data
├── task_info
│ ├── task_327.json
│ ├── task_352.json
│ └── ...
├── observations
│ ├── 327 # This represents the task id.
│ │ ├── 648642 # This represents the episode id.
│ │ │ ├── depth # This is a folder containing depth information saved in PNG format.
│ │ │ ├── videos # This is a folder containing videos from all camera perspectives.
│ │ ├── 648649
│ │ │ └── ...
│ │ └── ...
│ ├── 352
│ │ ├── 648544
│ │ │ ├── depth
│ │ │ ├── videos
│ │ ├── 648564
│ │ │ └── ...
│ └── ...
├── parameters
│ ├── 327
│ │ ├── 648642
│ │ │ ├── camera
│ │ ├── 648649
│ │ │ └── camera
│ │ └── ...
│ └── 352
│ ├── 648544
│ │ ├── camera # This contains all the cameras' intrinsic and extrinsic parameters.
│ └── 648564
│ │ └── camera
│ └── ...
├── proprio_stats
│ ├── 327[task_id]
│ │ ├── 648642[episode_id]
│ │ │ ├── proprio_stats.h5 # This file contains all the robot's proprioceptive information.
│ │ ├── 648649
│ │ │ └── proprio_stats.h5
│ │ └── ...
│ ├── 352[task_id]
│ │ ├── 648544[episode_id]
│ │ │ ├── proprio_stats.h5
│ │ └── 648564
│ │ └── proprio_stats.h5
│ └── ...
task_info JSON example:
{
  "episode_id": 648060,
  "task_id": 327,
  "task_name": "Pickup items in the supermarket",
  "init_scene_text": "The robot is positioned in front of the fruit stand in the supermarket environment.",
  "label_info": {
    "action_config": [
      {"start_frame": 44, "end_frame": 183, "action_text": "Retrieve cucumber from the shelf.", "skill": "Pick"},
      {"start_frame": 183, "end_frame": 456, "action_text": "Place the held cucumber into the plastic bag in the shopping cart.", "skill": "Place"},
      {"start_frame": 456, "end_frame": 616, "action_text": "Retrieve tomato from the shelf.", "skill": "Pick"},
      {"start_frame": 616, "end_frame": 840, "action_text": "Place the held tomato into the plastic bag in the shopping cart.", "skill": "Place"},
      {"start_frame": 840, "end_frame": 1069, "action_text": "Retrieve corn from the shelf.", "skill": "Pick"},
      {"start_frame": 1069, "end_frame": 1369, "action_text": "Place the held corn into the shopping cart's plastic bag.", "skill": "Place"}
    ],
    "key_frame": []
  }
}
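Each `action_config` entry labels a sub-segment of the episode by frame range, so an episode can be sliced into per-skill clips. A sketch of extracting those spans from one parsed task_info record (`segment_spans` is a hypothetical helper name):

```python
def segment_spans(episode):
    """List (skill, start_frame, end_frame) spans from a task_info record.

    `episode` is one parsed JSON object like the example above; the frame
    numbers index into the episode's video and proprioception streams.
    """
    return [
        (a["skill"], a["start_frame"], a["end_frame"])
        for a in episode["label_info"]["action_config"]
    ]
```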
proprio_stats.h5 example:
|-- timestamp
|-- state
|   |-- effector
|   |   |-- force
|   |   |-- position
|   |-- end
|   |   |-- angular
|   |   |-- orientation
|   |   |-- position
|   |   |-- velocity
|   |   |-- wrench
|   |-- head
|   |   |-- effort
|   |   |-- position
|   |   |-- velocity
|   |-- joint
|   |   |-- current_value
|   |   |-- effort
|   |   |-- position
|   |   |-- velocity
|   |-- robot
|   |   |-- orientation
|   |   |-- orientation_drift
|   |   |-- position
|   |   |-- position_drift
|   |-- waist
|       |-- effort
|       |-- position
|       |-- velocity
|-- action
|   |-- effector
|   |   |-- force
|   |   |-- index
|   |   |-- position
|   |-- end
|   |   |-- orientation
|   |   |-- position
|   |-- head
|   |   |-- effort
|   |   |-- position
|   |   |-- velocity
|   |-- joint
|   |   |-- effort
|   |   |-- index
|   |   |-- position
|   |   |-- velocity
|   |-- robot
|   |   |-- index
|   |   |-- orientation
|   |   |-- position
|   |   |-- velocity
|   |-- waist
|       |-- effort
|       |-- position
|       |-- velocity
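This tree can be enumerated generically with h5py's `visititems`, which walks every group and dataset in the file. A sketch (the helper name is made up; it works on any HDF5 file, and with this layout yields paths like `state/joint/position`):

```python
import h5py


def list_proprio_keys(path):
    """Collect the full path of every dataset in a proprio_stats.h5 file.

    Generic HDF5 walk: `visititems` calls the callback for each group and
    dataset; only datasets (the leaves of the tree above) are kept.
    """
    keys = []

    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            keys.append(name)

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return sorted(keys)
```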
Explanation:
| Group | Shape | Meaning |
|---|---|---|
| /timestamp | [N] | timestamp in nanoseconds |
| /state/effector/position (gripper) | [N, 2] | left [:, 0], right [:, 1], gripper opening in mm |
| /state/effector/position (dexhand) | [N, 12] | left [:, :6], right [:, 6:], joint angle in rad |
| /state/end/orientation | [N, 2, 4] | left [:, 0, :], right [:, 1, :], flange quaternion in xyzw |
| /state/end/position | [N, 2, 3] | left [:, 0, :], right [:, 1, :], flange xyz in meters |
| /state/head/position | [N, 2] | yaw [:, 0], pitch [:, 1], in rad |
| /state/joint/current_value | [N, 14] | left arm [:, :7], right arm [:, 7:] |
| /state/joint/position | [N, 14] | left arm [:, :7], right arm [:, 7:], in rad |
| /state/robot/orientation | [N, 4] | quaternion in xyzw, yaw only |
| /state/robot/position | [N, 3] | xyz position in meters; z is always 0 |
| /state/waist/position | [N, 2] | pitch [:, 0] in rad, lift [:, 1] in meters |
| /action/*/index | [M] | frame indexes at which the control source was actually sending signals |
| /action/effector/position (gripper) | [N, 2] | left [:, 0], right [:, 1]; 0 for fully open, 1 for fully closed |
| /action/effector/position (dexhand) | [N, 12] | same as /state/effector/position |
| /action/effector/index | [M_1] | indexes at which the control source for the end effector was sending signals |
| /action/end/orientation | [N, 2, 4] | same as /state/end/orientation |
| /action/end/position | [N, 2, 3] | same as /state/end/position |
| /action/end/index | [M_2] | same as other indexes |
| /action/head/position | [N, 2] | same as /state/head/position |
| /action/head/index | [M_3] | same as other indexes |
| /action/joint/position | [N, 14] | same as /state/joint/position |
| /action/joint/index | [M_4] | same as other indexes |
| /action/robot/velocity | [N, 2] | velocity along x axis [:, 0], yaw rate [:, 1] |
| /action/robot/index | [M_5] | same as other indexes |
| /action/waist/position | [N, 2] | same as /state/waist/position |
| /action/waist/index | [M_6] | same as other indexes |
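Since each `/action/*` array has N rows but only the M frames listed in the matching `/action/*/index` carried real control signals, a common first step is to filter the action stream down to those frames. A sketch of that selection (the helper name is made up; with NumPy this is just fancy indexing, `values[index]`):

```python
def active_rows(values, index):
    """Keep only the frames where the control source actually sent a signal.

    `values` is a length-N /action/* array (here a list of rows) and
    `index` the matching /action/*/index dataset of M frame numbers
    (M <= N).
    """
    return [values[i] for i in index]
```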