Embodied-AI Dataset Formats
Notes on parsing the data formats of common embodied-AI datasets.
1. robomimic
Parsed from HDF5: one file holds all demonstrations.
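As a minimal sketch of reading such a file, the helper below (a hypothetical name, not part of robomimic) walks the usual robomimic layout, where a top-level `data` group contains `demo_0`, `demo_1`, ... subgroups, each with an `actions` dataset alongside `obs/...`:

```python
import h5py


def summarize_robomimic(path):
    """Return {demo_name: num_steps} for a robomimic-style HDF5 file.

    Assumes the standard robomimic layout: a top-level `data` group with
    `demo_0`, `demo_1`, ... subgroups, each holding an `actions` dataset
    (plus `obs/...` and similar datasets, which are not read here).
    """
    lengths = {}
    with h5py.File(path, "r") as f:
        for demo in f["data"]:  # iterating a group yields child names
            lengths[demo] = f["data"][demo]["actions"].shape[0]
    return lengths
```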
2. serl
pkl (pickle) format; compared with robomimic it additionally stores a reward signal.
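A sketch of loading such a file with the standard library, assuming (this is not serl's actual loader, and the function names are hypothetical) that it pickles a list of transition dicts carrying a `rewards` entry, the extra field relative to robomimic-style logs:

```python
import pickle


def load_transitions(path):
    """Load a serl-style .pkl transition buffer.

    Assumption (sketch only): the file pickles a list of transition
    dicts, each including a 'rewards' entry.
    """
    with open(path, "rb") as fh:
        return pickle.load(fh)


def episode_return(transitions):
    """Sum the reward field over one episode's transitions."""
    return sum(float(t["rewards"]) for t in transitions)
```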
3. LeRobot
Data structure:
dataset attributes:
├ hf_dataset: a Hugging Face dataset (backed by Arrow/parquet). Typical features example:
│ ├ observation.images.cam_high (VideoFrame):
│ │ VideoFrame = {'path': path to a mp4 video, 'timestamp' (float32): timestamp in the video}
│ ├ observation.state (list of float32): position of the arm joints (for instance)
│ ... (more observations)
│ ├ action (list of float32): goal position of the arm joints (for instance)
│ ├ episode_index (int64): index of the episode for this sample
│ ├ frame_index (int64): index of the frame for this sample in the episode; starts at 0 for each episode
│ ├ timestamp (float32): timestamp in the episode
│ ├ next.done (bool): indicates the end of an episode; True for the last frame in each episode
│ └ index (int64): general index in the whole dataset
├ episode_data_index: contains 2 tensors with the start and end indices of each episode
│ ├ from (1D int64 tensor): first frame index for each episode — shape (num episodes,), starts with 0
│ └ to (1D int64 tensor): last frame index for each episode — shape (num episodes,)
├ stats: a dictionary of statistics (max, mean, min, std) for each feature in the dataset, for instance
│ ├ observation.images.cam_high: {'max': tensor with same number of dimensions (e.g. `(c, 1, 1)` for images, `(c,)` for states), etc.}
│ ...
├ info: a dictionary of metadata on the dataset
│ ├ codebase_version (str): keeps track of the codebase version the dataset was created with
│ ├ fps (float): frames per second the dataset is recorded/synchronized to
│ ├ video (bool): indicates if frames are encoded in mp4 video files to save space or stored as png files
│ └ encoding (dict): if video, documents the main options that were used with ffmpeg to encode the videos
├ videos_dir (Path): where the mp4 videos or png images are stored/accessed
└ camera_keys (list of string): the keys to access camera features in the item returned by the dataset (e.g. `["observation.images.cam_high", ...]`)
Example `info.json`:
{
  "codebase_version": "v2.0",
  "robot_type": "unknown",
  "total_episodes": 800,
  "total_frames": 20000,
  "total_tasks": 1,
  "total_videos": 0,
  "total_chunks": 1,
  "chunks_size": 1000,
  "fps": 15,
  "splits": {"train": "0:800"},
  "data_path": "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet",
  "video_path": null,
  "features": {
    "observation.image": {"dtype": "image", "shape": [84, 84, 3], "names": ["height", "width", "channel"]},
    "observation.state": {"dtype": "float32", "shape": [4], "names": {"motors": ["motor_0", "motor_1", "motor_2", "motor_3"]}},
    "action": {"dtype": "float32", "shape": [3], "names": {"motors": ["motor_0", "motor_1", "motor_2"]}},
    "episode_index": {"dtype": "int64", "shape": [1], "names": null},
    "frame_index": {"dtype": "int64", "shape": [1], "names": null},
    "timestamp": {"dtype": "float32", "shape": [1], "names": null},
    "next.reward": {"dtype": "float32", "shape": [1], "names": null},
    "next.done": {"dtype": "bool", "shape": [1], "names": null},
    "index": {"dtype": "int64", "shape": [1], "names": null},
    "task_index": {"dtype": "int64", "shape": [1], "names": null}
  }
}
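The `chunks_size` and `data_path` fields together locate an episode's parquet file on disk: episodes are packed into chunks of `chunks_size` episodes, and `data_path` is a format template over the chunk and episode numbers. A small sketch of that resolution (LeRobot's own `LeRobotDataset` class handles this internally; the helper name here is made up):

```python
def episode_parquet_path(info, episode_index):
    """Resolve an episode's parquet file from a LeRobot v2.0 info dict.

    `info` is the parsed info.json; episodes are grouped into chunks of
    `chunks_size`, and `data_path` is a format template over
    (episode_chunk, episode_index).
    """
    episode_chunk = episode_index // info["chunks_size"]
    return info["data_path"].format(
        episode_chunk=episode_chunk, episode_index=episode_index
    )
```

For the example above, episode 1234 would resolve to `data/chunk-001/episode_001234.parquet`.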
4. AgiBot (智元)
Overall structure:
data
├── task_info
│ ├── task_327.json
│ ├── task_352.json
│ └── ...
├── observations
│ ├── 327 # This represents the task id.
│ │ ├── 648642 # This represents the episode id.
│ │ │ ├── depth # This is a folder containing depth information saved in PNG format.
│ │ │ ├── videos # This is a folder containing videos from all camera perspectives.
│ │ ├── 648649
│ │ │ └── ...
│ │ └── ...
│ ├── 352
│ │ ├── 648544
│ │ │ ├── depth
│ │ │ ├── videos
│ │ ├── 648564
│ │ │ └── ...
│ └── ...
├── parameters
│ ├── 327
│ │ ├── 648642
│ │ │ ├── camera
│ │ ├── 648649
│ │ │ └── camera
│ │ └── ...
│ └── 352
│ ├── 648544
│ │ ├── camera # This contains all the cameras' intrinsic and extrinsic parameters.
│ └── 648564
│ │ └── camera
│ └── ...
├── proprio_stats
│ ├── 327[task_id]
│ │ ├── 648642[episode_id]
│ │ │ ├── proprio_stats.h5 # This file contains all the robot's proprioceptive information.
│ │ ├── 648649
│ │ │ └── proprio_stats.h5
│ │ └── ...
│ ├── 352[task_id]
│ │ ├── 648544[episode_id]
│ │ │ ├── proprio_stats.h5
│ │ └── 648564
│ │ └── proprio_stats.h5
│ └── ...
task_info JSON example:
{
  "episode_id": 648060,
  "task_id": 327,
  "task_name": "Pickup items in the supermarket",
  "init_scene_text": "The robot is positioned in front of the fruit stand in the supermarket environment.",
  "label_info": {
    "action_config": [
      {"start_frame": 44, "end_frame": 183, "action_text": "Retrieve cucumber from the shelf.", "skill": "Pick"},
      {"start_frame": 183, "end_frame": 456, "action_text": "Place the held cucumber into the plastic bag in the shopping cart.", "skill": "Place"},
      {"start_frame": 456, "end_frame": 616, "action_text": "Retrieve tomato from the shelf.", "skill": "Pick"},
      {"start_frame": 616, "end_frame": 840, "action_text": "Place the held tomato into the plastic bag in the shopping cart.", "skill": "Place"},
      {"start_frame": 840, "end_frame": 1069, "action_text": "Retrieve corn from the shelf.", "skill": "Pick"},
      {"start_frame": 1069, "end_frame": 1369, "action_text": "Place the held corn into the shopping cart's plastic bag.", "skill": "Place"}
    ],
    "key_frame": []
  }
}
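Each `action_config` entry labels a sub-segment of the episode by frame range, so an episode can be sliced into per-skill clips. A sketch of extracting those spans from one parsed task_info record (`segment_spans` is a hypothetical helper name):

```python
def segment_spans(episode):
    """List (skill, start_frame, end_frame) spans from a task_info record.

    `episode` is one parsed JSON object like the example above; the frame
    numbers index into the episode's video and proprioception streams.
    """
    return [
        (a["skill"], a["start_frame"], a["end_frame"])
        for a in episode["label_info"]["action_config"]
    ]
```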
proprio_stats.h5 example:
|-- timestamp
|-- state
|   |-- effector
|   |   |-- force
|   |   |-- position
|   |-- end
|   |   |-- angular
|   |   |-- orientation
|   |   |-- position
|   |   |-- velocity
|   |   |-- wrench
|   |-- head
|   |   |-- effort
|   |   |-- position
|   |   |-- velocity
|   |-- joint
|   |   |-- current_value
|   |   |-- effort
|   |   |-- position
|   |   |-- velocity
|   |-- robot
|   |   |-- orientation
|   |   |-- orientation_drift
|   |   |-- position
|   |   |-- position_drift
|   |-- waist
|       |-- effort
|       |-- position
|       |-- velocity
|-- action
|   |-- effector
|   |   |-- force
|   |   |-- index
|   |   |-- position
|   |-- end
|   |   |-- orientation
|   |   |-- position
|   |-- head
|   |   |-- effort
|   |   |-- position
|   |   |-- velocity
|   |-- joint
|   |   |-- effort
|   |   |-- index
|   |   |-- position
|   |   |-- velocity
|   |-- robot
|   |   |-- index
|   |   |-- orientation
|   |   |-- position
|   |   |-- velocity
|   |-- waist
|       |-- effort
|       |-- position
|       |-- velocity
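This tree can be enumerated generically with h5py's `visititems`, which walks every group and dataset in the file. A sketch (the helper name is made up; it works on any HDF5 file, and with this layout yields paths like `state/joint/position`):

```python
import h5py


def list_proprio_keys(path):
    """Collect the full path of every dataset in a proprio_stats.h5 file.

    Generic HDF5 walk: `visititems` calls the callback for each group and
    dataset; only datasets (the leaves of the tree above) are kept.
    """
    keys = []

    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            keys.append(name)

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return sorted(keys)
```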
Explanation:
| Group | Shape | Meaning |
|---|---|---|
| /timestamp | [N] | timestamp in nanoseconds |
| /state/effector/position (gripper) | [N, 2] | left [:, 0], right [:, 1], gripper opening in mm |
| /state/effector/position (dexhand) | [N, 12] | left [:, :6], right [:, 6:], joint angle in rad |
| /state/end/orientation | [N, 2, 4] | left [:, 0, :], right [:, 1, :], flange quaternion in xyzw |
| /state/end/position | [N, 2, 3] | left [:, 0, :], right [:, 1, :], flange xyz in meters |
| /state/head/position | [N, 2] | yaw [:, 0], pitch [:, 1], in rad |
| /state/joint/current_value | [N, 14] | left arm [:, :7], right arm [:, 7:] |
| /state/joint/position | [N, 14] | left arm [:, :7], right arm [:, 7:], in rad |
| /state/robot/orientation | [N, 4] | quaternion in xyzw, yaw only |
| /state/robot/position | [N, 3] | xyz position in meters; z is always 0 |
| /state/waist/position | [N, 2] | pitch [:, 0] in rad, lift [:, 1] in meters |
| /action/*/index | [M] | frame indexes at which the control source was actually sending signals |
| /action/effector/position (gripper) | [N, 2] | left [:, 0], right [:, 1]; 0 for fully open, 1 for fully closed |
| /action/effector/position (dexhand) | [N, 12] | same as /state/effector/position |
| /action/effector/index | [M_1] | indexes at which the control source for the end effector was sending signals |
| /action/end/orientation | [N, 2, 4] | same as /state/end/orientation |
| /action/end/position | [N, 2, 3] | same as /state/end/position |
| /action/end/index | [M_2] | same as other indexes |
| /action/head/position | [N, 2] | same as /state/head/position |
| /action/head/index | [M_3] | same as other indexes |
| /action/joint/position | [N, 14] | same as /state/joint/position |
| /action/joint/index | [M_4] | same as other indexes |
| /action/robot/velocity | [N, 2] | velocity along x axis [:, 0], yaw rate [:, 1] |
| /action/robot/index | [M_5] | same as other indexes |
| /action/waist/position | [N, 2] | same as /state/waist/position |
| /action/waist/index | [M_6] | same as other indexes |
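Since each `/action/*` array has N rows but only the M frames listed in the matching `/action/*/index` carried real control signals, a common first step is to filter the action stream down to those frames. A sketch of that selection (the helper name is made up; with NumPy this is just fancy indexing, `values[index]`):

```python
def active_rows(values, index):
    """Keep only the frames where the control source actually sent a signal.

    `values` is a length-N /action/* array (here a list of rows) and
    `index` the matching /action/*/index dataset of M frame numbers
    (M <= N).
    """
    return [values[i] for i in index]
```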