当前位置：首页 > ai >正文

数据集笔记：SeekWorld

ai 2025/7/18 6:35:02

提出了一项新任务：地理定位推理（Geolocation Reasoning）
- 该任务要求模型在感知视觉信息的同时，推理出图像中视觉语义所隐含的高级逻辑关系，从而确定图像的拍摄地点
- TheEighthDay/SeekWorld at main

构建了一个基于规则的地理定位强化学习数据集 SeekWorld

包含两个训练集

Train-Clue-Tracking

从 o3 中收集的针对视觉线索跟踪的详细推理过程数据

{"response": "<think>\n\n<image clue>1: crop_images/20250317_190614-norway_rogaland/image_1.jpg [coordinates: 0.0000, 0.0000, 0.9990, 0.9981]\n\n<image clue>2: crop_images/20250317_190614-norway_rogaland/image_2.jpg [coordinates: 0.2004, 0.3430, 0.5559, 0.6496]\n\nI am considering the architectural origin of this picture.\n\nThe image resolution is low, perhaps due to a screenshot or downsampling issue, making the text difficult to read. It seems that the vertical sign says \"SKAGEN\"? However, the vertically arranged letters on the red building look like \"S N...\". I plan to continue zooming in on the image to better examine these letters.。。。

Train-No-Process

不含推理过程的普通样本数据

{"message": "[{\"role\": \"system\", \"content\": \"You are a helpful assistant good at solving problems with step-by-step reasoning. You should first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags.\"}, {\"role\": \"user\", \"content\": [{\"type\": \"image\", \"image\": \"/data/phd/tiankaibin/dataset/data/streetview_images_first_tier_cities/bourbon_street_french_quarter_new_orleans_la_usa_h45_r100_20250317_184521.jpg\"},{\"type\": \"text\", \"text\": \"In which country and within which first-level administrative region of that country was this picture taken?Please answer in the format of <answer>$country,administrative_area_level_1$</answer>?\"}]}]", "answer": "$united states,louisiana/state of louisiana/la/pelican state$"}