当前位置：首页 > news >正文

如何硬解析 .shp 文件中的几何体，拯救 .dbf、.shx 等文件缺失的 ESRI Shapefile 格式文件

news 2025/7/25 20:16:59

最近遇到一个问题就是 ESRI Shapefile 格式文件中 .dbf 文件损坏，导致 .shp 无法打开，又不能放弃这个数据，所以只能做修复，网上搜了很多方法，结果都失败了，所以用 deepseek 查询应解析的方法，亲测可行（简单多边形），现将方法分享：

ESRI Shapefile 文件组成

1 核心文件

扩展名	必选	用途
.shp	✅	存储几何图形（点、线、面等）
.shx	✅	存储几何索引（加速查询）
.dbf	✅	存储属性数据（类似表格）
.prj	❌	坐标系统定义（如 WGS84）
.cpg	❌	字符编码声明（如 UTF-8）

必须同时存在 .shp、.shx、.dbf 文件才能正常使用。
缺失 .prj 可能导致坐标系统未知。

2 .shp 文件结构（二进制格式）

文件头（Header）

偏移量	长度	类型	描述
0	4B	`int32`	文件标识码（固定 `9994`）
24	4B	`int32`	版本号（`1000`）
28	4B	`int32`	几何类型（见下表）
36-68	32B	`double`	边界框（xmin, ymin, xmax, ymax）

几何记录（Record）

记录头（8B）：
- int32 记录编号（从 1 开始）
- int32 记录长度（以 16-bit 字为单位）
记录内容：
- int32 几何类型
- 几何数据（坐标点等）

Shapefile 几何类型

类型值	名称	描述
`0`	Null Shape	空几何
`1`	Point	二维点 `[x, y]`
`3`	PolyLine	多段线
`5`	Polygon	多边形（首尾闭合）
`8`	MultiPoint	多点集合
`11`	PointZ	三维点 `[x, y, z, m]`

多边形规则

外环：顶点按 逆时针 排列
内环（孔洞）：顶点按 顺时针 排列

Python 硬解析只有 .shp 文件的已损坏的 ESRI Shapefile 格式数据（Polygon）

环境和数据准备

安装 gma：pip install gma

本文基于：gma 2.1.7，Python 3.12

本文用到数据：
链接: https://pan.baidu.com/s/1kMlji9xqSsYIVn1cygIewA?pwd=9nc1
提取码: 9nc1

代码

import structdef parse_polygon_shp(shp_path):polygons = []iii = 0with open(shp_path, "rb") as f:# 跳过文件头（100字节）f.read(100)while True:# 读取记录头（8字节：记录号+长度）record_header = f.read(8)if not record_header:break# 解析记录头（大端序）record_num = struct.unpack(">i", record_header[:4])[0]record_len = struct.unpack(">i", record_header[4:8])[0]  # 单位：16-bit words# 读取记录内容（长度：record_len * 2 字节）try:record_data = f.read(record_len * 2)except:breakif len(record_data) < 44:  # 多边形至少需要44字节continue# 解析几何类型（小端序，从记录内容开始）shape_type = struct.unpack("<i", record_data[0:4])[0]if shape_type != 5:  # 仅处理多边形continue# 解析边界框（4个double）bbox = struct.unpack("<4d", record_data[4:36])# 解析环数和点数num_parts = struct.unpack("<i", record_data[36:40])[0]num_points = struct.unpack("<i", record_data[40:44])[0]# 解析环的起始索引（num_parts个int32）parts = []offset = 44parts_data = record_data[offset : offset + 4 * num_parts]parts = list(struct.unpack(f"<{num_parts}i", parts_data))offset += 4 * num_parts# 解析所有点（num_points个xy对）points = []points_data = record_data[offset : offset + 16 * num_points]for i in range(num_points):x = struct.unpack("<d", points_data[16*i : 16*i + 8])[0]y = struct.unpack("<d", points_data[16*i + 8 : 16*i + 16])[0]points.append((x, y))# 保存多边形数据polygons.append({"shape_type": shape_type,"bbox": bbox,"num_parts": num_parts,"num_points": num_points,"parts": parts,"points": points})return polygonsshp_path = "青海省.shp"
res = parse_polygon_shp(shp_path)import pandas as pd
from gma import iofts = []
for data in res:pt = data['points']ft = io.CreateFeature(pt, GeomType = 'Polygon', Projection = 'WGS84')fts.append(ft)ly = io.CreateLayer(fts)
ly.SaveAs("青海省_New.shp")