轻量xlsx读取库xlsx_drone的编译与测试
这个库是在看其他网页时,作为和功能丰富的xlsxio库的对比来的,按照xlsx_drone github页面介绍,
特征
- 不使用任何外部应用程序来解析它们。
- 注重速度而不是功能。
- 简单的接口。
- UTF-8 支持。
安装
直接将 src 和 ext 文件夹复制并粘贴到项目根文件夹和源代码中。您可能希望以不同的方式容纳文件是可以理解的,但请注意 xlsx_drone.h 使用相对路径调用其名称:#include “xlsx_drone.h”
// external libraries
#include "../ext/zip.h"
#include "../ext/sxmlc.h"
#include "../ext/sxmlsearch.h"
除此以外,连一个安装步骤都没有写,只给出了几个代码片段,没法直接使用,文档也就让你参考 src/xlsx_drone.h。
还好源代码包中有CMakelists.txt,那就表示能用cmake & make来生成。实际结果确实生成了动态库, 而且几乎瞬间生成。
/# cd par/xlsx_drone
/par/xlsx_drone# mkdir build
/par/xlsx_drone# cd build
/par/xlsx_drone/build# cmake ..
-- The C compiler identification is GNU 14.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: /par/xlsx_drone/build
/par/xlsx_drone/build# make
[ 20%] Building C object CMakeFiles/xlsx_drone.dir/ext/zip.c.o
In file included from /par/xlsx_drone/ext/zip.c:37:
/par/xlsx_drone/ext/miniz.h:5106:9: note: '#pragma message: Using fopen, ftello, fseeko, stat() etc. path for file I/O - this path may not support large files.'5106 | #pragma message( \| ^~~~~~~
[ 40%] Building C object CMakeFiles/xlsx_drone.dir/ext/sxmlc.c.o
[ 60%] Building C object CMakeFiles/xlsx_drone.dir/ext/sxmlsearch.c.o
[ 80%] Building C object CMakeFiles/xlsx_drone.dir/src/xlsx_drone.c.o
[100%] Linking C shared library libxlsx_drone.so
[100%] Built target xlsx_drone
浏览源码目录树,有个test目录下面有个xlsx_drone.test.c源文件,其中还有个helpers目录,装了几个看上去用来测试的文件。
上手编译它,很多链接找不到引用错误,其中一些像单元测试函数,一些像xlsx_drone中的函数。
/par/xlsx_drone/test# ls
helpers xlsx_drone.test.c
/par/xlsx_drone/test# gcc xlsx_drone.test.c -o xlsx_drone
/usr/bin/ld: /tmp/ccqiKV5x.o: in function `test_xlsx_open':
xlsx_drone.test.c:(.text+0x1c): undefined reference to `xlsx_set_print_err_messages'
/usr/bin/ld: xlsx_drone.test.c:(.text+0x2d): undefined reference to `xlsx_open'
/usr/bin/ld: xlsx_drone.test.c:(.text+0x4c): undefined reference to `UnityAssertEqualNumber'
/usr/bin/ld: xlsx_drone.test.c:(.text+0x51): undefined reference to `xlsx_get_xlsx_errno'
注意到,此文件开头包含了#include "…/ext/unity.h"语句,而ext目录下除了有unity的头文件,还有c源代码,在Makefile中没有用到,那就加上unity.c一起编译,一下子,单元测试引用的错误全消失了,只留下xlsx类错误。
gcc xlsx_drone.test.c ../ext/unity.c -o xlsx_drone/usr/bin/ld: /tmp/ccD6n0u6.o: in function `test_xlsx_open':
xlsx_drone.test.c:(.text+0x1c): undefined reference to `xlsx_set_print_err_messages'
/usr/bin/ld: xlsx_drone.test.c:(.text+0x2d): undefined reference to `xlsx_open'
再加上-L 和-l参数,就链接成功了,把共享动态库目录加入LD_LIBRARY_PATH,就能执行了。执行报错,好像是打开文件失败,
/par/xlsx_drone/test# gcc xlsx_drone.test.c ../ext/unity.c -o xlsx_drone -L /par/xlsx_drone/build -lxlsx_drone
/par/xlsx_drone/test# ./xlsx_drone
./xlsx_drone: error while loading shared libraries: libxlsx_drone.so: cannot open shared object file: No such file or directory/par/xlsx_drone/test# export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/par/xlsx_drone/build
/par/xlsx_drone/test# ./xlsx_drone
xlsx_drone.test.c:19:test_xlsx_open:FAIL: Expected 1 Was 0
xlsx_drone.test.c:370:test_xlsx_load_sheet:FAIL: Expected 0 Was -13
Segmentation fault (core dumped)
看xlsx_drone.test.c源码,里面的路径是"test\helpers\non_existent.xlsx"之类,于是再在父目录执行,结果还是报错。最后把Windows的\目录分隔符全都替换成Linux的/,重新编译,执行成功,如下所示;
/par/xlsx_drone# gcc test/xlsx_drone.test2.c ext/unity.c -o xlsx_drone2 -L /par/xlsx_drone/build -lxlsx_drone
/par/xlsx_drone# ./xlsx_drone2
test/xlsx_drone.test2.c:16:test_xlsx_open:FAIL: Expected 0 Was 1
test/xlsx_drone.test2.c:1231:test_xlsx_load_sheet:PASS
test/xlsx_drone.test2.c:1232:test_xlsx_unload_sheet:PASS
test/xlsx_drone.test2.c:1233:test_xlsx_get_last_column:PASS
test/xlsx_drone.test2.c:476:test_xlsx_read_cell:FAIL: Unity Double Precision Disabled
test/xlsx_drone.test2.c:1235:test_xlsx_close:PASS-----------------------
6 Tests 2 Failures 0 Ignored
FAIL
其中第一个测试是测不存在的文件,我故意放了一个non_existent.xlsx在helpers,导致期望0的测试执行结果为1,也报错了。
再把首页的代码片段放入一个main函数,并包含头文件,
#include "src/xlsx_drone.h"
int main()
{// open *.xlsxxlsx_workbook_t wb;xlsx_open("test/helpers/sample.xlsx", &wb);// be free to inspect some wb dataint number_of_sheets = wb.n_sheets;// load sheetxlsx_sheet_t *sheet_1 = xlsx_load_sheet(&wb, 1, NULL);// be free to inspect some sheet datachar *sheet_name = sheet_1->name;int last_row = sheet_1->last_row; // valued 0 if the sheet is empty// as of version 0.2.0 you can retrieve the last column (more info below)char *last_column = xlsx_get_last_column(sheet_1); // i.e.: "FB" or "R", etc.// read cellxlsx_cell_t cell_data_holder; xlsx_read_cell(sheet_1, 4, "B", &cell_data_holder);// inspect resultswitch(cell_data_holder.value_type) {case XLSX_POINTER_TO_CHAR:printf("Cell 4B has value: %s", cell_data_holder.value.pointer_to_char_value);break;case XLSX_INT:printf("Cell 4B has value: %d", cell_data_holder.value.int_value);break;case XLSX_LONG_LONG:printf("Cell 4B has value: %lld", cell_data_holder.value.long_long_value);break;case XLSX_DOUBLE:printf("Cell 4B has value: %f", cell_data_holder.value.double_value);break;default:printf("Cell 4B has no value");}// you can also inspect the cell categoryint cell_category = cell_data_holder.style->related_category; /*typedef enum xlsx_cell_category {XLSX_NUMBER, // int, long long, or doubleXLSX_TEXT, // stringXLSX_DATE, // intXLSX_TIME, // doubleXLSX_DATE_TIME, // doubleXLSX_UNKNOWN} xlsx_cell_category;*/char cat[7][10]={"NUMBER", "TEXT", "DATE", "TIME", "DATE_TIME", "UNKNOWN"};printf("Cell 4B has category XLSX_%s\n", cat[cell_category]);// when you're done reading the XLSX, close the workbook to properly free resourcesxlsx_close(&wb);
}
编译运行结果如下
gcc example.c -o example -L /par/xlsx_drone/build -lxlsx_drone
/par/xlsx_drone# ./example
Cell 4B has value: -1000Cell 4B has category XLSX_NUMBER
再让DeepSeek照样编写一个把xlsx转为csv的程序。
#include "src/xlsx_drone.h"
#include <stdio.h> // 添加标准IO头文件用于CSV操作int main()
{// 打开 *.xlsx 文件xlsx_workbook_t wb;xlsx_open("test/helpers/sample.xlsx", &wb);// 加载第一个工作表xlsx_sheet_t *sheet = xlsx_load_sheet(&wb, 1, NULL);// 创建并打开CSV文件用于写入FILE *csv_file = fopen("output.csv", "w");if (csv_file == NULL) {printf("无法创建CSV文件\n");xlsx_close(&wb);return 1;}// 获取工作表尺寸信息int last_row = sheet->last_row;char *last_column = xlsx_get_last_column(sheet);// 遍历所有行和列for (int row = 1; row <= last_row; row++) {for (char col_char = 'A'; col_char <= last_column[0]; col_char++) {// 处理多字母列名(如AA, AB等)char col_str[3] = {col_char, '\0'};if (last_column[1] != '\0') {// 如果需要处理两字母列名,这里需要扩展逻辑}// 读取单元格数据xlsx_cell_t cell;xlsx_read_cell(sheet, row, col_str, &cell);// 根据数据类型写入CSVswitch(cell.value_type) {case XLSX_POINTER_TO_CHAR:fprintf(csv_file, "\"%s\"", cell.value.pointer_to_char_value);break;case XLSX_INT:fprintf(csv_file, "%d", cell.value.int_value);break;case XLSX_LONG_LONG:fprintf(csv_file, "%lld", cell.value.long_long_value);break;case XLSX_DOUBLE:fprintf(csv_file, "%f", cell.value.double_value);break;default:// 空单元格留空break;}// 添加逗号分隔符(最后一列后不加逗号)if (col_char < last_column[0]) {fprintf(csv_file, ",");}}// 换行表示新的一行fprintf(csv_file, "\n");}// 关闭CSV文件fclose(csv_file);// 关闭工作簿并释放资源xlsx_close(&wb);printf("XLSX文件已成功转换为CSV格式\n");return 0;
}
编译执行成功,前6行的内容如下,可见它支持UTF-8, 包括汉字希腊文。
gcc xlsx2csv.c -o xlsx2csv -L /par/xlsx_drone/build -lxlsx_drone
/par/xlsx_drone# ./xls2csv
bash: ./xls2csv: No such file or directory
/par/xlsx_drone# ./xlsx2csv
XLSX文件已成功转换为CSV格式head -6 output.csv
"General","Number","Currency","Accounting","Date","Time","Percentage","Fraction (1.5)","Scientific (0.001)","Text","Special","Custom"
"Foo",1000,1000,147,43458,0.104792,0.500000,1.500000,0.001000,"1875",2000,12
235,1000,-14562.740000,1200.874000,43458,0.104792,0.450000,1.500000,0.001000,"Just text",2000,40955
17.890000,-1000,584,,43458,0.104792,1.600000,1.500000,0.001000,"𐐀34",543415635644,
,-1000,,,43458,0.104792,,1.500000,,"foo你bar好qaz",34580585,
,1200.561000,,,43458,0.104792,,1.500000,,,,
再用这个代码读取8*1M行的xlsx文件,看它的性能。
gcc xlsx2csv.c -o xlsx2csv -L /par/xlsx_drone/build -lxlsx_drone -O3
root@6ae32a5ffcde:/par/xlsx_drone# time ./xlsx2csv
XLSX文件已成功转换为CSV格式real 0m14.381s
user 0m10.321s
sys 0m3.695s
尽管用了-O3编译,还是不太快。