
Parsing Word documents in Java

news 2025/7/17 17:06:12

Table of contents

Reading paragraphs
Reading images
Reading table contents
Page numbers

Reading paragraphs

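Reading paragraph content is very simple with Apache POI's XWPFDocument: getParagraphs() returns the paragraphs of the document body, and getText() returns the plain text of each one. Here is a demo: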

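import java.io.FileInputStream;
import java.io.IOException;
import java.util.List;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

public class ReadParagraphs {
    public static void main(String[] args) {
        // XWPFDocument is Closeable too, so both resources are closed automatically
        try (FileInputStream stream = new FileInputStream("parse/pages.docx");
             XWPFDocument document = new XWPFDocument(stream)) {
            List<XWPFParagraph> paragraphs = document.getParagraphs();
            for (XWPFParagraph paragraph : paragraphs) {
                // print the plain text of each body paragraph
                System.out.println(paragraph.getText());
            }
        } catch (IOException e) { // FileNotFoundException is a subclass of IOException
            throw new RuntimeException(e);
        }
    }
}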
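To see what getParagraphs() is working with, it can help to look at the file format itself: a .docx file is a ZIP package whose main body lives in word/document.xml, where each paragraph is a w:p element and its text is spread over the w:t elements of its runs (w:r). The following is a minimal stdlib-only sketch of that idea (Java 9+, no POI). RawParagraphReader and its methods are hypothetical names for illustration; it only returns plain text, skips paragraphs inside tables, and ignores tabs, line breaks, fields and anything else POI's getText() may handle.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper: reads paragraph text straight from the .docx XML, without POI.
class RawParagraphReader {
    // WordprocessingML namespace of w:body, w:p, w:t
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Plain text of each w:p directly under w:body (table cells are skipped). Closes the stream. */
    static List<String> readParagraphs(InputStream docx) throws IOException {
        Element body = parseBody(docx);
        List<String> result = new ArrayList<>();
        for (Node n = body.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && W_NS.equals(n.getNamespaceURI()) && "p".equals(n.getLocalName())) {
                // concatenate every w:t inside this paragraph, in document order
                StringBuilder sb = new StringBuilder();
                NodeList texts = ((Element) n).getElementsByTagNameNS(W_NS, "t");
                for (int i = 0; i < texts.getLength(); i++) {
                    sb.append(texts.item(i).getTextContent());
                }
                result.add(sb.toString());
            }
        }
        return result;
    }

    /** Unzips word/document.xml and returns its w:body element. */
    static Element parseBody(InputStream docx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().equals("word/document.xml")) continue;
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true); // required for the *NS lookups
                factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true); // block XXE
                byte[] xml = zip.readAllBytes(); // bytes of the current entry only
                Element body = (Element) factory.newDocumentBuilder()
                        .parse(new ByteArrayInputStream(xml))
                        .getElementsByTagNameNS(W_NS, "body").item(0);
                if (body == null) throw new IOException("w:body not found");
                return body;
            }
        } catch (ParserConfigurationException | SAXException e) {
            throw new IOException("cannot parse word/document.xml", e);
        }
        throw new IOException("word/document.xml not found");
    }
}
```

Usage would look like `RawParagraphReader.readParagraphs(new FileInputStream("parse/pages.docx"))`. For real documents the POI version above is the more robust choice; the sketch is only meant to show the structure POI reads.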

Reading images

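Reading the images in a Word document is not hard either: you only need the XWPFPictureData objects (document.getAllPictures() returns all of them), and each one gives you the image content as a byte array via getData().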


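import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.util.List;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFPictureData;

public class ReadPictures {
    public static void main(String[] args) {
        try (FileInputStream stream = new FileInputStream("parse/pages.docx");
             XWPFDocument document = new XWPFDocument(stream)) {
            List<XWPFPictureData> allPictures = document.getAllPictures();
            for (XWPFPictureData pictureData : allPictures) {
                // raw image bytes (PNG, JPEG, ...)
                byte[] data = pictureData.getData();
                // getFileName() is typically something like image1.png; the file goes to the working directory
                File file = new File(pictureData.getFileName());
                Files.write(file.toPath(), data);
            }
        } catch (IOException e) { // also covers FileNotFoundException
            throw new RuntimeException(e);
        }
    }
}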
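Inside the .docx ZIP package, Word normally stores embedded images as separate parts under word/media/; these are typically the picture parts that POI wraps as XWPFPictureData. The sketch below is a stdlib-only alternative (Java 9+) that reads those parts directly. RawImageExtractor is a hypothetical name, and the word/media/ location is an assumption about how the file was produced (it is the usual convention, not something the format guarantees), so the POI approach above remains the safer default.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: pulls image parts out of a .docx without POI.
class RawImageExtractor {
    /**
     * Returns the bytes of every entry under word/media/, keyed by bare file name.
     * Assumes images live in word/media/ (Word's usual location). Closes the stream.
     */
    static Map<String, byte[]> extractMedia(InputStream docx) throws IOException {
        Map<String, byte[]> images = new LinkedHashMap<>(); // keeps ZIP order
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (!entry.isDirectory() && name.startsWith("word/media/")) {
                    // keep only the last path segment, e.g. "image1.png"
                    String fileName = name.substring(name.lastIndexOf('/') + 1);
                    images.put(fileName, zip.readAllBytes()); // reads the current entry only
                }
            }
        }
        return images;
    }
}
```

Each map entry can then be written out with `Files.write(Paths.get(name), data)`, as in the POI version. Reducing the key to the last path segment is a deliberate choice so that an unusual entry name does not create nested directories when written to disk.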

Reading table contents

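A table in Word has a three-level structure: XWPFTable > XWPFTableRow > XWPFTableCell. With this structure in mind, the code is easy to write: loop over the tables, then over each table's rows, then over each row's cells.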

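import java.io.FileInputStream;
import java.io.IOException;
import java.util.List;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

public class ReadTables {
    public static void main(String[] args) {
        try (FileInputStream stream = new FileInputStream("parse/table.docx");
             XWPFDocument document = new XWPFDocument(stream)) {
            List<XWPFTable> tables = document.getTables();
            for (XWPFTable table : tables) {                   // level 1: table
                List<XWPFTableRow> rows = table.getRows();
                for (XWPFTableRow row : rows) {                // level 2: row
                    List<XWPFTableCell> tableCells = row.getTableCells();
                    for (XWPFTableCell cell : tableCells) {    // level 3: cell
                        System.out.print(cell.getText());
                        System.out.print("\t");
                    }
                    System.out.println(); // one output line per table row
                }
            }
        } catch (IOException e) { // also covers FileNotFoundException
            throw new RuntimeException(e);
        }
    }
}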
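The XWPFTable > XWPFTableRow > XWPFTableCell hierarchy mirrors the underlying XML: in word/document.xml a table is a w:tbl element, its rows are w:tr children, and its cells are w:tc children, each holding one or more w:p paragraphs. The following stdlib-only sketch (Java 9+) walks those three levels directly and returns them as nested lists. RawTableReader is a hypothetical name; only tables directly under w:body are read, paragraphs inside a cell are joined with "\n" (an arbitrary choice), and merged cells, nested tables and content-control wrappers are not handled.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper: reads tables -> rows -> cells straight from the .docx XML.
class RawTableReader {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Tables directly under w:body, as tables -> rows -> cell texts. Closes the stream. */
    static List<List<List<String>>> readTables(InputStream docx) throws IOException {
        Element body = parseBody(docx);
        List<List<List<String>>> tables = new ArrayList<>();
        for (Element tbl : children(body, "tbl")) {          // like XWPFTable
            List<List<String>> rows = new ArrayList<>();
            for (Element tr : children(tbl, "tr")) {          // like XWPFTableRow
                List<String> cells = new ArrayList<>();
                for (Element tc : children(tr, "tc")) {       // like XWPFTableCell
                    List<String> paras = new ArrayList<>();
                    for (Element p : children(tc, "p")) paras.add(text(p));
                    cells.add(String.join("\n", paras));
                }
                rows.add(cells);
            }
            tables.add(rows);
        }
        return tables;
    }

    /** Direct element children of parent with the given WordprocessingML local name. */
    static List<Element> children(Element parent, String localName) {
        List<Element> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && W_NS.equals(n.getNamespaceURI()) && localName.equals(n.getLocalName())) {
                out.add((Element) n);
            }
        }
        return out;
    }

    /** Concatenated w:t text of one paragraph. */
    static String text(Element p) {
        StringBuilder sb = new StringBuilder();
        NodeList ts = p.getElementsByTagNameNS(W_NS, "t");
        for (int i = 0; i < ts.getLength(); i++) sb.append(ts.item(i).getTextContent());
        return sb.toString();
    }

    /** Unzips word/document.xml and returns its w:body element. */
    static Element parseBody(InputStream docx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().equals("word/document.xml")) continue;
                DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
                f.setNamespaceAware(true);
                f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true); // block XXE
                byte[] xml = zip.readAllBytes();
                Element body = (Element) f.newDocumentBuilder().parse(new ByteArrayInputStream(xml))
                        .getElementsByTagNameNS(W_NS, "body").item(0);
                if (body == null) throw new IOException("w:body not found");
                return body;
            }
        } catch (ParserConfigurationException | SAXException e) {
            throw new IOException("cannot parse word/document.xml", e);
        }
        throw new IOException("word/document.xml not found");
    }
}
```

Printing the result row by row with a tab between cells would reproduce the output of the POI loop above for simple tables; returning nested lists instead of printing just makes the three levels easier to inspect.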