A Look at Spring AI Alibaba's ObsidianDocumentReader

This article takes a look at ObsidianDocumentReader in Spring AI Alibaba.

ObsidianDocumentReader

community/document-readers/spring-ai-alibaba-starter-document-reader-obsidian/src/main/java/com/alibaba/cloud/ai/reader/obsidian/ObsidianDocumentReader.java

```java
public class ObsidianDocumentReader implements DocumentReader {

    private final Path vaultPath;

    private final MarkdownDocumentParser parser;

    /**
     * Constructor for reading all files in vault
     * @param vaultPath Path to Obsidian vault
     */
    public ObsidianDocumentReader(Path vaultPath) {
        this.vaultPath = vaultPath;
        this.parser = new MarkdownDocumentParser();
    }

    @Override
    public List<Document> get() {
        List<Document> allDocuments = new ArrayList<>();

        // Find all markdown files in vault
        List<ObsidianResource> resources = ObsidianResource.findAllMarkdownFiles(vaultPath);

        // Parse each file
        for (ObsidianResource resource : resources) {
            try {
                List<Document> documents = parser.parse(resource.getInputStream());
                String source = resource.getSource();

                // Add metadata to each document
                for (Document doc : documents) {
                    doc.getMetadata().put(ObsidianResource.SOURCE, source);
                }

                allDocuments.addAll(documents);
            }
            catch (IOException e) {
                throw new RuntimeException("Failed to read Obsidian file: " + resource.getFilePath(), e);
            }
        }

        return allDocuments;
    }

    public static Builder builder() {
        return new Builder();
    }

    public static class Builder {

        private Path vaultPath;

        public Builder vaultPath(Path vaultPath) {
            this.vaultPath = vaultPath;
            return this;
        }

        public ObsidianDocumentReader build() {
            return new ObsidianDocumentReader(vaultPath);
        }

    }

}
```

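The get method of ObsidianDocumentReader calls ObsidianResource.findAllMarkdownFiles(vaultPath) to collect the vault's Markdown files as ObsidianResource objects, then iterates over them. Each file's input stream is parsed by MarkdownDocumentParser into a list of Documents, and every resulting Document gets the file's source stored in its metadata under the key ObsidianResource.SOURCE ("source"). An IOException while parsing any file is rethrown as a RuntimeException, so a single unreadable file aborts the whole read.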
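To make the control flow concrete, here is a standard-library-only sketch of the same loop. It is not the library's code, and several pieces are assumptions: `SimpleDoc` is a hypothetical stand-in for Spring AI's `Document`, `parse` is a hypothetical parser that returns the whole file as a single document (the real `MarkdownDocumentParser` does more), the hidden-file filter is left out, and because `ObsidianResource.getSource()` is not shown above, the sketch assumes the source is the file's path relative to the vault. It also opens each file only when parsing it, inside try-with-resources, so every stream is closed afterwards.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of the get() loop: walk, parse each file, tag with its source. */
public class ReaderLoopSketch {

    /** Hypothetical stand-in for org.springframework.ai.document.Document. */
    public record SimpleDoc(String text, Map<String, Object> metadata) {}

    /** Same metadata key as ObsidianResource.SOURCE. */
    public static final String SOURCE = "source";

    /** Hypothetical parser: one document per file, whole file as its text. */
    static List<SimpleDoc> parse(InputStream in) throws IOException {
        String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        return List.of(new SimpleDoc(text, new HashMap<>()));
    }

    public static List<SimpleDoc> readVault(Path vaultPath) {
        // Step 1: collect *.md regular files (hidden-file filtering omitted in this sketch)
        List<Path> files;
        try (Stream<Path> paths = Files.walk(vaultPath)) {
            files = paths.filter(p -> p.toString().endsWith(".md"))
                    .filter(Files::isRegularFile)
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException("Failed to walk vault directory: " + vaultPath, e);
        }

        // Step 2: parse each file and attach its source to every resulting document
        List<SimpleDoc> all = new ArrayList<>();
        for (Path file : files) {
            try (InputStream in = Files.newInputStream(file)) {
                List<SimpleDoc> docs = parse(in);
                // Assumption: source = path relative to the vault
                String source = vaultPath.relativize(file).toString();
                for (SimpleDoc doc : docs) {
                    doc.metadata().put(SOURCE, source);
                }
                all.addAll(docs);
            } catch (IOException e) {
                // Like the original, one unreadable file aborts the whole read
                throw new UncheckedIOException("Failed to read Obsidian file: " + file, e);
            }
        }
        return all;
    }
}
```

Throwing on the first failure mirrors the original's behavior; a reader that should tolerate broken notes would instead log the error and continue the loop.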

ObsidianResource

community/document-readers/spring-ai-alibaba-starter-document-reader-obsidian/src/main/java/com/alibaba/cloud/ai/reader/obsidian/ObsidianResource.java

```java
public class ObsidianResource implements Resource {

    public static final String SOURCE = "source";

    public static final String MARKDOWN_EXTENSION = ".md";

    private final Path vaultPath;

    private final Path filePath;

    private final InputStream inputStream;

    /**
     * Constructor for single file
     * @param vaultPath Path to Obsidian vault
     * @param filePath Path to markdown file
     */
    public ObsidianResource(Path vaultPath, Path filePath) {
        Assert.notNull(vaultPath, "VaultPath must not be null");
        Assert.notNull(filePath, "FilePath must not be null");
        Assert.isTrue(Files.exists(vaultPath), "Vault directory does not exist: " + vaultPath);
        Assert.isTrue(Files.exists(filePath), "File does not exist: " + filePath);
        Assert.isTrue(filePath.toString().endsWith(MARKDOWN_EXTENSION), "File must be a markdown file: " + filePath);

        this.vaultPath = vaultPath;
        this.filePath = filePath;
        try {
            this.inputStream = new FileInputStream(filePath.toFile());
        }
        catch (IOException e) {
            throw new RuntimeException("Failed to create input stream for file: " + filePath, e);
        }
    }

    /**
     * Find all markdown files in the vault.
     * Recursively searches through all subdirectories.
     * Only includes .md files and ignores hidden files/directories.
     * @param vaultPath Root path of the Obsidian vault
     * @return List of ObsidianResource for each markdown file
     */
    public static List<ObsidianResource> findAllMarkdownFiles(Path vaultPath) {
        Assert.notNull(vaultPath, "VaultPath must not be null");
        Assert.isTrue(Files.exists(vaultPath), "Vault directory does not exist: " + vaultPath);
        Assert.isTrue(Files.isDirectory(vaultPath), "VaultPath must be a directory: " + vaultPath);

        List<ObsidianResource> resources = new ArrayList<>();

        try (Stream<Path> paths = Files.walk(vaultPath)) {
            paths
                // Only include .md files
                .filter(path -> path.toString().endsWith(MARKDOWN_EXTENSION))
                // Ignore hidden files and files in hidden directories
                .filter(path -> {
                    Path relativePath = vaultPath.relativize(path);
                    String[] pathParts = relativePath.toString().split("/");
                    for (String part : pathParts) {
                        if (part.startsWith(".")) {
                            return false;
                        }
                    }
                    return true;
                })
                // Only include regular files (not directories)
                .filter(Files::isRegularFile)
                .forEach(path -> resources.add(new ObsidianResource(vaultPath, path)));
        }
        catch (IOException e) {
            throw new RuntimeException("Failed to walk vault directory: " + vaultPath, e);
        }

        return resources;
    }

    //......
}
```

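The ObsidianResource constructor takes vaultPath and filePath, asserts that both exist and that the file ends in .md, and immediately opens a FileInputStream for the file. The static findAllMarkdownFiles method walks the vaultPath directory recursively and returns an ObsidianResource for every regular file ending in .md, skipping any file whose path relative to the vault contains a segment starting with "." (hidden files, and everything inside hidden directories such as .obsidian).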
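One detail worth noting in the hidden-path check: it splits the relative path on "/", which is the separator on Linux and macOS. On Windows, `Path.toString()` uses "\", so the split yields a single string and only the top-level segment is effectively tested. Below is a minimal sketch of the same filter, assuming you want identical behavior on every OS: it iterates the `Path` itself, which yields name elements regardless of separator. The class and method names are illustrative, and the sketch returns plain `Path`s rather than `ObsidianResource`s.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Illustrative, OS-independent version of the findAllMarkdownFiles filters. */
public class MarkdownScanSketch {

    /** True if any name element of the vault-relative path starts with ".". */
    static boolean isHidden(Path vaultPath, Path path) {
        // Iterating a Path yields its name elements, independent of the OS separator
        for (Path part : vaultPath.relativize(path)) {
            if (part.toString().startsWith(".")) {
                return true;
            }
        }
        return false;
    }

    public static List<Path> findAllMarkdownFiles(Path vaultPath) {
        if (!Files.isDirectory(vaultPath)) {
            throw new IllegalArgumentException("VaultPath must be a directory: " + vaultPath);
        }
        try (Stream<Path> paths = Files.walk(vaultPath)) {
            return paths
                    .filter(p -> p.toString().endsWith(".md")) // only Markdown files
                    .filter(p -> !isHidden(vaultPath, p))      // skip .obsidian/, .hidden.md, ...
                    .filter(Files::isRegularFile)              // skip directories named *.md
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException("Failed to walk vault directory: " + vaultPath, e);
        }
    }
}
```

Checking the vault-relative path rather than the absolute one matters in both versions: a vault stored under a hidden directory such as ~/.vaults/notes would otherwise have every file rejected.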

Example

community/document-readers/spring-ai-alibaba-starter-document-reader-obsidian/src/test/java/com/alibaba/cloud/ai/reader/obsidian/ObsidianDocumentReaderIT.java

```java
package com.alibaba.cloud.ai.reader.obsidian;

import java.nio.file.Path;
import java.util.List;

import org.junit.jupiter.api.Assumptions;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.condition.EnabledIfEnvironmentVariable;
import org.springframework.ai.document.Document;

import static org.assertj.core.api.Assertions.assertThat;

@EnabledIfEnvironmentVariable(named = "OBSIDIAN_VAULT_PATH", matches = ".+")
class ObsidianDocumentReaderIT {

    private static final String VAULT_PATH = System.getenv("OBSIDIAN_VAULT_PATH");

    // Static initializer to log a message if environment variable is not set
    static {
        if (VAULT_PATH == null || VAULT_PATH.isEmpty()) {
            System.out.println("Skipping Obsidian tests because OBSIDIAN_VAULT_PATH environment variable is not set.");
        }
    }

    ObsidianDocumentReader reader;

    @BeforeEach
    void setUp() {
        // Only initialize if VAULT_PATH is set
        if (VAULT_PATH != null && !VAULT_PATH.isEmpty()) {
            reader = ObsidianDocumentReader.builder().vaultPath(Path.of(VAULT_PATH)).build();
        }
    }

    @Test
    void should_read_markdown_files() {
        // Skip test if reader is null
        Assumptions.assumeTrue(reader != null, "Skipping test because ObsidianDocumentReader could not be initialized");

        // when
        List<Document> documents = reader.get();

        // then
        assertThat(documents).isNotEmpty();

        // Verify document content and metadata
        for (Document doc : documents) {
            // Verify source metadata
            assertThat(doc.getMetadata()).containsKey(ObsidianResource.SOURCE);
            String source = doc.getMetadata().get(ObsidianResource.SOURCE).toString();
            assertThat(source).isNotEmpty().endsWith(ObsidianResource.MARKDOWN_EXTENSION);

            // Verify content
            assertThat(doc.getText()).isNotEmpty();

            // Print for debugging
            System.out.println("Document source: " + source);
            if (doc.getMetadata().containsKey("category")) {
                System.out.println("Document category: " + doc.getMetadata().get("category"));
            }
            System.out.println("Document content: " + doc.getText());
            System.out.println("---");
        }
    }

}
```

Summary

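spring-ai-alibaba-starter-document-reader-obsidian provides ObsidianDocumentReader, which reads every Markdown file under the given vault (vaultPath), skipping hidden files and directories, and parses each one with MarkdownDocumentParser into a List<Document>, recording each Document's source file in its "source" metadata entry.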
