
Elasticsearch Power Guide

The Complete Elasticsearch Guide

Contents

  • Introduction to Elasticsearch
  • Core Concepts
  • Usage Tips
  • Key Challenges Explained
  • Spring Boot Integration
  • Best Practices

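Introduction to Elasticsearch

What Is Elasticsearch

Elasticsearch is a distributed search and analytics engine built on Apache Lucene. It offers near-real-time search and analysis, is the core component of the Elastic Stack (formerly the ELK Stack), and is widely used for log analytics, full-text search, and monitoring.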

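Key Features

  • Distributed architecture: scales horizontally, with automatic sharding and replication
  • Near-real-time search: newly indexed documents become searchable after the next refresh (1 second by default)
  • Full-text search: powerful full-text search built on Lucene
  • RESTful API: a simple HTTP/JSON interface
  • Multi-tenancy: many indices can coexist in one cluster (mapping types were deprecated in 7.x and removed in 8.0)
  • Aggregations: rich grouping and analytics over indexed data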

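Typical Use Cases

  • Enterprise search
  • Log analytics and monitoring
  • E-commerce product search
  • Content management systems
  • Real-time data analytics
  • Security information analysis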

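Core Concepts

Cluster Architecture

Elasticsearch cluster
├── Node - a single Elasticsearch instance
│   ├── Master node - cluster management
│   ├── Data node - data storage and search
│   └── Coordinating node - request routing and result merging
├── Index - logical container for data
│   ├── Shard - physical partition of the data
│   │   ├── Primary shard - receives writes first
│   │   └── Replica shard - copy used for redundancy and reads
│   └── Mapping - field type definitions
└── Document - the smallest unit of data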
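
To make the relationship between documents and primary shards concrete, here is a minimal sketch of how a routing value (the document _id by default) is mapped to a shard. It is a simplification: Elasticsearch uses a Murmur3 hash and an extra routing factor, whereas this sketch assumes String.hashCode() as a stand-in. The class and method names are hypothetical.

```java
// Simplified model of document-to-shard routing.
// Assumption: hashCode() stands in for the Murmur3 hash Elasticsearch uses.
final class ShardRouting {

    /** Returns the primary shard index for a routing value (the _id by default). */
    static int shardFor(String routing, int numberOfPrimaryShards) {
        if (numberOfPrimaryShards <= 0) {
            throw new IllegalArgumentException("numberOfPrimaryShards must be positive");
        }
        // floorMod keeps the result in [0, n) even when the hash is negative.
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }

    public static void main(String[] args) {
        for (String id : new String[] {"1", "2", "order-42"}) {
            System.out.println(id + " -> shard " + shardFor(id, 3));
        }
    }
}
```

In this model the document with id "1" lands on shard 1 when the index has 3 primary shards and on shard 4 when it has 5. Because the result depends on the shard count, the number of primary shards cannot simply be changed on an existing index; it takes a reindex (or the split/shrink APIs).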

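Data Types

  • Core types: text, keyword, long, integer, double, boolean, date
  • Complex types: object, nested (there is no dedicated array type; any field can hold multiple values)
  • Geo types: geo_point, geo_shape
  • Specialized types: ip, completion, token_count, percolator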

Usage Tips

1. Index Management

// Create an index with a custom analyzer and explicit mappings
PUT /my_index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "my_analyzer",
        "fields": {
          "keyword": { "type": "keyword" }
        }
      },
      "content": { "type": "text", "analyzer": "my_analyzer" },
      "tags": { "type": "keyword" },
      "created_at": { "type": "date" }
    }
  }
}
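
As a rough illustration of what my_analyzer does to a text field, the sketch below imitates its three stages in plain Java: tokenize, lowercase, drop stop words. It is an approximation only. The regular expression is a crude stand-in for the standard tokenizer, and the stop-word list is an abbreviated assumption rather than the full English set the stop filter uses by default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class AnalyzerSketch {

    // Abbreviated stop-word list (assumption); the real filter uses a longer English list.
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "is", "of", "the", "to");

    /** Returns the terms that would roughly end up in the inverted index. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Crude stand-in for the standard tokenizer: split on anything that is not a letter or digit.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue;
            }
            String token = raw.toLowerCase(Locale.ROOT); // "lowercase" filter
            if (!STOP_WORDS.contains(token)) {           // "stop" filter
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Complete Guide to Elasticsearch"));
    }
}
```

The title.keyword sub-field skips this pipeline entirely and stores the original string, which is why it suits sorting and exact-match filters.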

2. Search Queries

// Full-text search combined with filters, aggregations, sorting, and paging
GET /my_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "elasticsearch tutorial",
            "fields": ["title^2", "content"],
            "type": "best_fields",
            "fuzziness": "AUTO"
          }
        }
      ],
      "filter": [
        { "range": { "created_at": { "gte": "2023-01-01", "lte": "2023-12-31" } } },
        { "terms": { "tags": ["search", "tutorial"] } }
      ]
    }
  },
  "aggs": {
    "tag_counts": { "terms": { "field": "tags", "size": 10 } },
    "articles_per_month": {
      "date_histogram": { "field": "created_at", "calendar_interval": "month" }
    }
  },
  "sort": [
    { "_score": { "order": "desc" } },
    { "created_at": { "order": "desc" } }
  ],
  "from": 0,
  "size": 20
}
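
The "fuzziness": "AUTO" setting in the query above picks the allowed edit distance from the length of each search term. The following sketch spells out the default thresholds (AUTO is shorthand for AUTO:3,6); the class name is hypothetical.

```java
final class AutoFuzziness {

    /** Maximum edit distance that "fuzziness": "AUTO" allows for a term. */
    static int maxEdits(String term) {
        int length = term.codePointCount(0, term.length());
        if (length <= 2) {
            return 0; // 1-2 characters: must match exactly
        }
        if (length <= 5) {
            return 1; // 3-5 characters: one edit allowed
        }
        return 2;     // 6 or more characters: two edits allowed
    }

    public static void main(String[] args) {
        for (String term : new String[] {"es", "guide", "elasticsearch"}) {
            System.out.println(term + " -> " + maxEdits(term));
        }
    }
}
```

So a short term such as "es" is never matched fuzzily, while a typo like "elasticsaerch" can still match "elasticsearch".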

3. Aggregations

// Nested aggregations: per-category totals with a monthly trend, plus global statistics
GET /orders/_search
{
  "size": 0,
  "aggs": {
    "sales_by_category": {
      "terms": { "field": "category", "size": 10 },
      "aggs": {
        "total_sales": { "sum": { "field": "amount" } },
        "avg_order_value": { "avg": { "field": "amount" } },
        "sales_trend": {
          "date_histogram": { "field": "order_date", "calendar_interval": "month" },
          "aggs": {
            "monthly_sales": { "sum": { "field": "amount" } }
          }
        }
      }
    },
    "global_sales_stats": {
      "global": {},
      "aggs": {
        "total_revenue": { "sum": { "field": "amount" } },
        "avg_order_value": { "avg": { "field": "amount" } }
      }
    }
  }
}
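
For readers who think in code rather than DSL, the sketch below computes the same two numbers in memory with Java streams: a terms bucket per category with a sum sub-aggregation, and an overall average. The Order class and its sample values are hypothetical; the point is only to show the grouping semantics, not to suggest doing this in the application.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class SalesByCategory {

    // Hypothetical in-memory order; field names mirror the query above.
    static final class Order {
        final String category;
        final double amount;

        Order(String category, double amount) {
            this.category = category;
            this.amount = amount;
        }
    }

    /** Equivalent of terms(category) with a sum(amount) sub-aggregation. */
    static Map<String, Double> totalSales(List<Order> orders) {
        return orders.stream().collect(Collectors.groupingBy(
                o -> o.category, TreeMap::new, Collectors.summingDouble(o -> o.amount)));
    }

    /** Equivalent of avg(amount) over all orders. */
    static double averageOrderValue(List<Order> orders) {
        return orders.stream().mapToDouble(o -> o.amount).average().orElse(0.0);
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("books", 100.0), new Order("books", 50.0), new Order("toys", 25.0));
        System.out.println(totalSales(orders));
        System.out.println(averageOrderValue(orders));
    }
}
```

One difference worth noting: the terms aggregation orders buckets by document count by default, while this sketch orders them by key.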

4. Bulk Operations

// Bulk-index documents (each action line and each source line must stay on a single line)
POST /_bulk
{"index":{"_index":"my_index","_id":"1"}}
{"title":"Elasticsearch Guide","content":"Complete guide to ES","tags":["search","guide"]}
{"index":{"_index":"my_index","_id":"2"}}
{"title":"Spring Boot Integration","content":"How to integrate ES with Spring Boot","tags":["spring","integration"]}

// Bulk partial updates
POST /_bulk
{"update":{"_index":"my_index","_id":"1"}}
{"doc":{"views":100,"updated_at":"2023-12-01"}}
{"update":{"_index":"my_index","_id":"2"}}
{"doc":{"views":50,"updated_at":"2023-12-01"}}
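
When the bulk body is assembled by hand rather than through a client library, the newline-delimited format is easy to get wrong. Here is a minimal sketch of a builder for index actions. It assumes the document sources are already serialized to single-line JSON and that index names and ids need no JSON escaping; the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class BulkBodyBuilder {

    /** Builds an NDJSON _bulk body from document id -> already-serialized JSON source. */
    static String indexActions(String index, Map<String, String> jsonSourcesById) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> entry : jsonSourcesById.entrySet()) {
            if (entry.getValue().indexOf('\n') >= 0) {
                throw new IllegalArgumentException("source must be single-line JSON");
            }
            // Action line: operation, target index, document id.
            body.append("{\"index\":{\"_index\":\"").append(index)
                .append("\",\"_id\":\"").append(entry.getKey()).append("\"}}\n");
            // Source line: the document itself. The final line must also end with a newline.
            body.append(entry.getValue()).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("1", "{\"title\":\"Elasticsearch Guide\"}");
        docs.put("2", "{\"title\":\"Spring Boot Integration\"}");
        System.out.print(indexActions("my_index", docs));
    }
}
```

The resulting string is sent as the body of POST /_bulk with the Content-Type header set to application/x-ndjson.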

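Key Challenges Explained

1. Sharding Strategy

// Estimating the number of primary shards (rules of thumb, not hard limits)
// shards ≈ total data size / target shard size (30-50 GB per shard is suggested)
// shards ≈ CPU cores * 2 can serve as a rough cross-check

// Shard allocation filtering: keep this index on nodes tagged with box_type=hot
// (this controls where shards are placed; it is not a routing key)
PUT /logs/_settings
{
  "index.routing.allocation.require.box_type": "hot"
}

// Force-merge each shard down to a single segment
// (best reserved for indices that no longer receive writes)
POST /logs/_forcemerge?max_num_segments=1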
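
The size-based rule of thumb above is simple arithmetic. A minimal sketch, assuming the expected data volume and the target shard size are both given in gigabytes (the class and method names are hypothetical):

```java
final class ShardSizing {

    /** Primary shard count = ceil(expected data size / target shard size), at least 1. */
    static int primaryShards(double expectedDataGb, double targetShardGb) {
        if (expectedDataGb < 0 || targetShardGb <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return Math.max(1, (int) Math.ceil(expectedDataGb / targetShardGb));
    }

    public static void main(String[] args) {
        // 600 GB of data at 40 GB per shard
        System.out.println(primaryShards(600, 40));
    }
}
```

For example, 600 GB at a 40 GB target gives 15 primary shards, and anything smaller than one target shard still gets a single shard. Treat the result as a starting point and adjust it after measuring real indexing and query load.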

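2. Performance Optimization

// Query optimization
// - constant_score with a filter skips relevance scoring and lets the filter be cached
// - _source limits the response to the fields that are actually needed
// - from + size may not exceed index.max_result_window (10,000 by default);
//   use search_after to page deeper
GET /my_index/_search
{
  "query": {
    "constant_score": {
      "filter": { "term": { "status": "active" } }
    }
  },
  "_source": ["title", "content"],
  "size": 1000
}

// Index settings for heavy write loads (for example, an initial bulk import)
// - refresh_interval: refresh less often
// - number_of_replicas: drop replicas while loading, then restore them afterwards
// - translog.durability async: fsync in the background; the most recent writes can be lost on a crash
PUT /my_index/_settings
{
  "index.refresh_interval": "30s",
  "index.number_of_replicas": 0,
  "index.translog.durability": "async"
}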

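3. Cluster Management

// Cluster health
GET /_cluster/health?pretty

// Node statistics
GET /_nodes/stats?pretty

// Index statistics
GET /_stats?pretty

// Shard allocation: allow allocation for all kinds of shards
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "all"
  }
}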

4. Data Modeling

// Parent-child relationship (join field): company is the parent, employee is the child
PUT /company
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "join_field": {
        "type": "join",
        "relations": { "company": "employee" }
      }
    }
  }
}

// Nested objects: each variant is indexed as its own hidden document
PUT /products
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "variants": {
        "type": "nested",
        "properties": {
          "color": { "type": "keyword" },
          "size": { "type": "keyword" },
          "price": { "type": "double" }
        }
      }
    }
  }
}
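
Why variants needs the nested type becomes clearer with a small simulation. With a plain object field, an array of objects is flattened into one list of values per field, so the link between a color and its size is lost. The sketch below models both behaviours in memory; the Variant class and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class NestedVsObject {

    static final class Variant {
        final String color;
        final String size;

        Variant(String color, String size) {
            this.color = color;
            this.size = size;
        }
    }

    /** Plain object arrays: values are flattened per field, so the pairing is lost. */
    static boolean flattenedMatch(List<Variant> variants, String color, String size) {
        List<String> colors = new ArrayList<>();
        List<String> sizes = new ArrayList<>();
        for (Variant v : variants) {
            colors.add(v.color);
            sizes.add(v.size);
        }
        return colors.contains(color) && sizes.contains(size);
    }

    /** Nested objects: both conditions must hold within the same variant. */
    static boolean nestedMatch(List<Variant> variants, String color, String size) {
        return variants.stream().anyMatch(v -> v.color.equals(color) && v.size.equals(size));
    }

    public static void main(String[] args) {
        List<Variant> variants = List.of(new Variant("red", "S"), new Variant("blue", "XL"));
        System.out.println(flattenedMatch(variants, "red", "XL"));
        System.out.println(nestedMatch(variants, "red", "XL"));
    }
}
```

A product with a red/S and a blue/XL variant matches "red and XL" in the flattened model even though no such variant exists; the nested model correctly rejects it. The price is that nested fields must be searched with a nested query and cost more to index.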

Spring Boot Integration

1. Dependencies

<!-- Maven -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>

// Gradle
implementation 'org.springframework.boot:spring-boot-starter-data-elasticsearch'

2. Configuration File

# application.yml
spring:
  elasticsearch:
    uris: http://localhost:9200
    connection-timeout: 1s
    socket-timeout: 30s
    # Security
    # username: elastic
    # password: changeme
    # Multiple nodes
    # uris: http://es-node1:9200,http://es-node2:9200,http://es-node3:9200

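3. Entity Class

import java.time.LocalDateTime;
import java.util.List;

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.annotations.Setting;

@Document(indexName = "articles")
@Setting(settingPath = "es-settings.json")
public class Article {

    @Id
    private String id;

    // ik_max_word is provided by the IK analysis plugin, which must be installed on the cluster
    @Field(type = FieldType.Text, analyzer = "ik_max_word")
    private String title;

    @Field(type = FieldType.Text, analyzer = "ik_max_word")
    private String content;

    @Field(type = FieldType.Keyword)
    private List<String> tags;

    @Field(type = FieldType.Keyword)
    private String category;

    @Field(type = FieldType.Date)
    private LocalDateTime createdAt;

    @Field(type = FieldType.Long)
    private Long views;

    // Constructors, getters, and setters
}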

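4. Repository Interface

import java.time.LocalDateTime;
import java.util.List;

import org.springframework.data.domain.Page;
import org.springframework.data.domain.Pageable;
import org.springframework.data.elasticsearch.annotations.Query;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;
import org.springframework.stereotype.Repository;

@Repository
public interface ArticleRepository extends ElasticsearchRepository<Article, String> {

    // Derived query methods
    List<Article> findByTitleContaining(String title);

    List<Article> findByCategoryAndTagsIn(String category, List<String> tags);

    Page<Article> findByCreatedAtBetween(LocalDateTime start, LocalDateTime end, Pageable pageable);

    // @Query supplies only the contents of the "query" element
    @Query("{\"bool\": {\"must\": [{\"match\": {\"title\": \"?0\"}}]}}")
    List<Article> searchByTitle(String title);

    // Aggregations cannot be declared through @Query, because the annotation value is sent
    // as the "query" element. ArticleService.getArticleStats() builds the aggregation request instead.
}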

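5. Service Layer

// Assumes Spring Data Elasticsearch 4.x (the RestHighLevelClient-based API). Parts of this API,
// for example the return type of SearchHits.getAggregations(), changed between 4.x releases,
// so check the calls against the version in use.
// SearchRequest and ArticleNotFoundException are application classes that are not shown here.
import java.time.LocalDateTime;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

import org.elasticsearch.common.unit.Fuzziness;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.MultiMatchQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.histogram.DateHistogramAggregationBuilder;
import org.elasticsearch.search.aggregations.bucket.histogram.DateHistogramInterval;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.Pageable;
import org.springframework.data.domain.Sort;
import org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate;
import org.springframework.data.elasticsearch.core.SearchHitSupport;
import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.data.elasticsearch.core.SearchPage;
import org.springframework.data.elasticsearch.core.mapping.IndexCoordinates;
import org.springframework.data.elasticsearch.core.query.IndexQuery;
import org.springframework.data.elasticsearch.core.query.IndexQueryBuilder;
import org.springframework.data.elasticsearch.core.query.NativeSearchQuery;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;
import org.springframework.data.elasticsearch.core.query.UpdateQuery;
import org.springframework.stereotype.Service;
import org.springframework.util.StringUtils;

@Service
public class ArticleService {

    @Autowired
    private ArticleRepository articleRepository;

    @Autowired
    private ElasticsearchRestTemplate elasticsearchTemplate;

    // Basic CRUD operations
    public Article createArticle(Article article) {
        article.setCreatedAt(LocalDateTime.now());
        article.setViews(0L);
        return articleRepository.save(article);
    }

    public Article findById(String id) {
        return articleRepository.findById(id)
                .orElseThrow(() -> new ArticleNotFoundException("Article not found"));
    }

    // Full-text search with optional filters
    public SearchPage<Article> searchArticles(SearchRequest request, Pageable pageable) {
        BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery();

        // Keyword search over title (boosted) and content
        if (StringUtils.hasText(request.getKeyword())) {
            queryBuilder.must(QueryBuilders.multiMatchQuery(request.getKeyword())
                    .field("title", 2.0f)
                    .field("content")
                    .type(MultiMatchQueryBuilder.Type.BEST_FIELDS)
                    .fuzziness(Fuzziness.AUTO));
        }

        // Category filter
        if (StringUtils.hasText(request.getCategory())) {
            queryBuilder.filter(QueryBuilders.termQuery("category", request.getCategory()));
        }

        // Tag filter
        if (request.getTags() != null && !request.getTags().isEmpty()) {
            queryBuilder.filter(QueryBuilders.termsQuery("tags", request.getTags()));
        }

        // Date range filter
        if (request.getStartDate() != null && request.getEndDate() != null) {
            queryBuilder.filter(QueryBuilders.rangeQuery("createdAt")
                    .gte(request.getStartDate())
                    .lte(request.getEndDate()));
        }

        // Assemble the search request
        NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
                .withQuery(queryBuilder)
                .withSort(Sort.by(Sort.Direction.DESC, "_score"))
                .withSort(Sort.by(Sort.Direction.DESC, "createdAt"))
                .withPageable(pageable)
                .build();

        // search() returns SearchHits; wrap it so the caller also gets paging metadata
        SearchHits<Article> searchHits = elasticsearchTemplate.search(searchQuery, Article.class);
        return SearchHitSupport.searchPageFor(searchHits, pageable);
    }

    // Aggregations
    public Map<String, Object> getArticleStats() {
        // Articles per category
        TermsAggregationBuilder categoryAgg = AggregationBuilders.terms("category_stats")
                .field("category")
                .size(10);

        // Articles per tag
        TermsAggregationBuilder tagAgg = AggregationBuilders.terms("tag_stats")
                .field("tags")
                .size(20);

        // Articles per month
        DateHistogramAggregationBuilder timeAgg = AggregationBuilders.dateHistogram("time_trend")
                .field("createdAt")
                .calendarInterval(DateHistogramInterval.MONTH);

        NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
                .withQuery(QueryBuilders.matchAllQuery())
                .addAggregation(categoryAgg)
                .addAggregation(tagAgg)
                .addAggregation(timeAgg)
                .build();

        SearchHits<Article> searchHits = elasticsearchTemplate.search(searchQuery, Article.class);

        Map<String, Object> stats = new HashMap<>();
        stats.put("category_stats", searchHits.getAggregations().get("category_stats"));
        stats.put("tag_stats", searchHits.getAggregations().get("tag_stats"));
        stats.put("time_trend", searchHits.getAggregations().get("time_trend"));
        return stats;
    }

    // Bulk indexing. Elasticsearch has no transactions, so @Transactional is omitted:
    // a bulk request can succeed for some documents and fail for others.
    public void bulkIndexArticles(List<Article> articles) {
        List<IndexQuery> queries = articles.stream()
                .map(article -> {
                    article.setCreatedAt(LocalDateTime.now());
                    article.setViews(0L);
                    return new IndexQueryBuilder().withObject(article).build();
                })
                .collect(Collectors.toList());
        elasticsearchTemplate.bulkIndex(queries, IndexCoordinates.of("articles"));
    }

    // Increment the view counter with a Painless script
    public void updateArticleViews(String id) {
        UpdateQuery updateQuery = UpdateQuery.builder(id)
                .withScript("ctx._source.views += 1")
                .withLang("painless")
                .build();
        elasticsearchTemplate.update(updateQuery, IndexCoordinates.of("articles"));
    }
}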

6. Controller Layer

import java.util.List;
import java.util.Map;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.Pageable;
import org.springframework.data.domain.Sort;
import org.springframework.data.elasticsearch.core.SearchPage;
import org.springframework.data.web.PageableDefault;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api/articles")
public class ArticleController {

    @Autowired
    private ArticleService articleService;

    @PostMapping
    public ResponseEntity<Article> createArticle(@RequestBody Article article) {
        Article createdArticle = articleService.createArticle(article);
        return ResponseEntity.status(HttpStatus.CREATED).body(createdArticle);
    }

    @GetMapping("/{id}")
    public ResponseEntity<Article> getArticleById(@PathVariable String id) {
        Article article = articleService.findById(id);
        return ResponseEntity.ok(article);
    }

    @GetMapping("/search")
    public ResponseEntity<SearchPage<Article>> searchArticles(
            @ModelAttribute SearchRequest request,
            @PageableDefault(sort = "createdAt", direction = Sort.Direction.DESC) Pageable pageable) {
        SearchPage<Article> articles = articleService.searchArticles(request, pageable);
        return ResponseEntity.ok(articles);
    }

    @GetMapping("/stats")
    public ResponseEntity<Map<String, Object>> getArticleStats() {
        Map<String, Object> stats = articleService.getArticleStats();
        return ResponseEntity.ok(stats);
    }

    @PostMapping("/bulk")
    public ResponseEntity<Void> bulkIndexArticles(@RequestBody List<Article> articles) {
        articleService.bulkIndexArticles(articles);
        return ResponseEntity.ok().build();
    }

    @PutMapping("/{id}/views")
    public ResponseEntity<Void> incrementViews(@PathVariable String id) {
        articleService.updateArticleViews(id);
        return ResponseEntity.ok().build();
    }
}

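7. Configuration Class

import java.time.Duration;
import java.util.Arrays;

import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.elasticsearch.client.ClientConfiguration;
import org.springframework.data.elasticsearch.client.RestClients;
import org.springframework.data.elasticsearch.config.AbstractElasticsearchConfiguration;
import org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate;
import org.springframework.data.elasticsearch.core.convert.ElasticsearchCustomConversions;
import org.springframework.data.elasticsearch.repository.config.EnableElasticsearchRepositories;

@Configuration
@EnableElasticsearchRepositories(basePackages = "com.example.repository")
public class ElasticsearchConfig extends AbstractElasticsearchConfiguration {

    @Value("${spring.elasticsearch.uris}")
    private String elasticsearchUrl;

    @Override
    @Bean
    public RestHighLevelClient elasticsearchClient() {
        // connectedTo expects host:port entries, so strip the scheme and split a comma-separated list
        ClientConfiguration clientConfiguration = ClientConfiguration.builder()
                .connectedTo(elasticsearchUrl.replace("http://", "").split(","))
                .withConnectTimeout(Duration.ofSeconds(5))
                .withSocketTimeout(Duration.ofSeconds(30))
                .build();
        return RestClients.create(clientConfiguration).rest();
    }

    @Bean
    public ElasticsearchRestTemplate elasticsearchRestTemplate() {
        return new ElasticsearchRestTemplate(elasticsearchClient());
    }

    // Custom converters: override the hook provided by the base configuration so they are
    // actually registered. LocalDateTimeToDateConverter and DateToLocalDateTimeConverter are
    // application classes that are not shown here.
    @Override
    @Bean
    public ElasticsearchCustomConversions elasticsearchCustomConversions() {
        return new ElasticsearchCustomConversions(Arrays.asList(
                new LocalDateTimeToDateConverter(),
                new DateToLocalDateTimeConverter()));
    }
}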

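Best Practices

1. Index Design

  • Shard count: size it from the data volume and the available hardware
  • Replica count: at least one replica in production
  • Mapping design: choose field types and analyzers deliberately
  • Index lifecycle: manage rollover and retention with ILM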

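2. Query Optimization

  • Use filter context to avoid unnecessary scoring
  • Restrict _source to the needed fields to cut network transfer
  • Avoid deep pagination; use search_after instead
  • Use aggregations instead of computing in the application layer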
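
To illustrate the search_after advice, here is a minimal sketch that builds the request body for each page. It assumes results are sorted by created_at with a unique keyword field as tiebreaker; article_id is a hypothetical field name, as is the class itself. The two cursor values are taken from the "sort" array of the last hit on the previous page, where a date sort value is returned as epoch milliseconds.

```java
final class SearchAfterRequest {

    /**
     * Builds the search body for one page. Pass null cursor values for the first page;
     * for later pages pass the sort values of the previous page's last hit.
     */
    static String nextPage(int size, Long lastCreatedAtMillis, String lastId) {
        StringBuilder body = new StringBuilder();
        body.append("{\"size\":").append(size)
            // A unique tiebreaker keeps the ordering stable when timestamps collide.
            .append(",\"sort\":[{\"created_at\":\"desc\"},{\"article_id\":\"asc\"}]");
        if (lastCreatedAtMillis != null && lastId != null) {
            body.append(",\"search_after\":[").append(lastCreatedAtMillis)
                .append(",\"").append(lastId).append("\"]");
        }
        return body.append('}').toString();
    }

    public static void main(String[] args) {
        System.out.println(nextPage(20, null, null));
        System.out.println(nextPage(20, 1701388800000L, "a-17"));
    }
}
```

Unlike from/size, the cost of each page stays roughly constant no matter how deep the client goes, because the shards only collect hits that sort after the cursor.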

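3. Performance Tuning

  • Tune refresh_interval to balance freshness against indexing throughput
  • Use the bulk API for batch writes
  • Keep shard sizes reasonable (30-50 GB)
  • Monitor cluster health

4. Security

  • Enable the built-in security features (formerly X-Pack security)
  • Encrypt traffic with TLS/SSL
  • Define users, roles, and permissions
  • Back up data regularly with snapshots

5. Monitoring and Maintenance

  • Monitor cluster health
  • Set up alerting
  • Clean up unused indices regularly
  • Monitor query performance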

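Summary

Elasticsearch is a powerful search and analytics platform built around a distributed architecture, near-real-time search, and rich aggregations. With sound index design, query optimization, and a clean Spring Boot integration, it can back high-performance search and analytics applications.

Key takeaways:

  1. Understand the distributed architecture and core concepts of Elasticsearch
  2. Master index design and mapping configuration
  3. Become fluent in the query DSL and the aggregation API
  4. Configure the Spring Boot integration correctly
  5. Follow best practices to keep the cluster fast and stable

After working through this guide, you should be able to use Elasticsearch confidently and integrate it into a Spring Boot project to build search and analytics features.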
