
What is ElasticSearch?

📖 Overview

ElasticSearch (ES) is a distributed, near-real-time search and analytics engine built on Apache Lucene. It scales the single-node Lucene search library out into a distributed architecture, providing powerful full-text search, structured search, and analytics capabilities. ES is widely used for log analysis, application search, product recommendation, and similar scenarios.

🔍 Core Storage Structures in Detail

1. Inverted Index

The inverted index is the core data structure behind ElasticSearch's fast keyword search.

How It Works

graph LR
    A[Raw Document] --> B[Analyzer]
    B --> C[Term]
    C --> D[Inverted Index]
    D --> E[Document ID List]
    subgraph "Inverted Index Structure"
        F[Term Dictionary] --> G[Posting List]
        G --> H[Doc ID + Position Info]
    end

Data Structure Example

// Original documents
{
  "doc1": "Elasticsearch is a search engine",
  "doc2": "Lucene is the core of Elasticsearch",
  "doc3": "Search engines use inverted index"
}

// Inverted index after tokenization
{
  "elasticsearch": [1, 2],   // appears in docs 1 and 2
  "search": [1, 3],          // appears in docs 1 and 3
  "engine": [1, 3],          // appears in docs 1 and 3
  "lucene": [2],             // only in doc 2
  "core": [2],               // only in doc 2
  "inverted": [3],           // only in doc 3
  "index": [3]               // only in doc 3
}

Time Complexity Optimization

With an inverted index, search drops from the O(N*M) of a brute-force scan to O(logN):

// Traditional full-text scan - O(N*M)
public List<Document> bruteForceSearch(String keyword, List<Document> docs) {
    List<Document> results = new ArrayList<>();
    for (Document doc : docs) {               // O(N)
        if (doc.content.contains(keyword)) {  // O(M)
            results.add(doc);
        }
    }
    return results;
}

// Inverted index lookup - O(logN)
public List<Document> invertedIndexSearch(String keyword, InvertedIndex index) {
    // Locate the term via a skip list or B+ tree - O(logN)
    PostingList postingList = index.getPostingList(keyword);
    return postingList.getDocuments();
}

Detailed Storage Layout

Physical layout of the inverted index:

Term Dictionary:
├── Term: "elasticsearch" → pointer to posting list
├── Term: "lucene" → pointer to posting list
└── Term: "search" → pointer to posting list

Posting Lists:
elasticsearch → [DocID:1, Freq:1, Positions:[0]] → [DocID:2, Freq:1, Positions:[5]]
lucene → [DocID:2, Freq:1, Positions:[0]]
search → [DocID:1, Freq:1, Positions:[3]] → [DocID:3, Freq:1, Positions:[0]]

2. Term Index - In-Memory Prefix Tree

The Term Index is an in-memory structure built on prefix sharing that speeds up lookups into the on-disk Term Dictionary.

FST (Finite State Transducer) Structure

graph TB
    Root --> |e| E1
    E1 --> |l| E2
    E2 --> |a| E3
    E3 --> |s| E4
    E4 --> |t| E5
    E5 --> |i| E6
    E6 --> |c| E7
    E7 --> |...| E8[Final: elasticsearch]
    Root --> |l| L1
    L1 --> |u| L2
    L2 --> |c| L3
    L3 --> |e| L4
    L4 --> |n| L5
    L5 --> |e| L6[Final: lucene]

Prefix-Sharing Advantage

// Classic trie - one node per character
class TrieNode {
    char character;
    Map<Character, TrieNode> children;
    boolean isEndOfWord;
    // Memory cost: one node per character
}

// FST - shared prefixes, compressed storage
class FSTNode {
    String sharedPrefix;                 // shared prefix
    Map<String, FSTNode> transitions;
    Object output;                       // offset into the Term Dictionary
    // Memory cost: far lower, especially with common prefixes
}

Lookup Flow

Looking up the term "elasticsearch":

1. In-memory Term Index (FST):
   Root → e → el → ela → elas → elast → elasti → elastic → elastics → elasticsearch
   Yields the on-disk offset: offset_123

2. Seek into the on-disk Term Dictionary:
   seek(offset_123) → read the posting-list pointer for "elasticsearch"

3. Load the posting list:
   Retrieve all document IDs and position information for "elasticsearch"
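
The index-then-seek pattern above can be simulated with a sorted in-memory map standing in for the FST: it maps a subset of terms to offsets in the on-disk dictionary, and the greatest indexed term not exceeding the probe tells us which block to start reading. This is a hypothetical sketch of the idea, not Lucene's actual API:

```java
import java.util.Map;
import java.util.TreeMap;

public class TermIndexSketch {
    // In-memory "term index": selected terms -> offset in the on-disk dictionary.
    // A real FST compresses shared prefixes; a TreeMap just illustrates the lookup.
    private final TreeMap<String, Long> termIndex = new TreeMap<>();

    public void addIndexEntry(String term, long dictOffset) {
        termIndex.put(term, dictOffset);
    }

    // Where to start scanning the on-disk Term Dictionary for `term`:
    // the greatest indexed term <= `term` identifies the block to seek to.
    public long seekOffset(String term) {
        Map.Entry<String, Long> entry = termIndex.floorEntry(term);
        return entry == null ? 0L : entry.getValue();
    }

    public static void main(String[] args) {
        TermIndexSketch idx = new TermIndexSketch();
        idx.addIndexEntry("core", 0L);
        idx.addIndexEntry("elasticsearch", 123L);
        idx.addIndexEntry("search", 456L);
        System.out.println(idx.seekOffset("elasticsearch")); // exact hit -> 123
        System.out.println(idx.seekOffset("engine"));        // falls in the "elasticsearch" block -> 123
    }
}
```

Because only a fraction of terms needs to live in memory, the full dictionary can stay on disk while lookups still start close to the right spot.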

3. Stored Fields - Row-Oriented Storage

Stored Fields keep complete document contents in row-oriented form, used to return the original data in search results.

Storage Layout

Document storage layout (row-oriented):

Doc 1: [field1: "value1", field2: "value2", field3: "value3"]
Doc 2: [field1: "value4", field2: "value5", field3: "value6"]
Doc 3: [field1: "value7", field2: "value8", field3: "value9"]

Each row is stored contiguously, so a complete document can be fetched quickly by DocID.

Access Pattern

// Fetch the complete document by DocID
public Document getStoredDocument(int docId) {
    // 1. Compute the file offset from docId
    long offset = docId * averageDocSize + indexOffset;
    // 2. Read the raw document bytes from disk
    byte[] docData = readFromFile(offset, docLength);
    // 3. Deserialize into a Document object
    return deserialize(docData);
}

Compression Optimization

Compression strategies for Stored Fields:

1. Document-level compression:
   - LZ4 / DEFLATE compression algorithms
   - Batches compressed in ~16KB blocks, balancing compression ratio against decompression speed

2. Field-level optimizations:
   - Variable-length encoding for numeric fields
   - Dictionary compression for string fields
   - Delta encoding for date fields
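
The document-level strategy above can be sketched with the JDK's built-in DEFLATE support (Lucene itself defaults to LZ4 with its own block format, so this illustrates the idea rather than Lucene's codec):

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class BlockCompressionSketch {

    // Compress one block of stored-field bytes with DEFLATE.
    public static byte[] compress(byte[] block) {
        Deflater deflater = new Deflater();
        deflater.setInput(block);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress a block back to the original bytes.
    public static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                out.write(buf, 0, inflater.inflate(buf));
            }
        } catch (DataFormatException e) {
            throw new RuntimeException("corrupt block", e);
        }
        inflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Repetitive field data compresses well, which is why batching many
        // documents into one block beats compressing each one individually.
        String docs = "field1:value field2:value field3:value ".repeat(400);
        byte[] raw = docs.getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(raw);
        System.out.println(raw.length + " -> " + packed.length + " bytes");
    }
}
```

The block size is the tuning knob: larger blocks compress better but force more decompression work when a single document is fetched.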

4. Doc Values - Column-Oriented Storage

Doc Values store field values in columnar form, grouping each field's values together to optimize sorting and aggregation.

Row vs Column Comparison

Row-oriented (Stored Fields):
Doc1: [name:"Alice", age:25, city:"NYC"]
Doc2: [name:"Bob", age:30, city:"LA"]
Doc3: [name:"Carol", age:28, city:"NYC"]

Column-oriented (Doc Values):
name column: ["Alice", "Bob", "Carol"]
age column:  [25, 30, 28]
city column: ["NYC", "LA", "NYC"]

Per-Type Optimizations

// Doc Values for numeric types
class NumericDocValues {
    // Bit packing: use the fewest bits the value range allows
    private PackedInts.Reader values;

    public long get(int docId) {
        return values.get(docId);  // O(1) access
    }
}

// Doc Values for string types
class SortedDocValues {
    private PackedInts.Reader ordinals;  // doc -> ordinal mapping
    private BytesRef[] terms;            // deduplicated string array

    public BytesRef get(int docId) {
        int ord = (int) ordinals.get(docId);
        return terms[ord];
    }
}

Aggregation Performance

// Efficient aggregation on top of Doc Values
public class AggregationOptimization {
    // Numeric aggregation - exploits the sequential access pattern of columnar storage
    public double calculateAverage(NumericDocValues ageValues, int[] docIds) {
        long sum = 0;
        for (int docId : docIds) {
            sum += ageValues.get(docId);  // sequential, cache-friendly access
        }
        return (double) sum / docIds.length;
    }

    // Group-by aggregation - uses the sorted Doc Values
    public Map<String, List<Integer>> groupByCity(SortedDocValues cityValues, int[] docIds) {
        Map<String, List<Integer>> groups = new HashMap<>();
        for (int docId : docIds) {
            String city = cityValues.get(docId).utf8ToString();
            groups.computeIfAbsent(city, k -> new ArrayList<>()).add(docId);
        }
        return groups;
    }
}

🗂️ Core Lucene Concepts

1. Segment - The Smallest Search Unit

A segment is Lucene's smallest search unit; each one contains a complete set of search data structures.

Segment Structure

Segment file layout:
├── .tim (Term Dictionary)      - on-disk term dictionary
├── .tip (Term Index)           - FST index loaded into memory
├── .doc (Frequencies)          - term frequency data
├── .pos (Positions)            - term position data
├── .pay (Payloads)             - custom payload data
├── .fdt (Stored Fields Data)   - stored field values
├── .fdx (Stored Fields Index)  - stored field index
├── .dvd (Doc Values Data)      - columnar data
├── .dvm (Doc Values Metadata)  - columnar metadata
└── .si  (Segment Info)         - segment metadata

Immutability

public class ImmutableSegment {
    private final String segmentName;
    private final int maxDoc;
    private final Map<String, InvertedIndex> fieldIndexes;

    // A segment is never modified after it is built
    public ImmutableSegment(String name, Document[] docs) {
        this.segmentName = name;
        this.maxDoc = docs.length;
        this.fieldIndexes = buildIndexes(docs);  // fixed at build time, read-only afterwards
    }

    // "Updates" create a brand-new segment
    public ImmutableSegment addDocuments(Document[] newDocs) {
        Document[] allDocs = ArrayUtils.addAll(this.getDocs(), newDocs);
        return new ImmutableSegment(generateNewName(), allDocs);
    }
}

Search Within a Segment

public class SegmentSearcher {
    public SearchResult search(Query query, Segment segment) {
        // 1. Consult the Term Index to locate Term Dictionary offsets
        List<String> terms = query.extractTerms();
        Map<String, PostingList> termPostings = new HashMap<>();
        for (String term : terms) {
            // Fast lookup - O(logN)
            long offset = segment.getTermIndex().getOffset(term);
            PostingList posting = segment.getTermDict().getPosting(offset);
            termPostings.put(term, posting);
        }

        // 2. Merge the posting lists and score the candidates
        Set<Integer> candidateDocs = intersectPostings(termPostings);

        // 3. Fetch document content from Stored Fields
        List<Document> results = new ArrayList<>();
        for (Integer docId : candidateDocs) {
            Document doc = segment.getStoredFields().getDocument(docId);
            results.add(doc);
        }
        return new SearchResult(results);
    }
}

2. Segment Merging

Segment merging is a key performance mechanism in Lucene: small segments are periodically merged into larger ones.

Merge Policy

public class TieredMergePolicy {
    private static final double DEFAULT_SEGS_PER_TIER = 10.0;
    private static final int DEFAULT_MAX_MERGE_AT_ONCE = 10;

    public MergeSpecification findMerges(List<SegmentInfo> segments) {
        // 1. Group segments into tiers by size
        List<List<SegmentInfo>> tiers = groupBySize(segments);

        // 2. Identify tiers with too many segments
        MergeSpecification mergeSpec = new MergeSpecification();
        for (List<SegmentInfo> tier : tiers) {
            if (tier.size() > DEFAULT_SEGS_PER_TIER) {
                // Merge the N smallest segments in the tier
                List<SegmentInfo> toMerge = selectSmallestSegments(tier, DEFAULT_MAX_MERGE_AT_ONCE);
                mergeSpec.add(new OneMerge(toMerge));
            }
        }
        return mergeSpec;
    }

    // Execute a merge
    public SegmentInfo executeMerge(List<SegmentInfo> segments) {
        SegmentWriter writer = new SegmentWriter();
        // Merge field by field
        for (String fieldName : getAllFields(segments)) {
            mergeInvertedIndex(writer, segments, fieldName);  // merge inverted indexes
            mergeStoredFields(writer, segments, fieldName);   // merge stored fields
            mergeDocValues(writer, segments, fieldName);      // merge doc values
        }
        return writer.commit();
    }
}

Merge Benefits

Before merging:
Segment1 (1000 docs, 10MB)
Segment2 (1200 docs, 12MB)
Segment3 (800 docs, 8MB)
Segment4 (1500 docs, 15MB)
Total: 4 files, 45MB, 4 disk seeks

After merging:
Segment_merged (4500 docs, 42MB)  // slightly smaller after compression
Total: 1 file, 42MB, 1 disk seek

Query performance gains:
- Fewer open files: 4 → 1
- Fewer disk seeks: 4 → 1
- Better cache hit rate: contiguous storage
- Lower memory overhead: consolidated index structures

🌐 From Lucene to ElasticSearch

1. Distributed Architecture

ElasticSearch extends single-node Lucene into a distributed search engine.

Architecture Comparison

graph TB
    subgraph "Standalone Lucene"
        A[Application] --> B[Lucene Library]
        B --> C[Local Index Files]
    end
    subgraph "Distributed ElasticSearch"
        D[Client] --> E[Coordinating Node]
        E --> F[Data Node 1]
        E --> G[Data Node 2]
        E --> H[Data Node 3]
        F --> I[Shard 1]
        F --> J[Shard 2]
        G --> K[Shard 3]
        G --> L[Replica 1]
        H --> M[Replica 2]
        H --> N[Replica 3]
    end

Key Improvements

// Standalone Lucene search
public class LuceneSearcher {
    private IndexSearcher searcher;

    public TopDocs search(Query query, int numHits) {
        return searcher.search(query, numHits);  // handled on a single machine
    }
}

// Distributed ElasticSearch search
public class DistributedSearcher {
    private List<ShardSearcher> shards;

    public SearchResponse search(SearchRequest request) {
        // 1. Fan the query out to every shard
        List<Future<ShardSearchResult>> futures = new ArrayList<>();
        for (ShardSearcher shard : shards) {
            futures.add(executor.submit(() -> shard.search(request)));
        }

        // 2. Collect the per-shard results
        List<ShardSearchResult> shardResults = new ArrayList<>();
        for (Future<ShardSearchResult> future : futures) {
            shardResults.add(future.get());
        }

        // 3. Merge, sort, and return the global top-K
        return mergeAndSort(shardResults, request.getSize());
    }
}

2. Shards and Replicas

Primary Shard vs Replica Shard

Sharding strategy:

Primary Shard:
- Handles all write operations
- The authoritative copy of the data
- Count is fixed at index creation and cannot be changed afterwards

Replica Shard:
- A complete copy of its primary shard
- Load-balances read operations
- Provides high availability
- Count can be adjusted dynamically

Data Distribution Example

public class ShardAllocation {
    // Document routing (ES actually hashes the routing value with murmur3)
    public int getShardId(String documentId, int numberOfShards) {
        // floorMod avoids the negative-result edge case of Math.abs(Integer.MIN_VALUE)
        return Math.floorMod(documentId.hashCode(), numberOfShards);
    }

    // Shard allocation strategy
    public void allocateShards(Index index, List<Node> nodes) {
        int primaryShards = index.getNumberOfShards();
        int replicaCount = index.getNumberOfReplicas();

        // Allocate primary shards
        for (int shardId = 0; shardId < primaryShards; shardId++) {
            Node primaryNode = selectNodeForPrimary(nodes, shardId);
            primaryNode.allocatePrimaryShard(index, shardId);

            // Allocate replica shards (never on the same node as their primary)
            for (int replica = 0; replica < replicaCount; replica++) {
                Node replicaNode = selectNodeForReplica(nodes, primaryNode, shardId);
                replicaNode.allocateReplicaShard(index, shardId, replica);
            }
        }
    }
}

Read/Write Load Balancing

public class LoadBalancedOperations {
    // Writes: must be routed to the primary shard
    public IndexResponse index(IndexRequest request) {
        String docId = request.getId();
        int shardId = getShardId(docId, numberOfShards);

        // Find the primary shard and execute the write there
        Shard primaryShard = findPrimaryShard(shardId);
        IndexResponse response = primaryShard.index(request);

        // Propagate to every replica shard
        List<Shard> replicas = findReplicaShards(shardId);
        for (Shard replica : replicas) {
            replica.index(request);  // replicate (ES waits for the in-sync replicas before acking)
        }
        return response;
    }

    // Reads: may be served by the primary or any replica
    public GetResponse get(GetRequest request) {
        String docId = request.getId();
        int shardId = getShardId(docId, numberOfShards);

        // Load-balance across the shard copies (primary + replicas)
        List<Shard> availableShards = findAvailableShards(shardId);
        Shard selectedShard = selectShardForRead(availableShards);
        return selectedShard.get(request);
    }
}

3. Node Roles

ElasticSearch decouples functionality and tunes performance by splitting responsibilities across node roles.

Node Types in Detail

// Master node - cluster management
public class MasterNode extends Node {
    private ClusterState clusterState;
    private AllocationService allocationService;

    public void handleClusterStateChange() {
        processIndexOperations();   // 1. Handle index creation/deletion
        rebalanceShards();          // 2. Manage shard allocation
        updateNodeMembership();     // 3. Handle nodes joining/leaving
        broadcastClusterState();    // 4. Broadcast cluster state to all nodes
    }

    @Override
    public boolean canHandleSearchRequests() {
        return false;  // dedicated to cluster management; serves no search traffic
    }
}

// Data node - data storage
public class DataNode extends Node {
    private Map<ShardId, Shard> localShards;
    private LuceneService luceneService;

    public SearchResponse executeSearch(SearchRequest request) {
        List<ShardSearchResult> results = new ArrayList<>();
        for (ShardId shardId : request.getShardIds()) {
            if (localShards.containsKey(shardId)) {
                Shard shard = localShards.get(shardId);
                results.add(shard.search(request));
            }
        }
        return aggregateResults(results);
    }

    @Override
    public boolean canStorePrimaryShards() {
        return true;  // may host primary shards
    }
}

// Coordinating node - request coordination
public class CoordinateNode extends Node {
    private LoadBalancer loadBalancer;
    private ResultAggregator aggregator;

    public SearchResponse coordinateSearch(SearchRequest request) {
        // 1. Plan the shard routing for this query
        Map<Node, List<ShardId>> shardRouting = planShardRouting(request);

        // 2. Send sub-requests to the data nodes concurrently
        Map<Node, Future<SearchResponse>> futures = new HashMap<>();
        for (Map.Entry<Node, List<ShardId>> entry : shardRouting.entrySet()) {
            Node dataNode = entry.getKey();
            SearchRequest shardRequest = buildShardRequest(request, entry.getValue());
            futures.put(dataNode, sendSearchRequest(dataNode, shardRequest));
        }

        // 3. Collect and aggregate the responses
        List<SearchResponse> responses = collectResponses(futures);
        return aggregator.aggregate(responses, request);
    }

    @Override
    public boolean canStoreData() {
        return false;  // stores no data; dedicated to coordination
    }
}

Role Configuration

# elasticsearch.yml examples
# (These boolean flags are the legacy style; since ES 7.9 a single node.roles list is preferred.)

# Dedicated master node
node.master: true
node.data: false
node.ingest: false
node.ml: false

# Dedicated data node
node.master: false
node.data: true
node.ingest: false
node.ml: false

# Dedicated coordinating-only node (every role disabled)
node.master: false
node.data: false
node.ingest: false
node.ml: false

# Mixed-role node (small clusters)
node.master: true
node.data: true
node.ingest: true
node.ml: false

4. Decentralized Design

ElasticSearch uses a decentralized architecture with no dependency on external coordination components; since version 7, cluster coordination is handled by a built-in Raft-style protocol (earlier versions used Zen Discovery).

Raft-Style Consensus

public class RaftConsensus {
    private NodeRole currentRole = NodeRole.FOLLOWER;
    private int currentTerm = 0;
    private String votedFor = null;
    private List<LogEntry> log = new ArrayList<>();

    // Leader election
    public void startElection() {
        currentTerm++;
        currentRole = NodeRole.CANDIDATE;
        votedFor = this.nodeId;

        // Request votes from every node
        int votes = 1;  // vote for self
        for (Node node : clusterNodes) {
            VoteRequest request = new VoteRequest(currentTerm, nodeId, getLastLogIndex(), getLastLogTerm());
            VoteResponse response = node.requestVote(request);
            if (response.isVoteGranted()) {
                votes++;
            }
        }

        // A majority of votes makes this node the leader
        if (votes > clusterNodes.size() / 2) {
            becomeLeader();
        } else {
            becomeFollower();
        }
    }

    // Log replication
    public void replicateEntry(ClusterStateUpdate update) {
        if (currentRole != NodeRole.LEADER) {
            throw new IllegalStateException("Only leader can replicate entries");
        }

        LogEntry entry = new LogEntry(currentTerm, update);
        log.add(entry);

        // Replicate to all followers in parallel
        int replicationCount = 1;  // the leader itself
        for (Node follower : followers) {
            AppendEntriesRequest request = new AppendEntriesRequest(
                currentTerm, nodeId, getLastLogIndex() - 1, getLastLogTerm(),
                Arrays.asList(entry), commitIndex);
            AppendEntriesResponse response = follower.appendEntries(request);
            if (response.isSuccess()) {
                replicationCount++;
            }
        }

        // Commit once a majority has acknowledged
        if (replicationCount > clusterNodes.size() / 2) {
            commitIndex = log.size() - 1;
            applyToStateMachine(update);
        }
    }
}

Cluster Discovery

public class ClusterDiscovery {
    private List<String> seedNodes;
    private Map<String, NodeInfo> discoveredNodes;

    // Node discovery: breadth-first from the seed nodes
    public void discoverNodes() {
        Queue<String> toDiscover = new LinkedList<>(seedNodes);
        Set<String> discovered = new HashSet<>();

        while (!toDiscover.isEmpty()) {
            String nodeAddress = toDiscover.poll();
            if (discovered.contains(nodeAddress)) {
                continue;
            }
            try {
                // Connect and ask the node for its known cluster members
                NodeInfo nodeInfo = connectAndGetInfo(nodeAddress);
                discoveredNodes.put(nodeAddress, nodeInfo);
                discovered.add(nodeAddress);

                // Queue up newly discovered nodes
                for (String knownNode : nodeInfo.getKnownNodes()) {
                    if (!discovered.contains(knownNode)) {
                        toDiscover.offer(knownNode);
                    }
                }
            } catch (Exception e) {
                log.warn("Failed to discover node: " + nodeAddress, e);
            }
        }

        // Update the cluster membership view
        updateClusterMembership(discoveredNodes.values());
    }

    // Failure detection via periodic heartbeats
    @Scheduled(fixedDelay = 1000)
    public void detectFailures() {
        for (Map.Entry<String, NodeInfo> entry : discoveredNodes.entrySet()) {
            String nodeAddress = entry.getKey();
            NodeInfo nodeInfo = entry.getValue();
            try {
                boolean isAlive = sendHeartbeat(nodeAddress);
                if (!isAlive) {
                    handleNodeFailure(nodeAddress, nodeInfo);
                }
            } catch (Exception e) {
                handleNodeFailure(nodeAddress, nodeInfo);
            }
        }
    }
}

🔄 Core Workflows

1. Indexing Flow

public class IndexingWorkflow {
    public IndexResponse index(IndexRequest request) {
        // 1. Route to the correct shard
        int shardId = calculateShardId(request.getId());
        Shard primaryShard = getPrimaryShard(shardId);

        // 2. Index on the primary shard
        Document doc = parseDocument(request.getSource());
        Map<String, List<String>> analyzedFields = analyzeDocument(doc);  // 2.1 analyze/tokenize
        updateInvertedIndex(analyzedFields, doc.getId());                 // 2.2 update the inverted index
        storeDocument(doc);                                               // 2.3 store the original document
        updateDocValues(doc);                                             // 2.4 update doc values

        // 3. Replicate to the replica shards
        List<Shard> replicas = getReplicaShards(shardId);
        replicateToReplicas(replicas, request);

        // 4. Respond
        return new IndexResponse(request.getId(), shardId, "created");
    }
}

2. Search Flow

public class SearchWorkflow {
    public SearchResponse search(SearchRequest request) {
        // Phase 1: Query Phase
        Map<ShardId, ShardSearchResult> queryResults = queryPhase(request);
        // Phase 2: Fetch Phase
        List<Document> documents = fetchPhase(queryResults, request);
        return buildSearchResponse(documents, queryResults);
    }

    private Map<ShardId, ShardSearchResult> queryPhase(SearchRequest request) {
        Map<ShardId, ShardSearchResult> results = new HashMap<>();

        // Query all target shards in parallel
        List<Future<ShardSearchResult>> futures = new ArrayList<>();
        for (ShardId shardId : getTargetShards(request)) {
            futures.add(executor.submit(() -> {
                Shard shard = getShard(shardId);
                // 1. Parse the query
                Query luceneQuery = parseQuery(request.getQuery());
                // 2. Search; return only DocIDs and scores
                TopDocs topDocs = shard.search(luceneQuery, request.getSize());
                // 3. Wrap the result
                return new ShardSearchResult(shardId, topDocs);
            }));
        }

        // Collect the per-shard results
        for (Future<ShardSearchResult> future : futures) {
            ShardSearchResult result = future.get();
            results.put(result.getShardId(), result);
        }
        return results;
    }

    private List<Document> fetchPhase(Map<ShardId, ShardSearchResult> queryResults,
                                      SearchRequest request) {
        // 1. Globally sort and keep the top-K
        List<ScoreDoc> globalTopDocs = mergeAndSort(queryResults.values(), request.getSize());

        // 2. Fetch full documents by DocID from Stored Fields
        List<Document> documents = new ArrayList<>();
        for (ScoreDoc scoreDoc : globalTopDocs) {
            ShardId shardId = getShardId(scoreDoc);
            Shard shard = getShard(shardId);
            documents.add(shard.getStoredDocument(scoreDoc.doc));
        }
        return documents;
    }
}

📊 Performance Characteristics

1. Query Performance

Time complexity analysis:

1. Term lookup: O(logN)
   - Term Index (FST): O(logT), where T is the number of distinct terms
   - Term Dictionary: O(1), direct offset access

2. Posting-list traversal: O(K)
   - K is the number of documents containing the term
   - Skip lists allow jumping over irrelevant documents

3. Multi-term merge: O(K1 + K2 + ... + Kn)
   - Two-pointer merge of sorted lists
   - Covers boolean AND/OR/NOT operations

4. Result sorting: O(R*logR)
   - R is the number of results returned
   - Usually R << total documents, so this stays cheap

Overall query complexity: O(logN + K + R*logR)
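
The two-pointer merge from step 3 can be sketched directly: posting lists are sorted doc-ID arrays, and the AND of two terms is their intersection (an illustration of the technique, not Lucene's conjunction scorer):

```java
import java.util.ArrayList;
import java.util.List;

public class PostingIntersection {
    // Intersect two sorted posting lists (boolean AND) with two pointers.
    // Runs in O(K1 + K2), where K1 and K2 are the list lengths.
    public static List<Integer> intersect(int[] a, int[] b) {
        List<Integer> result = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {          // doc contains both terms
                result.add(a[i]);
                i++;
                j++;
            } else if (a[i] < b[j]) {    // advance whichever pointer is behind
                i++;
            } else {
                j++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] search = {1, 3};           // "search" appears in docs 1 and 3
        int[] engine = {1, 3};           // "engine" appears in docs 1 and 3
        int[] lucene = {2};              // "lucene" appears only in doc 2
        System.out.println(intersect(search, engine)); // [1, 3]
        System.out.println(intersect(search, lucene)); // []
    }
}
```

Skip lists speed this up further by letting one pointer leap ahead instead of advancing a document at a time.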

2. Storage Efficiency

Storage footprint analysis:

1. Inverted index:
   - Term Dictionary: roughly 10-30% of the raw text
   - Posting lists: roughly 20-50% of the raw text
   - Term Index (in memory): roughly 1-5% of the Term Dictionary

2. Stored Fields:
   - Compression ratio: 50-80% (depends on data types and algorithm)
   - Random access pays a decompression cost

3. Doc Values:
   - Numeric types: 70-90% of raw size (bit packing)
   - String types: 60-85% of raw size (ordinals + dictionary)

4. Overall footprint:
   - Raw data: 100%
   - Index overhead: 50-100%
   - Total: 150-200% of the raw data

3. Memory Usage

public class MemoryUsageAnalysis {
    // Rough per-component memory estimates
    public MemoryUsage calculateMemoryUsage(IndexStats stats) {
        long termIndexMemory = estimateTermIndexMemory(stats);
        long filterCacheMemory = estimateFilterCacheMemory(stats);
        long fieldDataMemory = estimateFieldDataMemory(stats);
        long segmentMemory = estimateSegmentMemory(stats);
        return new MemoryUsage(termIndexMemory, filterCacheMemory, fieldDataMemory, segmentMemory);
    }

    private long estimateTermIndexMemory(IndexStats stats) {
        // The Term Index (FST) costs roughly 8-32 bytes per unique term,
        // depending on how well prefixes compress
        return stats.getUniqueTermCount() * 20;  // ~20 bytes/term on average
    }

    private long estimateFilterCacheMemory(IndexStats stats) {
        // Filter cache: one BitSet per cached filter, 1 bit per document
        return stats.getDocumentCount() / 8 * stats.getCachedFilterCount();
    }

    private long estimateFieldDataMemory(IndexStats stats) {
        // The portion of Doc Values loaded into memory:
        // numeric fields ~8 bytes/doc; string fields are variable-length
        long numericMemory = stats.getNumericFieldCount() * stats.getDocumentCount() * 8;
        long stringMemory = stats.getStringFieldSize();  // actual string bytes
        return numericMemory + stringMemory;
    }
}

🎯 Summary

Through careful data structure design and a distributed architecture, ElasticSearch evolved Lucene from a single-node search library into a distributed search engine:

Data structure strengths:

  • Inverted index: O(logN) lookups for fast keyword search
  • Term Index: FST prefix sharing cuts memory use and disk IO
  • Stored Fields: row-oriented storage for fast document retrieval
  • Doc Values: column-oriented storage for fast aggregation and sorting

Distributed architecture highlights:

  • Sharding: horizontal scaling and load balancing
  • Replicas: high availability and read/write separation
  • Node roles: Master, Data, and Coordinating responsibilities decoupled
  • Decentralization: Raft-style coordination, no external dependencies

Performance profile:

  • Query performance: sub-second full-text search responses
  • Storage efficiency: 50-100% index overhead, an acceptable space cost
  • Scalability: near-linear scaling, supporting PB-scale data

ElasticSearch successfully turns complex information-retrieval theory into a practical distributed search platform and plays a major role in the modern big-data ecosystem. Its design ideas and engineering choices are a valuable reference for distributed systems architecture.
