LogByteSizeMergePolicy, except this merge policy is able to merge non-adjacent segments, and separates how many segments are merged at once (setMaxMergeAtOnce(int)) from how many segments are allowed per tier (setSegmentsPerTier(double)).
This merge policy also does not over-merge (i.e. cascade merges).
For normal merging, this policy first computes a "budget" of how many segments are allowed to be in the index. If the index is over-budget, then the policy sorts segments by decreasing size (pro-rating by percent deletes), and then finds the least-cost merge. Merge cost is measured by a combination of the "skew" of the merge (size of largest segment divided by smallest segment), total merge size and percent deletes reclaimed, so that merges with lower skew, smaller size and those reclaiming more deletes, are favored.
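The skew term can be illustrated with a small sketch. It computes skew only as this description defines it (size of the largest segment divided by the smallest). The class and method names are hypothetical and not part of Lucene, and how the policy combines skew with total merge size and reclaimed deletes is not shown.

```java
import java.util.List;

// Hypothetical helper, not part of Lucene: computes merge skew as described
// above (largest segment size divided by smallest segment size).
// Assumes a non-empty list of positive sizes.
class MergeSkewSketch {
  static double skew(List<Long> segmentBytes) {
    long largest = Long.MIN_VALUE;
    long smallest = Long.MAX_VALUE;
    for (long bytes : segmentBytes) {
      largest = Math.max(largest, bytes);
      smallest = Math.min(smallest, bytes);
    }
    return (double) largest / smallest;
  }

  public static void main(String[] args) {
    // Equally sized segments: skew 1.0, the most balanced case.
    System.out.println(skew(List.of(100L, 100L, 100L)));
    // One large segment merged with small ones: higher skew, less favored.
    System.out.println(skew(List.of(1000L, 10L, 10L)));
  }
}
```

Under this definition a merge of similarly sized segments scores close to 1, which matches the statement that lower-skew merges are favored.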
If a merge will produce a segment that's larger than setMaxMergedSegmentMB(double), then the policy will merge fewer segments (down to 1 at once, if that one has deletions) to keep the segment size under budget.
NOTE: This policy freely merges non-adjacent segments; if this is a problem, use LogMergePolicy.
NOTE: This policy always merges by byte size of the segments and always pro-rates by percent deletes.
NOTE: Starting with Lucene 7.5, if you call IndexWriter.forceMerge(int) with this (default) merge policy and setMaxMergedSegmentMB(double) is in conflict with the maxNumSegments passed to IndexWriter.forceMerge(int), then maxNumSegments wins. For example, if your index has 50 segments of 1 GB each, you have setMaxMergedSegmentMB(double) at 1024 (1 GB), and you call forceMerge(10), the two settings are clearly in conflict. TieredMergePolicy will choose to break the setMaxMergedSegmentMB(double) constraint and try to merge down to at most ten segments, each up to 5 * 1.25 GB in size (since an extra 25% buffer increase in the expected segment size is targeted).
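The numbers in this example can be reproduced with a short arithmetic sketch. It is not Lucene code; the class and method names are hypothetical, and it assumes only what the note states: the expected segment size is the total index size divided by maxNumSegments, plus a 25% buffer.

```java
// Hypothetical arithmetic sketch, not Lucene code: reproduces the numbers in
// the forceMerge example above.
class ForceMergeSizeSketch {
  // Expected size of each segment if the index is merged down evenly.
  static double expectedSegmentGb(double totalIndexGb, int maxNumSegments) {
    return totalIndexGb / maxNumSegments;
  }

  // Upper bound per segment once the extra 25% buffer is applied.
  static double bufferedSegmentGb(double totalIndexGb, int maxNumSegments) {
    return expectedSegmentGb(totalIndexGb, maxNumSegments) * 1.25;
  }

  public static void main(String[] args) {
    double totalGb = 50 * 1.0; // 50 segments of 1 GB each
    System.out.println(expectedSegmentGb(totalGb, 10)); // prints 5.0
    System.out.println(bufferedSegmentGb(totalGb, 10)); // prints 6.25
  }
}
```

Each resulting segment may therefore reach 6.25 GB, well above the 1 GB configured through setMaxMergedSegmentMB(double).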
findForcedDeletesMerges should never produce segments greater than maxSegmentSize.
NOTE: This policy returns natural merges whose size is below the floor segment size for full-flush merges.
-
Nested Class Summary
Nested Classes
private static enum TieredMergePolicy.MERGE_TYPE
protected static class TieredMergePolicy.MergeScore: Holds score and explanation for a single candidate merge.
private static class TieredMergePolicy.SegmentSizeAndDocs
Nested classes/interfaces inherited from class org.apache.lucene.index.MergePolicy
MergePolicy.MergeAbortedException, MergePolicy.MergeContext, MergePolicy.MergeException, MergePolicy.MergeReader, MergePolicy.MergeSpecification, MergePolicy.OneMerge, MergePolicy.OneMergeProgress
-
Field Summary
Fields
static final double DEFAULT_NO_CFS_RATIO: Default noCFSRatio.
private double deletesPctAllowed
private long floorSegmentBytes
private double forceMergeDeletesPctAllowed
private int maxMergeAtOnce
private long maxMergedSegmentBytes
private double segsPerTier
private int targetSearchConcurrency
Fields inherited from class org.apache.lucene.index.MergePolicy
DEFAULT_MAX_CFS_SEGMENT_SIZE, maxCFSSegmentSize, noCFSRatio
-
Constructor Summary
Constructors
TieredMergePolicy(): Sole constructor, setting all settings to their defaults.
-
Method Summary
private MergePolicy.MergeSpecification doFindMerges(List<TieredMergePolicy.SegmentSizeAndDocs> sortedEligibleInfos, long maxMergedSegmentBytes, int mergeFactor, int allowedSegCount, int allowedDelCount, int allowedDocCount, TieredMergePolicy.MERGE_TYPE mergeType, MergePolicy.MergeContext mergeContext, boolean maxMergeIsRunning)
MergePolicy.MergeSpecification findForcedDeletesMerges(SegmentInfos infos, MergePolicy.MergeContext mergeContext): Determine what set of merge operations is necessary in order to expunge all deletes from the index.
MergePolicy.MergeSpecification findForcedMerges(SegmentInfos infos, int maxSegmentCount, Map<SegmentCommitInfo, Boolean> segmentsToMerge, MergePolicy.MergeContext mergeContext): Determine what set of merge operations is necessary in order to merge to <= the specified segment count.
MergePolicy.MergeSpecification findMerges(MergeTrigger mergeTrigger, SegmentInfos infos, MergePolicy.MergeContext mergeContext): Determine what set of merge operations are now necessary on the index.
private long floorSize(long bytes)
double getDeletesPctAllowed(): Returns the current deletesPctAllowed setting.
double getFloorSegmentMB(): Returns the current floorSegmentMB.
double getForceMergeDeletesPctAllowed(): Returns the current forceMergeDeletesPctAllowed setting.
(package private) int getMaxAllowedDocs(int totalMaxDoc, int totalDelDocs)
int getMaxMergeAtOnce(): Returns the current maxMergeAtOnce setting.
double getMaxMergedSegmentMB(): Returns the current maxMergedSegmentMB setting.
double getSegmentsPerTier(): Returns the current segmentsPerTier setting.
private List<TieredMergePolicy.SegmentSizeAndDocs> getSortedBySegmentSize(SegmentInfos infos, MergePolicy.MergeContext mergeContext)
int getTargetSearchConcurrency(): Returns the target search concurrency.
protected long maxFullFlushMergeSize(): Return the maximum size of segments to be included in full-flush merges by the default implementation of MergePolicy.findFullFlushMerges(org.apache.lucene.index.MergeTrigger, org.apache.lucene.index.SegmentInfos, org.apache.lucene.index.MergePolicy.MergeContext).
protected TieredMergePolicy.MergeScore score(List<SegmentCommitInfo> candidate, boolean hitTooLarge, Map<SegmentCommitInfo, TieredMergePolicy.SegmentSizeAndDocs> segmentsSizes): Expert: scores one merge; subclasses can override.
TieredMergePolicy setDeletesPctAllowed(double v): Controls the maximum percentage of deleted documents that is tolerated in the index.
TieredMergePolicy setFloorSegmentMB(double v): Segments smaller than this size are merged more aggressively: They are candidates for full-flush merges, in order to reduce the number of segments in the index prior to opening a new point-in-time view of the index.
TieredMergePolicy setForceMergeDeletesPctAllowed(double v): When forceMergeDeletes is called, we only merge away a segment if its delete percentage is over this threshold.
TieredMergePolicy setMaxMergeAtOnce(int v): Maximum number of segments to be merged at a time during "normal" merging.
TieredMergePolicy setMaxMergedSegmentMB(double v): Maximum sized segment to produce during normal merging.
TieredMergePolicy setSegmentsPerTier(double v): Sets the allowed number of segments per tier.
TieredMergePolicy setTargetSearchConcurrency(int targetSearchConcurrency): Sets the target search concurrency.
String toString()
Methods inherited from class org.apache.lucene.index.MergePolicy
assertDelCount, findFullFlushMerges, findMerges, getMaxCFSSegmentSizeMB, getNoCFSRatio, isMerged, keepFullyDeletedSegment, message, numDeletesToMerge, segString, setMaxCFSSegmentSizeMB, setNoCFSRatio, size, useCompoundFile, verbose
-
Field Details
-
DEFAULT_NO_CFS_RATIO
public static final double DEFAULT_NO_CFS_RATIO
Default noCFSRatio. If a merge's size is >= 10% of the index, then we disable compound file for it.
-
maxMergeAtOnce
private int maxMergeAtOnce
-
maxMergedSegmentBytes
private long maxMergedSegmentBytes
-
floorSegmentBytes
private long floorSegmentBytes
-
segsPerTier
private double segsPerTier
-
forceMergeDeletesPctAllowed
private double forceMergeDeletesPctAllowed
-
deletesPctAllowed
private double deletesPctAllowed
-
targetSearchConcurrency
private int targetSearchConcurrency
-
-
Constructor Details
-
TieredMergePolicy
public TieredMergePolicy()
Sole constructor, setting all settings to their defaults.
-
-
Method Details
-
setMaxMergeAtOnce
public TieredMergePolicy setMaxMergeAtOnce(int v)
Maximum number of segments to be merged at a time during "normal" merging. Default is 10.
-
getMaxMergeAtOnce
public int getMaxMergeAtOnce()
Returns the current maxMergeAtOnce setting.
-
setMaxMergedSegmentMB
public TieredMergePolicy setMaxMergedSegmentMB(double v)
Maximum sized segment to produce during normal merging. This setting is approximate: the estimate of the merged segment size is made by summing sizes of to-be-merged segments (compensating for percent deleted docs). Default is 5 GB.
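A minimal sketch of that estimate, assuming "compensating for percent deleted docs" means scaling each segment's byte size by its fraction of live documents. The class, method, and parameter names are hypothetical and this is not the library's implementation.

```java
// Hypothetical sketch, not Lucene code: estimates a merged segment's size by
// summing the input sizes, each pro-rated by its fraction of live documents.
class MergedSizeEstimateSketch {
  // Scales a segment's byte size by the share of documents that are not deleted.
  static long proRatedBytes(long sizeInBytes, int maxDoc, int delCount) {
    if (maxDoc <= 0) {
      return sizeInBytes; // nothing to pro-rate for an empty segment
    }
    double liveRatio = 1.0 - (double) delCount / maxDoc;
    return (long) (sizeInBytes * liveRatio);
  }

  // Sums the pro-rated sizes of all segments in a candidate merge.
  static long estimateMergedBytes(long[] sizes, int[] maxDocs, int[] delCounts) {
    long total = 0;
    for (int i = 0; i < sizes.length; i++) {
      total += proRatedBytes(sizes[i], maxDocs[i], delCounts[i]);
    }
    return total;
  }

  public static void main(String[] args) {
    long[] sizes = {1000L, 2000L};
    int[] maxDocs = {100, 100};
    int[] delCounts = {25, 50}; // 25% and 50% deleted
    System.out.println(estimateMergedBytes(sizes, maxDocs, delCounts)); // prints 1750
  }
}
```

Because the estimate ignores how well the merged data compresses, the resulting segment can end up somewhat above or below the configured maximum, which is why the setting is described as approximate.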
-
getMaxMergedSegmentMB
public double getMaxMergedSegmentMB()
Returns the current maxMergedSegmentMB setting.
-
setDeletesPctAllowed
public TieredMergePolicy setDeletesPctAllowed(double v)
Controls the maximum percentage of deleted documents that is tolerated in the index. Lower values make the index more space efficient at the expense of increased CPU and I/O activity. Values must be between 5 and 50. Default value is 20.
When the maximum delete percentage is lowered, the indexing thread will call for merges more often, meaning that the write amplification factor will be increased. The write amplification factor measures the number of times each document in the index is written. A higher write amplification factor will lead to higher CPU and I/O activity, as indicated above.
-
getDeletesPctAllowed
public double getDeletesPctAllowed()
Returns the current deletesPctAllowed setting.
-
setFloorSegmentMB
public TieredMergePolicy setFloorSegmentMB(double v)
Segments smaller than this size are merged more aggressively:
- They are candidates for full-flush merges, in order to reduce the number of segments in the index prior to opening a new point-in-time view of the index.
- For background merges, smaller segments are "rounded up" to this size.
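One plausible reading of "rounded up" is that a segment smaller than the floor is treated as if it were exactly the floor size when merges are selected. The sketch below assumes that reading; the class name and the 2 MB floor value are illustrative only, not taken from this page.

```java
// Hypothetical sketch, not Lucene code: treats any segment smaller than the
// floor as if it were exactly floor-sized.
class FloorSizeSketch {
  // Converts a floor given in MB to bytes.
  static long mbToBytes(double mb) {
    return (long) (mb * 1024 * 1024);
  }

  // Returns the size used for merge selection: never below the floor.
  static long floorSize(long bytes, long floorSegmentBytes) {
    return Math.max(floorSegmentBytes, bytes);
  }

  public static void main(String[] args) {
    long floor = mbToBytes(2.0); // illustrative floor of 2 MB
    System.out.println(floorSize(mbToBytes(0.5), floor)); // small segment: counted as 2 MB
    System.out.println(floorSize(mbToBytes(10.0), floor)); // large segment: unchanged
  }
}
```

Under this reading, many tiny segments all look equally sized to the policy, so they can be merged together cheaply instead of forming a long tail of tiers.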
-
getFloorSegmentMB
public double getFloorSegmentMB()
Returns the current floorSegmentMB.
-
maxFullFlushMergeSize
protected long maxFullFlushMergeSize()
Description copied from class: MergePolicy
Return the maximum size of segments to be included in full-flush merges by the default implementation of MergePolicy.findFullFlushMerges(org.apache.lucene.index.MergeTrigger, org.apache.lucene.index.SegmentInfos, org.apache.lucene.index.MergePolicy.MergeContext).
Overrides: maxFullFlushMergeSize in class MergePolicy
-
setForceMergeDeletesPctAllowed
public TieredMergePolicy setForceMergeDeletesPctAllowed(double v)
When forceMergeDeletes is called, we only merge away a segment if its delete percentage is over this threshold. Default is 10%.
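The threshold check can be sketched as follows, assuming the delete percentage is the number of deleted documents divided by the segment's total document count and that "over" means strictly greater. The class and method names are hypothetical.

```java
// Hypothetical sketch, not Lucene code: decides whether a segment qualifies
// for forceMergeDeletes based on its percentage of deleted documents.
class ForceMergeDeletesSketch {
  static boolean isEligible(int maxDoc, int delCount, double forceMergeDeletesPctAllowed) {
    double pctDeletes = 100.0 * delCount / maxDoc;
    // "Over this threshold" is read here as strictly greater than.
    return pctDeletes > forceMergeDeletesPctAllowed;
  }

  public static void main(String[] args) {
    double threshold = 10.0; // the documented default of 10%
    System.out.println(isEligible(1000, 150, threshold)); // 15% deleted: true
    System.out.println(isEligible(1000, 50, threshold));  // 5% deleted: false
  }
}
```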
-
getForceMergeDeletesPctAllowed
public double getForceMergeDeletesPctAllowed()
Returns the current forceMergeDeletesPctAllowed setting.
-
setSegmentsPerTier
public TieredMergePolicy setSegmentsPerTier(double v)
Sets the allowed number of segments per tier. Smaller values mean more merging but fewer segments.
Default is 10.0.
-
getSegmentsPerTier
public double getSegmentsPerTier()
Returns the current segmentsPerTier setting.
-
setTargetSearchConcurrency
public TieredMergePolicy setTargetSearchConcurrency(int targetSearchConcurrency)
Sets the target search concurrency. This prevents creating segments that are bigger than maxDoc/targetSearchConcurrency, which in turn makes the work parallelizable into targetSearchConcurrency slices of similar doc counts. It also makes merging less aggressive, as higher values result in indices that do less merging and have more segments.
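The per-segment cap can be sketched as a simple division. The class and method names are hypothetical, and rounding up (ceiling division) is an assumption of this sketch rather than something stated here.

```java
// Hypothetical sketch, not Lucene code: computes the largest document count a
// single segment may hold for a given target search concurrency.
class SearchConcurrencySketch {
  static int maxDocsPerSegment(int maxDoc, int targetSearchConcurrency) {
    // Ceiling division in long arithmetic to avoid int overflow;
    // the rounding mode is an assumption of this sketch.
    long rounded = ((long) maxDoc + targetSearchConcurrency - 1) / targetSearchConcurrency;
    return (int) rounded;
  }

  public static void main(String[] args) {
    // With 1,000,000 documents and a target of 4 slices, no segment should
    // hold more than 250,000 documents.
    System.out.println(maxDocsPerSegment(1_000_000, 4));
    // A target of 1 places no extra restriction on segment size.
    System.out.println(maxDocsPerSegment(1_000_000, 1));
  }
}
```

A higher target lowers the cap, so large merges are refused more often and the index keeps more, smaller segments.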
-
getTargetSearchConcurrency
public int getTargetSearchConcurrency()
Returns the target search concurrency.
-
getSortedBySegmentSize
private List<TieredMergePolicy.SegmentSizeAndDocs> getSortedBySegmentSize(SegmentInfos infos, MergePolicy.MergeContext mergeContext) throws IOException
Throws: IOException
-
findMerges
public MergePolicy.MergeSpecification findMerges(MergeTrigger mergeTrigger, SegmentInfos infos, MergePolicy.MergeContext mergeContext) throws IOException
Description copied from class: MergePolicy
Determine what set of merge operations are now necessary on the index. IndexWriter calls this whenever there is a change to the segments. This call is always synchronized on the IndexWriter instance so only one thread at a time will call this method.
Specified by: findMerges in class MergePolicy
Parameters:
mergeTrigger - the event that triggered the merge
infos - the total set of segments in the index
mergeContext - the IndexWriter to find the merges on
Throws: IOException
-
doFindMerges
private MergePolicy.MergeSpecification doFindMerges(List<TieredMergePolicy.SegmentSizeAndDocs> sortedEligibleInfos, long maxMergedSegmentBytes, int mergeFactor, int allowedSegCount, int allowedDelCount, int allowedDocCount, TieredMergePolicy.MERGE_TYPE mergeType, MergePolicy.MergeContext mergeContext, boolean maxMergeIsRunning) throws IOException
Throws: IOException
-
score
protected TieredMergePolicy.MergeScore score(List<SegmentCommitInfo> candidate, boolean hitTooLarge, Map<SegmentCommitInfo, TieredMergePolicy.SegmentSizeAndDocs> segmentsSizes) throws IOException
Expert: scores one merge; subclasses can override.
Throws: IOException
-
findForcedMerges
public MergePolicy.MergeSpecification findForcedMerges(SegmentInfos infos, int maxSegmentCount, Map<SegmentCommitInfo, Boolean> segmentsToMerge, MergePolicy.MergeContext mergeContext) throws IOException
Description copied from class: MergePolicy
Determine what set of merge operations is necessary in order to merge to <= the specified segment count. IndexWriter calls this when its IndexWriter.forceMerge(int) method is called. This call is always synchronized on the IndexWriter instance so only one thread at a time will call this method.
Specified by: findForcedMerges in class MergePolicy
Parameters:
infos - the total set of segments in the index
maxSegmentCount - requested maximum number of segments in the index
segmentsToMerge - contains the specific SegmentInfo instances that must be merged away. This may be a subset of all SegmentInfos. If the value is True for a given SegmentInfo, that means this segment was an original segment present in the to-be-merged index; else, it was a segment produced by a cascaded merge.
mergeContext - the MergeContext to find the merges on
Throws: IOException
-
findForcedDeletesMerges
public MergePolicy.MergeSpecification findForcedDeletesMerges(SegmentInfos infos, MergePolicy.MergeContext mergeContext) throws IOException
Description copied from class: MergePolicy
Determine what set of merge operations is necessary in order to expunge all deletes from the index.
Specified by: findForcedDeletesMerges in class MergePolicy
Parameters:
infos - the total set of segments in the index
mergeContext - the MergeContext to find the merges on
Throws: IOException
-
getMaxAllowedDocs
int getMaxAllowedDocs(int totalMaxDoc, int totalDelDocs)
-
floorSize
private long floorSize(long bytes)
-
toString
public String toString()
-