Class UserDictionary
java.lang.Object
org.apache.lucene.analysis.ja.dict.UserDictionary
All Implemented Interfaces:
Dictionary
Class for building a User Dictionary. This class allows for custom segmentation of phrases.
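To make "custom segmentation of phrases" concrete, the following is a minimal sketch of how one dictionary line might be turned into a phrase and its segments. It is not this class's implementation. The four-column CSV layout (surface, space-separated segmentation, space-separated readings, part of speech), the `#` comment convention, and the regular expressions are assumptions; this page lists only the field names `LINE_COMMENT` and `SPACES`. `UserDictEntrySketch`, `Entry`, and `parseLine` are hypothetical names, and the sample entry is romanized so the file compiles under any source encoding.

```java
import java.util.regex.Pattern;

final class UserDictEntrySketch {
    // Assumed patterns; only the field names appear on this page.
    private static final Pattern LINE_COMMENT = Pattern.compile("^#.*$");
    private static final Pattern SPACES = Pattern.compile(" +");

    /** One parsed entry: a phrase plus the pieces it should be split into. */
    static final class Entry {
        final String surface;
        final String[] segmentation;
        final String[] readings;
        final String partOfSpeech;

        Entry(String surface, String[] segmentation, String[] readings, String partOfSpeech) {
            this.surface = surface;
            this.segmentation = segmentation;
            this.readings = readings;
            this.partOfSpeech = partOfSpeech;
        }
    }

    /** Parses one line; returns null for blank lines and comment lines. */
    static Entry parseLine(String line) {
        String stripped = LINE_COMMENT.matcher(line).replaceAll("").trim();
        if (stripped.isEmpty()) {
            return null;
        }
        // Naive split: this sketch does not handle quoted commas.
        String[] cols = stripped.split(",", -1);
        if (cols.length != 4) {
            throw new IllegalArgumentException("expected 4 columns, got " + cols.length);
        }
        String[] segmentation = SPACES.split(cols[1].trim());
        String[] readings = SPACES.split(cols[2].trim());
        if (segmentation.length != readings.length) {
            throw new IllegalArgumentException("segmentation and readings differ in length");
        }
        return new Entry(cols[0], segmentation, readings, cols[3]);
    }

    public static void main(String[] args) {
        Entry e = parseLine(
            "kansaikokusaikuukou,kansai kokusai kuukou,KANSAI KOKUSAI KUUKOU,custom-noun");
        // prints: kansaikokusaikuukou -> kansai | kokusai | kuukou
        System.out.println(e.surface + " -> " + String.join(" | ", e.segmentation));
    }
}
```

In this sketch the segmentation and reading columns must have the same number of items, so each segment can be paired with one reading.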
-
Nested Class Summary
Nested Classes -
Field Summary
Fields
Modifier and Type             Field                             Description
private static final int      CUSTOM_DICTIONARY_WORD_ID_OFFSET
private final String[]        data
private static final int[][]  EMPTY_RESULT
private final TokenInfoFST    fst
static final int              LEFT_ID
private static final Pattern  LINE_COMMENT
static final int              RIGHT_ID
private final int[][]         segmentations
private static final Pattern  SPACES
private static final Pattern  WHITESPACE
static final int              WORD_COST
Fields inherited from interface org.apache.lucene.analysis.ja.dict.Dictionary
INTERNAL_SEPARATOR
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type       Method                                                          Description
private String[]        getAllFeaturesArray(int wordId)
String                  getBaseForm(int wordId, char[] surface, int off, int len)       Get base form of word
private String          getFeature(int wordId, int... fields)
TokenInfoFST            getFST()
String                  getInflectionForm(int wordId)                                   Get inflection form of tokens
String                  getInflectionType(int wordId)                                   Get inflection type of tokens
int                     getLeftId(int wordId)                                           Get left id of specified word
String                  getPartOfSpeech(int wordId)                                     Get Part-Of-Speech of tokens
String                  getPronunciation(int wordId, char[] surface, int off, int len)  Get pronunciation of tokens
String                  getReading(int wordId, char[] surface, int off, int len)        Get reading of tokens
int                     getRightId(int wordId)                                          Get right id of specified word
int                     getWordCost(int wordId)                                         Get word cost of specified word
int[][]                 lookup(char[] chars, int off, int len)                          Lookup words in text
int[]                   lookupSegmentation(int phraseID)
static UserDictionary   open
-
Field Details
-
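LINE_COMMENT
private static final Pattern LINE_COMMENT
-
WHITESPACE
private static final Pattern WHITESPACE
-
SPACES
private static final Pattern SPACES
-
fst
private final TokenInfoFST fst
-
segmentations
private final int[][] segmentations
-
data
private final String[] data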
-
CUSTOM_DICTIONARY_WORD_ID_OFFSET
private static final int CUSTOM_DICTIONARY_WORD_ID_OFFSET
-
WORD_COST
public static final int WORD_COST
-
LEFT_ID
public static final int LEFT_ID
-
RIGHT_ID
public static final int RIGHT_ID
-
EMPTY_RESULT
private static final int[][] EMPTY_RESULT
-
-
Constructor Details
-
UserDictionary
Throws:
IOException
-
-
Method Details
-
open
Throws:
IOException
-
lookup
public int[][] lookup(char[] chars, int off, int len) throws IOException
Lookup words in text
Parameters:
chars - text
off - offset into text
len - length of text
Returns:
array of {wordId, position, length}
Throws:
IOException
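The returned array holds one {wordId, position, length} triple per match. The sketch below shows one way a caller might map such triples back to substrings of the input. It runs on a hand-written result, not on output from a real `UserDictionary`. `LookupResultSketch` and `surfaces` are hypothetical names, the word IDs are placeholders, and treating `position` as relative to `off` is an assumption that this page does not confirm.

```java
import java.util.ArrayList;
import java.util.List;

final class LookupResultSketch {
    /**
     * Extracts the substrings described by {wordId, position, length} triples.
     * Assumes each position is counted from the offset passed to lookup.
     */
    static List<String> surfaces(char[] text, int off, int[][] matches) {
        List<String> out = new ArrayList<>();
        for (int[] match : matches) {
            int position = match[1]; // start of the match, relative to off (assumed)
            int length = match[2];   // number of chars in the match
            out.add(new String(text, off + position, length));
        }
        return out;
    }

    public static void main(String[] args) {
        char[] text = "kansaikokusaikuukou".toCharArray();
        // Hypothetical triples; the word IDs 0..2 are placeholders.
        int[][] matches = { {0, 0, 6}, {1, 6, 7}, {2, 13, 6} };
        // prints: [kansai, kokusai, kuukou]
        System.out.println(surfaces(text, 0, matches));
    }
}
```

The `wordId` in each triple (`match[0]`) is what the other accessors on this page, such as `getReading` or `getPartOfSpeech`, take as their first argument.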
-
getFST
public TokenInfoFST getFST()
-
lookupSegmentation
public int[] lookupSegmentation(int phraseID)
-
getLeftId
public int getLeftId(int wordId)
Description copied from interface: Dictionary
Get left id of specified word
Specified by:
getLeftId in interface Dictionary
Returns:
left id
-
getRightId
public int getRightId(int wordId)
Description copied from interface: Dictionary
Get right id of specified word
Specified by:
getRightId in interface Dictionary
Returns:
right id
-
getWordCost
public int getWordCost(int wordId)
Description copied from interface: Dictionary
Get word cost of specified word
Specified by:
getWordCost in interface Dictionary
Returns:
word's cost
-
getReading
public String getReading(int wordId, char[] surface, int off, int len)
Description copied from interface: Dictionary
Get reading of tokens
Specified by:
getReading in interface Dictionary
Parameters:
wordId - word ID of token
Returns:
Reading of the token
-
getPartOfSpeech
public String getPartOfSpeech(int wordId)
Description copied from interface: Dictionary
Get Part-Of-Speech of tokens
Specified by:
getPartOfSpeech in interface Dictionary
Parameters:
wordId - word ID of token
Returns:
Part-Of-Speech of the token
-
getBaseForm
public String getBaseForm(int wordId, char[] surface, int off, int len)
Description copied from interface: Dictionary
Get base form of word
Specified by:
getBaseForm in interface Dictionary
Parameters:
wordId - word ID of token
Returns:
Base form (only different for inflected words, otherwise null)
-
getPronunciation
public String getPronunciation(int wordId, char[] surface, int off, int len)
Description copied from interface: Dictionary
Get pronunciation of tokens
Specified by:
getPronunciation in interface Dictionary
Parameters:
wordId - word ID of token
Returns:
Pronunciation of the token
-
getInflectionType
public String getInflectionType(int wordId)
Description copied from interface: Dictionary
Get inflection type of tokens
Specified by:
getInflectionType in interface Dictionary
Parameters:
wordId - word ID of token
Returns:
inflection type, or null
-
getInflectionForm
public String getInflectionForm(int wordId)
Description copied from interface: Dictionary
Get inflection form of tokens
Specified by:
getInflectionForm in interface Dictionary
Parameters:
wordId - word ID of token
Returns:
inflection form, or null
-
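getAllFeaturesArray
private String[] getAllFeaturesArray(int wordId)
-
getFeature
private String getFeature(int wordId, int... fields)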
-