public class HyphenationCompoundWordTokenFilter extends CompoundWordTokenFilterBase
Fields inherited from class CompoundWordTokenFilterBase: DEFAULT_MAX_SUBWORD_SIZE, DEFAULT_MIN_SUBWORD_SIZE, DEFAULT_MIN_WORD_SIZE, dictionary, maxSubwordSize, minSubwordSize, minWordSize, onlyLongestMatch, tokens
Fields inherited from class TokenFilter: input

| Constructor and Description |
|---|
| HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, Set dictionary) |
| HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, Set dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch) |
| HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, String[] dictionary) |
| HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch) |
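
The size-related constructor arguments (minWordSize, minSubwordSize, maxSubwordSize, onlyLongestMatch) are easier to follow with a small experiment. The sketch below is not the Lucene implementation: it ignores the hyphenation tree entirely and simply tries every substring against a dictionary. The class name `SubwordBoundsSketch` and its `decompose` method are hypothetical, and the bounds are read literally from the parameter descriptions as strict comparisons ("longer than", "shorter than"); the actual filter may treat the boundaries inclusively.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative stand-in only: brute-force dictionary matching, no hyphenation tree.
class SubwordBoundsSketch {

    static List<String> decompose(String word, Set<String> dictionary,
                                  int minWordSize, int minSubwordSize,
                                  int maxSubwordSize, boolean onlyLongestMatch) {
        List<String> subwords = new ArrayList<>();
        // Assumption: "only words longer than this get processed", read strictly.
        if (word.length() <= minWordSize) {
            return subwords;
        }
        String lower = word.toLowerCase(Locale.ROOT);
        for (int start = 0; start < lower.length(); start++) {
            String longest = null;
            for (int end = start + 1; end <= lower.length(); end++) {
                int len = end - start;
                // Assumption: subword bounds are also read as strict comparisons.
                if (len <= minSubwordSize || len >= maxSubwordSize) {
                    continue;
                }
                String candidate = lower.substring(start, end);
                if (!dictionary.contains(candidate)) {
                    continue;
                }
                if (onlyLongestMatch) {
                    longest = candidate; // later matches at this start are longer
                } else {
                    subwords.add(candidate);
                }
            }
            if (longest != null) {
                subwords.add(longest);
            }
        }
        return subwords;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(
                Arrays.asList("donau", "dampf", "schiff", "dampfschiff"));
        // All matches: [donau, dampf, dampfschiff, schiff]
        System.out.println(decompose("Donaudampfschiff", dict, 5, 2, 15, false));
        // Longest match per start position: [donau, dampfschiff, schiff]
        System.out.println(decompose("Donaudampfschiff", dict, 5, 2, 15, true));
    }
}
```

Lowering maxSubwordSize to 6 in this sketch drops "schiff" and "dampfschiff", which is the kind of effect the bounds are meant to have on the output stream.
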
| Modifier and Type | Method and Description |
|---|---|
| protected void | decomposeInternal(Token token) |
| static HyphenationTree | getHyphenationTree(File hyphenationFile): Create a hyphenator tree |
| static HyphenationTree | getHyphenationTree(Reader hyphenationReader): Create a hyphenator tree |
| static HyphenationTree | getHyphenationTree(String hyphenationFilename): Create a hyphenator tree |
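
The Set-based constructors expect a dictionary that contains only lower-case strings when a CharArraySet is passed. A minimal sketch of preparing such a dictionary up front is shown here, using a plain `java.util.HashSet` rather than Lucene's CharArraySet. The class `LowerCaseDictionarySketch` and the method `toLowerCaseSet` are hypothetical names for illustration; whether the filter normalizes a plain Set on its own is not stated in this page, so lower-casing beforehand is simply the conservative choice.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: builds a lower-case-only dictionary from raw entries.
class LowerCaseDictionarySketch {

    static Set<String> toLowerCaseSet(String[] words) {
        Set<String> result = new HashSet<>();
        for (String word : words) {
            // Locale.ROOT avoids locale-specific surprises (e.g. Turkish dotless i).
            result.add(word.toLowerCase(Locale.ROOT));
        }
        return Collections.unmodifiableSet(result);
    }

    public static void main(String[] args) {
        Set<String> dict = toLowerCaseSet(new String[] {"Donau", "DAMPF", "schiff"});
        System.out.println(dict.contains("donau")); // true
        System.out.println(dict.contains("Donau")); // false
    }
}
```
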
Methods inherited from class CompoundWordTokenFilterBase: addAllLowerCase, createToken, decompose, makeDictionary, makeLowerCaseCopy, next
Methods inherited from class TokenFilter: close, reset
Methods inherited from class TokenStream: next

public HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)
Parameters:
input - the token stream to process
hyphenator - the hyphenation pattern tree to use for hyphenation
dictionary - the word dictionary to match against
minWordSize - only words longer than this get processed
minSubwordSize - only subwords longer than this get to the output stream
maxSubwordSize - only subwords shorter than this get to the output stream
onlyLongestMatch - Add only the longest matching subword to the stream

public HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, String[] dictionary)
Parameters:
input - the token stream to process
hyphenator - the hyphenation pattern tree to use for hyphenation
dictionary - the word dictionary to match against

public HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, Set dictionary)
Parameters:
input - the token stream to process
hyphenator - the hyphenation pattern tree to use for hyphenation
dictionary - the word dictionary to match against. If this is a CharArraySet, it must have been created with ignoreCase=false and contain only lower-case strings.

public HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, Set dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)
Parameters:
input - the token stream to process
hyphenator - the hyphenation pattern tree to use for hyphenation
dictionary - the word dictionary to match against. If this is a CharArraySet, it must have been created with ignoreCase=false and contain only lower-case strings.
minWordSize - only words longer than this get processed
minSubwordSize - only subwords longer than this get to the output stream
maxSubwordSize - only subwords shorter than this get to the output stream
onlyLongestMatch - Add only the longest matching subword to the stream

public static HyphenationTree getHyphenationTree(String hyphenationFilename) throws Exception
Parameters:
hyphenationFilename - the filename of the XML grammar to load
Throws: Exception

public static HyphenationTree getHyphenationTree(File hyphenationFile) throws Exception
Parameters:
hyphenationFile - the file of the XML grammar to load
Throws: Exception

public static HyphenationTree getHyphenationTree(Reader hyphenationReader) throws Exception
Parameters:
hyphenationReader - the reader of the XML grammar to load from
Throws: Exception

protected void decomposeInternal(Token token)
decomposeInternal in class CompoundWordTokenFilterBase

Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.