public class StandardAnalyzer extends Analyzer

Filters StandardTokenizer with StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words.

| Modifier and Type | Field and Description |
|---|---|
| static int | DEFAULT_MAX_TOKEN_LENGTH: Default maximum allowed token length. |
| static java.lang.String[] | STOP_WORDS: An array containing some common English words that are usually not useful for searching. |

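
The chain named in the class description (a tokenizer, followed by lower-casing and stop-word removal) can be pictured with a small plain-Java sketch. This is a conceptual stand-in only: it does not call Lucene, the regex-based splitting is far cruder than StandardTokenizer, the StandardFilter step is left out, and `AnalysisChainSketch`, `analyze` and `EXAMPLE_STOP_WORDS` are hypothetical names whose contents are assumptions rather than the real STOP_WORDS array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Conceptual stand-in for the analysis chain; not Lucene code.
final class AnalysisChainSketch {

    // Hypothetical stop list for the example; the real STOP_WORDS array may differ.
    static final Set<String> EXAMPLE_STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "to"));

    // Splits on runs of characters that are neither letters nor digits (a crude tokenizer),
    // lower-cases each token, then drops tokens found in the stop set.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // split() can yield an empty leading element
            }
            String lower = raw.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(lower)) {
                tokens.add(lower);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [quick, brown, fox, lazy, dog]
        System.out.println(analyze("The Quick Brown Fox and the Lazy Dog", EXAMPLE_STOP_WORDS));
    }
}
```

Lower-casing happens before the stop-word check here, mirroring the order in which the filters are listed, so "The" and "the" are both removed.
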
| Constructor and Description |
|---|
| StandardAnalyzer(): Builds an analyzer with the default stop words (STOP_WORDS). |
| StandardAnalyzer(boolean replaceInvalidAcronym): Deprecated. Remove in 3.X and make true the only valid value. |
| StandardAnalyzer(java.io.File stopwords): Builds an analyzer with the stop words from the given file. |
| StandardAnalyzer(java.io.File stopwords, boolean replaceInvalidAcronym): Deprecated. Remove in 3.X and make true the only valid value. |
| StandardAnalyzer(java.io.Reader stopwords): Builds an analyzer with the stop words from the given reader. |
| StandardAnalyzer(java.io.Reader stopwords, boolean replaceInvalidAcronym): Deprecated. Remove in 3.X and make true the only valid value. |
| StandardAnalyzer(java.util.Set stopWords): Builds an analyzer with the given stop words. |
| StandardAnalyzer(java.util.Set stopwords, boolean replaceInvalidAcronym): Deprecated. Remove in 3.X and make true the only valid value. |
| StandardAnalyzer(java.lang.String[] stopWords): Builds an analyzer with the given stop words. |
| StandardAnalyzer(java.lang.String[] stopwords, boolean replaceInvalidAcronym): Deprecated. Remove in 3.X and make true the only valid value. |

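
The File and Reader constructors take their stop words from an external word list. As a rough illustration of what such a list might look like when loaded, the sketch below reads a set of words from a `Reader` in plain Java. It assumes a one-word-per-line format and skips blank lines; both are assumptions of this example, and `StopWordListSketch` and `readWordSet` are hypothetical names, not the library's loader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

// Illustration only: a stand-in for loading a stop-word list, not Lucene's WordlistLoader.
final class StopWordListSketch {

    // Reads one word per line (an assumed format), trimming surrounding whitespace
    // and skipping blank lines.
    static Set<String> readWordSet(Reader reader) throws IOException {
        Set<String> words = new HashSet<>();
        try (BufferedReader in = new BufferedReader(reader)) {
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        Set<String> stopWords = readWordSet(new StringReader("the\nand\n  of  \n\n"));
        System.out.println(stopWords.size()); // prints 3
    }
}
```

A set built this way has the same shape as the argument expected by the constructor that accepts a java.util.Set of stop words.
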
| Modifier and Type | Method and Description |
|---|---|
| static boolean | getDefaultReplaceInvalidAcronym(): Deprecated. This will be removed (hardwired to true) in 3.0. |
| int | getMaxTokenLength() |
| boolean | isReplaceInvalidAcronym(): Deprecated. This will be removed (hardwired to true) in 3.0. |
| TokenStream | reusableTokenStream(java.lang.String fieldName, java.io.Reader reader): Creates a TokenStream that is allowed to be re-used from the previous time that the same thread called this method. |
| static void | setDefaultReplaceInvalidAcronym(boolean replaceInvalidAcronym): Deprecated. This will be removed (hardwired to true) in 3.0. |
| void | setMaxTokenLength(int length): Set maximum allowed token length. |
| void | setReplaceInvalidAcronym(boolean replaceInvalidAcronym): Deprecated. This will be removed (hardwired to true) in 3.0. |
| TokenStream | tokenStream(java.lang.String fieldName, java.io.Reader reader) |

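
The description of reusableTokenStream speaks of re-using a stream "from the previous time that the same thread called this method". One common way to get that kind of per-thread reuse is a `ThreadLocal` cache. The following is a minimal sketch of the general pattern under that assumption; `PerThreadReuseSketch` and `reusable` are hypothetical names, and nothing here is taken from the library's own code.

```java
import java.util.function.Supplier;

// Illustration only: per-thread reuse of an expensive object, not Lucene's implementation.
final class PerThreadReuseSketch<T> {

    private final ThreadLocal<T> previous = new ThreadLocal<>();
    private final Supplier<T> factory;

    PerThreadReuseSketch(Supplier<T> factory) {
        this.factory = factory;
    }

    // Returns the instance this thread obtained last time, creating it on first use.
    T reusable() {
        T instance = previous.get();
        if (instance == null) {
            instance = factory.get();
            previous.set(instance);
        }
        return instance;
    }

    public static void main(String[] args) throws InterruptedException {
        PerThreadReuseSketch<StringBuilder> cache = new PerThreadReuseSketch<>(StringBuilder::new);
        StringBuilder first = cache.reusable();
        System.out.println(first == cache.reusable()); // true: same thread, same instance

        Thread other = new Thread(() ->
                System.out.println(first == cache.reusable())); // false: another thread gets its own
        other.start();
        other.join();
    }
}
```

In this sketch a second call on the same thread hands back the same object, so the result of the first call should no longer be in use by then.
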
Methods inherited from class Analyzer: close, getPositionIncrementGap, getPreviousTokenStream, setPreviousTokenStream

public static final java.lang.String[] STOP_WORDS

public static final int DEFAULT_MAX_TOKEN_LENGTH

public StandardAnalyzer()
Builds an analyzer with the default stop words (STOP_WORDS).

public StandardAnalyzer(java.util.Set stopWords)

public StandardAnalyzer(java.lang.String[] stopWords)

public StandardAnalyzer(java.io.File stopwords) throws java.io.IOException
Throws: java.io.IOException
See Also: WordlistLoader.getWordSet(File)

public StandardAnalyzer(java.io.Reader stopwords) throws java.io.IOException
Throws: java.io.IOException
See Also: WordlistLoader.getWordSet(Reader)

public StandardAnalyzer(boolean replaceInvalidAcronym)
replaceInvalidAcronym - Set to true if this analyzer should replace mischaracterized acronyms in the StandardTokenizer
See https://issues.apache.org/jira/browse/LUCENE-1068

public StandardAnalyzer(java.io.Reader stopwords, boolean replaceInvalidAcronym) throws java.io.IOException
stopwords - The stopwords to use
replaceInvalidAcronym - Set to true if this analyzer should replace mischaracterized acronyms in the StandardTokenizer
See https://issues.apache.org/jira/browse/LUCENE-1068
Throws: java.io.IOException

public StandardAnalyzer(java.io.File stopwords, boolean replaceInvalidAcronym) throws java.io.IOException
stopwords - The stopwords to use
replaceInvalidAcronym - Set to true if this analyzer should replace mischaracterized acronyms in the StandardTokenizer
See https://issues.apache.org/jira/browse/LUCENE-1068
Throws: java.io.IOException

public StandardAnalyzer(java.lang.String[] stopwords, boolean replaceInvalidAcronym) throws java.io.IOException
stopwords - The stopwords to use
replaceInvalidAcronym - Set to true if this analyzer should replace mischaracterized acronyms in the StandardTokenizer
See https://issues.apache.org/jira/browse/LUCENE-1068
Throws: java.io.IOException

public StandardAnalyzer(java.util.Set stopwords, boolean replaceInvalidAcronym) throws java.io.IOException
stopwords - The stopwords to use
replaceInvalidAcronym - Set to true if this analyzer should replace mischaracterized acronyms in the StandardTokenizer
See https://issues.apache.org/jira/browse/LUCENE-1068
Throws: java.io.IOException

public static boolean getDefaultReplaceInvalidAcronym()

public static void setDefaultReplaceInvalidAcronym(boolean replaceInvalidAcronym)
replaceInvalidAcronym - Set to true to have new instances of StandardTokenizer replace mischaracterized acronyms by default. Set to false to preserve the previous (before 2.4) buggy behavior. Alternatively, set the system property org.apache.lucene.analysis.standard.StandardAnalyzer.replaceInvalidAcronym to false.
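
The system property named above is an ordinary JVM property, so it can be supplied with a `-D` option on the command line or set programmatically. The sketch below shows one way a boolean default could be derived from it. The parsing rule (true unless the value is "false") and the names `AcronymPropertySketch` and `defaultReplaceInvalidAcronym` are assumptions of this example. When the library itself consults the property is not stated here, so setting it before startup is the more cautious route.

```java
// Illustration only: reading a boolean default from a JVM system property.
final class AcronymPropertySketch {

    // Property name taken from the documentation above.
    static final String PROPERTY =
            "org.apache.lucene.analysis.standard.StandardAnalyzer.replaceInvalidAcronym";

    // Assumed rule for this sketch: the default is true unless the property is set to "false".
    static boolean defaultReplaceInvalidAcronym() {
        String value = System.getProperty(PROPERTY);
        return value == null || !value.equalsIgnoreCase("false");
    }

    public static void main(String[] args) {
        // Programmatic equivalent of starting the JVM with -D<property name>=false
        System.setProperty(PROPERTY, "false");
        System.out.println(defaultReplaceInvalidAcronym()); // prints false
    }
}
```
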
See https://issues.apache.org/jira/browse/LUCENE-1068

public TokenStream tokenStream(java.lang.String fieldName, java.io.Reader reader)
Specified by: tokenStream in class Analyzer

public void setMaxTokenLength(int length)

public int getMaxTokenLength()
See Also: setMaxTokenLength(int)

public TokenStream reusableTokenStream(java.lang.String fieldName, java.io.Reader reader) throws java.io.IOException
Description copied from class: Analyzer
Creates a TokenStream that is allowed to be re-used from the previous time that the same thread called this method.
Overrides: reusableTokenStream in class Analyzer
Throws: java.io.IOException

public boolean isReplaceInvalidAcronym()

public void setReplaceInvalidAcronym(boolean replaceInvalidAcronym)
replaceInvalidAcronym - Set to true if this Analyzer is replacing mischaracterized acronyms in the StandardTokenizer
See https://issues.apache.org/jira/browse/LUCENE-1068

Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.