Class | Description |
---|---|
CommonPreprocessor | A TokenPreProcess implementation that removes punctuation marks and lower-cases tokens. |
CustomStemmingPreprocessor | A StemmingPreprocessor that works with stemmers defined as lucene/tartarus SnowballProgram implementations, such as RussianStemmer, DutchStemmer, and FrenchStemmer. |
EmbeddedStemmingPreprocessor | A tokenizer preprocessor that applies a given preprocessor and then performs English Porter stemming on the resulting tokens. |
EndingPreProcessor | Removes the endings ed, ing, ly, s, and a trailing period (.) from tokens. |
LowCasePreProcessor | |
StemmingPreprocessor | A tokenizer preprocessor that applies the basic cleaning inherited from CommonPreprocessor and then performs English Porter stemming on tokens. Note: this preprocessor is thread-safe because it uses a synchronized method. |
StringCleaning | Various string-cleaning utilities. |
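
The descriptions in the table above can be illustrated with a small, self-contained sketch. It does not use the library's classes: `PreprocessorSketch`, `cleanToken`, `stripEnding`, and `chain` are hypothetical names introduced only for illustration, the punctuation set is an assumption, and `stripEnding` is a naive suffix remover rather than Porter stemming. The actual implementations may behave differently.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical illustration only; these are NOT the library's classes.
final class PreprocessorSketch {

    // Drops an assumed set of punctuation characters, then lower-cases the token.
    static String cleanToken(String token) {
        return token.replaceAll("[.,:;!?\"'()\\[\\]]", "").toLowerCase(Locale.ROOT);
    }

    // Removes at most one of the listed endings, checked in a fixed (assumed) order.
    // This is a naive stand-in for a stemmer, not the Porter algorithm.
    static String stripEnding(String token) {
        for (String ending : new String[] {"ing", "ed", "ly", "s", "."}) {
            if (token.length() > ending.length() && token.endsWith(ending)) {
                return token.substring(0, token.length() - ending.length());
            }
        }
        return token;
    }

    // Applies a given preprocessing step first, then a second step on top of it.
    static UnaryOperator<String> chain(UnaryOperator<String> first,
                                       UnaryOperator<String> second) {
        return token -> second.apply(first.apply(token));
    }

    public static void main(String[] args) {
        UnaryOperator<String> pipeline =
                chain(PreprocessorSketch::cleanToken, PreprocessorSketch::stripEnding);
        for (String token : new String[] {"Walking,", "Quickly!", "Cats"}) {
            System.out.println(token + " -> " + pipeline.apply(token));
        }
    }
}
```

Composing the steps with `chain` mirrors the "given preprocessor plus an extra step on top" pattern described for EmbeddedStemmingPreprocessor; a real stemmer would replace `stripEnding` in practice.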