public interface VocabCache
Modifier and Type | Method and Description |
---|---|
double | idf(String word): Number of documents the word has occurred in |
void | incrementCount(String word): Increment a word count by 1 |
void | incrementCount(String word, double by): Increment the count for a word |
void | incrementDocCount(String word): Increment the doc count for a word by 1 |
void | incrementDocCount(String word, double by): Increment the document count for a particular word |
void | incrementNumDocs(double by): Increment the number of documents |
void | initialize(Configuration conf): Initialize from the given configuration |
int | minWordFrequency(): The minimum word frequency needed for a word to be included in the vocab (default 5) |
double | numDocs(): Number of documents |
double | tfidf(String word, double frequency, boolean smoothIdf): Calculate the tfidf of the word given the document frequency |
Index | vocabWords(): All of the vocab words (ordered). Note that these are not all the possible tokens |
String | wordAt(int i): Returns the word in the vocab at a particular index |
double | wordFrequency(String word): Get the word frequency for a word |
int | wordIndex(String word) |
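The document-count methods in the summary above can be illustrated with a small in-memory sketch. The class DocCountSketch is hypothetical and is not an implementation of VocabCache. The page does not state which idf or smoothing formula is used, so the formulas below are textbook assumptions: idf = ln(numDocs / docCount), and a smoothed variant ln((1 + numDocs) / (1 + docCount)) + 1.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory sketch; names mirror the summary table but this is not the library's code. */
final class DocCountSketch {
    private final Map<String, Double> docCounts = new HashMap<>();
    private double numDocs = 0.0;

    void incrementNumDocs(double by) {
        numDocs += by;
    }

    void incrementDocCount(String word) {
        incrementDocCount(word, 1.0);
    }

    void incrementDocCount(String word, double by) {
        docCounts.merge(word, by, Double::sum);
    }

    double numDocs() {
        return numDocs;
    }

    /** Number of documents the word has occurred in (0 if the word was never seen). */
    double docCount(String word) {
        return docCounts.getOrDefault(word, 0.0);
    }

    /** ASSUMPTION: textbook inverse document frequency; the real formula is not given on this page. */
    double inverseDocFrequency(String word, boolean smoothIdf) {
        double df = docCount(word);
        if (smoothIdf) {
            // Smoothing adds one to both counts so unseen words do not divide by zero.
            return Math.log((1.0 + numDocs) / (1.0 + df)) + 1.0;
        }
        return df == 0.0 ? 0.0 : Math.log(numDocs / df);
    }

    /** Term frequency within one document, weighted by the assumed idf. */
    double tfidf(String word, double frequency, boolean smoothIdf) {
        return frequency * inverseDocFrequency(word, smoothIdf);
    }

    public static void main(String[] args) {
        DocCountSketch cache = new DocCountSketch();
        cache.incrementNumDocs(4.0);          // four documents seen in total
        cache.incrementDocCount("cat", 2.0);  // "cat" appeared in two of them
        System.out.println(cache.tfidf("cat", 3.0, false));
        System.out.println(cache.tfidf("cat", 3.0, true));
    }
}
```

Keeping the raw document count separate from the derived weight makes it easy to swap in whatever idf formula a concrete implementation actually uses.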
void incrementNumDocs(double by)
by
- the amount to increment by
double numDocs()
String wordAt(int i)
i
- the index to get
int wordIndex(String word)
void initialize(Configuration conf)
conf
- the configuration to initialize with
double wordFrequency(String word)
word
- the word to get frequency for
int minWordFrequency()
Index vocabWords()
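How minWordFrequency(), vocabWords(), wordAt(int) and wordIndex(String) might relate can be sketched as follows. VocabSketch is a hypothetical class, not the library's implementation. It assumes that a word joins the vocab once its count reaches the minimum frequency, that "ordered" means the order in which words reached that threshold, and that wordIndex returns -1 for words outside the vocab. It also uses a plain List in place of the Index type, which is not described on this page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a frequency-filtered, ordered vocabulary. */
final class VocabSketch {
    private final int minWordFrequency;
    private final Map<String, Double> wordFrequencies = new HashMap<>();
    private final List<String> vocab = new ArrayList<>();
    private final Map<String, Integer> indexes = new HashMap<>();

    VocabSketch(int minWordFrequency) {
        this.minWordFrequency = minWordFrequency;
    }

    void incrementCount(String word) {
        incrementCount(word, 1.0);
    }

    void incrementCount(String word, double by) {
        double updated = wordFrequencies.merge(word, by, Double::sum);
        // ASSUMPTION: a token enters the vocab the first time it reaches the threshold.
        if (updated >= minWordFrequency && !indexes.containsKey(word)) {
            indexes.put(word, vocab.size());
            vocab.add(word);
        }
    }

    double wordFrequency(String word) {
        return wordFrequencies.getOrDefault(word, 0.0);
    }

    int minWordFrequency() {
        return minWordFrequency;
    }

    /** Vocab words in index order; rarer tokens are counted but not listed. */
    List<String> vocabWords() {
        return Collections.unmodifiableList(vocab);
    }

    String wordAt(int i) {
        return vocab.get(i);
    }

    /** ASSUMPTION: -1 signals a word that is not in the vocab. */
    int wordIndex(String word) {
        return indexes.getOrDefault(word, -1);
    }

    public static void main(String[] args) {
        VocabSketch cache = new VocabSketch(2);
        for (String token : "the cat sat on the mat the cat".split(" ")) {
            cache.incrementCount(token);
        }
        System.out.println(cache.vocabWords());
        System.out.println(cache.wordIndex("sat"));
    }
}
```

In this sketch every token is still counted, which is why wordFrequency can return a non-zero value for a word that vocabWords does not contain.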
void incrementDocCount(String word)
word
- the word to increment the count for
void incrementDocCount(String word, double by)
word
- the word to increment the count for
by
- the amount to increment by
void incrementCount(String word)
word
- the word to increment the count for
void incrementCount(String word, double by)
word
- the word to increment the count for
by
- the amount to increment by
double idf(String word)
word
- the word to get the idf for
double tfidf(String word, double frequency, boolean smoothIdf)
word
- the word to get frequency for
frequency
- the frequency
Copyright © 2020. All rights reserved.