A representation of a tokenizer.
The open source segmentation algorithm ansj_seg comes from GitHub (https://github.com/NLPchina/ansj_seg). The reused code is licensed under the Apache 2.0 license; at the time of reuse its latest commit ID was dedc45fdf85dfd2d4c691fb1f147d7cbf9a5d7fb, and its copyright is 2011-2016.
OpenNLP Tokenizer annotator.
Tokenizer based on the
A thin wrapper for the Japanese morphological analyzer Kuromoji (ver. 0.9.0). It tokenizes text written in languages in which words are not separated by whitespace.
Created by kepricon on 16.
Filter by part-of-speech tag.
Tokenizer based on the passed-in analysis engine.