com.twitter.common.text
Class TextTokenizer

java.lang.Object
  extended by com.twitter.common.text.TextTokenizer
Direct Known Subclasses:
DefaultTextTokenizer

public abstract class TextTokenizer
extends Object


Field Summary
protected  TokenStream tokenizationStream
           
 
Constructor Summary
TextTokenizer()
           
 
Method Summary
abstract  TokenStream applyDefaultChain(TokenStream tokenizer)
           
 TokenStream getDefaultTokenStream()
          Returns the TokenStream used to tokenize text.
 TokenizedCharSequence tokenize(CharSequence input)
          Tokenizes a CharSequence and returns the result as a TokenizedCharSequence.
 List<String> tokenizeToStrings(CharSequence input)
          Tokenizes a CharSequence into a list of Strings.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

tokenizationStream

protected TokenStream tokenizationStream

Constructor Detail

TextTokenizer

public TextTokenizer()

Method Detail

applyDefaultChain

public abstract TokenStream applyDefaultChain(TokenStream tokenizer)

getDefaultTokenStream

public TokenStream getDefaultTokenStream()
Returns the TokenStream used to tokenize text.

Returns:
the TokenStream used to tokenize text

tokenize

public TokenizedCharSequence tokenize(CharSequence input)
Tokenizes a CharSequence and returns the result as a TokenizedCharSequence.

Parameters:
input - text to be tokenized
Returns:
the resulting TokenizedCharSequence

tokenizeToStrings

public List<String> tokenizeToStrings(CharSequence input)
Tokenizes a CharSequence into a list of Strings.

Parameters:
input - text to be tokenized
Returns:
a list of tokens as String objects
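
Example (illustrative sketch only):

The methods above suggest a template-method design: a subclass supplies applyDefaultChain, and the tokenize methods run input through the resulting stream. The sketch below imitates that shape with stand-in types so that it compiles without the library. SketchTokenStream, SketchTextTokenizer, and LowercasingTokenizer are hypothetical names, not part of com.twitter.common.text. The whitespace splitting and the way getDefaultTokenStream calls applyDefaultChain are assumptions, not the library's documented behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the library's TokenStream:
// here it is just a function from input text to token strings.
interface SketchTokenStream {
  List<String> tokens(CharSequence input);
}

// Hypothetical stand-in that mirrors the documented method shape of TextTokenizer.
abstract class SketchTextTokenizer {
  // Hook: a subclass decides which stages wrap the base stream.
  public abstract SketchTokenStream applyDefaultChain(SketchTokenStream tokenizer);

  // Assumption: the default stream is a base splitter passed through the hook.
  public SketchTokenStream getDefaultTokenStream() {
    SketchTokenStream base = input -> {
      List<String> out = new ArrayList<>();
      // Stand-in rule: split on runs of whitespace, dropping empty pieces.
      for (String part : input.toString().trim().split("\\s+")) {
        if (!part.isEmpty()) {
          out.add(part);
        }
      }
      return out;
    };
    return applyDefaultChain(base);
  }

  // Same signature shape as the documented tokenizeToStrings(CharSequence).
  public List<String> tokenizeToStrings(CharSequence input) {
    return getDefaultTokenStream().tokens(input);
  }
}

// Example subclass: adds a lowercasing stage on top of the base stream.
class LowercasingTokenizer extends SketchTextTokenizer {
  @Override
  public SketchTokenStream applyDefaultChain(SketchTokenStream tokenizer) {
    return input -> {
      List<String> out = new ArrayList<>();
      for (String token : tokenizer.tokens(input)) {
        out.add(token.toLowerCase(Locale.ROOT));
      }
      return out;
    };
  }
}
```

In this sketch, new LowercasingTokenizer().tokenizeToStrings("Hello World") yields the list [hello, world]. The real DefaultTextTokenizer applies its own chain and tokenization rules, which this stand-in does not attempt to reproduce.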