com.twitter.common.text.tokenizer
Class LatinTokenizer

java.lang.Object
  extended by org.apache.lucene.util.AttributeSource
      extended by com.twitter.common.text.token.TokenStream
          extended by com.twitter.common.text.tokenizer.RegexTokenizer
              extended by com.twitter.common.text.tokenizer.LatinTokenizer

public class LatinTokenizer
extends RegexTokenizer

Tokenizes text written in languages that use the Latin alphabet, such as English, French, and German.
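The inherited method names (setDelimiterPattern, setKeepPunctuation, setPunctuationGroupInDelimiterPattern) suggest that tokens are produced by splitting the input on a delimiter regular expression, with punctuation optionally kept as separate tokens. The standalone sketch below illustrates that general technique using only java.util.regex. It is not this class's implementation: the class name LatinSplitSketch, the tokenize helper, and the delimiter pattern are all hypothetical and chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of delimiter-based regex tokenization.
// This does NOT use or reproduce the LatinTokenizer API.
final class LatinSplitSketch {

    // Assumed delimiter pattern: a run of whitespace, or a single
    // punctuation character captured in group 1.
    private static final Pattern DELIMITER = Pattern.compile("\\s+|(\\p{Punct})");

    // Splits text on DELIMITER. When keepPunctuation is true, each
    // punctuation delimiter is emitted as its own token.
    static List<String> tokenize(CharSequence text, boolean keepPunctuation) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = DELIMITER.matcher(text);
        int start = 0;
        while (matcher.find()) {
            // Emit the text between the previous delimiter and this one.
            if (matcher.start() > start) {
                tokens.add(text.subSequence(start, matcher.start()).toString());
            }
            // Group 1 is non-null only when the delimiter was punctuation.
            if (keepPunctuation && matcher.group(1) != null) {
                tokens.add(matcher.group(1));
            }
            start = matcher.end();
        }
        // Emit any trailing text after the last delimiter.
        if (start < text.length()) {
            tokens.add(text.subSequence(start, text.length()).toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello, world!", false));
        System.out.println(tokenize("Hello, world!", true));
    }
}
```

Because the assumed pattern treats every ASCII punctuation character as a delimiter, this sketch also splits contractions such as "don't" at the apostrophe; a production delimiter pattern would likely need to handle such cases differently.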


Nested Class Summary
static class LatinTokenizer.Builder
 
Nested classes/interfaces inherited from class com.twitter.common.text.tokenizer.RegexTokenizer
RegexTokenizer.AbstractBuilder<N extends RegexTokenizer,T extends RegexTokenizer.AbstractBuilder<N,T>>
 
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
org.apache.lucene.util.AttributeSource.AttributeFactory, org.apache.lucene.util.AttributeSource.State
 
Constructor Summary
protected LatinTokenizer()
 
Method Summary
 
Methods inherited from class com.twitter.common.text.tokenizer.RegexTokenizer
incrementToken, reset, setDelimiterPattern, setKeepPunctuation, setPunctuationGroupInDelimiterPattern
 
Methods inherited from class com.twitter.common.text.token.TokenStream
getInstanceOf, toStringList
 
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, restoreState, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

LatinTokenizer

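protected LatinTokenizer()

Protected constructor, accessible only to subclasses and classes in the same package. The nested LatinTokenizer.Builder class is the likely entry point for creating instances.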