com.twitter.common.text.tokenizer
Class RegexTokenizer.AbstractBuilder<N extends RegexTokenizer,T extends RegexTokenizer.AbstractBuilder<N,T>>

java.lang.Object
  extended by com.twitter.common.text.tokenizer.RegexTokenizer.AbstractBuilder<N,T>
Direct Known Subclasses:
LatinTokenizer.Builder, RegexTokenizer.Builder
Enclosing class:
RegexTokenizer

public abstract static class RegexTokenizer.AbstractBuilder<N extends RegexTokenizer,T extends RegexTokenizer.AbstractBuilder<N,T>>
extends Object


Constructor Summary
protected RegexTokenizer.AbstractBuilder(N tokenizer)
           
 
Method Summary
 N build()
           
protected  T self()
           
 T setDelimiterPattern(Pattern delimiterPattern)
          Sets the Regex pattern of the delimiter.
 T setKeepPunctuation(boolean keepPunctuation)
          Specifies whether to keep punctuation (as identified by delimiterPattern and punctuationGroupInDelimiterPattern) in the output token stream.
 T setPunctuationGroupInDelimiterPattern(int group)
          Sets the ID of the group in delimiterPattern that should be handled as punctuation.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

RegexTokenizer.AbstractBuilder

protected RegexTokenizer.AbstractBuilder(N tokenizer)
Method Detail

self

protected T self()
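
The recursive type parameter T together with self() suggests a self-typed builder, in which inherited setters return the concrete builder type so that chained calls keep access to subclass methods. A minimal sketch of that pattern follows, assuming hypothetical Splitter, FancySplitter, and FancyBuilder classes that are not part of this library. The sketch declares self() abstract, which may differ from how this class implements it.

```java
final class SelfTypedBuilderSketch {
    // Hypothetical product types, standing in for a tokenizer and a subclass.
    static class Splitter {
        String delimiter = " ";
        boolean keep = false;
    }

    static class FancySplitter extends Splitter {
        boolean lowercase = false;
    }

    // N is the product type; T is the concrete builder type returned by setters.
    abstract static class AbstractBuilder<N extends Splitter, T extends AbstractBuilder<N, T>> {
        private final N product;

        protected AbstractBuilder(N product) {
            this.product = product;
        }

        // Assumption: each concrete builder returns itself here.
        protected abstract T self();

        protected N product() {
            return product;
        }

        public T setDelimiter(String delimiter) {
            product.delimiter = delimiter;
            return self();
        }

        public T setKeep(boolean keep) {
            product.keep = keep;
            return self();
        }

        public N build() {
            return product;
        }
    }

    static final class FancyBuilder extends AbstractBuilder<FancySplitter, FancyBuilder> {
        FancyBuilder() {
            super(new FancySplitter());
        }

        @Override
        protected FancyBuilder self() {
            return this;
        }

        public FancyBuilder setLowercase(boolean lowercase) {
            product().lowercase = lowercase;
            return this;
        }
    }

    // setLowercase can follow setDelimiter because setDelimiter returns FancyBuilder.
    static FancySplitter demo() {
        return new FancyBuilder().setDelimiter(",").setLowercase(true).setKeep(true).build();
    }
}
```

Without the self type, setDelimiter would return the abstract builder and the subsequent setLowercase call would not compile.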

setDelimiterPattern

public T setDelimiterPattern(Pattern delimiterPattern)
Sets the regex pattern of the delimiter. Input text is split into tokens at each CharSequence matched by this pattern.

Parameters:
delimiterPattern - Regex pattern of delimiter.
Returns:
this Builder object
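
To illustrate what splitting on a delimiter pattern means, here is a minimal sketch that uses only java.util.regex. It is not this library's implementation; the class name DelimiterSplitSketch, the split helper, and the choice to skip empty tokens are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DelimiterSplitSketch {
    // Splits text at every match of delimiterPattern, skipping empty tokens.
    static List<String> split(CharSequence text, Pattern delimiterPattern) {
        List<String> tokens = new ArrayList<>();
        Matcher m = delimiterPattern.matcher(text);
        int start = 0;
        while (m.find()) {
            if (m.start() > start) {
                tokens.add(text.subSequence(start, m.start()).toString());
            }
            start = m.end();
        }
        // Trailing text after the last delimiter is the final token.
        if (start < text.length()) {
            tokens.add(text.subSequence(start, text.length()).toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        Pattern delimiter = Pattern.compile("[.,]?\\s+");
        System.out.println(split("Hello, world. Bye", delimiter));
    }
}
```

With the pattern above, the input "Hello, world. Bye" yields the three tokens Hello, world, and Bye.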

setPunctuationGroupInDelimiterPattern

public T setPunctuationGroupInDelimiterPattern(int group)
Sets the ID of the group in delimiterPattern that should be handled as punctuation. For example, set delimiterPattern to "([.,])\\s+" and the punctuation group to 1 to treat commas and periods as punctuation.

Parameters:
group - group ID of punctuation in delimiterPattern.
Returns:
this Builder object
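
The group ID refers to a capturing group in the regex, numbered as in java.util.regex.Matcher.group(int). The sketch below shows what group 1 of the example pattern captures. It relies only on java.util.regex and says nothing about how this class uses the group internally; PunctuationGroupSketch and punctuationMatches are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PunctuationGroupSketch {
    // Collects the text captured by the given group for every delimiter match.
    static List<String> punctuationMatches(CharSequence text, Pattern delimiterPattern, int group) {
        List<String> found = new ArrayList<>();
        Matcher m = delimiterPattern.matcher(text);
        while (m.find()) {
            String captured = m.group(group);
            // A group that did not participate in the match returns null.
            if (captured != null) {
                found.add(captured);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Pattern delimiter = Pattern.compile("([.,])\\s+");
        System.out.println(punctuationMatches("One, two. Three", delimiter, 1));
    }
}
```

For the input "One, two. Three", group 1 captures the comma in the first delimiter match and the period in the second; the trailing whitespace belongs to the delimiter but not to the group.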

setKeepPunctuation

public T setKeepPunctuation(boolean keepPunctuation)
Specifies whether to keep punctuation (as identified by delimiterPattern and punctuationGroupInDelimiterPattern) in the output token stream.

Parameters:
keepPunctuation - true to keep punctuation in the output token stream; false otherwise.
Returns:
this Builder object
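
The effect of the flag can be sketched with plain java.util.regex, under the assumption that a kept punctuation mark is emitted as its own token between the surrounding words. This is an illustration of the documented behavior, not this library's code; KeepPunctuationSketch and tokenize are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeepPunctuationSketch {
    // Splits on delimiterPattern; optionally emits the punctuation group as a token.
    static List<String> tokenize(CharSequence text, Pattern delimiterPattern,
                                 int punctuationGroup, boolean keepPunctuation) {
        List<String> tokens = new ArrayList<>();
        Matcher m = delimiterPattern.matcher(text);
        int start = 0;
        while (m.find()) {
            if (m.start() > start) {
                tokens.add(text.subSequence(start, m.start()).toString());
            }
            boolean groupExists = punctuationGroup > 0 && punctuationGroup <= m.groupCount();
            if (keepPunctuation && groupExists) {
                String punctuation = m.group(punctuationGroup);
                if (punctuation != null && !punctuation.isEmpty()) {
                    tokens.add(punctuation);
                }
            }
            start = m.end();
        }
        if (start < text.length()) {
            tokens.add(text.subSequence(start, text.length()).toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        Pattern delimiter = Pattern.compile("([.,])\\s+");
        System.out.println(tokenize("One, two. Three", delimiter, 1, true));
        System.out.println(tokenize("One, two. Three", delimiter, 1, false));
    }
}
```

In this sketch, keeping punctuation produces five tokens (One, the comma, two, the period, Three), while dropping it produces only the three words.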

build

public N build()