Class ExtractorBasedTokenCombiner

  extended by org.apache.lucene.util.AttributeSource
      extended by com.twitter.common.text.token.TokenStream
          extended by com.twitter.common.text.token.TokenProcessor
              extended by com.twitter.common.text.combiner.ExtractorBasedTokenCombiner
Direct Known Subclasses:
EmoticonTokenCombiner, HashtagTokenCombiner, PossessiveContractionTokenCombiner, PunctuationExceptionCombiner, StockTokenCombiner, URLTokenCombiner, UserNameTokenCombiner

public class ExtractorBasedTokenCombiner
extends TokenProcessor

Combines multiple tokens into a single one if they define an entity identified by an extractor TokenStream.
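
The combining step described above can be modeled with a small self-contained sketch. It is not this class's implementation and does not use the library: CombinerSketch, Span, and combine are hypothetical names. It assumes that tokens and entities are given as character ranges sorted by start offset, and that a combined token takes the full text of the entity range.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative stand-in only; none of these names come from the library. */
final class CombinerSketch {

  /** A half-open character range [start, end) within the input text. */
  static final class Span {
    final int start;
    final int end;

    Span(int start, int end) {
      this.start = start;
      this.end = end;
    }
  }

  /**
   * Replaces every run of tokens that lies inside an entity range with a
   * single token covering that entity. Both lists are assumed to be sorted
   * by start offset and free of overlaps.
   */
  static List<String> combine(String text, List<Span> tokens, List<Span> entities) {
    List<String> out = new ArrayList<>();
    int e = 0; // index of the first entity that may still cover a token
    int i = 0; // index of the next token to emit
    while (i < tokens.size()) {
      Span token = tokens.get(i);
      // Skip entities that end at or before the start of this token.
      while (e < entities.size() && entities.get(e).end <= token.start) {
        e++;
      }
      boolean covered = e < entities.size()
          && entities.get(e).start <= token.start
          && token.end <= entities.get(e).end;
      if (covered) {
        Span entity = entities.get(e);
        // Consume every token that ends inside the entity, then emit one token.
        while (i < tokens.size() && tokens.get(i).end <= entity.end) {
          i++;
        }
        out.add(text.substring(entity.start, entity.end));
      } else {
        // Tokens outside any entity pass through unchanged.
        out.add(text.substring(token.start, token.end));
        i++;
      }
    }
    return out;
  }
}
```

For the text "see #java now" with tokens at [0,3), [4,5), [5,9), [10,13) and one entity at [4,9), combine returns the three tokens see, #java, now. In the real class the entity ranges would come from the extractor TokenStream passed to setExtractor, and the combined token would presumably carry the type passed to setType; the sketch leaves both out.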

Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
org.apache.lucene.util.AttributeSource.AttributeFactory, org.apache.lucene.util.AttributeSource.State
Constructor Summary
ExtractorBasedTokenCombiner(TokenStream inputStream)
Method Summary
 boolean incrementToken()
          Consumers call this method to advance the stream to the next token.
 void reset(CharSequence input)
          Resets this TokenStream (and also downstream tokens if they exist) to parse a new input.
protected  void setExtractor(TokenStream extractor)
protected  void setType(TokenType type)
Methods inherited from class com.twitter.common.text.token.TokenProcessor
getInputStream, getInstanceOf
Methods inherited from class com.twitter.common.text.token.TokenStream
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, restoreState, toString
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

Constructor Detail


public ExtractorBasedTokenCombiner(TokenStream inputStream)
Method Detail


protected void setExtractor(TokenStream extractor)


protected void setType(TokenType type)


public void reset(CharSequence input)
Description copied from class: TokenStream
Resets this TokenStream (and also downstream tokens if they exist) to parse a new input.

Overrides:
reset in class TokenProcessor
Parameters:
input - new text to parse.


public boolean incrementToken()
Description copied from class: TokenStream
Consumers call this method to advance the stream to the next token.

Specified by:
incrementToken in class TokenStream
Returns:
false for end of stream; true otherwise
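
The reset and incrementToken contract documented above implies a pull-style consumer loop: reset the stream with new input, then advance until incrementToken returns false. The sketch below models that loop with a hypothetical stand-in class. PullStreamSketch, term, and drain are not part of the library, and the whitespace splitting is only a placeholder for real tokenization; the real classes expose token data through the attribute methods inherited from AttributeSource rather than through a term() accessor.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative stand-in only; it mimics the reset/incrementToken contract. */
final class PullStreamSketch {
  private String[] parts = new String[0];
  private int next = 0;
  private String current;

  /** Discards any previous state and prepares to parse the new input. */
  void reset(CharSequence input) {
    String trimmed = input.toString().trim();
    parts = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    next = 0;
    current = null;
  }

  /** Advances to the next token; returns false once the stream is exhausted. */
  boolean incrementToken() {
    if (next >= parts.length) {
      return false;
    }
    current = parts[next++];
    return true;
  }

  /** Hypothetical accessor for the text of the current token. */
  String term() {
    return current;
  }

  /** Typical consumer loop: reset once, then pull tokens until false. */
  static List<String> drain(PullStreamSketch stream, CharSequence input) {
    stream.reset(input);
    List<String> out = new ArrayList<>();
    while (stream.incrementToken()) {
      out.add(stream.term());
    }
    return out;
  }
}
```

Because reset discards earlier state, one instance can be reused across inputs: calling drain with "a b c" and then with "x y" on the same object yields the tokens of each input independently.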