com.twitter.common.text.combiner
Class ExtractorBasedTokenCombiner

java.lang.Object
  extended by org.apache.lucene.util.AttributeSource
      extended by com.twitter.common.text.token.TokenStream
          extended by com.twitter.common.text.token.TokenProcessor
              extended by com.twitter.common.text.combiner.ExtractorBasedTokenCombiner
Direct Known Subclasses:
EmoticonTokenCombiner, HashtagTokenCombiner, PossessiveContractionTokenCombiner, PunctuationExceptionCombiner, StockTokenCombiner, URLTokenCombiner, UserNameTokenCombiner

public class ExtractorBasedTokenCombiner
extends TokenProcessor

Combines multiple tokens into a single token when, taken together, they form an entity identified by an extractor TokenStream.
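As a rough illustration of the idea, the sketch below models tokens as plain text plus character offsets and uses a regular expression as a stand-in extractor for hashtags. When a run of consecutive tokens starts and ends exactly on an extracted entity's boundaries, the run is replaced by one token spanning the entity. Everything here (`ExtractorCombineSketch`, `Token`, `extractHashtagSpans`, the `#\w+` pattern) is hypothetical; the actual class works on `TokenStream` attributes, and its internals may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: merge runs of tokens that exactly cover an extracted entity. */
final class ExtractorCombineSketch {

  /** A token's text plus its [start, end) character offsets in the original input. */
  record Token(String text, int start, int end) {}

  private static final Pattern HASHTAG = Pattern.compile("#\\w+");

  /** Stand-in extractor (hypothetical): returns [start, end) spans of hashtag-like entities. */
  static List<int[]> extractHashtagSpans(String text) {
    List<int[]> spans = new ArrayList<>();
    Matcher m = HASHTAG.matcher(text);
    while (m.find()) {
      spans.add(new int[] {m.start(), m.end()});
    }
    return spans;
  }

  /** Tokens must be sorted and non-overlapping; spans come from the extractor. */
  static List<Token> combine(String text, List<Token> tokens, List<int[]> spans) {
    List<Token> out = new ArrayList<>();
    int i = 0;
    while (i < tokens.size()) {
      Token first = tokens.get(i);
      int[] span = null;
      for (int[] s : spans) {
        if (s[0] == first.start()) { // an entity begins at this token
          span = s;
          break;
        }
      }
      int j = i;
      if (span != null) {
        // Consume every following token that ends inside the entity.
        while (j < tokens.size() && tokens.get(j).end() <= span[1]) {
          j++;
        }
      }
      if (span != null && j > i && tokens.get(j - 1).end() == span[1]) {
        // Tokens [i, j) cover the entity exactly: emit one combined token.
        out.add(new Token(text.substring(span[0], span[1]), span[0], span[1]));
        i = j;
      } else {
        out.add(first); // no entity here, or boundaries do not line up: keep the token
        i++;
      }
    }
    return out;
  }
}
```

For the text `I love #java code`, a tokenizer that splits `#` from `java` yields `I`, `love`, `#`, `java`, `code`; given the hashtag span [7, 12), the sketch returns `I`, `love`, `#java`, `code`.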


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
org.apache.lucene.util.AttributeSource.AttributeFactory, org.apache.lucene.util.AttributeSource.State
 
Constructor Summary
ExtractorBasedTokenCombiner(TokenStream inputStream)
           
 
Method Summary
 boolean incrementToken()
          Consumers call this method to advance the stream to the next token.
 void reset(CharSequence input)
          Resets this TokenStream (and also downstream TokenStreams, if they exist) to parse a new input.
protected  void setExtractor(TokenStream extractor)
           
protected  void setType(TokenType type)
           
 
Methods inherited from class com.twitter.common.text.token.TokenProcessor
getInputStream, getInstanceOf
 
Methods inherited from class com.twitter.common.text.token.TokenStream
toStringList
 
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, restoreState, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

ExtractorBasedTokenCombiner

public ExtractorBasedTokenCombiner(TokenStream inputStream)

Method Detail

setExtractor

protected void setExtractor(TokenStream extractor)

setType

protected void setType(TokenType type)
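`setExtractor` and `setType` are protected, which suggests that subclasses such as `HashtagTokenCombiner` configure the combiner in their own constructors. The sketch below shows that pattern with hypothetical stand-ins: `CombinerBaseSketch` plays the base class, a `Function<String, List<String>>` replaces the extractor `TokenStream`, and a `String` replaces `TokenType`. It is not the library's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for the base class: holds an extractor and a type label. */
abstract class CombinerBaseSketch {
  private Function<String, List<String>> extractor = text -> List.of();
  private String type = "TOKEN";

  // Mirror the documented protected setters; subclasses call them from constructors.
  protected final void setExtractor(Function<String, List<String>> extractor) {
    this.extractor = extractor;
  }

  protected final void setType(String type) {
    this.type = type;
  }

  /** Tags each extracted entity with the configured type, e.g. "HASHTAG:#java". */
  List<String> typedEntities(String text) {
    List<String> out = new ArrayList<>();
    for (String entity : extractor.apply(text)) {
      out.add(type + ":" + entity);
    }
    return out;
  }
}

/** Hypothetical subclass: all specialization happens in the constructor. */
final class HashtagCombinerSketch extends CombinerBaseSketch {
  private static final Pattern HASHTAG = Pattern.compile("#\\w+");

  HashtagCombinerSketch() {
    setExtractor(text -> {
      List<String> found = new ArrayList<>();
      Matcher m = HASHTAG.matcher(text);
      while (m.find()) {
        found.add(m.group());
      }
      return found;
    });
    setType("HASHTAG");
  }
}
```

Keeping the setters protected lets each subclass fix its extractor and type once, while callers only see the public constructor and the stream methods.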

reset

public void reset(CharSequence input)
Description copied from class: TokenStream
Resets this TokenStream (and also downstream TokenStreams, if they exist) to parse a new input.

Overrides:
reset in class TokenProcessor
Parameters:
input - new text to parse.

incrementToken

public boolean incrementToken()
Description copied from class: TokenStream
Consumers call this method to advance the stream to the next token.

Specified by:
incrementToken in class TokenStream
Returns:
false for end of stream; true otherwise
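The two public methods form the usual consumption protocol: call `reset` with new text, then call `incrementToken` until it returns `false`. A processor that wraps another stream can forward `reset` to it and clear its own per-input state, so one instance is reusable across inputs. The sketch below models this with hypothetical classes (`WordStreamSketch`, `IndexingProcessorSketch`, and a plain `term` field in place of Lucene attributes); it does not use the real `TokenStream` API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical source stream: emits the whitespace-separated words of its input. */
class WordStreamSketch {
  private String[] words = new String[0];
  private int next = 0;
  String term; // current token text, valid only after incrementToken() returns true

  void reset(CharSequence input) {
    String s = input.toString().trim();
    words = s.isEmpty() ? new String[0] : s.split("\\s+");
    next = 0;
  }

  boolean incrementToken() {
    if (next >= words.length) {
      return false; // end of stream
    }
    term = words[next++];
    return true;
  }
}

/** Hypothetical processor wrapping another stream; keeps per-input state (a token index). */
class IndexingProcessorSketch {
  private final WordStreamSketch input;
  private int index = 0;
  String term;

  IndexingProcessorSketch(WordStreamSketch input) {
    this.input = input;
  }

  // Forward the new text to the wrapped stream, then clear this stage's own state.
  void reset(CharSequence text) {
    input.reset(text);
    index = 0;
  }

  boolean incrementToken() {
    if (!input.incrementToken()) {
      return false;
    }
    term = (index++) + ":" + input.term;
    return true;
  }

  /** Consumer idiom: reset to new text, then advance until incrementToken() returns false. */
  static List<String> collect(IndexingProcessorSketch stream, CharSequence text) {
    List<String> out = new ArrayList<>();
    stream.reset(text);
    while (stream.incrementToken()) {
      out.add(stream.term);
    }
    return out;
  }
}
```

Because `reset` clears the index as well as the wrapped stream, calling `collect` a second time on the same instance starts counting from 0 again.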