com.twitter.summingbird.example
This function will be called by the storm runner to request the info of what to run.
This function will be called by the storm runner to request the info of what to run. In local mode it will start up as a separate thread on the local machine, pulling tweets off of the TwitterSpout, generating and aggregating key-value pairs and merging the incremental counts in the memcache store.
Before running this code, make sure to start a local memcached instance with "memcached". ("brew install memcached" will get you all set up if you don't already have memcache installed locally.)
Configuration for Twitter4j.
Configuration for Twitter4j. Configuration can also be managed via a properties file, as described here:
http://tugdualgrall.blogspot.com/2012/11/couchbase-create-large-dataset-using.html
Once you've got this running in the background, fire up another repl and query memcached for some counts.
Once you've got this running in the background, fire up another repl and query memcached for some counts.
The following commands will look up words. Hitting a word twice will show that Storm is updating Memcache behind the scenes:
scala> lookup("i") // Or any other common word res7: Option[Long] = Some(1774) scala> lookup("i") res8: Option[Long] = Some(1779)
"spout" is a concrete Storm source for Status data.
"spout" is a concrete Storm source for Status data. This will act as the initial producer of Status instances in the Summingbird word count job.
the param to store is by name, so this is still not created created yet
And here's our MergeableStore supplier.
And here's our MergeableStore supplier.
A supplier is required (vs a bare store) because Storm serializes every constructor parameter to its "bolts". Serializing a live memcache store is a no-no, so the Storm platform accepts a "supplier", essentially a function0 that when called will pop out a new instance of the required store. This instance is cached when the bolt starts up and starts merging tuples.
A MergeableStore is a store that's aware of aggregation and knows how to merge in new (K, V) pairs using a Monoid[V]. The Monoid[Long] used by this particular store is being pulled in from the Monoid companion object in Algebird. (Most trivial Monoid instances will resolve this way.)
First, the backing store:
The following object contains code to execute the Summingbird WordCount job defined in ExampleJob.scala on a storm cluster.