A Heron topology is a directed acyclic graph (DAG) used to process streams of data. Heron topologies consist of three basic components: spouts, bolts, and the streams of tuples that connect them. Below is a visual illustration of a simple topology:
Spouts are responsible for emitting tuples into the topology, while bolts are responsible for processing those tuples. In the diagram above, spout S1 feeds tuples to bolts B1 and B2 for processing; in turn, bolt B1 feeds processed tuples to bolts B3 and B4, while bolt B2 feeds processed tuples to bolt B4.
This is just a simple example; you can use bolts and spouts to form arbitrarily complex topologies.
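To make the graph structure concrete, here is a minimal sketch in plain Python that encodes the example topology (S1, B1 through B4) as an adjacency map and checks that it is acyclic. It does not use the Heron API; the names `TOPOLOGY` and `processing_order` are hypothetical and exist only for illustration.

```python
from collections import deque

# Edges from the example: each component streams tuples to the listed bolts.
TOPOLOGY = {
    "S1": ["B1", "B2"],
    "B1": ["B3", "B4"],
    "B2": ["B4"],
    "B3": [],
    "B4": [],
}


def processing_order(topology):
    """Return components so that every producer precedes its consumers.

    Uses Kahn's algorithm and raises ValueError if the graph has a cycle,
    since a topology is expected to be a directed acyclic graph.
    """
    # Count how many upstream components feed each component.
    indegree = {name: 0 for name in topology}
    for consumers in topology.values():
        for consumer in consumers:
            indegree[consumer] += 1

    # Components with no upstream producers (the spouts) are ready first.
    ready = deque(name for name, degree in indegree.items() if degree == 0)
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for consumer in topology[name]:
            indegree[consumer] -= 1
            if indegree[consumer] == 0:
                ready.append(consumer)

    if len(order) != len(topology):
        raise ValueError("topology contains a cycle")
    return order


print(processing_order(TOPOLOGY))
```

The only component with no incoming stream is S1, which matches its role as the spout in the example.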
A topology typically goes through the following lifecycle stages:
- submit the topology to the cluster. The topology is not yet processing streams but is ready to be activated.
- activate the topology. The topology will begin processing streams in accordance with the topology architecture that you’ve created.
- restart an active topology if, for example, you need to update the topology configuration.
- deactivate the topology. Once deactivated, the topology will stop processing but remain running in the cluster.
- kill a topology to completely remove it from the cluster. It is no longer known to the Heron cluster and can no longer be activated. Once killed, the only way to run that topology is to re-submit it.
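The stages above can be read as a small state machine. The following sketch models it in plain Python under some simplifying assumptions: the state names (`absent`, `inactive`, `active`) are hypothetical, a freshly submitted topology and a deactivated one are treated as the same `inactive` state, and `restart` is only allowed from `active`. It does not call the Heron CLI.

```python
# Allowed transitions: (current state, command) -> next state.
# State names are illustrative assumptions, not Heron terminology.
TRANSITIONS = {
    ("absent", "submit"): "inactive",      # in the cluster, not yet processing
    ("inactive", "activate"): "active",    # begins processing streams
    ("active", "restart"): "active",       # e.g. after a configuration update
    ("active", "deactivate"): "inactive",  # stops processing, stays in the cluster
    ("inactive", "kill"): "absent",        # removed; must be re-submitted to run
    ("active", "kill"): "absent",
}


def step(state, command):
    """Apply one lifecycle command, rejecting transitions the list does not describe."""
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError(f"cannot {command} a topology that is {state}") from None


def run(commands, state="absent"):
    """Apply a sequence of commands and return the final state."""
    for command in commands:
        state = step(state, command)
    return state


print(run(["submit", "activate", "deactivate", "kill"]))
```

Note that after `kill` the only valid command in this model is `submit`, which mirrors the requirement to re-submit a killed topology.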
A Heron spout is a source of streams, responsible for emitting tuples into the topology. A spout may, for example, read data from a Kestrel queue or read tweets from the Twitter API and emit tuples to one or more bolts.
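As a rough illustration of the spout role, the sketch below models a spout in plain Python. It is not the Heron spout API: the class `QueueSpout`, its `next_tuple` method, and the list-based "inboxes" are hypothetical stand-ins, and an in-memory deque takes the place of an external source such as a queue.

```python
from collections import deque


class QueueSpout:
    """Toy spout: pulls messages from a source and emits them as tuples."""

    def __init__(self, source, downstream):
        self.source = deque(source)    # stands in for an external queue
        self.downstream = downstream   # lists standing in for bolt inboxes

    def next_tuple(self):
        """Emit one tuple to every downstream bolt.

        Returns False once the source is exhausted, True otherwise.
        """
        if not self.source:
            return False
        message = self.source.popleft()
        tup = (message,)               # a tuple with a single field
        for inbox in self.downstream:
            inbox.append(tup)
        return True


b1_inbox, b2_inbox = [], []
spout = QueueSpout(["hello", "world"], [b1_inbox, b2_inbox])
while spout.next_tuple():
    pass

print(b1_inbox)
print(b2_inbox)
```

Here every downstream bolt receives every tuple, which is a simplification; how tuples are distributed in a real topology is not covered in this sketch.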
Information on building spouts can be found in Building Spouts.
A Heron bolt consumes streams of tuples emitted by spouts and performs some set of user-defined processing operations on those tuples, which may include performing complex stream transformations, performing storage operations, aggregating multiple streams into one, emitting tuples to other bolts within the topology, and much more.
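The sketch below shows, in plain Python, a bolt that both aggregates (keeps running word counts) and emits new tuples downstream. It is a simplified model rather than the Heron bolt API: `WordCountBolt`, its `process` method, and the list used as a downstream inbox are hypothetical names chosen for illustration.

```python
from collections import Counter


class WordCountBolt:
    """Toy bolt: splits incoming sentences and emits running word counts."""

    def __init__(self, downstream):
        self.counts = Counter()       # aggregation state held by the bolt
        self.downstream = downstream  # list standing in for the next bolt's inbox

    def process(self, tup):
        """Consume one single-field tuple and emit a (word, count) tuple per word."""
        (sentence,) = tup
        for word in sentence.split():
            self.counts[word] += 1
            self.downstream.append((word, self.counts[word]))


emitted = []
bolt = WordCountBolt(emitted)
for tup in [("to be",), ("or not to be",)]:
    bolt.process(tup)

print(emitted)
```

Because the bolt writes its output to another inbox, bolts of this shape can be chained, which is how the example topology connects B1 to B3 and B4.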
Information on building bolts can be found in Building Bolts.
Heron has a fundamentally tuple-driven data model. You can find more information in Heron’s Data Model.
A topology’s logical plan is analogous to a database query plan. The image at the top of this page is an example logical plan for a topology.
A topology’s physical plan is related to its logical plan, with the crucial difference that a physical plan describes how the topology actually executes, including which machines run each spout or bolt. Here’s a rough visual representation of a physical plan: