This guide provides basic help for issues frequently encountered when deploying topologies.

1. How can I get more debugging information?

Pass the --verbose flag to see more debugging information. For example:

heron submit ... ExclamationTopology --verbose

2. Why does the topology launch successfully but fail to start?

Even if the topology is submitted successfully, some of its components may still fail to start. For example, TMaster may fail to start due to unfulfilled dependencies.

When that happens, output like the following can appear:

$ heron activate local ExclamationTopology

...

[2016-05-27 12:02:38 -0600] com.twitter.heron.common.basics.FileUtils SEVERE: \
Failed to read from file.
java.nio.file.NoSuchFileException: \
/home//.herondata/repository/state/local/pplans/ExclamationTopology

...

[2016-05-27 12:02:38 -0600] com.twitter.heron.spi.utils.TMasterUtils SEVERE: \
Failed to get physical plan for topology ExclamationTopology

...

ERROR: Failed to activate topology 'ExclamationTopology'
INFO: Elapsed time: 1.883s.

What to do

  • Check the following file, which shows whether any specific component has failed to start:

    ~/.herondata/topologies/{cluster}/{role}/{TopologyName}/heron-executor.stdout
    

    For example, the file may contain errors from an attempt to spawn a Stream Manager process:

    Running stmgr-1 process as ./heron-core/bin/heron-stmgr ExclamationTopology \
    ExclamationTopology0a9c6550-7f3d-44fb-97ea-5c779fac6924 ExclamationTopology.defn LOCALMODE \
    /Users/${USERNAME}/.herondata/repository/state/local stmgr-1 \
    container_1_word_2,container_1_exclaim1_1 58106 58110 58109 ./heron-conf/heron_internals.yaml
    2016-06-09 16:20:28:  stdout:
    2016-06-09 16:20:28:  stderr: error while loading shared libraries: libunwind.so.8: \
    cannot open shared object file: No such file or directory
    

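    Fix the reported problem accordingly. In this example, the shared library libunwind.so.8 is missing and must be installed on the host.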

  • It is also possible that the host cannot resolve its own hostname to an IP address. To check, run the following command in a shell:

    $ python -c "import socket; print(socket.gethostbyname(socket.gethostname()))"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    socket.gaierror: [Errno 8] nodename nor servname provided, or not known
    

    If the output is an IP address, such as 127.0.0.1, you don’t have this issue. If the output is an error similar to the one above, modify the /etc/hosts file so that your hostname resolves correctly, as follows:

    1. Run the following command, whose output is your computer’s hostname.

      $ python -c "import socket; print(socket.gethostname())"
      
    2. Open the /etc/hosts file as superuser and find the line containing:

      127.0.0.1   localhost
      
    3. Append your hostname after the word “localhost” on that line. For example, if your hostname is tw-heron, the line should look like the following:

      127.0.0.1   localhost   tw-heron
      
    4. Save the file. The change should usually be reflected immediately, although rebooting might be necessary depending on your platform.
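The hostname check and the /etc/hosts edit described above can also be verified with a short script. The following is a minimal sketch, not part of Heron: the helper names resolve_hostname and hosts_entry_has_name are hypothetical, and the script only reads information, it does not modify /etc/hosts.

```python
import socket


def resolve_hostname(name=None):
    """Return the IP address that `name` resolves to, or None on failure.

    With no argument, the machine's own hostname is checked, which mirrors
    the one-line check shown in this guide.
    """
    if name is None:
        name = socket.gethostname()
    try:
        return socket.gethostbyname(name)
    except socket.gaierror:
        return None


def hosts_entry_has_name(hosts_text, hostname, address="127.0.0.1"):
    """Return True if a line for `address` in hosts-file text lists `hostname`."""
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split the line on whitespace.
        fields = line.split("#", 1)[0].split()
        if fields and fields[0] == address and hostname in fields[1:]:
            return True
    return False


if __name__ == "__main__":
    host = socket.gethostname()
    ip = resolve_hostname(host)
    if ip is None:
        print("Cannot resolve %r; check the 127.0.0.1 line in /etc/hosts" % host)
    else:
        print("%s resolves to %s" % (host, ip))
```

After editing /etc/hosts, passing its contents to hosts_entry_has_name together with your hostname is a quick way to confirm that the hostname was appended to the right line.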

3. Why does the process fail during runtime?

If a component (e.g., TMaster or Stream Manager) fails at runtime, check that component’s logs in:

~/.herondata/topologies/{cluster}/{role}/{TopologyName}/log-files/
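When the log directory contains many files, a small script can narrow the search. The sketch below builds the directory path from the template above and lists lines that contain an error marker. The marker strings are an assumption taken from the sample output in this guide (SEVERE, ERROR, and "stderr: error"), and the function names are hypothetical helpers rather than Heron tooling.

```python
import os

# Assumed markers, based only on the sample messages shown in this guide.
ERROR_MARKERS = ("SEVERE", "ERROR", "stderr: error")


def log_dir(cluster, role, topology, home="~"):
    """Build the log-files path for a topology from the template in this guide."""
    path = os.path.join(home, ".herondata", "topologies",
                        cluster, role, topology, "log-files")
    return os.path.expanduser(path)


def find_error_lines(path, markers=ERROR_MARKERS):
    """Return (line_number, text) pairs for lines containing any marker."""
    hits = []
    with open(path, errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if any(marker in line for marker in markers):
                hits.append((number, line.rstrip("\n")))
    return hits


def scan_directory(directory, markers=ERROR_MARKERS):
    """Map each file name in `directory` to its matching lines, skipping clean files."""
    results = {}
    for name in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, name)
        if os.path.isfile(full_path):
            hits = find_error_lines(full_path, markers)
            if hits:
                results[name] = hits
    return results
```

find_error_lines works on any single text file, so it can also be pointed at heron-executor.stdout when a topology fails to start.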

4. How do I force-kill and clean up a topology?

In general, it suffices to run:

heron kill ...

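If that command returns an error, the topology can still be killed manually: run kill pid for each associated running process, then run rm -rf ~/.herondata/ to clean up the state. Note that removing ~/.herondata/ deletes the local state and logs of every topology stored there, not only the one being killed.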
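The manual fallback can be wrapped in a script that previews its actions before doing anything destructive. This is a minimal sketch under stated assumptions: force_cleanup is a hypothetical helper, the caller must supply the process IDs (this guide does not describe how to find them), and the function defaults to a dry run that only reports what it would do.

```python
import os
import shutil
import signal


def force_cleanup(pids, state_dir="~/.herondata", dry_run=True):
    """Send SIGTERM to each PID, then remove the state directory.

    Returns the list of actions as shell-like strings. With dry_run=True
    (the default) nothing is killed or deleted.
    """
    actions = []
    for pid in pids:
        actions.append("kill %d" % pid)
        if not dry_run:
            try:
                os.kill(pid, signal.SIGTERM)  # same default signal as `kill pid`
            except ProcessLookupError:
                pass  # the process has already exited
    path = os.path.expanduser(state_dir)
    actions.append("rm -rf %s" % path)
    if not dry_run:
        shutil.rmtree(path, ignore_errors=True)
    return actions
```

Calling force_cleanup(pids) first and reviewing the returned actions, then repeating the call with dry_run=False, keeps the destructive step explicit.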