Zipkin is a distributed tracing system that helps us gather timing data for all the disparate services at Twitter. It manages both the collection and lookup of this data through a Collector and a Query service. We closely modelled Zipkin after the Google Dapper paper. Follow @ZipkinProject for updates.
Collecting traces helps developers gain deeper knowledge about how certain requests perform in a distributed system. Let's say we're having problems with user requests timing out. We can look up traced requests that timed out and display them in the web UI. We'll be able to quickly find the service responsible for adding the unexpected response time. If the service has been annotated adequately, we can also find out where in that service the issue is happening.
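To make the idea concrete, here is a minimal sketch of how timing data from a single trace can point to the service that adds the most response time. It does not use Zipkin's actual data model or API: the `Span` class, its fields, and the service names are hypothetical and chosen only for illustration. Each span's own time is taken to be its duration minus the durations of its direct children.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    """Hypothetical, simplified span; not Zipkin's real span structure."""
    span_id: str
    parent_id: Optional[str]
    service: str
    duration_us: int  # total time in this span, including downstream calls


def self_time_by_service(spans):
    """Sum each service's own time: span duration minus its direct children."""
    child_total = defaultdict(int)
    for span in spans:
        if span.parent_id is not None:
            child_total[span.parent_id] += span.duration_us

    totals = defaultdict(int)
    for span in spans:
        # Clamp at zero in case children ran in parallel and overlap.
        own = max(span.duration_us - child_total[span.span_id], 0)
        totals[span.service] += own
    return dict(totals)


def slowest_service(spans):
    """Return the service with the largest summed self time."""
    totals = self_time_by_service(spans)
    return max(totals, key=totals.get)


# A made-up trace of one slow request: web -> timeline -> (storage, cache).
trace = [
    Span("1", None, "web", 1_200_000),
    Span("2", "1", "timeline", 1_100_000),
    Span("3", "2", "storage", 950_000),
    Span("4", "2", "cache", 20_000),
]

print(slowest_service(trace))
```

In this made-up trace most of the request time is spent inside the `storage` span itself, so that is where we would start looking. Finer-grained annotations inside a service would narrow the search further.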
There are two mailing lists you can use to get in touch with other users and developers.
Noticed a bug? Please report it at https://github.com/twitter/zipkin/issues
Contributions are very welcome! Please create a pull request on GitHub and we'll look at it as soon as possible.
Try to make the code in the pull request as focused and clean as possible, and stick as close to our code style as you can.
If the pull request is becoming too big, we ask that you split it into smaller ones.
Areas where we'd love to see contributions include: adding tracing to more libraries and protocols, generating interesting reports with Hadoop from the trace data, extending the collector to support more transports and storage systems, and finding other ways of visualizing the data in the web UI.
We intend to follow Semantic Versioning for our version numbers.
Thanks to everyone below for making Zipkin happen!