I recently switched BioMAJ development to a micro service architecture. However, micro services make the application harder to debug: logs are spread across multiple apps/hosts/containers, and it is not easy to follow the « stream » of the application.
For logs, this is fairly easy to solve with central logging, using a server such as Graylog.
What remains is the problem of following the flow of messages between the different micro services/applications, with timing info.
I decided to use zipkin (http://zipkin.io/), a server with a simple API to which you send spans (basically a message containing a trace identifier with additional info). Its web interface then shows the flow of messages with their hierarchy (app1 sends a message to app2, then receives an answer, …).
You can easily see when messages are sent and answers received, and on which server (if a micro service is scaled to multiple instances).
Some client libraries are available, but they did not match my needs as they were all web oriented. In BioMAJ we use web services with REST APIs but also AMQP messaging with RabbitMQ. So I implemented a simple library to track my workflow and send message info to zipkin (https://github.com/genouest/biomaj-zipkin).
This is quite simple in fact. You send HTTP requests with a JSON body to the zipkin server, each message carrying a few pieces of information.
Each message contains a traceId, which is an identifier common to all messages in your « workflow ». Then a spanId gives a unique identifier to the method you need to track.
It can also define a parentId which refers to the spanId of the task that called your method (locally or from a remote service).
Finally, each message needs a timestamp (start of the call), a name that defines the current operation, and a duration.
You can also add binaryAnnotation, i.e. some basic key/value parameters to give additional info about the message/flow you want to track (the url, the result of operation, input parameters…).
Endpoints can be declared in the message to track the IP and port used when receiving or sending it.
Each message declares its type:
cs: client send
cr: client receive
sr: server receive
ss: server send
Zipkin will display the event when the end of the operation is declared (cr or ss).
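To make the fields above concrete, here is a minimal Python sketch that builds two spans of the same trace, a parent and a child. It is not the biomaj-zipkin API: the helper names (`new_id`, `build_span`, `send_spans`), the service names, the IP/port and the local server URL are all assumptions made for illustration, and the JSON layout follows my reading of the v1 data model page linked at the end of this post.

```python
import json
import random
import time
import urllib.request


def new_id():
    """Return a random 64-bit identifier as 16 lowercase hex characters."""
    return "%016x" % random.getrandbits(64)


def build_span(name, trace_id, start_us, duration_us, service,
               parent_id=None, server_side=True, tags=None):
    """Build one span as a plain dict (hypothetical helper)."""
    # Endpoint values are placeholders; use the real IP/port of the service.
    endpoint = {"serviceName": service, "ipv4": "127.0.0.1", "port": 8080}
    # sr/ss on the server side, cs/cr on the client side.
    begin, end = ("sr", "ss") if server_side else ("cs", "cr")
    span = {
        "traceId": trace_id,          # shared by every span of the workflow
        "id": new_id(),               # the spanId of this operation
        "name": name,
        "timestamp": start_us,        # start of the call, in microseconds
        "duration": duration_us,      # in microseconds
        "annotations": [
            {"timestamp": start_us, "value": begin, "endpoint": endpoint},
            {"timestamp": start_us + duration_us, "value": end,
             "endpoint": endpoint},
        ],
        "binaryAnnotations": [
            {"key": key, "value": str(value), "endpoint": endpoint}
            for key, value in (tags or {}).items()
        ],
    }
    if parent_id is not None:
        span["parentId"] = parent_id  # spanId of the calling task
    return span


def send_spans(spans, url="http://localhost:9411/api/v1/spans"):
    """POST a list of spans as JSON (the URL assumes a local zipkin server)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(spans).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return urllib.request.urlopen(request)


now_us = int(time.time() * 1000000)
trace_id = new_id()
root = build_span("workflow", trace_id, now_us, 5000, "biomaj-daemon")
child = build_span("download", trace_id, now_us + 1000, 2000,
                   "biomaj-download", parent_id=root["id"],
                   tags={"url": "/api/download"})
# send_spans([root, child])  # uncomment when a zipkin server is running
```

The child reuses the traceId of the root and points to it through parentId, which is what lets the web interface draw the hierarchy.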
Here is an example output from a BioMAJ workflow
Some spans are in a common micro service, describing the flow of operation, while others are executed in remote micro services.
You can see what tasks are executed in parallel, sequentially, where and when.
With such a tool, you can easily debug your distributed application and spot bottlenecks (where and when a task takes time, and whether or not tasks are parallelized).
Creating a library is also really easy.
Some existing libraries offer sampling, i.e. tracing 1 request out of X, to get a quick overview of a production system.
In BioMAJ, we track requests on demand, so no sampling is implemented.
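For reference, the "1 request per X" idea can be sketched in a few lines of Python. This is only an illustration of the principle, not how any existing zipkin client does it: the `EveryNthSampler` class and its `force` flag are hypothetical names, and the `force` flag stands in for an on-demand request like the ones we use in BioMAJ.

```python
import itertools


class EveryNthSampler:
    """Decide whether to trace a request: one out of every `n` (hypothetical)."""

    def __init__(self, n):
        if n < 1:
            raise ValueError("n must be >= 1")
        self.n = n
        self._counter = itertools.count()

    def should_trace(self, force=False):
        # An explicit on-demand request is always traced and is not counted.
        if force:
            return True
        # Otherwise trace the 1st, (n+1)th, (2n+1)th, ... request.
        return next(self._counter) % self.n == 0


sampler = EveryNthSampler(4)
decisions = [sampler.should_trace() for _ in range(8)]
```

A counter keeps the example deterministic; a production system would more likely use a random draw so that periodic traffic does not always hit the same requests.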
Zipkin API: http://zipkin.io/zipkin-api/#/paths/%252Fspans
Some message examples: http://zipkin.io/pages/data_model.html
Using API info: http://zipkin.io/pages/instrumenting.html