Tracing micro services

I recently switched the BioMAJ development to a micro service architecture. However, using micro services complicates debugging the application: you get logs in multiple apps/hosts/containers and it is not easy to follow the "stream" of the application.

Regarding logs, this can be quite easily solved using central logging with servers like Graylog.

But the problem remains of looking at the flow of the messages between the different micro services/applications, with timing info.

I decided to use Zipkin (http://zipkin.io/), a server with a simple API to collect spans (basically messages containing a trace identifier with additional info). The web interface then shows the flow of the messages with their hierarchy (app1 sends a message to app2, then receives an answer, …).

You can easily see when messages are sent and answers are received, and on which server (if a micro service is scaled to multiple instances).

There are some client libraries available; however, they did not match my needs as they were all web oriented. In BioMAJ we use web services with REST APIs, but also AMQP messaging with RabbitMQ. So I implemented a simple library to track my workflow and send span info to Zipkin (https://github.com/genouest/biomaj-zipkin).

This is quite simple in fact. You send HTTP requests with JSON to the Zipkin server, each message containing a few pieces of information.

Each message contains a traceId, a common identifier for all messages in your "workflow". Then a spanId defines a unique identifier that maps to the method you need to track.

It can also define a parentId, which refers to the spanId of the task that called your method (locally or from a remote service).

Finally, each span needs a timestamp (start of the call), a name that defines the current operation, and a duration.

You can also add binaryAnnotations, i.e. some basic key/value parameters giving additional info about the message/flow you want to track (the URL, the result of the operation, input parameters, …).

Endpoints can be declared in messages to track the IP and port used when receiving or sending the message.

Each message also declares the type of the event:

cs: client send
cr: client receive
sr: server receive
ss: server send

Zipkin will display the event when the end of the operation is declared (cr or ss).
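For illustration, here is a minimal sketch of sending such a span from Python (the service name and values are made up; it assumes a Zipkin server on its default port 9411 exposing the v1 /api/v1/spans endpoint, with timestamps and durations in microseconds):

import random
import time

import requests

# Assumption: a Zipkin server on its default port, with the v1 span API
ZIPKIN_URL = 'http://localhost:9411/api/v1/spans'

def new_id():
    # Zipkin v1 identifiers are 64-bit values sent as 16-char hex strings
    return format(random.getrandbits(64), '016x')

trace_id = new_id()  # shared by all spans of the workflow
span_id = new_id()   # unique to this operation

start = int(time.time() * 1000000)  # timestamps are in microseconds
# ... call the remote service here ...
duration = int(time.time() * 1000000) - start

endpoint = {'serviceName': 'biomaj-demo', 'ipv4': '127.0.0.1', 'port': 5000}
span = {
    'traceId': trace_id,
    'id': span_id,
    # 'parentId': caller_span_id,  # set when this span was triggered by another span
    'name': 'download_file',
    'timestamp': start,
    'duration': duration,
    'annotations': [
        {'timestamp': start, 'value': 'cs', 'endpoint': endpoint},            # client send
        {'timestamp': start + duration, 'value': 'cr', 'endpoint': endpoint}  # client receive
    ],
    'binaryAnnotations': [
        {'key': 'url', 'value': 'http://example.org/file', 'endpoint': endpoint}
    ]
}

# The API accepts a JSON array of spans
requests.post(ZIPKIN_URL, json=[span])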

Here is an example output from a BioMAJ workflow:

[Screenshot: Zipkin trace of a BioMAJ workflow]

Some spans are in a common micro service, describing the flow of operations, while others are executed in remote micro services.

You can see which tasks are executed in parallel or sequentially, where and when.

With such a tool, you can easily debug your distributed application and spot bottlenecks (where and when a task takes time, whether tasks are parallelized or not).

Creating a library is also really easy.

Some existing libraries offer sampling, i.e. tracing 1 request out of X, to get an overview of a production system at low cost.

In BioMAJ, we track requests on demand, so no sampling is implemented.


Zipkin API: http://zipkin.io/zipkin-api/#/paths/%252Fspans

Some message examples: http://zipkin.io/pages/data_model.html

Instrumenting a library with the API: http://zipkin.io/pages/instrumenting.html

Very simple rkt registry for meta discovery with signatures

rkt, the Docker alternative for containers, searches for containers with meta discovery: basically, it looks for the container at the URL specified in the container name.
To provide a basic registry for locally hosted rkt containers, here is a very simple program.

We put the rkt ACI files under a directory, let's name it /opt/rkt-registry.

Our server is named with the domain https://myserver.com, and we put an HTTPS proxy (Apache, Nginx, …) in front of it, which forwards/proxies requests to our application (see below).

Under this directory, you place your ACI files and signatures (see https://coreos.com/rkt/docs/latest/signing-and-verification-guide.html#distributing-images-via-meta-discovery) following the container names, for example:

/opt/rkt-registry
    => sample
        => pubkeys.gpg # The public key used to sign those containers
        => myserver.com-sample-test-latest.aci # hostname + container name with dashes + tag + extension
        => myserver.com-sample-test-latest.aci.asc # signature
        => myserver.com-sample-test2-mytag.aci
        => myserver.com-sample-test2-mytag.aci.asc # signature

Here is a basic Python Flask server that will manage the rkt discovery:

from flask import Flask
from flask import render_template, abort, send_file
from flask import request
import os
import logging

app = Flask(__name__)
app.config['REGISTRY'] = 'myserver.com'
app.config['REGISTRY_DIR'] = '/opt/rkt-registry'
# Override config from the RKT_REGISTRY_SETTINGS file if set in the environment
if 'RKT_REGISTRY_SETTINGS' in os.environ and os.path.exists(os.environ['RKT_REGISTRY_SETTINGS']):
    app.config.from_envvar('RKT_REGISTRY_SETTINGS')

# The container name is the path part of the URL, e.g. sample/test
@app.route("/<path:container>")
def discovery(container=None):
    logging.debug('#Container: ' + str(container))
    acdiscovery = request.args.get('ac-discovery', None)
    logging.warning('ac-discovery: ' + str(acdiscovery))
    registry = app.config['REGISTRY']
    file_prefix = registry + '-' + container.replace('/', '-')
    if acdiscovery == '1':
        # Return an html page with the meta information about this container,
        # see discovery.html below
        return render_template('discovery.html', container=container,
                               name=file_prefix, registry=registry)
    else:
        # Here, rkt requests the container or its signature, simply serve the file
        return send_file(os.path.join(app.config['REGISTRY_DIR'], container))

if __name__ == "__main__":
    app.run()

Here is the template answering the discovery request, placed under the templates directory of the Flask application:

<!DOCTYPE html>
<html lang="en">
<head>
<meta name="ac-discovery" content="{{registry}}/{{container}} https://{{registry}}/{{container}}/{{name}}-{version}.{ext}">
<meta name="ac-discovery-pubkeys" content="{{registry}}/{{container}} https://{{registry}}/{{container}}/pubkeys.gpg">
</head>
</html>

In this example, I do not manage the arch, etc. The template could be modified to match different container architectures with the following template: https://{{registry}}/{{container}}/{{name}}-{version}-{os}-{arch}.{ext}

In this case, of course, the container file should not be named myserver.com-sample-test-latest.aci but something like myserver.com-sample-test-linux-amd64-latest.aci; this depends on your image generation.

To convert a Docker image to a rkt image, you can use docker2aci:

docker save -o test.tar mydockerregistry.com/me/test # Export/save your Docker image
docker2aci test.tar # Convert it
# or convert directly from a Docker registry: docker2aci docker://mydockerregistry.com/me/test
mv mydockerregistry.com-me-test-latest.aci /opt/rkt-registry/sample/myserver.com-me-test-latest.aci # Rename to match your rkt registry layout
# Sign your new image following https://coreos.com/rkt/docs/latest/signing-and-verification-guide.html#distributing-images-via-meta-discovery to generate the .asc file and publish your public key

To fetch/run the image from rkt, one must of course trust the key first (see the paragraph "Trusting the example.com/hello key" in the guide above).

Container security

During the development of Bioshadock, a private Docker registry targeting bioinformatics tools, I wanted to provide users/developers with some security information about the hosted containers.

Indeed, one of the issues with containers is keeping them up to date against vulnerabilities. A container is usually built once and ready for download. But what if a security flaw is discovered afterwards in an embedded process (openssh, etc.)?

That's why I was very interested by the CoreOS Clair announcement. Clair is an open source project for the static analysis of vulnerabilities in appc and Docker containers. It basically checks a container for known CVEs, etc. from the Debian/Ubuntu and Red Hat databases and stores the result of those checks.

Clair however does not provide any CLI or web interface. You can easily start it with Docker (or build and run it locally) against a PostgreSQL database (again, local or in a container).

There is a simple tool, analyze-local-images, that helps importing a local image in Clair and getting a report. However, it did not fit for integration in my development environment and did not work when the Docker daemon listens on TCP instead of a Unix socket.

So I created a Python equivalent as a library, clair on PyPI, provided with a very simple example showing how to import an image and extract the known vulnerabilities. Code is available at https://bitbucket.org/osallou/clair.

The code is really simple, and should help you understand how to play with Clair.

Clair provides a simple REST API. All you need is to save a Docker image locally (as a tar file), extract the archive and read the manifest.json. In this file, you get the list of the layers of the image. For each layer, you inject the layer into the Clair API (its name abcd, its parent, and its path abcd/layer.tar). Of course, the layer file must be accessible by Clair via a host volume if it runs in a container. Clair will analyse each layer and make the result accessible via the API using the layer id.

The trick is that Clair does not scan images nor provide reports for them, only layers. If you want a report for an image, it is up to you to query Clair for all the layers of your image (and to import each layer for analysis), as in the sketch below.
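Here is a minimal sketch of that flow against the Clair v1 REST API (the Clair URL, local paths and image name are illustrative, and it assumes the manifest.json layout produced by docker save):

import json
import tarfile

import requests

# Assumptions: the Clair v1 API on its default port, and a working directory
# that the Clair process can also read (a host volume if Clair runs in Docker)
CLAIR_URL = 'http://localhost:6060'
WORK_DIR = '/tmp/clair/image'

# docker save -o image.tar myimage, then extract the archive
tarfile.open('image.tar').extractall(WORK_DIR)
with open(WORK_DIR + '/manifest.json') as manifest:
    layers = json.load(manifest)[0]['Layers']  # e.g. ['abcd.../layer.tar', ...]

# Inject each layer with its name, its parent and its path
parent = ''
for layer in layers:
    name = layer.split('/')[0]
    requests.post(CLAIR_URL + '/v1/layers', json={'Layer': {
        'Name': name,
        'Path': WORK_DIR + '/' + layer,
        'ParentName': parent,
        'Format': 'Docker',
    }})
    parent = name

# Reports are per layer: query each layer of the image for its vulnerabilities
for layer in layers:
    name = layer.split('/')[0]
    report = requests.get(CLAIR_URL + '/v1/layers/' + name,
                          params={'features': 'true', 'vulnerabilities': 'true'})
    for feature in report.json()['Layer'].get('Features', []):
        for vuln in feature.get('Vulnerabilities', []):
            print(name, feature['Name'], vuln['Name'], vuln.get('Severity'))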

Why layers? Just like for Docker itself, or its registry: layers can be shared among several images, and managing things layer by layer avoids rescanning and re-storing data about something already scanned.

For my first tests, Clair was really quick to scan the layers of my image, and it is very promising (free and open source). We have integrated it with our registry to provide security information about a container to its creator, but also to the end user. The creator can then decide to update his container, and a user may decide not to use a container due to critical flaws in it.

Of course, the process must be run at regular intervals, not just at container creation, to get up-to-date information.

GPU scheduling with Mesos and Docker

Docker Swarm is an easy solution to dispatch Docker containers in a cluster, but Swarm does not manage available resources. If you have 2 GPUs on a node, you can ask Swarm to send the container to a node where GPU resources are available, but it will not know whether one or both GPUs are already in use. You have to manage this on the application side.
With Mesos, you can define arbitrary resources, and the Mesos scheduler will send offers only with the available/remaining resources. This means that if a previous job specified it needs 1 GPU, the next offers will only present the remaining GPU resources.

For the moment, GPUs are not managed natively by Mesos, though there is work in progress on this (JIRA Epic: https://issues.apache.org/jira/browse/MESOS-4424).

However, here is a simple way to manage this if you use your own Mesos framework. This is what is used in the GoDocker project.

In the mesos-slave configuration, simply add additional resources:


admin@ip-10-72-136-115:/etc/mesos-slave$ cat resources
gpu_10.5:[0-1]

In this example, we define 2 GPUs on the node (range 0-1) with a label gpu_10.5 (to specify the kind of GPU, its version, etc.; it could also be something like gpu_nvidia_10.5). The label is important as the container must contain the drivers for this GPU device: the user must use the appropriate container matching the expected GPU.

With this resource present in the Mesos offers, you can manage the available resources for this node just like ports, cpu, etc., and decide to put your job on this node if the number of resources matches your requirements.

But this is only the first part of the job. Indeed, this tells Mesos to "reserve" a GPU device, but no device directory will be mounted automatically in the container. The next step is to know what should be added to the container definition.
To do this, we add additional information to the slave with attributes.


admin@ip-10-72-136-115:/etc/mesos-slave$ cat attributes
gpu_10.5_0:/dev/nvidia0,/dev/nvidiactl,/dev/nvidia-uvm
gpu_10.5_1:/dev/nvidia1,/dev/nvidiactl,/dev/nvidia-uvm

When we use a GPU device, resource 0 for example, we look at the offer attributes and search for the resource label postfixed with the index of the resource. In our example, for GPU 0 we search for gpu_10.5_0. This attribute contains the list of the volumes to use in the container.

The framework will add to the container the list of volumes defined by this attribute, as sketched below.
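For illustration, here is a sketch of that lookup in a Python framework (the helper name is mine, and it assumes the attributes file shown above):

def gpu_devices_from_offer(offer, gpu_label, gpu_index):
    '''
    Illustrative helper: find the attribute named after the reserved GPU
    resource (e.g. gpu_10.5_0) and return the device paths it lists.
    '''
    attr_name = '%s_%s' % (gpu_label, gpu_index)
    for attribute in offer.attributes:
        if attribute.name == attr_name:
            return attribute.text.value.split(',')
    return []

# In the task definition, mount each device path in the container
for device in gpu_devices_from_offer(offer, 'gpu_10.5', 0):
    volume = container.volumes.add()
    volume.host_path = device
    volume.container_path = device
    volume.mode = 1  # mesos_pb2.Volume.Mode.RW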

This is quite easy: the container only sees the GPU devices needed for the job, and the next offers will no longer present those devices to the next jobs.

Docker Swarm vs Mesos

I will not present here what Docker Swarm or Mesos are, but rather why I prefer Mesos over Swarm in some situations.

Swarm is perfect in a development environment: super easy to install/configure and ready to run in a minute. Basically, it is a Docker proxy, proxying your Docker commands to a node with the resources you requested (X cpu, Y RAM, disk==ssd for example). Swarm is fully compatible with the Docker API, which is why it is so easy to use and why it is quite transparent to switch from a single machine with Docker to a cluster of nodes. Though Swarm fits in a production environment too, things start to be more complex when trying to manage some types of resources (port mappings, number of disk volumes, GPUs, …). Indeed, Swarm hides all the node stuff from you (and most of the time this is fine), but it also means that if you need a port, for example, you have to manage yourself the availability of that port on the remote node (was it already used by another container?). So your program needs to record used ports or resources, and keep/free them according to container status.
But the problem does not stop there. Let's use a GPU scenario, where a node has 2 GPU cards. This means that you cannot schedule more than 2 containers (each requiring 1 GPU) on this node at the same time. The issue is that Swarm has no idea of the number of GPUs on this node, and will continue to put containers on it if requested (with a constraint device==gpu for example). And you cannot tell Swarm to stop sending containers to this node (or this ends with complex constraints, something like: device==gpu and node!=mynode1 and node!=mynode3 and …).

Mesos, on the contrary, gives you visibility on the nodes. Though the install is a little more complex, it is still quite easy to set up. However, you have to develop/use a specific framework (see my previous post) to dialog with Mesos. Mesos sends offers at regular intervals, and the framework tells Mesos what to do on the nodes in the offer. The offer will present, for example, 3 nodes with their characteristics (cpu, ram, …).
The interesting thing is that you can declare on each Mesos slave the additional characteristics of the node. You can specify for example a port range ([20000,30000]), or a list/set of GPU devices (["/dev/nvidia1", "/dev/nvidia2"]), etc.
All those characteristics appear in the offer. When placing a container, your program, via the framework, can request X cpu, Y ram and Z ports/gpu devices/… Mesos will record the requested resources and will not propose them anymore in the next offers until the container is released (finished or killed).

Example:

A node has the port range [20000, 30000]. I receive an offer with available ports [20000, 30000] and request the first port. The next offer will present [20001, 30000].
When the container terminates, offers will again show [20000, 30000].
For the previous GPU example, once both GPU devices are used on the node, Mesos will send an offer with 0 available GPU devices, and the program can decide to place the container on another node or wait for available resources.

This means we can define arbitrary sets of resources, managed by the Mesos framework. The program itself does not have to track the status of each resource, contrary to Swarm.

The constraint, however, is that the Mesos API is not Docker compatible, and you may not find all available options. A Docker network equivalent is not available in the current Mesos (0.26), for example.
You also have to use a specific API and framework to dialog with Mesos.

My opinion is that Mesos is perfect if you need to manage sets of resources and keep control on node placement (while not managing the nodes themselves). Swarm will be better in the other cases, if you don't care about placement and don't need specific resources.

Creating a Mesos Docker Framework in Python

It is quite easy to develop an Apache Mesos framework to use the infrastructure and submit tasks on it.

However, there is little documentation for Python usage, or Docker-related usage. Though this is not really complex, I decided to give a quick-start/example here based on what I did for one of my projects.

One prerequisite, anyway, is the definition of the objects you will manipulate (in Python or other languages). This is mesos.proto, available here: https://raw.githubusercontent.com/apache/mesos/master/include/mesos/mesos.proto

Requirements: you should first install the mesos.interface and mesos.native packages for Python/Mesos development (not available on PyPI yet). If you build Mesos locally, you will find them under the build directory.

Here is the basis of a framework:

import logging
import mesos.interface
from mesos.interface import mesos_pb2
import mesos.native
class MyMesosScheduler(mesos.interface.Scheduler):

    def __init__(self, implicitAcknowledgements, executor):
        self.implicitAcknowledgements = implicitAcknowledgements
        self.executor = executor

    def registered(self, driver, frameworkId, masterInfo):
        logging.info("Registered with framework ID %s" % frameworkId.value)

    def resourceOffers(self, driver, offers):
        '''
        Basic placement strategy (loop over the offers and try to place as many tasks as possible)
        '''
        for offer in offers:
            logging.info(offer)
            # Let's decline the offer for the moment
            driver.declineOffer(offer.id)

    def statusUpdate(self, driver, update):
        '''
        when a task is over, killed or lost (slave crash, ....), this method
        will be triggered with a status message.
        '''
        logging.info("Task %s is in state %s" % \
            (update.task_id.value, mesos_pb2.TaskState.Name(update.state)))

    def frameworkMessage(self, driver, executorId, slaveId, message):
         logging.info("Received framework message")

Now let's "use" this framework with a "main" program. Most of this is available in the documentation or in the Python examples, so I will go quickly, setting only "minimal" information (skipping credentials, …).

if __name__ == "__main__":
    executor = mesos_pb2.ExecutorInfo()
    executor.executor_id.value = "mydocker"
    executor.name = "My docker example executor"

    framework = mesos_pb2.FrameworkInfo()
    framework.user = "" # Have Mesos fill in the current user.
    framework.name = "MyMesosDockerExample"

    implicitAcknowledgements = 1

    framework.principal = "docker-mesos-example-framework"
    mesosScheduler = MyMesosScheduler(implicitAcknowledgements, executor)
    driver = mesos.native.MesosSchedulerDriver(
         mesosScheduler,
         framework,
         '127.0.0.1:5050') # I suppose here that mesos master url is local

    driver.run()  # or driver.start(). run() blocks until the driver is stopped,
                  # while start() returns immediately. Blocking is not mandatory if
                  # you manage the wait yourself, as update messages run in a separate thread.

Now framework can be executed, and you will see in logs the received offers.

Now let's create a Docker task. This task will be launched on received offers. Let's create a new method in MyMesosScheduler:

def new_docker_task(self, offer, id):
    '''
    Creates a task for mesos

    :param offer: mesos offer
    :type offer: Offer
    :param id: Id of the task (unique)
    :type id: str
    '''
    task = mesos_pb2.TaskInfo()
    # We want a container of type Docker
    container = mesos_pb2.ContainerInfo()
    container.type = 1 # mesos_pb2.ContainerInfo.Type.DOCKER

    # Let's create a volume
    # container.volumes, in mesos.proto, is a repeated element
    # For repeated elements, we use the method "add()" that returns an object that can be updated
    volume = container.volumes.add()
    volume.container_path = "/mnt/mesosexample" # Path in container
    volume.host_path = "/tmp/mesosexample" # Path on host
    volume.mode = 1 # mesos_pb2.Volume.Mode.RW
    #volume.mode = 2 # mesos_pb2.Volume.Mode.RO

    # Define the command line to execute in the Docker container
    command = mesos_pb2.CommandInfo()
    command.value = "sleep 30"
    task.command.MergeFrom(command)  # MergeFrom allows to create an object and then
                                     # use this instance in another one: here we set the
                                     # new CommandInfo object as the task.command parameter.

    task.task_id.value = id
    task.slave_id.value = offer.slave_id.value
    task.name = "my sample task"

    # CPUs are repeated elements too
    cpus = task.resources.add()
    cpus.name = "cpus"
    cpus.type = mesos_pb2.Value.SCALAR
    cpus.scalar.value = 1

    # Memory is a repeated element too
    mem = task.resources.add()
    mem.name = "mem"
    mem.type = mesos_pb2.Value.SCALAR
    mem.scalar.value = 128

    # Let's focus on the Docker object now
    docker = mesos_pb2.ContainerInfo.DockerInfo()
    docker.image = "centos:7"
    docker.network = 2 # mesos_pb2.ContainerInfo.DockerInfo.Network.BRIDGE
    docker.force_pull_image = True

    # We could (optionally of course) use some ports available in the offer too
    ## First we need to tell mesos we take some ports from the offer, like any other resource
    #mesos_ports = task.resources.add()
    #mesos_ports.name = "ports"
    #mesos_ports.type = mesos_pb2.Value.RANGES
    #port_range = mesos_ports.ranges.range.add()
    #available_port = get_some_available_port_in_port_offer_resources()
    #port_range.begin = available_port
    #port_range.end = available_port
    ## We also need to tell docker to do mapping with this port
    #docker_port = docker.port_mappings.add()
    #docker_port.host_port = available_port
    #docker_port.container_port = available_port

    # Set docker info in container.docker
    container.docker.MergeFrom(docker)
    # Set docker container in task.container
    task.container.MergeFrom(container)

    # Return the object
    return task

With this code we generate a sample Docker container task to be used with received offers. Of course, the Docker containerizer needs to be activated on the slaves (mesos-slave.sh --master=127.0.0.1:5050 --containerizers=docker,mesos).
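By the way, the get_some_available_port_in_port_offer_resources() call in the commented port section above is left undefined; here is a minimal sketch of it (taking the offer as a parameter and simply returning the first available port):

def get_some_available_port_in_port_offer_resources(offer):
    '''
    Sketch: return the first port of the first "ports" range in the offer,
    or None if the offer contains no port resource.
    '''
    for resource in offer.resources:
        if resource.name == "ports":
            for port_range in resource.ranges.range:
                return port_range.begin
    return None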

In def resourceOffers(self, driver, offers), we can now use the task:

# Here I simply execute one sample task on the first offer, without any check
task_id = 0
for offer in offers:
    offer_tasks = []
    task = self.new_docker_task(offer, str(task_id))
    offer_tasks.append(task)
    task_id += 1
    driver.launchTasks(offer.id, offer_tasks)
    break

Of course, we would need to check the cpu/ram/… task requirements against the offer, as sketched below. But again, examples for this can be found elsewhere and it is not in the scope of this article.
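For completeness, here is a minimal sketch of such a check against the offer's scalar resources (the function name is mine):

def offer_matches(offer, needed_cpus, needed_mem):
    '''
    Minimal sketch: compare the scalar resources of an offer with the
    task requirements before calling launchTasks.
    '''
    available = {'cpus': 0.0, 'mem': 0.0}
    for resource in offer.resources:
        if resource.name in available:
            available[resource.name] = resource.scalar.value
    return available['cpus'] >= needed_cpus and available['mem'] >= needed_mem

In resourceOffers(), one would call offer_matches(offer, 1, 128) before launching the task, and driver.declineOffer(offer.id) otherwise.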

Authentication and message encryption

While recently searching for ways to authenticate access to an API I develop, I found JSON Web Token (JWT). It is currently a draft at the IETF (https://tools.ietf.org/html/draft-ietf-oauth-json-web-token-32) but already has multiple language bindings (http://jwt.io/).

With JSON Web Token, we have a way to manage web authentication without cookies and to exchange data securely. It provides encode/decode functions based on various methods.

In a web workflow, you can for example authenticate a user and extract the user info from the database. Then you encode your JSON user object with JWT to get a token, using a passphrase. The token is signed and contains, among other things, your object, with an optional expiration time encoded in the object itself. You then return the token to the requester, who can use it in the Authorization HTTP header of the next requests.
Your application only has to decode the JWT token and check that it is still valid (automatic). If it is OK, the decoded token tells you that the user is "authenticated" and you get the user data directly (no need to query the database again, provided of course you do not expect dynamic data here).

No cookie stored means no user info is stored on the computer, not even for the duration of the session.

There is a nice explanation with code example (nodejs + angular): https://auth0.com/blog/2014/01/07/angularjs-authentication-with-cookies-vs-token/

In Python, this is as simple as:

import jwt
token = jwt.encode({'some': 'payload'}, 'secret')  # signed with HS256 by default
data = jwt.decode(token, 'secret', algorithms=['HS256'])
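To illustrate the workflow described above, here is a sketch protecting a Flask route with such a token (the route and payload fields are illustrative):

from functools import wraps

import jwt
from flask import Flask, abort, request

app = Flask(__name__)
SECRET = 'secret'

def requires_token(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        token = request.headers.get('Authorization', '').replace('Bearer ', '')
        try:
            # Raises an exception if the signature is invalid or 'exp' has passed
            user = jwt.decode(token, SECRET, algorithms=['HS256'])
        except jwt.InvalidTokenError:
            abort(401)
        return func(user, *args, **kwargs)
    return wrapper

@app.route('/api/me')
@requires_token
def me(user):
    # The user data comes from the token itself, no new database query
    return user['name']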

Beyond this, you can also use JWT to exchange signed data. The encode/decode can use more sophisticated secrets than a passphrase. A passphrase is nice if you need to decode the data in a single place. But if we want to send data to a third party and expect him to verify it, we cannot share the same secret. With JWT we can use public/private keys through cryptographic methods to do so.

On one side, we sign the message using our private key and send it. On the other side, the other party verifies the message using the public key we gave him: without the private key, an attacker cannot forge your message (note that a signature guarantees authenticity and integrity, not confidentiality: anyone with the public key can read the payload).
This means it works for the web, but also for client/server communication. It is really easy to use and secure. All you need is JSON support and some cryptography modules (depending on the language bindings).
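In Python, a sketch of this with PyJWT could look as follows (it requires the cryptography package for RSA; the key file names are illustrative):

import jwt

# Keys generated beforehand, for example with:
#   openssl genrsa -out private.pem 2048
#   openssl rsa -in private.pem -pubout -out public.pem
with open('private.pem') as f:
    private_key = f.read()
with open('public.pem') as f:
    public_key = f.read()

# We sign with our private key...
token = jwt.encode({'some': 'payload'}, private_key, algorithm='RS256')

# ... and the third party verifies with the public key we gave him:
# without the private key, nobody can forge a valid token.
data = jwt.decode(token, public_key, algorithms=['RS256'])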

As I said, it is still a draft, so it may not fit yet for critical/high-security production systems… but for most developments/applications it is really worth a look!

Multi platform mobile developement with Cordova

Cordova (you may also know PhoneGap, a distribution of Cordova) is really suited for mobile development, for quick prototypes or for people who (like me) do not have time to spend on development, training and maintenance for each platform.

With Cordova, you build mobile applications for multiple targets (Android, iOS, Windows Phone, …) from a single code base, all based on HTML/JS/CSS. That's a must for web developers, as you develop apps using the same language and tools you use for your web applications. Cordova simply adds some plugins to access native features (contacts, camera, file system, …).

You first create your project, add the required plugins (camera for example) and the target platforms you expect.
Then you develop like a classic web application, with the libraries you want, etc.

When you build your application for a target, Cordova embeds your application with an engine just like the one in your browser (I think it is WebKit) and the needed native libraries. You have your package ready for the market.

You still have to run/debug your application on specific computers (a Mac for iOS, for example, which needs Xcode, etc.), but for the development itself, it does not matter.

Cordova applications won't be as nicely integrated as native apps and have a few integration limits, but for many applications it will do the job nicely.

A basic application is really quick to develop, test and package, and you don't need to learn each mobile system environment and language. So it will not fit all needs, nor maybe "professional" applications. But if you have limited time or resources, this is the perfect solution.

OpenStack for testing

I recently wanted to test OpenStack Swift to develop an application that stores its objects in an S3-like stack.
Installing OpenStack is quite a burden (lots of components, configuration, etc.) and definitely too much for a basic test environment.

I found however http://devstack.org/, which installs a complete OpenStack setup in an all-in-one configuration (everything installed on the same machine).

It took only a few minutes to get everything installed and available for testing. There was only one trick: Swift is not installed by default, but I simply had to modify the localrc file to add it (enable_service swift) and rerun the install.

This is not for production: it uses QEMU for virtualization, and everything is on a single machine and one network.

With this you can have on your computer (or in a VM) a complete OpenStack cloud environment with the APIs, Keystone, etc., locally, to test your app. No need to pay for an external service; you can even work with no network.

So, thanks to the DevStack team for easing developers' work! 🙂

Microsoft helps open source…

I recently wanted to add an Azure (Microsoft cloud platform) provider to the bootstrap-vz project for Debian. After a few mails, Microsoft kindly granted me a free MSDN account (time limited).
This MSDN account, besides giving access to tools, etc., gives me a monthly credit on the Azure platform that will help me code and test my Azure provider. So thank you, Microsoft, for your support of Debian!

I was at the TechDays 2014, and I saw an interesting talk on BabylonJS (http://www.babylonjs.com/), a 3D engine based on WebGL and JavaScript. It was a quick overview of course, but it looks really promising for easy 3D integration in web pages (but don't fool yourself: for a real 3D game, you will need designers!). It is a high-level programming library, removing the complex WebGL coding tasks. It is integrated with another library for physics and also has an export plugin for Blender.
The project is open source, and made by Microsoft guys, not as a Microsoft product but as free-time project.
By the way, the talk was led by David Catuhe and David Rousset, who gave a really nice and pleasant talk. David C. is a real "showman" and a funny guy, so thank you guys for your work and this fun time at the event!

This post is not a Microsoft ad ;-). I am a Linux guy working on Linux on a daily basis, but I think that this kind of support for Debian, and for open source more globally, is worth a "thank you"!