OpenTracing with Platform Instrumentation for Enhanced Visibility

Distributed tracing is an essential component of microservices-based architectures. Moving to microservices makes it harder to see what is happening across services, and this is where distributed tracing helps. So far, distributed tracing has focused mainly on application-specific instrumentation, such as the latency of function calls.

Our focus has been to explore different ways to complement application instrumentation with platform (hardware and operating system) specific data. Examples of platform specific instrumentation data are processor execution efficiency (cycles per instruction), memory bus bandwidth, I/O bus bandwidth and so on.

There are scenarios where combining application and platform instrumentation data helps pinpoint an issue. For example, with a memory-intensive workload, memory capacity may still be available while memory bus bandwidth is starved, which hurts performance. Having this data available in the context of the application helps identify the root cause of such a performance issue.

Our initial work was primarily around making the platform instrumentation data available to the orchestration engines for optimal placement decisions. You can read more about our previous work here – https://goo.gl/ZU9d9V

Recently our focus has been to add support for platform instrumentation data in distributed tracers.

Based on a few internal prototypes, we approached the OpenTracing community with our idea, and today we have initial support for platform metrics in OpenTracing and the Zipkin backend.
Kudos to Hemant Shaw, who worked with the community to get this done. And of course a big thanks to the maintainers, namely Yuri Shkuro, Ben Sigelman and Bas Van Beek, for their help in making this happen.

The following diagram gives an overview of the components involved when using OpenTracing with Zipkin backend.

[Diagram: opentracing-perfevents]

We use the Linux ‘perf’ facility to retrieve platform instrumentation data. If you are new to ‘perf’, I suggest reading the following article: http://www.brendangregg.com/perf.html

At a minimum, these steps are required to enable capturing of platform instrumentation data in your application with OpenTracing. Currently only ‘golang’ applications are supported. Pull requests are welcome to enable support for other languages.

Step-1
Ensure ‘perf’ is set up on your Linux host
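
One quick way to see whether the host will let an application read perf counters is to look at the kernel's perf_event_paranoid setting. The sketch below is only an illustration and is not part of perfevents: the helper names parseParanoid and describeParanoid are hypothetical, and the descriptions paraphrase the kernel's sysctl documentation. Some distributions patch in additional levels, so treat the output as a hint rather than a verdict.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// paranoidPath is the Linux sysctl file that controls perf access
// for users without CAP_SYS_ADMIN.
const paranoidPath = "/proc/sys/kernel/perf_event_paranoid"

// parseParanoid (hypothetical helper) converts the raw file contents
// into an integer level.
func parseParanoid(raw string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(raw))
}

// describeParanoid (hypothetical helper) paraphrases the kernel
// documentation for each level. Restrictions are cumulative.
func describeParanoid(level int) string {
	switch {
	case level < 0:
		return "almost all events are allowed for all users"
	case level == 0:
		return "raw tracepoint access needs CAP_SYS_ADMIN"
	case level == 1:
		return "CPU event access also needs CAP_SYS_ADMIN"
	default:
		return "kernel profiling also needs CAP_SYS_ADMIN"
	}
}

func main() {
	raw, err := os.ReadFile(paranoidPath)
	if err != nil {
		// Typical on non-Linux hosts or in restricted containers.
		fmt.Println("could not read perf_event_paranoid:", err)
		return
	}
	level, err := parseParanoid(string(raw))
	if err != nil {
		fmt.Println("unexpected file contents:", err)
		return
	}
	fmt.Printf("perf_event_paranoid = %d: %s\n", level, describeParanoid(level))
}
```

If the level is restrictive, running the application with the needed capability or lowering the sysctl value are the usual options; which one is appropriate depends on your environment.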

Step-2
In your application code, initialise the OpenTracing backend with the perfevents observer

obs := perfevents.NewObserver()
tracer, err := zipkin.NewTracer(…, zipkin.WithObserver(obs))

Specify the platform instrumentation data you need to collect as part of the trace span.

tracer.StartSpan("new", opentracing.Tag{"perfevents", "cpu-cycles, instructions"})

This is all that is required to start collecting platform instrumentation data as part of your traces.

Note that the current version of the code only supports the following generic hardware events: cpu-cycles, instructions, cache-references, cache-misses, branch-instructions, branch-misses, bus-cycles.
Going forward we’ll add architecture-specific hardware events.
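
Since only a fixed set of generic events is supported, it can help to validate the requested event names before putting them into the ‘perfevents’ tag. The following is a minimal sketch under stated assumptions: perfEventsTagValue is a hypothetical helper, not part of perfevents, and joining names with a bare comma (no space) is my assumption about a safe format, since the post does not say how the observer treats whitespace.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// supportedEvents mirrors the generic hardware events listed in the post.
var supportedEvents = map[string]bool{
	"cpu-cycles":          true,
	"instructions":        true,
	"cache-references":    true,
	"cache-misses":        true,
	"branch-instructions": true,
	"branch-misses":       true,
	"bus-cycles":          true,
}

// perfEventsTagValue (hypothetical helper) trims each event name, rejects
// names outside the supported set, and joins the rest with commas.
func perfEventsTagValue(events []string) (string, error) {
	if len(events) == 0 {
		return "", errors.New("no perf events requested")
	}
	cleaned := make([]string, 0, len(events))
	for _, e := range events {
		name := strings.TrimSpace(e)
		if !supportedEvents[name] {
			return "", fmt.Errorf("unsupported perf event %q", name)
		}
		cleaned = append(cleaned, name)
	}
	return strings.Join(cleaned, ","), nil
}

func main() {
	value, err := perfEventsTagValue([]string{"cpu-cycles", " instructions"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The result would be used as the value of the "perfevents" span tag.
	fmt.Println(value)
}
```

Failing early on an unknown name keeps a typo from silently producing a span without platform data.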

If you are interested in the initial discussions, take a look at the following GitHub issue: https://github.com/opentracing/specification/issues/36

Let’s see all of these in action in the following short video.


The application trace with platform data looks like the following in Zipkin:

[Screenshot: zipkin_perfevents]
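
Once counters such as cpu-cycles and instructions are attached to a span, derived metrics like cycles per instruction (CPI) are simple ratios. The sketch below assumes the raw counter values have already been read out of the trace. The counters struct, its field names, and the sample numbers are hypothetical and are not taken from perfevents, Zipkin, or a real run.

```go
package main

import (
	"errors"
	"fmt"
)

// counters holds raw perf values for one span.
// The type and field names are illustrative only.
type counters struct {
	CPUCycles       uint64
	Instructions    uint64
	CacheReferences uint64
	CacheMisses     uint64
}

// cpi returns cycles per instruction; lower generally means the
// processor is executing more efficiently.
func (c counters) cpi() (float64, error) {
	if c.Instructions == 0 {
		return 0, errors.New("no instructions counted")
	}
	return float64(c.CPUCycles) / float64(c.Instructions), nil
}

// cacheMissRatio returns cache misses as a fraction of cache references.
func (c counters) cacheMissRatio() (float64, error) {
	if c.CacheReferences == 0 {
		return 0, errors.New("no cache references counted")
	}
	return float64(c.CacheMisses) / float64(c.CacheReferences), nil
}

func main() {
	// Made-up sample values, chosen only to keep the arithmetic obvious.
	c := counters{CPUCycles: 3000, Instructions: 2000, CacheReferences: 400, CacheMisses: 100}

	if v, err := c.cpi(); err == nil {
		fmt.Printf("CPI: %.2f\n", v) // prints "CPI: 1.50"
	}
	if v, err := c.cacheMissRatio(); err == nil {
		fmt.Printf("cache miss ratio: %.2f\n", v) // prints "cache miss ratio: 0.25"
	}
}
```

Comparing such ratios between spans of the same operation, for example on the x86_64 and ppc64le nodes, is one way the platform data could be put to use alongside latency.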

My environment is a heterogeneous Docker swarm cluster with Intel (x86_64) and Power (ppc64le) nodes. I have taken the sock-shop microservices demo application and modified the ‘catalogue’ service to collect platform instrumentation data.

The modified code is available in my GitHub repository.

# git clone https://github.com/bpradipt/catalogue.git
# cd catalogue
# make image-ppc64le

This will create the catalogue Docker image for Power (ppc64le).

If you would like to run this on an Intel (x86_64) system, then after cloning the code, build the Docker image using the following command:

# make image-amd64

Please give it a try in your applications and let us know what you think. Happy tracing :-)

Pradipta Kumar Banerjee

I'm a Cloud and Linux/open source enthusiast, with 16 years of industry experience at IBM. You can find more details about me on LinkedIn.
