Modern diagnostics tools for C++ applications


Profiling, debugging, and investigating C++ applications doesn't have to be insanely hard. If you have been a C++ developer for many years, you might be used to memory tracing tools like Valgrind (with a potential 10x overhead), instrumentation-based profilers (which require recompilation and redeployment), and invasive debuggers (which can't be readily deployed and used in production environments).

The good news is that modern operating systems ship with frameworks and tools that make low-overhead, non-invasive, production-ready diagnostics a reality. In my NDC Oslo talk, I'll be taking a closer look at diagnosing memory leaks, high CPU utilization, blocked threads, heavy disk accesses, and many other production issues with freely available, low-overhead tools. Both Windows and Linux have them, and in this post we're going to take a closer look at the Linux side. (If you can't wait, the rough Windows equivalent for some of the Linux tools discussed below is Event Tracing for Windows.)


Let's start with a high-CPU scenario. You have a C++ application running on Linux and consuming close to 100% CPU (on one core or several). Perhaps the issue can't even be reproduced in the development environment. You can use the perf multi-tool, which is developed as part of the Linux kernel tree, to perform efficient CPU sampling with a controllable overhead. perf captures a call stack of your application N times per second, giving you a clear picture of where you are spending most of your CPU time. Here is an example of how you'd run perf and profile a troublesome process:

# perf record -g -F 97 -p $(pidof myapp)

In the preceding command line, -g instructs perf to record call stacks, -F 97 means "take 97 samples per second", and the -p switch accepts a process id to profile. This can be done in production, without restarting the profiled process, and with a controllable overhead: if 97 samples per second slow you down too much, you can reduce the frequency.
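If you want something to point perf at while trying the command above, a deliberately CPU-bound function is enough. The following is only a sketch: the names is_prime and count_primes are made up for illustration and do not come from any real application.

```cpp
#include <cstdint>

// Hypothetical hot function: naive trial-division primality test.
// It is intentionally slow so that CPU samples land inside it.
bool is_prime(std::uint64_t n) {
    if (n < 2) return false;
    for (std::uint64_t d = 2; d * d <= n; ++d) {
        if (n % d == 0) return false;
    }
    return true;
}

// Hypothetical caller: counts the primes in [0, limit).
// In a sampled profile, this frame should appear below is_prime.
std::uint64_t count_primes(std::uint64_t limit) {
    std::uint64_t count = 0;
    for (std::uint64_t n = 0; n < limit; ++n) {
        if (is_prime(n)) ++count;
    }
    return count;
}
```

Call count_primes with a large limit from main, in a loop, to keep a core busy. One caveat worth knowing: with -g, perf walks frame pointers by default, so building with -g -fno-omit-frame-pointer usually gives more complete call stacks; perf's --call-graph dwarf mode is an alternative when rebuilding is not an option. At higher optimization levels the compiler may inline is_prime, in which case only count_primes shows up.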

The result of a perf profiling session is a file, which you then analyze using perf report or visualize using tools such as flame graphs. Here is an example flame graph of a MySQL process, captured by Brendan Gregg, the inventor of this stack trace visualization technique:

[Figure: flame graph of a MySQL process]

Note that the flame graph is interactive -- you can click to zoom, search, and hover over frames to see what's going on.

Memory Leak

Let's take a look at another example. Suppose you have a memory leak in a C++ application running in production. You can't run the application under a tool like Valgrind because of its sheer overhead. But what you can do is use a tool with a similar approach, which instruments memory allocation and free requests and produces a summary of which call stacks in your application were responsible for a lot of memory that wasn't reclaimed. I've written one such tool, called memleak, which is part of the BCC project. It is based on the BPF kernel technology, which makes it possible to build low-overhead tracing tools that are designed for production use.

Here's what a memleak run can look like. You run it alongside the leaking application, and instruct it to collect leaking call stacks:

# memleak -p $(pidof myapp) | stdbuf -oL c++filt

At configurable intervals, memleak dumps out the call stacks responsible for unreclaimed memory. We pipe the output through c++filt, which performs name demangling so we see the original C++ method names instead of the mangled compiler-generated names. Here's a sample call stack from memleak (heavily trimmed for brevity):

[20:23:57] Top 10 stacks with outstanding allocations:
    12582912 bytes in 3 allocations from stack
        operator new(unsigned long)+0x18 []
        std::back_insert_iterator<std::vector<…>>(…)+0x99 [wordcount]
        word_counter::word_count[abi:cxx11]()+0xe7 [wordcount]
        main+0x97 [wordcount]
        __libc_start_main+0xf1 []

The leaking stack is clearly demarcated, and it becomes apparent where the leak is coming from: word_counter::word_count leaks memory by inserting it into a vector.
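To make that pattern concrete, here is a hypothetical sketch of the kind of code that could produce a stack like the one above. The source of the wordcount program is not shown in this post, so everything here is an assumption: the class layout, the member name words_, and the signature of word_count (which takes a string here, unlike the one in the trace) are all invented for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical reconstruction, not the real wordcount source.
class word_counter {
public:
    // Returns the number of whitespace-separated words in `text`.
    // The bug: every word is appended to a long-lived member vector
    // that is never cleared, so memory grows with each call.
    std::size_t word_count(const std::string& text) {
        std::istringstream in(text);
        const std::size_t before = words_.size();
        // The allocations happen here, through the back_insert_iterator.
        std::copy(std::istream_iterator<std::string>(in),
                  std::istream_iterator<std::string>(),
                  std::back_inserter(words_));
        return words_.size() - before;
    }

    // Number of words retained so far; only ever goes up.
    std::size_t retained() const { return words_.size(); }

private:
    std::vector<std::string> words_;  // grows without bound
};
```

In a sketch like this, the fix would be to collect the words into a local vector that is destroyed when word_count returns, or to count them with std::distance over the stream iterators without storing anything.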


There are many more tools like the ones above, which can be safely used in a production environment to diagnose, profile, and investigate C++ applications. These tools work on Linux and Windows, and as a C++ developer it is your responsibility to know about them and apply them as necessary. Thanks for reading, and I look forward to seeing you at my NDC Oslo talk!

You might also like my blog, where I post extensive articles, and my Twitter, where I post short snippets and tools in progress.