Golang Observability: Using the New Pprof Web UI to Debug Memory Usage

Intro

Go, for good reason, has become an incredibly popular language for everything from web applications to DevOps tools. We lean on Golang heavily here at Honeycomb, and naturally we are observability nerds, so I got ridiculously excited to use the new pprof web UI to gain more insight into the performance of our Golang programs, both internal and open source.

Let’s talk about using the new web UI to solve an issue in one of our Go programs, honeyelb.

Using New pprof UI

Recently we announced the new Honeycomb AWS Elastic Load Balancer integration. This is a tool to process the otherwise opaque access logs Amazon gives users to understand the request patterns that are hitting their load balancers. You can now use our open source tool to send this information to Honeycomb for later querying, allowing you to answer questions like “Which servers are performing poorly?” or “When are my applications returning a lot of errors?”.

Lucky for us, when we announced the tool our lovely customers jumped in and started trying it out right away. In the process of doing so, one identified a memory usage issue: the Go runtime was running out of memory and killing the program!

Rather than guess and guess until I identified the source of the issue, I opted to use the new pprof web UI mentioned above to try and track down hotspots. The command line interface to pprof can be a bit unfriendly, but the team is making great strides towards better usability. Let’s take a look at using this to solve an issue.

Getting Started

To install the pprof tool with support for the web UI:

$ go get github.com/google/pprof

This provides you with a pprof binary you can use the -http flag with to open an interactive program profile in your web browser. More on that in a second. You may also need to install graphviz, which would be something like sudo apt-get install -y graphviz on Debian-based Linuxes or brew install graphviz on OSX.

After installing pprof, I needed to actually get a profile to work with. Luckily, the Go runtime makes this fairly straightforward with some of the methods built in to the standard library. The honeyelb program is long-running (intended to eventually be used as a set-and-forget daemon), so I added this code into the init block of the main source file:

go func() {
	time.Sleep(120 * time.Second)
	memprof, err := os.Create("mem.pprof")
	if err != nil {
		logrus.Fatal(err)
	}
	pprof.WriteHeapProfile(memprof)
	memprof.Close()
}()

This code spins off a background goroutine (go func()…) which sleeps for two minutes (to allow the program to begin running and accumulate some memory usage for us to later take a look at), then writes a profile of that memory to a file mem.pprof in the working directory where the program is invoked.

Due to the solo colon, pprof should open a web browser automatically pointing towards an arbitrary port on localhost (for instance, it chose :38687 on my machine – you can also specify a port if desired) running the web UI.

Once booted up I saw something like this:

Generally speaking, the larger a box is, the higher the memory usage of that section of the program is. You can click around in this graph, zoom in and out, etc.

You can see that one box in the above jumps out as larger than the others: it references the NewEvent method in libhoney-go, used to create a new “event” (eventually rendered to JSON blob) to send to Honeycomb. OK, interesting, but how about a numerical / tabular view? We can click View > Top in the upper left hand corner to see this.

This shows us a list of the most memory-intensive functions in the program, sorted by flat usage.

Looking at the numbers in the Flat % column, we can see that runtime.hashGrow and runtime.mapassign (internal Golang methods related to map usage) are clearly the most memory intensive.

So, we are clearly eating up memory in maps, and libhoney.NewEvent seems a likely suspect for why. How can we verify this? One way is to highlight the greedy methods like shown above, and click Refine > Focus to narrow down our list of functions to the ones in the code path for this map usage (i.e., for the highlighted methods).

Once I dug down into this level of detail, I did indeed confirm that libhoney.NewEvent was a noteworthy source of increased memory usage. Knowing that this was the greedy method, I started to reason about which code paths would lead up to it. In doing so I realized that we were spinning up an excessive number of Publisher structs (used to “publish” the events parsed from the ELB access logs to Honeycomb) when we only needed one. Therefore I made this change to only spin up one. I hadn’t noticed the memory inflation in my local testing, but putting the program through its paces made the issue clear.

A new pass through the profile reveals that memory usage is now saner, and runtime.makechan and bytes.makeSlice are our new foes in that regard. However, pprof can help us understand that this seems related to JSON marshaling and is something that’s likely a bigger task to fix (if it becomes a problem that needs fixing at all).

Conclusion

Tools like this new pprof web UI really highlight the importance of an “observability mindset”. It’s becoming increasingly difficult to reason about what’s going on inside our systems, and tools like Honeycomb are emerging to help combat the fog of war. By reducing our unknown unknowns to knowns (or at least known unknowns), we can be prepared to fix issues proactively rather than when things are on fire. And we can, of course, be a lot more prepared for unforeseeable problems when they inevitably come up.

Many thanks to rakyll and the rest of the Golang team for their great work! I’m looking forward to seeing more in this area such as a built-in flame graph implementation.

Until next time, observe all the things! And please do sign up to try out Honeycomb if you haven’t already. :)

Have thoughts on this post? Let us know via Twitter @honeycombio.