Quick Start

In this quick start, we will walk through the initial steps of exploring data using Honeycomb. Once you have finished the tutorial, you will be better equipped to begin sending and visualizing your own data with our integrations and SDKs.

We will assume the role of a DevOps professional who is debugging reports of application slowness from users. To pinpoint the source of the issue, we will work with the Slow App dataset, which every account has access to by default. The Slow App dataset contains events that describe various aspects of the HTTP requests from users that our application has served.

This walkthrough is also available to watch in video form if you prefer.

Summary

In this tutorial, we will:

  1. Construct an initial query to start analyzing our data
  2. Iterate on our initial query to dig deeper into likely problem areas
  3. Learn about how markers annotate significant changes such as code deploys
  4. Break down and order our events to quickly identify specific problem areas

Direct instructions have been denoted in purple boxes like so:

This is a direct instruction.

The estimated time to completion, including reading, is 25 minutes.

You’ll need a Honeycomb account in order to follow along with this tutorial. If you don’t have one yet, please sign up for one. By default, users have access to the Slow App dataset. If you landed here from the Honeycomb UI, you should already be in the Query Builder for that dataset, ready to begin.

The Setup

As mentioned above, in this scenario we are playing the role of someone in charge of developing and maintaining a web application. We have received reports of application slowness from quite a few of our users, so we are currently investigating. We assume that the integration to send request data to Honeycomb has already been set up, and we are exploring the data our app has sent (and is continuing to send) to Honeycomb. At the end of the tutorial, we will link to some references to help you send this data yourself, but for now we will focus on querying existing data.

We know that the events we have sent to Honeycomb have several properties which may be relevant to debugging. Here is a short description of some of the fields we will work with.

| Field Name | Description | Example Value |
| --- | --- | --- |
| endpoint | The URL routing pattern matching the request. | /products/:product_name |
| response_time_ms | The total amount of time in milliseconds spent serving the request. | 200 |
| fraud_latency_ms | The amount of time in milliseconds the request spent calling a fraud detection service on which our application depends. | 50 |
| mysql_latency_ms | The amount of time in milliseconds the request spent on database queries. | 15 |
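If it helps to picture the raw data, each event is essentially a flat set of key/value pairs recorded for one request. Here is a minimal sketch of what a single Slow App event might look like, built from the example values in the table above; real events carry additional fields.

```python
# Hypothetical single event, using the example values from the field table above.
# Real Slow App events include more fields than shown here.
event = {
    "endpoint": "/products/:product_name",  # URL routing pattern that matched the request
    "response_time_ms": 200,                # total time spent serving the request
    "fraud_latency_ms": 50,                 # time spent calling the fraud detection service
    "mysql_latency_ms": 15,                 # time spent on database queries
}
```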

Make sure you have the Slow App Dataset open in another tab or window to follow along.

Note: For the tutorial, the time range we are querying over (which is usually configurable, defaulting to the last few hours) will be locked to a pre-defined range to ensure consistency.

Our First Query

Honeycomb is built to be fast at calculations such as averages, counts, and percentiles. This encourages a workflow which is exploratory and fluid even with large amounts of information.

The slowness reported to us could be caused by a lot of factors, so let’s start asking some questions, and then answering them with queries that we have built.

We’ll start with:

“How long are requests taking in general?”

To visualize this question, we will use the AVG() (average) function from the CALCULATE box.

Note: If you are concerned about using averages vs. percentiles, we also support percentile functions such as P95() and P99(), but we will use averages here for simplicity’s sake.
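If it helps to see what these functions compute, AVG() is simply the arithmetic mean of response_time_ms within each time bucket, while P95() is the value that 95% of requests fall under. A toy sketch with invented sample values (the P95 line is a rough nearest-rank approximation):

```python
from statistics import mean

# Illustrative only: the sample values below are invented, not real Slow App data.
response_times_ms = [180, 210, 195, 1250, 205, 190, 230, 185]

avg_ms = mean(response_times_ms)  # what AVG(response_time_ms) computes for one time bucket

# A rough nearest-rank P95, to show how a percentile differs from the average.
p95_ms = sorted(response_times_ms)[int(0.95 * (len(response_times_ms) - 1))]

print(f"AVG: {avg_ms:.1f} ms, rough P95: {p95_ms} ms")
```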

Select AVG

Click the CALCULATE box in the UI and select AVG(). We are then prompted to select which numeric metric from the data we would like to visualize an average for. In this case we are interested in response_time_ms, the total time to serve a request, so click that and then click on “Run” on the right-hand side of the Query Builder.

Your first query

There seems to be a noteworthy increase in volatility, and a gradual increase in response time, in the latter portion of the graph. This is likely when our users started encountering slowness in the website. Since the slowness is measured on our server side as well, we know that the issue is likely within our app.

Note that you can click View Raw Data, on the top right hand side of the visualization, to see the raw data which has been passed to Honeycomb.

View Raw Data

This can be useful for exploring data in an unfamiliar dataset, or for eyeballing patterns that may be easier to spot in tabular form. You can get back to the graph view by clicking View Graph in the same location.

Click View Raw Data and have a look around at the raw data. When you’re finished, click View Graph in the same location to return to the graph.

View Raw Data allows you to inspect raw data

More Calculations

Raw data is good to get a feel for things, but there is a lot of it – too much for humans to take in on their own.

By continuing to use Honeycomb’s aggregation and visualization facilities, we can get answers much faster.

Let’s start with asking about one of the usual suspects:

“Did a recent deploy break the app?”

We can see that there is a marker on the graph indicating when our most recent deploy happened. But the increase in latency started happening quite a while before our recent deploy, so this issue seems unrelated to the newly deployed code.

Markers help annotate Honeycomb graphs
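The markers in this dataset were created for us, but for reference, teams typically add deploy markers automatically from their build pipeline via Honeycomb’s Markers API. A rough sketch of what that call might look like, assuming the requests library and a write key in an environment variable; the exact endpoint, dataset slug, and payload shown here are assumptions, so check the Markers API documentation before relying on them:

```python
import os
import requests

# Hedged sketch: creating a deploy marker from a CI/CD step. The URL, header,
# and payload shape are assumptions about the Markers API, and "slow-app" is a
# placeholder dataset slug.
response = requests.post(
    "https://api.honeycomb.io/1/markers/slow-app",
    headers={"X-Honeycomb-Team": os.environ["HONEYCOMB_WRITE_KEY"]},
    json={"message": "deployed new build", "type": "deploy"},
)
response.raise_for_status()
```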

“OK, but is it our app’s fault, or is there an issue with a service we are calling out to?”

We know that our app calls out to a fraud detection service, provided by a 3rd party. The time required to make these requests is tracked in the field fraud_latency_ms mentioned above. We can take a look at that side-by-side with our existing average of total request latency.

Click on the CALCULATE box to switch to edit mode again, so we can add another AVG. This time, select fraud_latency_ms. Click “Run” again.

Fraud duration average latency included

Hm, the time to call the fraud service looks uncorrelated. It hovers around 20-50 milliseconds, so that doesn’t explain our issue. We can rule that out.

Click the CALCULATE box and click the “X” on the left hand side of AVG(fraud_latency_ms) in the Builder to stop including it in our queries.

“What about the database, then?”

We know that we have an additional field, mysql_latency_ms, which represents the latency of talking to the database. So let’s take a look at the AVG for that as well.

Click the CALCULATE box to add another AVG, this time for mysql_latency_ms.

MySQL average latency included

Interesting! The increase in total request latency seems directly correlated to an increase in MySQL latency.

Breaking Down and Ordering

Our issue seems to be related to database latency, but what is causing the database to respond sluggishly?

We can use the BREAK DOWN and ORDER boxes in the Query Builder to continue to suss out the issue.

Break Down and Order

We noted above that there is a field, endpoint, which describes the URL routing pattern. Since we have this field available, we can use a BREAK DOWN of endpoint to see these averages calculated per endpoint pattern - i.e., we can visualize the average request latency for every reported endpoint pattern all at once.

Additionally, by setting ORDER to sort by AVG(response_time_ms) descending, we will also get a tabular view (located below the chart) showing which endpoint patterns have the highest latency.

This data will be visualized as many distinct colored lines on the chart. If any route is contributing to the problem more than the others in our app, we should be able to identify it quickly.

Click the BREAK DOWN box in the Query Builder, and select endpoint. Then, click the ORDER box and select AVG(response_time_ms) desc. Then, click “Run”.
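Conceptually, this combination of BREAK DOWN, CALCULATE, and ORDER behaves like a group-by with a sorted aggregate. A toy sketch of the equivalent computation over a few invented events:

```python
from collections import defaultdict
from statistics import mean

# Toy data standing in for Slow App events; all values are invented.
events = [
    {"endpoint": "/products/:product_name", "response_time_ms": 900},
    {"endpoint": "/products/:product_name", "response_time_ms": 1100},
    {"endpoint": "/cart", "response_time_ms": 120},
    {"endpoint": "/", "response_time_ms": 80},
]

# BREAK DOWN by endpoint: group events by the endpoint field.
by_endpoint = defaultdict(list)
for e in events:
    by_endpoint[e["endpoint"]].append(e["response_time_ms"])

# CALCULATE AVG(response_time_ms) per group, then ORDER by it descending.
rows = sorted(
    ((endpoint, mean(times)) for endpoint, times in by_endpoint.items()),
    key=lambda row: row[1],
    reverse=True,
)
for endpoint, avg_ms in rows:
    print(f"{endpoint:30} {avg_ms:8.1f} ms")
```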

Visualizing high cardinality data

If we scroll down to the tabular view ordering by response_time_ms descending, we can see that the slowest endpoint routes begin with /products/:product_name. We can also mouse over the rows in the table to see the associated lines highlighted in the visualization.

Mouse over the rows in the table below the graphs to highlight which lines they correspond to in the visualization.

Hover over rows to see them highlighted in the visualization

As we can see, Honeycomb allows us to explore fields which have a lot of unique possible values and quickly spot patterns in the data. Fields with many possible values are said to have high cardinality, and Honeycomb is well equipped to explore this type of data.
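Put differently, a field’s cardinality is just the number of distinct values it takes on across your events. A trivial illustration with invented values:

```python
# Cardinality is the count of distinct values a field takes on.
# `endpoints` stands in for the endpoint value of many events.
endpoints = ["/products/:product_name", "/cart", "/", "/products/:product_name"]
print(len(set(endpoints)))  # 3 distinct values here; a high-cardinality field may have thousands
```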

We seem to have significantly narrowed down the source of our slowness: it is related to endpoints with :product_name in them, and to sluggish database lookups. Perhaps we need to add an index to the table used to store product information, as :product_name is likely translated into the WHERE clause of a SQL query. Once we have rolled out this solution, we can continue to use Honeycomb to verify whether the issue is fixed, or whether further debugging is needed.
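To make the index hypothesis concrete: if product pages are served by a lookup like the one sketched below, an index on the lookup column is the first fix we would try. The table and column names here are guesses; the real schema is not part of this tutorial.

```python
# Hypothetical only: we are guessing at a `products` table with a `name`
# column backing the :product_name lookups.
lookup_query = "SELECT * FROM products WHERE name = %s"

# If that column is unindexed, each lookup can force a full table scan; an
# index like this is the kind of change we would test first.
candidate_index = "CREATE INDEX idx_products_name ON products (name)"
```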

Play Around

If you want to, feel free to continue playing with the Slow App dataset (it should always be accessible to your account via the direct link(s) in this tutorial). For instance, try adding P95(response_time_ms) and/or P99(response_time_ms) to your CALCULATE to see the high percentiles visualized alongside the averages we have been using. Or, try BREAK DOWN by other fields like hostname. Have fun: you don’t need to worry about breaking anything in this dataset. It’s meant for experimentation and learning.

Conclusion

If you encounter any issues or have questions for us, please send us a message in the Intercom chat box on the right hand side of the screen, or send an e-mail to support@honeycomb.io. We’d love to hear from you! If you are struggling or confused, we are happy to help.

Otherwise, proceed to sending your first events or rigging up our various SDKs and integrations for your technology of choice.
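When you do start sending your own events, the flow with one of the SDKs looks roughly like the sketch below (shown with the Python libhoney SDK; the write key and dataset name are placeholders, and the SDK documentation is the authoritative reference):

```python
import libhoney

# Rough sketch of sending one request event with the Python SDK. The write key
# and dataset name below are placeholders.
libhoney.init(writekey="YOUR_WRITE_KEY", dataset="my-app-requests")

event = libhoney.new_event()
event.add_field("endpoint", "/products/:product_name")
event.add_field("response_time_ms", 200)
event.add_field("mysql_latency_ms", 15)
event.send()

libhoney.close()  # flush any pending events before the process exits
```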