The recommendations in this document are exactly that: recommendations. They are not required to use Honeycomb, but they will help you get more out of it.
If you plan to collect data from multiple environments (such as prod, dev, and staging), we recommend creating a separate Dataset for each one and naming your Datasets accordingly, for example staging.queries. This keeps you from mixing prod and dev data without realizing it and drawing conclusions from misleading, combined results.
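To make the per-environment naming hard to get wrong, you can derive the Dataset name in one place and refuse unknown environments. This is a minimal sketch. The helper `dataset_for` and the `KNOWN_ENVIRONMENTS` set are hypothetical, not part of any Honeycomb SDK.

```python
# Hypothetical set of environments; adjust to match your deployment.
KNOWN_ENVIRONMENTS = {"prod", "staging", "dev"}


def dataset_for(base_name: str, environment: str) -> str:
    """Return an environment-prefixed Dataset name, e.g. 'staging.queries'."""
    if environment not in KNOWN_ENVIRONMENTS:
        # Fail loudly instead of silently sending events to a shared Dataset.
        raise ValueError(f"unknown environment: {environment!r}")
    return f"{environment}.{base_name}"
```

At startup you might call `dataset_for("queries", os.environ.get("APP_ENV", "dev"))`, where `APP_ENV` is an assumed environment variable. Raising on an unrecognized value is deliberate: a typo should stop the process, not leak events into the wrong Dataset.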
In general, all events in the same Dataset should be considered equivalent either in their frequency and scope, or in the system layer in which they occur. You should separate events into different Datasets when you cannot establish equivalency between them.
For example, you may find it useful to capture API and batch-processing events in the same Dataset if they share a request_id field. By contrast, events from two different environments that differ in only one value (such as an “environment” column) can look nearly identical and are therefore easy to confuse. Relying on everyone to apply an “environment” filter consistently is risky: forget it once, and your results silently mix environments.
Here is another example, from one of our lovely customers. They put API and web requests in the same Dataset because, for them, an API request is really a type of web request that has more fields. They include the extra API fields on API events even though web requests don’t have them: Honeycomb supports sparse data, and its filters let them look at only web requests, only API requests, and so on. When looking at something like overall traffic, however, they don’t want to filter out web requests.
For this same company, SQL queries reside in a different Dataset because SQL queries are not in any way equivalent to API data: There can be multiple (or no) SQL queries for a single API query, for instance.
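The customer’s split can be expressed as a small routing rule: web and API events share one Dataset (with sparse API fields), while SQL events get their own. This is a sketch under stated assumptions. The `kind` field and the Dataset names are hypothetical.

```python
from typing import Any, Dict

# Hypothetical Dataset names, following the customer example above.
REQUESTS_DATASET = "requests"  # web and API requests together
SQL_DATASET = "sql_queries"    # SQL queries kept separate


def route(event: Dict[str, Any]) -> str:
    """Pick a Dataset for an event based on a hypothetical 'kind' field."""
    kind = event.get("kind")
    if kind in ("web", "api"):
        # An API request is a web request with extra fields. Sparse data is
        # fine, so web events simply omit the API-only fields.
        return REQUESTS_DATASET
    if kind == "sql":
        # A request can trigger zero, one, or many SQL queries, so these
        # events are not equivalent to request events.
        return SQL_DATASET
    raise ValueError(f"no Dataset for event kind {kind!r}")
```

For instance, `{"kind": "api", "http.url": "/v1/items", "api.client_id": "abc"}` and `{"kind": "web", "http.url": "/home"}` both land in `requests`, even though only the first has `api.client_id`.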
Large Datasets can quickly feel unwieldy. Once a Dataset has more than 40 or so columns, use naming conventions to group related columns under a shared prefix: http.url or server.buildnumber, for example.
This practice makes columns easier to find in the Honeycomb UI, and it helps everyone on your team reach a shared understanding of a Dataset sooner. If a column is labeled “status code,” for instance, you may know what it refers to, but the next person may not.
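One quick way to audit a wide Dataset’s columns is to group them by prefix and see which names have no category yet. This minimal sketch assumes a dot separates the prefix from the rest of the name; `group_by_prefix` is a hypothetical helper, not a Honeycomb feature.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_prefix(columns: Iterable[str]) -> Dict[str, List[str]]:
    """Group dotted column names by first segment; unprefixed names go under ''."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in columns:
        prefix, sep, _ = name.partition(".")
        groups[prefix if sep else ""].append(name)
    # Sort within each group so related columns read together.
    return {prefix: sorted(names) for prefix, names in groups.items()}
```

Anything under the empty prefix, such as a bare `status code`, is a candidate for renaming (for example to `http.status_code`) so its meaning is clear to the next person.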
While we aim to be as flexible as possible, investing a little more care in how you construct events will help you get the most out of Honeycomb queries:
You can configure Honeycomb to unpack nested JSON objects. See the guidance on sending JSON in “Data Expectations” for more details.
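If you would rather flatten nested objects yourself before sending, or want to preview which column names a nested payload produces, a recursive flatten is enough. This is a sketch. The dot separator is an assumption chosen to match the naming convention above. Check “Data Expectations” for how Honeycomb itself names unpacked columns.

```python
from typing import Any, Dict


def flatten(obj: Dict[str, Any], parent: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys: {'http': {'url': x}} -> {'http.url': x}."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the prefix.
            flat.update(flatten(value, name, sep))
        else:
            # Scalars and lists are kept as-is under the full dotted name.
            flat[name] = value
    return flat
```

For example, `flatten(json.loads(payload))` turns `{"http": {"url": "/a", "status_code": 200}}` into `{"http.url": "/a", "http.status_code": 200}`.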
While it’s nice to believe that every event is precious, the reality of monitoring high-volume production infrastructure is that some events have attributes that make them more interesting than the rest. Failures are often more interesting than successes! Rare events are more interesting than common events! Capturing some traffic from all customers can be better than capturing all traffic from some customers.
Honeycomb is sample-native: every event carries its sample rate with it, so our backend can correctly scale each sampled event back up to the traffic it represents. This gives you the flexibility to use sampling algorithms as simple as “one in every 5 events” or as complex as dynamically adjusting your sample rate per customer, per response code, or per whatever-is-important. (Or some combination of all of those!)
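The arithmetic behind “inflating” a sampled event is simple: an event kept at a sample rate of N stands in for N original events, so summing the rates of kept events estimates the original count. The sketch below keeps every server error and 1 in 20 other requests. The policy, and storing the rate under a `sample_rate` key, are illustrative assumptions; the real SDKs and APIs have their own way of attaching the rate to an event.

```python
import random
from typing import Any, Dict, List, Optional


def sample_rate_for(event: Dict[str, Any]) -> int:
    """Hypothetical policy: keep every 5xx error, 1 in 20 of everything else."""
    return 1 if event.get("status_code", 200) >= 500 else 20


def maybe_sample(event: Dict[str, Any], rng: random.Random) -> Optional[Dict[str, Any]]:
    """Keep the event with probability 1/rate, recording the rate it was kept at."""
    rate = sample_rate_for(event)
    if rng.random() < 1.0 / rate:
        return {**event, "sample_rate": rate}
    return None  # dropped


def estimated_count(kept: List[Dict[str, Any]]) -> int:
    """Each kept event represents `sample_rate` original events."""
    return sum(e["sample_rate"] for e in kept)
```

With 10,000 successes and 50 errors, roughly 500 successes and all 50 errors are kept, yet `estimated_count` still lands near 10,050. Errors are never dropped because their rate is 1.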
Understanding the events you send to Honeycomb lets you design sampling strategies that safely drop huge numbers of events. This is how you control costs while still getting visibility into our high-cardinality world!
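A simple per-key strategy captures some traffic from every customer rather than all traffic from a few: busy customers get a high sample rate, and rare ones are kept in full. This is a simplified illustration of the idea, not Honeycomb’s own dynamic sampler. The `per_key_rates` helper and the `target_per_key` parameter are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def per_key_rates(keys: Iterable[str], target_per_key: int = 100) -> Dict[str, int]:
    """Assign each key a rate so it keeps roughly `target_per_key` events per window.

    Keys seen fewer than `target_per_key` times get rate 1 (keep everything);
    busier keys get proportionally higher rates.
    """
    counts = Counter(keys)
    return {key: max(1, count // target_per_key) for key, count in counts.items()}
```

You would compute these rates over a recent window (for example, the last minute of customer IDs) and then sample each event at its key’s rate, recording that rate on the event as usual.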
To learn more about the various ways sampling can be introduced intelligently into your system, check out our Sampling Guide.
Here at Honeycomb, we like to say that the greatest single cause of chaos in systems is humanity. And in the case of observing systems, this generally manifests in two ways: aberrant customer behavior or changes in the code.
One of the easiest ways to highlight chaos caused by the latter is to use markers: dataset-wide points in time that indicate something interesting has happened. You can create markers through the API and manage them in our web app’s UI. Once you’ve created a marker at a specific time, any query on that dataset whose time range includes that time displays a dotted line at the marker; mousing over its hexagonal anchor displays the marker message.
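Creating a marker from a deploy script can be a single HTTP POST. This sketch builds the request with the Python standard library but does not send it. The endpoint path, the `X-Honeycomb-Team` header, and the `message`/`type`/`start_time` fields reflect our understanding of the Markers API. Treat them as assumptions and check the API reference before relying on them. The `"deploy"` type is just an example value.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Optional

# Assumed Markers API endpoint; verify against the API reference.
API_BASE = "https://api.honeycomb.io/1/markers"


def build_marker_request(api_key: str, dataset: str, message: str,
                         marker_type: str = "deploy",
                         start_time: Optional[int] = None) -> urllib.request.Request:
    """Build (but do not send) a POST that creates a marker on `dataset`."""
    body = {
        "message": message,
        "type": marker_type,
        # Unix timestamp in seconds; defaults to the current time.
        "start_time": int(time.time()) if start_time is None else start_time,
    }
    return urllib.request.Request(
        f"{API_BASE}/{urllib.parse.quote(dataset, safe='')}",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Honeycomb-Team": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

To actually create the marker, pass the result to `urllib.request.urlopen(req)`. Running this at the end of each deploy puts a dotted line at every code change, which makes deploy-related shifts in your graphs easy to spot.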