Triggers let you receive notifications when your data in Honeycomb crosses the thresholds you configure. The graph on which to alert is as flexible as a Honeycomb query, which helps reduce false positives due to known errors.
When a trigger fires, you'll be notified via the configured method. Supported notification methods are currently PagerDuty, Slack, and email. The notification includes a link back to the graph showing the current status, providing a jumping-off point for further investigation.
Triggers run every 15 minutes and examine the past 15 minutes of activity to compute the values against which the thresholds are measured.
For this example, we want to know whenever the 95th percentile of our API server's request durations exceeds 30ms, but we want to exclude the /poll endpoint, because its long-held connections pollute the data by being artificially high.
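As a rough sketch, the example query could be expressed as a Honeycomb-style query specification. The column names here (duration_ms, request_path) are assumptions; substitute whatever your dataset actually uses.

```python
import json

# Sketch of the example trigger query: P95 of request duration,
# excluding the /poll endpoint. Column names are hypothetical.
query = {
    "calculations": [{"op": "P95", "column": "duration_ms"}],
    "filters": [{"column": "request_path", "op": "!=", "value": "/poll"}],
    "time_range": 900,  # triggers evaluate a 15-minute (900 s) window
}

print(json.dumps(query, indent=2))
```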
Start the trigger creation process by constructing the query on which you wish to configure a trigger.
From the Gear menu, select "Make Trigger".
Fill in the details for the trigger. Both the Name and Description will be included in notifications about the trigger. Make sure the name clearly describes what has happened; the description should indicate next steps or link back to documentation.
The threshold determines when the trigger sends out a notification. Triggers examine 15 minutes of activity and compare the result against the threshold, so for COUNTs, make sure your threshold is appropriate for a 15-minute window. The sample graph shown gives you 15-minute granularity over a 4-hour window, which will help you choose an appropriate threshold.
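Because a COUNT trigger compares the total over a 15-minute window rather than a rate, a per-second intuition has to be scaled by 900 seconds. A quick sketch, with an assumed example rate:

```python
# A COUNT trigger compares the total over a 15-minute window, not a rate.
# Suppose (hypothetically) that ~5 errors/second is your alert-worthy level:
window_seconds = 15 * 60          # 900 seconds per trigger evaluation
alert_rate_per_second = 5         # assumed example rate
threshold = alert_rate_per_second * window_seconds
print(threshold)                  # total events per window to configure
```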
Recipients are the targets the trigger notifies when the measured value crosses the configured threshold. Recipients are configured per trigger but shared across your team: any recipient you configure for one trigger becomes available to all other triggers configured in datasets under the same team.
Choose an existing recipient or create a new one. Honeycomb remembers recipients entered for any trigger within your team, so you only have to enter your PagerDuty API key or Slack webhook URL once; from then on it will be available to choose when building a new trigger.
Three types of recipients exist: PagerDuty, Slack, and email. The example shows a trigger configured to send a Slack PM to @ben and an email to a teammate.
Get your webhook URL by going to Slack's Incoming Webhooks documentation and following the prompt to set up an "incoming webhook integration". This will bring you to a page that lets you configure new integrations for your Slack organization.
Choose a default channel, though you will be able to override this channel when configuring a recipient for each trigger. When you submit, you'll be handed a webhook URL to use when configuring the recipient.
After you've configured the Slack recipient with your webhook URL, you can send alerts to a different channel by choosing Add new channel from the + Slack button, then specifying either a #channel or an individual via private message (an @username).
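For reference, an incoming webhook accepts a small JSON body: a bare "text" field is enough, and (for classic incoming webhooks) an optional "channel" field overrides the webhook's default channel. A minimal sketch, with a hypothetical message and channel; the post helper is defined but not executed here:

```python
import json
import urllib.request

def build_slack_payload(text, channel=None):
    """Build the JSON body an incoming webhook expects.

    "channel" optionally overrides the webhook's default channel,
    e.g. "#alerts" or "@ben" (classic incoming webhooks only).
    """
    payload = {"text": text}
    if channel:
        payload["channel"] = channel
    return payload

def post_to_slack(webhook_url, payload):
    # Sends the payload to the webhook URL; not executed in this sketch.
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)

# Hypothetical alert message and channel:
payload = build_slack_payload("P95 latency trigger fired", channel="#alerts")
print(json.dumps(payload))
```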
PagerDuty's API Integration docs describe how to create a generic API integration. Following those steps will give you an API key that you'll enter in the trigger form.
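For context on what that API key drives, newer PagerDuty integrations send events shaped like the Events API v2 "trigger" event below. This is a sketch of the event format, not Honeycomb's internal implementation; the routing key placeholder and summary text are assumptions.

```python
import json

def build_pagerduty_event(routing_key, summary, source, severity="critical"):
    # Shape follows the PagerDuty Events API v2 "trigger" event.
    # routing_key is the integration key from the steps above.
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {
            "summary": summary,
            "source": source,
            "severity": severity,
        },
    }

# Hypothetical example values:
event = build_pagerduty_event(
    "YOUR_INTEGRATION_KEY",
    "API P95 latency exceeded 30ms",
    "honeycomb-trigger",
)
print(json.dumps(event, indent=2))
```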
When you save the trigger, it is immediately active and will next run at the next 15-minute interval past the hour (:00, :15, :30, or :45).
You can see a list of configured triggers by clicking on the link in the top nav bar when looking at your dataset.
The page lists every trigger active for the dataset; click through any entry to edit that trigger.
To delete a trigger, scroll to the bottom of the edit page.
Queries run for triggers apply your selected calculation over a 15-minute period. COUNTs, for example, give the total count for that 15-minute period, not a per-second count. Averages and percentiles likewise cover the entire period, so to detect spikes, it is better to use MAX rather than AVG. Another alternative is to pair a filter restricting the set to the threshold you're interested in (e.g., >100ms) with a COUNT; your result will be the number of events that exceed your threshold. For example, you could count the number of events over 100ms and compare that count against a threshold, instead of asking for the average to exceed 100ms.
Prefer percentile calculations such as P95 or P99. These will be more representative of the majority of traffic than AVG, which can be polluted by large outliers.
To catch unexpected errors, rather than filtering for one specific status code (such as == 500), use several filters to look for events that don't have known status codes: 200, 301, 302, 404, etc.
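The exclusion approach can be sketched as follows, with a hypothetical set of known-good codes and sample statuses from one trigger window:

```python
# Count "unexpected" responses by excluding known-good status codes,
# rather than filtering for one specific error code like 500.
KNOWN_OK = {200, 301, 302, 404}

# Hypothetical status codes observed in one trigger window:
statuses = [200, 200, 301, 503, 200, 404, 500, 200, 302, 418]

unexpected = [s for s in statuses if s not in KNOWN_OK]
print(len(unexpected), unexpected)  # catches the 503 and 500, plus a 418
```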