Connecting Logstash to Honeycomb

The ELK stack is perhaps the most common open source tool for collecting and examining log data through a web UI. It has gotten many people off the ground and introduced them to the wonders of structured logging. Honeycomb relies on the same kind of quality data that powers ELK, so it’s a natural progression to also send that data to Honeycomb in parallel.

Thanks to Logstash’s flexible plugin architecture, it is easy to send a copy of all the traffic that Logstash is processing to Honeycomb in addition to (or instead of) Elasticsearch or other output plugins.

The first step in configuring Logstash to produce Honeycomb-friendly events is to pull all the good data out into top-level fields. If you’ve been using ELK for a while, you may already be doing this; if so, skip ahead to the Forking section below.

Getting Your Data in the Right Format

A number of filters can massage your data to get the content into top-level keys, depending on the original source of your data.

Let’s say you’re sending haproxy logs (in HTTP mode) to Logstash. A log line describing an individual request would look something like this (borrowed from the haproxy configuration manual):

Feb  6 12:14:14 localhost \
          haproxy[14389]: 10.0.1.2:33317 [06/Feb/2009:12:14:14.655] http-in \
          static/srv1 10/0/30/69/109 200 2750 - - ---- 1/1/1/1/0 0/0 {1wt.eu} \
          {} "GET /index.html HTTP/1.1"

Logstash will have this line in the message field, so in the filter section of the config, we’ll tell it to use grok to parse the message and make all the content available in top-level fields. Since we don’t need the original line anymore, we’ll also ask grok to remove the message field. The next filter takes the numeric fields grok extracted from the haproxy log and converts them to integers so that Logstash and Honeycomb can do math on them.

filter {
  grok {
    # parse the haproxy HTTP log line into top-level fields
    match => ["message", "%{HAPROXYHTTP}"]
    # drop the raw line once it has been parsed
    remove_field => ["message"]
  }
  mutate {
    # grok captures everything as strings; convert the numeric fields to integers
    convert => {
      "actconn" => "integer"
      "backend_queue" => "integer"
      "beconn" => "integer"
      "bytes_read" => "integer"
      "feconn" => "integer"
      "http_status_code" => "integer"
      "retries" => "integer"
      "srv_queue" => "integer"
      "srvconn" => "integer"
      "time_backend_connect" => "integer"
      "time_backend_response" => "integer"
      "time_duration" => "integer"
      "time_queue" => "integer"
      "time_request" => "integer"
    }
  }
}
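After these filters run, each event carries the parsed data as top-level fields. Here is an abridged sketch of what the event for the sample log line above might contain (field names come from the mutate filter and the stock HAPROXYHTTP grok pattern, and @timestamp assumes you have also mapped the log’s time with a date filter; treat this as illustrative rather than exact output):

{
  "@timestamp": "2009-02-06T12:14:14.655Z",
  "frontend_name": "http-in",
  "backend_name": "static",
  "server_name": "srv1",
  "http_status_code": 200,
  "bytes_read": 2750,
  "time_request": 10,
  "time_queue": 0,
  "time_backend_connect": 30,
  "time_backend_response": 69,
  "time_duration": 109,
  "actconn": 1,
  "feconn": 1,
  "beconn": 1,
  "srvconn": 1,
  "retries": 0
}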

Sampling traffic

High volume sites will want to send only a fraction of all traffic to Honeycomb. The drop filter can drop a portion of your traffic. When dropping traffic, you should also set the sample rate header in the HTTP output plugin to tell Honeycomb how much was dropped. The drop filter takes the percentage of traffic to drop, while the Honeycomb sample rate says how many original events each delivered event represents (a sample rate of N means 1 in N events is kept), so the relationship between the two is: drop_percentage = 100 - (100/sample_rate). For example, a drop percentage of 50% corresponds to a sample rate of 2, and a drop percentage of about 85.714% corresponds to a sample rate of 7.

filter {
  drop {
    # keep 1 in 7 events; percentage is 100-(100/7), or about 85.714
    percentage => 85.714
  }
}

Then in the output section below, you would set the X-Honeycomb-Samplerate header to 7.
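Abridged, that looks like the following (the full http output configuration is covered in the Forking section below):

output {
  http {
    # ... url, write key, and other settings as shown in the Forking section ...
    headers => {
      "X-Honeycomb-Samplerate" => "7"
    }
  }
}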

Dynamic sampling

For more flexibility, this example uses different sample rates depending on the type of traffic flowing through Logstash. When a request has a status code in the 200 range, everything worked correctly; that traffic is not as interesting as when things are going wrong. In this example, we will drop 90% of successful (2xx) responses, keep every server error (5xx), and drop half of everything else, recording the sample rate we used in a @samplerate field so the output section can pass it along to Honeycomb.

filter {
  if [http_status_code] {
    if [http_status_code] >= 200 and [http_status_code] < 300 {
      # successes: keep 1 in 10
      drop { percentage => 90 }
      mutate { add_field => { "@samplerate" => "10" } }
    } else if [http_status_code] >= 500 {
      # server errors: keep everything
      mutate { add_field => { "@samplerate" => "1" } }
    } else {
      # everything else (including 3xx and 4xx): keep 1 in 2
      drop { percentage => 50 }
      mutate { add_field => { "@samplerate" => "2" } }
    }
  }
}
output {
  http {
    ...
    headers => {
      "X-Honeycomb-Samplerate" => "%{@samplerate}"
    }
  }
}

Note: The filter here is doing numerical comparisons on the http_status_code field. Make sure it has been converted to an integer type, as in the mutate filter earlier!

If you want to do dynamic sampling on data you’ll send to Honeycomb but preserve the full stream to send to a different output target, check out this sample logstash config. It sends a copy of all traffic to STDOUT while sending sampled traffic to Honeycomb.
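One way to build that kind of fork yourself is with the clone filter: copy each event, sample only the copy destined for Honeycomb, and route on the type field in the output section. This is a minimal sketch of the approach, not necessarily how the linked sample config is structured:

filter {
  # make a copy of every event; the copy gets its type field set to "honeycomb"
  clone {
    clones => ["honeycomb"]
  }
  # sample only the Honeycomb copy; the original passes through untouched
  if [type] == "honeycomb" {
    drop { percentage => 90 }
    mutate { add_field => { "@samplerate" => "10" } }
  }
}
output {
  if [type] == "honeycomb" {
    http {
      # ... same http output as in the Forking section below ...
    }
  } else {
    # the full, unsampled stream
    stdout { codec => json_lines }
  }
}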

Forking your stream and sending data to Honeycomb

Now that all the fields from the log line are top-level fields on the Logstash event, let’s send them on to Honeycomb! We’re going to use the HTTP output plugin to do so. We’ll need to add our Team Write Key to send messages into Honeycomb, and this example will send them to a dataset called “logstash”. Logstash keeps the event time in the @timestamp field, so let’s use that to set the time of the event for Honeycomb. We want Logstash to send events to Honeycomb in JSON format, and to make sure it can keep up, we’ll configure it with 10 threads sending events.

output {
  http {
    url => "https://api.honeycomb.io/1/events/logstash"
    http_method => "post"
    headers => {
      "X-Honeycomb-Team" => "YOUR_WRITE_KEY"
      "X-Honeycomb-Event-Time" => "%{@timestamp}"
      "X-Honeycomb-Samplerate" => "1"
    }
    format => "json"
    workers => 10
  }
}

Once Logstash has been restarted with the new configs, head over to your Dashboard and look for your new dataset.
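If events don’t show up, you can sanity-check your write key and dataset independently of Logstash by posting a single test event to the same events endpoint used above (substitute your own write key):

curl https://api.honeycomb.io/1/events/logstash -X POST \
  -H "X-Honeycomb-Team: YOUR_WRITE_KEY" \
  -H "Content-Type: application/json" \
  -d '{"source": "curl-test"}'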

Open Source

A slightly more full-featured output plugin, logstash-output-honeycomb_json_batch, is open source and Apache 2.0 licensed. Its source is on GitHub.
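For reference, its configuration looks roughly like this (a sketch; the option names write_key and dataset are taken from the plugin’s README, so double-check there before relying on them):

output {
  honeycomb_json_batch {
    write_key => "YOUR_WRITE_KEY"
    dataset => "logstash"
  }
}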