Connecting Logstash to Honeycomb

Thanks to Logstash’s flexible plugin architecture, you can send a copy of all the traffic that Logstash is processing to Honeycomb. This topic explains how to use Logstash plugins to convert incoming log data into events and then send them to Honeycomb.

Data format requirements

Honeycomb is at its best when the events you send are broad and capture lots of information about a given process or transaction. For guidance on how to think about building events, start with Building Better Events. To learn more, check out the rest of the “Event Foo” series on our blog.

To process the log data coming into Logstash into Honeycomb events, you can use Logstash filter plugins. These filter plugins transform the data into top-level keys based on the original source of the data. We’ve found these to be especially useful:

To add and configure filter plugins, refer to Working with Filter Plugins on the Logstash documentation site.

Example: Using Logstash filter plugins to process haproxy logs for Honeycomb ingestion

Let’s say you’re sending haproxy logs (in HTTP mode) to Logstash. A log line describing an individual request looks something like this (borrowed from the haproxy config manual):

Feb  6 12:14:14 localhost \
          haproxy[14389]: 10.0.1.2:33317 [06/Feb/2009:12:14:14.655] http-in \
          static/srv1 10/0/30/69/109 200 2750 - - ---- 1/1/1/1/0 0/0 {1wt.eu} \
          {} "GET /index.html HTTP/1.1"

Logstash puts this line in a message field, so in the filter section of the Logstash config fragment below, we use the grok filter plugin to parse the message and make all of its content available in top-level fields. Since we don’t need the message field after that, we tell grok to remove it.

The mutate filter plugin takes the numeric fields that grok extracted from the haproxy log line and converts them to integers so that Honeycomb can do math on them later.

filter {
  grok {
    match => ["message", "%{HAPROXYHTTP}"]
    remove_field => ["message"]
  }
  mutate {
    convert => {
      "actconn" => "integer"
      "backend_queue" => "integer"
      "beconn" => "integer"
      "bytes_read" => "integer"
      "feconn" => "integer"
      "http_status_code" => "integer"
      "retries" => "integer"
      "srv_queue" => "integer"
      "srvconn" => "integer"
      "time_backend_connect" => "integer"
      "time_backend_response" => "integer"
      "time_duration" => "integer"
      "time_queue" => "integer"
      "time_request" => "integer"
    }
  }
}

Sending data to Honeycomb

Now that all the fields in the message are nicely extracted into events, send them on to Honeycomb! To send events, configure an output plugin.

Send data to Honeycomb with our open source Logstash output plugin

Honeycomb offers an open source, Apache 2.0 licensed plugin: logstash-output-honeycomb_json_batch. It’s available on GitHub, and you can install it through RubyGems like any other Logstash plugin.

To get the latest version, run the following command:

bin/logstash-plugin install logstash-output-honeycomb_json_batch

To configure the Honeycomb Logstash output plugin, edit its configuration as described in this example.

Example: Configuring the honeycomb_json_batch output plugin to send data to Honeycomb

This config example sends the data to a dataset called “Logstash Batch Test.”

input {
  stdin {
    codec => json_lines
  }
}
output {
  honeycomb_json_batch {
    write_key => "TEAM_WRITE_KEY"
    dataset => "Logstash Batch Test"
  }
}

The following special logstash fields are extracted automatically:

Send data to Honeycomb using the Logstash HTTP output plugin

If you don’t want to install the honeycomb_json_batch plugin, you can use Logstash’s HTTP output plugin to craft HTTP requests to the Honeycomb API.

To configure the Logstash HTTP output plugin to send to Honeycomb, edit its configuration as described in this example:

Example: Configuring the Logstash HTTP output plugin to send data to Honeycomb

This config example sends the data to a dataset called “logstash.”

output {
  http {
    url => "https://api.honeycomb.io/1/events/logstash"
    http_method => "post"
    headers => {
      "X-Honeycomb-Team" => "YOUR_WRITE_KEY"
      "X-Honeycomb-Event-Time" => "%{@timestamp}"
      "X-Honeycomb-Samplerate" => "1"
    }
    format => "json"
    workers => 10
  }
}

Then, restart Logstash. When it’s back up, you will find the new dataset on your landing page.
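To make it easier to see what this output configuration amounts to, here is a minimal Python sketch of the request it describes. The URL pattern and header names come from the config above. The function name build_honeycomb_request, the sample event, and the Content-Type header are our own assumptions for illustration. The sketch only builds the request and does not send it.

```python
import json
import urllib.parse
import urllib.request


def build_honeycomb_request(event, dataset, write_key, sample_rate=1):
    """Build (but do not send) a POST mirroring the http output config above.

    Hypothetical helper for illustration; it is not part of Logstash or
    of any Honeycomb library.
    """
    url = "https://api.honeycomb.io/1/events/" + urllib.parse.quote(dataset)
    headers = {
        "X-Honeycomb-Team": write_key,
        # The config uses "%{@timestamp}", so we take the event's own timestamp.
        "X-Honeycomb-Event-Time": event["@timestamp"],
        "X-Honeycomb-Samplerate": str(sample_rate),
        # Assumption: a JSON body is labeled as JSON.
        "Content-Type": "application/json",
    }
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Made-up event with a few of the fields from the haproxy example.
event = {
    "@timestamp": "2009-02-06T12:14:14.655Z",
    "http_status_code": 200,
    "time_duration": 109,
}
request = build_honeycomb_request(event, "logstash", "YOUR_WRITE_KEY")
print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would send it, which requires a real write key.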

Sampling traffic from Logstash

This section currently applies to sampling configuration in the Logstash HTTP output plugin. We will be updating this soon to include information regarding the honeycomb_json_batch output plugin.

High-volume sites will want to send only a fraction of their traffic to Honeycomb. The drop filter can discard a portion of your traffic. When you drop traffic, also set the sample rate header in the HTTP output plugin to tell Honeycomb how much was dropped. The drop filter takes the percentage of traffic to drop, while the Honeycomb sample rate header takes a number N meaning that one out of every N events was kept, so the relationship between the two is as follows: drop_percentage = 100 - (100/sample_rate). For example, a drop percentage of 50% corresponds to a sample rate of 2, and a drop percentage of roughly 85.715% corresponds to a sample rate of 7.

filter {
  drop {
    # keep 1/7 events, percentage is 100-(100/7)
    percentage => 85.715
  }
}

Then, in the headers of your http output section, set the X-Honeycomb-Samplerate header to 7.
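If you want to check the arithmetic for other sample rates, the formula above can be sketched in a few lines of Python. The function names drop_percentage and sample_rate_for are hypothetical helpers we made up for this illustration; they are not part of Logstash or Honeycomb.

```python
def drop_percentage(sample_rate: int) -> float:
    """Percentage the drop filter must discard so that 1 in `sample_rate` events is kept."""
    if sample_rate < 1:
        raise ValueError("sample_rate must be at least 1")
    return 100 - (100 / sample_rate)


def sample_rate_for(drop_pct: float) -> int:
    """Inverse direction: the sample rate to report for a given drop percentage."""
    if not 0 <= drop_pct < 100:
        raise ValueError("drop_pct must be in the range [0, 100)")
    # Round to the nearest whole rate, since percentages like 85.715 are rounded.
    return round(100 / (100 - drop_pct))


# Print the drop percentage, rounded to three places, for a few sample rates.
for rate in (1, 2, 7, 10):
    print(rate, round(drop_percentage(rate), 3))
```

For a sample rate of 7 this gives a value within rounding distance of the 85.715 used in the filter, and sample_rate_for(85.715) rounds back to 7.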

To read more about sampling techniques, check out our Sampling Guide.

Dynamic sampling in Logstash

For more flexibility, this example uses different sample rates depending on the type of traffic flowing through Logstash. When a request returns a 2xx status code, everything worked correctly, so this traffic is not as interesting as traffic where things are going wrong. In this example, we drop 90% of successful (2xx) requests, keep every server error (5xx), and drop 50% of everything else.

filter {
  if [http_status_code] {
    if [http_status_code] >= 200 and [http_status_code] < 300 {
      drop { percentage => 90 }
      mutate { add_field => { "@samplerate" => "10" } }
    } else if [http_status_code] >= 500 {
      mutate { add_field => { "@samplerate" => "1" } }
    } else {
      drop { percentage => 50 }
      mutate { add_field => { "@samplerate" => "2" } }
    }
  }
}
output {
  http {
    ...
    headers => {
      "X-Honeycomb-Samplerate" => "%{@samplerate}"
    }
  }
}

Note: The filter here is doing numerical comparisons on the http_status_code field. Make sure it’s an integer type!
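To see why each kept event carries its own sample rate, here is a small Python simulation of the rules in the filter above. It is only a sketch under stated assumptions: the traffic mix is made up, every event is assumed to have an integer http_status_code, and the names sampling_rule, sample, and estimated_total are hypothetical helpers rather than anything from Logstash or Honeycomb.

```python
import random


def sampling_rule(status_code: int) -> tuple[int, int]:
    """Return (drop_percentage, sample_rate), mirroring the filter's branches."""
    if 200 <= status_code < 300:
        return 90, 10   # successes: keep about 1 in 10
    if status_code >= 500:
        return 0, 1     # server errors: keep everything
    return 50, 2        # everything else: keep about 1 in 2


def sample(events, rng):
    """Randomly drop events and tag the survivors with their sample rate."""
    kept = []
    for event in events:
        drop_pct, rate = sampling_rule(event["http_status_code"])
        if rng.random() * 100 < drop_pct:
            continue  # dropped, like the drop filter would
        kept.append({**event, "@samplerate": rate})
    return kept


def estimated_total(kept):
    """Each kept event stands in for `@samplerate` original events."""
    return sum(event["@samplerate"] for event in kept)


# Made-up traffic mix: mostly successes, some client errors, a few server errors.
traffic = (
    [{"http_status_code": 200}] * 9000
    + [{"http_status_code": 404}] * 900
    + [{"http_status_code": 503}] * 100
)
kept = sample(traffic, random.Random(0))
print(len(kept), "events kept, representing about", estimated_total(kept))
```

Far fewer events are sent, every 5xx event survives, and summing the sample rates still approximates the original traffic volume.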

If you want to do dynamic sampling on data you’ll send to Honeycomb but preserve the full stream to send to a different output target, check out this sample logstash config. It sends a copy of all traffic to STDOUT while sending sampled traffic to Honeycomb.