Getting Other Webserver Logs into Honeycomb

Our agent’s nginx parser can easily be tricked into parsing other web servers’ logs. You’ll create a config file that describes your web server’s log format and pass it to the nginx parser.

As examples, this page describes how to consume HAProxy and Apache logs using the honeytail nginx parser.

Overview

To use the nginx parser to consume a non-nginx log file, we will create a config that looks something like an nginx config and use it to define the log format. We’ll then run honeytail on the log using the config file containing the format. The config file will have one statement, log_format name '<format>'; (possibly broken up into multiple lines). The format is a series of labels identifying each field; the character following each label is the field separator. For example, to collect the HAProxy process name, pid, IP address, and port from a log snippet of haproxy[291] 127.0.0.1:4715, you would use $process[$pid] $ip:$port as your format string. You can use any names you like for the labels: they will become the column names in Honeycomb.
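As a minimal sketch, a complete config file for that snippet (hny-example.conf and the label names are ours to choose; nothing about them is required by honeytail) would contain just:

log_format example '$process[$pid] $ip:$port';

You would then point honeytail at it with --nginx.conf hny-example.conf --nginx.format example, as the full examples below show.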

Below are two examples: HAProxy’s HTTP log format and the default Apache log format.

You will likely have to tailor these examples to your specific setup, depending on the version of the web server you’re running and any other options you may have in your configs.

HAProxy HTTP format

HAProxy’s HTTP log format packs a wealth of detail into a very compact form. Here’s a sample log line (from the HAProxy docs):

Feb  6 12:12:56 localhost \
  haproxy[14389]: 10.0.1.2:33317 [06/Feb/2009:12:14:14.655] http-in static/srv1 \
  10/0/30/69/109 200 2750 - - ---- 1/1/1/1/0 0/0 {1wt.eu} {}\
  "GET /index.html HTTP/1.1"

Here’s the description of those fields (again, from the HAProxy docs):

  Field   Format                                Extract from the example above
      1   process_name '[' pid ']:'                            haproxy[14389]:
      2   client_ip ':' client_port                             10.0.1.2:33317
      3   '[' accept_date ']'                       [06/Feb/2009:12:14:14.655]
      4   frontend_name                                                http-in
      5   backend_name '/' server_name                             static/srv1
      6   Tq '/' Tw '/' Tc '/' Tr '/' Tt*                       10/0/30/69/109
      7   status_code                                                      200
      8   bytes_read*                                                     2750
      9   captured_request_cookie                                            -
     10   captured_response_cookie                                           -
     11   termination_state                                               ----
     12   actconn '/' feconn '/' beconn '/' srv_conn '/' retries*    1/1/1/1/0
     13   srv_queue '/' backend_queue                                      0/0
     14   '{' captured_request_headers* '}'                   {haproxy.1wt.eu}
     15   '{' captured_response_headers* '}'                                {}
     16   '"' http_request '"'                      "GET /index.html HTTP/1.1"

Here’s the config snippet used to match that log line. Because there are two date fields, we’re going to use the one in square brackets [] since it’s easier to parse. We’ll stub out the syslog-provided date at the beginning of the line with dots (which match any character). We can split the log_format line into multiple lines for easier editing; just make sure the last line ends with a semicolon. For this example, let’s call this file hny-haproxy.conf.

log_format haproxy '... .. ..:..:.. $hostname $process[$pid]: '
    '$client_ip:$client_port [$timestamp] $frontend $backend/$backend_server '
    '$time_client_connect/$time_queued/$time_backend_conn/$time_backend_resp/$time_total '
    '$status_code $bytes_read $request_cookie $response_cookie $termination_state '
    '$act_conn/$fe_conn/$be_conn/$srv_conn/$retries $srv_queue/$backend_queue '
    '{$request_headers} {$response_headers} "$request"';

To use this config, you’d run our agent, honeytail, like this:

honeytail -k <writekey> -p nginx -d haproxy -f /path/to/haproxy.log \
  --nginx.conf hny-haproxy.conf --nginx.format haproxy

Apache log format

Apache’s configuration can truly go as far as you want to take it. For this example, let’s just stick with the default log format.

Here’s an example line (split into two for readability):

207.46.1.2 - - [03/Nov/2016:16:11:43 -0700] "GET /robots.txt HTTP/1.1" 200 334 \
  "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

There’s not nearly as much here as in the HAProxy log, but let’s pull out what we can, taking a hint from the Apache docs to decipher the fields. Let’s call this file hny-apache.conf.

log_format apache '$remote_ip $identd $user [$timestamp] "$request" $status_code '
  '$bytes_sent "$referrer" "$user_agent"';

To use this config, you’d run our agent, honeytail, like this:

honeytail -k <writekey> -p nginx -d apache -f /path/to/apache/access.log \
  --nginx.conf hny-apache.conf --nginx.format apache

Suggested Queries in Honeycomb

Just to whet your appetite, we’d like to suggest a few graphs to explore with your HAProxy dataset: for example, a COUNT broken down by status_code or backend, or a P95 of time_total to find your slowest backends.

Scrubbing Personally Identifiable Information

While we believe strongly in the value of being able to track down the precise query causing a problem, we understand the concerns of exporting log data which may contain sensitive user information.

With that in mind, we recommend using honeytail’s nginx parser, but adding a --scrub_field=sensitive_field_name flag to hash the sensitive_field_name value, or a --drop_field=sensitive_field_name flag to drop the field altogether and prevent it from being sent to Honeycomb’s servers.
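For example, to hash the captured request cookie and drop the response cookie entirely in the HAProxy setup above (these field names come from the log_format we defined; substitute whichever of your fields are sensitive), you would run:

honeytail -k <writekey> -p nginx -d haproxy -f /path/to/haproxy.log \
  --nginx.conf hny-haproxy.conf --nginx.format haproxy \
  --scrub_field=request_cookie --drop_field=response_cookie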

More information about dropping or scrubbing sensitive fields can be found here.

Parsing URL Patterns

honeytail can break URLs up into their component parts, storing extra information in additional columns. This behavior is turned on by default for the request field on nginx datasets, but can become more useful with a little bit of guidance from you.
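As a rough sketch (assuming the --request_shape and --request_pattern flags available in recent honeytail releases; the /docs/:page pattern is an invented example), you could tell honeytail to unpack the request field from the HAProxy format above like this:

honeytail -k <writekey> -p nginx -d haproxy -f /path/to/haproxy.log \
  --nginx.conf hny-haproxy.conf --nginx.format haproxy \
  --request_shape=request --request_pattern=/docs/:page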

See honeytail’s documentation for details on configuring our agent to parse URL strings.