Getting RDS Logs for MySQL into Honeycomb

Amazon’s Relational Database Service, RDS, lets you use a number of databases without having to administer them yourself. Our RDS connector gives our MySQL connector access to the same data as if you were running MySQL on your own server.

At the moment, this connector is only compatible with MySQL running on RDS; let us know if you’d like another database!

Our connector surfaces attributes like:

- normalized_query: the templatized form of the query, with literal values stripped out
- query_time and lock_time: how long the query ran, and how long it spent waiting on locks
- rows_sent and rows_examined: how much data the query returned versus how much it had to scan

Honeycomb is unique in its ability to calculate metrics and statistics on the fly, while retaining the full-resolution log lines (and the original MySQL query that started it all!).
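These attributes come from standard MySQL slow query log entries. An illustrative entry (the timestamps, user, and query here are made up) looks like:

```
# Time: 2016-04-01T00:31:09.817887Z
# User@Host: appuser[appuser] @ localhost []  Id:     5
# Query_time: 4.650253  Lock_time: 0.000138 Rows_sent: 1  Rows_examined: 942172
SET timestamp=1459470669;
SELECT * FROM orders WHERE customer_id = 1234;
```

Each `# Query_time:` comment line carries the timing and row counts, while the statement itself supplies the raw and normalized query.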

Once you’ve got data flowing, be sure to take a look at our starter queries! Our entry points will help you see how we recommend comparing lock retention by normalized query, scan efficiency by table, or read vs. write distribution by host.

Note: This document is for folks running MySQL on RDS. The following commands don’t need to be run from your RDS host; instead, they can be run from any Linux host with the appropriate AWS credentials to access the RDS API.

Important Configuration

Before running the RDS connector, you need to configure MySQL running on RDS to output the slow query log to a file. Amazon’s documentation on setting Parameter Groups can help you get started, and more detail about the configuration options below is in the MySQL docs for the slow query log.

Set the following options in the Parameter Group:

- slow_query_log = 1 (enables the slow query log)
- log_output = FILE (writes the log to the filesystem, where the RDS API can fetch it)
- long_query_time = the threshold, in seconds, above which queries are logged (set it to 0 to capture all queries)

If, as part of making these changes, you needed to switch to a new Parameter Group, make sure you restart the database.
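If you prefer the AWS CLI to the console, a sketch like the following can apply these settings to an existing Parameter Group (the group name and the one-second threshold are placeholders; adjust them for your setup):

```shell
# Apply slow query log settings to an existing DB Parameter Group.
# These parameters are dynamic, so "immediate" takes effect without a reboot.
aws rds modify-db-parameter-group \
  --db-parameter-group-name <my-parameter-group> \
  --parameters "ParameterName=slow_query_log,ParameterValue=1,ApplyMethod=immediate" \
               "ParameterName=log_output,ParameterValue=FILE,ApplyMethod=immediate" \
               "ParameterName=long_query_time,ParameterValue=1,ApplyMethod=immediate"
```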

Once you’ve made these changes, you can verify you are getting RDS logs via the RDS Console.
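You can also check from the command line: once MySQL starts writing the slow query log, the RDS API will list the log files (the instance identifier here is a placeholder):

```shell
# List the slow query log files RDS currently has for your instance
aws rds describe-db-log-files \
  --db-instance-identifier <instance-identifier> \
  --filename-contains slowquery
```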

Download rdslogs

rdslogs will stream the MySQL slow query log from RDS or download older log files. It can stream them to STDOUT or directly to Honeycomb. You can view the rdslogs source here.

Get and verify the current Linux version of rdslogs:

wget -q <rdslogs-download-url> && \
      echo '83f6ce950a38c59bce114b564e6b3b74fbaa278cd5c5205536bee3f522a72ebb  rdslogs_1.46_amd64.deb' | sha256sum -c && \
      sudo dpkg -i rdslogs_1.46_amd64.deb

Usage: Backfill existing logs

We suggest loading the past 24 hours of logs into Honeycomb to start finding interesting things right away. You can launch this command to run in the background (it will take some time) while you hook up the live stream. (If you just enabled the slow query log, you won’t have the past 24 hours of logs. You can skip this step and go straight to streaming.)

The following command will download all available slow query logs to a newly created slow_logs directory and then start up honeytail to send the parsed events to Honeycomb. You’ll need your RDS instance identifier (from the instances page of the RDS Console) and your Honeycomb write key (from your Honeycomb account page).

mkdir slow_logs && \
  rdslogs -i <instance-identifier> --download --download_dir=slow_logs && \
  honeytail --writekey=YOUR_WRITE_KEY --dataset='RDS MySQL' --parser=mysql \
  --file='slow_logs/*' --backfill

Once you’ve finished backfilling your old logs, we recommend transitioning to the default streaming behavior to tail -f current logs.

Usage: tail -f current logs

The following command will connect to RDS and stream down current data for the slow query log. Piping this data into honeytail will send it up to Honeycomb, where you can examine the data in real time. You’ll need your RDS instance identifier (from the instances page of the RDS Console) and your Honeycomb write key (from your Honeycomb account page).

rdslogs -i <instance-identifier> | honeytail --writekey=YOUR_WRITE_KEY \
  --dataset='RDS MySQL' --parser=mysql --file=-
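Since rdslogs can also stream to STDOUT (as noted above), one way to sanity-check the stream before sending anything to Honeycomb is to eyeball a few raw lines first:

```shell
# Stream the slow query log to STDOUT and stop after the first 20 lines
rdslogs -i <instance-identifier> | head -n 20
```

If the output looks like slow query log entries, swap `head` back out for the honeytail pipeline above.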

Scrubbing Personally Identifiable Information

While we believe strongly in the value of being able to track down the precise query causing a problem, we understand the concerns of exporting log data which may contain sensitive user information.

With that in mind, we recommend using honeytail’s MySQL parser, but adding a --scrub_field=query flag to hash the concrete query value. The normalized_query attribute will still be representative of the shape of the query, and identifying patterns including specific queries will still be possible—but the sensitive information will be completely obscured before leaving your servers.
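Putting that together with the streaming command above, a scrubbed pipeline looks like this:

```shell
# Hash the raw query text before events leave your server;
# normalized_query still comes through intact for pattern analysis.
rdslogs -i <instance-identifier> | honeytail --writekey=YOUR_WRITE_KEY \
  --dataset='RDS MySQL' --parser=mysql --file=- --scrub_field=query
```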

More information about dropping or scrubbing sensitive fields can be found here.