The MySQL Slow Query Log, AWS Golang SDK, RDS and You

5 Min. Read

Did you know you can do fun things with the MySQL slow query log, using the nothing but the AWS golang SDK for RDS?

It’s true! In order to let you submit RDS logs directly to Honeycomb, we’ve been exploring how to use the golang RDS SDK to tail the mysql slow query log. The things we’ve discovered that we didn’t find in the docs are fascinating. (All of it may very well be there somewhere, but if it is we failed to find it.)

For example:

  • The docs say the logs rotate every hour, but they don’t say it’s on the hour or what the rotating style is. The current log (slowquery.log) is rotated to slowquery.log.#, where # is the current hour {0..23} UTC. So at 18:00 UTC, the file slowquery.log.18 is written with the logs from 17:00 to 18:00 (This is mostly true. More on that later.)
  • When asking for a portion of the log file, you can hand it an index from which to start reading and the number of lines you would like to retrieve (both optional). You get back a struct with a LogFileData *string containing log lines and a Marker *string containing the index for the next segment of the log file. The Marker looks like #:#, where the first number is the hour and the second is the distance into the file.

Asking for current data is straightforward. First, ask for one line of the current log file with no marker. You’ll get back the most recent line (which you throw away) and the current marker. Wait a bit, then ask for the current file with the marker, and you’ll get all lines since that first one. Repeat.

There are two places where things get interesting: log rotation (aka the top of the hour) and asking for older data.

Log Rotation and Tiny Little Holes In Your Data

Let’s say it’s 17:58 UTC and you’re happily tailing content, fetching additional lines every 5 seconds. You get to 18:00:02 or so and the marker switches from 18:xxxx to 19:0 and suddenly you’re not getting any new data anymore. You wait and wait and … at 18:05:00, you start getting data! But it’s from 18:05:00; the data from between 18:00 and 18:05 is missing.

You take the last marker you got from your tail before data stopped flowing (with the 18: prefix) and use that marker as an index into slowquery.log.18. This gets you the data from between 18:00:02 and 18:05:00, then stops. Or! Once some amount of time (not necessarily the full 5 minutes) has passed, you can ask the slowquery.log for data since the 18:offset marker and you’ll get the data that’s missing.

Back you go to tailing the slowquery.log.

Ok, so here’s the next fun part — you can hand a marker to either the previous hour or the current hour to AWS when asking for the slowquery.log and it will give you data since then. If you do this after the 5 minute log rotation period has ended, you get all the intermediate data too! However, if you ask it for data where the number before the colon is not the current or previous hour, it ignores that and hands you back current data at the offset, specified by the second half of the marker.

You can ask for data from older files (the numbered log files) and pass them a marker, but in that case it always ignores the hour portion of the marker. It will happily return the hour portion you gave it, but it doesn’t affect the data you get. In other words, when asking for data from slowquery.log.8, passing it a marker of 12:123456 and 4:123456 are both treated as though they were 8:123456.

Try It Out For Yourself

To poke around for yourself, you’ll need to have RDS configured to write the slow query log to a file. If that’s the case, grab the DB instance identifier aws rds describe-db-instances --output table --region us-east-1 | grep DBInstanceIdentifier to list identifiers) and the log file aws rds describe-db-log-files --db-instance-identifier <ID> --output table --region <region> | sort -n to show log files.

Once you’ve got your instance identifier, log file, and region handy, download this go code 1 and give it a whirl:

go run tailrdslog.go --identifier myrdsinstanceid

Try passing it different markers and log files:

go run tailrdslog.go -i prod-mysql -l slowquery/mysql-slowquery.log.10 -m 14.123456

and see what happens!

It is especially interesting to run this across an hour boundary to see what happens as the log rolls over. The script will attempt to grab some of the data hiding in the previous log until everything returns to normal at 5 past the hour.

Output

It will output a line every 5s (the default polling interval) with:

  • the current time
  • the starting marker
  • the length of the content downloaded
  • the ending marker
  • the first timestamp encountered in the slow query log
  • the last timestamp encountered in the slow query log
  • (optional) sleep duration before the next poll
Oct  7 21:35:43 starting from 22:20523279, got 37424 content, next section from 22:20560703. first time is 21:35:39, last time is 21:35:43 sleeping 5 sec...
Oct  7 21:35:49 starting from 22:20560703, got 59056 content, next section from 22:20619759. first time is 21:35:43, last time is 21:35:48 sleeping 5 sec...
Oct  7 21:35:54 starting from 22:20619759, got 54649 content, next section from 22:20674408. first time is 21:35:48, last time is 21:35:53 sleeping 5 sec...
Oct  7 21:35:59 starting from 22:20674408, got 42533 content, next section from 22:20716941. first time is 21:35:54, last time is 21:35:59 sleeping 5 sec...

One nice thing to notice – the starting marker plus the content length gives you the ending marker. o/

Learn More

Check out our docs for all Honeycomb SDKs.


  1. It’s terrible code! It doesn’t check that pointers aren’t nil before derefenencing them. The control flow is awkward. Ifs are nested too deeply. Code is repeated. I encourage you to do none of those things when you cargo cult snippets into your production application. 🙂
    [return]
Don’t forget to share!
Ben Hartshorne

Ben Hartshorne

Principal Software Engineer

Ben has spent much of his career setting up monitoring systems for startups and now is thrilled to help the industry see a better way. He is always eager to find the right graph to understand a service and will look for every excuse to include a whiteboard in the discussion.

Related posts