You Could Have Invented Structured Logging

Sometimes we hear from folks who are a little bit intimidated by the notion of structured logging. Some common issues:

  • There’s no approachable library for structured logging in my language.
  • My logging today is messy, and changing it all is a daunting project.

These are legitimate concerns! But I have some opinions:

  • You might not need a fancy pants library
  • You can make incremental changes to your logging setup.

Structured logging is really all about giving yourself — and your team — a logging API to help you provide consistent context in events. An unstructured logger accepts strings. A structured logger accepts a map, hash, or dictionary that describes all the attributes you can think of for an event:

  • the function name and line number that the log line came from
  • the server’s host name
  • the application’s build ID or git SHA
  • information about the client or user issuing a request
  • timing information
  • etc.

The format and transport details — whether you choose JSON or something else, whether you log to a file or stdout or straight to a network API — might just be less important!

Structured logging basically just means you make a map and print it out or shove it in a queue

Here, let’s write a structured logging library!

def log(**data):
   print json.dumps(data)

All right, maybe we’re not quite ready for open-source fame yet. But there are two nice things about this “library”: first, it doesn’t even let you pass a bare string message! You have to pass a dictionary of key-value pairs. Secondly, it produces structured, self-describing output that can be consumed by humans and machines alike.

For example, this log line is not self-describing:

127.0.0.1 - - [12/Oct/2017 17:36:36] "GET / HTTP/1.1" 200 -

I mean, we can kinda guess at what it’s about. But if we start adding more data, it’s going to be really hard to remember which dash means what. In contrast, we can write code such as:

log(upstream_address="127.0.0.1",
    hostname="my-awesome-appserver",
    date=datetime.now().isoformat(),
    request_method="GET",
    request_path="/",
    status=200)

and get this output:

{
    "upstream_address": "127.0.0.1",
    "hostname": "my-awesome-appserver",
    "date": "2017-10-21T17:36:36",
    "request_method": "GET",
    "request_path": "/"
    "status": 200
}

That’s comprehensible both to machines that might parse these logs later, and to our fellow humans. Awesome.

There are a few more features worth adding. For example, we might want to add the same context at different places in our code. So let’s wrap the logger in a class that we can bind context to:

class Logger(object):
   def __init__(self):
       self.context = {}

   def log(self, **data):
       data.update(self.context)
       print json.dumps(data)

   def bind(self, key, value):
       self.context[key] = value

Now we can write

logger = Logger()
logger.bind("hostname", "my-awesome-appserver")
logger.bind("build_id", 2309)

And all calls to logger.log will automatically include hostname and build_id.

We also might want more flexible control over the output:

class Logger(object):
   def __init__(self, writer=None):
       self.context = {}
       self.writer = writer or jsonwriter

   def log(self, **data):
       data.update(self.context)
       self.writer(data)

   # ....

def jsonwriter(data):
        print json.dumps(data)

Now when our colleagues complain that JSON hurts their eyes, we can substitute a different writer for their local development. We can also test our logging by using a writer that accumulates records in memory, or introduce some buffering for performance if needed.

Finally, we want to do some post-processing of events: e.g., automatically add timestamps, or redact certain fields:

class Logger(object):
   def __init__(self, writer=None):
       self.context = {}
       self.processors = []
       self.writer = writer or jsonwriter

   def add_processor(self, processor):
       if processor not in self.processors:
           self.processors.append(processor)

   def log(self, **data):
       data.update(self.context)
       for p in self.processors:
           p(data)
       self.writer(data)

   # ...

def timestamper(data):
    data["timestamp"] = datetime.now().isoformat()

def caller_recorder(data):
    # Python magic, don't worry about it
    frame = sys._getframe()
    caller_frame = frame.f_back.f_back
    data["caller"] = caller_frame.f_globals.get("__name__")

Now if we instantiate our logger like so:

logger = Logger()
logger.add_processor(timestamper)
logger.add_processor(caller_recorder)

Then all our logs automatically include the timestamp and call site:

# Application code
def square(arg):
    result = arg**2
    logger.log(arg=arg, result=result)
    return result

square(22)
# produces the log:
#{
#    "caller": "square",
#    "timestamp": "2017-10-21T17:36:36",
#    "arg": 22,
#    "result": 484
#}

All right. So with fewer than 30 lines of code, we have testable structured logging, we can choose how to transport those logs, and we have composable output transformation. Not bad.

janeway says not bad!

Some suggestions for effective structured logging

Don’t be afraid to develop your own idioms

It’s totally reasonable for a mature project or organization to maintain its own module of logging conveniences. You can have some startup configuration that outputs nice colors and pretty formatting when you’re developing locally, but just emits JSON when your service runs in production. You can have helpers for adding domain-specific context to all your requests (e.g.customer name, ID and pricing plan). You can be creative.

Don’t be afraid to make incremental changes

You might have lots of log statements all over your code base. You might have some gnarly logging pipeline. Don’t worry about that for now. First, identify your service’s core unit of work: is it serving an HTTP request? Is it handling a job in a job queue?

Then, write one structured event for that unit of work. “Every time we’re done handling a request, we record its status code, duration, client ID, and whatever else we can think of.” Don’t sink weeks into changing all of your logging at once. If structured logs for even a subset of your code base help you better understand what’s happening in production (and they probably will), then you can invest in standardization. Don’t be daunted!

Have thoughts on this post? Let us know via Twitter @honeycombio.