Using Derived Columns

New to running queries in Honeycomb? Check out the introduction to building queries!

Derived columns allow you to run queries based on the value of an expression that’s derived from the columns in an event, without having to modify the structure of the event itself.

For example, let’s say you have events from a web server. You want to compare latency for small requests and large requests, where a request is “small” if the value of its content_length field is less than 1000. Adding content_length to the “Break Down” clause of a query would create a separate time series for every single value of content_length. That’s not ideal.

Instead, you can use a derived column. Derived columns are similar to functions in a spreadsheet. In this case, we’ll want to break down by the expression

LTE($content_length, 1000)

which evaluates to true if the value of the content_length column is less than 1000, and false otherwise.

You can add a derived column by clicking “Create derived column” in the dropdown of the Break Down clause:

Derived column selector

That’ll open a modal where you can craft your expression:

Modal for creating a derived column

and see sample values computed from recent data. After you click “create column”, the query builder will update to include your new derived column:

Query builder after adding a derived column

You can now run the updated query to get the breakdown you wanted:

Query result with derived column

After you create a derived column, it’ll remain available for reuse in later queries. You can view the list of derived columns, add new ones, and delete old derived columns by going to your dataset’s schema page (from the query page, click “Overview”, then “Schema”).

Derived Column Syntax

Derived columns are composed of functions with some number of arguments enclosed in parenthesis. All function names are all-capital, case-sensitive:

Arguments can be any of the following:

Function Reference

The following functions can be used in derived column expressions. Functions operate within the context of a single event, meaning that each function takes column values from an event, and produces a new “derived” column attached to that event.

Conditional Operators

IF(condition, then-val, else-val)

If condition evaluates to true, evaluates to then-val, otherwise, else-val.

# Examples
IF(CONTAINS($team_name, "acme"), "important-customer", "everyone else")
IF(GTE($http_status, 400), true, false)

COALESCE(arg1, arg2, ...)

Evaluates to the first non-empty argument.

# Examples
COALESCE($full_name, $email)
COALESCE($container_id, $hostname, $availability_zone, "unknown")

Comparison Operators

LT(left, right)

True if left is less than right. Always false if either argument is empty, or a non-numeric value.

# Examples
LT($roundtrip_us, 500)
LT($mysql_read_ms, $mysql_write_ms)

LTE(left, right)

True if left is less than right. True if left equals right regardless of type, otherwise false if either argument is empty, or a non-numeric value.

# Examples
LTE($roundtrip_ms, 0.5)
LTE($get_schema_ms, $persist_schema_ms)

GT(left, right)

True if left is greater than right. Always false if either argument is empty, or a non-numeric value.

# Examples
GT($payload_size_kb, 300)
GT($num_invalid_columns, $num_valid_columns)

GTE(left, right)

True if left is greater than right. True if left equals right regardless of type, otherwise false if either argument is empty, or a non-numeric value.

# Examples
GTE($payload_size_mb, 0.3)
GTE($memory_inuse, MUL($max_memory_process, 0.75))

EQUALS(arg1, arg2)

True if arg1 and arg2 are equal. (Arguments of different types, such as the integer 200 and the string "200", are not considered equal.)

# Examples
EQUALS($remote_addr, "216.3.123.12")
EQUALS($gzipped, true)
EQUALS($oversize_num_columns, 0)

IN(arg1, compare1, ...)

True if arg1 is equal to any of the subsequent arguments.

# Examples
IN($method, "DELETE", "POST", "PUT")
IN($build_id, "9051", "9052")
IN($num_invalid_payloads, 0, 1, -1)

EXISTS(arg1)

True for an event where the supplied argument has a value, false where it does not.

# Examples
EXISTS($batch_size)
EXISTS($team_name)
EXISTS($json_serialization_ms)

Boolean Operators

NOT(arg1)

Evaluates arg1 to a boolean value, then inverts that value.

# Examples
NOT(EXISTS($batch))
NOT(IN($build_id, "175", "176"))
NOT(IN($company, "acme", "globex", "soylent"))

AND(arg1, arg2, ...)

True if all arguments are true.

# Examples
AND(GTE($roundtrip_ms, 100), EQUALS($method, "POST"))
AND(NOT(IN($method, "GET", "DELETE")), EXISTS($batch))
AND(EQUALS($api_version, "v3"), OR(LT($request_ms, 30), GT($request_ms, 300)))

OR(arg1, arg2, ...)

True if any argument is true.

# Examples
OR(EQUALS($company, "acme"), GTE($priority, 5))
OR(GTE($mysql_latency_ms, 20), AND(GTE($s3_latency_ms, 100), EQUALS($method, "GET")))

Math Operators

MIN(arg1, arg2, ...)

Evaluates to the smallest numeric argument. Empty and non-numeric arguments are ignored.

# Examples
MIN($memory_inuse_local, $memory_inuse_merge, $memory_inuse_fetch)

MAX(arg1, arg2, ...)

Evaluates to the largest numeric argument. Empty and non-numeric arguments are ignored.

# Examples
MAX($mysql_latency_ms, $redis_latency_ms)
MAX(1, DIV($total_volume, $count))

SUM(arg1, arg2, ...)

Evaluates to the sum of all numeric arguments. Strings are parsed into numbers if possible, unparseable strings or other values evaluate to zero.

# Examples
SUM($serialize_ms, $scan_ms, $publish_ms)
SUM(1.0, 5, "2.3")

SUB(arg1, arg2)

Evaluates to arg1 - arg2. Strings are parsed into numbers if possible, unparseable strings or other values evaluate to zero.

# Examples
SUB($serialization_ms, 100)
SUB($total_ms, SUM($local_ms, $merge_ms, $serialize_ms))

MUL(arg1, arg2, ...)

Evaluates to the product of all numeric arguments. Strings are parsed into numbers if possible, unparseable strings or other values evaluate to zero.

# Examples
MUL($count, $time_per_item)
MUL(100, DIV($json_decode_ms, $total_ms))

DIV(numerator, denominator)

Evaluates to numerator / denominator. Strings are parsed into numbers if possible, unparseable strings or other values evaluate to zero. A division by zero evaluates to null.

# Examples
DIV($io_bytes_read, 1024)
DIV($total_ms, $rows_examined)
DIV($json_parse_time_ms, $total_request_time_ms)

String Operators

CONCAT(arg1, arg2, ...)

Concatenates string representations of all arguments into a single string result. Non-string arguments are converted to strings, empty arguments are ignored.

# Examples
CONCAT($api_version, $sdk)
IF($is_batch, CONCAT($url, "-batch"), $url)

STARTS_WITH(string, prefix)

True if string starts with the substring prefix. False if either argument is not a string.

# Examples
STARTS_WITH($url, "https")
STARTS_WITH($user_agent, "ELB-")

CONTAINS(string, substr)

True if string contains the substring substr. False if either argument is not a string.

# Examples
CONTAINS($email, "@honeycomb.io")
CONTAINS($header_accept_encoding, "gzip")
IF(CONTAINS($url, "/v1/"), "api_v1", "api_v2")

REG_MATCH(string, regex)

True if string matches regex. False if string is not a string or is empty. regex must be a string literal containing a valid regular expression. Golang regex syntax can be tested here.

# Examples
REG_MATCH($error_msg, "^[a-z]+\[[0-9]+\]$")
REG_MATCH($referrer, "[\w-_]+\.(s3\.)?amazonaws.com")

REG_VALUE(string, regex)

Evaluates to the first regex submatch found in string. Evaluates to an empty value if string contains no matches or is not a string. regex must be a string literal containing a valid regular expression. Golang regex syntax can be tested here.

# Examples
REG_VALUE($user_agent, "Chrome/[\d.]+")
REG_VALUE($source, "^(ui-\d+|log|app-\d+)")

The first example above yields a string like Chrome/1.2.3 and the second could be any one of ui-123, log, or app-456.

REG_VALUE is most effective when commbined with other functions. As an example, the honeytail agent sets its User-Agent header to a string like libhoney-go/1.3.0 honeytail/1.378 (nginx), but there are also User-Agents like "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36.... In order to extract only the name of the parser used and not get caught up with other things in parentheses (such as the Macintosh... bit), we use this as a derived column:

IF(CONTAINS($user_agent, "honeytail"), REG_VALUE($user_agent, `\([a-z]+\)`), null)`

This results in fields that contain (nginx), (mysql), and so on. Combining CONTAINS or REG_MATCH with REG_VALUE is an easy way to limit the total number of strings available to the match and more effectively grab only the values you’re expecting.

UNIX_TIMESTAMP(string)

Converts a date string in RFC3339 format (e.g., 2017-07-20T11:22:44.888Z) to a Unix timestamp (1500549764.888). This is useful for comparing two timestamps in an event; for example, to calculate a duration from a start and an end timestamp.

# Examples
UNIX_TIMESTAMP($timestamp)