Using Derived Columns

New to running queries in Honeycomb? Check out the introduction to building queries!

Derived columns allow you to run queries based on the value of an expression that’s derived from the columns in an event, without having to modify the structure of the event itself.

For example, let’s say you have events from a web server. You want to compare latency for small requests and large requests, where a request is “small” if the value of its content_length field is less than 1000. Adding content_length to the “Break Down” clause of a query would create a separate time series for every single value of content_length. That’s not ideal.

Instead, you can use a derived column. Derived columns are similar to functions in a spreadsheet. In this case, we’ll want to break down by the expression

LTE($content_length, 1000)

which evaluates to true if the value of the content_length column is less than 1000, and false otherwise.

You can add a derived column by clicking “Create derived column” in the dropdown of the Break Down clause:

Derived column selector

That’ll open a modal where you can craft your expression:

Modal for creating a derived column

and see sample values computed from recent data. After you click “create column”, the query builder will update to include your new derived column:

Query builder after adding a derived column

You can now run the updated query to get the breakdown you wanted:

Query result with derived column

After you create a derived column, it’ll remain available for reuse in later queries. You can view the list of derived columns, add new ones, and delete old derived columns by going to your dataset’s schema page (from the query page, click “Overview”, then “Schema”).

Derived Column Syntax

Derived columns are composed of functions with some number of arguments enclosed in parenthesis. All function names are all-capital, case-sensitive:

Arguments can be any of the following:

Function Reference

The following functions can be used in derived column expressions. Functions operate within the context of a single event, meaning that each function takes column values from an event, and produces a new “derived” column attached to that event.

Conditional Operators

IF

If condition evaluates to true, evaluates to then-val, otherwise, else-val.

# Usage: IF(condition, then-val, else-val)
# Examples
IF(CONTAINS($team_name, "acme"), "important-customer", "everyone else")
IF(GTE($http_status, 400), true, false)

COALESCE

Evaluates to the first non-empty argument.

# Usage: COALESCE(arg1, arg2, ...)
# Examples
COALESCE($full_name, $email)
COALESCE($container_id, $hostname, $availability_zone, "unknown")

Comparison Operators

LT

True if the first provided value is less than the second. Always false if either argument is empty, or a non-numeric value.

# Usage: LT(left, right)
# Examples
LT($roundtrip_us, 500)
LT($mysql_read_ms, $mysql_write_ms)

LTE

True if the first provided value is less than the second. True if the operands equal each other (regardless of type; strings will be coerced to numbers), otherwise false if either argument is empty, or a non-numeric value.

# Usage: LTE(left, right)
# Examples
LTE($roundtrip_ms, 0.5)
LTE($get_schema_ms, $persist_schema_ms)

GT

True if the first provided value is greater than the second. Always false if either argument is empty, or a non-numeric value.

# Usage: GT(left, right)
# Examples
GT($payload_size_kb, 300)
GT($num_invalid_columns, $num_valid_columns)

GTE

True if the first provided value is greater than the second. True if the operands equal each other (regardless of type; strings will be coerced to numbers), otherwise false if either argument is empty, or a non-numeric value.

# Usage: GTE(left, right)
# Examples
GTE($payload_size_mb, 0.3)
GTE($memory_inuse, MUL($max_memory_process, 0.75))

EQUALS

True if the two provided arguments are equal. (Arguments of different types, such as the integer 200 and the string "200", are not considered equal.)

# Usage: EQUALS(arg1, arg2)
# Examples
EQUALS($remote_addr, "216.3.123.12")
EQUALS($gzipped, true)
EQUALS($oversize_num_columns, 0)

IN

True if the first provided argument is equal to any of the subsequent arguments. IN can be thought of as a more compact form of a series of OR equality checks.

# Usage: IN(arg1, compare1, ...)
# Examples
IN($method, "DELETE", "POST", "PUT")
IN($build_id, "9051", "9052")
IN($num_invalid_payloads, 0, 1, -1)

EXISTS

True when the supplied argument has a defined value, false where it does not.

# Usage: EXISTS(arg1)
# Examples
EXISTS($batch_size)
EXISTS($team_name)
EXISTS($json_serialization_ms)

Boolean Operators

NOT

Evaluates the provided argument to a boolean value, then inverts that value.

# Usage: NOT(arg1)
# Examples
NOT(EXISTS($batch))
NOT(IN($build_id, "175", "176"))
NOT(IN($company, "acme", "globex", "soylent"))

AND

Takes a variable number of arguments and returns true if all arguments are truthy.

# Usage: AND(arg1, arg2, ...)
# Examples
AND(GTE($roundtrip_ms, 100), EQUALS($method, "POST"))
AND(NOT(IN($method, "GET", "DELETE")), EXISTS($batch))
AND(EQUALS($api_version, "v3"), OR(LT($request_ms, 30), GT($request_ms, 300)))

OR

Takes a variable number of arguments and returns true if any arguments are truthy.

# Usage: OR(arg1, arg2, ...)
# Examples
OR(EQUALS($company, "acme"), GTE($priority, 5))
OR(GTE($mysql_latency_ms, 20), AND(GTE($s3_latency_ms, 100), EQUALS($method, "GET")))

Math Operators

MIN

Evaluates to the smallest numeric argument. Empty and non-numeric arguments are ignored.

# Usage: MIN(arg1, arg2, ...)
# Examples
MIN($memory_inuse_local, $memory_inuse_merge, $memory_inuse_fetch)

MAX

Evaluates to the largest numeric argument. Empty and non-numeric arguments are ignored.

# Usage: MAX(arg1, arg2, ...)
# Examples
MAX($mysql_latency_ms, $redis_latency_ms)
MAX(1, DIV($total_volume, $count))

SUM

Evaluates to the sum of all numeric arguments. Strings are parsed into numbers if possible, unparseable strings or other values evaluate to zero.

# Usage: SUM(arg1, arg2, ...)
# Examples
SUM($serialize_ms, $scan_ms, $publish_ms)
SUM(1.0, 5, "2.3")

SUB

Evaluates to the first argument subtracted by the second, or arg1 - arg2. Strings are parsed into numbers if possible, unparseable strings or other values evaluate to zero.

# Usage: SUB(arg1, arg2)
# Examples
SUB($serialization_ms, 100)
SUB($total_ms, SUM($local_ms, $merge_ms, $serialize_ms))

MUL

Evaluates to the product of all numeric arguments. Strings are parsed into numbers if possible, unparseable strings or other values evaluate to zero.

# Usage: MUL(arg1, arg2, ...)
# Examples
MUL($count, $time_per_item)
MUL(100, DIV($json_decode_ms, $total_ms))

DIV

Divides the first argument by the second, or arg1 / arg2. Strings are parsed into numbers if possible, unparseable strings or other values evaluate to zero. A division by zero evaluates to null.

# Usage: DIV(arg1, arg2)
# Examples
DIV($io_bytes_read, 1024)
DIV($total_ms, $rows_examined)
DIV($json_parse_time_ms, $total_request_time_ms)

String Operators

CONCAT

Concatenates string representations of all arguments into a single string result. Non-string arguments are converted to strings, empty arguments are ignored.

# Usage: CONCAT(arg1, arg2, ...)
# Examples
CONCAT($api_version, $sdk)
IF($is_batch, CONCAT($url, "-batch"), $url)

STARTS_WITH

True if the first argument starts with the second argument. False if either argument is not a string.

# Usage: STARTS_WITH(string, prefix)
# Examples
STARTS_WITH($url, "https")
STARTS_WITH($user_agent, "ELB-")

CONTAINS

True if the first argument contains the second argument. False if either argument is not a string.

# Usage: CONTAINS(string, substr)`
# Examples
CONTAINS($email, "@honeycomb.io")
CONTAINS($header_accept_encoding, "gzip")
IF(CONTAINS($url, "/v1/"), "api_v1", "api_v2")

REG_MATCH

True if the first argument matches the second argument, which must be a defined regular expression. False if the first argument is not a string or is empty. The provided regex must be a string literal containing a valid regular expression.

Note: Golang regex syntax can be tested here. If your regular expression contains character classes such as \s, \d or \w, enclose the regular expression in `backticks` so that it’s treated as a raw string literal.

# Usage: REG_MATCH(string, regex)
# Examples
REG_MATCH($error_msg, `^[a-z]+\[[0-9]+\]$`)
REG_MATCH($referrer, `[\w-_]+\.(s3\.)?amazonaws.com`)

REG_VALUE

Evaluates to the first regex submatch found in the first argument. Evaluates to an empty value if the first argument contains no matches or is not a string. The provided regex must be a string literal containing a valid regular expression.

Note: Golang regex syntax can be tested here. If your regular expression contains character classes such as \s, \d or \w, enclose the regular expression in `backticks` so that it’s treated as a raw string literal.

# Usage: REG_VALUE(string, regex)
# Examples
REG_VALUE($user_agent, `Chrome/[\d.]+`)
REG_VALUE($source, `^(ui-\d+|log|app-\d+)`)

The first example above yields a string like Chrome/1.2.3 and the second could be any one of ui-123, log, or app-456.

REG_VALUE is most effective when combined with other functions. As an example, the honeytail agent sets its User-Agent header to a string like libhoney-go/1.3.0 honeytail/1.378 (nginx), but there are also User-Agents like "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36.... In order to extract only the name of the parser used and not get caught up with other things in parentheses (such as the Macintosh... bit), we use this as a derived column:

IF(CONTAINS($user_agent, "honeytail"), REG_VALUE($user_agent, `\([a-z]+\)`), null)`

This results in fields that contain (nginx), (mysql), and so on. Combining CONTAINS or REG_MATCH with REG_VALUE is an easy way to limit the total number of strings available to the match and more effectively grab only the values you’re expecting.

REG_COUNT

Returns the number of non-overlapping successive matches yielded by the provided regex. Returns 0 if the first argument contains no matches or is not a string. The provided regex must be a string literal containing a valid regular expression.

Note: Golang regex syntax can be tested here. If your regular expression contains character classes such as \s, \d or \w, enclose the regular expression in `backticks` so that it’s treated as a raw string literal.

# Usage: REG_COUNT(string, regex)
# Examples
REG_COUNT($sql, `JOIN`)
REG_COUNT($ip, `19.`)

UNIX_TIMESTAMP

Converts a date string in RFC3339 format (e.g., 2017-07-20T11:22:44.888Z) to a Unix timestamp (1500549764.888). This is useful for comparing two timestamps in an event; for example, to calculate a duration from a start and an end timestamp.

# Usage: UNIX_TIMESTAMP(string)
# Examples
UNIX_TIMESTAMP($timestamp)