Generating Calculated Fields From Natural Language

Generating Calculated Fields From Natural Language

5 Min. Read

If you’ve been using Honeycomb for a bit, you know that Calculated Fields (otherwise known as derived columns) are a powerful way to transform your events to a format that’s easier to query and understand. However, they use a lisp-esque language that can be difficult to read and a pain to write.

If you dislike making Calculated Fields and want something a little easier, here’s a generative AI prompt that can generate them from natural language.

I tested this prompt against GitHub Copilot using Claude 3.5 and GPT 4o, as well as ChatGPT 4o, and all worked well.
Go to your favorite generative AI tool and provide this prompt.

Now, enter a natural language expression for your Calculated Fields. Here are some examples with the output from generative AI.

Input:

if raw_size_bytes exists and request.content_length is greater than 0, divide app.raw_size_bytes by request.content_length

Output:

IF(AND(EXISTS($raw_size_bytes), GT($request.content_length, 0)), DIV($raw_size_bytes, $request.content_length))
AI creating a derived column expression.

Wonderful! Now, simply take that expression and paste it into the Calculated Fields editor.

Note: If the expression is poorly formatted, as it is above, simply click the Format button in the editor.

Before:

Create a Calculated Field

After:

Create a Calculated Field

You can also iterate on an existing Calculated Field. For example, say you created a field like this:

Input:

If name = "executeQuery" and query-source = "api" and there are no errors then, if duration_ms < 15000 return true

Output:

IF(AND(EQUALS($name, "executeQuery"), EQUALS($app.query-source, "api"), NOT(EXISTS($error))),LT($duration_ms, 15000))

Input:

Now, add to that derived column that if team.id is not equal to abcdef, then the duration_m should be less than 15000

Output:

IF(AND(
   EQUALS($name, "executeQuery"),
   EQUALS($app.query-source, "api"),
   NOT(EXISTS($error))
   ),
   IF(NOT(EQUALS($team.id, "abcdef")),
      LT($duration_ms, 15000),
      true
)

That isn’t a great expression, it’s hard to read and honestly, I’m not sure it’ll work, so just ask it to try again.

Input:

That's not what I wanted. With the original derived column you created, add to the IF statement $team.id is not equal to abcdef

Output:

IF(AND(
   EQUALS($name, "executeQuery"),
   EQUALS($app.query-source, "api"),
   NOT(EXISTS($error)),
   NOT(EQUALS($team.id, "abcdef"))
   ),
LT($duration_ms, 15000)
)

That’s perfect!

How about bucketing?

Input:

If nlq.input_tokens_used exists, bucket the values into these ranges
   less than 1000
   1000-1250
   1250-1500
   greater than 1500

Output:

CONCAT(BUCKET($app.nlq.input_tokens_used, 250, 1000, 1500), " tokens")

Wonderful!

How about with uneven bucket sizes?

Input:

If nlq.input_tokens_used exists, bucket the values into these ranges:
  less than 1000
  1000-1250
  1250-1500
  1500-1750
  1750-2000
  2000-3000
  More than 3000

Output:

Beautiful. It correctly recognizes the BUCKET expression won’t work and instead uses nested IF statements.

You can also use the prompt to explain a Calculated Field!

Input:

Explain this derived column expression: IF( AND( EXISTS( $app.raw_size_bytes ), GT( $request.content_length, 0 ) ), DIV( $app.raw_size_bytes, $request.content_length ) )

If you don’t like the verbosity, just tell it to do the work succinctly.

Play around with the prompt and see what works for you!

Side note:
I was having trouble getting it to properly do a switch statement, and having trouble wording it properly so it would know what to do – so I used generative AI =>

Input:

The prompt for the SWITCH statement is this:
SWITCH: the first arg is the value to switch on, this must be a column name or an expression; pairs of subsequent arguments are the case to check if the first arg is equal to the case and return expression. If no matching case, return the final unpaired value if one exists, otherwise null. Can you rewrite this prompt so that it is clear that the cases cannot be expressions? They must be strings.

Output:

Output:
SWITCH: The first argument is the value to switch on, which must be a column name or an expression. The subsequent arguments must come in pairs, where each pair consists of:
- A **case**, which must be a **string literal**.
- A corresponding return expression.

If the value to switch on matches a case, the corresponding expression is returned. If no matching case is found, the final unpaired argument (if provided) is returned as the default. If no default is provided, the function returns null.

And now the prompt for SWITCH works great!

This is what generative AI does best

Generative AI has some room to grow until it can write reliable code. But one of the things it does well is shielding us from having to learn complex syntaxes. Being able to ask questions and refine prompts in plain English is a huge boon to software engineers everywhere.

If you’re interesting in seeing more posts like these, here’s a piece from Phillip Carter on how he codes with LLMs these days.

Don’t forget to share!
Molly Stamos

Molly Stamos

Software Engineer

Molly is a professional potato farmer by night and works on the Honeycomb codebase by day.

Related posts