v0.55

RegexExtract

⚠️ regexExtract is unavailable for MongoDB, SQLite, and SQL Server. For Druid, regexExtract is only available for the Druid-JDBC driver.

regexExtract uses regular expressions (regex) to get a specific part of your text.

regexExtract is ideal for text that has little to no structure, like URLs or freeform survey responses. If you’re working with strings in predictable formats like SKU numbers, IDs, or other types of codes, check out the simpler substring expression instead.

Use regexExtract to create custom columns with shorter, more readable labels for things like:

filter dropdown menus,
chart labels, or
embedding parameters.

Syntax	Example
`regexExtract(text, regular_expression)`	`regexExtract("regexExtract", "ex(.*)")`
Gets a specific part of your text using a regular expression.	“Extract”

Searching and cleaning text

Let’s say that you have web data with a lot of different URLs, and you want to map each URL to a shorter, more readable campaign name.

URL	Campaign Name
https://www.metabase.com/docs/?utm_campaign=alice	alice
https://www.metabase.com/learn/?utm_campaign=neo	neo
https://www.metabase.com/glossary/?utm_campaign=candy	candy

You can create a custom column Campaign Name with the expression:

regexExtract([URL], "^[^?#]+\?utm_campaign=(.*)")

Here, the regex pattern ^[^?#]+\? matches everything from the start of the URL up to and including the ? that begins the query string. You can replace utm_campaign= with whatever query parameter you like, as long as it is the first parameter after the ?. At the end of the regex pattern, the capturing group (.*) gets all of the characters that appear after utm_campaign=.
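
To see what the pattern captures, here is a minimal Python sketch that mimics the expression with the standard re module. The helper regex_extract is hypothetical (it is not part of Metabase), and the sketch assumes the result is the first capturing group, as in the Campaign Name table above.

```python
import re

# Hypothetical stand-in for Metabase's regexExtract: returns the first
# capturing group of the first match, or None when nothing matches.
def regex_extract(text, pattern):
    match = re.search(pattern, text)
    return match.group(1) if match else None

PATTERN = r"^[^?#]+\?utm_campaign=(.*)"

urls = [
    "https://www.metabase.com/docs/?utm_campaign=alice",
    "https://www.metabase.com/learn/?utm_campaign=neo",
    "https://www.metabase.com/glossary/?utm_campaign=candy",
]

campaign_names = [regex_extract(url, PATTERN) for url in urls]
print(campaign_names)  # ['alice', 'neo', 'candy']
```

Because (.*) is greedy, a URL that carries more parameters after utm_campaign would return those too. A narrower group such as ([^&#]*) is one way to stop at the next parameter.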

Now, you can use Campaign Name in places where you need clean labels, such as filter dropdown menus, charts, and embedding parameters.

Accepted data types

Data type	Works with `regexExtract`
String	✅
Number	❌
Timestamp	❌
Boolean	❌
JSON	❌

Limitations

regexExtract is unavailable for MongoDB, SQLite, and SQL Server. For Druid, regexExtract is only available for the Druid-JDBC driver.

Regex can be a dark art. You have been warned.

Related functions

This section covers functions and formulas that work the same way as the Metabase regexExtract expression, with notes on how to choose the best option for your use case.

Metabase expressions

substring

Other tools

SQL
Spreadsheets
Python

Substring

Use substring when you want to search text that has a consistent format (the same number of characters, and the same relative order of those characters).

For example, you wouldn’t be able to use substring to get the query parameter from the URL sample data, because the URL paths and the parameter values both have variable lengths.

But if you wanted to pull out everything after https://www. and before .com, you could do that with either:

substring([URL], 13, 8)

regexExtract([URL], "^(?:https?:\/\/)?(?:[^@\/\n]+@)?(?:www\.)?([^:\/.\n]+)")
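
As a rough illustration of the trade-off, the Python sketch below applies both approaches to one sample URL and to a second, hypothetical URL without the www. prefix (not from the sample data). It assumes substring positions start at 1, which is what the 13, 8 arguments above imply.

```python
import re

DOMAIN_PATTERN = r"^(?:https?:\/\/)?(?:[^@\/\n]+@)?(?:www\.)?([^:\/.\n]+)"

def by_position(url):
    # substring([URL], 13, 8): position 13 with length 8 maps to the
    # zero-based Python slice [12:20].
    return url[12:12 + 8]

def by_regex(url):
    # First capturing group of the first match, or None if no match.
    match = re.search(DOMAIN_PATTERN, url)
    return match.group(1) if match else None

sample_url = "https://www.metabase.com/docs/?utm_campaign=alice"
no_www_url = "https://metabase.com/pricing"  # hypothetical URL for illustration

print(by_position(sample_url), by_regex(sample_url))  # metabase metabase
print(by_position(no_www_url), by_regex(no_www_url))  # base.com metabase
```

The fixed positions only work while every URL starts with the same 12 characters. The regex keeps working when the prefix changes, at the cost of a harder-to-read pattern.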

SQL

When you run a question using the notebook editor, Metabase will convert your graphical query settings (filters, summaries, etc.) into a query, and run that query against your database to get your results.

If our sample data is stored in a PostgreSQL database:

SELECT
    url,
    SUBSTRING(url, '^[^?#]+\?utm_campaign=(.*)') AS campaign_name
FROM follow_the_white_rabbit

is equivalent to the Metabase regexExtract expression:

regexExtract([URL], "^[^?#]+\?utm_campaign=(.*)")

Spreadsheets

If our sample data is in a spreadsheet where “URL” is in column A, the spreadsheet function

REGEXEXTRACT(A2, "^[^?#]+\?utm_campaign=(.*)")

uses pretty much the same syntax as the Metabase expression:

regexExtract([URL], "^[^?#]+\?utm_campaign=(.*)")

Python

Assuming the sample data is in a dataframe called df with a column named URL,

df['Campaign Name'] = df['URL'].str.extract(r'^[^?#]+\?utm_campaign=(.*)')

does the same thing as the Metabase regexExtract expression:

regexExtract([URL], "^[^?#]+\?utm_campaign=(.*)")
