Substring
substring
extracts part of some text. This function is useful for cleaning up text (or any value with a string data type) that has a consistent format.
For example, substring
should work well on strings like SKU numbers, ISO codes, and standardized email addresses.
Syntax | Example |
---|---|
substring(text, position, length) |
substring("user_id@email.com", 1, 7) |
Extracts part of the text given a starting point (position) and a length (number of characters). | “user_id” |
Parameters
- The first character in your string is at position 1.
- The length of your substring should always be a positive number.
Getting a substring from the left
Mission ID | Agent |
---|---|
19951113006 | 006 |
20061114007 | 007 |
19640917008 | 008 |
Agent is a custom column with the expression:
substring([Mission ID], 9, 3)
Getting a substring from the right
Instead of using a number for the position, you’ll use the formula
1 + length([column]) - position_from_right
where position_from_right
is the number of characters you want to count from right to left.
Mission ID | Agent |
---|---|
19951113006 | 006 |
20061114007 | 007 |
19640917008 | 008 |
Here, Agent is a custom column with the expression:
substring([Mission ID], (1 + length([Mission ID]) - 3), 3)
Accepted data types
Data type | Works with substring |
---|---|
String | ✅ |
Number | ❌ |
Timestamp | ❌ |
Boolean | ❌ |
JSON | ❌ |
Limitations
substring
extracts text by counting a fixed number of characters. If you need to extract text based on some more complicated logic, try regexextract
.
And if you only need to clean up extra whitespace around your text, you can use the trim
, ltrim
, or rtrim
expressions instead.
Related functions
This section covers functions and formulas that work the same way as the Metabase substring
expression, with notes on how to choose the best option for your use case.
Other tools
Regexextract
Use regexextract if you need to extract text based on more specific rules. For example, you could get the agent ID with a regex pattern that finds the last occurrence of “00” (and everything after it):
regexextract([Mission ID], ".+(00.+)$")
should return the same result as
substring([Mission ID], 9, 3)
SQL
When you run a question using the notebook editor, Metabase will convert your graphical query settings (filters, summaries, etc.) into a query, and run that query against your database to get your results.
If our sample data is stored in a PostgreSQL database:
SELECT
mission_id,
SUBSTRING(mission_id, 9, 3) AS agent
FROM
this_message_will_self_destruct;
is equivalent to the Metabase substring
expression:
substring([Mission ID], 9, 3)
Spreadsheets
If our sample data is in a spreadsheet where “Mission ID” is in column A,
=mid(A2,9,3)
is the same as the Metabase substring
expression:
substring([Mission ID], 9, 3)
Python
Assuming the sample data is in a dataframe column called df
,
df['Agent'] = df['Mission ID'].str.slice(8, 11)
does the same thing as the Metabase substring
expression:
substring([Mission ID], 9, 3)
Further reading
Read docs for other versions of Metabase.