Summary: in this tutorial, you will learn how to use the MySQL REGEXP_SUBSTR()
function to get the substring of the string that matches a regular expression.
Introduction to the MySQL REGEXP_SUBSTR() function
The REGEXP_SUBSTR()
function allows you to extract a substring of the string that matches a regular expression.
Here’s the syntax of the REGEXP_SUBSTR()
function:
REGEXP_SUBSTR (
string,
pattern,
position,
occurrence,
match_type
)
Code language: SQL (Structured Query Language) (sql)
In this syntax:
string
: This is the input string or value in a column that you want to extract the substring.pattern
: this is the pattern of the regular expression that you want to match.position
: This is the starting position in the string that you want the function to start matching. The position is optional. It defaults to 1 which starts matching from the beginning of the string.occurrence
: The string may have multiple matches. Theoccurrence
parameter specifies the match that you want to extract. The default value of theoccurrence
is 1, meaning that the function will return the first match.match_type
: This is a string that contains one or more options that instruct the function on how to perform matching. For example, c is case-sensitive matching whereas i is case-insensitive matching.
The function returns NULL
if there is no match. If the string or pattern is NULL
, the REGEXP_SUBSTR()
function also returns NULL
.
Prior to MySQL 8.0.17, the REGEXP_SUBSTR()
returns a string that uses the UTF-16 character set. In MySQL 8.0.17 and later, the function returns a substring that uses the character set and collation of the expression.
MySQL REGEXP_SUBSTR() function examples
Let’s take some examples of using the REGEXP_SUBSTR()
function.
1) Simple REGEXP_SUBSTR() function example
The following example uses the REGEXP_SUBSTR()
function to get the version of MySQL:
SELECT
REGEXP_SUBSTR('MySQL 8.0', '\\d+\.\\d+') version;
Code language: SQL (Structured Query Language) (sql)
Output:
+---------+
| version |
+---------+
| 8.0 |
+---------+
1 row in set (0.00 sec)
Code language: SQL (Structured Query Language) (sql)
In this example, we use the pattern '\\d+\.\\d+'
that matches one or more digits, followed by a (.) character, and followed by one or more digits. Therefore, the substring 8.0
matches the pattern.
2) Position example
By default, the REGEXP_SUBSTR()
starts matching from the beginning of the string. If you want the function to start searching for the matching a specified position, you can use a position argument. For example:
SELECT
REGEXP_SUBSTR(
'3 apples weighs 400 grams ', '\\d+',
2
) weight;
Code language: SQL (Structured Query Language) (sql)
Output:
+--------+
| weight |
+--------+
| 400 |
+--------+
1 row in set (0.00 sec)
Code language: SQL (Structured Query Language) (sql)
In this example, the \\d+
pattern matches one or more digits. However, we instruct the function to search from the position 2. Therefore, it returns the number 400 instead of 3.
3) Occurrence example
If there are multiple matches, you can specify which match you want to return by using the occurrence parameter. For example:
SELECT
REGEXP_SUBSTR(
'3 apples weighs 400 grams ', '\\d+',
1,
2
) weight;
Code language: SQL (Structured Query Language) (sql)
Output:
+--------+
| weight |
+--------+
| 400 |
+--------+
1 row in set (0.00 sec)
Code language: SQL (Structured Query Language) (sql)
In this example, the \\d+ matches two numbers 2 and 400. We instruct it to return the second match, which is 400.
4) Using REGEXP_SUBSTR() with table data
We’ll use the products
table from the sample database:
The following example uses the REGEXP_SUBSTR()
function to extract the year from the product names:
SELECT
productName,
REGEXP_SUBSTR(productName, '\\d{4}') year
FROM
products
WHERE
REGEXP_SUBSTR(productName, '\\d{4}') IS NOT NULL;
Code language: SQL (Structured Query Language) (sql)
Output:
+---------------------------------------------+------+
| productName | year |
+---------------------------------------------+------+
| 1969 Harley Davidson Ultimate Chopper | 1969 |
| 1952 Alpine Renault 1300 | 1952 |
| 1996 Moto Guzzi 1100i | 1996 |
| 2003 Harley-Davidson Eagle Drag Bike | 2003 |
| 1972 Alfa Romeo GTA | 1972 |
...
Code language: SQL (Structured Query Language) (sql)
Summary
- Use the MySQL
REGEXP_SUBSTR()
function to extract the substring of the string that matches a regular expression.