pyspark.sql.functions.regexp_count

pyspark.sql.functions.regexp_count(str, regexp)

Returns the number of times the Java regex pattern regexp matches in the string str.

New in version 3.5.0.

Parameters
str : Column or column name

target column to work on.

regexp : Column or column name

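regex pattern to apply. A plain Python string passed here is read as a column name, not as a pattern; wrap a literal pattern in lit(), as the examples below do.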

Returns
Column

the number of times that a Java regex pattern is matched in the string.

Examples

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([("1a 2b 14m", r"\d+")], ["str", "regexp"])
>>> df.select('*', sf.regexp_count('str', sf.lit(r'\d+'))).show()
+---------+------+----------------------+
|      str|regexp|regexp_count(str, \d+)|
+---------+------+----------------------+
|1a 2b 14m|   \d+|                     3|
+---------+------+----------------------+
>>> df.select('*', sf.regexp_count('str', sf.lit(r'mmm'))).show()
+---------+------+----------------------+
|      str|regexp|regexp_count(str, mmm)|
+---------+------+----------------------+
|1a 2b 14m|   \d+|                     0|
+---------+------+----------------------+
>>> df.select('*', sf.regexp_count("str", sf.col("regexp"))).show()
+---------+------+-------------------------+
|      str|regexp|regexp_count(str, regexp)|
+---------+------+-------------------------+
|1a 2b 14m|   \d+|                        3|
+---------+------+-------------------------+
>>> df.select('*', sf.regexp_count(sf.col('str'), "regexp")).show()
+---------+------+-------------------------+
|      str|regexp|regexp_count(str, regexp)|
+---------+------+-------------------------+
|1a 2b 14m|   \d+|                        3|
+---------+------+-------------------------+
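The counts in the examples above can be cross-checked outside Spark. The following is a minimal sketch, not Spark's implementation: it uses Python's re module rather than the Java regex engine, and it assumes that regexp_count counts non-overlapping matches found by scanning left to right. The helper name count_matches is hypothetical. For simple patterns such as \d+ the two engines agree, but Java and Python regex syntax differ in places, so treat this only as an illustration of the counting rule.

```python
import re


def count_matches(text, pattern):
    """Count non-overlapping matches of `pattern` in `text`, left to right.

    Hypothetical helper for illustration; uses Python's `re`, not Java regex.
    """
    return sum(1 for _ in re.finditer(pattern, text))


# Same inputs as the Spark examples above.
print(count_matches("1a 2b 14m", r"\d+"))  # 3: matches "1", "2", "14"
print(count_matches("1a 2b 14m", r"mmm"))  # 0: no match

# Matches do not overlap: "aaaa" contains "aa" twice, not three times.
print(count_matches("aaaa", r"aa"))        # 2
```

Under this assumption, a greedy pattern like \d+ consumes "14" as one match rather than two, which is why the first example yields 3.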