arrays_overlap


pyspark.sql.functions.arrays_overlap(a1, a2)

version: since 2.4.0

Collection function: returns true if the two arrays share at least one non-null element. If they share none, it returns null when both arrays are non-empty and either of them contains a null element; otherwise it returns false.


Runnable Code:

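from pyspark.sql import SparkSession, functions as F

# Reuse the active session, or start one if none exists
spark = SparkSession.builder.getOrCreate()

# Set up the DataFrame; the second row has no "a" key, so "a" is null there
data = [{"a": [1, 2, 2], "b": [3, 2, 2]}, {"b": [1, 2, 2]}, {"a": [4, 5, 5], "b": [1, 2, 2]}]
df = spark.createDataFrame(data)

# Add a column reporting whether "a" and "b" share any element
df = df.withColumn("arrays_overlap", F.arrays_overlap(F.col("a"), F.col("b")))
df.show()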
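+---------+---------+--------------+
|        a|        b|arrays_overlap|
+---------+---------+--------------+
|[1, 2, 2]|[3, 2, 2]|          true|
|     null|[1, 2, 2]|          null|
|[4, 5, 5]|[1, 2, 2]|         false|
+---------+---------+--------------+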

Usage:

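A simple array function. It is similar to checking whether the intersection of two Python sets is non-empty, except that the result can be null when an array is null or contains null elements.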
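To make the set-intersection comparison concrete, here is a minimal plain-Python sketch of the rule in the description. The helper name `arrays_overlap_model` is hypothetical and is not part of PySpark; it only models the documented result for Python lists, with `None` standing in for null, and is not how Spark implements the function.

```python
from typing import Optional, Sequence


def arrays_overlap_model(a1: Optional[Sequence], a2: Optional[Sequence]) -> Optional[bool]:
    """Hypothetical model of the documented arrays_overlap result (None means null)."""
    # A null array gives a null result, as in the second row of the example above.
    if a1 is None or a2 is None:
        return None

    # Compare only the non-null elements, like a set intersection.
    common = {x for x in a1 if x is not None} & {x for x in a2 if x is not None}
    if common:
        return True

    # No shared element: the answer is unknown (null) if both arrays are
    # non-empty and at least one of them contains a null element.
    if a1 and a2 and (None in a1 or None in a2):
        return None

    return False


print(arrays_overlap_model([1, 2, 2], [3, 2, 2]))  # True
print(arrays_overlap_model(None, [1, 2, 2]))       # None
print(arrays_overlap_model([4, 5, 5], [1, 2, 2]))  # False
print(arrays_overlap_model([1, None], [2]))        # None
```

The last call is the case a plain set intersection would not capture: no element is shared, but the null element might have matched, so the result is null rather than false.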



returns: Column(sc._jvm.functions.arrays_overlap(_to_java_column(a1), _to_java_column(a2)))

PySpark manual

tags: overlap array, overlap list, from both lists, set intersection




© 2023 PySpark Is Rad