array_union #
pyspark.sql.functions.array_union(col1, col2) #
version: since 2.4.0
Collection function: returns an array of the elements in the union of col1 and col2, without duplicates.

Runnable Code:
from pyspark.sql import functions as F
# Set up dataframe
data = [{"a": [1,2,2],"b": [3,2,2]},{"b": [1,2,2]},{"a": [4,5,5],"b": [1,2,2]}]
df = spark.createDataFrame(data)
# Use function
df = (df
     .withColumn("array_union",
       F.array_union(F.col("a"),F.col("b")))
     )
df.show()
| a | b | array_union | 
|---|---|---|
| [1, 2, 2] | [3, 2, 2] | [1, 2, 3] | 
| null | [1, 2, 2] | null | 
| [4, 5, 5] | [1, 2, 2] | [4, 5, 1, 2] | 
Usage:
Simple array function. Like a python set union.
returns: Column(sc.\_jvm.functions.array_union(\_to_java_column(col1), \_to_java_column(col2)))
tags: union array, union list, from both lists
© 2023 PySpark Is Rad