arrays_zip

pyspark.sql.functions.arrays_zip(*cols)

version: since 2.4.0

Collection function: Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays.

Runnable Code:

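from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Start (or reuse) a Spark session
spark = SparkSession.builder.getOrCreate()

# Set up the DataFrame; the second row has no "a", so that column is null there
data = [{"a": [1, 2], "b": [3, 2]}, {"b": [1, 2]}]
df = spark.createDataFrame(data)

# Zip columns "a" and "b" element-wise into an array of structs
df = df.withColumn("arrays_zip", F.arrays_zip(F.col("a"), F.col("b")))
df.show()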
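+------+------+----------------+
|     a|     b|      arrays_zip|
+------+------+----------------+
|[1, 2]|[3, 2]|[{1, 3}, {2, 2}]|
|  null|[1, 2]|            null|
+------+------+----------------+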

Usage:

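A simple array function, similar to Python's built-in zip. Unlike zip, it does not truncate to the shortest input: missing elements of shorter arrays become null, and the whole result is null if any input array is null.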
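To make the comparison with Python's zip concrete, here is a minimal pure-Python sketch of the behaviour. The helper name arrays_zip_like is hypothetical and this is only a model of the semantics, not how Spark implements the function; structs are represented as plain tuples and null as None.

```python
from itertools import zip_longest
from typing import Any, List, Optional, Tuple


def arrays_zip_like(*arrays: Optional[List[Any]]) -> Optional[List[Tuple[Any, ...]]]:
    """Hypothetical pure-Python model of arrays_zip semantics (not Spark code)."""
    # A null (None) input array makes the whole result null.
    if any(arr is None for arr in arrays):
        return None
    # Shorter arrays are padded with None instead of truncating the result.
    return list(zip_longest(*arrays, fillvalue=None))


print(arrays_zip_like([1, 2], [3, 2]))  # [(1, 3), (2, 2)]
print(arrays_zip_like(None, [1, 2]))    # None
print(arrays_zip_like([1, 2, 3], [4]))  # [(1, 4), (2, None), (3, None)]
print(list(zip([1, 2, 3], [4])))        # [(1, 4)]  (built-in zip truncates)
```

The first two calls mirror the two rows of the example DataFrame; the last two show where the model and the built-in zip diverge on arrays of unequal length.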



returns: Column(sc._jvm.functions.arrays_zip(_to_seq(sc, cols, _to_java_column)))

PySpark manual

tags: zip array, zip list, from both lists




© 2023 PySpark Is Rad