array


pyspark.sql.functions.array(*cols)

version: since 1.4.0

Creates a new array column from the given columns or column names. The inputs should all have the same data type.


Runnable Code:

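from pyspark.sql import SparkSession, functions as F

# Create a session (skip this if `spark` already exists, e.g. in a notebook)
spark = SparkSession.builder.getOrCreate()

# Set up dataframe
data = [{"a": 1, "b": 2}, {"a": 3, "b": 4}, {"a": 5, "b": 6}]
df = spark.createDataFrame(data)

# Use function: combine columns a and b into one array column
df = df.withColumn("array", F.array(F.col("a"), F.col("b")))
df.show()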
+---+---+------+
|  a|  b| array|
+---+---+------+
|  1|  2|[1, 2]|
|  3|  4|[3, 4]|
|  5|  6|[5, 6]|
+---+---+------+

Usage:

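I use this often. It also works for placeholder arrays, but note the difference: F.array(F.lit(None)) creates a one-element array whose only element is null, not an empty array. For a truly empty array, call F.array() with no arguments.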



returns: Column

PySpark manual

tags: create list, empty array, empty list




© 2023 PySpark Is Rad