array
pyspark.sql.functions.array(*cols)
version: since 1.4.0
Creates a new array column.
Runnable Code:
from pyspark.sql import functions as F
# Set up dataframe
data = [{"a": 1,"b": 2},{"a": 3,"b": 4},{"a": 5,"b": 6}]
df = spark.createDataFrame(data)
# Use function
df = (df
    .withColumn("array",
        F.array(F.col("a"), F.col("b")))
)
df.show()
a | b | array |
---|---|---|
1 | 2 | [1, 2] |
3 | 4 | [3, 4] |
5 | 6 | [5, 6] |
Usage:
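I use this often. It is also handy for creating placeholder arrays. Note that filling the array with None, as below, gives an array holding a single null element rather than a truly empty array; for an empty array, call F.array() with no arguments.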
F.array(F.lit(None))
returns: Column
tags: create list, empty array, empty list
© 2023 PySpark Is Rad