dataframe

creating a simple dataframe

You will often want a small dataframe for trying out PySpark functions.

Here is my favorite way to create a simple dataframe.

data = [{"a": "hi"}, {"a": "bye"}, {"a": "fly"}]
df = spark.createDataFrame(data)
a
hi
bye
fly
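
The snippet above assumes a `spark` session already exists, as it does in a notebook or the `pyspark` shell. In a standalone script you have to create one first. Here is a minimal sketch of that; the `local[1]` master, the app name, and the `make_simple_df` helper are my own illustrative choices, not something the example requires.

```python
# Rows as plain dictionaries: each key becomes a column name.
data = [{"a": "hi"}, {"a": "bye"}, {"a": "fly"}]


def make_simple_df(rows):
    """Start (or reuse) a local SparkSession and build a DataFrame from rows."""
    # Imported inside the function so the rows above can be inspected
    # even on a machine where PySpark is not installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")            # assumption: single-threaded local mode
        .appName("simple-dataframe")   # hypothetical app name
        .getOrCreate()
    )
    return spark.createDataFrame(rows)
```

With PySpark installed, `make_simple_df(data).show()` displays the same three rows.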

For me, a list of dictionaries, one per row, is the easiest way.

The same technique covers richer column types. A Python list becomes an array column, a dictionary becomes a map column, and a key left out of a row shows up as null in that column.

data = [{"a": "hi", "b": [1, 2, 3], "c": {"one": 1}}, {"a": "bye"}, {"a": "fly", "b": [4, 5, 6]}]
df = spark.createDataFrame(data)
a    b          c
hi   [1, 2, 3]  {one -> 1}
bye  null       null
fly  [4, 5, 6]  null
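
Leaving a key out and setting it to `None` end up as the same null in the dataframe. The sketch below makes that explicit with a hypothetical `fill_missing` helper, and also passes a DDL-style schema string so the column types do not depend on inference. The helper names and the `bigint` element types are assumptions for illustration (Python integers are normally inferred as long/bigint).

```python
# Same rows as above; the second and third rows omit some keys.
data = [
    {"a": "hi", "b": [1, 2, 3], "c": {"one": 1}},
    {"a": "bye"},
    {"a": "fly", "b": [4, 5, 6]},
]

COLUMNS = ["a", "b", "c"]

# Explicit schema as a DDL-style string (assumption: types chosen to
# match what inference would typically pick for this data).
SCHEMA = "a string, b array<bigint>, c map<string,bigint>"


def fill_missing(rows, columns):
    """Return copies of the rows where every absent key is an explicit None."""
    return [{col: row.get(col) for col in columns} for row in rows]


def make_typed_df(spark, rows):
    """Build the DataFrame with a fixed schema; `spark` is an active SparkSession."""
    return spark.createDataFrame(fill_missing(rows, COLUMNS), schema=SCHEMA)


print(fill_missing(data, COLUMNS)[1])
# {'a': 'bye', 'b': None, 'c': None}
```

With a session available, `make_typed_df(spark, data).printSchema()` is a quick way to check that the array and map columns came out with the types you asked for.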

tags: create dataframe, sample dataframe, createDataFrame




© 2023 PySpark Is Rad