dataframe
creating a simple dataframe
Often you want to create a simple dataframe to try out PySpark functions.
Here is my favorite way to create a simple dataframe.
data = [{"a": "hi"}, {"a": "bye"}, {"a": "fly"}]
df = spark.createDataFrame(data)
a |
---|
hi |
bye |
fly |
For me, using a dictionary is the easiest way.
The same technique handles richer column types. A Python list becomes an array column, a nested dictionary becomes a map column, and leaving a key out of a row makes that column null for the row.
data = [{"a": "hi", "b": [1, 2, 3], "c": {"one": 1}}, {"a": "bye"}, {"a": "fly", "b": [4, 5, 6]}]
df = spark.createDataFrame(data)
a | b | c |
---|---|---|
hi | [1, 2, 3] | {one -> 1} |
bye | null | null |
fly | [4, 5, 6] | null |
tags: create dataframe, sample dataframe, createDataFrame
© 2023 PySpark Is Rad