<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Functions on PySpark Is Rad</title>
    <link>https://pysparkisrad.com/functions/</link>
    <description>Recent content in Functions on PySpark Is Rad</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language><atom:link href="https://pysparkisrad.com/functions/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>abs</title>
      <link>https://pysparkisrad.com/functions/abs/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>https://pysparkisrad.com/functions/abs/</guid>
      <description>abs # pyspark.sql.functions.abs(col) # version: since 1.3 Computes the absolute value.
Runnable Code:
from pyspark.sql import functions as F # Set up dataframe data = [{&amp;#34;num&amp;#34;: 1},{&amp;#34;num&amp;#34;: -2},{&amp;#34;num&amp;#34;: 0}] df = spark.createDataFrame(data) # Use function df = (df .withColumn(&amp;#34;absolute&amp;#34;, F.abs(&amp;#34;num&amp;#34;)) ) df.show() num absolute 1 1 -2 2 0 0 Usage:
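A related pattern, hinted at in the usage note below, is taking the absolute difference between two columns. A minimal sketch with made-up data (the columns a and b are hypothetical):
from pyspark.sql import functions as F
data = [{"a": 10, "b": 12}, {"a": 7, "b": 3}]
df2 = spark.createDataFrame(data)
# Absolute difference between the two columns: 2 and 4
df2 = df2.withColumn("abs_diff", F.abs(F.col("a") - F.col("b")))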
This is just a basic math function. Nothing special about it. I&amp;rsquo;ve used it before when doing subtraction between two columns.</description>
    </item>
    
    <item>
      <title>acos</title>
      <link>https://pysparkisrad.com/functions/acos/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>https://pysparkisrad.com/functions/acos/</guid>
      <description>acos # pyspark.sql.functions.acos(col) # version: since 1.4 Inverse cosine of col, as if computed by java.lang.Math.acos()
Runnable Code:
from pyspark.sql import functions as F # Set up dataframe data = [{&amp;#34;num&amp;#34;: 1.0},{&amp;#34;num&amp;#34;: 0.5},{&amp;#34;num&amp;#34;: 0.0}] df = spark.createDataFrame(data) # Use function df = (df .withColumn(&amp;#34;acos&amp;#34;, F.acos(&amp;#34;num&amp;#34;)) ) df.show() num acos 1.0 0.0 0.5 1.0471975511965979 0.0 1.5707963267948966 Usage:
This is just a basic math function. Nothing special about it. Never used it.</description>
    </item>
    
    <item>
      <title>acosh</title>
      <link>https://pysparkisrad.com/functions/acosh/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>https://pysparkisrad.com/functions/acosh/</guid>
      <description>acosh # pyspark.sql.functions.acosh(col) # version: since 3.1.0 Computes inverse hyperbolic cosine of the input column.
Runnable Code:
from pyspark.sql import functions as F # Set up dataframe data = [{&amp;#34;num&amp;#34;: 1.0},{&amp;#34;num&amp;#34;: 2.5},{&amp;#34;num&amp;#34;: 5.0}] df = spark.createDataFrame(data) # Use function df = (df .withColumn(&amp;#34;acosh&amp;#34;, F.acosh(&amp;#34;num&amp;#34;)) ) df.show() num acosh 1.0 0.0 2.5 1.566799236972411 5.0 2.2924316695611777 Usage:
This is just a basic math function. Nothing special about it. In fact I have no idea what it means.</description>
    </item>
    
    <item>
      <title>add_months</title>
      <link>https://pysparkisrad.com/functions/add_months/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>https://pysparkisrad.com/functions/add_months/</guid>
      <description>add_months # pyspark.sql.functions.add_months(start, months) # version: since 1.5.0 Returns the date that is months months after start
start: date column
months: integer
Runnable Code:
from pyspark.sql import functions as F # Set up dataframe data = [{&amp;#34;date&amp;#34;: &amp;#39;2047-04-08&amp;#39;}, {&amp;#34;date&amp;#34;: &amp;#39;1999-12-31&amp;#39;}, {&amp;#34;date&amp;#34;: &amp;#39;1906-02-28&amp;#39;}] df = spark.createDataFrame(data) df = df.select(F.to_date(df.date, &amp;#39;yyyy-MM-dd&amp;#39;) .alias(&amp;#34;date&amp;#34;)) # Use function df = (df .withColumn(&amp;#34;add_months&amp;#34;, F.add_months(F.col(&amp;#34;date&amp;#34;),3)) ) df.show() date add_months 2047-04-08 2047-07-08 1999-12-31 2000-03-31 1906-02-28 1906-05-28 Usage:</description>
    </item>
    
    <item>
      <title>array</title>
      <link>https://pysparkisrad.com/functions/array/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>https://pysparkisrad.com/functions/array/</guid>
      <description>array # pyspark.sql.functions.array(*cols) # version: since 1.4.0 Creates a new array column.
Runnable Code:
from pyspark.sql import functions as F # Set up dataframe data = [{&amp;#34;a&amp;#34;: 1,&amp;#34;b&amp;#34;: 2},{&amp;#34;a&amp;#34;: 3,&amp;#34;b&amp;#34;: 4},{&amp;#34;a&amp;#34;: 5,&amp;#34;b&amp;#34;: 6}] df = spark.createDataFrame(data) # Use function df = (df .withColumn(&amp;#34;array&amp;#34;, F.array(F.col(&amp;#34;a&amp;#34;),F.col(&amp;#34;b&amp;#34;))) ) df.show() a b array 1 2 [1, 2] 3 4 [3, 4] 5 6 [5, 6] Usage:
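The usage note below mentions building a placeholder array by filling it with None; a minimal sketch of that idea (made-up data, with a cast so the element type is explicit):
from pyspark.sql import functions as F
data = [{"a": 1}, {"a": 2}]
df2 = spark.createDataFrame(data)
# Array holding a single null string element, usable as a placeholder
df2 = df2.withColumn("placeholder", F.array(F.lit(None).cast("string")))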
I use this often. It is also handy for creating a placeholder array by filling it with None, as in the sketch above.</description>
    </item>
    
    <item>
      <title>array_contains</title>
      <link>https://pysparkisrad.com/functions/array_contains/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>https://pysparkisrad.com/functions/array_contains/</guid>
      <description>array_contains # pyspark.sql.functions.array_contains(col, value) # version: since 1.5.0 Collection function: returns null if the array is null, true if the array contains the given value, and false otherwise.
value: value or column to check for in an array
Runnable Code:
from pyspark.sql import functions as F # Set up dataframe data = [{&amp;#34;a&amp;#34;: [],&amp;#34;b&amp;#34;: 1},{&amp;#34;a&amp;#34;: [1,2,2],&amp;#34;b&amp;#34;: 1}, {&amp;#34;a&amp;#34;: [4,5,5],&amp;#34;b&amp;#34;: 1}] df = spark.createDataFrame(data) # Use function df = (df .</description>
    </item>
    
    <item>
      <title>array_distinct</title>
      <link>https://pysparkisrad.com/functions/array_distinct/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>https://pysparkisrad.com/functions/array_distinct/</guid>
      <description>array_distinct # pyspark.sql.functions.array_distinct(col) # version: since 2.4.0 Collection function: removes duplicate values from the array.
Runnable Code:
from pyspark.sql import functions as F # Set up dataframe data = [{&amp;#34;a&amp;#34;: [1,2,2],&amp;#34;b&amp;#34;: 1},{&amp;#34;b&amp;#34;: 1}] df = spark.createDataFrame(data).drop(&amp;#34;b&amp;#34;) # Use function df = (df .withColumn(&amp;#34;array_distinct&amp;#34;, F.array_distinct(F.col(&amp;#34;a&amp;#34;))) ) df.show() a array_distinct [1, 2, 2] [1, 2] null null Usage:
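A sketch of the combine-then-deduplicate pattern mentioned in the usage note below (made-up data; F.concat accepts array columns from Spark 2.4):
from pyspark.sql import functions as F
data = [{"a": [1, 2], "b": [2, 3]}]
df2 = spark.createDataFrame(data)
# Merge the two arrays, then drop duplicate elements: [1, 2, 3]
df2 = df2.withColumn("merged", F.array_distinct(F.concat(F.col("a"), F.col("b"))))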
Simple array function. I have used it a lot. Especially when combining two columns of arrays that may have the same values in them.</description>
    </item>
    
    <item>
      <title>array_except</title>
      <link>https://pysparkisrad.com/functions/array_except/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>https://pysparkisrad.com/functions/array_except/</guid>
      <description>array_except # pyspark.sql.functions.array_except(col1, col2) # version: since 2.4.0 Collection function: returns an array of the elements in col1 but not in col2, without duplicates.
Runnable Code:
from pyspark.sql import functions as F # Set up dataframe data = [{&amp;#34;a&amp;#34;: [1,2,2],&amp;#34;b&amp;#34;: [3,2,2]},{&amp;#34;b&amp;#34;: [1,2,2]},{&amp;#34;a&amp;#34;: [4,5,5],&amp;#34;b&amp;#34;: [1,2,2]},{&amp;#34;a&amp;#34;: [4,5,5]}] df = spark.createDataFrame(data) # Use function df = (df .withColumn(&amp;#34;array_except&amp;#34;, F.array_except(F.col(&amp;#34;a&amp;#34;),F.col(&amp;#34;b&amp;#34;))) ) df.show() a b array_except [1, 2, 2] [3, 2, 2] [1] null [1, 2, 2] null [4, 5, 5] [1, 2, 2] [4, 5] [4, 5, 5] null null Usage:</description>
    </item>
    
    <item>
      <title>array_intersect</title>
      <link>https://pysparkisrad.com/functions/array_intersect/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>https://pysparkisrad.com/functions/array_intersect/</guid>
      <description>array_intersect # pyspark.sql.functions.array_intersect(col1, col2) # version: since 2.4.0 Collection function: returns an array of the elements in the intersection of col1 and col2, without duplicates.
Runnable Code:
from pyspark.sql import functions as F # Set up dataframe data = [{&amp;#34;a&amp;#34;: [1,2,2],&amp;#34;b&amp;#34;: [3,2,2]},{&amp;#34;b&amp;#34;: [1,2,2]},{&amp;#34;a&amp;#34;: [4,5,5],&amp;#34;b&amp;#34;: [1,2,2]},{&amp;#34;a&amp;#34;: [4,5,5]}] df = spark.createDataFrame(data) # Use function df = (df .withColumn(&amp;#34;array_intersect&amp;#34;, F.array_intersect(F.col(&amp;#34;a&amp;#34;),F.col(&amp;#34;b&amp;#34;))) ) df.show() a b array_intersect [1, 2, 2] [3, 2, 2] [2] null [1, 2, 2] null [4, 5, 5] [1, 2, 2] [] [4, 5, 5] null null Usage:</description>
    </item>
    
    <item>
      <title>array_join</title>
      <link>https://pysparkisrad.com/functions/array_join/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>https://pysparkisrad.com/functions/array_join/</guid>
      <description>array_join # pyspark.sql.functions.array_join(col, delimiter, null_replacement=None) # version: since 2.4.0 Concatenates the elements of column using the delimiter. Null values are replaced with null_replacement if set, otherwise they are ignored.
delimiter: string that goes between elements
null_replacement: string to use in place of null elements
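A minimal sketch showing the null_replacement argument (made-up data, separate from the runnable snippet below):
from pyspark.sql import functions as F
data = [{"a": ["x", None, "z"]}]
df2 = spark.createDataFrame(data)
# Nulls are replaced with the string NA, giving "x-NA-z"
df2 = df2.withColumn("joined", F.array_join(F.col("a"), "-", null_replacement="NA"))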
Runnable Code:
from pyspark.sql import functions as F # Set up dataframe data = [{&amp;#34;a&amp;#34;:1,&amp;#34;b&amp;#34;:2,&amp;#34;c&amp;#34;:2},{&amp;#34;a&amp;#34;:3,&amp;#34;c&amp;#34;:5}] df = spark.createDataFrame(data) df = df.select(F.array(F.col(&amp;#34;a&amp;#34;),F.col(&amp;#34;b&amp;#34;),F.col(&amp;#34;c&amp;#34;)).alias(&amp;#34;a&amp;#34;)) # Use function df = (df .</description>
    </item>
    
    <item>
      <title>array_max</title>
      <link>https://pysparkisrad.com/functions/array_max/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>https://pysparkisrad.com/functions/array_max/</guid>
      <description>array_max # pyspark.sql.functions.array_max(col) # version: since 2.4.0 Collection function: returns the maximum value of the array.
Runnable Code:
from pyspark.sql import functions as F # Set up dataframe data = [{&amp;#34;a&amp;#34;:1,&amp;#34;b&amp;#34;:2,&amp;#34;c&amp;#34;:2},{&amp;#34;a&amp;#34;:3,&amp;#34;c&amp;#34;:5}] df = spark.createDataFrame(data) df = df.select(F.array(F.col(&amp;#34;a&amp;#34;),F.col(&amp;#34;b&amp;#34;),F.col(&amp;#34;c&amp;#34;)).alias(&amp;#34;a&amp;#34;)) # Use function df = (df .withColumn(&amp;#34;array_max&amp;#34;, F.array_max(F.col(&amp;#34;a&amp;#34;))) ) df.show() a array_max [1, 2, 2] 2 [3, null, 5] 5 Usage:
Simple array function.
returns: Column(sc._jvm.functions.array_max(_to_java_column(col))) PySpark manual
tags: largest number in array, highest in array, highest in list</description>
    </item>
    
    <item>
      <title>array_min</title>
      <link>https://pysparkisrad.com/functions/array_min/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>https://pysparkisrad.com/functions/array_min/</guid>
      <description>array_min # pyspark.sql.functions.array_min(col) # version: since 2.4.0 Collection function: returns the minimum value of the array.
Runnable Code:
from pyspark.sql import functions as F # Set up dataframe data = [{&amp;#34;a&amp;#34;:1,&amp;#34;b&amp;#34;:2,&amp;#34;c&amp;#34;:2},{&amp;#34;a&amp;#34;:3,&amp;#34;c&amp;#34;:5}] df = spark.createDataFrame(data) df = df.select(F.array(F.col(&amp;#34;a&amp;#34;),F.col(&amp;#34;b&amp;#34;),F.col(&amp;#34;c&amp;#34;)).alias(&amp;#34;a&amp;#34;)) # Use function df = (df .withColumn(&amp;#34;array_min&amp;#34;, F.array_min(F.col(&amp;#34;a&amp;#34;))) ) df.show() a array_min [1, 2, 2] 1 [3, null, 5] 3 Usage:
Simple array function.
returns: Column(sc._jvm.functions.array_min(_to_java_column(col))) PySpark manual
tags: smallest number in array, lowest in array, lowest in list</description>
    </item>
    
    <item>
      <title>array_position</title>
      <link>https://pysparkisrad.com/functions/array_position/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>https://pysparkisrad.com/functions/array_position/</guid>
      <description>array_position # pyspark.sql.functions.array_position(col, value) # version: since 2.4.0 Collection function: Locates the position of the first occurrence of the given value in the given array. Returns null if either of the arguments is null, and 0 if the value is not found.
Note that the return value is the cardinal position, not zero based.
value: string or number
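A minimal sketch of the 1-based position and the 0-for-not-found behaviour (made-up data, separate from the runnable snippet below):
from pyspark.sql import functions as F
data = [{"arr": ["a", "b", "c"]}]
df2 = spark.createDataFrame(data)
df2 = (df2
    .withColumn("pos_b", F.array_position(F.col("arr"), "b"))  # 2: cardinal position, not zero based
    .withColumn("pos_z", F.array_position(F.col("arr"), "z"))  # 0: value not found
)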
Runnable Code:
from pyspark.sql import functions as F # Set up dataframe data = [{&amp;#34;a&amp;#34;:1,&amp;#34;b&amp;#34;:2,&amp;#34;c&amp;#34;:2},{&amp;#34;a&amp;#34;:3,&amp;#34;c&amp;#34;:5}] df = spark.</description>
    </item>
    
    <item>
      <title>array_remove</title>
      <link>https://pysparkisrad.com/functions/array_remove/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>https://pysparkisrad.com/functions/array_remove/</guid>
      <description>array_remove # pyspark.sql.functions.array_remove(col, element) # version: since 2.4.0 Collection function: removes all elements equal to element from the given array.
element: string or number
Runnable Code:
from pyspark.sql import functions as F # Set up dataframe data = [{&amp;#34;a&amp;#34;:1,&amp;#34;b&amp;#34;:2,&amp;#34;c&amp;#34;:2},{&amp;#34;a&amp;#34;:3,&amp;#34;c&amp;#34;:5}] df = spark.createDataFrame(data) df = df.select(F.array(F.col(&amp;#34;a&amp;#34;),F.col(&amp;#34;b&amp;#34;),F.col(&amp;#34;c&amp;#34;)).alias(&amp;#34;a&amp;#34;)) # Use function df = (df .withColumn(&amp;#34;array_remove&amp;#34;, F.array_remove(F.col(&amp;#34;a&amp;#34;),2)) ) df.show() a array_remove [1, 2, 2] [1] [3, null, 5] [3, null, 5] Usage:</description>
    </item>
    
    <item>
      <title>array_repeat</title>
      <link>https://pysparkisrad.com/functions/array_repeat/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>https://pysparkisrad.com/functions/array_repeat/</guid>
      <description>array_repeat # pyspark.sql.functions.array_repeat(col, count) # version: since 2.4.0 Collection function: creates an array containing a column repeated count times.
count: int
Runnable Code:
from pyspark.sql import functions as F # Set up dataframe data = [{&amp;#34;a&amp;#34;:1},{&amp;#34;a&amp;#34;:2},{&amp;#34;a&amp;#34;:5}] df = spark.createDataFrame(data) # Use function df = (df .withColumn(&amp;#34;array_repeat&amp;#34;, F.array_repeat(F.col(&amp;#34;a&amp;#34;),3)) ) df.show() a array_repeat 1 [1, 1, 1] 2 [2, 2, 2] 5 [5, 5, 5] Usage:
Simple array function.
return Column(sc._jvm.functions.array_repeat(_to_java_column(col),_to_java_column(count) if isinstance(count, Column) else count)) PySpark manual
    </item>
    
    <item>
      <title>array_sort</title>
      <link>https://pysparkisrad.com/functions/array_sort/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>https://pysparkisrad.com/functions/array_sort/</guid>
      <description>array_sort # pyspark.sql.functions.array_sort(col) # version: since 2.4.0 Collection function: sorts the input array in ascending order. The elements of the input array must be orderable. Null elements will be placed at the end of the returned array.
Runnable Code:
from pyspark.sql import functions as F # Set up dataframe data = [{&amp;#34;a&amp;#34;:3,&amp;#34;b&amp;#34;:2,&amp;#34;c&amp;#34;:2},{&amp;#34;a&amp;#34;:3,&amp;#34;c&amp;#34;:5}] df = spark.createDataFrame(data) df = df.select(F.array(F.col(&amp;#34;a&amp;#34;),F.col(&amp;#34;b&amp;#34;),F.col(&amp;#34;c&amp;#34;)).alias(&amp;#34;a&amp;#34;)) # Use function df = (df .withColumn(&amp;#34;array_sort&amp;#34;, F.array_sort(F.col(&amp;#34;a&amp;#34;))) ) df.show() a array_sort [3, 2, 2] [2, 2, 3] [3, null, 5] [3, 5, null] Usage:</description>
    </item>
    
    <item>
      <title>array_union</title>
      <link>https://pysparkisrad.com/functions/array_union/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>https://pysparkisrad.com/functions/array_union/</guid>
      <description>array_union # pyspark.sql.functions.array_union(col1, col2) # version: since 2.4.0 Collection function: returns an array of the elements in the union of col1 and col2, without duplicates.
Runnable Code:
from pyspark.sql import functions as F # Set up dataframe data = [{&amp;#34;a&amp;#34;: [1,2,2],&amp;#34;b&amp;#34;: [3,2,2]},{&amp;#34;b&amp;#34;: [1,2,2]},{&amp;#34;a&amp;#34;: [4,5,5],&amp;#34;b&amp;#34;: [1,2,2]}] df = spark.createDataFrame(data) # Use function df = (df .withColumn(&amp;#34;array_union&amp;#34;, F.array_union(F.col(&amp;#34;a&amp;#34;),F.col(&amp;#34;b&amp;#34;))) ) df.show() a b array_union [1, 2, 2] [3, 2, 2] [1, 2, 3] null [1, 2, 2] null [4, 5, 5] [1, 2, 2] [4, 5, 1, 2] Usage:</description>
    </item>
    
    <item>
      <title>arrays_overlap</title>
      <link>https://pysparkisrad.com/functions/arrays_overlap/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>https://pysparkisrad.com/functions/arrays_overlap/</guid>
      <description>arrays_overlap # pyspark.sql.functions.arrays_overlap(a1, a2) # version: since 2.4.0 Collection function: returns true if the arrays contain any common non-null element; if not, returns null if both the arrays are non-empty and any of them contains a null element; returns false otherwise.
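The three possible outcomes are easier to see with concrete rows; a small sketch with made-up data (separate from the runnable snippet below):
from pyspark.sql import functions as F
data = [
    {"a": [1, 2], "b": [2, 3]},     # shares the element 2, so expect true
    {"a": [1, None], "b": [3, 4]},  # no overlap but a null is present, so expect null
    {"a": [1, 2], "b": [3, 4]},     # no overlap and no nulls, so expect false
]
df2 = spark.createDataFrame(data)
df2 = df2.withColumn("overlap", F.arrays_overlap(F.col("a"), F.col("b")))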
Runnable Code:
from pyspark.sql import functions as F # Set up dataframe data = [{&amp;#34;a&amp;#34;: [1,2,2],&amp;#34;b&amp;#34;: [3,2,2]},{&amp;#34;b&amp;#34;: [1,2,2]},{&amp;#34;a&amp;#34;: [4,5,5],&amp;#34;b&amp;#34;: [1,2,2]}] df = spark.createDataFrame(data) # Use function df = (df .</description>
    </item>
    
    <item>
      <title>arrays_zip</title>
      <link>https://pysparkisrad.com/functions/arrays_zip/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>https://pysparkisrad.com/functions/arrays_zip/</guid>
      <description>arrays_zip # pyspark.sql.functions.arrays_zip(*cols) # version: since 2.4.0 Collection function: Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays.
Runnable Code:
from pyspark.sql import functions as F # Set up dataframe data = [{&amp;#34;a&amp;#34;: [1,2],&amp;#34;b&amp;#34;: [3,2]},{&amp;#34;b&amp;#34;: [1,2]}] df = spark.createDataFrame(data) # Use function df = (df .withColumn(&amp;#34;arrays_zip&amp;#34;, F.arrays_zip(F.col(&amp;#34;a&amp;#34;),F.col(&amp;#34;b&amp;#34;))) ) df.show() a b arrays_zip [1, 2] [3, 2] [{1, 3}, {2, 2}] null [1, 2] null Usage:</description>
    </item>
    
    <item>
      <title>ascii</title>
      <link>https://pysparkisrad.com/functions/ascii/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>https://pysparkisrad.com/functions/ascii/</guid>
      <description>ascii # pyspark.sql.functions.ascii(col) # version: since 1.5.0 Computes the numeric value of the first character of the string column.
Runnable Code:
from pyspark.sql import functions as F # Set up dataframe data = [{&amp;#34;b&amp;#34;: &amp;#34;hi&amp;#34;,&amp;#34;a&amp;#34;: &amp;#34;aa&amp;#34;},{&amp;#34;a&amp;#34;: &amp;#34;&amp;#34;},{&amp;#34;b&amp;#34;: &amp;#34;bob&amp;#34;}] df = spark.createDataFrame(data).drop(&amp;#34;b&amp;#34;) # Use function df = (df .withColumn(&amp;#34;ascii&amp;#34;, F.ascii(F.col(&amp;#34;a&amp;#34;))) ) df.show() a ascii aa 97 0 null null Usage:
Simple function. Just gets the ascii code of the first letter.</description>
    </item>
    
    <item>
      <title>asin</title>
      <link>https://pysparkisrad.com/functions/asin/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>https://pysparkisrad.com/functions/asin/</guid>
      <description>asin # pyspark.sql.functions.asin(col) # version: since 1.3 Inverse sine of col, as if computed by java.lang.Math.asin()
Runnable Code:
from pyspark.sql import functions as F # Set up dataframe data = [{&amp;#34;num&amp;#34;: 1.0},{&amp;#34;num&amp;#34;: .5},{&amp;#34;num&amp;#34;: 0.0}] df = spark.createDataFrame(data) # Use function df = (df .withColumn(&amp;#34;asin&amp;#34;, F.asin(&amp;#34;num&amp;#34;)) ) df.show() num asin 1.0 1.5707963267948966 0.5 0.5235987755982989 0.0 0.0 Usage:
This is just a basic math function. Nothing special about it. Never used it.</description>
    </item>
    
    <item>
      <title>asinh</title>
      <link>https://pysparkisrad.com/functions/asinh/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>https://pysparkisrad.com/functions/asinh/</guid>
      <description>asinh # pyspark.sql.functions.asinh(col) # version: since 3.1.0 Computes inverse hyperbolic sine of the input column.
Runnable Code:
from pyspark.sql import functions as F # Set up dataframe data = [{&amp;#34;num&amp;#34;: 1.0},{&amp;#34;num&amp;#34;: .5},{&amp;#34;num&amp;#34;: 0.0}] df = spark.createDataFrame(data) # Use function df = (df .withColumn(&amp;#34;asinh&amp;#34;, F.asinh(&amp;#34;num&amp;#34;)) ) df.show() num asinh 1.0 0.8813735870195429 0.5 0.48121182505960347 0.0 0.0 Usage:
This is just a basic math function. Nothing special about it. Never used it.</description>
    </item>
    
    <item>
      <title>assert_true</title>
      <link>https://pysparkisrad.com/functions/assert_true/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>https://pysparkisrad.com/functions/assert_true/</guid>
      <description>assert_true # pyspark.sql.functions.assert_true(col, errMsg=None) # version: since 3.1.0 Returns null if the input column is true; throws an exception with the provided error message otherwise.
Runnable Code:
from pyspark.sql import functions as F # Set up dataframe data = [{&amp;#34;a&amp;#34;: 1,&amp;#34;b&amp;#34;: 2}] df = spark.createDataFrame(data) # Use function df = (df .withColumn(&amp;#34;assert_true&amp;#34;, F.assert_true( F.col(&amp;#34;a&amp;#34;) &amp;lt; F.col(&amp;#34;b&amp;#34;))) ) df.show() a b assert_true 1 2 null Usage:
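Building on the df above, a sketch of the validation idea with a custom error message (the message text is made up; the errMsg argument needs Spark 3.1+):
# Any row failing the check raises an error with this message
df = df.withColumn("check", F.assert_true(F.col("a") &amp;lt; F.col("b"), "a must be smaller than b"))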
Never used it. But I could see it being useful for some form of validation.</description>
    </item>
    
    <item>
      <title>atan</title>
      <link>https://pysparkisrad.com/functions/atan/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>https://pysparkisrad.com/functions/atan/</guid>
      <description>atan # pyspark.sql.functions.atan(col) # version: since 1.4 Inverse tangent of col, as if computed by java.lang.Math.atan()
Runnable Code:
from pyspark.sql import functions as F # Set up dataframe data = [{&amp;#34;num&amp;#34;: 1.0},{&amp;#34;num&amp;#34;: .5},{&amp;#34;num&amp;#34;: 0.0}] df = spark.createDataFrame(data) # Use function df = (df .withColumn(&amp;#34;atan&amp;#34;, F.atan(&amp;#34;num&amp;#34;)) ) df.show() num atan 1.0 0.7853981633974483 0.5 0.4636476090008061 0.0 0.0 Usage:
This is just a basic math function. Nothing special about it. Never used it.</description>
    </item>
    
    <item>
      <title>atan2</title>
      <link>https://pysparkisrad.com/functions/atan2/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>https://pysparkisrad.com/functions/atan2/</guid>
      <description>atan2 # pyspark.sql.functions.atan2(col1, col2) # version: since 1.4 The theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2()
Runnable Code:
from pyspark.sql import functions as F # Set up dataframe data = [{&amp;#34;num1&amp;#34;: 1.0,&amp;#34;num2&amp;#34;: 1.0},{&amp;#34;num1&amp;#34;: .5,&amp;#34;num2&amp;#34;: 1.0},{&amp;#34;num1&amp;#34;: 0.0,&amp;#34;num2&amp;#34;: 1.0}] df = spark.createDataFrame(data) # Use function df = (df .withColumn(&amp;#34;atan2&amp;#34;, F.atan2(&amp;#34;num1&amp;#34;,&amp;#34;num2&amp;#34;)) ) df.show() num1 num2 atan2 1.</description>
    </item>
    
    <item>
      <title>atanh</title>
      <link>https://pysparkisrad.com/functions/atanh/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>https://pysparkisrad.com/functions/atanh/</guid>
      <description>atanh # pyspark.sql.functions.atanh(col) # version: since 3.1.0 Computes inverse hyperbolic tangent of the input column.
Runnable Code:
from pyspark.sql import functions as F # Set up dataframe data = [{&amp;#34;num&amp;#34;: 1.0},{&amp;#34;num&amp;#34;: .5},{&amp;#34;num&amp;#34;: 0.0}] df = spark.createDataFrame(data) # Use function df = (df .withColumn(&amp;#34;atanh&amp;#34;, F.atanh(&amp;#34;num&amp;#34;)) ) df.show() num atanh 1.0 Infinity 0.5 0.5493061443340549 0.0 0.0 Usage:
This is just a basic math function. Nothing special about it. Never used it.</description>
    </item>
    
    <item>
      <title>avg</title>
      <link>https://pysparkisrad.com/functions/avg/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>https://pysparkisrad.com/functions/avg/</guid>
      <description>avg # pyspark.sql.functions.avg(col) # version: since 1.3 Aggregate function: returns the average of the values in a group.
Runnable Code:
from pyspark.sql import functions as F # Set up dataframe data = [{&amp;#34;num&amp;#34;: 1.0},{&amp;#34;num&amp;#34;: .5},{&amp;#34;num&amp;#34;: 0.0}] df = spark.createDataFrame(data) # Use function df = df.select(F.avg(&amp;#34;num&amp;#34;)) df.show() avg(num) 0.5 Usage:
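In practice avg usually appears together with groupBy; a minimal sketch with made-up data:
data2 = [{"cat": "x", "num": 1.0}, {"cat": "x", "num": 3.0}, {"cat": "y", "num": 5.0}]
df2 = spark.createDataFrame(data2)
# Average per group: x gives 2.0, y gives 5.0
df2.groupBy("cat").agg(F.avg("num").alias("avg_num")).show()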
Often used aggregation function.
returns: _invoke_function_over_column(&#34;avg&#34;, col) PySpark manual
tags: average value, median, mean, average price
</description>
    </item>
    
    <item>
      <title>base64</title>
      <link>https://pysparkisrad.com/functions/base64/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>https://pysparkisrad.com/functions/base64/</guid>
      <description>base64 # pyspark.sql.functions.base64(col) # version: since 1.5 Computes the BASE64 encoding of a binary column and returns it as a string column.
Runnable Code:
from pyspark.sql import functions as F, types as T # Set up dataframe df = spark.createDataFrame([ (bytearray(b&amp;#39;0001&amp;#39;), 1) ], schema=T.StructType([ T.StructField(&amp;#34;bin&amp;#34;, T.BinaryType()), T.StructField(&amp;#34;number&amp;#34;, T.IntegerType()) ])) df = df.drop(&amp;#34;number&amp;#34;) # Use function df = (df .withColumn(&amp;#34;base64&amp;#34;, F.base64(F.col(&amp;#34;bin&amp;#34;))) ) df.show() bin base64 [30 30 30 31] MDAwMQ== Usage:
I&amp;rsquo;ve never used this one.</description>
    </item>
    
    <item>
      <title>bin</title>
      <link>https://pysparkisrad.com/functions/bin/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>https://pysparkisrad.com/functions/bin/</guid>
      <description>bin # pyspark.sql.functions.bin(col) # version: since 1.5.0 Returns the string representation of the binary value of the given column.
Runnable Code:
from pyspark.sql import functions as F # Set up dataframe data = [{&amp;#34;a&amp;#34;: 1,&amp;#34;b&amp;#34;: 2},{&amp;#34;a&amp;#34;: 3,&amp;#34;b&amp;#34;: 2},{&amp;#34;a&amp;#34;: 5},{&amp;#34;b&amp;#34;: 5}] df = spark.createDataFrame(data) df = df.drop(&amp;#34;b&amp;#34;) # Use function df = (df .withColumn(&amp;#34;bin&amp;#34;, F.bin(F.col(&amp;#34;a&amp;#34;))) ) df.show() a bin 1 1 3 11 5 101 null null Usage:
I don&amp;rsquo;t find myself needing binary very often.</description>
    </item>
    
    <item>
      <title>bround</title>
      <link>https://pysparkisrad.com/functions/bround/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>https://pysparkisrad.com/functions/bround/</guid>
      <description>bround # pyspark.sql.functions.bround(col, scale=0) # version: since 2.0.0 Round the given value to scale decimal places using HALF_EVEN rounding mode if scale &amp;gt;= 0 or at integral part when scale &amp;lt; 0.
scale: decimal places
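What sets it apart from round is the HALF_EVEN handling of ties; a small sketch with made-up data:
from pyspark.sql import functions as F
data = [{"a": 0.5}, {"a": 1.5}, {"a": 2.5}]
df2 = spark.createDataFrame(data)
df2 = (df2
    .withColumn("bround", F.bround(F.col("a")))  # HALF_EVEN: 0.0, 2.0, 2.0
    .withColumn("round", F.round(F.col("a")))    # HALF_UP: 1.0, 2.0, 3.0
)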
Runnable Code:
from pyspark.sql import functions as F # Set up dataframe data = [{&amp;#34;a&amp;#34;: 1.85,&amp;#34;b&amp;#34;: 2},{&amp;#34;a&amp;#34;: 1.86},{&amp;#34;b&amp;#34;: 5}]#,{}] df = spark.createDataFrame(data) df = df.drop(&amp;#34;b&amp;#34;) # Use function df = (df .withColumn(&amp;#34;bround&amp;#34;, F.bround(F.col(&amp;#34;a&amp;#34;), scale=1)) ) df.</description>
    </item>
    
  </channel>
</rss>
