Spark source code snippets

Posted by Jason Feng on May 12, 2019

I collected the source code snippets from the book Spark: The Definitive Guide in one place. They cover most of the DataFrame and Spark SQL operations and functions that come up in day-to-day Spark programming.

Apache Spark is arguably one of the most popular big data processing frameworks. Spark: The Definitive Guide is written by Bill Chambers and Matei Zaharia, the original creator of Spark. The book is a must-have, comprehensive guide for anyone who wants to learn Spark.

The source code snippets are organized into the following sections:

  • DataFrame operations, such as filtering rows, adding columns, and sorting rows
  • Working with different data types: string manipulation, regular expressions, timestamps, etc.
  • Aggregations, such as grouping and window functions
  • Joining multiple DataFrames with different join types: inner, outer, left/right, and cross joins
  • Reading from and writing to different data sources, such as CSV, JSON, Parquet, and SQL databases

The full collection of snippets can be found here.