Apache Spark is one of the most popular computational frameworks in the world of big data. Spark offers interfaces in several programming languages and a rich set of APIs for batch processing, stream processing, and machine learning tasks. However, it takes considerable effort to optimise and tune Spark applications so that they make full use of cluster resources and run efficiently.
Here are some of the techniques I have employed when building Spark applications for various projects.