PlaygRound

Apache Spark Performance Tuning

Apache Spark is one of the most popular computational frameworks in the world of big data. Spark provides interfaces for several programming languages and a rich set of APIs for batch processing, stream processing, and machine learning tasks. However, it takes a lot of effort to optimise and tune Spark applications so that they make full use of the cluster's resources and run efficiently.

Here are some of the techniques I have employed when building Spark applications for various projects.


The Shiny Moment of R

R has risen from number 20 last year to number 8 in the latest TIOBE index (August 2020), and is on its way to becoming TIOBE's programming language of the year for 2020. It is definitely worth writing down my journey with R.


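A History of My Headaches

Headaches have troubled me for more than ten years. I had to keep taking painkillers, yet the headaches kept getting worse and more frequent. I also tried Chinese herbal medicine, but it did not help much. As luck would have it, while working from home during the pandemic, and with my wife's strong support, encouragement, and help, I decided to give up painkillers, start daily moxibustion, and try prescribing Chinese herbal medicine for myself. This post records the history of my headaches, in the hope that it will help with future diagnosis and treatment.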


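Are Moxibustion and the Eighteen Dragon-Subduing Palms (降龍十八掌) Related?

An excellent article by Teacher Xu: the way acupoints are selected for moxibustion turns out to have a lot in common with the moves of the Eighteen Dragon-Subduing Palms.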


Spark Summit 2020

A few links for the Apache Spark Summit and the Delta Lake demo notebooks.


Stream Processing with AWS Kinesis

Notes from workshops on building a streaming data platform on AWS.


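Key Points for COVID-19 Prevention and Treatment

When righteous qi is stored within, pathogens cannot invade. Face the epidemic with a calm mind and without fear. Teacher Xu Wenbing's key points for COVID-19 prevention and treatment.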


AWS Machine Learning Foundations Part Three

This is part three of the notes on the Udacity course AWS Machine Learning Foundations.


AWS Machine Learning Foundations Part Two

This is part two of the notes on the Udacity course AWS Machine Learning Foundations.


AWS Machine Learning Foundations Part One

This is part one of the notes on the Udacity course AWS Machine Learning Foundations.