Jason Feng's blog

Parsing XML files made simple by PySpark

Imagine you are given a task to parse thousands of XML files, extract the information, and write the records into a table format with proper data types. The task must be done in a timely manner and is repeated every hour. What are you going to do? With Apache Spark, a framework well suited to embarrassingly parallel workloads, it can be done with much less effort.

Posted by Jason Feng on July 14, 2019

Data visualization made easy with Flexdashboard

Flexdashboard is something of a hidden gem, known mainly to people who use R. In my opinion, it is a great visualization tool. You can use flexdashboard in place of expensive commercial tools like Tableau and Power BI. Most importantly, it is totally free! You just need to know how to write code in R.

Posted by Jason Feng on July 13, 2019

Terraform At a Glance

This is an excerpt from Qwiklabs. It is a quick introduction to Terraform, an open source Infrastructure as Code tool for creating, changing, and versioning infrastructure safely and efficiently.

Posted by Jason Feng on July 7, 2019

Build a serverless text to speech endpoint

Implement a serverless, event-driven HTTP endpoint that converts text to speech using Cloud Functions, the Cloud Text-to-Speech API, and Cloud Storage.

Try it out with this link. You can replace the text with whatever you want. Have fun!

Posted by Jason Feng on July 6, 2019

Streaming process NASA web access logs on GCP

This is a draft solution to ingest NASA web access logs, process and clean the data, and store it in a data warehouse for further analysis.

It is implemented with GCP products, including Pub/Sub, Dataflow, and BigQuery.

Posted by Jason Feng on June 10, 2019

Spark source code snippets

I collected the source code snippets from the book Spark: The Definitive Guide in one place. They cover most of the operations and common functions for DataFrames and Spark SQL that come up day to day when writing Spark code.

Posted by Jason Feng on May 12, 2019

Install Docker on Debian/Ubuntu

A quick reference for myself on the steps to install Docker on Debian/Ubuntu.

Posted by Jason Feng on May 2, 2019
