AWS Data Analytics Goes Serverless

Posted by Jason Feng on December 4, 2021

AWS data analytics goes serverless. It was announced at re:Invent 2021 that more tools in the data analytics stack are joining the serverless family, namely EMR Serverless, Redshift Serverless, MSK Serverless and Kinesis Data Streams (KDS) on-demand.

In fact, with Lambda, Athena, Glue, QuickSight, DynamoDB and S3, which have been serverless from the beginning, we can already tackle most small to medium-sized data analytics tasks. Now it is also possible to run complex, really big data applications without managing the underlying servers.

However, I still prefer to have fine-grained control of an EMR cluster. It allows me to optimize for cost and performance based on workload requirements. I can select the desired EC2 instance types, install additional packages, and tweak the Spark configuration to suit particular jobs. More importantly, I can use Spot Instances. With Lambda and Step Functions defining the retry mechanism in case of Spot Instance termination, this can save a lot of cost in the long run.
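
As a rough illustration of that kind of control, here is a minimal Python sketch, not a definitive setup. The first function builds the keyword arguments one could pass to boto3's emr.run_job_flow, with a core instance fleet that runs on Spot and falls back to On-Demand if Spot capacity is not provisioned in time. The second builds a Step Functions task state that resubmits a Spark step a few times when it fails. The function names, the release label, the instance types, the default role names, the timeout and retry numbers, and the S3 script path are all assumptions made for the example. Nothing here calls AWS; it only builds the request dictionaries.

```python
import json


def build_emr_request(name, core_instance_types, spot_units, spark_props):
    """Build kwargs for boto3's emr.run_job_flow (hypothetical helper, no AWS call)."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.4.0",  # assumed release label
        "Applications": [{"Name": "Spark"}],
        # Per-job Spark tuning goes into the spark-defaults classification.
        "Configurations": [
            {"Classification": "spark-defaults", "Properties": spark_props}
        ],
        "Instances": {
            "KeepJobFlowAliveWhenNoSteps": False,
            "InstanceFleets": [
                {
                    "Name": "primary",
                    "InstanceFleetType": "MASTER",
                    "TargetOnDemandCapacity": 1,
                    "InstanceTypeConfigs": [{"InstanceType": "m5.xlarge"}],
                },
                {
                    "Name": "core-spot",
                    "InstanceFleetType": "CORE",
                    "TargetSpotCapacity": spot_units,
                    # Several instance types give Spot more pools to draw from.
                    "InstanceTypeConfigs": [
                        {"InstanceType": t, "WeightedCapacity": 1}
                        for t in core_instance_types
                    ],
                    "LaunchSpecifications": {
                        "SpotSpecification": {
                            "TimeoutDurationMinutes": 10,
                            "TimeoutAction": "SWITCH_TO_ON_DEMAND",
                        }
                    },
                },
            ],
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default roles
        "ServiceRole": "EMR_DefaultRole",
    }


def build_retry_state(script_path, max_attempts=3):
    """Build a Step Functions task state that retries a failed Spark step."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
        "Parameters": {
            "ClusterId.$": "$.ClusterId",  # cluster id taken from the state input
            "Step": {
                "Name": "spark-job",
                "ActionOnFailure": "CONTINUE",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", "--deploy-mode", "cluster", script_path],
                },
            },
        },
        # Resubmit the step with exponential backoff if it fails,
        # for example after Spot nodes were reclaimed mid-job.
        "Retry": [
            {
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 60,
                "MaxAttempts": max_attempts,
                "BackoffRate": 2.0,
            }
        ],
        "End": True,
    }


if __name__ == "__main__":
    request = build_emr_request(
        "nightly-etl",
        ["m5.xlarge", "m5a.xlarge", "m4.xlarge"],
        spot_units=4,
        spark_props={"spark.executor.memory": "4g"},
    )
    # Hypothetical script location, used only for illustration.
    state = build_retry_state("s3://example-bucket/jobs/etl.py")
    print(json.dumps(request, indent=2))
    print(json.dumps(state, indent=2))
```

With boto3 installed and credentials configured, the first dictionary could be passed as boto3.client("emr").run_job_flow(**request), and the second embedded under "States" in a state machine definition. A Lambda function in the same workflow could handle cases the built-in retry does not cover, such as recreating a cluster that was terminated entirely.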

Image by Free-Photos from Pixabay