"Apache Spark Fundamentals"


Level: Intermediate

Author: Justin Pihony


Our ever-connected world is creating data faster than Moore's law can keep up with, forcing us to be smarter about how we analyze it. Previously, we had Hadoop's MapReduce framework for batch processing, but modern big data processing demands have outgrown it. That's where Apache Spark steps in, boasting speeds 10-100x faster than Hadoop and setting the world record in large-scale sorting. Spark's general abstraction lets it expand beyond simple batch processing, making it capable of such things as blazing-fast iterative algorithms and exactly-once streaming semantics. In this course, you'll learn Spark from the ground up, starting with its history before creating a Wikipedia analysis application as one of the means for learning a wide scope of its core API. That core knowledge will make it easier to explore Spark's other libraries, such as the streaming and SQL APIs. Finally, you'll learn how to avoid a few commonly encountered rough edges of Spark. You will leave this course with a tool belt capable of creating your own performance-maximized Spark applications.
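To give a flavor of the core API the course builds on, here is a minimal, hypothetical sketch of an RDD-based word count in Scala. It is not the course's actual Wikipedia project; the object name and input file are placeholders, and it assumes a local Spark installation.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WikipediaWordCount {
  def main(args: Array[String]): Unit = {
    // Run locally on all cores; on a cluster you would set the master via spark-submit instead.
    val conf = new SparkConf().setAppName("WikipediaWordCount").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    // Hypothetical input file: one line of Wikipedia text per record.
    val lines = sc.textFile("wikipedia-sample.txt")

    // The classic core-API pattern: lazy transformations, then an action to trigger execution.
    val wordCounts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word.toLowerCase, 1))
      .reduceByKey(_ + _)

    // Print the ten most frequent words.
    wordCounts
      .sortBy(_._2, ascending = false)
      .take(10)
      .foreach { case (word, count) => println(s"$word\t$count") }

    sc.stop()
  }
}
```

Everything before the `take(10)` action is only a description of the computation; Spark does no work until an action forces it, which is one of the ideas the course spends time on.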

If you don’t have a Pluralsight account, you can still take this course! Use this link to get a free trial.

Sign up for a free trial here

If you have a Pluralsight Account, you can start the course now!

Start Apache Spark Fundamentals Now

by Justin Pihony

If you liked it, share and comment!