Spark Developer In Real World - Big Data In Real World

OUR GUARANTEE

Demystify Spark

Spark remains something of a mystery even to people who work with it every day. Most don't understand how Spark works, how it executes jobs so efficiently, or how it interacts with other sources. This is also what scares beginners away from Spark. Don't worry. We have got your back. We will untangle the confusing parts of Spark and demystify it in a simple way that is easy to understand. You are in good hands with us.

Look under the hood

We are well known for helping our students see what is under the hood of a technology. When you understand what happens behind the scenes, you are better prepared for the problems that the technology or tool may throw at you. Even better, you can design more efficient solutions because you know how things work internally. For example, our explanation of how Shuffle works in Spark is something you will never find elsewhere.

Streaming & Machine Learning

We go beyond RDD, DataFrame and Dataset. Spark is much more than RDDs and Spark SQL; it is a data analytics platform. So we also cover the other important and interesting modules in Spark, such as Spark Streaming and Machine Learning.

Become Real World Ready

Our number one goal in all our courses is to make you production and real world ready, and Spark Developer In Real World is no exception. We talk about internals, troubleshooting, optimizations and the issues you can expect in production. We have designed this course to give you the confidence you need to get the job you want and to succeed from day one once you land it.

Spark and more...

Spark is an interesting tool, but real world problems and use cases are not solved with Spark alone. Spark is usually used in conjunction with other tools in the big data ecosystem. So to give you a taste of what the real world looks like, we have included projects that combine Spark with other tools in the ecosystem, such as Kafka, HBase and Elasticsearch.

Interesting Projects

We prefer practice over theory, so we include a lot of interesting projects and use interesting datasets to demonstrate the concepts. This course is no exception. To name a few: we use a Stack Overflow dataset with Elasticsearch, predict results with 2016 US presidential election data, and apply machine learning to the Yelp dataset.

Practice in our cluster for Free

Practicing Hadoop or Spark with a packaged sandbox VM on your laptop is like learning to play the guitar without a guitar. To learn Hadoop or Spark properly, you need access to a multi-node environment. You will get free access to our multi-node cluster along with this course.

30 Day Money Back Guarantee

Don't like the course for any reason? No worries. Let us know within 30 days and we will give you a 100% refund. No questions asked.

Excellent & Caring Support

Our students' satisfaction is of the utmost importance, and everything else is secondary. We are here for you every step of the way, and you can count on us.

ENROLL IN COURSE | $199

TECHNICAL HIGHLIGHTS

TOPICS

Shuffle in Depth
Your code to Spark tasks
Spark with other Sources and Formats
Catalyst Optimizer and Tungsten
Resource Management
Cluster Setup
Optimizations & Troubleshooting Tips
Spark Streaming
Spark Machine Learning
Spark with Kafka, Elasticsearch, HBase

PROJECTS

PageRanking pages from Wikipedia | DataFrames & RDD
Analyzing Trending YouTube videos (CSV & JSON) | Datasources & Formats
Streaming with activity data from IoT devices | Spark Streaming
Streaming data from Meetup.com with Kafka | Spark Streaming
Predicting Country’s Happiness Rank from Happiness Score | Machine Learning
Predicting 2016 US Elections | Machine Learning
Predicting Yelp Rating (+ve / -ve) | Machine Learning
Build a mini site with Stack Overflow data and Elasticsearch | End to End Project

COURSE CURRICULUM

REVIEWS FROM SOMEONE LIKE YOU

" I have taken numerous courses at Coursera and a couple on Hadoop from them, but I never had such a clear understanding of the subject. " - Monika

" Five stars and two thumbs up. Excellent course for both beginner and experience Hadoop developers. " - Edward

" The case studies are excellent and gets you prepared to face any Big Data problem. " - Sanjay

" I would recommend this to freshers as well as to people with some experience who are looking to get a good grasp of Hadoop concepts & associated tools. " - Prithvi

" It has been an outstanding resource, aiding me in both my academic and professional pursuits. It is practical, insightful, easy to understand, and interesting. I cannot say enough good things about it. My only regret is that I did not find this course earlier. " - Sean

" This is what needs to become a Hadoop developer. Things told from interviewer point of view is bonus. " - Shashank

FAQ

1. Is this course right for me?

This course is great for anyone who is either trying to launch a career in Big Data or already working with Hadoop or related tools and would like to move into Spark. We have designed the course to give you the confidence to attend interviews and the skills to work in a real world production environment from day one.

2. What skills do I need to start with the course?

Basic Linux knowledge: simple commands to change directories, open and close files, etc. Basic SQL knowledge: simple selects, inserts and simple join statements. Basic Java or Python knowledge won't hurt, because we write and walk through the programs in the projects covered in the course. But don't be intimidated if you are not a programmer. We totally understand that some students are not programmers, and we will walk through all the code step by step so it will be super easy to follow. You will be in good hands. We make sure you are not lost.

3. I am looking to learn a specific tool. How do I know whether that tool is covered?

We have a detailed, up-to-date curriculum explaining every topic that is covered in the course. Please check the curriculum below to find out whether the tool you are looking for is included.

4. I am still not sure whether this course is good for me.

No worries. We totally understand. Let us know your expectations by emailing us at info@hadoopinrealworld.com and we will give our HONEST opinion on whether this course will be a good fit for you.

5. What if I have questions while I take the course?

You can ask us questions anytime by posting your questions or comments below the video in each lesson and we will answer promptly.

6. Do I get access to a Spark cluster?

Yes. You get free access to a 3-node Spark cluster hosted in AWS.

7. I don't see a topic. Will it be added?

The Big Data ecosystem is evolving fast, so we update our courses frequently. All our courses are living courses. You can check out our release schedule at https://www.bigdatainrealworld.com/upcoming-releases/

8. Do I get lifetime access?

Yes. Absolutely. You get lifetime access to the course, all the future updates to the course and lifetime access to the cluster.

CURRICULUM

Chapter 1: Let's Get Started

Thank you and Welcome | 11:35
Tools and Setup | 8:30

Chapter 2: Introduction To Spark

Hadoop vs. Spark - Who Wins | 15:30
Challenges Spark Tries To Address | 12:24
How Spark Is Faster Than Hadoop | 8:39

Chapter 3: RDD - Core Of Spark

The Need For RDD | 11:29
What Is RDD | 12:30
What An RDD Is Not | 7:31

Chapter 4: Execution In Spark (Behind the scenes)

First Program In Spark | 16:04
What are Dependencies and Why They are Important | 11:11
Program to Execution - Part 1
Program to Execution - Part 2
Caching Data In Spark | 15:04
Fault Tolerance | 7:34

Chapter 5: Shuffle in Spark

Need for Shuffle | 10:45
Hash Shuffle Manager - Part 1 | 11:44
Hash Shuffle Manager - Part 2 | 14:29
Sort Shuffle Manager | 8:15

Chapter 6: Spark Transformations

reduceByKey vs groupByKey | 9:34
Cogroup, Join and Avoiding Shuffle - Part 1 | 14:19
Cogroup, Join and Avoiding Shuffle - Part 2 | 8:23
Resizing Partitions | 7:46

Chapter 7: PageRanking with RDDs

PageRanking Algorithm
PageRank Walk-through
Implementing PageRank with RDDs

Chapter 8: Beyond RDDs

What's the Problem with RDDs | 11:53
DataFrame vs Dataset vs SQL | 12:25
Simple Selects | 8:26
Filtering DataFrames | 2:24
Aggregating DataFrames | 5:19
Joining DataFrames | 8:20
PageRanking with DataFrames | 16:39

Chapter 9: Spark with Other Datasources & File Formats

Spark & Hive | 8:26
Spark & Hive with XML, Parquet & ORC | 14:23
Spark & RDBMS | 8:49
Spark & HBase - Part 1 | 18:47
Spark & HBase - Part 2 | 9:03

Chapter 10: Spark Optimizations

Number of Tasks
Join Algorithms
Picking a Join Algorithm
Join Hints

Chapter 11: Spark - Under the Hood

Inside the Catalyst Optimizer | 12:05
Catalyst Optimizer - Plan Walkthrough | 6:27
Project Tungsten - Better Memory Management | 13:09
Project Tungsten - CPU Cache Aware Optimizations | 11:05

Chapter 12: Resource Management

Spark Architecture
Memory Layout In Executor
Resource Management - Standalone
Resource Management - YARN
Dynamic Resource Allocation | 7:47

Chapter 13: Cluster Installation

Spark Installation | 5:28
Hadoop Cluster Setup - Part 1 | 23:43
Hadoop Cluster Setup - Part 2 | 25:35
Hadoop Cluster Setup - Part 3 | 18:01

Chapter 14: An end to end project (Spark, Elasticsearch, Kibana, REST and Angular)

End to End Project Introduction | 8:09
Elasticsearch (A quick introduction) | 8:18
Hands-on with Elasticsearch | 10:45
Stack Overflow Dataset | 8:58
Spark ETL | 12:53
Visualizations with Kibana | 8:44
REST Service with Spring framework | 19:29
Building an Angular application | 12:28

Chapter 15: Introduction to Kafka

Kafka - The Why and the What | 8:43
Key Concepts | 12:32
Experiments with Kafka | 19:18

Chapter 16: Machine Learning

Introduction to Machine Learning | 11:38
Machine Learning Blueprint | 5:49
Feature Engineering | 10:39
Linear Regression | 8:17
World Happiness Project
Decision Trees | 9:55
Random Forest | 3:14
Predicting 2016 US Elections | 11:46
Predicting Yelp Ratings (+ve or -ve) | 15:55

Chapter 17: Streaming with Spark

Why Streaming and How Spark Does Streaming | 11:51
Core Concepts in Streaming | 8:36
Output Modes With Non Aggregate Queries | 13:40
Output Modes With Aggregate Queries | 8:50
Event Time, Window and Late Events | 10:39
Handling Late Events In Streaming | 10:47
Late Events and Append Mode | 8:05
Streaming Meetup with Spark - Part 1 | 5:31
Streaming Meetup with Spark - Part 2 | 8:53

Chapter 18: A Short Chapter On Scala

Introduction to Scala | 12:05
First Program in Scala (not HelloWorld) | 11:45
Scala Functions | 11:43
