TECHNICAL HIGHLIGHTS

TOPICS

  • Shuffle in Depth
  • Your code to Spark tasks
  • Spark with other Sources and Formats
  • Catalyst Optimizer and Tungsten
  • Resource Management
  • Cluster Setup
  • Optimizations & Troubleshooting Tips
  • Spark Streaming
  • Spark Machine Learning
  • Spark with Kafka, Elasticsearch, HBase
Spark Developer In Real World

PROJECTS

  • Page Ranking pages from Wikipedia | DataFrames & RDD
  • Analyzing Trending YouTube videos (CSV & JSON) | Datasources & Formats
  • Streaming with activity data from IoT devices | Spark Streaming
  • Streaming data from Meetup.com with Kafka | Spark Streaming
  • Predicting Country’s Happiness Rank from Happiness Score | Machine Learning
  • Predicting 2016 US Elections | Machine Learning
  • Predicting Yelp Rating (+ve / -ve) | Machine Learning
  • Build a mini site with Stackoverflow data and Elasticsearch | End to End Project

FAQ

1. Is this course right for me?
This course is great for someone who is either trying to launch a career in Big Data or already working with Hadoop or related tools and would like to move into Spark. We have designed the course to give you the confidence to attend interviews and the skills to work in a real-world production environment from day one.
2. What skills do I need to start with the course?
Basic Linux knowledge: simple commands to change directories, open/close files, etc. Basic SQL knowledge: simple selects, inserts and simple join statements. Basic Java or Python knowledge won't hurt, because we write and walk through the programs in the projects covered in the course. But don't be intimidated if you are not a programmer. We totally understand that some students are not programmers, and we will walk through all the code step by step so it will be super easy to follow. You will be in good hands. We make sure you are not lost.
3. I am looking to learn a specific tool. How do I know whether that tool is covered?
We have a detailed, up-to-date curriculum explaining every topic that is covered in the course. Please check the curriculum below to find out whether the tool you are looking for is included.
4. I am still not sure whether this course is good for me.
No worries. We totally understand. Let us know your expectations by emailing us at info@hadoopinrealworld.com and we will give our HONEST opinion on whether this course will be a good fit for you.
5. What if I have questions while I take the course?
You can ask us questions anytime by posting your questions or comments below the video in each lesson, and we will answer promptly.
6. Do I get access to a Spark cluster?
Yes. You get free access to a 3-node Spark cluster hosted in AWS.
7. I don't see a topic. Will it be added?
The Big Data ecosystem is evolving fast, so we update our courses frequently and treat all of them as living courses. You can check out our release schedule at https://www.bigdatainrealworld.com/upcoming-releases/
8. Do I get lifetime access?
Yes. Absolutely. You get lifetime access to the course, to all future updates to the course, and to the cluster.

CURRICULUM

Chapter 1: Let's Get Started
  • Thank you and Welcome | 11:35
  • Tools and Setup | 8:30
Chapter 2: Introduction To Spark
  • Hadoop vs. Spark - Who Wins | 15:30
  • Challenges Spark Tries To Address | 12:24
  • How Spark Is Faster Than Hadoop | 8:39
Chapter 3: RDD - Core Of Spark
  • The Need For RDD | 11:29
  • What Is RDD | 12:30
  • What An RDD Is Not | 7:31
Chapter 4: Execution In Spark (Behind the scenes)
  • First Program In Spark | 16:04
  • What are Dependencies and Why They are Important | 11:11
  • Program to Execution | Part 1 | 13:01
  • Program to Execution | Part 2 | 19:10
  • Caching Data In Spark | 15:04
  • Fault Tolerance | 7:34
Chapter 5: Shuffle in Spark
  • Need for Shuffle | 10:45
  • Hash Shuffle Manager - Part 1 | 11:44
  • Hash Shuffle Manager - Part 2 | 14:29
  • Sort Shuffle Manager | 8:15
Chapter 6: Spark Transformations
  • reduceByKey vs groupByKey | 9:34
  • Cogroup, Join and Avoiding Shuffle - Part 1 | 14:19
  • Cogroup, Join and Avoiding Shuffle - Part 2 | 8:23
  • Resizing Partitions | 7:46
Chapter 7: PageRanking with RDDs
  • PageRanking Algorithm
  • PageRank Walk-through
  • Implementing PageRank with RDDs
Chapter 8: Beyond RDDs
  • What's the Problem with RDDs | 11:53
  • DataFrame vs DataSet vs SQL | 12:25
  • Simple Selects | 8:26
  • Filtering DataFrames | 2:24
  • Aggregating DataFrames | 5:19
  • Joining DataFrames | 8:20
  • PageRanking with DataFrames | 16:39
Chapter 9: Spark with Other Datasources & File Formats
  • Spark & Hive | 8:26
  • Spark & Hive with XML, Parquet & ORC | 14:23
  • Spark & RDBMS | 8:49
  • Spark & HBase | Part 1 | 18:47
  • Spark & HBase | Part 2 | 9:03
Chapter 10: Spark Optimizations
  • Number of Tasks
  • Join Algorithms
  • Picking a Join Algorithm
  • Join Hints
Chapter 11: Spark - Under the Hood
  • Inside the Catalyst Optimizer | 12:05
  • Catalyst Optimizer - Plan Walkthrough | 6:27
  • Project Tungsten - Better Memory Management | 13:09
  • Project Tungsten - CPU Cache Aware Optimizations | 11:05
Chapter 12: Resource Management
  • Spark Architecture
  • Memory Layout In Executor
  • Resource Management - Standalone
  • Resource Management - YARN
  • Dynamic Resource Allocation | 7:47
Chapter 13: Cluster Installation
  • Spark Installation | 5:28
  • Hadoop Cluster Setup | Part 1 | 23:43
  • Hadoop Cluster Setup | Part 2 | 25:35
  • Hadoop Cluster Setup | Part 3 | 18:01
Chapter 14: An end to end project (Spark, Elasticsearch, Kibana, REST and Angular)
  • End to End Project Introduction | 8:09
  • Elasticsearch (A quick introduction) | 8:18
  • Hands-on with Elasticsearch | 10:45
  • Stackoverflow Dataset | 8:58
  • Spark ETL | 12:53
  • Visualizations with Kibana | 8:44
  • REST Service with Spring framework | 19:29
  • Building an Angular application | 12:28
Chapter 15: Introduction to Kafka
  • Kafka - The Why and the What | 8:43
  • Key Concepts | 12:32
  • Experiments with Kafka | 19:18
Chapter 16: Machine Learning
  • Introduction to Machine Learning | 11:38
  • Machine Learning Blueprint | 5:49
  • Feature Engineering | 10:39
  • Linear Regression | 8:17
  • World Happiness Project
  • Decision Trees | 9:55
  • Random Forest | 3:14
  • Predicting 2016 US Elections | 11:46
  • Predicting Yelp Ratings | +ve or -ve | 15:55
Chapter 17: Streaming with Spark
  • Why Streaming and How Spark Does Streaming | 11:51
  • Core Concepts in Streaming | 8:36
  • Output Modes With Non Aggregate Queries | 13:40
  • Output Modes With Aggregate Queries | 8:50
  • Event Time, Window and Late Events | 10:39
  • Handling Late Events In Streaming | 10:47
  • Late Events and Append Mode | 8:05
  • Streaming Meetup with Spark | Part 1 | 5:31
  • Streaming Meetup with Spark | Part 2 | 8:53
Chapter 18: A Short Chapter On Scala
  • Introduction to Scala | 12:05
  • First Program in Scala | not HelloWorld | 11:45
  • Scala Functions | 11:43