JOYATRES

Big Data Hadoop Course Overview

The Big Data Hadoop certification training is designed to give you in-depth knowledge of the Big Data framework using Hadoop and Spark. In this hands-on Hadoop course, you will execute real-life, industry-based projects using the Integrated Lab.

Skills Covered

Realtime data processing

Functional programming

Spark applications

Parallel processing

Spark RDD optimization techniques

Spark SQL

Big Data Hadoop Course Curriculum

Eligibility

The Big Data Hadoop certification training online course is best suited for IT, data management, and analytics professionals looking to gain expertise in Big Data Hadoop, including software developers and architects, senior IT professionals, testing and mainframe professionals, business intelligence professionals, project managers, aspiring data scientists, and graduates looking to begin a career in Big Data analytics.

Lesson 1 Course Introduction

1.1 Course Introduction
1.2 Accessing Practice Lab

Lesson 2 Introduction to Big Data and Hadoop

1.1 Introduction to Big Data and Hadoop
1.2 Introduction to Big Data
1.3 Big Data Analytics
1.4 What is Big Data
1.5 Four Vs of Big Data
1.6 Case Study: Royal Bank of Scotland
1.7 Challenges of Traditional System
1.8 Distributed Systems
1.9 Introduction to Hadoop
1.10 Components of Hadoop Ecosystem: Part One
1.11 Components of Hadoop Ecosystem: Part Two
1.12 Components of Hadoop Ecosystem: Part Three
1.13 Commercial Hadoop Distributions
1.14 Demo: Walkthrough of Simplilearn Cloudlab
1.15 Key Takeaways
Knowledge Check

Lesson 3 Hadoop Architecture, Distributed Storage (HDFS) and YARN

2.1 Hadoop Architecture, Distributed Storage (HDFS) and YARN
2.2 What Is HDFS
2.3 Need for HDFS
2.4 Regular File System vs HDFS
2.5 Characteristics of HDFS
2.6 HDFS Architecture and Components
2.7 High Availability Cluster Implementations
2.8 HDFS Component File System Namespace
2.9 Data Block Split
2.10 Data Replication Topology
2.11 HDFS Command Line
2.12 Demo: Common HDFS Commands
HDFS Command Line
2.13 YARN Introduction
2.14 YARN Use Case
2.15 YARN and Its Architecture
2.16 Resource Manager
2.17 How Resource Manager Operates
2.18 Application Master
2.19 How YARN Runs an Application
2.20 Tools for YARN Developers
2.21 Demo: Walkthrough of Cluster Part One
2.22 Demo: Walkthrough of Cluster Part Two
2.23 Key Takeaways
Knowledge Check
Hadoop Architecture, Distributed Storage (HDFS) and YARN

Lesson 4 Data Ingestion into Big Data Systems and ETL

3.1 Data Ingestion into Big Data Systems and ETL
3.2 Data Ingestion Overview Part One
3.3 Data Ingestion Overview Part Two
3.4 Apache Sqoop
3.5 Sqoop and Its Uses
3.6 Sqoop Processing
3.7 Sqoop Import Process
Assisted Practice: Import into Sqoop
3.8 Sqoop Connectors
3.9 Demo: Importing and Exporting Data from MySQL to HDFS
Apache Sqoop
3.9 Apache Flume
3.10 Flume Model
3.11 Scalability in Flume
3.12 Components in Flume’s Architecture
3.13 Configuring Flume Components
3.15 Demo: Ingest Twitter Data
3.14 Apache Kafka
3.15 Aggregating User Activity Using Kafka
3.16 Kafka Data Model
3.17 Partitions
3.18 Apache Kafka Architecture
3.21 Demo: Setup Kafka Cluster
3.19 Producer Side API Example
3.20 Consumer Side API
3.21 Consumer Side API Example
3.22 Kafka Connect
3.26 Demo: Creating Sample Kafka Data Pipeline using Producer and Consumer
3.23 Key Takeaways
Knowledge Check
Data Ingestion into Big Data Systems and ETL

Lesson 5 Distributed Processing - MapReduce Framework and Pig

4.1 Distributed Processing MapReduce Framework and Pig
4.2 Distributed Processing in MapReduce
4.3 Word Count Example
4.4 Map Execution Phases
4.5 Map Execution Distributed Two Node Environment
4.6 MapReduce Jobs
4.7 Hadoop MapReduce Job Work Interaction
4.8 Setting Up the Environment for MapReduce Development
4.9 Set of Classes
4.10 Creating a New Project
4.11 Advanced MapReduce
4.12 Data Types in Hadoop
4.13 OutputFormats in MapReduce
4.14 Using Distributed Cache
4.15 Joins in MapReduce
4.16 Replicated Join
4.17 Introduction to Pig
4.18 Components of Pig
4.19 Pig Data Model
4.20 Pig Interactive Modes
4.21 Pig Operations
4.22 Various Relations Performed by Developers
4.23 Demo: Analyzing Web Log Data Using MapReduce
4.24 Demo: Analyzing Sales Data and Solving KPIs using PIG
Apache Pig
4.25 Demo: Wordcount
4.23 Key Takeaways
Knowledge Check
Distributed Processing - MapReduce Framework and Pig

Lesson 6 Apache Hive

5.1 Apache Hive
5.2 Hive SQL over Hadoop MapReduce
5.3 Hive Architecture
5.4 Interfaces to Run Hive Queries
5.5 Running Beeline from Command Line
5.6 Hive Metastore
5.7 Hive DDL and DML
5.8 Creating New Table
5.9 Data Types
5.10 Validation of Data
5.11 File Format Types
5.12 Data Serialization
5.13 Hive Table and Avro Schema
5.14 Hive Optimization Partitioning Bucketing and Sampling
5.15 Non Partitioned Table
5.16 Data Insertion
5.17 Dynamic Partitioning in Hive
5.18 Bucketing
5.19 What Do Buckets Do
5.20 Hive Analytics UDF and UDAF
Assisted Practice: Synchronization
5.21 Other Functions of Hive
5.22 Demo: Real-Time Analysis and Data Filtration
5.23 Demo: Real-World Problem
5.24 Demo: Data Representation and Import using Hive
5.25 Key Takeaways
Knowledge Check
Apache Hive

Lesson 7 NoSQL Databases - HBase

Lesson 8 Basics of Functional Programming and Scala

7.1 Basics of Functional Programming and Scala
7.2 Introduction to Scala
7.3 Demo: Scala Installation
7.3 Functional Programming
7.4 Programming with Scala
Demo: Basic Literals and Arithmetic Operators
Demo: Logical Operators
7.5 Type Inference Classes Objects and Functions in Scala
Demo: Type Inference Functions Anonymous Function and Class
7.6 Collections
7.7 Types of Collections
Demo: Five Types of Collections
Demo: Operations on List
7.8 Scala REPL
Assisted Practice: Scala REPL
Demo: Features of Scala REPL
7.9 Key Takeaways
Knowledge Check
Basics of Functional Programming and Scala

Lesson 9 Apache Spark Next Generation Big Data Framework

8.1 Apache Spark Next Generation Big Data Framework
8.2 History of Spark
8.3 Limitations of MapReduce in Hadoop
8.4 Introduction to Apache Spark
8.5 Components of Spark
8.6 Application of In-Memory Processing
8.7 Hadoop Ecosystem vs Spark
8.8 Advantages of Spark
8.9 Spark Architecture
8.10 Spark Cluster in Real World
8.11 Demo: Running Scala Programs in Spark Shell
8.12 Demo: Setting Up Execution Environment in IDE
8.13 Demo: Spark Web UI
8.11 Key Takeaways
Knowledge Check
Apache Spark Next Generation Big Data Framework

Lesson 10 Spark Core Processing RDD

9.1 Processing RDD
9.1 Introduction to Spark RDD
9.2 RDD in Spark
9.3 Creating Spark RDD
9.4 Pair RDD
9.5 RDD Operations
9.6 Demo: Spark Transformation Detailed Exploration Using Scala Examples
9.7 Demo: Spark Action Detailed Exploration Using Scala
9.8 Caching and Persistence
9.9 Storage Levels
9.10 Lineage and DAG
9.11 Need for DAG
9.12 Debugging in Spark
9.13 Partitioning in Spark
9.14 Scheduling in Spark
9.15 Shuffling in Spark
9.16 Sort Shuffle
9.17 Aggregating Data with Pair RDD
9.18 Demo: Spark Application with Data Written Back to HDFS and Spark UI
9.19 Demo: Changing Spark Application Parameters
9.20 Demo: Handling Different File Formats
9.21 Demo: Spark RDD with Real-World Application
9.22 Demo: Optimizing Spark Jobs
Assisted Practice: Changing Spark Application Params
9.23 Key Takeaways
Knowledge Check
Spark Core Processing RDD

Lesson 11 Spark SQL - Processing DataFrames

10.1 Spark SQL Processing DataFrames
10.2 Spark SQL Introduction
10.3 Spark SQL Architecture
10.4 DataFrames
10.5 Demo: Handling Various Data Formats
10.6 Demo: Implement Various DataFrame Operations
10.7 Demo: UDF and UDAF
10.8 Interoperating with RDDs
10.9 Demo: Process DataFrame Using SQL Query
10.10 RDD vs DataFrame vs Dataset
Processing DataFrames
10.11 Key Takeaways
Knowledge Check
Spark SQL - Processing DataFrames

Lesson 12 Spark MLlib - Modeling Big Data with Spark

11.1 Spark MLlib Modeling Big Data with Spark
11.2 Role of Data Scientist and Data Analyst in Big Data
11.3 Analytics in Spark
11.4 Machine Learning
11.5 Supervised Learning
11.6 Demo: Classification of Linear SVM
11.7 Demo: Linear Regression with Real World Case Studies
11.8 Unsupervised Learning
11.9 Demo: Unsupervised Clustering K-Means
Assisted Practice: Unsupervised Clustering K-means
11.10 Reinforcement Learning
11.11 Semi-Supervised Learning
11.12 Overview of MLlib
11.13 MLlib Pipelines
11.14 Key Takeaways
Knowledge Check
Spark MLlib - Modeling Big Data with Spark

Lesson 13 Stream Processing Frameworks and Spark Streaming

12.1 Stream Processing Frameworks and Spark Streaming
12.1 Streaming Overview
12.2 Real-Time Processing of Big Data
12.3 Data Processing Architectures
12.4 Demo: Real-Time Data Processing
12.5 Spark Streaming
12.6 Demo: Writing Spark Streaming Application
12.7 Introduction to DStreams
12.8 Transformations on DStreams
12.9 Design Patterns for Using ForeachRDD
12.10 State Operations
12.11 Windowing Operations
12.12 Join Operations: Stream-Dataset Join
12.13 Demo: Windowing of Real-Time Data Processing
12.14 Streaming Sources
12.15 Demo: Processing Twitter Streaming Data
12.16 Structured Spark Streaming
12.17 Use Case Banking Transactions
12.18 Structured Streaming Architecture Model and Its Components
12.19 Output Sinks
12.20 Structured Streaming APIs
12.21 Constructing Columns in Structured Streaming
12.22 Windowed Operations on Event-Time
12.23 Use Cases
12.24 Demo: Streaming Pipeline
Spark Streaming
12.25 Key Takeaways
Knowledge Check
Stream Processing Frameworks and Spark Streaming

Lesson 14 Spark GraphX

Practice Projects

Car Insurance Analysis
Transactional Data Analysis
K-Means clustering for telecommunication domain

Big Data Hadoop Certification Training Course
