JOYATRES

Hadoop Course Overview

Highlights of Hadoop Online Training:

* We provide in-depth course material with real-time scenarios and solutions for each topic.

* We also provide case studies for Hadoop Online Training.

* We schedule the sessions at your convenience with our highly qualified trainers and real-time experts.

* We provide recordings of your sessions for further reference.

* We offer normal track, fast track and weekend batches for Hadoop Online Training.

* We also provide cost-effective and flexible payment schemes.

Hadoop is an Apache open-source framework written in Java that allows distributed processing of large datasets across clusters of computers using simple programming models. A Hadoop application works in an environment that provides distributed storage and computation across clusters of computers. Hadoop is designed to scale up from a single server to thousands of machines, each offering local computation and storage.

Hadoop Architecture

At its core, Hadoop has two major layers namely:

Processing/Computation layer (MapReduce), and

Storage layer (Hadoop Distributed File System).

MapReduce

MapReduce is a parallel programming model for writing distributed applications, devised at Google for efficient processing of large amounts of data (multi-terabyte datasets) on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner. MapReduce programs run on Hadoop, which is an Apache open-source framework.
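The model described above can be illustrated with a small word count written in plain Python. This is only a single-process sketch of the map, shuffle and reduce steps, not the Hadoop API. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    # On a real cluster each input split would be mapped on a different node;
    # here the splits are simply processed one after another.
    mapped = []
    for document in documents:
        mapped.extend(map_phase(document))
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    splits = ["big data big cluster", "data node"]
    print(word_count(splits))
    # prints {'big': 2, 'data': 2, 'cluster': 1, 'node': 1}
```

Because every map call only sees its own split and every reduce call only sees one key, both steps could run in parallel on different machines, which is the property the framework exploits.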

Hadoop Distributed File System

The Hadoop Distributed File System (HDFS) is based on the Google File System (GFS) and provides a distributed file system that is designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. It is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high throughput access to application data and is suitable for applications having large datasets.
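As a rough illustration of how a distributed file system can tolerate the loss of low-cost machines, the sketch below cuts a file into fixed-size blocks and stores several copies of each block on different nodes. It is a simplified model under stated assumptions: the round-robin placement is invented for this example and is not the real HDFS placement policy, and the helper names are hypothetical. The block size of 128 MB and replication factor of 3 used in the demo match common HDFS defaults.

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes of the blocks a file of file_size bytes is cut into."""
    if file_size <= 0:
        return []
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks, datanodes, replication):
    """Assign each block to `replication` distinct nodes (round-robin, an assumption)."""
    if replication > len(datanodes):
        raise ValueError("not enough nodes for the requested replication")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            datanodes[(block_id + offset) % len(datanodes)]
            for offset in range(replication)
        ]
    return placement


def readable_after_failure(placement, failed_nodes):
    """True if every block still has at least one replica on a healthy node."""
    return all(
        any(node not in failed_nodes for node in nodes)
        for nodes in placement.values()
    )


if __name__ == "__main__":
    mb = 1024 * 1024
    blocks = split_into_blocks(300 * mb, 128 * mb)  # two full blocks plus 44 MB
    layout = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"], 3)
    print(layout)
    print(readable_after_failure(layout, {"dn1", "dn2"}))
```

With three replicas per block, the file in this model stays readable even when two of the four nodes fail, because no block has all of its copies on the failed machines.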

Apart from the above-mentioned two core components, Hadoop framework also includes the following two modules:

Hadoop Common: These are Java libraries and utilities required by other Hadoop modules.

Hadoop YARN: This is a framework for job scheduling and cluster resource management.

Advantages of Hadoop

The Hadoop framework allows the user to quickly write and test distributed systems. It is efficient: it automatically distributes the data and work across the machines and, in turn, utilizes the underlying parallelism of the CPU cores.

Hadoop does not rely on hardware to provide fault tolerance and high availability (FTHA). Instead, the Hadoop library itself has been designed to detect and handle failures at the application layer.

Servers can be added to or removed from the cluster dynamically, and Hadoop continues to operate without interruption.

Another big advantage of Hadoop is that, apart from being open source, it is compatible with all platforms since it is Java-based.
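The idea of handling failures at the application layer, mentioned among the advantages above, can be sketched as a scheduler that retries a task on another worker when the first one is unavailable. This is a toy model, not Hadoop's scheduler. The function `run_with_retries`, the `is_healthy` callback and the worker names are all hypothetical.

```python
def run_with_retries(tasks, workers, is_healthy, max_attempts=3):
    """Assign each task to a worker; if that worker is down, try the next one."""
    results = {}
    for index, task in enumerate(tasks):
        for attempt in range(max_attempts):
            worker = workers[(index + attempt) % len(workers)]
            if is_healthy(worker):
                results[task] = worker  # task ran successfully on this worker
                break
        else:
            # No healthy worker was found within the allowed attempts.
            raise RuntimeError(f"task {task!r} failed after {max_attempts} attempts")
    return results


if __name__ == "__main__":
    failed = {"w2"}
    assignment = run_with_retries(
        ["t0", "t1", "t2"],
        ["w1", "w2", "w3"],
        lambda worker: worker not in failed,
    )
    print(assignment)
    # prints {'t0': 'w1', 't1': 'w3', 't2': 'w3'}
```

The job completes even though one worker is down, because the failure is detected in software and the affected task is simply rescheduled.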

What will you learn in this Big Data Hadoop training course?

Master fundamentals of Hadoop and YARN and write applications using them

Set up pseudo-node and multi-node clusters on Amazon EC2

Master HDFS, MapReduce, Hive, Pig, Oozie, Sqoop, Flume, Zookeeper, HBase

Learn Spark, Spark SQL, Streaming, DataFrame, RDD, GraphX and MLlib by writing Spark applications

Master Hadoop administration activities like cluster management, monitoring, administration and troubleshooting

Practice real-life projects using Hadoop and Apache Spark

Be equipped to clear Big Data Hadoop Certification.

Who should take this Big Data Hadoop Online Training Course?

Programming Developers and System Administrators

Experienced working professionals and project managers

Big Data Hadoop developers eager to learn other verticals like testing, analytics and administration

Graduates and undergraduates eager to learn Big Data can take this Big Data Hadoop Certification online training

What are the prerequisites for learning Hadoop?

There are no prerequisites for taking this Big Data training and mastering Hadoop, but a basic knowledge of UNIX, SQL and Java will help you learn Big Data Hadoop.

Scheduling Demo With Trainer:

If you would like an online demo with a Hadoop trainer, please make an inquiry or fill in the demo registration form, and one of our executives will arrange a meeting with the expert trainer.

Course Completion Certificate:

After you finish the course, we provide a Hadoop course completion certificate from kits technologies.

Hadoop Course Curriculum

MODULE – 1 BIG DATA, HADOOP, INTRODUCTION TO HADOOP ARCHITECTURE AND HDFS

Why did Big Data suddenly become so prominent?

Limitations of traditional large scale systems

Compare Hadoop architecture with traditional architecture

Core components of Hadoop

Understanding Hadoop Master-Slave Architecture

Understanding HDFS Architecture

Learn about NameNode, DataNode, Secondary NameNode

Learn about JobTracker, TaskTracker

Anatomy of Read and Write data on HDFS

Hadoop deployment modes – standalone, single-node, multi-node

Configuration files in a Hadoop Cluster

Important Web URLs for Hadoop

Run HDFS and Linux commands

Manuals for installation of Hadoop 1.0 & Hadoop 2.0

Manual for Demo VM installation steps for Windows

MODULE – 2 HADOOP 2.0, YARN, MRV2

Hadoop 1.0 Limitations

MapReduce Limitations (MRv1 vs MRv2)

History of Hadoop 2.0

HDFS 2: Architecture

HDFS 2: High Availability

HDFS 2: Federation

YARN Architecture

Classic vs YARN

Setting up cluster

MODULE – 3 UNDERSTANDING HADOOP MAPREDUCE

Overview of the MapReduce Framework

Use cases of MapReduce

MapReduce Architecture

Understand the concept of Mappers, Reducers

Anatomy of MapReduce Program

MapReduce Components – Mapper Class, Reducer Class, Driver code

Splits and Blocks

Understand Combiner

Understanding Input/Output Format

MapReduce API and Hadoop Data Types

Using Writable and WritableComparable

Concept of Partitioner, Map Side Join, Distributed Join, Distributed Cache, Reduce Side Join

MODULE – 4 UNDERSTANDING APACHE SQOOP

Sqoop – How Sqoop works

Import/Export Data

Sqoop Architecture

Flume – How it works

MODULE – 5 UNDERSTANDING APACHE OOZIE

How Oozie works

Oozie workflow

Making workflow.xml, job.properties and running workflow

MODULE – 6 APACHE HIVE – HIVEQL

What is Hive

Hive DDL – Create/Show/Drop Database

Hive DDL – Create/Show/Drop Tables

Hive DML – Load Files into Tables

Hive DML – Inserting Data into Tables

Hive SQL – Select, Filter, Join, Group By

Hive Architecture & Components

Hive Data Model and Data Units

Difference between Hive and RDBMS

Multi-Table Inserts

Joins

Grouping Sets, Cubes, Rollups

Hive SerDe

Hive UDF

Hive UDAF

MODULE – 7 APACHE PIG

PIG vs. MapReduce

PIG components

PIG execution

PIG Data types

PIG Architecture

PIG Latin Relational Operators

PIG Latin Join and CoGroup

PIG Latin Group and Union

Describe, Explain, Illustrate

PIG Latin: File Loaders

MODULE – 8 APACHE HBASE & NOSQL DATABASES

Introduction to NoSQL

RDBMS vs NoSQL

Analytical (OLAP)

When/Why to use HBase

HBase Architecture/Storage

HBase Features

HBase Data Model

HBase Families

HBase Master

HBase vs RDBMS

Column Families

Access HBase Data

HBase API

Runtime modes

Running HBase

Introduction to Hadoop

High Availability

Scaling

Advantages and Challenges

Introduction to Big Data

What is Big data

Big Data opportunities

Big Data Challenges

Characteristics of Big data

Introduction to Hadoop

Hadoop Distributed File System

Comparing Hadoop & SQL.

Industries using Hadoop.

Data Locality.

Hadoop Architecture.

Map Reduce & HDFS.

Using the Hadoop single node image (Clone).

The Hadoop Distributed File System (HDFS)

HDFS Design & Concepts

Blocks, Name nodes and Data nodes

HDFS High-Availability and HDFS Federation.

Hadoop DFS: The Command-Line Interface

Basic File System Operations

Anatomy of File Read

Anatomy of File Write

Block Placement Policy and Modes

More detailed explanation about Configuration files.

Metadata, FS image, Edit log, Secondary Name Node and Safe Mode.

How to add New Data Node dynamically.

How to decommission a Data Node dynamically (Without stopping cluster).

FSCK Utility. (Block report).

How to override default configuration at system level and Programming level.

HDFS Federation.

ZOOKEEPER Leader Election Algorithm.

Exercise and small use case on HDFS.

Map Reduce

Functional Programming Basics.

Map and Reduce Basics

How Map Reduce Works

Anatomy of a Map Reduce Job Run

Legacy Architecture -> Job Submission, Job Initialization, Task Assignment, Task Execution, Progress and Status Updates

Job Completion, Failures

Shuffling and Sorting

Splits, Record reader, Partition, Types of partitions & Combiner

Optimization Techniques -> Speculative Execution, JVM Reuse and Number of Slots.

Types of Schedulers and Counters.

Comparisons between Old and New API at code and Architecture Level.

Getting the data from RDBMS into HDFS using Custom data types.

Distributed Cache and Hadoop Streaming (Python, Ruby and R).

YARN.

Sequence Files and Map Files.

Enabling Compression Codecs.

Map side Join with distributed Cache.

Types of I/O Formats: MultipleOutputs, NLineInputFormat.

Handling small files using CombineFileInputFormat.

Map/Reduce Programming – Java Programming

Hands on “Word Count” in Map/Reduce in standalone and Pseudo distribution Mode.

Sorting files using Hadoop Configuration API discussion

Emulating “grep” for searching inside a file in Hadoop

DBInputFormat

Job Dependency API discussion

Input Format API discussion

Input Split API discussion

Custom Data type creation in Hadoop.

NOSQL

ACID in RDBMS and BASE in NoSQL.

CAP Theorem and Types of Consistency.

Types of NoSQL Databases in detail.

Columnar Databases in Detail (HBASE and CASSANDRA).

TTL, Bloom Filters and Compaction.

HBase

HBase Installation

HBase concepts

HBase Data Model and Comparison between RDBMS and NOSQL.

Master & Region Servers.

HBase Operations (DDL and DML) through Shell and Programming and HBase Architecture.

Catalog Tables.

Block Cache and sharding.

SPLITS.

DATA Modeling (Sequential, Salted, Promoted and Random Keys).

Java APIs and REST Interface.

Client Side Buffering and Process 1 million records using Client side Buffering.

HBASE Counters.

Enabling Replication and HBASE RAW Scans.

HBASE Filters.

Bulk Loading and Coprocessors (Endpoints and Observers with programs).

Real world use case consisting of HDFS, MR and HBASE.

Hive

Installation

Introduction and Architecture.

Hive Services, Hive Shell, Hive Server and Hive Web Interface (HWI)

Metastore

Hive QL

OLTP vs. OLAP

Working with Tables.

Primitive data types and complex data types.

Working with Partitions.

User Defined Functions

Hive Bucketed Tables and Sampling.

External partitioned tables, Map the data to the partition in the table, Writing the output of one query to another table, Multiple inserts

Dynamic Partition

Differences between ORDER BY, DISTRIBUTE BY and SORT BY.

Bucketing and Sorted Bucketing with Dynamic partition.

RC File.

INDEXES and VIEWS.

MAPSIDE JOINS.

Compression on hive tables and Migrating Hive tables.

Dynamic substitution in Hive and different ways of running Hive

How to enable Update in HIVE.

Log Analysis on Hive.

Access HBASE tables using Hive.

Hands on Exercises

Pig

Installation

Execution Types

Grunt Shell

Pig Latin

Data Processing

Schema on read

Primitive data types and complex data types.

Tuple schema, BAG Schema and MAP Schema.

Loading and Storing

Filtering

Grouping & Joining

Debugging commands (Illustrate and Explain).

Validations in PIG.

Type casting in PIG.

Working with Functions

User Defined Functions

Types of JOINS in pig and Replicated Join in detail.

SPLITS and Multiquery execution.

Error Handling, FLATTEN and ORDER BY.

Parameter Substitution.

Nested For Each.

User Defined Functions, Dynamic Invokers and Macros.

How to access HBASE using PIG.

How to Load and Write JSON DATA using PIG.

Piggy Bank.

Hands on Exercises

SQOOP

Installation

Import Data (full table, subset only, target directory, protecting the password, file formats other than CSV, compression, controlling parallelism, all-tables import)

Incremental Import (import only new data, last imported data, storing the password in the Metastore, sharing the Metastore between Sqoop clients)

Free Form Query Import

Export data to RDBMS, HIVE and HBASE

Hands on Exercises.

HCATALOG

Installation.

Introduction to HCATALOG.

Using HCATALOG with PIG, HIVE and MR.

Hands on Exercises.

FLUME

Installation

Introduction to Flume

Flume Agents: Sources, Channels and Sinks

Log user information into HDFS from a Java program using LOG4J and Avro Source

Log user information into HDFS from a Java program using Tail Source

Log user information into HBASE from a Java program using LOG4J and Avro Source

Log user information into HBASE from a Java program using Tail Source

Flume Commands

Use case of Flume: stream data from Twitter into HDFS and HBASE, then do some analysis using HIVE and PIG

More Ecosystems

HUE (Hortonworks and Cloudera).

Oozie

Workflow (Start, Action, End, Kill, Join and Fork), Schedulers, Coordinators and Bundles.

Workflow to show how to schedule Sqoop Job, Hive, MR and PIG.

Real world use case that finds the top websites used by users of certain ages, scheduled to run every hour.

ZooKeeper

HBASE Integration with HIVE and PIG.

Phoenix

Proof of concept (POC).

SPARK

Overview

Linking with Spark

Initializing Spark

Using the Shell

Resilient Distributed Datasets (RDDs)

Parallelized Collections

External Datasets

RDD Operations

Basics, Passing Functions to Spark

Working with Key-Value Pairs

Transformations

Actions

RDD Persistence

Which Storage Level to Choose?

Removing Data

Shared Variables

Broadcast Variables

Accumulators

Deploying to a Cluster

Unit Testing

Migrating from pre-1.0 Versions of Spark

Where to Go from Here

Author

Likhit

MongoDB, Hadoop, Pig, HBase, Hive, Hadoop ecosystem, Sqoop, Flume, HDFS, MR, Scala, Spark, Zookeeper, SQL, NoSQL, Machine Learning, Data Mining, YARN, Storm

MS SQL Server, Oracle, ADO.Net, C#, .Net, Data Architecture, ASP.Net, Ajax, Java Servlets, Corporate Training, Windows Forms, MS Visual Studio.net

Rajeev

hadoop, pig, hbase, nosql, mongodb, javascript, html, wordpress, mysql, php, apache, linux system administration, mapreduce, apache pig, hive, cloudera, impala, spark, scala, core java, python, hdfs, oozie, Cassandra, Flume, Cdh, Sqoop

Sanjay

machine learning, natural language processing, data science, python, business development, technology, operations, deep learning, team handling, big data, strategic partnerships, mysql, Communication, Teaching, Training, Mentoring, Corporate Training

R, Hadoop, Spark, Data Analytics, Lead Management, Business Training

UmaPava

C, C++, Oracle, Data warehousing, HADOOP, HIVE, Hbase, PIG, SQOOP, Flume, data base concepts, R, Python, Mapreduce, Machine Learning, Data Science, Deep Learning, Java, Data Structures, Database, SQL, PLSQL

Core Java, Oracle SQL, Big Data, Oracle PL, Hdfs, Artificial Intelligence, Data Integration, Data Mining, Data Modeling, Informatica
