Hadoop - Big Data Development

Apache Hadoop is an open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware. Hadoop is an Apache top-level project built and used by a global community of contributors and users. It is licensed under the Apache License 2.0.

The Apache Hadoop framework is composed of the following modules:
Hadoop Common – contains the libraries and utilities needed by other Hadoop modules.
Hadoop Distributed File System (HDFS) – a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.
Hadoop YARN – a resource-management platform responsible for managing compute resources in clusters and using them for scheduling users' applications.
Hadoop MapReduce – a programming model for large-scale data processing.

All the modules in Hadoop are designed with the fundamental assumption that hardware failures (of individual machines or racks of machines) are common and should therefore be handled automatically in software by the framework. Apache Hadoop's MapReduce and HDFS components were originally derived from Google's MapReduce and Google File System (GFS) papers, respectively.
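
To make the programming model concrete, here is the classic WordCount Mapper and Reducer, written against the org.apache.hadoop.mapreduce API as a minimal sketch (each class in its own file); it is illustrative and not part of the course material:

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Map phase: split each input line into words and emit (word, 1).
    public class TokenizerMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
      private static final IntWritable ONE = new IntWritable(1);
      private final Text word = new Text();

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
          word.set(itr.nextToken());
          context.write(word, ONE);
        }
      }
    }

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Reduce phase: sum the counts emitted for each distinct word.
    public class IntSumReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
      @Override
      protected void reduce(Text key, Iterable<IntWritable> values,
          Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
          sum += v.get();
        }
        context.write(key, new IntWritable(sum));
      }
    }

The framework handles the shuffle between the two phases, grouping all values for a given key onto one reducer.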

Basics of Hadoop:

  • Motivation for Hadoop
  • Large-scale system training
  • Survey of the data storage literature
  • Survey of the data processing literature
  • Networking constraints
  • Requirements for a new approach
  • Basic concepts of Hadoop

What is Hadoop?

  • The Hadoop Distributed File System
  • How Hadoop MapReduce works
  • Anatomy of a Hadoop cluster
  • Hadoop daemons
  • Master daemons
  • Name node
  • Job tracker
  • Secondary name node
  • Slave daemons
  • Task tracker
  • HDFS (Hadoop Distributed File System)
  • Splits and blocks
  • Input splits
  • HDFS splits
  • Data replication
  • Hadoop rack awareness
  • High availability of data
  • Block placement and cluster architecture
  • Case studies
  • Best practices and performance tuning
  • Development of MapReduce programs
  • Local mode
  • Running without HDFS
  • Pseudo-distributed mode (see the sample configuration after this list)
  • All daemons running on a single node
  • Fully distributed mode
  • Daemons running on dedicated nodes
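
As a rough illustration of pseudo-distributed mode, the two properties below are typically all that is needed to run every daemon on one machine against a local HDFS. This is a sketch assuming a Hadoop 2.x layout; the hostname and port are assumptions, not values from the course:

    <!-- core-site.xml: point the default filesystem at a local HDFS daemon -->
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>

    <!-- hdfs-site.xml: a single node can only hold one replica of each block -->
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>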

Hadoop administration

  • Setup of Hadoop clusters: Cloudera, Apache, Greenplum, and Hortonworks
  • Build a full Hadoop cluster setup on a single desktop.
  • Configure and install Apache Hadoop on a multi-node cluster.
  • Configure and install the Cloudera distribution in a distributed mode.
  • Configure and install the Hortonworks distribution in a fully distributed mode.
  • Configure the Greenplum distribution in a fully distributed mode.
  • Monitor the cluster
  • Get used to the management consoles of Hortonworks and Cloudera.
  • Name node in safe mode
  • Data backup.
  • Case studies

Hadoop Development:

  • Writing a MapReduce Program
  • A sample MapReduce program
  • Basic API concepts
  • Driver code (see the ToolRunner sketch after this list)
  • Mapper
  • Reducer
  • Hadoop Streaming
  • Performing several Hadoop jobs
  • The configure and close methods
  • Sequence files
  • Record reader
  • Record writer
  • The Reporter and its role
  • Counters
  • Output collector
  • Accessing HDFS
  • ToolRunner
  • Use of the DistributedCache
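
A sketch of the driver side, using ToolRunner so that generic options such as -D properties and -files (the mechanism behind the DistributedCache) are parsed for free. It reuses the TokenizerMapper and IntSumReducer from the earlier sketch; class and path names are illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class WordCountDriver extends Configured implements Tool {
      @Override
      public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(TokenizerMapper.class);   // from the earlier sketch
        job.setCombinerClass(IntSumReducer.class);   // combiner cuts shuffle traffic
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
      }

      public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options before calling run()
        System.exit(ToolRunner.run(new Configuration(), new WordCountDriver(), args));
      }
    }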

Several MapReduce jobs (in detail)

  • Most effective search using MapReduce
  • Generating recommendations using MapReduce
  • Processing log files using MapReduce

  • Identification of the mapper
  • Identification of the reducer
  • Exploring the problems using these applications
  • Debugging MapReduce programs
  • MRUnit testing
  • Logging
  • Debugging strategies
  • Advanced MapReduce programming
  • Secondary sort
  • Customizing input and output formats
  • MapReduce joins
  • Monitoring and debugging on a production cluster
  • Counters (see the sketch after this list)
  • Skipping bad records
  • Running in local mode
  • MapReduce performance tuning
  • Reducing network traffic with combiners
  • Partitioners
  • Reducing the amount of input data
  • Using compression
  • Reusing the JVM
  • Speculative execution
  • Performance aspects
  • Case studies
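
For counters and bad-record handling, a common pattern is to count malformed input in the mapper and skip it rather than fail the task. A hedged sketch; the tab-separated input format and the counter names are assumptions for illustration:

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class RobustMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
      private static final IntWritable ONE = new IntWritable(1);

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        String[] fields = value.toString().split("\t");
        if (fields.length < 2) {
          // Counters show up in the job UI and job history, which makes
          // them handy for monitoring data quality on a production cluster.
          context.getCounter("Quality", "MALFORMED_RECORDS").increment(1);
          return;  // skip the bad record instead of throwing
        }
        context.write(new Text(fields[0]), ONE);
      }
    }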

CDH4 Enhancements:

  • NameNode high availability
  • NameNode federation
  • Fencing
  • MapReduce v2

HADOOP ANALYST

  • Concepts of Hive
  • Hive and its architecture
  • Install and configure Hive on a cluster
  • Types of tables in Hive
  • Hive library functions
  • Buckets
  • Partitions
  • Joins
  • Inner joins
  • Outer joins
  • Hive UDFs (see the JDBC sketch after this list)
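
One way to exercise partitions, buckets, and joins is through the HiveServer2 JDBC driver. A minimal sketch, assuming a HiveServer2 instance on localhost:10000, the hive-jdbc driver on the classpath, and hypothetical orders/customers tables:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveJdbcExample {
      public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://localhost:10000/default", "", "");
             Statement stmt = conn.createStatement()) {
          // Partitions prune I/O by directory; buckets help sampling and joins.
          stmt.execute("CREATE TABLE IF NOT EXISTS orders (id INT, amount DOUBLE) "
              + "PARTITIONED BY (dt STRING) "
              + "CLUSTERED BY (id) INTO 4 BUCKETS");
          // An inner join in HiveQL; the customers table is assumed to exist.
          ResultSet rs = stmt.executeQuery(
              "SELECT o.id, c.name FROM orders o JOIN customers c ON (o.id = c.id)");
          while (rs.next()) {
            System.out.println(rs.getInt(1) + "\t" + rs.getString(2));
          }
        }
      }
    }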

PIG

  • Pig basics
  • Install and configure Pig
  • Pig library functions
  • Pig vs. Hive
  • Writing sample Pig Latin scripts
  • Modes of running
  • Grunt shell
  • Embedding Pig in a Java program (see the sketch after this list)
  • Pig UDFs
  • Pig macros
  • Debugging Pig
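
The "Java program" mode above refers to embedding Pig through its PigServer API. A sketch of the word-count script run from Java, assuming a local input.txt file and the pig library on the classpath:

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;

    public class PigFromJava {
      public static void main(String[] args) throws Exception {
        // Local mode for development; ExecType.MAPREDUCE runs on the cluster.
        PigServer pig = new PigServer(ExecType.LOCAL);
        pig.registerQuery("lines = LOAD 'input.txt' AS (line:chararray);");
        pig.registerQuery(
            "words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;");
        pig.registerQuery("grouped = GROUP words BY word;");
        pig.registerQuery("counts = FOREACH grouped GENERATE group, COUNT(words);");
        pig.store("counts", "wordcount-out");  // writes the result directory
      }
    }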

IMPALA

  • Differences between Pig, Hive, and Impala
  • Does Impala give good performance?
  • Exclusive features
  • Impala and its challenges
  • Use cases

NOSQL

  • HBase
  • HBase concepts
  • HBase architecture
  • Basics of HBase
  • Server architecture
  • File storage architecture
  • Column access
  • Scans
  • HBase use cases
  • Installation and configuration of HBase on a multi-node cluster
  • Create a database; develop and run sample applications
  • Access data stored in HBase using clients such as Python, Java, and Perl (see the Java client sketch after this list)
  • MapReduce client
  • HBase and Hive integration
  • HBase administration tasks
  • Refining the schema and its basic operations
  • Cassandra Basics
  • MongoDB Basics
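
Column access from Java looks roughly like the sketch below, written against the newer HBase client API (Connection/Table); the table name, column family, and values are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("users"))) {
          // Write one cell: row key, column family "info", qualifier "email".
          Put put = new Put(Bytes.toBytes("row1"));
          put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("email"),
              Bytes.toBytes("user@example.com"));
          table.put(put);

          // Read it back by row key and column.
          Result result = table.get(new Get(Bytes.toBytes("row1")));
          byte[] email = result.getValue(Bytes.toBytes("info"),
              Bytes.toBytes("email"));
          System.out.println(Bytes.toString(email));
        }
      }
    }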

Ecosystem Components

  • Sqoop
  • Configure and install Sqoop
  • Connecting to an RDBMS
  • Installation of MySQL
  • Importing data from Oracle/MySQL into Hive
  • Exporting data to Oracle/MySQL
  • Internal mechanism
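
For reference, a typical import/export pair looks like this on the command line; the connection string, credentials, and table names are placeholders:

    # Import a MySQL table straight into Hive (-P prompts for the password).
    sqoop import \
      --connect jdbc:mysql://dbhost/sales \
      --username reporter -P \
      --table orders \
      --hive-import

    # Export an HDFS directory back out to a relational table.
    sqoop export \
      --connect jdbc:mysql://dbhost/sales \
      --username reporter -P \
      --table order_totals \
      --export-dir /user/hive/warehouse/order_totals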

Oozie

  • Oozie and its architecture
  • XML file
  • Installing and configuring Apache Oozie
  • Specifying the workflow (see the sample workflow.xml after this list)
  • Action nodes
  • Control nodes
  • Job coordinator
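
A minimal workflow.xml, as a sketch of how control nodes (start, kill, end) and an action node fit together; the name, schema version, and ${...} properties are placeholders:

    <workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.4">
      <start to="count-words"/>
      <action name="count-words">
        <map-reduce>
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
          <configuration>
            <property>
              <name>mapred.input.dir</name>
              <value>${inputDir}</value>
            </property>
            <property>
              <name>mapred.output.dir</name>
              <value>${outputDir}</value>
            </property>
          </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
      </action>
      <kill name="fail">
        <message>Job failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
      </kill>
      <end name="end"/>
    </workflow-app>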

Avro, Scribe, Flume, Chukwa, Thrift

  • Concepts of Flume and Chukwa
  • Use cases of Scribe, Thrift and Avro
  • Installation and configuration of Flume
  • Creation of a sample application
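
A sample Flume application is often just a properties file. The sketch below wires an exec source tailing a log file through a memory channel into an HDFS sink; the agent name and all paths are assumptions:

    # Hypothetical single-node Flume agent: tail a log file into HDFS.
    agent1.sources  = src1
    agent1.channels = ch1
    agent1.sinks    = sink1

    agent1.sources.src1.type = exec
    agent1.sources.src1.command = tail -F /var/log/app.log
    agent1.sources.src1.channels = ch1

    agent1.channels.ch1.type = memory

    agent1.sinks.sink1.type = hdfs
    agent1.sinks.sink1.hdfs.path = hdfs://localhost:9000/flume/events
    agent1.sinks.sink1.channel = ch1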

Challenges of Hadoop

  • Hadoop recovery
  • Cases where Hadoop is suitable

our services

RADHIKA Technosoft, a pioneer in software training, has brought monumental changes to the methods and services offered. We are not an organization that makes implausible claims; since our inception, our services have been exactly what we promised. Our service offerings are online training, corporate training, certification, web development, and job support.

our courses

RADHIKA Technosoft is one of the trusted training institutes offering online training for WebSphere, SAP, Oracle, and other professional courses, and works with a mission to make online software learning easier for students across the world.

get in touch
Feel free to get in touch with us.

contact us

Radhika Technosoft
No.1, 4th floor ideal home township,
Raja Rajeshwari Nagar,
Bangalore - 560098

social network
  • Facebook
  • LinkedIn
  • Twitter
  • Google Plus
