18756 Stone Oak Park Way, Suite 200, San Antonio, TX 78258, USA
100 Queen St W, Brampton, ON L6X 1A4, Canada

HDP Developer Quick Start Training

What is HDP Developer Quick Start training all about?

This HDP Developer Quick Start training is recommended for developers who need to create applications that analyze Big Data stored in Apache Hadoop using Apache Hive and Apache Pig. The training also covers developing applications on the Apache Spark platform.


Contact us to customize this class with your preferred dates, times and location.
You can call us at 1-800-961-0337 or chat with our representative.

What are the course objectives for HDP Developer Quick Start training?
  • Introduction to Apache Hadoop and HDFS.
  • Advanced Apache Pig programming.
  • Working with pair RDDs and building YARN applications.
Who should attend HDP Developer Quick Start training?

This training is intended for data engineers and developers who need to understand and develop applications on HDP.

What are the prerequisites for HDP Developer Quick Start training?

The recommended prerequisite for this course is familiarity with software development and programming principles. Knowledge of SQL and light scripting is also important.

What is the course outline for HDP Developer Quick Start training?
  • 1. An Introduction to Apache Hadoop and HDFS
      • a). Describe the Case for Hadoop
      • b). Describe the Trends of Volume, Velocity and Variety
      • c). Discuss the Importance of Open Enterprise Hadoop
      • d). Describe the Hadoop Ecosystem Frameworks Across the Following Five Architectural Categories:
          • i). Data Management
          • ii). Data Access
          • iii). Data Governance & Integration
          • iv). Security
          • v). Operations
      • e). Describe the Function and Purpose of the Hadoop Distributed File System (HDFS)
      • f). List the Major Architectural Components of HDFS and their Interactions
      • g). Describe Data Ingestion
      • h). Describe Batch/Bulk Ingestion Options
      • i). Describe the Streaming Framework Alternatives
      • j). Describe the Purpose and Function of MapReduce
      • k). Describe the Purpose and Components of YARN
      • l). Describe the Major Architectural Components of YARN and their Interactions
      • m). Define the Purpose and Function of Apache Pig
      • n). Work with the Grunt Shell
      • o). Work with Pig Latin Relation Names and Field Names
      • p). Describe the Pig Data Types and Schema
      • q). Labs and Demonstrations
          • i). Starting an HDP Cluster
          • ii). Using HDFS Commands
          • iii). Demonstration: Understanding Apache Pig
          • iv). Getting Started with Apache Pig
          • v). Exploring Data with Pig
  • 2. Advanced Apache Pig Programming
      • a). Demonstrate Common Operators Such as:
          • i). Order by
          • ii). Case
          • iii). Distinct
          • iv). Parallel
          • v). Foreach
      • b). Understand how Hive Tables are Defined and Implemented
      • c). Use Hive to Explore and Analyze Data Sets
      • d). Explain and Use the Various Hive File Formats
      • e). Create and Populate a Hive Table that Uses ORC File Formats
      • f). Use Hive to Run SQL-like Queries to Perform Data Analysis
      • g). Use Hive to Join Datasets Using a Variety of Techniques
      • h). Write Efficient Hive Queries
      • i). Explain the Uses and Purpose of HCatalog
      • j). Use HCatalog with Pig and Hive
      • k). Labs and Demonstrations
          • i). Splitting a Dataset
          • ii). Joining Datasets
          • iii). Preparing Data for Apache Hive
          • iv). Understanding Apache Hive Tables
          • v). Demonstration: Understanding Partitions and Skew
          • vi). Analyzing Big Data with Apache Hive
  • 3. Advanced Apache Hive Programming
      • a). Describe How to Perform a Multi-Table/File Insert
      • b). Define and Use Views
      • c). Define and Use Clauses and Windows
      • d). List the Hive File Formats Including:
          • i). Text Files
          • ii). SequenceFile
          • iii). RCFile
          • iv). ORC File
      • e). Define Hive Optimization
      • f). Use Apache Zeppelin to Work with Spark
      • g). Describe the Purpose and Benefits of Spark
      • h). Define Spark REPLs and Application Architecture
      • i). Explain the Purpose and Function of RDDs
      • j). Explain Spark Programming Basics
      • k). Define and Use Basic Spark Transformations
      • l). Define and Use Basic Spark Actions
      • m). Invoke Functions for Multiple RDDs, Create Named Functions and Use Numeric Operations
      • n). Labs
          • i). Advanced Apache Hive Programming
          • ii). Introduction to Apache Spark REPLs and Apache Zeppelin
          • iii). Creating and Manipulating RDDs
          • iv). Creating and Manipulating Pair RDDs
  • 4. Working with Pair RDDs and Building YARN Applications
      • a). Define and Create Pair RDDs
      • b). Perform Common Operations on Pair RDDs
      • c). Name the Various Components of Spark SQL and Explain their Purpose
      • d). Describe the Relationship Between DataFrames, Tables and Contexts
          • i). Use Various Methods to Create and Save DataFrames and Tables
      • e). Understand Caching, Persisting and the Different Storage Levels
      • f). Describe and Implement Checkpointing
      • g). Create an Application to Submit to the Cluster
      • h). Describe Client vs Cluster Submission with YARN
      • i). Submit an Application to the Cluster
      • j). List and Set Important Configuration Items
      • k). Labs
          • i). Creating and Saving DataFrames and Tables
          • ii). Working with DataFrames
          • iii). Building and Submitting Applications to YARN
4 Days | $ 2800