2. Advanced Apache Pig Programming
a). Demonstrate Common Operators Such as:
i). Order by
ii). Case
iii). Distinct
iv). Parallel
v). Foreach
b). Understand How Hive Tables are Defined and Implemented
c). Use Hive to Explore and Analyze Data Sets
d). Explain and Use the Various Hive File Formats
e). Create and Populate a Hive Table that Uses ORC File Formats
f). Use Hive to Run SQL-like Queries to Perform Data Analysis
g). Use Hive to Join Datasets Using a Variety of Techniques
h). Write Efficient Hive Queries
i). Explain the Uses and Purpose of HCatalog
j). Use HCatalog with Pig and Hive
k). Labs and Demonstrations
i). Splitting a Dataset
ii). Joining Datasets
iii). Preparing Data for Apache Hive
iv). Understanding Apache Hive Tables
v). Demonstration: Understanding Partitions and Skew
vi). Analyzing Big Data with Apache Hive
3. Advanced Apache Hive Programming
a). Describe How to Perform a Multi-Table/File Insert
b). Define and Use Views
c). Define and Use Clauses and Windows
d). List the Hive File Formats Including:
i). Text Files
ii). SequenceFile
iii). RCFile
iv). ORC File
e). Define Hive Optimization
f). Use Apache Zeppelin to Work with Spark
g). Describe the Purpose and Benefits of Spark
h). Define Spark REPLs and Application Architecture
i). Explain the Purpose and Function of RDDs
j). Explain Spark Programming Basics
k). Define and Use Basic Spark Transformations
l). Define and Use Basic Spark Actions
m). Invoke Functions for Multiple RDDs, Create Named Functions and Use Numeric Operations
n). Labs
i). Advanced Apache Hive Programming
ii). Introduction to Apache Spark REPLs and Apache Zeppelin
iii). Creating and Manipulating RDDs
iv). Creating and Manipulating Pair RDDs
4. Working with Pair RDDs and Building YARN Applications
a). Define and Create Pair RDDs
b). Perform Common Operations on Pair RDDs
c). Name the Various Components of Spark SQL and Explain their Purpose
d). Describe the Relationship Between DataFrames, Tables and Contexts
i). Use Various Methods to Create and Save DataFrames and Tables
e). Understand Caching, Persisting and the Different Storage Levels
f). Describe and Implement Checkpointing
g). Create an Application to Submit to the Cluster
h). Describe Client vs Cluster Submission with YARN
i). Submit an Application to the Cluster
j). List and Set Important Configuration Items
k). Labs
i). Creating and Saving DataFrames and Tables
ii). Working with DataFrames
iii). Building and Submitting Applications to YARN