HDP Developer: Apache Spark 2.3 Training

Category: Hortonworks
Rating: 4 (4)
Price: $2800

Course Description

This training covers the Apache Spark distributed computing engine and is appropriate for developers, technical managers, architects, data analysts, and anyone else who wants to use Spark. It explains Spark's architecture and functionality, covering the basic building blocks along with the high-level constructs that provide a simpler, more capable interface. You will also gain in-depth knowledge of Spark SQL, DataFrames, and DataSets.

Prerequisites for this training

Familiarity with programming principles and solid experience developing software in Scala are recommended. Previous experience with SQL, HDP, and data streaming is also beneficial.

Who should attend this course?

This training is intended for software developers who want to build in-memory and highly iterative applications within an HDP environment.

What you will learn

  • Acquiring and installing Spark
  • Identifying supported data formats
  • Using accumulators and broadcast variables
  • Creating and configuring a SparkSession

With Microtek Learning, you’ll receive:

  • Certified Instructor-led training
  • Industry Best Trainers
  • Official Training Course Student Handbook
  • Pre and Post assessments/evaluations
  • Collaboration with classmates (not available for a self-paced course)
  • Real-world knowledge activities and scenarios
  • Exam scheduling support*
  • Learn and earn program*
  • Practice Tests
  • Knowledge acquisition and exam-oriented training
  • Interactive online course
  • Support from an approved expert
  • Government and private pricing*

Our Clients

For many years, Microtek Learning has been helping organizations, leaders, and professionals to reach their maximum performance by addressing the challenges they are facing.

  • 300+ enterprise clients
  • 100,000+ professionals trained
  • Serving 70 of the Fortune 100
  • 96% of our clients would recommend us

Curriculum

  • Scala Introduction
  • Working with: Variables, Data Types, and Control Flow
  • The Scala Interpreter
  • Collections and their Standard Methods (e.g. map())
  • Working with: Functions, Methods, and Function Literals
  • Define the Following as they Relate to Scala: Class, Object, and Case Class
  • Overview, Motivations, Spark Systems
  • Spark Ecosystem
  • Spark vs. Hadoop
  • Acquiring and Installing Spark
  • The Spark Shell, SparkContext
  • LABS

  • Setting Up the Lab Environment
  • Starting the Scala Interpreter
  • A First Look at Spark
  • A First Look at the Spark Shell
  • RDD Concepts, Lifecycle, Lazy Evaluation
  • RDD Partitioning and Transformations
  • Working with RDDs Including: Creating and Transforming
  • An Overview of RDDs
  • SparkSession, Loading/Saving Data, Data Formats
  • Introducing DataFrames and DataSets
  • Identify Supported Data Formats
  • Working with the DataFrame (untyped) Query DSL
  • SQL-based Queries
  • Working with the DataSet (typed) API
  • Mapping and Splitting
  • DataSets vs. DataFrames vs. RDDs
  • LABS

  • RDD Basics
  • Operations on Multiple RDDs
  • Data Formats
  • Spark SQL Basics
  • DataFrame Transformations
  • The DataSet Typed API
  • Splitting Up Data
  • Working with: Grouping, Reducing, Joining
  • Shuffling, Narrow vs. Wide Dependencies, and Performance Implications
  • Exploring the Catalyst Query Optimizer
  • The Tungsten Optimizer
  • Discuss Caching, Including: Concepts, Storage Type, Guidelines
  • Minimizing Shuffling for Increased Performance
  • Using Broadcast Variables and Accumulators
  • General Performance Guidelines
  • LABS

  • Exploring Group Shuffling
  • Seeing Catalyst at Work
  • Seeing Tungsten at Work
  • Working with Caching, Joins, Shuffles, Broadcasts, Accumulators
  • Broadcast General Guidelines
  • Core API, SparkSession.Builder
  • Configuring and Creating a SparkSession
  • Building and Running Applications
  • Application Lifecycle (Driver, Executors, and Tasks)
  • Cluster Managers (Standalone, YARN, Mesos)
  • Logging and Debugging
  • Introduction and Streaming Basics
  • Spark Streaming (Spark 1.0+)
  • Structured Streaming (Spark 2+)
  • Consuming Kafka Data
  • LABS

  • Spark Job Submission
  • Additional Spark Capabilities
  • Spark Streaming
  • Spark Structured Streaming
  • Spark Structured Streaming with Kafka
Reviews on Our Popular Courses

    I was sceptical at first whether to enrol with Microtek Learning or not, however, I am glad that I did- I got everything that was promised (maybe more). The trainer was very patient and knowledgeable and with his effort and mine, I was able to clear the exam with ease! Keep up the good work everyone.

    MARTIN

    TORONTO, CANADA

    • (5)

    I'm really impressed with the storytelling skills of the instructor. She makes the session exciting by keeping things simple and easy to understand.

    Prince N.

    Texas

    • (5)

    I was recommended the ITIL 4 Foundation course by an IT professional who had completed the same course at Microtek Learning. The training gave me a thorough understanding of service management that I felt I could take back to my job as an IT Project Management and apply it to improve the value of products and services.

    Marsh George

    Texas

    • (5)
    Course Details

    • Duration: 4 Days
    • Enrolled: 1423
    • Price: $2800