18756 Stone Oak Park Way, Suite 200, San Antonio TX 78258 USA
100 Queen St W, Brampton, ON L6X 1A4, Canada

HDP Developer: Apache Spark 2.3 Training

What is the HDP Developer: Apache Spark 2.3 training all about?

This training introduces the Apache Spark distributed computing engine and is suitable for developers, technical managers, architects, data analysts, and anyone else who wants to use Spark. The course covers Spark's architecture and functionality, its basic building blocks, and the high-level constructs that provide a more capable and simpler interface. It also gives you in-depth knowledge of DataSets, Spark SQL, and DataFrames.


Contact us to customize this class with your preferred dates, times and location.
You can call us at 1-800-961-0337 or chat with our representative.

What are the course objectives for HDP Developer: Apache Spark 2.3 training?
  • Acquiring and Installing Spark
  • Identifying Supported Data Formats
  • Utilizing Accumulators and Broadcast Variables
  • Creating and Configuring a SparkSession

Who should attend HDP Developer: Apache Spark 2.3 training?

This training is intended for software developers who want to build in-memory and highly iterative applications within an HDP environment.

What are the prerequisites for HDP Developer: Apache Spark 2.3 training?

Familiarity with programming principles and solid experience developing software in Scala are recommended. Previous experience with SQL, HDP, and data streaming is also beneficial.

What is the course outline for HDP Developer: Apache Spark 2.3 training?

1. Scala Ramp Up, Introduction to Spark
  • Scala Introduction
  • Working with: Variables, Data Types, and Control Flow
  • The Scala Interpreter
  • Collections and their Standard Methods (e.g. map())
  • Working with: Functions, Methods, and Function Literals
  • Define the Following as They Relate to Scala: Class, Object, and Case Class
  • Overview, Motivations, Spark Systems
  • Spark Ecosystem
  • Spark vs. Hadoop
  • Acquiring and Installing Spark
  • The Spark Shell, SparkContext
LABS
  • Setting Up the Lab Environment
  • Starting the Scala Interpreter
  • A First Look at Spark
  • A First Look at the Spark Shell

2. RDDs and Spark Architecture, Spark SQL, DataFrames and DataSets
  • RDD Concepts, Lifecycle, Lazy Evaluation
  • RDD Partitioning and Transformations
  • Working with RDDs Including: Creating and Transforming
  • An Overview of RDDs
  • SparkSession, Loading/Saving Data, Data Formats
  • Introducing DataFrames and DataSets
  • Identify Supported Data Formats
  • Working with the DataFrame (untyped) Query DSL
  • SQL-based Queries
  • Working with the DataSet (typed) API
  • Mapping and Splitting
  • DataSets vs. DataFrames vs. RDDs
LABS
  • RDD Basics
  • Operations on Multiple RDDs
  • Data Formats
  • Spark SQL Basics
  • DataFrame Transformations
  • The DataSet Typed API
  • Splitting Up Data

3. Shuffling, Transformations and Performance, Performance Tuning
  • Working with: Grouping, Reducing, Joining
  • Shuffling, Narrow vs. Wide Dependencies, and Performance Implications
  • Exploring the Catalyst Query Optimizer
  • The Tungsten Optimizer
  • Discuss Caching, Including: Concepts, Storage Type, Guidelines
  • Minimizing Shuffling for Increased Performance
  • Using Broadcast Variables and Accumulators
  • General Performance Guidelines
LABS
  • Exploring Group Shuffling
  • Seeing Catalyst at Work
  • Seeing Tungsten at Work
  • Working with Caching, Joins, Shuffles, Broadcasts, Accumulators
  • Broadcast General Guidelines

4. Creating Standalone Applications and Spark Streaming
  • Core API, SparkSession.Builder
  • Configuring and Creating a SparkSession
  • Building and Running Applications
  • Application Lifecycle (Driver, Executors, and Tasks)
  • Cluster Managers (Standalone, YARN, Mesos)
  • Logging and Debugging
  • Introduction and Streaming Basics
  • Spark Streaming (Spark 1.0+)
  • Structured Streaming (Spark 2+)
  • Consuming Kafka Data
LABS
  • Spark Job Submission
  • Additional Spark Capabilities
  • Spark Streaming
  • Spark Structured Streaming
  • Spark Structured Streaming with Kafka

4 Days | $2800