HDP Analyst Data Science Training





Course Description

This training provides hands-on practice in data science, covering machine learning and natural language processing. The tools and programming languages covered include Mahout, Pig, NumPy, the Natural Language Toolkit (NLTK), pandas, Spark MLlib, and SciPy.

Prerequisites for this training

Basic knowledge of at least one programming language is recommended. Knowledge of statistics and mathematics, along with a basic understanding of big data and Hadoop principles, is required.

Who should attend this course?

The primary audience for this training is data scientists and analysts who want to apply machine learning and data science on Hadoop. It is also suitable for software developers and architects.


There are currently no public schedules available for this course.

To customize the dates, times, or location of this course, get in touch with us.

You can also speak with a learning consultant by calling 800-961-0337.

What you will learn

  • Identifying use cases for data science.
  • Describing Hadoop and the YARN architecture.
  • Describing the differences between supervised and unsupervised learning.
  • Using Mahout to run a machine learning algorithm on Hadoop.
  • Describing the data science lifecycle.
  • Using Pig to prepare and transform data on Hadoop.
  • Writing Python scripts.
  • Describing the options for running Python code.
  • Writing Pig user-defined functions (UDFs) in Python.
  • Using Pig streaming with Python scripts on Hadoop.
  • Using machine learning algorithms.
  • Describing use cases for NLP.
  • Using the NLTK.
  • Writing a Spark application in Python.
  • Running machine learning algorithms with Spark MLlib.
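
As a taste of one topic from the list above, the difference between supervised and unsupervised learning can be sketched in a few lines of plain Python. This is only an illustrative sketch and not course material: the height data, the labels, and the function names are made up for the example, and a real project would more likely use a library such as Spark MLlib.

```python
# Supervised learning: labels are provided, so we learn a rule that predicts them.
# Unsupervised learning: no labels are provided, so we look for structure instead.

def nearest_centroid_fit(points, labels):
    """Supervised step: compute the mean value of each labeled class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def nearest_centroid_predict(centroids, x):
    """Predict the class whose mean is closest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def two_means(points, iterations=10):
    """Unsupervised step: 1-D k-means with k=2, seeded at the min and max values."""
    c0, c1 = min(points), max(points)
    for _ in range(iterations):
        group0 = [x for x in points if abs(x - c0) <= abs(x - c1)]
        group1 = [x for x in points if abs(x - c0) > abs(x - c1)]
        if group0:
            c0 = sum(group0) / len(group0)
        if group1:
            c1 = sum(group1) / len(group1)
    return c0, c1


# Hypothetical data, invented for this example.
heights_cm = [150.0, 152.0, 155.0, 180.0, 183.0, 185.0]
labels = ["short", "short", "short", "tall", "tall", "tall"]

# Supervised: uses the labels to classify a new value.
centroids = nearest_centroid_fit(heights_cm, labels)
prediction = nearest_centroid_predict(centroids, 178.0)

# Unsupervised: finds two cluster centers without seeing any labels.
clusters = two_means(heights_cm)

print(prediction)
print(clusters)
```

The supervised half needs the `labels` list to work at all, while the unsupervised half only ever sees `heights_cm`.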


  • Define Data Science and Explain What a Data Scientist Does
  • Differentiate Between Different Types of Data Roles
  • List a Number of Data Science Use Cases
  • Present an Overview of Python
  • Describe the Components of the Big Data Scientific Stack

Labs

  • Using IPython
  • Data Analysis with Python
  • Using HDFS Commands
  • Introduction to Spark REPLs and Zeppelin
  • Using Apache Mahout for Machine Learning

  • Explain What an RDD Is
  • Explain How RDDs Are Partitioned
  • Create, Manipulate, and Restore RDDs
  • Use Spark SQL to Create Tables
  • Create an Application and Submit to the Cluster

Labs

  • Create and Manipulate RDDs
  • Create and Save DataFrames
  • Build and Submit Spark Applications

  • Describe Common Machine Learning Applications
  • List the Pros and Cons of Various Algorithms
  • Explain What Natural Language Processing Is
  • Explain the Feature Engineering Capabilities of Spark MLlib

Labs

  • Use the Python Natural Language Toolkit (NLTK)
  • Classify Text Using Naïve Bayes
  • Compute K-Nearest Neighbors
  • Create a Spam Classifier with MLlib
  • Sentiment Analysis with Spark MLlib
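
To give a sense of what the Naïve Bayes text-classification lab involves, here is a minimal sketch of a multinomial Naïve Bayes spam classifier in plain Python. It is not the lab's actual code: the training messages, the `train` and `classify` helper names, and the use of add-one (Laplace) smoothing are all assumptions made for illustration. In the course, the same idea would be applied with NLTK or Spark MLlib.

```python
import math
from collections import Counter


def train(documents):
    """Count words per label from a list of (text, label) pairs."""
    word_counts = {}          # label -> Counter of word frequencies
    label_counts = Counter()  # label -> number of training documents
    vocabulary = set()
    for text, label in documents:
        words = text.lower().split()
        word_counts.setdefault(label, Counter()).update(words)
        label_counts[label] += 1
        vocabulary.update(words)
    return word_counts, label_counts, vocabulary


def classify(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Start from the log prior: how common this label is overall.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one smoothing keeps unseen word/label pairs from zeroing the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training messages, invented for this example.
training_data = [
    ("win cash prize now", "spam"),
    ("cheap prize offer now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report due monday", "ham"),
]

model = train(training_data)
print(classify(model, "claim your cash prize"))
print(classify(model, "monday project meeting"))
```

Working in log-probabilities, as the sketch does, avoids numeric underflow when many small word probabilities are multiplied together.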
With Microtek Learning, you’ll receive:

    • Certified Instructor-led training
    • Industry Best Trainers
    • Official Training Course Student Handbook
    • Pre and Post assessments/evaluations
    • Collaboration with classmates (not available for a self-paced course)
    • Real-world knowledge activities and scenarios
    • Exam scheduling support*
    • Learn and earn program*
    • Practice Tests
    • Training focused on both knowledge acquisition and exam preparation
    • Interactive online course
    • Support from an approved expert
    • Government and private pricing*

    * For more details call: +1-800-961-0337 or Email: info@microteklearning.com


    Our Clients

    For many years, Microtek Learning has been helping organizations, leaders, and professionals to reach their maximum performance by addressing the challenges they are facing.

    • 300+ enterprise clients
    • 100,000+ professionals trained
    • Serving 70 of the Fortune 100
    • 96% of our clients would recommend us




    I was sceptical at first whether to enrol with Microtek Learning or not, however, I am glad that I did- I got everything that was promised (maybe more). The trainer was very patient and knowledgeable and with his effort and mine, I was able to clear the exam with ease! Keep up the good work everyone.



    • (5)

    I'm really impressed with the storytelling skills of the instructor. She makes the session exciting by keeping things simple and easy to understand.

    Prince N.


    • (5)

    I was recommended the ITIL 4 Foundation course by an IT professional who had completed the same course at Microtek Learning. The training gave me a thorough understanding of service management that I felt I could take back to my job as an IT Project Management and apply it to improve the value of products and services.

    Marsh George


    • (5)

    Course Details

    • Duration: 3 Days
    • Enrolled: 1246
    • Price: $2295
