18756 Stone Oak Park Way, Suite 200, San Antonio, TX 78258, USA
100 Queen St W, Brampton, ON L6X 1A4, Canada

HDP Analyst Data Science Training

What is HDP Analyst Data Science training all about?

The training provides hands-on practice in data science, covering machine learning and natural language processing. The tools and languages covered include Mahout, Pig, NumPy, the Natural Language Toolkit (NLTK), pandas, Spark MLlib and SciPy.


Contact us to customize this class with your preferred dates, times and location.
You can call us at 1-800-961-0337 or chat with our representative.

What are the course objectives for HDP Analyst Data Science training?
  • Identify use cases for data science.
  • Describe the architecture of Hadoop and YARN.
  • Describe the differences between supervised and unsupervised learning.
  • Use Mahout to run a machine learning algorithm on Hadoop.
  • Describe the data science life cycle.
  • Use Pig to transform and prepare data on Hadoop.
  • Write a Python script.
  • Describe the options for running Python code.
  • Write Pig user-defined functions (UDFs) in Python.
  • Use Pig streaming with a Python script on Hadoop.
  • Use machine learning algorithms.
  • Describe use cases for natural language processing (NLP).
  • Use the Natural Language Toolkit (NLTK).
  • Write a Spark application in Python.
  • Run machine learning algorithms using Spark MLlib.
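
To give a feel for the Pig streaming objective above, here is a minimal sketch of the kind of Python script it refers to. It assumes Pig's default streaming format, in which each record arrives as one tab-delimited line and is written back the same way. The `transform` and `stream` names, the sample records and the upper-casing rule are illustrative assumptions, not course material.

```python
import sys


def transform(line):
    """Upper-case the first field of one tab-delimited record."""
    fields = line.rstrip("\n").split("\t")
    fields[0] = fields[0].upper()
    return "\t".join(fields)


def stream(lines, out):
    """Apply transform to every input record and write one line per record."""
    for line in lines:
        out.write(transform(line) + "\n")


if __name__ == "__main__":
    # Demo with in-memory records (hypothetical data).
    # Under Pig streaming, sys.stdin would be passed instead of this list.
    stream(["alice\t30\n", "bob\t25\n"], sys.stdout)
```

In Pig Latin, a script of this shape is typically invoked through the `STREAM ... THROUGH` operator, with Pig feeding each tuple to the script's standard input.
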
Who should attend HDP Analyst Data Science training?

The primary audience for this training is data scientists and analysts who want to apply data science and machine learning on Hadoop. It is also suitable for software developers and architects.

What are the prerequisites for HDP Analyst Data Science training?

You must have a background in statistics or mathematics and a basic understanding of big data and Hadoop principles. Basic knowledge of at least one programming language is also recommended.

What is the course outline for HDP Analyst Data Science training?
1. An Introduction to Data Science, Python, Hadoop and Machine Learning
  • Define Data Science and Explain What a Data Scientist Does
  • Differentiate Between Different Types of Data Roles
  • List a Number of Data Science Use Cases
  • Present an Overview of Python
  • Describe the Components of the Big Data Scientific Stack
Labs
  • Using IPython
  • Data Analysis with Python
  • Using HDFS Commands
  • Introduction to Spark REPLs and Zeppelin
  • Using Apache Mahout for Machine Learning

2. Working with Spark RDDs, DataFrames and SparkSQL, Visualization in Zeppelin
  • Explain What an RDD Is
  • Explain How RDDs Are Partitioned
  • Create, Manipulate and Restore RDDs
  • Use Spark SQL to Create Tables
  • Create an Application and Submit to the Cluster
Labs
  • Create and Manipulate RDDs
  • Create and Save DataFrames
  • Build and Submit Spark Applications

3. Machine Learning Algorithms, Natural Language Processing, and Spark MLlib
  • Describe Common Machine Learning Applications
  • List the Pros and Cons of Various Algorithms
  • Explain What Natural Language Processing Is
  • Explain the Feature Engineering Capabilities of Spark MLlib
Labs
  • Use the Python Natural Language Toolkit (NLTK)
  • Classify Text Using Naïve Bayes
  • Compute K-Nearest Neighbors
  • Creating a Spam Classifier with MLlib
  • Sentiment Analysis with Spark MLlib

4. 
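
The Naïve Bayes text-classification and spam-classifier labs listed under module 3 can be illustrated with a small sketch. The code below is not the course's lab code and does not use NLTK or Spark MLlib. It is a plain-Python multinomial Naïve Bayes with add-one (Laplace) smoothing, and the training messages, labels and function names are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in label_counts.items():
        score = math.log(count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training are ignored
            # Add-one smoothing avoids zero probabilities.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data, for illustration only.
TRAINING = [
    ("win cash prize now", "spam"),
    ("cheap prize click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
model = train(TRAINING)

if __name__ == "__main__":
    print(classify(model, "claim your cash prize"))
    print(classify(model, "team meeting on monday"))
```

Working in log space keeps the product of many small probabilities from underflowing, which is why the scores are summed rather than multiplied.
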
3 Days | $2295
  234 Ratings

1246 Learners
