The Natural Language Toolkit (NLTK) and Spark MLlib are also included, along with many tools and programming languages (Python, Mahout, IPython, SciPy, Pig, pandas, NumPy, and scikit-learn).

Mode of Training

🏫 Classroom 💻 Live Online 🧪 Blended 👨‍👩‍👧‍👦 Private Group

What you will learn

Describe the Hadoop and YARN architecture
Describe the differences between supervised and unsupervised learning
Use Mahout to run a machine learning algorithm on Hadoop
Describe the data science life cycle
Use Pig to transform and prepare data on Hadoop
Write a Python script
Describe options for running Python code on a Hadoop cluster
Write a Pig User-Defined Function in Python
Use Pig streaming on Hadoop with a Python script
Use machine learning algorithms
Describe use cases for Natural Language Processing (NLP)
Use the Natural Language Toolkit (NLTK)
Describe the components of a Spark application
Write a Spark application in Python
Run machine learning algorithms using Spark MLlib
Take data science into production

Who Should Attend This Course?

Architects, analysts, software developers, and data scientists who need to apply machine learning and data science on Hadoop.

Prerequisites

Students must be familiar with at least one programming or scripting language, statistics, mathematics, and the fundamentals of Hadoop. Attending the HDP Overview.

📘 HDP Analyst Data Science Outline

a. Setting Up a Development Environment

Demo: Block Storage

b. Using HDFS Commands

Demo: MapReduce

c. Using Apache Mahout for Machine Learning

Demo: Apache Pig

d. Getting Started with Apache Pig

e. Exploring Data with Pig

f. Using the IPython Notebook

Demo: The NumPy Package
Demo: The pandas Library

g. Data Analysis with Python

h. Interpolating Data Points

i. Defining a Pig UDF in Python

j. Streaming Python with Pig

Demo: Classification with Scikit-Learn

k. Computing K-Nearest Neighbor

l. Generating a K-Means Clustering

m. POS Tagging Using a Decision Tree

n. Using NLTK for Natural Language Processing

o. Classifying Text using Naive Bayes

p. Using Spark Transformations and Actions

q. Using Spark MLlib

r. Creating a Spam Classifier with MLlib

What You Get with Microtek Learning

Instructor-Led Excellence

✓ Certified Instructor-led Training
✓ Top Industry Trainers
✓ Official Student Handbooks

Measurable Learning Outcomes

✓ Pre- & Post-Training Assessments
✓ Practice Tests
✓ Exam-Oriented Curriculum

Real-World Skill Building

✓ Hands-on Activities & Scenarios
✓ Interactive Online Courses
✓ Peer Collaboration (not available in self-paced courses)

Full Support & Perks

✓ Exam Scheduling Support*
✓ Learn & Earn Program*
✓ Support from Certified Experts
✓ Gov. & Private Pricing*

Our Clients

For over 10 years, Microtek Learning has helped organizations, leaders, students, and professionals reach their full potential by addressing their challenges and improving their performance.