This training provides practice in data science, covering machine learning and natural language processing. It also covers programming languages and tools such as Mahout, Pig, NumPy, the Natural Language Toolkit (NLTK), pandas, Spark MLlib, and SciPy.
Prerequisites for this training
Basic knowledge of at least one programming language is recommended. Knowledge of statistics and mathematics, along with a basic understanding of big data and Hadoop principles, is required.
Who should attend this course?
This training is intended for software developers and architects. Its primary audience, however, is data scientists and analysts who want to apply machine learning and data science on Hadoop.
What you will learn
Identifying use cases for data science.
Describing the Hadoop and YARN architecture.
Describing the differences between supervised and unsupervised learning.
Using Mahout to run a machine learning algorithm on Hadoop.
Describing the data science lifecycle.
Using Pig to prepare and transform data on Hadoop.
Writing Python scripts.
Describing the options for running Python code.
Writing Pig user-defined functions (UDFs) in Python.
Using Pig streaming with Python scripts on Hadoop.
Using machine learning algorithms.
Describing use cases for NLP.
Using the NLTK.
Writing a Spark application in Python.
Running machine learning algorithms using Spark MLlib.
With Microtek Learning, you’ll receive:
Certified Instructor-led training
Industry Best Trainers
Official Training Course Student Handbook
Pre and Post assessments/evaluations
Collaboration with classmates (not available for a self-paced course)
Real-world knowledge activities and scenarios
Exam scheduling support*
Learn and earn program*
Knowledge acquisition and exam-oriented training
Interactive online course
Support from an approved expert
Government and private pricing*
For many years, Microtek Learning has been helping organizations, leaders, and professionals reach their maximum performance by addressing the challenges they face.
- 300+ enterprise clients
- 100,000+ professionals trained
- Serving 70 of the Fortune 100
- 96% of our clients would recommend us
Define Data Science and Explain What a Data Scientist Does
Differentiate Between Different Types of Data Roles
List a Number of Data Science Use Cases
Present an Overview of Python
Describe the Components of the Big Data Scientific Stack
Data Analysis with Python
Using HDFS Commands
Introduction to Spark REPLs and Zeppelin
Using Apache Mahout for Machine Learning
Explain What an RDD Is
Explain How RDDs are Partitioned
Create, Manipulate, and Restore RDDs
Use Spark SQL to Create Tables
Create an Application and Submit to the Cluster
Create and Manipulate RDDs
Create and Save DataFrames
Build and Submit Spark Applications
Describe Common Machine Learning Applications
List the Pros and Cons of Various Algorithms
Explain What Natural Language Processing Is
Explain the Feature Engineering Capabilities of Spark MLlib
Use the Python Natural Language Toolkit (NLTK)
Classify Text Using Naïve Bayes
Compute K-Nearest Neighbors
Creating a Spam Classifier with MLlib
Sentiment Analysis with Spark MLlib