Course Outline
Introduction
- Overview of Spark and Hadoop features and architecture
- Understanding big data
- Python programming basics
Getting Started
- Setting up Python, Spark, and Hadoop
- Understanding data structures in Python
- Understanding PySpark API
- Understanding HDFS and MapReduce
Integrating Spark and Hadoop with Python
- Implementing Spark RDD in Python
- Processing data using MapReduce
- Creating distributed datasets in HDFS
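As a rough illustration of the "Processing data using MapReduce" topic above, the sketch below imitates the map, shuffle, and reduce phases of a word count in plain Python. It is not Hadoop or PySpark code and is not taken from the course material; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for lines read from HDFS.
    sample = ["spark and hadoop", "hadoop and python"]
    print(word_count(sample))
```

In a real cluster the map and reduce steps would run on separate nodes and the shuffle would move data across the network; here everything runs in one process so that the data flow is easy to follow.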
Machine Learning with Spark MLlib
Processing Big Data with Spark Streaming
Working with Recommender Systems
Working with Kafka, Sqoop, and Flume
Apache Mahout with Spark and Hadoop
Troubleshooting
Summary and Next Steps
Requirements
- Experience with Spark and Hadoop
- Python programming experience
Audience
- Data scientists
- Developers
Testimonials (4)
Examples/exercises perfectly adapted to our domain
Luc - CS Group
Course - Scaling Data Analysis with Python and Dask
Having more practical exercises that use data similar to what we work with in our projects (satellite images in raster format)
Matthieu - CS Group
Course - Scaling Data Analysis with Python and Dask
A lot of practical examples, different ways to approach the same problem, and some not-so-obvious tricks for improving the current solution
Rafał - Nordea
Course - Apache Spark MLlib
The trainer was very available to answer every kind of question I asked