Learn how to apply statistical learning techniques to real-time event-driven data in Python by integrating distributed machine learning models with scalable, high-throughput and fault-tolerant streaming platforms.
This course provides a hands-on exploration of Apache Kafka, the industry-standard distributed streaming platform, and shows how to integrate it with distributed machine learning models via Apache Spark and its Structured Streaming engine to build high-throughput, low-latency real-time machine learning systems. It follows on from our Applied Machine Learning and Distributed Machine Learning courses, and enables experienced senior data scientists and data engineers to learn from event-driven data and make predictions in real time. The course also covers real-time architectural patterns and shows how to build continuous feedback loops that automate the training of machine learning models based on the actions of system users and customers.
- 1. Introduction to Apache Kafka
- 2. Apache Kafka and Python
- 3. Apache Spark Structured Streaming
- 4. Real-Time Regression
- 5. Real-Time Classification
- 6. Real-Time Clustering
- 7. Real-Time Collaborative Filtering
- 8. Real-Time Feedback and Training
- 9. Real-Time Architectural Patterns
- Introduction to Python or equivalent.
- Applied Machine Learning or equivalent.
- Distributed Machine Learning or equivalent.
- The ability to apply statistical learning techniques to event-driven data in real time.
- The ability to integrate distributed machine learning models with distributed streaming platforms in order to learn from data and make predictions in real time.
- The ability to build feedback loops to enable automated updates to machine learning models.
- Knowledge of real-time architectural patterns and best practices.
- Knowledge of the industry-standard Apache Spark Structured Streaming engine and Apache Kafka distributed streaming platform.
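
To give a flavour of the feedback-loop idea mentioned above, here is a minimal sketch in plain Python. It is not course material and does not use Kafka or Spark: the `event_stream` generator is a hypothetical stand-in for a stream of events, and `OnlineLinearRegression` and `run_feedback_loop` are illustrative names chosen for this example. The model predicts on each incoming event first, then updates itself once the true outcome is known, using a single stochastic gradient descent step per event.

```python
import random


def event_stream(n, seed=0):
    """Simulate n (feature, label) events; a hypothetical stand-in for a stream."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        # Assumed ground truth for the simulation: y = 3x + 0.5 plus small noise.
        y = 3.0 * x + 0.5 + rng.gauss(0.0, 0.01)
        yield x, y


class OnlineLinearRegression:
    """One-feature linear model trained one event at a time with SGD."""

    def __init__(self, learning_rate=0.1):
        self.w = 0.0
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        # Gradient step on the squared error for a single observation.
        error = self.predict(x) - y
        self.w -= self.lr * error * x
        self.b -= self.lr * error


def run_feedback_loop(model, events):
    """Predict on each event, then feed the observed outcome back into the model.

    Returns the mean absolute error of the predictions made before each update.
    """
    total_abs_error = 0.0
    count = 0
    for x, y in events:
        prediction = model.predict(x)   # serve a prediction in "real time"
        total_abs_error += abs(prediction - y)
        count += 1
        model.update(x, y)              # feedback: learn from the actual outcome
    return total_abs_error / count if count else 0.0


if __name__ == "__main__":
    model = OnlineLinearRegression(learning_rate=0.1)
    mean_error = run_feedback_loop(model, event_stream(5000))
    print(f"w={model.w:.3f} b={model.b:.3f} mean_abs_error={mean_error:.4f}")
```

In a production system the generator would typically be replaced by a consumer reading from a streaming platform, and the per-event update by a distributed training job; the predict-then-update loop is the part this sketch is meant to illustrate.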