This is the fifth course in the IBM AI Enterprise Workflow Certification specialization. You are STRONGLY encouraged to complete these courses in order: they are not independent courses, but parts of a single workflow in which each course builds on the previous ones.
This course introduces you to an area that few data scientists get to experience: deploying models for use in large enterprises. Apache Spark is a widely used framework for running machine learning models, and this course covers best practices for using Spark as well as for data manipulation, model training, and model tuning. The use case calls for the creation and deployment of a recommender system. The course wraps up with an introduction to model deployment technologies.
By the end of this course you will be able to:
1. Use Apache Spark’s RDDs, DataFrames, and pipelines
2. Employ spark-submit scripts to interface with Spark environments
3. Explain how collaborative filtering and content-based filtering work
4. Build a data ingestion pipeline using Apache Spark and Apache Spark Streaming
5. Analyze hyperparameters in machine learning models on Apache Spark
6. Deploy machine learning algorithms using the Apache Spark machine learning interface
7. Deploy a machine learning model from Watson Studio to Watson Machine Learning
Who should take this course?
This course targets existing data science practitioners who have expertise in building machine learning models and who want to deepen their skills in building and deploying AI in large enterprises. If you are an aspiring data scientist, this course is NOT for you, as you need real-world expertise to benefit from the content of these courses.
What skills should you have?
It is assumed that you have completed Courses 1 through 4 of the IBM AI Enterprise Workflow specialization and that, before starting this course, you have: a fundamental understanding of linear algebra; an understanding of sampling, probability theory, and probability distributions; knowledge of descriptive and inferential statistical concepts; a general understanding of machine learning techniques and best practices; a practiced understanding of Python and the packages commonly used in data science (NumPy, pandas, Matplotlib, scikit-learn); familiarity with IBM Watson Studio; and familiarity with the design thinking process.