Practical Data Science on the AWS Cloud Specialization

Description

Development environments might not have the same requirements as production environments. Moving data science and machine learning projects from idea to production requires state-of-the-art skills. You need to architect and implement your projects for scale and operational efficiency. Data science is an interdisciplinary field that combines domain knowledge with mathematics, statistics, data visualization, and programming skills.
The Practical Data Science Specialization brings together these disciplines using purpose-built ML tools in the AWS cloud. It helps you develop the practical skills to effectively deploy your data science projects and overcome challenges at each step of the ML workflow using Amazon SageMaker.
This Specialization is designed for data-focused developers, scientists, and analysts familiar with the Python and SQL programming languages who want to learn how to build, train, and deploy scalable, end-to-end ML pipelines – both automated and human-in-the-loop – in the AWS cloud.
Each of the 10 weeks features a comprehensive lab developed specifically for this Specialization that provides hands-on experience with state-of-the-art algorithms for natural language processing (NLP) and natural language understanding (NLU), including BERT and FastText using Amazon SageMaker.

By the end of this Specialization, you will be ready to:
• Ingest, register, and explore datasets
• Detect statistical bias in a dataset
• Automatically train and select models with AutoML
• Create machine learning features from raw data
• Save and manage features in a feature store
• Train and evaluate models using built-in algorithms and custom BERT models
• Debug, profile, and compare models to improve performance
• Build and run a complete ML pipeline end-to-end
• Optimize model performance using hyperparameter tuning
• Deploy and monitor models
• Perform data labeling at scale
• Build a human-in-the-loop pipeline to improve model performance
• Reduce cost and improve performance of data products

What’s included