This course is designed for business professionals who want to learn how to analyze data to gain insight, use statistical analysis methods to explore the underlying distribution of data, use visualizations such as histograms, scatter plots, and maps to analyze data, and preprocess data to produce a dataset ready for training.
The typical student in this course will have several years of experience with computing technology, including some aptitude in computer programming.
What you will learn
In the previous course in this specialization, you performed extract, transform, and load (ETL) operations to ensure your data was ready for the next phase of the data science process: analysis. In some cases, analyzing the data may be the final goal of the project; in others, it is an important intermediate step on the road to machine learning. In either case, analyzing your data with various techniques will help you obtain useful insights into that data and what it represents. It'll also show you what further processing the data needs before it's ready for machine learning. You'll begin your analysis by exploring the nature of your dataset and the relationships it contains.
Explore the Underlying Distribution of Data
One of the key factors in data analysis is determining how values are spread out within each feature. Knowing each feature's center, range, and any skew or outliers gives you a deeper understanding of how the data is represented and how it might need to change.
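As a minimal sketch of what "how values are spread out" can mean in practice, the Python snippet below summarizes one numeric feature using only the standard library `statistics` module. The function name `describe_feature` and the `ages` values are hypothetical, chosen for illustration; they are not part of the course material.

```python
import statistics


def describe_feature(values):
    """Summarize how the values of one numeric feature are spread out."""
    # quantiles(n=4) returns the three cut points: q1, median, q3.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "median": statistics.median(values),
        "q3": q3,
        "max": max(values),
        "iqr": q3 - q1,  # spread of the middle 50% of values
    }


# Hypothetical feature values, for illustration only.
ages = [23, 25, 31, 35, 35, 40, 44, 52, 61, 78]
summary = describe_feature(ages)
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

Here the mean (42.4) sits above the median (37.5), a quick hint that the feature is skewed toward higher values. If you work with pandas, `DataFrame.describe()` reports a similar set of statistics; note that quartile conventions differ between tools, so q1 and q3 may not match exactly.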
Use Visualizations to Analyze Data
In this module, you'll look at your data visually to reveal insights that raw numbers alone may not provide.
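To show what a histogram actually computes, here is a minimal sketch that groups a feature into equal-width bins and draws the counts as a text bar chart, so it runs with the Python standard library alone. The helper names `histogram_counts` and `print_histogram` and the sample values are hypothetical. In practice you would more likely call a plotting library such as `matplotlib.pyplot.hist`.

```python
def histogram_counts(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; a histogram needs a range")
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The maximum value lands exactly on the upper edge, so clamp it
        # into the last bin (the last bin is closed on both ends).
        index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


def print_histogram(values, num_bins=5):
    """Render the bin counts as a text bar chart, one row per bin."""
    edges, counts = histogram_counts(values, num_bins)
    for left, right, count in zip(edges, edges[1:], counts):
        print(f"{left:6.1f} to {right:6.1f} | {'#' * count}")


# Hypothetical feature values, for illustration only.
ages = [23, 25, 31, 35, 35, 40, 44, 52, 61, 78]
edges, counts = histogram_counts(ages, 5)
print_histogram(ages, 5)
```

The bar chart makes the skew visible at a glance: most values cluster in the two lowest bins, with a thin tail toward 78. Treating every bin as half-open except the last matches the convention `numpy.histogram` documents.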
Preprocess Data
Your analysis efforts will most likely prompt you to transform your data further, especially in preparation for machine learning. In this module, you'll do just that.
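Two common transformations when preparing a dataset for training are rescaling numeric features and encoding categorical ones as numbers. The sketch below shows both using only the Python standard library. The helper names `min_max_scale` and `one_hot_encode` and the sample data are hypothetical, and these two techniques are illustrative examples rather than a complete preprocessing pipeline. Libraries such as scikit-learn provide `MinMaxScaler` and `OneHotEncoder` for the same purposes.

```python
def min_max_scale(values):
    """Rescale numeric values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot_encode(categories):
    """Turn a categorical feature into one 0/1 indicator column per category."""
    levels = sorted(set(categories))  # fixed, reproducible column order
    rows = [[1 if c == level else 0 for level in levels] for c in categories]
    return levels, rows


# Hypothetical features, for illustration only.
ages = [23, 35, 52, 78]
regions = ["north", "south", "north", "west"]

scaled = min_max_scale(ages)
levels, encoded = one_hot_encode(regions)
print(scaled)
print(levels)
print(encoded)
```

Scaling keeps features with large ranges from dominating distance-based models, and one-hot encoding avoids implying an order among categories that have none.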