IBM Data Analyst Capstone Project


In this final capstone project, you will apply the Data Analytics skills and techniques you learned in the previous courses of the IBM Data Analyst Professional Certificate. You will assume the role of an Associate Data Analyst who has recently joined the organization, and you will be presented with a business challenge that requires analyzing real-world datasets.

You will perform the various tasks that professional data analysts do as part of their jobs, including:
– Data collection from multiple sources
– Data wrangling and data preparation
– Exploratory data analysis
– Statistical analysis and data mining
– Data visualization with different charts and plots, and
– Interactive dashboard creation.
The project will culminate in a presentation of your data analysis report to various stakeholders in the organization. The report will include an executive summary, your analysis, and a conclusion. You will be assessed on both your work in each stage of the Data Analysis process and the final deliverable.
As part of this project, you will demonstrate your proficiency with Jupyter Notebooks, SQL, Relational Databases (RDBMS), Business Intelligence (BI) tools such as Cognos, and Python libraries such as Pandas, NumPy, Scikit-learn, SciPy, Matplotlib, Seaborn, and others.
This project is a great addition to your portfolio and an opportunity to showcase your Data Analytics skills to prospective employers.

What you will learn

Data Collection

Data collection is the first step in solving any analysis problem, and data can be collected in many formats and from many sources. In the first module of the Capstone, you will collect data by scraping the web and using web APIs.
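
As a rough illustration of the two collection routes named above, the sketch below parses a small HTML table (the scraping route) and a JSON payload (the web API route) using only the Python standard library. The HTML snippet, the JSON body, and the names TableCellParser, scrape_rows, and parse_api_response are all made up for this example; a real script would first download the page or call the API, for instance with urllib or the requests library.

```python
import json
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            # Header rows contain only <th> cells, so they stay empty and are skipped.
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


# Hypothetical page content; a real script would download it first.
SAMPLE_HTML = """
<table>
  <tr><th>Language</th><th>Respondents</th></tr>
  <tr><td>Python</td><td>10</td></tr>
  <tr><td>SQL</td><td>7</td></tr>
</table>
"""

# Hypothetical API response body; a real one would come from an HTTP request.
SAMPLE_JSON = (
    '[{"title": "Data Analyst", "location": "Austin"},'
    ' {"title": "Data Engineer", "location": "Boston"}]'
)


def scrape_rows(html):
    """Return the table's data rows as lists of cell strings."""
    parser = TableCellParser()
    parser.feed(html)
    return parser.rows


def parse_api_response(body):
    """Decode a JSON response body into Python lists and dicts."""
    return json.loads(body)


scraped = scrape_rows(SAMPLE_HTML)
jobs = parse_api_response(SAMPLE_JSON)
```

Scraped values arrive as strings, so numeric columns still need converting before analysis, whereas the JSON route already yields typed values.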

Data Wrangling

In this module, you will focus on cleaning your dataset. You will apply techniques for identifying duplicate rows, finding missing values, and normalizing the data.
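
The three cleaning steps can be sketched with Pandas as follows. The data frame and its column names are invented for illustration, filling the gap with the column median is only one possible choice, and "normalizing" is read here as min-max scaling to the range 0 to 1, which is an assumption; the course may mean a different kind of normalization.

```python
import pandas as pd

# Made-up survey rows for illustration only.
raw = pd.DataFrame(
    {
        "respondent": [1, 2, 2, 3, 4],
        "country": ["India", "Brazil", "Brazil", "Kenya", "Japan"],
        "hours_per_week": [40.0, 50.0, 50.0, None, 30.0],
    }
)


def clean(df):
    """Return the cleaned frame plus counts of duplicates and missing values."""
    # 1. Identify exact duplicate rows, then drop them.
    n_duplicates = int(df.duplicated().sum())
    out = df.drop_duplicates().reset_index(drop=True)

    # 2. Find missing values, then fill the numeric gap with the column median.
    n_missing = int(out["hours_per_week"].isna().sum())
    median_hours = out["hours_per_week"].median()
    out["hours_per_week"] = out["hours_per_week"].fillna(median_hours)

    # 3. Min-max normalize the column to the range [0, 1].
    #    This assumes the column is not constant (max > min).
    lo, hi = out["hours_per_week"].min(), out["hours_per_week"].max()
    out["hours_norm"] = (out["hours_per_week"] - lo) / (hi - lo)
    return out, n_duplicates, n_missing


cleaned, n_duplicates, n_missing = clean(raw)
```

Counting duplicates and missing values before changing anything keeps a record of how much of the dataset each cleaning step touched.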

Exploratory Data Analysis

In this module, you will begin working with the cleaned dataset from the previous module. You will analyze the dataset to find the distribution of the data, the presence of outliers, and the correlation between different columns.
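
A minimal Pandas sketch of these three checks is shown below. The values and column names are made up, and the 1.5 x IQR rule used to flag outliers is one common convention rather than a method the course prescribes.

```python
import pandas as pd

# Made-up values for illustration; the last salary is deliberately extreme.
df = pd.DataFrame(
    {
        "age": [22, 25, 27, 30, 31, 35, 38, 41, 45, 29],
        "salary": [30, 35, 38, 45, 46, 52, 58, 61, 70, 400],
    }
)

# Distribution: count, mean, standard deviation, quartiles, min and max.
summary = df["salary"].describe()

# Outliers: flag values more than 1.5 * IQR outside the quartiles.
q1 = df["salary"].quantile(0.25)
q3 = df["salary"].quantile(0.75)
iqr = q3 - q1
is_outlier = (df["salary"] < q1 - 1.5 * iqr) | (df["salary"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]

# Correlation: Pearson correlation between columns, outliers excluded.
corr = df.loc[~is_outlier, ["age", "salary"]].corr(method="pearson")
```

Computing the correlation with and without the flagged rows is a quick way to see how much a single extreme value can distort the result.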

Data Visualization

In module 4 of the Capstone, you will create visualizations using the developer survey data. Your visualizations should highlight the distribution of the data, relationships between data, the composition of the data, and comparisons of data.
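
One way to cover the four visualization goals is a chart type for each, sketched below with Matplotlib. The numbers and labels are invented and are not taken from the developer survey, and the pairing of chart types to goals (histogram, scatter plot, pie chart, bar chart) is a common convention, not a course requirement.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Made-up values for illustration only.
ages = [22, 25, 27, 30, 31, 35, 38, 41, 45, 29]
salaries = [30, 35, 38, 45, 46, 52, 58, 61, 70, 48]
languages = ["Python", "SQL", "JavaScript"]
counts = [10, 7, 5]

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Distribution: histogram of a single numeric column.
axes[0, 0].hist(ages, bins=5)
axes[0, 0].set_title("Distribution: age")

# Relationship: scatter plot of two numeric columns.
axes[0, 1].scatter(ages, salaries)
axes[0, 1].set_title("Relationship: age vs. salary")

# Composition: pie chart of each category's share of the whole.
axes[1, 0].pie(counts, labels=languages, autopct="%1.0f%%")
axes[1, 0].set_title("Composition: language share")

# Comparison: bar chart of values across categories.
axes[1, 1].bar(languages, counts)
axes[1, 1].set_title("Comparison: respondents per language")

fig.tight_layout()
```

Calling fig.savefig with a file name writes the figure to disk; in a Jupyter Notebook the figure can be displayed inline instead.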

What’s included