Data Science is one of the hottest skills in the IT industry. It involves gathering data, cleaning it, processing it, analyzing it, making predictions, and visualizing the findings. This course will introduce you to the world of Data Science. Using one of the most popular languages in the field, Python, the course will show you how to perform each of these tasks. It bridges the gap between mathematics and computer science, and takes you through the entire data science pipeline. You'll get to grips with machine learning, discover the statistical models that help you take control and navigate even the densest datasets, and find out how to create powerful visualizations that communicate what your data means.
We will begin by talking about Python and installation options. We will follow that up by introducing the Jupyter notebook and explaining how to use it. This tool lets you share code alongside accompanying text. We will learn what cells are and how to manipulate them.
This will be followed by loading data and performing some basic analysis using pandas. We will explore some of the interfaces that pandas provides, do some basic visualizations, and prepare our data for machine learning models. At this point we will create some predictive models, evaluate the results, and see if we can improve them. For this we will use the scikit-learn library, a powerful industry standard for machine learning.
We will also use the YellowBrick library to visualize the results of this analysis. This powerful library gives us easy visualizations to determine how effective our models are.
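As a rough taste of the workflow described above, the following is a minimal sketch, not material taken from the course itself. The toy dataset, the column names `hours_studied` and `passed`, and the choice of logistic regression are all assumptions made for illustration; a real analysis would typically load data from a file and explore it first.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for a real dataset
# (in practice this would usually come from pd.read_csv or similar).
raw = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, None, 12],
    "passed": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
})

# Clean: drop rows that contain missing values.
clean = raw.dropna()

# Prepare: separate the feature matrix from the target column.
X = clean[["hours_studied"]]
y = clean["passed"]

# Hold out part of the data so the model is evaluated on unseen rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Model: fit a simple classifier (an illustrative choice, not a recommendation).
model = LogisticRegression()
model.fit(X_train, y_train)

# Evaluate: compare predictions with the held-out labels.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Held-out accuracy: {accuracy:.2f}")
```

With so few rows the accuracy figure means little; the point is only the shape of the pipeline: clean, split, fit, evaluate.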
Who should take this course
This session is for anyone who has programming experience, knows a bit of Python, and is interested in starting a journey into the world of Data Science, or is interested in data visualization, modeling, or finding actionable insights from data.
What you will learn from this course
- Run a complete end-to-end analysis
- Use Jupyter notebook to perform exploratory data analysis (EDA), visualization, and predictive modeling
- Ingest and clean data using pandas
- Create a predictive model using scikit-learn
- Visualize and evaluate the data and model using matplotlib and YellowBrick
Author, Speaker, Corporate trainer
About the instructor
Matt Harrison is an author, speaker, corporate trainer, and consultant. He authored the popular Learning the Pandas Library and Illustrated Guide to Python 3. He runs MetaSnake, which provides corporate and online training on Python and Data Science. In addition, he offers consulting services. He has worked on search engines, configuration management, storage, BI, predictive modeling, and in a variety of domains.