The Data Science Workshop
Built by a team of experts to help you unlock your next promotion, reboot your career, or kick off your latest side project.
Get Started Today
You'll be up and running with data science in no time at all.
The Data Science Workshop
$39.99
Unlock one year of full, unlimited access and get started right away!
Engineered for Success
Nobody likes going through hundreds of pages of dry theory or struggling with uninteresting examples that don't compile. We've got you covered. Any time, any device.
- Learn by doing real-world development, supported by detailed step-by-step examples, screencasts, and knowledge checks.
- Become a verified practitioner, building your credentials by completing exercises, activities, and assessment checks.
- Manage your learning around your personal schedule, with content structured to let you pause and progress at will.
Learn By Doing
You already know you want to learn data science, and the best way to learn it is by doing.
Learn how to use Python to develop cutting-edge machine learning models and unlock critical business insights.
On Your Terms
Build up and reinforce key skills in a way that feels rewarding.
You won't have to sit through any unnecessary theory. If you're short on time, you can jump into a single exercise each day or spend an entire weekend training a model using scikit-learn.
An Ideal Start
Fast-paced and direct, The Data Science Workshop is the ideal companion for newcomers.
You'll build and iterate on your code like a software developer, learning along the way. Your new skills will stick, embedded as best practice: a solid foundation for the years ahead.
Begin Your Journey
A simple, straightforward and pain-free way to learn data science.
The Data Science Workshop
$39.99
Unlock one year of full, unlimited access and get started right away!
Everything You Need
Every Workshop includes access to dedicated support, course discussions and a wider learning community. Ask questions, share answers and work with the community to expand your learning.
- Engage in live user discussions, baked right into your course from start to finish. Share, learn, and grow together.
- Get live updates and interact directly with the product development, editorial, and authoring teams across the Packt network.
- Create, showcase, and share your own learning outcomes and motivational stories across the entire workshop community.
Get Build-Ready
Every Workshop includes a whole host of features that work together to help you get the job done. You’ll be ready to tackle real-world development in no time.
What's Inside
From A to Z, we've got you covered!
1. Introduction to Data Science in Python
- Overview FREE PREVIEW
- Application of Data Science FREE PREVIEW
- Overview of Python FREE PREVIEW
- Exercise 1.01: Creating a Dictionary That Will Contain Machine Learning Algorithms FREE PREVIEW
- Python for Data Science FREE PREVIEW
- Exercise 1.02: Loading Data of Different Formats into a pandas DataFrame FREE PREVIEW
- Scikit-Learn FREE PREVIEW
- Exercise 1.03: Predicting Breast Cancer from a Dataset Using sklearn FREE PREVIEW
- Activity 1.01: Train a Spam Detector Algorithm FREE PREVIEW
- Summary FREE PREVIEW
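To give you a feel for where Chapter 1 takes you, here's a minimal sketch of loading data into a pandas DataFrame and fitting a first scikit-learn model. The model choice and settings below are illustrative, not the book's exact code:

```python
# A sketch of a first workflow: data into a DataFrame, model out.
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)  # pandas DataFrame

X_train, X_test, y_train, y_test = train_test_split(
    df, data.target, test_size=0.2, random_state=42)

model = RandomForestClassifier(random_state=42)  # illustrative model choice
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")
```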
2. Regression
- Overview
- Simple Linear Regression
- Exercise 2.01: Loading and Preparing the Data for Analysis
- The Correlation Coefficient
- Exercise 2.02: Graphical Investigation of Linear Relationships Using Python
- Exercise 2.03: Examining a Possible Log-Linear Relationship Using Python
- The Statsmodels Formula API
- Exercise 2.04: Fitting a Simple Linear Regression Model Using the Statsmodels Formula API
- Analyzing the Model Summary
- Activity 2.01: Fitting a Log-Linear Model Using the Statsmodels Formula API
- Multiple Regression Analysis
- Exercise 2.05: Fitting a Multiple Linear Regression Model Using the Statsmodels Formula API
- Assumptions of Regression Analysis
- Activity 2.02: Fitting a Multiple Log-Linear Regression Model
- Activity 2.02: Fitting a Multiple Log-Linear Regression Model
- Explaining the Results of Regression Analysis
- Summary
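A taste of the Chapter 2 workflow: fitting a simple linear regression with the Statsmodels formula API. The DataFrame and column names below are invented for illustration:

```python
# Fit y = a + b*x by ordinary least squares and inspect the summary.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
df = pd.DataFrame({"x": rng.uniform(0, 10, 100)})          # synthetic data
df["y"] = 2.5 * df["x"] + rng.normal(0, 1, 100)

model = smf.ols("y ~ x", data=df).fit()   # formula API: response ~ predictor
print(model.summary())                    # coefficients, R-squared, p-values
```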
3. Binary Classification
- Overview
- Understanding the Business Context
- Exercise 3.01: Loading and Exploring the Data from the Dataset
- Testing Business Hypotheses Using Exploratory Data Analysis
- Exercise 3.02: Business Hypothesis Testing for Age versus Propensity for a Term Loan
- Intuitions from the Exploratory Analysis
- Activity 3.01: Business Hypothesis Testing to Find Employment Status versus Propensity for Term Deposits
- Feature Engineering
- Exercise 3.03: Feature Engineering – Exploration of Individual Features
- Exercise 3.04: Feature Engineering – Creating New Features from Existing Ones
- Data-Driven Feature Engineering
- Exercise 3.05: Finding the Correlation in Data to Generate a Correlation Plot Using Bank Data
- Skewness of Data
- Exercise 3.06: A Logistic Regression Model for Predicting the Propensity of Term Deposit Purchases in a Bank
- Activity 3.02: Model Iteration 2 – Logistic Regression Model with Feature Engineered Variables
- Next Steps
- Summary
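In the spirit of Chapter 3, here's a minimal binary classification sketch with scikit-learn's LogisticRegression; the synthetic features stand in for the book's banking dataset:

```python
# Train a logistic regression model to predict a binary outcome.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))               # synthetic feature matrix
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # synthetic binary target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
clf = LogisticRegression().fit(X_train, y_train)
print(f"Test accuracy: {clf.score(X_test, y_test):.3f}")
```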
4. Multiclass Classification with RandomForest
- Overview
- Training a Random Forest Classifier
- Evaluating the Model's Performance
- Exercise 4.01: Building a Model for Classifying Animal Type and Assessing Its Performance
- Number of Trees Estimator
- Exercise 4.02: Tuning n_estimators to Reduce Overfitting
- Maximum Depth
- Exercise 4.03: Tuning max_depth to Reduce Overfitting
- Minimum Sample in Leaf
- Exercise 4.04: Tuning min_samples_leaf
- Maximum Features
- Exercise 4.05: Tuning max_features
- Activity 4.01: Train a Random Forest Classifier on the ISOLET Dataset
- Summary
- Survey I
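Chapter 4's tuning knobs in miniature. This sketch uses scikit-learn's built-in wine dataset as a stand-in for the book's animal and ISOLET data, and the parameter values are illustrative:

```python
# Tune a random forest's main hyperparameters to reduce overfitting.
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = RandomForestClassifier(
    n_estimators=100,      # number of trees
    max_depth=5,           # limit tree depth to reduce overfitting
    min_samples_leaf=5,    # require at least 5 samples per leaf
    max_features="sqrt",   # features considered at each split
    random_state=42,
).fit(X_train, y_train)

print(f"Train: {clf.score(X_train, y_train):.3f}  "
      f"Test: {clf.score(X_test, y_test):.3f}")
```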
5. Performing Your First Cluster Analysis
- Overview
- Clustering with k-means
- Exercise 5.01: Performing Your First Clustering Analysis on the ATO Dataset
- Interpreting k-means Results
- Exercise 5.02: Clustering Australian Postcodes by Business Income and Expenses
- Choosing the Number of Clusters
- Exercise 5.03: Finding the Optimal Number of Clusters
- Initializing Clusters
- Exercise 5.04: Using Different Initialization Parameters to Achieve a Suitable Outcome
- Calculating the Distance to the Centroid
- Exercise 5.05: Finding the Closest Centroids in Our Dataset
- Standardizing Data
- Exercise 5.06: Standardizing the Data from Our Dataset
- Activity 5.01: Perform Customer Segmentation Analysis in a Bank Using k-means
- Summary
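A minimal k-means sketch in the spirit of Chapter 5; the synthetic blobs stand in for the book's ATO dataset:

```python
# Standardize the data, cluster it with k-means, and inspect the results.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)  # synthetic data
X_scaled = StandardScaler().fit_transform(X)                   # standardize

kmeans = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X_scaled)
print(kmeans.cluster_centers_)   # centroid coordinates
print(kmeans.labels_[:10])       # cluster assignment per sample
```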
6. How to Assess Performance
- Overview
- Splitting Data
- Exercise 6.01: Importing and Splitting Data
- Assessing Model Performance for Regression Models
- Exercise 6.02: Computing the R2 Score of a Linear Regression Model
- Mean Absolute Error
- Exercise 6.03: Computing the MAE of a Model
- Exercise 6.04: Computing the Mean Absolute Error of a Second Model
- Other Evaluation Metrics
- Exercise 6.05: Creating a Classification Model for Computing Evaluation Metrics
- Exercise 6.06: Generating a Confusion Matrix for the Classification Model
- More on the Confusion Matrix
- Exercise 6.07: Computing Precision for the Classification Model
- Recall
- Exercise 6.08: Computing Recall for the Classification Model
- F1 Score
- Exercise 6.09: Computing the F1 Score for the Classification Model
- Exercise 6.10: Computing Model Accuracy for the Classification Model
- Exercise 6.11: Computing the Log Loss for the Classification Model
- Receiver Operating Characteristic Curve
- Exercise 6.12: Computing and Plotting ROC Curve for a Binary Classification Problem
- Exercise 6.13: Computing the ROC AUC for the Caesarian Dataset
- Saving and Loading Models
- Exercise 6.14: Saving and Loading a Model
- Activity 6.01: Train Three Different Models and Use Evaluation Metrics to Pick the Best Performing Model
- Summary
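The core Chapter 6 classification metrics, sketched on a toy prediction vector (the values are illustrative only):

```python
# Compare true labels to predictions with the standard evaluation metrics.
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             f1_score, precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(confusion_matrix(y_true, y_pred))                  # TN/FP/FN/TP counts
print(f"Precision: {precision_score(y_true, y_pred):.2f}")
print(f"Recall:    {recall_score(y_true, y_pred):.2f}")
print(f"F1:        {f1_score(y_true, y_pred):.2f}")
print(f"Accuracy:  {accuracy_score(y_true, y_pred):.2f}")
```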
7. The Generalization of Machine Learning Models
- Overview
- Overfitting
- Exercise 7.01: Importing and Splitting Data
- Random State
- Exercise 7.02: Setting a Random State When Splitting Data
- Cross-Validation
- Exercise 7.03: Creating a Five-Fold Cross-Validation Dataset
- Exercise 7.04: Creating a Five-Fold Cross-Validation Dataset Using a Loop for Calls
- cross_val_score
- Exercise 7.05: Getting the Scores from Five-Fold Cross-Validation
- Understanding Estimators That Implement CV
- Exercise 7.06: Training a Logistic Regression Model Using Cross-Validation
- Hyperparameter Tuning with GridSearchCV
- Exercise 7.07: Using Grid Search with Cross-Validation to Find the Best Parameters for a Model
- Hyperparameter Tuning with RandomizedSearchCV
- Exercise 7.08: Using Randomized Search for Hyperparameter Tuning
- Model Regularization with Lasso Regression
- Exercise 7.09: Fixing Model Overfitting Using Lasso Regression
- Ridge Regression
- Exercise 7.10: Fixing Model Overfitting Using Ridge Regression
- Activity 7.01: Find an Optimal Model for Predicting the Critical Temperatures of Superconductors
- Summary
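Cross-validation and grid search as covered in Chapter 7, in miniature; the diabetes dataset and the parameter grid are stand-ins, not the book's superconductor data:

```python
# Score a ridge regression with five-fold CV, then grid-search alpha.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = load_diabetes(return_X_y=True)

print(cross_val_score(Ridge(), X, y, cv=5))   # one R^2 score per fold

search = GridSearchCV(Ridge(), {"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X, y)
print(search.best_params_)                    # best regularization strength
```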
8. Hyperparameter Tuning
- Overview
- What Are Hyperparameters?
- Exercise 8.01: Manual Hyperparameter Tuning for a k-NN Classifier
- Advantages and Disadvantages of a Manual Search
- GridSearchCV
- Exercise 8.02: Grid Search Hyperparameter Tuning for an SVM
- Advantages and Disadvantages of Grid Search
- Random Search
- Exercise 8.03: Random Search Hyperparameter Tuning for a Random Forest Classifier
- Advantages and Disadvantages of a Random Search
- Activity 8.01: Is the Mushroom Poisonous?
- Summary
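Randomized search in miniature, as in Chapter 8; the parameter distributions below are illustrative choices, not the book's:

```python
# Sample random hyperparameter combinations instead of an exhaustive grid.
from scipy.stats import randint
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_wine(return_X_y=True)
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions={"n_estimators": randint(10, 200),
                         "max_depth": randint(2, 12)},
    n_iter=20, cv=5, random_state=42)  # 20 random draws from the space
search.fit(X, y)
print(search.best_params_)
```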
9. Interpreting a Machine Learning Model
- Overview
- Linear Model Coefficients
- Exercise 9.01: Extracting the Linear Regression Coefficient
- RandomForest Variable Importance
- Exercise 9.02: Extracting RandomForest Feature Importance
- Variable Importance via Permutation
- Exercise 9.03: Extracting Feature Importance via Permutation
- Partial Dependence Plots
- Exercise 9.04: Plotting Partial Dependence
- Local Interpretation with LIME
- Exercise 9.05: Local Interpretation with LIME
- Activity 9.01: Train and Analyze a Network Intrusion Detection Model
- Summary
- Survey II
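Two of Chapter 9's interpretation tools in one short sketch: built-in random forest importances and permutation importance. The dataset is a stand-in:

```python
# Rank features by impurity-based and permutation importance.
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = load_wine(return_X_y=True)
clf = RandomForestClassifier(random_state=42).fit(X, y)

print(clf.feature_importances_)                    # impurity-based
result = permutation_importance(clf, X, y, n_repeats=5, random_state=42)
print(result.importances_mean)                     # permutation-based
```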
10. Analyzing a Dataset
- Overview
- Exploring Your Data
- Exercise 10.01: Exploring the Ames Housing Dataset with Descriptive Statistics
- Analyzing the Content of a Categorical Variable
- Exercise 10.02: Analyzing the Categorical Variables from the Ames Housing Dataset
- Summarizing Numerical Variables
- Exercise 10.03: Analyzing Numerical Variables from the Ames Housing Dataset
- Visualizing Your Data
- Exercise 10.04: Visualizing the Ames Housing Dataset with Altair
- Activity 10.01: Analyzing Churn Data Using Visual Data Analysis Techniques
- Summary
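A flavor of Chapter 10's exploratory analysis with pandas; the tiny DataFrame below is invented, using a couple of Ames Housing column names for flavor:

```python
# Descriptive statistics for numerical and categorical variables.
import pandas as pd

df = pd.DataFrame({
    "Neighborhood": ["NAmes", "CollgCr", "NAmes", "OldTown"],  # invented rows
    "SalePrice": [139500, 213500, 128000, 118000],
    "LotArea": [8450, 9600, 7200, 8500],
})

print(df.describe())                                    # numerical summaries
print(df["Neighborhood"].value_counts())                # categorical counts
print(df.groupby("Neighborhood")["SalePrice"].mean())   # grouped summary
```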
11. Data Preparation
- Overview
- Handling Row Duplication
- Exercise 11.01: Handling Duplicates in a Breast Cancer Dataset
- Converting Data Types
- Exercise 11.02: Converting Data Types for the Ames Housing Dataset
- Handling Incorrect Values
- Exercise 11.03: Fixing Incorrect Values in the State Column
- Handling Missing Values
- Exercise 11.04: Fixing Missing Values for the Horse Colic Dataset
- Activity 11.01: Preparing the Speed Dating Dataset
- Summary
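Typical Chapter 11 cleaning steps, sketched on a made-up DataFrame (the bad values are contrived for illustration):

```python
# Deduplicate rows, fix incorrect values, impute missing data, set dtypes.
import pandas as pd

df = pd.DataFrame({"state": ["CA", "CA", "Califronia", None],
                   "value": [1.0, 1.0, 2.5, None]})

df = df.drop_duplicates()                                # remove duplicates
df["state"] = df["state"].replace({"Califronia": "CA"})  # fix bad values
df["value"] = df["value"].fillna(df["value"].mean())     # impute missing
df["value"] = df["value"].astype("float64")              # enforce dtype
print(df)
```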
12. Feature Engineering
- Overview
- Merging Datasets
- Exercise 12.01: Merging the ATO Dataset with the Postcode Data
- Binning Variables
- Exercise 12.02: Binning the YearBuilt Variable from the Ames Housing Dataset
- Manipulating Dates
- Exercise 12.03: Date Manipulation on Financial Services Consumer Complaints
- Performing Data Aggregation
- Exercise 12.04: Feature Engineering Using Data Aggregation on the Ames Housing Dataset
- Activity 12.01: Feature Engineering on a Financial Dataset
- Summary
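Merging, binning, and date handling in miniature, as in Chapter 12; both DataFrames are invented:

```python
# Join two datasets, bin a numerical column, and extract date parts.
import pandas as pd

sales = pd.DataFrame({"postcode": [2000, 3000], "income": [55, 72]})
regions = pd.DataFrame({"postcode": [2000, 3000], "state": ["NSW", "VIC"]})

merged = sales.merge(regions, on="postcode", how="left")    # merge datasets
merged["band"] = pd.cut(merged["income"], bins=[0, 60, 100],
                        labels=["low", "high"])             # bin a variable
merged["opened"] = pd.to_datetime(["2020-01-15", "2021-06-30"])
merged["year_opened"] = merged["opened"].dt.year            # date parts
print(merged)
```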
13. Imbalanced Datasets
- Overview
- Understanding the Business Context
- Exercise 13.01: Benchmarking the Logistic Regression Model on the Dataset
- Analysis of the Result
- Exercise 13.02: Implementing Random Undersampling and Classification on Our Banking Dataset to Find the Optimal Result
- Analysis
- Exercise 13.03: Implementing SMOTE on Our Banking Dataset to Find the Optimal Result
- Exercise 13.04: Implementing MSMOTE on Our Banking Dataset to Find the Optimal Result
- Applying Balancing Techniques on a Telecom Dataset
- Activity 13.01: Finding the Best Balancing Technique by Fitting a Classifier on the Telecom Churn Dataset
- Summary
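Oversampling with SMOTE, as in Chapter 13. This sketch assumes the imbalanced-learn package, with synthetic data in place of the book's banking dataset:

```python
# Rebalance a skewed dataset by synthesizing minority-class samples.
from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                           random_state=42)   # 90/10 class imbalance
print("Before:", Counter(y))

X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print("After: ", Counter(y_res))              # classes now balanced
```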
14. Dimensionality Reduction
- Overview
- Business Context
- Exercise 14.01: Loading and Cleaning the Dataset
- Creating a High-Dimensional Dataset
- Activity 14.01: Fitting a Logistic Regression Model on a High-Dimensional Dataset
- Strategies for Addressing High-Dimensional Datasets
- Exercise 14.02: Dimensionality Reduction Using Backward Feature Elimination
- Forward Feature Selection
- Exercise 14.03: Dimensionality Reduction Using Forward Feature Selection
- Principal Component Analysis (PCA)
- Exercise 14.04: Dimensionality Reduction Using PCA
- Independent Component Analysis (ICA)
- Exercise 14.05: Dimensionality Reduction Using Independent Component Analysis
- Factor Analysis
- Exercise 14.06: Dimensionality Reduction Using Factor Analysis
- Comparing Different Dimensionality Reduction Techniques
- Activity 14.02: Comparison of Dimensionality Reduction Techniques on the Enhanced Ads Dataset
- Summary
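PCA in a few lines, in the spirit of Chapter 14; the wine dataset stands in for the book's ads data:

```python
# Standardize, then project the data onto its top principal components.
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_wine(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)   # PCA expects scaled inputs

pca = PCA(n_components=3)                      # keep three components
X_reduced = pca.fit_transform(X_scaled)
print(X_reduced.shape)                         # (178, 3)
print(pca.explained_variance_ratio_)           # variance captured per component
```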
15. Ensemble Learning
- Overview
- Ensemble Learning
- Exercise 15.01: Loading, Exploring, and Cleaning the Data
- Activity 15.01: Fitting a Logistic Regression Model on Credit Card Data
- Simple Methods for Ensemble Learning
- Exercise 15.02: Ensemble Model Using the Averaging Technique
- Weighted Averaging
- Exercise 15.03: Ensemble Model Using the Weighted Averaging Technique
- Iteration 2 with Different Weights
- Exercise 15.04: Ensemble Model Using Max Voting
- Advanced Techniques for Ensemble Learning
- Exercise 15.05: Ensemble Learning Using Bagging
- Boosting
- Exercise 15.06: Ensemble Learning Using Boosting
- Stacking
- Exercise 15.07: Ensemble Learning Using Stacking
- Activity 15.02: Comparison of Advanced Ensemble Techniques
- Summary
- Survey III
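Chapter 15's voting, bagging, and stacking ensembles in one short sketch; the base models and dataset are illustrative choices:

```python
# Compare three ensemble strategies with five-fold cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (BaggingClassifier, StackingClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
base = [("lr", LogisticRegression(max_iter=5000)),
        ("dt", DecisionTreeClassifier(random_state=42))]

for name, model in [("voting", VotingClassifier(base, voting="soft")),
                    ("bagging", BaggingClassifier(random_state=42)),
                    ("stacking", StackingClassifier(base))]:
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")
```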
16. Machine Learning Pipelines
- Overview FREE PREVIEW
- Pipelines FREE PREVIEW
- Exercise 16.01: Preparing the Dataset to Implement Pipelines FREE PREVIEW
- Automating ML Workflows Using Pipeline FREE PREVIEW
- Exercise 16.02: Applying Pipelines for Feature Extraction to the Dataset FREE PREVIEW
- ML Pipeline with Processing and Dimensionality Reduction FREE PREVIEW
- Exercise 16.03: Adding Dimensionality Reduction to the Feature Extraction Pipeline FREE PREVIEW
- ML Pipeline for Modeling and Prediction FREE PREVIEW
- Exercise 16.04: Modeling and Predictions Using ML Pipelines FREE PREVIEW
- ML Pipeline for Spot-Checking Multiple Models FREE PREVIEW
- Exercise 16.05: Spot-Checking Models Using ML Pipelines FREE PREVIEW
- ML Pipelines for Identifying the Best Parameters for a Model FREE PREVIEW
- Exercise 16.06: Grid Search and Cross-Validation with ML Pipelines FREE PREVIEW
- Applying Pipelines to a Dataset FREE PREVIEW
- Activity 16.01: Complete ML Workflow in a Pipeline FREE PREVIEW
- Summary FREE PREVIEW
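An end-to-end pipeline in the spirit of Chapter 16: scaling, dimensionality reduction, and a model chained together and grid-searched as one unit. The steps and grid are illustrative:

```python
# Chain preprocessing and modeling, then tune the whole pipeline at once.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
pipe = Pipeline([("scale", StandardScaler()),
                 ("pca", PCA()),
                 ("clf", LogisticRegression(max_iter=5000))])

# Step names prefix grid keys, e.g. pca__n_components
search = GridSearchCV(pipe, {"pca__n_components": [5, 10, 15]}, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```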
17. Automated Feature Engineering
- Overview FREE PREVIEW
- Feature Engineering FREE PREVIEW
- Exercise 17.01: Defining Entities and Establishing Relationships FREE PREVIEW
- Feature Engineering – Basic Operations FREE PREVIEW
- Exercise 17.02: Creating New Features Using Deep Feature Synthesis FREE PREVIEW
- Exercise 17.03: Classification Model After Automated Feature Generation FREE PREVIEW
- Featuretools on a New Dataset FREE PREVIEW
- Activity 17.01: Building a Classification Model with Features that have been Generated Using Featuretools FREE PREVIEW
- Summary FREE PREVIEW
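Deep feature synthesis in miniature, in the spirit of Chapter 17. This sketch assumes the Featuretools 1.x API, and both DataFrames are invented:

```python
# Define entities and a relationship, then auto-generate features with DFS.
import featuretools as ft
import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2]})
orders = pd.DataFrame({"order_id": [10, 11, 12],
                       "customer_id": [1, 1, 2],
                       "amount": [25.0, 40.0, 10.0]})

es = ft.EntitySet(id="shop")
es = es.add_dataframe(dataframe_name="customers", dataframe=customers,
                      index="customer_id")
es = es.add_dataframe(dataframe_name="orders", dataframe=orders,
                      index="order_id")
es = es.add_relationship("customers", "customer_id",
                         "orders", "customer_id")

# Deep feature synthesis aggregates order-level features per customer
features, defs = ft.dfs(entityset=es, target_dataframe_name="customers")
print(features.head())
```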
18. Model as a Service with Flask
- Overview FREE PREVIEW
- Building a Flask Web API FREE PREVIEW
- Exercise 18.01: Creating a Flask API with Endpoints FREE PREVIEW
- Deploying a Machine Learning Model FREE PREVIEW
- Exercise 18.02: Deploying a Model as a Web API FREE PREVIEW
- Adding Data Processing Logic FREE PREVIEW
- Exercise 18.03: Adding Data Processing Steps into a Web API FREE PREVIEW
- Activity 18.01: Train and Deploy an Income Predictor Model Using Flask FREE PREVIEW
- Summary FREE PREVIEW
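A minimal Flask prediction endpoint in the spirit of Chapter 18. The inline-trained iris model keeps the sketch self-contained; it is not the book's income predictor:

```python
# Serve a trained scikit-learn model behind a JSON POST endpoint.
from flask import Flask, jsonify, request
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)  # trained at startup

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]   # e.g. [5.1, 3.5, 1.4, 0.2]
    prediction = model.predict([features])[0]
    return jsonify({"prediction": int(prediction)})

if __name__ == "__main__":
    app.run(port=5000)
```

To try it, POST a JSON body such as {"features": [5.1, 3.5, 1.4, 0.2]} to /predict and the API returns the predicted class.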
Get Verified
Complete The Data Science Workshop to unlock your certificate.
The credentials are easy to share and ideal for displaying on your LinkedIn profile.
Take A Step Forward
There has never been a better time to start learning data science.
The Data Science Workshop
$39.99
Unlock one year of full, unlimited access and get started right away!