The Data Science Workshop

Built by a team of experts to help you unlock your next promotion, reboot your career, or kick off your latest side project.

Get Started Today

You'll be up and running with data science in no time at all.

  • The Data Science Workshop: $34.99

    Unlock one year of full, unlimited access and get started right away!
    Buy Now

Engineered for Success

Nobody likes going through hundreds of pages of dry theory or struggling with uninteresting examples that don't compile. We've got you covered. Any time, any device.

  • Learn by doing real-world development, supported by detailed step-by-step examples, screencasts and knowledge checks.

  • Become a verified practitioner, building your credentials by completing exercises, activities and assessment checks.

  • Manage your learning based on your personal schedule, with content structured to easily let you pause and progress at will.

Learn By Doing

You already know you want to learn data science, and the best way to learn it is by doing.

Learn how data science can help you develop cutting-edge machine learning models and unlock critical business insights in Python.

On Your Terms

Build up and reinforce key skills in a way that feels rewarding.

You won't have to sit through any unnecessary theory. If you're short on time, you can jump into a single exercise each day, or spend an entire weekend training a model using scikit-learn.
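
Curious what that looks like in practice? Here's a minimal sketch of the kind of exercise you'll build up to. It uses standard scikit-learn components (the built-in breast cancer dataset and a random forest classifier, chosen here purely for illustration, not excerpted from the course itself):

    # Illustrative sketch: train and evaluate a simple classifier with
    # scikit-learn. Not an excerpt from the course material.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # Load a built-in dataset and hold out 20% of it for testing.
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # Fit a random forest and check how well it generalizes.
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    print(f"Test accuracy: {accuracy_score(y_test, model.predict(X_test)):.3f}")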

An Ideal Start

Fast-paced and direct, The Data Science Workshop is the ideal companion for newcomers.

You'll build and iterate on your code like a software developer, learning along the way. This process means your new skills stick, embedded as best practice: a solid foundation for the years ahead.

Begin Your Journey

A simple, straightforward and pain-free way to learn data science.

  • The Data Science Workshop: $34.99

    Unlock one year of full, unlimited access and get started right away!
    Buy Now

Everything You Need

Every Workshop includes access to dedicated support, course discussions and a wider learning community. Ask questions, share answers and work with the community to expand your learning.

  • Engage and participate in live user discussions, baked right into your course from start to finish. Share, learn and grow together.

  • Get live updates and interact directly with the product development, editorial and authoring teams across the Packt network.

  • Create, showcase and share your own learning outcomes and motivational stories across the entire workshop community.

Get Build-Ready

Every Workshop includes a whole host of features that work together to help you get the job done. You’ll be ready to tackle real-world development in no time.
  • Hack Your Brain

    We've applied the latest pedagogical techniques to deliver a truly multimodal experience. It'll keep you engaged and make the learning stick. It's science!

  • Build Real Things

    Nobody likes wasting their time. We cut right to the action and get you building real skills that real, working developers value. The perfect approach for a career move.

  • Learn From Experts

    We've paired technical experts with top editorial talent. They've worked hard to deliver you the maximum impact for each minute you spend learning. It's our secret sauce.

  • Verify Your Credentials

    You can become a verified practitioner. Complete the course to get a certificate. It's perfect for sharing on social media. Hello LinkedIn!

  • Receive Free Updates

    Technology keeps changing, and so do we. We keep versions updated independently, so you'll always have access. No more worrying about third-party release cycles.

  • Access Anywhere

    All you need is an internet connection. We've built every course so that it works on desktop and mobile, giving you options that fit within your schedule.

What's Inside

From A to Z, we've got you covered!

  • Workshop Onboarding

    • Welcome to The Data Science Workshop
    • Installation and Setup
    • Credits
  • 1. Introduction to Data Science in Python

    • Overview
    • Application of Data Science
    • Overview of Python
    • Exercise 1.01: Creating a Dictionary That Will Contain Machine Learning Algorithms
    • Python for Data Science
    • Exercise 1.02: Loading Data of Different Formats into a pandas DataFrame
    • Scikit-Learn
    • Exercise 1.03: Predicting Breast Cancer from a Dataset Using sklearn
    • Activity 1.01: Train a Spam Detector Algorithm
    • Summary
  • 2. Regression

    • Overview
    • Simple Linear Regression
    • Exercise 2.01: Loading and Preparing the Data for Analysis
    • Exercise 2.02: Graphical Investigation of Linear Relationships Using Python
    • Exercise 2.03: Examining a Possible Log-Linear Relationship Using Python
    • Exercise 2.04: Fitting a Simple Linear Regression Model Using the Statsmodels formula API
    • Analyzing the Model Summary
    • Activity 2.01: Fitting a Log-Linear Model Using the Statsmodels formula API
    • Exercise 2.05: Fitting a Multiple Linear Regression Model Using the Statsmodels formula API
    • Assumptions of Regression Analysis
    • Activity 2.02: Fitting a Multiple Log-Linear Regression Model
    • Explaining the Results of Regression Analysis
    • Summary
  • 3. Binary Classification

    • Overview
    • Understanding the Business Context
    • Exercise 3.01: Loading and Exploring the Data from the Dataset
    • Testing Business Hypotheses Using Exploratory Data Analysis
    • Exercise 3.02: Business Hypothesis Testing for Age versus Propensity for a Term Loan
    • Intuitions from the Exploratory Analysis
    • Activity 3.01: Business Hypothesis Testing to Find Employment Status versus Propensity for Term Deposits
    • Feature Engineering
    • Exercise 3.03: Feature Engineering – Exploration of Individual Features
    • Exercise 3.04: Feature Engineering – Creating New Features from Existing Ones
    • Data-Driven Feature Engineering
    • Exercise 3.05: Finding the Correlation in Data to Generate a Correlation Plot Using Bank Data
    • Skewness of Data
    • Exercise 3.06: A Logistic Regression Model for Predicting the Propensity of Term Deposit Purchases in a Bank
    • Activity 3.02: Model Iteration 2 – Logistic Regression Model with Feature Engineered Variables
    • Next Steps
    • Summary
  • 4. Multiclass Classification with RandomForest

    • Overview
    • Training a Random Forest Classifier
    • Evaluating the Model's Performance
    • Exercise 4.01: Building a Model for Classifying Animal Type and Assessing Its Performance
    • Number of Trees Estimator
    • Exercise 4.02: Tuning n_estimators to Reduce Overfitting
    • Maximum Depth
    • Exercise 4.03: Tuning max_depth to Reduce Overfitting
    • Minimum Sample in Leaf
    • Exercise 4.04: Tuning min_samples_leaf
    • Maximum Features
    • Exercise 4.05: Tuning max_features
    • Activity 4.01: Train a Random Forest Classifier on the ISOLET Dataset
    • Summary
    • Survey 1
  • 5. Performing Your First Cluster Analysis

    • Overview
    • Exercise 5.01: Performing Your First Clustering Analysis on the ATO Dataset
    • Interpreting k-means Results
    • Exercise 5.02: Clustering Australian Postcodes by Business Income and Expenses
    • Choosing the Number of Clusters
    • Exercise 5.03: Finding the Optimal Number of Clusters
    • Initializing Clusters
    • Exercise 5.04: Using Different Initialization Parameters to Achieve a Suitable Outcome
    • Calculating the Distance to the Centroid
    • Standardizing Data
    • Exercise 5.05: Finding the Closest Centroids in Our Dataset
    • Exercise 5.06: Standardizing the Data from Our Dataset
    • Activity 5.01: Perform Customer Segmentation Analysis in a Bank Using k-means
    • Summary
  • 6. How to Assess Performance

    • Overview
    • Exercise 6.01: Importing and Splitting Data
    • Assessing Model Performance for Regression Models
    • Exercise 6.02: Computing the R2 Score of a Linear Regression Model
    • Exercise 6.03: Computing the MAE of a Model
    • Exercise 6.04: Computing the Mean Absolute Error of a Second Model
    • Exercise 6.05: Creating a Classification Model for Computing Evaluation Metrics
    • Exercise 6.06: Generating a Confusion Matrix for the Classification Model
    • More on the Confusion Matrix
    • Exercise 6.07: Computing Precision for the Classification Model
    • Exercise 6.08: Computing Recall for the Classification Model
    • Exercise 6.09: Computing the F1 Score for the Classification Model
    • Exercise 6.10: Computing Model Accuracy for the Classification Model
    • Exercise 6.11: Computing the Log Loss for the Classification Model
    • Exercise 6.12: Computing and Plotting ROC Curve for a Binary Classification Problem
    • Exercise 6.13: Computing the ROC AUC for the Caesarian Dataset
    • Exercise 6.14: Saving and Loading a Model
    • Activity 6.01: Train Three Different Models and Use Evaluation Metrics to Pick the Best Performing Model
    • Summary
  • 7. The Generalization of Machine Learning Models

    • Overview
    • Overfitting
    • Exercise 7.01: Importing and Splitting Data
    • Exercise 7.02: Setting a Random State When Splitting Data
    • Exercise 7.03: Creating a Five-Fold Cross-Validation Dataset
    • Exercise 7.04: Creating a Five-Fold Cross-Validation Dataset Using a Loop for Calls
    • Exercise 7.05: Getting the Scores from Five-Fold Cross-Validation
    • Understanding Estimators That Implement CV
    • Exercise 7.06: Training a Logistic Regression Model Using Cross-Validation
    • Hyperparameter Tuning with GridSearchCV
    • Exercise 7.07: Using Grid Search with Cross-Validation to Find the Best Parameters for a Model
    • Exercise 7.08: Using Randomized Search for Hyperparameter Tuning
    • Exercise 7.09: Fixing Model Overfitting Using Lasso Regression
    • Exercise 7.10: Fixing Model Overfitting Using Ridge Regression
    • Activity 7.01: Find an Optimal Model for Predicting the Critical Temperatures of Superconductors
    • Summary
  • 8. Hyperparameter Tuning

    • Overview
    • What Are Hyperparameters?
    • Exercise 8.01: Manual Hyperparameter Tuning for a k-NN Classifier
    • Advantages and Disadvantages of a Manual Search
    • GridSearchCV
    • Exercise 8.02: Grid Search Hyperparameter Tuning for a Support Vector Machine Classifier
    • Advantages and Disadvantages of Grid Search
    • Random Search
    • Exercise 8.03: Random Search Hyperparameter Tuning for a Random Forest Classifier
    • Advantages and Disadvantages of a Random Search
    • Activity 8.01: Is the Mushroom Poisonous?
    • Summary
  • 9. Interpreting a Machine Learning Model

    • Overview
    • Exercise 9.01: Extracting the Linear Regression Coefficient
    • RandomForest Variable Importance
    • Exercise 9.02: Extracting RandomForest Feature Importance
    • Variable Importance via Permutation
    • Exercise 9.03: Extracting Feature Importance via Permutation
    • Partial Dependence Plots
    • Exercise 9.04: Plotting Partial Dependence
    • Local Interpretation with LIME
    • Exercise 9.05: Local Interpretation with LIME
    • Activity 9.01: Train and Analyze a Network Intrusion Detection Model
    • Summary
    • Survey 2
  • 10. Analyzing a Dataset

    • Overview
    • Exploring Your Data
    • Exercise 10.01: Exploring the Ames Housing Dataset with Descriptive Statistics
    • Exercise 10.02: Analyzing the Categorical Variables from the Ames Housing Dataset
    • Summarizing Numerical Variables
    • Exercise 10.03: Analyzing Numerical Variables from the Ames Housing Dataset
    • Visualizing Your Data
    • Exercise 10.04: Visualizing the Ames Housing Dataset with Altair
    • Activity 10.01: Analyzing Churn Data Using Visual Data Analysis Techniques
    • Summary
  • 11. Data Preparation

    • Overview
    • Handling Row Duplication
    • Exercise 11.01: Handling Duplicates in a Breast Cancer Dataset
    • Exercise 11.02: Converting Data Types for the Ames Housing Dataset
    • Exercise 11.03: Fixing Incorrect Values in the State Column
    • Handling Missing Values
    • Exercise 11.04: Fixing Missing Values for the Horse Colic Dataset
    • Activity 11.01: Preparing the Speed Dating Dataset
    • Summary
  • 12. Feature Engineering

    • Overview
    • Merging Datasets
    • Exercise 12.01: Merging the ATO Dataset with the Postcode Data
    • Exercise 12.02: Binning the YearBuilt Variable from the Ames Housing Dataset
    • Exercise 12.03: Date Manipulation on Financial Services Consumer Complaints
    • Exercise 12.04: Feature Engineering Using Data Aggregation on the Ames Housing Dataset
    • Activity 12.01: Feature Engineering on a Financial Dataset
    • Summary
  • 13. Imbalanced Datasets

    • Overview
    • Exercise 13.01: Benchmarking the Logistic Regression Model on the Dataset
    • Challenges of Imbalanced Datasets
    • Exercise 13.02: Implementing Random Undersampling and Classification on Our Banking Dataset to Find the Optimal Result
    • Exercise 13.03: Implementing SMOTE on Our Banking Dataset to Find the Optimal Result
    • Exercise 13.04: Implementing MSMOTE on Our Banking Dataset to Find the Optimal Result
    • Applying Balancing Techniques on a Telecom Dataset
    • Activity 13.01: Finding the Best Balancing Technique by Fitting a Classifier on the Telecom Churn Dataset
    • Summary
  • 14. Dimensionality Reduction

    • Overview
    • Exercise 14.01: Loading and Cleaning the Dataset
    • Creating a High-Dimensional Dataset
    • Activity 14.01: Fitting a Logistic Regression Model on a High-Dimensional Dataset
    • Exercise 14.02: Dimensionality Reduction Using Backward Feature Elimination
    • Exercise 14.03: Dimensionality Reduction Using Forward Feature Selection
    • Principal Component Analysis (PCA)
    • Exercise 14.04: Dimensionality Reduction Using PCA
    • Exercise 14.05: Dimensionality Reduction Using Independent Component Analysis
    • Exercise 14.06: Dimensionality Reduction Using Factor Analysis
    • Comparing Different Dimensionality Reduction Techniques
    • Activity 14.02: Comparison of Dimensionality Reduction Techniques on the Enhanced Ads Dataset
    • Summary
  • 15. Ensemble Learning

    • Overview
    • Ensemble Learning
    • Exercise 15.01: Loading, Exploring, and Cleaning the Data
    • Activity 15.01: Fitting a Logistic Regression Model on Credit Card Data
    • Simple Methods for Ensemble Learning
    • Exercise 15.02: Ensemble Model Using the Averaging Technique
    • Exercise 15.03: Ensemble Model Using the Weighted Averaging Technique
    • Iteration 2 with Different Weights
    • Exercise 15.04: Ensemble Model Using Max Voting
    • Advanced Techniques for Ensemble Learning
    • Exercise 15.05: Ensemble Learning Using Bagging
    • Exercise 15.06: Ensemble Learning Using Boosting
    • Exercise 15.07: Ensemble Learning Using Stacking
    • Activity 15.02: Comparison of Advanced Ensemble Techniques
    • Summary
    • Survey 3
  • 16. Machine Learning Pipelines

    • Overview
    • Exercise 16.01: Preparing the Dataset to Implement Pipelines
    • Exercise 16.02: Applying Pipelines for Feature Extraction to the Dataset
    • Exercise 16.03: Adding Dimensionality Reduction to the Feature Extraction Pipeline
    • Exercise 16.04: Modeling and Predictions Using ML Pipelines
    • Exercise 16.05: Spot-Checking Models Using ML Pipelines
    • Exercise 16.06: Grid Search and Cross-Validation with ML Pipelines
    • Applying Pipelines to a Dataset
    • Activity 16.01: Complete ML Workflow in a Pipeline
    • Summary
  • 17. Automated Feature Engineering

    • Overview
    • Feature Engineering
    • Exercise 17.01: Defining Entities and Establishing Relationships
    • Feature Engineering – Basic Operations
    • Exercise 17.02: Creating New Features Using Deep Feature Synthesis
    • Exercise 17.03: Classification Model after Automated Feature Generation
    • Featuretools on a New Dataset
    • Activity 17.01: Building a Classification Model with Features that have been Generated Using Featuretools
    • Summary
  • 18. Model as a Service with Flask

    • Overview
    • Building a Flask Web API
    • Exercise 18.01: Creating a Flask API with Endpoints
    • Deploying a Machine Learning Model
    • Exercise 18.02: Deploying a Model as a Web API
    • Adding Data Processing Logic
    • Exercise 18.03: Adding Data Processing Steps into a Web API
    • Activity 18.01: Train and Deploy an Income Predictor Model Using Flask
    • Summary
  • Activity Solutions

    • Activity 1.01: Train a Spam Detector Algorithm
    • Activity 2.01: Fitting a Log-Linear Model Using the Statsmodels formula API
    • Activity 2.02: Fitting a Multiple Log-Linear Regression Model
    • Activity 3.01: Business Hypothesis Testing to Find Employment Status versus Propensity for Term Deposits
    • Activity 3.02: Model Iteration 2 – Logistic Regression Model with Feature Engineered Variables
    • Activity 4.01: Train a Random Forest Classifier on the ISOLET Dataset
    • Activity 5.01: Perform Customer Segmentation Analysis in a Bank Using k-means
    • Activity 6.01: Train Three Different Models and Use Evaluation Metrics to Pick the Best Performing Model
    • Activity 7.01: Find an Optimal Model for Predicting the Critical Temperatures of Superconductors
    • Activity 8.01: Is the Mushroom Poisonous?
    • Activity 9.01: Train and Analyze a Network Intrusion Detection Model
    • Activity 10.01: Analyzing Churn Data Using Visual Data Analysis Techniques
    • Activity 11.01: Preparing the Speed Dating Dataset
    • Activity 12.01: Feature Engineering on a Financial Dataset
    • Activity 13.01: Finding the Best Balancing Technique by Fitting a Classifier on the Telecom Churn Dataset
    • Activity 14.01: Fitting a Logistic Regression Model on a High-Dimensional Dataset
    • Activity 14.02: Comparison of Dimensionality Reduction Techniques on the Enhanced Ads Dataset
    • Activity 15.01: Fitting a Logistic Regression Model on Credit Card Data
    • Activity 15.02: Comparison of Advanced Ensemble Techniques
    • Activity 16.01: Complete ML Workflow in a Pipeline
    • Activity 17.01: Building a Classification Model with Features that have been Generated Using Featuretools
    • Activity 18.01: Train and Deploy an Income Predictor Model Using Flask

Get Verified

Complete The Data Science Workshop to unlock your certificate.

The credentials are easy to share, and are ideal for displaying on your LinkedIn profile.

Take A Step Forward

There has never been a better time to start learning data science.

  • The Data Science Workshop: $34.99

    Unlock one year of full, unlimited access and get started right away!
    Buy Now

Already Know Data Science?

Don't worry, we've got your back with other languages and frameworks too!

Show me my options!