Beginning Data Analysis with Python and Jupyter

Create reproducible data analyses with Python and Jupyter Notebook

This course is curated from a live training session that ran for 8 hours across two days. It comes with a student guide and a completion certificate.

Description

Data science is becoming increasingly popular as industries continue to recognize its importance. Recent advancements in open source software have made this discipline accessible to a wide range of people. Python is a popular choice for most data scientists owing to its ease of use and versatility. Jupyter Notebook complements it as a virtual playground that allows you to create and share live code, equations, visualizations, and text. Both tools offer abstractions over programming-intensive algorithms, allowing you to better conceptualize the problems you face and reduce the amount of programming required to solve them.

The goal of this training course is to help you get the most out of Python and Jupyter Notebook to complete the trickiest of tasks in data science quickly and effortlessly. By touching on a variety of topics within the discipline, you’ll be exposed to many interesting examples with real-world-like data.

The training course starts with the basics of Jupyter, which will be the backbone of the course. After familiarizing yourself with its standard features, you'll see it in practice in your first analysis. The next lesson dives right into predictive analytics, where you'll implement multiple classification algorithms. Finally, you'll look at data collection techniques. You'll also learn how web data can be acquired with scraping techniques and via APIs, and then briefly explore interactive visualizations.

What is this training about, and why is it important?

This fast-paced practical two-day training course focuses on solving challenges presented by data science in a manner that is simple to conceptualize and easy to implement. This is achieved by leveraging Python libraries that offer abstractions to complicated underlying algorithms. The result is that data science becomes very approachable for beginners, a fact which is reflected by the strength and growing popularity of the Python ecosystem.

What you’ll learn—and how you can apply it

  • Identify areas of investigation within a data set
  • Develop a plan for doing data science
  • Define exploratory analysis
  • Prepare data for modeling
  • Implement predictive analytics
  • Collect data with web scraping
  • Explore various data visualization techniques

This Online Training is for you because…

This course will be most applicable to professionals and students interested in data analysis. The topics covered are relevant to a variety of job descriptions across a large range of industries.

Prerequisites

  • For the best experience in this training course, you should have knowledge of programming fundamentals and some experience with Python. In particular, having some familiarity with the Python libraries Pandas, Matplotlib, and scikit-learn will be useful.

Materials, downloads, or Supplemental Content needed in advance

  • Minimum Hardware Requirements

    For successful completion of this training, you will require the following:

    • Processor: Pentium 4 (or equivalent)
    • Memory: 2 GB RAM
    • Hard disk: 10 GB
    • Internet connection
  • Recommended Hardware Requirements

    For an optimal experience with hands-on labs and other practical activities, we recommend:

    • Processor: Intel i5 (or equivalent)
    • Memory: 8 GB RAM
    • Hard disk: 10 GB
    • Internet connection
  • Software Requirements

    • Python 3.5+
    • Anaconda 4.3+
  • Python libraries included with Anaconda installation:

    • matplotlib 2.1.0+
    • ipython 6.1.0+
    • requests 2.18.4+
    • beautifulsoup4 4.6.0+
    • numpy 1.13.1+
    • pandas 0.20.3+
    • scikit-learn 0.19.0+
    • seaborn 0.8.0+
    • bokeh 0.12.10+
  • Python libraries that require manual installation:

    • mlxtend
    • version_information
    • ipython-sql
    • pdir2
    • graphviz
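
To check which of the libraries listed above are already present in your environment, a minimal sketch along these lines may help. It assumes Python 3.8 or later (for importlib.metadata), which is newer than the course minimum, and the helper name installed_version is purely illustrative.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the requirements list above.
REQUIRED = [
    "matplotlib", "ipython", "requests", "beautifulsoup4", "numpy",
    "pandas", "scikit-learn", "seaborn", "bokeh",
]


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Build a simple report and print one line per library.
report = {name: installed_version(name) for name in REQUIRED}
for name, found in report.items():
    print(f"{name:15} {found or 'not installed'}")
```

Compare the printed versions against the minimums listed above before the first session.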


Schedule

The time frames are only estimates and may vary according to how the class is progressing.

DAY 1 (~4 hours)

Section 1: Basic Jupyter Notebook Functionality and Features (1 hour 15 min - instructor lecture + Q&A)

  • Learn what a Jupyter Notebook is and why it’s useful for data analysis
  • See basic commands used to operate a Jupyter Notebook
  • Learn about more advanced usage features like magic functions, tab completion and keyboard shortcuts
  • Review the Python libraries we’ll be using extensively in this course, including pandas, scikit-learn and matplotlib

Section 2: The Boston Housing Dataset Analysis (1 hour 45 min - instructor lecture + Q&A)

  • Get hands-on experience with data analysis in Jupyter Notebooks as we walk through an introductory example
  • Learn how Pandas DataFrames are used with Jupyter to explore data
  • Learn how predictive models can be created with Jupyter using scikit-learn
  • Learn advanced data visualization techniques like pairwise scatter plots and segmented distributions
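
As a rough illustration of the workflow in this section, the sketch below explores a small DataFrame and fits a simple predictive model. It does not use the Boston housing data: the data is synthetic, and the column names rooms and price are hypothetical stand-ins chosen for this example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (NOT the Boston housing dataset):
# price depends linearly on rooms, plus random noise.
rng = np.random.default_rng(0)
rooms = rng.uniform(3, 9, size=200)
price = 5.0 * rooms - 10.0 + rng.normal(0, 1, size=200)
df = pd.DataFrame({"rooms": rooms, "price": price})

# Explore the data: summary statistics and pairwise correlations.
print(df.describe())
print(df.corr())

# Fit a linear model that predicts price from rooms.
model = LinearRegression()
model.fit(df[["rooms"]], df["price"])
r_squared = model.score(df[["rooms"]], df["price"])
print(f"slope={model.coef_[0]:.2f}, R^2={r_squared:.3f}")
```

In a notebook, each of these steps would typically sit in its own cell so the intermediate output can be inspected before moving on.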

Section 3: Preparing to Train a Predictive Model (1 hour 15 min - instructor lecture + Q&A)

  • Discuss the steps for planning a classification strategy and learn about data preprocessing techniques
  • Get hands-on experience with data preprocessing in Jupyter using Pandas, including filling missing data, converting categorical features to numeric ones, and splitting data into training and testing sets
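
The three preprocessing steps named above could look roughly like the following. This is a minimal sketch on made-up data, assuming a recent pandas and scikit-learn; the column names (satisfaction, department, left) are hypothetical and not taken from the course dataset.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data with missing values and a categorical column.
df = pd.DataFrame({
    "satisfaction": [0.4, None, 0.9, 0.7, 0.2, None, 0.8, 0.5],
    "department": ["sales", "it", "sales", "hr", "it", "hr", "sales", "it"],
    "left": [1, 0, 0, 0, 1, 1, 0, 0],
})

# 1. Fill missing numeric values with the column median.
df["satisfaction"] = df["satisfaction"].fillna(df["satisfaction"].median())

# 2. Convert the categorical feature to numeric indicator columns.
features = pd.get_dummies(df.drop(columns="left"), columns=["department"])
target = df["left"]

# 3. Split into training and testing sets (25% held out for testing).
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.25, random_state=0
)
print(features.columns.tolist())
print(len(X_train), "training rows,", len(X_test), "testing rows")
```

Median filling is only one option; the right strategy for missing data depends on the dataset.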

DAY 2 (~4 hours)

Section 4: Training Predictive Models for the Employee-Retention Problem (2 hours - instructor lecture + Q&A)

  • Get hands-on experience with solving a real-world problem using machine learning
  • Apply previously learned data preprocessing methods to a realistic dataset
  • Train SVM, kNN and Random Forest models with Jupyter using scikit-learn, and compare their predictive capabilities using decision boundary visualizations
  • Learn how to assess model performance with k-fold cross-validation and validation curves
  • Learn about dimensionality reduction techniques like Principal Component Analysis (PCA)
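
To give a feel for comparing the three model types with k-fold cross-validation, here is a minimal sketch. It uses synthetic data from scikit-learn's make_classification rather than the employee-retention dataset, and the hyperparameters are arbitrary choices for illustration, not the values used in the course.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary classification data standing in for a real dataset.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, random_state=0
)

# SVM and kNN are distance-based, so they are paired with feature scaling.
models = {
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
    "kNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# 5-fold cross-validation: each model is trained and scored five times,
# each time holding out a different fifth of the data.
cv_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    cv_scores[name] = scores
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Comparing the mean and spread of the fold scores gives a more reliable picture than a single train/test split.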

Section 5: Web Scraping (1 hour 30 min - instructor lecture + Q&A)

  • Analyze how HTTP requests work and how to parse HTML responses
  • See how Jupyter can be used to programmatically scrape data from a webpage
  • Utilize Pandas DataFrames to store, transform and merge tables of web data
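
The parse, store and merge steps above can be sketched as follows. To keep the example self-contained it parses a hard-coded HTML string instead of making a live HTTP request; the table contents, region names and column names are all invented for illustration.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page. In a live scrape this
# string would come from something like requests.get(url).text.
html = """
<table>
  <tr><th>Region</th><th>Population</th></tr>
  <tr><td>Region A</td><td>1,200</td></tr>
  <tr><td>Region B</td><td>3,450</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Walk the table rows (skipping the header) and collect cleaned values.
rows = []
for tr in soup.find("table").find_all("tr")[1:]:
    name, population = [td.get_text(strip=True) for td in tr.find_all("td")]
    rows.append({"region": name, "population": int(population.replace(",", ""))})

populations = pd.DataFrame(rows)

# A second, equally hypothetical table to merge with the scraped one.
areas = pd.DataFrame({"region": ["Region A", "Region B"], "area_km2": [60, 115]})

# Merge on the shared key and derive a new column.
merged = populations.merge(areas, on="region")
merged["density"] = merged["population"] / merged["area_km2"]
print(merged)
```

Real pages are rarely this tidy, so expect to spend most of the effort on cleaning the extracted text before it fits into a DataFrame.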

Section 6: Interactive Visualizations (30 min - instructor lecture + Q&A)

  • Create and explore interactive visualizations directly inside Jupyter Notebooks

Wrap-up: Summary, Discussions (30 min)

  • Interactive Discussion

Course Curriculum

Beginning Data Analysis with Python and Jupyter

What's included?

2 Videos
2 Texts
1 PDF
Alex Galea
Data Analyst and Python Expert

About the instructor

Alex has been doing data analysis professionally since graduating with an M.Sc. in Physics from the University of Guelph in Canada. He developed a keen interest in Python while researching quantum gases as part of his graduate studies. More recently, Alex has been doing web-data analytics, where Python has continued to play a large part in his work. He frequently blogs about work and personal projects, which are generally data-centric and usually involve Python and Jupyter Notebooks.