The
Reinforcement
Learning
Workshop

Start with the basics of reinforcement learning and explore deep learning concepts such as deep Q-learning, deep recurrent Q-networks, and policy-based methods with this practical guide.

Get Started with Reinforcement Learning Today

You'll be up and running with reinforcement learning in no time at all.

$44.99

$44.99The Reinforcement Learning Workshop

Unlock one year of full, unlimited access!

Buy Now

Learning Made Simple

Nobody likes going through hundreds of pages of dry theory, or struggling with uninteresting examples that don’t compile. We've got you covered. Any time, any device.

Learn by doing real-world development, supported every step of the way with step-by-step examples and expert screencasts.
Become a verified practitioner and earn an authenticated digital certificate from Packt upon successful completion.
Manage your learning based on your personal schedule, with content that lets you pause and resume your progress at will.

A Smarter Way to Learn RL

A step-by-step, focused approach to getting up and running with real-world reinforcement learning in no time at all.

The Reinforcement Learning Workshop is ideal if you're looking for a structured, hands-on approach to get started with reinforcement learning.

Buy Now

Course Curriculum

An A to Z tour of reinforcement learning.

Workshop Onboarding
1. Introduction to Reinforcement Learning
2. Markov Decision Processes and Bellman Equations
- Overview
- Markov Processes
- Markov Chains
- Value Functions and Bellman Equations for MRPs
- Exercise 2.01: Finding the Value Function in an MRP
- Markov Decision Processes
- The State-Value Function and the Action-Value Function
- Bellman Optimality Equation
- Solving MDPs
- Exercise 2.02: Determining the Best Policy for an MDP Using Linear Programming
- Gridworld
- Activity 2.01: Solving Gridworld
- Activity 2.01: Solving Gridworld
- Summary
3. Deep Learning in Practice with TensorFlow 2
- Overview
- An Introduction to TensorFlow and Keras
- Keras
- Exercise 3.01: Building a Sequential Model with the Keras High-Level API
- How to Implement a Neural Network Using TensorFlow
- Loss Function Definition
- Learning Rate Scheduling
- Model Validation
- Model Improvement
- Standard Fully Connected Neural Networks
- Exercise 3.02: Building a Fully Connected Neural Network Model with the Keras High-Level API
- Convolutional Neural Networks
- Exercise 3.03: Building a Convolutional Neural Network Model with the Keras High-Level API
- Recurrent Neural Networks
- Exercise 3.04: Building a Recurrent Neural Network Model with the Keras High-Level API
- Simple Regression Using TensorFlow
- Exercise 3.05: Creating a Deep Neural Network to Predict the Fuel Efficiency of Cars
- Simple Classification Using TensorFlow
- Exercise 3.06: Creating a Deep Neural Network to Classify Events Generated by the ATLAS Experiment in the Quest for Higgs Boson
- TensorBoard – How to Visualize Data Using TensorBoard
- Exercise 3.07: Creating a Deep Neural Network to Classify Events Generated by the ATLAS Experiment in the Quest for the Higgs Boson Using TensorBoard for Visualization
- Activity 3.01: Classifying Fashion Clothes Using a TensorFlow Dataset and TensorFlow 2
- Activity 3.01: Classifying Fashion Clothes Using a TensorFlow Dataset and TensorFlow 2
- Summary
4. Getting Started with OpenAI and TensorFlow for Reinforcement Learning
- Overview
- OpenAI Gym
- How to Interact with a Gym Environment
- Exercise 4.01: Interacting with the Gym Environment
- Action and Observation Spaces
- How to Implement a Custom Gym Environment
- OpenAI Universe – Complex Environment
- Environments
- Running an OpenAI Universe Environment
- TensorFlow for Reinforcement Learning
- Exercise 4.02: Building a Policy Network with TensorFlow
- Exercise 4.03: Feeding the Policy Network with Environment State Representation
- How to Save a Policy Network
- OpenAI Baselines
- Training an RL Agent to Solve a Classic Control Problem
- Exercise 4.04: Solving a CartPole Environment with the PPO Algorithm
- Activity 4.01: Training a Reinforcement Learning Agent to Play a Classic Video Game
- Activity 4.01: Training a Reinforcement Learning Agent to Play a Classic Video Game
- Summary
- Survey I
5. Dynamic Programming
- Overview
- Solving Dynamic Programming Problems
- Memoization
- Exercise 5.01: Memoization in Practice
- Exercise 5.02: The Tabular Method in Practice
- Identifying Dynamic Programming Problems
- Exercise 5.03: Solving the Coin-Change Problem
- Dynamic Programming in RL
- OpenAI Gym: Taxi-v3 Environment
- Policy Iteration
- Value Iteration
- Activity 5.01: Implementing Policy and Value Iteration on the FrozenLake-v0 Environment
- Activity 5.01: Implementing Policy and Value Iteration on the FrozenLake-v0 Environment
- Summary
6. Monte Carlo Methods
- Overview
- The Workings of Monte Carlo Methods
- Exercise 6.01: Implementing Monte Carlo in Blackjack
- Types of Monte Carlo Methods
- Exercise 6.02: First Visit Monte Carlo Prediction for Estimating the Value Function in Blackjack
- Every Visit Monte Carlo Prediction for Estimating the Value Function
- Exercise 6.03: Every Visit Monte Carlo Prediction for Estimating the Value Function
- Exploration versus Exploitation Trade-Off
- The Pseudocode for Monte Carlo Off-Policy Evaluation
- Exercise 6.04: Importance Sampling with Monte Carlo
- Solving Frozen Lake Using Monte Carlo
- Activity 6.01: Exploring the Frozen Lake Problem – the Reward Function
- Activity 6.01: Exploring the Frozen Lake Problem – the Reward Function
- The Pseudocode for Every Visit Monte Carlo Control for Epsilon Soft
- Activity 6.02: Solving Frozen Lake Using Monte Carlo Control Every Visit Epsilon Soft
- Activity 6.02: Solving Frozen Lake Using Monte Carlo Control Every Visit Epsilon Soft
- Summary
7. Temporal Difference Learning
- Overview
- TD(0) – SARSA and Q-Learning
- Exercise 7.01: Using TD(0) SARSA to Solve FrozenLake-v0 Deterministic Transitions
- The Stochasticity Test
- Exercise 7.02: Using TD(0) SARSA to Solve FrozenLake-v0 Stochastic Transitions
- Q-Learning – Off-Policy Control
- Exercise 7.03: Using TD(0) Q-Learning to Solve FrozenLake-v0 Deterministic Transitions
- Expected SARSA
- N-Step TD and TD(λ) Algorithms
- N-step SARSA
- TD(λ)
- Exercise 7.04: Using TD(λ) SARSA to Solve FrozenLake-v0 Deterministic Transitions
- Exercise 7.05: Using TD(λ) SARSA to Solve FrozenLake-v0 Stochastic Transitions
- The Relationship between DP, Monte-Carlo, and TD Learning
- Activity 7.01: Using TD(0) Q-Learning to Solve FrozenLake-v0 Stochastic Transitions
- Activity 7.01: Using TD(0) Q-Learning to Solve FrozenLake-v0 Stochastic Transitions
- Summary
8. The Multi-Armed Bandit Problem
- Overview
- Formulation of the MAB Problem
- Background and Terminology
- The Python Interface
- The Greedy Algorithm
- The Explore-then-Commit Algorithm
- The ε-Greedy Algorithm
- Exercise 8.01: Implementing the ε-Greedy Algorithm
- The Softmax Algorithm
- The UCB Algorithm
- Exercise 8.02: Implementing the UCB Algorithm
- Thompson Sampling
- The Thompson Sampling Algorithm
- Exercise 8.03: Implementing the Thompson Sampling Algorithm
- Contextual Bandits
- Queueing Bandits
- Activity 8.01: Queueing Bandits
- Activity 8.01: Queueing Bandits
- Summary
- Survey II
9. What Is Deep Q Learning?
- Overview
- Basics of Deep Learning
- Basics of PyTorch
- Exercise 9.01: Building a Simple Deep Learning Model in PyTorch
- PyTorch Utilities
- The State-Value Function and the Bellman Equation
- The Action-Value Function (Q Value Function)
- Implementing Q Learning to Find Optimal Actions
- OpenAI Gym Review
- Exercise 9.02: Implementing the Q Learning Tabular Method
- Deep Q Learning
- Exercise 9.03: Implementing a Working DQN Network with PyTorch in a CartPole-v0 Environment
- Challenges in DQN
- The Concept of a Target Network
- Exercise 9.04: Implementing a Working DQN Network with Experience Replay and a Target Network in PyTorch
- The Challenge of Overestimation in a DQN
- Double Deep Q Network (DDQN)
- Activity 9.01: Implementing a Double Deep Q Network in PyTorch for the CartPole Environment
- Activity 9.01: Implementing a Double Deep Q Network in PyTorch for the CartPole Environment
- Summary
10. Playing an Atari Game with Deep Recurrent Q Networks
- Overview
- Understanding the Breakout Environment
- Exercise 10.01: Playing Breakout with a Random Agent
- CNNs in TensorFlow
- Exercise 10.02: Designing a CNN Model with TensorFlow
- Combining a DQN with a CNN
- Activity 10.01: Training a DQN with CNNs to Play Breakout
- Activity 10.01: Training a DQN with CNNs to Play Breakout
- RNNs in TensorFlow
- Exercise 10.03: Designing a Combination of CNN and RNN Models with TensorFlow
- Building a DRQN
- Activity 10.02: Training a DRQN to Play Breakout
- Activity 10.02: Training a DRQN to Play Breakout
- Introduction to the Attention Mechanism and DARQN
- Activity 10.03: Training a DARQN to Play Breakout
- Activity 10.03: Training a DARQN to Play Breakout
- Summary
11. Policy-Based Methods for Reinforcement Learning
- Overview
- Introduction to Value-Based and Model-Based RL
- Policy Gradients
- Exercise 11.01: Landing a Spacecraft on the Lunar Surface Using Policy Gradients and the Actor-Critic Method
- Deep Deterministic Policy Gradients
- The Actor-Critic Model
- Exercise 11.02: Creating a Learning Agent
- Activity 11.01: Creating an Agent That Learns a Model Using DDPG
- Activity 11.01: Creating an Agent That Learns a Model Using DDPG
- Improving Policy Gradients
- Exercise 11.03: Improving the Lunar Lander Example Using PPO
- The Advantage Actor-Critic Method
- Activity 11.02: Loading the Saved Policy to Run the Lunar Lander Simulation
- Activity 11.02: Loading the Saved Policy to Run the Lunar Lander Simulation
- Summary
12. Evolutionary Strategies for RL
- Overview
- Problems with Gradient-Based Methods
- Exercise 12.01: Optimization Using Stochastic Gradient Descent
- Introduction to Genetic Algorithms
- Exercise 12.02: Implementing Fixed-Value and Uniform Distribution Optimization Using GAs
- Components: Population Creation
- Exercise 12.03: Population Creation
- Components: Parent Selection
- Exercise 12.04: Implementing the Tournament and Roulette Wheel Techniques
- Components: Crossover Application
- Exercise 12.05: Crossover for a New Generation
- Components: Population Mutation
- Exercise 12.06: New Generation Development Using Mutation
- Application to Hyperparameter Selection
- Exercise 12.07: Implementing GA Hyperparameter Optimization for RNN Training
- NEAT and Other Formulations
- Exercise 12.08: XNOR Gate Functionality Using NEAT
- Activity 12.01: Cart-Pole Activity
- Activity 12.01: Cart-Pole Activity
- Summary
- Survey III
13. Recent Advancements and Next Steps
- Overview
- Next-Generation RL – One-Shot Learning and Transferable Domain Priors
- Exercise 13.01: Implementing Transfer Learning for Image Recognition
- Model-Based Reinforcement Learning
- Exercise 13.02: Implementing Q-Learning for the FrozenLake-v0 Environment
- Learning from Human Preference
- Exercise 13.03: Demonstration Capture
- Hindsight Experience Replay
- Exercise 13.04: Hindsight Experience Replay Class
- Deep Q-Learning from Demonstrations
- Exercise 13.05: Class Development of a Deep Q-Learning Agent from Demonstrations
- Hierarchical Reinforcement Learning
- Exercise 13.06: Q-Table Update Using Feudal Q-Learning
- Inverse Reinforcement Learning
- Exercise 13.07: Implementing Inverse Reinforcement Learning for MountainCar
- Cautionary Notes – AI Winter and Superintelligence
- Activity 13.01: Solving MountainCar with Experience Replay DQN
- Activity 13.01: Solving MountainCar with Experience Replay DQN
- Summary

Hack Your Brain

We've applied the latest pedagogical techniques to deliver a truly multimodal experience. It'll keep you engaged and make the learning stick. It's science!
Build Real Things

Nobody likes wasting their time. We cut right to the action and get you building real skills that real, working developers value. The perfect approach for a career move.
Learn From Experts

We've paired technical experts with top editorial talent. They've worked hard to deliver you the maximum impact for each minute you spend learning. It's our secret sauce.

Get Verified

Complete The Reinforcement Learning Workshop to unlock your very own Packt certificate.

Unlock your own digital certificate by completing all activities. Designed to be easy to share with potential employers on LinkedIn, as well as other popular social media channels.

Take A Step Forward

There has never been a better time to get started with reinforcement learning.

$44.99

$44.99The Reinforcement Learning Workshop

Unlock one year of full, unlimited access!

Buy Now

Need More Python Data Science?

Don't worry, we've got your back with other workshops too!

Show me my options!

$44.99

$44.99The Reinforcement Learning Workshop

Hack Your Brain

Build Real Things

Learn From Experts

$44.99

$44.99The Reinforcement Learning Workshop