Course Curriculum
An A to Z tour of reinforcement learning.

1. Introduction to Reinforcement Learning
 Overview
 Learning Paradigms
 Supervised versus Unsupervised versus RL
 Classifying Common Problems into Learning Scenarios
 Fundamentals of Reinforcement Learning
 Elements of RL
 An Example of an Autonomous Driving Environment
 Exercise 1.01: Implementing a Toy Environment Using Python
 The Agent-Environment Interface
 Environment Types
 An Action and Its Types
 Policy
 Policy Parameterizations
 Exercise 1.02: Implementing a Linear Policy
 Goals and Rewards
 Reinforcement Learning Frameworks
 Getting Started with Gym – CartPole
 Rendering an Environment
 Exercise 1.03: Creating a Space for Image Observations
 Exercise 1.04: Implementing the Reinforcement Learning Loop with Gym
 Activity 1.01: Measuring the Performance of a Random Agent
 OpenAI Baselines
 Applications of Reinforcement Learning
 Summary

2. Markov Decision Processes and Bellman Equations
 Overview
 Markov Processes
 Markov Chains
 Value Functions and Bellman Equations for MRPs
 Exercise 2.01: Finding the Value Function in an MRP
 Markov Decision Processes
 The State-Value Function and the Action-Value Function
 Bellman Optimality Equation
 Solving MDPs
 Exercise 2.02: Determining the Best Policy for an MDP Using Linear Programming
 Gridworld
 Activity 2.01: Solving Gridworld
 Summary

3. Deep Learning in Practice with TensorFlow 2
 Overview
 An Introduction to TensorFlow and Keras
 Keras
 Exercise 3.01: Building a Sequential Model with the Keras High-Level API
 How to Implement a Neural Network Using TensorFlow
 Loss Function Definition
 Learning Rate Scheduling
 Model Validation
 Model Improvement
 Standard Fully Connected Neural Networks
 Exercise 3.02: Building a Fully Connected Neural Network Model with the Keras High-Level API
 Convolutional Neural Networks
 Exercise 3.03: Building a Convolutional Neural Network Model with the Keras High-Level API
 Recurrent Neural Networks
 Exercise 3.04: Building a Recurrent Neural Network Model with the Keras High-Level API
 Simple Regression Using TensorFlow
 Exercise 3.05: Creating a Deep Neural Network to Predict the Fuel Efficiency of Cars
 Simple Classification Using TensorFlow
 Exercise 3.06: Creating a Deep Neural Network to Classify Events Generated by the ATLAS Experiment in the Quest for the Higgs Boson
 TensorBoard – How to Visualize Data Using TensorBoard
 Exercise 3.07: Creating a Deep Neural Network to Classify Events Generated by the ATLAS Experiment in the Quest for the Higgs Boson Using TensorBoard for Visualization
 Activity 3.01: Classifying Fashion Clothes Using a TensorFlow Dataset and TensorFlow 2
 Summary

4. Getting Started with OpenAI and TensorFlow for Reinforcement Learning
 Overview
 OpenAI Gym
 How to Interact with a Gym Environment
 Exercise 4.01: Interacting with the Gym Environment
 Action and Observation Spaces
 How to Implement a Custom Gym Environment
 OpenAI Universe – Complex Environment
 Environments
 Running an OpenAI Universe Environment
 TensorFlow for Reinforcement Learning
 Exercise 4.02: Building a Policy Network with TensorFlow
 Exercise 4.03: Feeding the Policy Network with Environment State Representation
 How to Save a Policy Network
 OpenAI Baselines
 Training an RL Agent to Solve a Classic Control Problem
 Exercise 4.04: Solving a CartPole Environment with the PPO Algorithm
 Activity 4.01: Training a Reinforcement Learning Agent to Play a Classic Video Game
 Summary
 Survey I

5. Dynamic Programming
 Overview
 Solving Dynamic Programming Problems
 Memoization
 Exercise 5.01: Memoization in Practice
 Exercise 5.02: The Tabular Method in Practice
 Identifying Dynamic Programming Problems
 Exercise 5.03: Solving the Coin-Change Problem
 Dynamic Programming in RL
 OpenAI Gym: Taxi-v3 Environment
 Policy Iteration
 Value Iteration
 Activity 5.01: Implementing Policy and Value Iteration on the FrozenLake-v0 Environment
 Summary

6. Monte Carlo Methods
 Overview
 The Workings of Monte Carlo Methods
 Exercise 6.01: Implementing Monte Carlo in Blackjack
 Types of Monte Carlo Methods
 Exercise 6.02: First Visit Monte Carlo Prediction for Estimating the Value Function in Blackjack
 Every Visit Monte Carlo Prediction for Estimating the Value Function
 Exercise 6.03: Every Visit Monte Carlo Prediction for Estimating the Value Function
 Exploration versus Exploitation Trade-Off
 The Pseudocode for Monte Carlo Off-Policy Evaluation
 Exercise 6.04: Importance Sampling with Monte Carlo
 Solving Frozen Lake Using Monte Carlo
 Activity 6.01: Exploring the Frozen Lake Problem – the Reward Function
 The Pseudocode for Every Visit Monte Carlo Control for Epsilon Soft
 Activity 6.02: Solving Frozen Lake Using Monte Carlo Control Every Visit Epsilon Soft
 Summary

7. Temporal Difference Learning
 Overview
 TD(0) – SARSA and Q-Learning
 Exercise 7.01: Using TD(0) SARSA to Solve FrozenLake-v0 Deterministic Transitions
 The Stochasticity Test
 Exercise 7.02: Using TD(0) SARSA to Solve FrozenLake-v0 Stochastic Transitions
 Q-Learning – Off-Policy Control
 Exercise 7.03: Using TD(0) Q-Learning to Solve FrozenLake-v0 Deterministic Transitions
 Expected SARSA
 N-Step TD and TD(λ) Algorithms
 N-Step SARSA
 TD(λ)
 Exercise 7.04: Using TD(λ) SARSA to Solve FrozenLake-v0 Deterministic Transitions
 Exercise 7.05: Using TD(λ) SARSA to Solve FrozenLake-v0 Stochastic Transitions
 The Relationship between DP, Monte Carlo, and TD Learning
 Activity 7.01: Using TD(0) Q-Learning to Solve FrozenLake-v0 Stochastic Transitions
 Summary

8. The Multi-Armed Bandit Problem
 Overview
 Formulation of the MAB Problem
 Background and Terminology
 The Python Interface
 The Greedy Algorithm
 The Explore-then-Commit Algorithm
 The ε-Greedy Algorithm
 Exercise 8.01: Implementing the ε-Greedy Algorithm
 The Softmax Algorithm
 The UCB Algorithm
 Exercise 8.02: Implementing the UCB Algorithm
 Thompson Sampling
 The Thompson Sampling Algorithm
 Exercise 8.03: Implementing the Thompson Sampling Algorithm
 Contextual Bandits
 Queueing Bandits
 Activity 8.01: Queueing Bandits
 Summary
 Survey II

9. What Is Deep Q Learning?
 Overview
 Basics of Deep Learning
 Basics of PyTorch
 Exercise 9.01: Building a Simple Deep Learning Model in PyTorch
 PyTorch Utilities
 The State-Value Function and the Bellman Equation
 The Action-Value Function (Q Value Function)
 Implementing Q Learning to Find Optimal Actions
 OpenAI Gym Review
 Exercise 9.02: Implementing the Q Learning Tabular Method
 Deep Q Learning
 Exercise 9.03: Implementing a Working DQN Network with PyTorch in a CartPole-v0 Environment
 Challenges in DQN
 The Concept of a Target Network
 Exercise 9.04: Implementing a Working DQN Network with Experience Replay and a Target Network in PyTorch
 The Challenge of Overestimation in a DQN
 Double Deep Q Network (DDQN)
 Activity 9.01: Implementing a Double Deep Q Network in PyTorch for the CartPole Environment
 Summary

10. Playing an Atari Game with Deep Recurrent Q Networks
 Overview
 Understanding the Breakout Environment
 Exercise 10.01: Playing Breakout with a Random Agent
 CNNs in TensorFlow
 Exercise 10.02: Designing a CNN Model with TensorFlow
 Combining a DQN with a CNN
 Activity 10.01: Training a DQN with CNNs to Play Breakout
 RNNs in TensorFlow
 Exercise 10.03: Designing a Combination of CNN and RNN Models with TensorFlow
 Building a DRQN
 Activity 10.02: Training a DRQN to Play Breakout
 Introduction to the Attention Mechanism and DARQN
 Activity 10.03: Training a DARQN to Play Breakout
 Summary

11. Policy-Based Methods for Reinforcement Learning
 Overview
 Introduction to Value-Based and Model-Based RL
 Policy Gradients
 Exercise 11.01: Landing a Spacecraft on the Lunar Surface Using Policy Gradients and the Actor-Critic Method
 Deep Deterministic Policy Gradients
 The Actor-Critic Model
 Exercise 11.02: Creating a Learning Agent
 Activity 11.01: Creating an Agent That Learns a Model Using DDPG
 Improving Policy Gradients
 Exercise 11.03: Improving the Lunar Lander Example Using PPO
 The Advantage Actor-Critic Method
 Activity 11.02: Loading the Saved Policy to Run the Lunar Lander Simulation
 Summary

12. Evolutionary Strategies for RL
 Overview
 Problems with Gradient-Based Methods
 Exercise 12.01: Optimization Using Stochastic Gradient Descent
 Introduction to Genetic Algorithms
 Exercise 12.02: Implementing Fixed-Value and Uniform Distribution Optimization Using GAs
 Components: Population Creation
 Exercise 12.03: Population Creation
 Components: Parent Selection
 Exercise 12.04: Implementing the Tournament and Roulette Wheel Techniques
 Components: Crossover Application
 Exercise 12.05: Crossover for a New Generation
 Components: Population Mutation
 Exercise 12.06: New Generation Development Using Mutation
 Application to Hyperparameter Selection
 Exercise 12.07: Implementing GA Hyperparameter Optimization for RNN Training
 NEAT and Other Formulations
 Exercise 12.08: XNOR Gate Functionality Using NEAT
 Activity 12.01: CartPole Activity
 Summary
 Survey III

13. Recent Advancements and Next Steps
 Overview
 Next-Generation RL – One-Shot Learning and Transferable Domain Priors
 Exercise 13.01: Implementing Transfer Learning for Image Recognition
 Model-Based Reinforcement Learning
 Exercise 13.02: Implementing Q-Learning for the FrozenLake-v0 Environment
 Learning from Human Preference
 Exercise 13.03: Demonstration Capture
 Hindsight Experience Replay
 Exercise 13.04: Hindsight Experience Replay Class
 Deep Q-Learning from Demonstrations
 Exercise 13.05: Class Development of a Deep Q-Learning Agent from Demonstrations
 Hierarchical Reinforcement Learning
 Exercise 13.06: Q-Table Update Using Feudal Q-Learning
 Inverse Reinforcement Learning
 Exercise 13.07: Implementing Inverse Reinforcement Learning for MountainCar
 Cautionary Notes – AI Winter and Superintelligence
 Activity 13.01: Solving MountainCar with Experience Replay DQN
 Summary
