
Showing posts from August, 2018

Q Learning Explained with Example Code

Intro to Reinforcement Learning and Q Learning

Q learning is an off-policy TD reinforcement learning algorithm. Reinforcement learning consists of an agent, an environment, and a reward system. The agent performs an action in its current environment and receives a reward for it. The goal of the agent is to maximize the expected cumulative reward. We can see this visualized in the agent-environment loop depicted below.

[Figure: the agent-environment loop. Source: https://devblogs.nvidia.com/train-reinforcement-learning-agents-openai-gym/]

Q learning is a TD, or Temporal Difference, learning algorithm. This means that our algorithm learns at every time step (each pass through the loop in the above diagram) by updating its estimate of how good each action is in each state, rather than waiting until the end of an episode. In fact, the Q in Q learning is a function that takes in a state “s” and an action “a” and returns the expected cumulative reward for taking action “a” in state “s”. Q learning seeks to approximate the optimal action-value function Q*. Another important detail to note about Q
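To make the agent-environment loop and the per-step TD update more concrete, here is a minimal tabular Q learning sketch in Python. It is not the post's own example code: the corridor environment and the `step`, `epsilon_greedy` and `q_learning` helpers are hypothetical, and so are the hyperparameters (learning rate `alpha`, discount factor `gamma`, exploration rate `epsilon`). At every time step it applies the standard Q learning update Q(s, a) ← Q(s, a) + α[r + γ max_a' Q(s', a') − Q(s, a)].

```python
import random

# Hypothetical toy environment (not from the original post): a 1-D corridor of
# N_STATES cells. The agent starts in cell 0; stepping into the last cell ends
# the episode with reward +1, and every other step gives reward 0.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Environment side of the loop: return (next_state, reward, done)."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def epsilon_greedy(q, state, epsilon, rng):
    """Behaviour policy: random action with probability epsilon, else greedy (ties broken randomly)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q learning; q[s][a] estimates the discounted return of action a in state s."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Agent side of the loop: pick an action in the current state.
            action = epsilon_greedy(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # TD target: bootstrap from the best next action (max), whatever action
            # the epsilon-greedy policy actually takes next. This is what makes it off-policy.
            target = reward if done else reward + gamma * max(q[next_state])
            # Temporal-difference update, applied at every time step.
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
# Greedy action in each non-terminal cell (1 = move right).
greedy_policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)  # → [1, 1, 1, 1]
```

Because the TD target always uses the max over the next state's actions while the agent itself keeps exploring with ε-greedy, the table approximates Q* (the values of the greedy policy) rather than the values of the exploring behaviour. In this toy corridor the greedy policy comes out as "move right" in every non-terminal cell.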