Q-learning
Q-learning is a reinforcement learning technique that works by learning an action-value function giving the expected utility of taking a given action in a given state and following a fixed policy thereafter. A strength of Q-learning is that it can compare the expected utility of the available actions without requiring a model of the environment. A recent variation called delayed Q-learning has shown substantial improvements, bringing probably approximately correct (PAC) bounds to Markov decision processes.
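Written out, this action-value function for a fixed policy π is commonly defined as follows (a standard textbook formulation, not taken from this article; r_t denotes the reward received at step t, and φ is the discount rate introduced in the next section):

Q^{\pi}(s, a) = \mathbb{E}\left[ \sum_{t=0}^{\infty} \phi^{t} r_t \;\middle|\; s_0 = s,\ a_0 = a,\ a_t \sim \pi(s_t) \text{ for } t \ge 1 \right]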
Algorithm
The core of the algorithm is a simple value iteration update. For each state s in the state set S and each action a in the action set A, we can calculate an update to the expected discounted reward Q(s, a) with the following expression:

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r + \phi \max_{a \in A} Q(s_{t+1}, a) - Q(s_t, a_t) \right]

where r is the observed real reward after taking action a_t in state s_t, α is the learning rate such that 0 ≤ α ≤ 1, and φ is the discount rate such that 0 ≤ φ ≤ 1.
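As a concrete sketch of how this update is applied in practice, the following Python snippet implements a tabular version. The environment interface used here (env.reset(), env.step(action) returning the next state, reward, and a done flag, plus an env.actions list) and the ε-greedy exploration scheme are assumptions made for illustration; they are not specified in this article.

import random
from collections import defaultdict

def q_learning(env, num_episodes=500, alpha=0.5, phi=0.95, epsilon=0.1):
    """Tabular Q-learning sketch.

    Assumes a hypothetical env exposing reset() -> state,
    step(action) -> (next_state, reward, done), and an actions list.
    alpha is the learning rate and phi the discount rate from the
    update rule above.
    """
    Q = defaultdict(float)  # Q[(state, action)], defaults to 0.0

    for _ in range(num_episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection over the environment's actions.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])

            next_state, reward, done = env.step(action)

            # Q-learning update: move Q(s, a) toward the observed reward
            # plus the discounted best value of the next state.
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + phi * best_next - Q[(state, action)])

            state = next_state
    return Q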
External links
- Watkins, C.J.C.H. (1989). Learning from Delayed Rewards. PhD thesis, Cambridge University, Cambridge, England.
- Q-Learning
- Q-Learning by examples
- Reinforcement Learning online book
- Connectionist Q-learning Java Framework
- Piqle: a Generic Java Platform for Reinforcement Learning
- Online demonstration of Q-learning (bug in a maze)