Temporal difference model
From Wikipedia, the free encyclopedia
The temporal difference model is a real-time model of classical conditioning. The primary idea behind the TD model is that the prediction at each time step is the discounted sum of all future reinforcement.
Math
Let $\lambda_t$ be the reinforcement on time step $t$. Let $\bar{V}_t$ be the correct prediction, equal to the discounted sum of all future reinforcement:

$$\bar{V}_t = \sum_{i=0}^{\infty} \gamma^i \lambda_{t+i}$$

where the discount factor $\gamma$ satisfies $0 \le \gamma < 1$, so that reinforcement at distant time steps counts for less. This sum can be written recursively:

$$\bar{V}_t = \lambda_t + \gamma \bar{V}_{t+1}$$

Thus, the reinforcement is the difference between the ideal prediction and the discounted ideal prediction at the next time step:

$$\lambda_t = \bar{V}_t - \gamma \bar{V}_{t+1}$$

Putting this reinforcement term into the Sutton–Barto model yields the temporal difference model, which moves the current prediction $V_t$ toward the one-step target $\lambda_t + \gamma V_{t+1}$:

$$V_t \leftarrow V_t + \alpha \left( \lambda_t + \gamma V_{t+1} - V_t \right)$$

where $\alpha$ is the learning-rate parameter.
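The update described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the source: the reinforcement sequence, the step size $\alpha$, the discount $\gamma$, and the number of training sweeps are all assumed values chosen for the example.

```python
GAMMA = 0.9   # discount factor, 0 <= gamma < 1 (assumed value)
ALPHA = 0.1   # learning-rate parameter (assumed value)

def td_sweep(V, rewards, alpha=ALPHA, gamma=GAMMA):
    """One sweep of the update V_t <- V_t + alpha*(lambda_t + gamma*V_{t+1} - V_t)."""
    for t in range(len(rewards)):
        # Prediction after the final step is taken to be zero.
        v_next = V[t + 1] if t + 1 < len(V) else 0.0
        V[t] += alpha * (rewards[t] + gamma * v_next - V[t])
    return V

# Hypothetical reinforcement lambda_t: a single reward on the last time step.
rewards = [0.0, 0.0, 0.0, 1.0]
V = [0.0] * len(rewards)

for _ in range(500):          # repeat sweeps until the predictions settle
    td_sweep(V, rewards)

# Each V[t] approaches the discounted sum of future reinforcement,
# i.e. gamma raised to the number of steps remaining before the reward.
```

After training, the learned predictions approximate the ideal values $\bar{V}_t = \gamma^{3-t}$ for this reward sequence, showing that the one-step update converges to the discounted-sum definition of the prediction.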
Reference
Sutton, R.S., Barto, A.G. (1990). "Time-Derivative Models of Pavlovian Reinforcement." In Gabriel, M., Moore, J. (eds.), Learning and Computational Neuroscience: Foundations of Adaptive Networks. MIT Press.