Temporal difference model

From Wikipedia, the free encyclopedia

The temporal difference (TD) model is a real-time model of classical conditioning. The primary idea behind the TD model is that the prediction at each time step is calculated as the discounted sum of all future reinforcements.

Math

Let \lambda_t be the reinforcement at time step t, and let \bar V_t be the correct prediction, namely the discounted sum of all future reinforcements:

\bar V_t = \sum_{i=0}^{\infty} \gamma^i \lambda_{t+i+1}
0 \le \gamma < 1

Splitting off the first term of the sum gives a recursive form:

\bar V_t = \lambda_{t+1} + \gamma \sum_{i=0}^{\infty} \gamma^i \lambda_{t+i+2}
\bar V_t = \lambda_{t+1} + \gamma \bar V_{t+1}
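
As a rough numerical check (not part of the original article), the following Python sketch computes \bar V_t for a short, made-up reinforcement sequence both directly from the discounted sum and through the recursion above; the sequence, discount factor, and function names are illustrative assumptions.

# Illustrative sketch: the ideal prediction as a discounted sum vs. its recursive form.
# lam[k] holds the assumed reinforcement lambda_{k+1}, so bar-V_t = sum_i gamma^i * lam[t+i].

GAMMA = 0.9                              # discount factor, 0 <= gamma < 1
lam = [0.0, 0.0, 1.0, 0.0, 0.5]          # made-up reinforcement sequence

def v_bar_sum(t):
    """Direct form: bar-V_t = sum_{i>=0} gamma^i * lambda_{t+i+1}, truncated at the sequence end."""
    return sum(GAMMA ** i * lam[t + i] for i in range(len(lam) - t))

def v_bar_recursive(t):
    """Recursive form: bar-V_t = lambda_{t+1} + gamma * bar-V_{t+1}."""
    if t >= len(lam):
        return 0.0
    return lam[t] + GAMMA * v_bar_recursive(t + 1)

for t in range(len(lam)):
    assert abs(v_bar_sum(t) - v_bar_recursive(t)) < 1e-12    # both forms agree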

The effective reinforcement at time step t is then the difference between the recursive estimate of the ideal prediction, \lambda_{t+1} + \gamma \bar V_{t+1}, and the current prediction \bar V_t:

R_t = \lambda_{t+1} + \gamma \bar V_{t+1} - \bar V_{t}
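
In code, this error term might be computed as follows (a minimal sketch; the function and argument names are assumptions, not from the article):

def td_error(lam_next, v_bar_next, v_bar_now, gamma):
    """R_t = lambda_{t+1} + gamma * bar-V_{t+1} - bar-V_t (illustrative helper)."""
    return lam_next + gamma * v_bar_next - v_bar_now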

Substituting this reinforcement term into the Sutton–Barto model yields the temporal difference model:

\Delta V_i = \beta (\lambda_{t+1} + \gamma \bar V_{t+1} - \bar V_{t}) \alpha_i \bar X_i
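
A minimal simulation sketch of this update rule is given below; the two-stimulus trial structure, the exponentially decaying stimulus trace \bar X_i, the rectified prediction, and all parameter values are illustrative assumptions rather than details taken from the article.

import numpy as np

# Sketch of the TD-model update over repeated trials (assumed setup):
# two conditioned stimuli, an exponentially decaying stimulus trace x_bar,
# and a single reinforcement (US) delivered at a fixed time step.

gamma, beta = 0.95, 0.1          # discount factor and learning rate (assumed values)
alpha = np.array([0.5, 0.5])     # per-stimulus salience alpha_i (assumed values)
V = np.zeros(2)                  # associative strengths V_i

T = 10
X = np.zeros((T, 2))             # stimulus presence over one trial
X[2:6, 0] = 1.0                  # CS1 present at steps 2-5
X[4:6, 1] = 1.0                  # CS2 present at steps 4-5
lam = np.zeros(T + 1)
lam[6] = 1.0                     # reinforcement (US) delivered at step 6

for trial in range(200):
    x_bar = np.zeros(2)                               # stimulus trace, reset each trial
    for t in range(T):
        x_bar = 0.8 * x_bar + X[t]                    # assumed trace dynamics
        v_t = max(0.0, float(V @ X[t]))               # current prediction, rectified at zero
        v_next = max(0.0, float(V @ X[t + 1])) if t + 1 < T else 0.0
        delta = lam[t + 1] + gamma * v_next - v_t     # TD error R_t
        V += beta * delta * alpha * x_bar             # Delta V_i = beta * R_t * alpha_i * X-bar_i

print(V)                         # learned associative strengths after training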

References

Sutton, R. S., & Barto, A. G. (1990). Time-derivative models of Pavlovian reinforcement. In M. Gabriel & J. Moore (Eds.), Learning and Computational Neuroscience: Foundations of Adaptive Networks. Cambridge, MA: MIT Press.