Markov reward model
In probability theory, a Markov reward model or Markov reward process is a stochastic process that extends either a discrete-time Markov chain or a continuous-time Markov chain by attaching a reward rate to each state. An additional variable records the reward accumulated up to the current time.[1] Features of interest in the model include the expected reward at a given time and the expected time to accumulate a given reward.[2] The model appears in Ronald A. Howard's 1971 book Dynamic Probabilistic Systems.[3] Such models are often studied in the context of Markov decision processes, where a decision strategy can affect the rewards received.
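For the discrete-time case, the expected accumulated reward over a horizon can be computed by propagating the state distribution and summing the per-step expected reward. The following sketch uses a hypothetical two-state chain; the transition probabilities and reward values are illustrative, not taken from the article.

```python
# Hypothetical two-state example (states 0 and 1); P and r are
# made-up values chosen only to illustrate the computation.
P = [[0.9, 0.1],
     [0.5, 0.5]]   # transition matrix: P[i][j] = Pr(next = j | current = i)
r = [1.0, 0.0]     # reward earned per step spent in each state

def expected_accumulated_reward(P, r, dist, steps):
    """Expected total reward over `steps` transitions,
    starting from the initial state distribution `dist`."""
    n = len(P)
    total = 0.0
    for _ in range(steps):
        # reward collected this step, weighted by the current distribution
        total += sum(dist[i] * r[i] for i in range(n))
        # propagate the distribution one step: dist' = dist . P
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return total

print(expected_accumulated_reward(P, r, [1.0, 0.0], 10))
```

Starting in state 0, the per-step expected reward decays from 1 toward the stationary value 5/6, so the 10-step total is roughly 8.6 for these example numbers.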
The Markov Reward Model Checker tool can be used to numerically compute transient and stationary properties of Markov reward models.
Continuous-time Markov chain
The accumulated reward at a time t can be computed numerically over the time domain, or by solving, with transform methods or finite difference methods, the linear hyperbolic system of equations that describes the accumulated reward.[4]
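As a minimal sketch of the time-domain approach, the expected accumulated reward E[Y(t)] = ∫₀ᵗ π(s)·r ds can be approximated by stepping the Kolmogorov forward equation dπ/ds = πQ with a simple forward-Euler (finite difference) scheme. The two-state generator Q and reward rates below are assumed for illustration only.

```python
# Illustrative two-state CTMC (e.g. an up/down reliability model);
# the generator Q and reward rates r are assumptions, not from the text.
Q = [[-0.2,  0.2],
     [ 1.0, -1.0]]   # generator matrix: off-diagonal rates, rows sum to 0
r = [1.0, 0.0]       # reward rate while sojourning in each state

def expected_accumulated_reward(Q, r, pi0, t, steps=10000):
    """Approximate E[Y(t)] = integral of pi(s).r over [0, t] by
    forward-Euler steps on dpi/ds = pi.Q (a finite difference scheme)."""
    n = len(Q)
    h = t / steps
    pi = list(pi0)
    total = 0.0
    for _ in range(steps):
        # reward accrued over this small interval of length h
        total += h * sum(pi[i] * r[i] for i in range(n))
        # advance the transient state distribution one Euler step
        pi = [pi[j] + h * sum(pi[i] * Q[i][j] for i in range(n))
              for j in range(n)]
    return total

print(expected_accumulated_reward(Q, r, [1.0, 0.0], t=5.0))
```

For this example the closed form is ∫₀⁵ (5/6 + (1/6)e^(−1.2s)) ds ≈ 4.305, which the Euler approximation matches to within the step-size error.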
References
- Begain, K.; Bolch, G.; Herold, H. (2001). "Theoretical Background". Practical Performance Modeling. p. 9. doi:10.1007/978-1-4615-1387-2_2. ISBN 978-1-4613-5528-1.
- Li, Q. L. (2010). "Markov Reward Processes". Constructive Computation in Stochastic Models with Applications. pp. 526–573. doi:10.1007/978-3-642-11492-2_10. ISBN 978-3-642-11491-5.
- Howard, R. A. (1971). Dynamic Probabilistic Systems, Vol. II: Semi-Markov and Decision Processes. New York: Wiley. ISBN 0471416657.
- Reibman, A.; Smith, R.; Trivedi, K. (1989). "Markov and Markov reward model transient analysis: An overview of numerical approaches". European Journal of Operational Research 40 (2): 257. doi:10.1016/0377-2217(89)90335-4.