Talk:Markov chain


This article has been reviewed by the Version 1.0 Editorial Team.
Version 0.7
This article has been selected for Version 0.7 and subsequent release versions of Wikipedia.
WikiProject Mathematics
This article is within the scope of WikiProject Mathematics, which collaborates on articles related to mathematics.
Mathematics rating: B-Class, Top-Priority. Field: Probability and statistics
One of the 500 most frequently viewed mathematics articles.


[edit] Power

Might be a stupid question...but how do you calculate the "power" of the matrix in order to calculate the n-step transition matrix?

See matrix multiplication for the answer to the question above. (That article could use some polishing, BTW, but the answer is there.) Michael Hardy 01:49, 30 Sep 2004 (UTC)
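
For any later readers who want something concrete: the n-step transition matrix is just the matrix P multiplied by itself n times. A minimal sketch in Python (the transition matrix here is made up purely for illustration):

    import numpy as np

    # A made-up 2-state transition matrix; each row sums to 1.
    P = np.array([[0.9, 0.1],
                  [0.5, 0.5]])

    n = 3
    P_n = np.linalg.matrix_power(P, n)  # P multiplied by itself n times
    print(P_n)  # entry (i, j) is the probability of going from state i to state j in n steps

With the made-up numbers above, P_n[0][1] is the probability of ending in state 1 after 3 steps, starting from state 0.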

What is meant by a _sequence_ of random variables? A r.v. is a fn. from the probability space to some measurable space, so we can think of assigning a 'number' to each elementary event in the probability space - is the sequence here a series of values from a single r.v. generated by a series of elementary events occurring, or is it actually a sequence of different functions (which would be the literal interpretation of "a sequence of random variables").--SgtThroat 10:37, 10 Dec 2004 (UTC)

Having read around, I'm pretty sure that the sequence consists of a sequence of functions. Also these need to have the same domain and codomain (?). Then each point in the sample space would correspond to a sequence x_1, x_2, ..., x_N of values of these functions. The set of all points then corresponds to the set of all sequences. Let's assume for simplicity that each point in the underlying probability space has equal probability - then the transition probability P(x_n|x_n-1) would be obtained by taking all the sequences in which X_n-1=x_n-1 and calculating the fraction of them that have X_n=x_n. Similarly P(x_n|x_n-1,x_n-2) would be obtained by calculating the same fraction among all those that have X_n-1=x_n-1 and X_n-2=x_n-2. The Markov property is the statement that these are equal for all x_n (i.e. the conditional distributions are the same) and all n. Can anybody confirm/correct this understanding? --SgtThroat 12:04, 10 Dec 2004 (UTC)

In mathematics, a sequence has a first element, a second one, a third one, and so on, so a sequence of random variables has a first random variable, a second random variable, etc. And since an r.v. is a function whose domain is a probability space, a sequence of random variables is a sequence of such functions. Your remarks about "elementary" events assume a discrete probability space, and that assumption is not warranted. Definitely all of the r.v.'s in this sequence have the same domain; that is true of any stochastic process.

each point in the sample space would correspond to a sequence x_1,x_2...x_N of values of these functions

Correct.

Lets assume for simplicity that each point in the underlying probability space has equal probability

Again, you're assuming discreteness without warrant. More later .... Michael Hardy 23:02, 10 Dec 2004 (UTC)

To continue:

It seems your difficulty stems mainly from the unwarranted assumption of discreteness. Consider the "uniform" probability distribution on the interval [0, 1]. The probability that a random variable with this distribution falls in, for example, the interval [0.15, 0.23] is just the length of that interval: 0.23 − 0.15 = 0.08, and similarly for any other subinterval of [0, 1]. This is a continuous, rather than discrete, probability distribution. Nota bene: the probability assigned to each one-point set is zero! Thus, knowing the probability assigned to each one-point set does not tell you what the probability distribution is. Similarly, the familiar bell-shaped curve introduced in even the most elementary statistics courses for non-mathematically inclined people is a continuous distribution, so we're not talking about anything the least bit esoteric here.

Now consider an infinite sequence of independent fair-coin tosses. What's the probability of getting one particular infinite sequence of heads and tails? It's zero, no matter which sequence you pick. So this is also not a discrete distribution. Michael Hardy 01:01, 11 Dec 2004 (UTC)
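
One way to spell out that last claim (my restatement, not part of the comment above): for any fixed sequence s_1, s_2, ... of heads and tails,

\Pr(X_1 = s_1, \dots, X_n = s_n) = 2^{-n},

so the probability that the whole infinite sequence is matched is at most 2^{-n} for every n, hence it is zero.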

Hi Michael. Yes, I agree that the assumption of a discrete probability space is rather limited. I was trying to get the discrete case clear before moving on to trying to understand the continuous one, although I neglected to say so. I think I now have a much better understanding of some of this material thanks to your comments and to some additional reading (in particular the entry on random vector, which I hadn't read when I wrote the above). I think we ought to clarify that the sequence of random variables is a sequence of functions from some probability space onto some domain, both of which are the same for all elements of the sequence. It could use a link to the entry on random vector - although a sequence is not quite the same as a vector, the components of a random vector form a sequence with the required properties and there is much useful content on that page to clarify. SgtThroat 12:21, 12 Dec 2004 (UTC)


[edit] Aperiodic

Did anyone notice that the definition of aperiodic in the text is plain wrong? The condition mentioned implies that the chain is aperiodic, but it is not necessary.

[edit] Submitted for Wikipedia:Mathematics Collaboration of the Week

This article needs an update, so I have asked for some help from the collaboration of the week project. From my submission, some todos for this page:

  • Clearer introduction for the newcomer
  • Motivation for using Markov chains
  • Clean-up, reviewing and clarification of the math
  • Better distinction between continuous and discrete state spaces
  • A clear example of a Markov model
  • Conditions for which a unique eigenvector exists, and how to use the Power method to find limit distributions
  • Scientific applications should be extended; currently the list is not explicative
  • Relation to Hidden Markov models
  • Incorporation of Markov process

Anthony Liekens 00:07, 18 September 2005 (UTC)

Agreed. I've read the article and I still have no idea what a Markov chain is - I got directed here by a Slashdot comment, saying "On Usenet there are enough kooks that a simple Markov chain based text analysing program could pass." (in regards to A.I.). Having read the article, I have no idea what he means. Please clean up this article for readability! 80.6.114.153 11:02, 3 December 2005 (UTC)
Many of the complex definitions have simpler versions which suffice for the simpler special case of a finite state Markov chain. Often, when people refer to "Markov chains", they actually mean "finite state Markov chains". Perhaps we need to clarify this early in the article, and link to a new page specifically on finite state Markov chains. Jim 14159 11:17, 1 November 2007 (UTC)

[edit] Bad didactics

By wrongly writing P(X|Y) instead of P(X=x|Y=y) you don't make things easier to understand, just easier to write down, and no more than that. If that's your purpose, just don't write anything; copying is easiest of all. Nijdam 22:15, 4 February 2006 (UTC)


[edit] Steady state

Quite a nonsense story about "steady state" and "convenient variables".Nijdam 22:37, 23 March 2006 (UTC)

[edit] bad formatting, perhaps?

some of the equations go off the right-hand side on Safari; dunno if this is a problem with other browsers too, but it's kinda annoying...problem is, my math is not hardcore enough for me to fix this myself--help?

I am not having this problem with Internet Explorer or Firefox. Sounds like a Mac error.

[edit] Error(s) in article

[edit] Memoryless vs Markov

From the article:

The Markov property means the system is memoryless, i.e. it does not "remember" the states it was in before, just "knows" its present state, and hence bases its "decision" to which future state it will transit purely on the present, not considering the past.

This definition of "memoryless" seems incorrect to me. In information theory a "memoryless" source is a sequence of independent identically distributed random variables. A Markov chain is not in general memoryless. WikiC 22:19, 25 May 2006 (UTC)

Do you know the usage at memorylessness? Michael Hardy 17:18, 13 June 2006 (UTC)

I agree with the statements above that Markov chains are not memoryless. Statisticians reserve the term for "truly memoryless" distributions like the Geometric or Exponential. As you gain info by knowing what happened in previous states (i.e., states are not independent), I believe we should remove the mention of "memoryless" in the article. We should still keep some mention of the special property of Markov chains, called the "Markov property". Do you guys agree? -akshayaj

The concepts are very much related: in each case we're saying the conditional probability distribution of future states given the present and past states, does not depend on the past states. Michael Hardy 17:21, 13 June 2006 (UTC)

They may be similar concepts ("memoryless in exponential" vs "Markov property in Markov chains"), but using the strict definitions, you cannot use the term "memoryless" for the property of Markov chains. In fact, for the Markov chains, none of the states but the last one are needed to determine the probability of the current state. But I repeat, knowing any previous state will add information, and thus change the current state probability. Therefore, Markov probabilities are not strictly memoryless, as they "remember" all previous states just by knowing the last state. Speaking statistically, the previous states are not independent of the current one for Markov chains, like they are for Exponential states. See the wikipedia entry on "memoryless" for a perhaps better explanation-Akshayaj 19:55, 20 June 2006 (UTC)

I'm a bit unclear on the notion of independence, in regards to whether they're correct above. I still believe my main point remains, and will try to get the answer to my "independence issue" soon-Akshayaj 19:55, 20 June 2006 (UTC)

The point is that MCs are memoryless conditional on the most recent state. 128.8.168.148 17:10, 31 August 2006 (UTC)

From the article: "In other words, the past states carry no information about future states." The mutual information between the past and future (excluding - or not - the current state) is nonzero in general. I think this is misleading. I think the better statement would be: "Any information about the future that may be contained in the past is also contained in the current state." -jmahoney@cse.ucdavis.edu
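
A possible formal rendering of that suggested wording (my phrasing, offered as a sketch): the Markov property states

\Pr(X_{n+1} = x \mid X_n, X_{n-1}, \dots, X_0) = \Pr(X_{n+1} = x \mid X_n),

i.e. past and future are conditionally independent given the present. In information-theoretic terms the conditional mutual information I(\text{past};\,\text{future} \mid \text{present}) is zero, while the unconditional mutual information I(\text{past};\,\text{future}) is in general nonzero, as noted above.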

[edit] Periodicity

Also this sentence is not quite correct: "A process is periodic if there exists at least one state to which the process will continually return with a fixed time period (greater than one)." Consider the transition matrix:

P = \begin{bmatrix}
0 & 0 & 1/2 & 1/2 \\
0 & 0 & 1/2 & 1/2 \\
1/2 & 1/2 & 0 & 0 \\
1/2 & 1/2 & 0 & 0
\end{bmatrix}

Isn't a Markov chain with this transition matrix periodic? I don't know how to word that sentence better. See the definition of aperiodicity at http://www.quantlet.com/mdstat/scripts/csa/html/node26.html. WikiC 01:00, 26 May 2006 (UTC)
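
A quick numerical check of this example (a Python sketch, not part of the original comment): the diagonal of P^n vanishes for odd n and is positive for even n, so every state has period 2 and the chain is indeed periodic.

    import numpy as np

    # The 4-state chain from the comment above.
    P = np.array([[0,   0,   0.5, 0.5],
                  [0,   0,   0.5, 0.5],
                  [0.5, 0.5, 0,   0  ],
                  [0.5, 0.5, 0,   0  ]])

    for n in range(1, 7):
        Pn = np.linalg.matrix_power(P, n)
        print(n, np.diag(Pn))  # zeros for odd n, 0.5's for even n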

That definition is indeed incorrect. I felt the other definitions were much too terse (eg, saying "accessible" without saying what it meant), and a few common terms were absent altogether (eg, transient). I've expanded those definitions quite a bit. Check them out for completeness :) - grubber 02:26, 21 June 2006 (UTC)

[edit] Finite vs Discrete

There was a section on "discrete state spaces," but then it described finite state spaces. I've cleaned up that section a little bit. When I have some time, I'm going to remove the integrals earlier in the article and put in some text that reinforces that a MC has a discrete state space. - grubber 19:16, 28 June 2006 (UTC)

[edit] Slight omissions

[edit] Describe S

The "S" used in the applicable form of the Chapman-Kolmogorov equation, as well as in the one-step evolution of the marginal distribution, appears to me to be the set of sates (the state space). It even seems very clear, now that I have understood ; however, S is not actually defined anywhere, as far as I can see. Might it not be an idea to add the denotation near "state space" in bold type in the definition section (for example) ? Exaton 10:53, 3 July 2006 (UTC)

Thanks. I've put an S in the definition paragraph (although it's not very prominent and could get lost maybe... not sure where else to put it tho). Also, I added a note about what "gcd" is, for completeness. - grubber 18:16, 3 July 2006 (UTC)

[edit] Related questions

[edit] Probability in the definition of a state's hitting time

A state's (next) hitting time Ti is defined as \operatorname{min}\{n: X_n=i | X_0=i\}. I'm wondering what the difference is between that definition and, say, \operatorname{min}\{ n: Pr(X_n = i | X_0 = i) > 0\} (ie. with a probabilistic notation, like in the definition of a state's period) ? Exaton 11:25, 3 July 2006 (UTC)

The difference is that the first is a random variable, and the second says something about connectivity between states and about probability distributions. Michael Hardy 14:28, 3 July 2006 (UTC)
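
To spell out that distinction (my restatement): the first expression defines a random variable; its value, \operatorname{min}\{n: X_n=i\} computed on a chain started at i, depends on the particular realization of the chain. By contrast, \operatorname{min}\{ n: \Pr(X_n = i | X_0 = i) > 0\} is a single fixed number determined by the transition probabilities alone; it only records the earliest time at which a return is possible, not when a return actually happens.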

[edit] Absorbing states

A very good and clear article for someone who learned the subject - but forgot... I missed a reference to absorbing states. The following link helped me with the subject (and gives a simple example of a Markov chain):

http://people.hofstra.edu/faculty/Stefan_Waner/Realworld/Summary8.html

[edit] Doesn't teach

I apologize, but this page fails to teach anything to anyone who isn't already well versed in the subject. Perhaps a wonderful reference, but not very educational, even more so to the layman. How about breaking this down into simpler terms and more concrete examples? --Anonymous 68.97.215.103 03:25, 19 July 2006 (UTC)

This isn't wiki-learn-ia, it is an encyclopedia. If you are looking for an article to teach you about Markov chains, you came to the wrong place. Try Wikibooks or Wikiversity. 130.126.108.104 15:24, 21 September 2007 (UTC)

The article at minimum needs more of the jargon/terminology to be turned into hyperlinks to articles that explain what they mean. At present, I can't understand the steady state section that I wanted to understand more about because of such terms. And an article that only tells knowledgeable people what they already know does not serve to inform anyone. 08:18, 30 September 2007 (UTC)

[edit] Ergodicity hypothesis

We can see the sentence: "Markov chains are related to Brownian motion and the ergodic hypothesis". I think Markov chains aren't always ergodic. If they were, there would only be one final class. Could someone confirm?

This is a very strange sentence. Brownian motion is a continuous time Markov process. It is not ergodic. And, Markov chains are not always ergodic.

 Spmeyn (talk) 16:21, 9 December 2007 (UTC) Sean Meyn

You are right, but I don't see a problem. This part of the article explains the history of the study of Markov chains. It says that Markov chains are related to Brownian motion (which is true). They are certainly related, since Brownian motion in the mathematical sense is a continuous time Markov process. But I think in this case the sentence actually refers to the physical process of Brownian motion (which is also related). Then, the claim that Markov chains are related to the ergodic hypothesis is also correct. They are related, since the question of ergodicity may be stated in the context of Markov chains. It is a hypothesis, so there is no claim that it holds for every Markov chain. I hope this clarifies the issues. Oded (talk) 15:44, 8 May 2008 (UTC)

[edit] Suggestion about Examples

For the non-mathematical reader it would be good to point out that the question of whether a physical process is a Markov process or not is not a well posed question. It cannot be answered until we say what information we are willing to include in the "state" variable(s). It would be nice to give an example of a physical process that is Markov using one definition of state and not Markov using another definition of state.

[edit] Bad Example: Coin Toss

I think a coin toss is a bad example of a Markov chain. The outcome does not depend on the current state; it is always 50/50 (or x/(1-x) with an unfair coin) regardless of state. The coin toss is at best a degenerate case of a Markov chain.

A first-in-first-out (FIFO) queue could be a better example: The state is the length of the queue. In a given time interval, a new person (or thing) arrives at the queue with probability p, and a person (or thing) is served and leaves the queue with probability q. Then, if the current state is i, the probabilities of going to states i-1, i and i+1 depend only on the current state and the input probabilities p and q. —Preceding unsigned comment added by 71.163.214.29 (talk) 04:43, 2 May 2008 (UTC)
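
A minimal concrete rendering of the queue suggested in the preceding comment (a Python sketch; the cap N and the values of p and q are made up for illustration, and a simultaneous arrival and departure is counted as "no change"):

    import numpy as np

    N, p, q = 5, 0.3, 0.4          # illustrative values only
    P = np.zeros((N + 1, N + 1))   # states 0..N are the possible queue lengths
    for i in range(N + 1):
        up   = p * (1 - q) if i < N else 0.0   # arrival, no departure
        down = q * (1 - p) if i > 0 else 0.0   # departure, no arrival
        P[i, min(i + 1, N)] += up
        P[i, max(i - 1, 0)] += down
        P[i, i] += 1.0 - up - down             # queue length unchanged
    print(P.round(3))              # each row depends only on the current state i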


I could not agree more with this comment. A coin toss is not a Markov process! It's an iid process. I will fix this in a few days. Sunbeam44 (talk) 15:15, 10 May 2008 (UTC)

Any i.i.d. process is Markov. And a sequence of coin tosses is Markov. But it is not a good illustration of the concept of a Markov process. Oded (talk) 15:36, 10 May 2008 (UTC)
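
To spell out why any i.i.d. process is Markov (my restatement of the point above): by independence,

\Pr(X_{n+1} = x \mid X_n = x_n, \dots, X_0 = x_0) = \Pr(X_{n+1} = x) = \Pr(X_{n+1} = x \mid X_n = x_n),

so the Markov property holds trivially; the transition matrix simply has all rows equal, which is why the example illustrates so little.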

[edit] Properties of Markov chains

The section right after the intro, "Properties of Markov chains", seems to be saying that

\Pr(X_{n+1}=x|X_n=y) = \Pr(X_{n}=x|X_{n-1}=y)\,

One needs to have this kind of property in order for the claimed formulation of the Chapman-Kolmogorov equation to apply, as well as the discussion of the marginal distribution. However, I was under the impression that, in general, one may have

\Pr(X_{n+1}=x|X_n=y) \neq \Pr(X_{n}=x|X_{n-1}=y)\,

and still be able to call it a "Markov process", since the next state depends only on the current state (and the current transition probability). Or is it the case that the transition probability must be time-independent in order for it to be called "Markov"? If the former is true, then it should be stated right up front, at the beginning. If not true, then there are numerous, serious faults and errors in the article ... Sincerely confused, linas 00:42, 29 August 2006 (UTC)

Hmm, it seems inescapable that the former not the latter is intended. Is there a name for the latter? In particular, I am interested in studying a system where the transition matrix is finite-dimensional, but it changes over time. It is also the case that for my system, it is not a "second order Markov process" (i.e. repeated concatenations of the transition matrix are not periodic). linas 01:02, 29 August 2006 (UTC)
The article assumes, but does not state, that the Markov chains are "time invariant". That is the common assumption, since time-varying Markov chains are much more difficult to study. I think it would be useful for the article to mention this property once. - grubber 15:20, 29 August 2006 (UTC)
Hmm. Well, I know how to solve a certain class of time-varying Markov chains, and surely there are other classes that are also solvable, so I was surprised that this article seemed to confuse the definition of Markov chains in general with stationary Markov chains. linas 15:12, 4 September 2006 (UTC)

Never mind. On closer inspection, it appears that the article does not assume that Markov chains are time invariant. The notation in the "properties" section is marvelously misleading though: it can be misread, which is what I did at first: I took the superscript (n) to mean "raised to the power of", whereas in fact the equations all hold just fine if (n) is interpreted as merely "some label". Silly me. linas 16:01, 4 September 2006 (UTC)

[edit] More problems

After going through the article more carefully, it seems that most of the article does not assume that the process is stationary (and I explicitly marked the few places where it does). However, the section on "stationary analysis" is ambiguous. I believe that lots of it can be generalized to the non-stationary case, but would require some care with the notation being used. That section needs work. linas 16:20, 4 September 2006 (UTC)

[edit] Definition of "stationary"

The article says: "A stationary Markov chain (a.k.a. time-stationary chain or time-homogeneous chain) is a process where one has \Pr(X_{n+1}=x|X_n=y) = \Pr(X_{n}=x|X_{n-1}=y)\, (*) for all n". My professor uses "stationary" to mean something quite different, namely: \Pr(X_{n}=x)=\Pr(X_m=x) for all n, m, x. When I asked him, he was quite sure that his usage was standard. When he wants to express that the transition probabilities are constant with respect to time, i.e. equation (*), he uses "homogeneous" or "time-homogeneous", but not "stationary". I can imagine that either the article is wrong, or that there is significant variation in terminology within the field - I propose that we either fix the article, or mention the existence of different conventions, respectively. A5 23:36, 19 November 2006 (UTC)

I've updated the article, with my professor's help. He pointed out that at least some of the external links use his terminology. I hope this is the right thing to do. A5 14:57, 22 November 2006 (UTC)
It's a good point, and off the top of my head, I know I've heard "time-homogeneous" for that property, and I can't recall if I've heard it called "stationary". I'm going to check my Kulkarni book when I get on campus next. For now, I think your edit is fine. - grubber 16:44, 22 November 2006 (UTC)
I got my Kulkarni book. A DTMC with a stationary distribution π is stationary iff P(X_n = j) = π_j. Time-homogeneous is the proper term for independence of n. Thanks for fixing this! - grubber 19:49, 27 November 2006 (UTC)
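
For later readers, a compact summary of the three notions discussed in this thread (my wording, following the terminology settled on above):

Markov property: \Pr(X_{n+1} = x \mid X_n, \dots, X_0) = \Pr(X_{n+1} = x \mid X_n) for all n.

Time-homogeneous: \Pr(X_{n+1} = x \mid X_n = y) does not depend on n.

Stationary: \Pr(X_n = x) = \pi_x for all n, i.e. the marginal distribution itself is constant in time.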

[edit] Recurrence

Sorry, I was just puzzled about the following statement in the section on recurrence:


It can be shown that a state is recurrent if and only if \sum_{n=0}^{\infty} p_{ii}^{(n)} = \infty



I am puzzled about this. Take for example a simple 2-state Markov chain with p_{11} = p_{22} = 0 and p_{12} = p_{21} = 1, i.e. the state flips between state 1 and state 2 every period. Now both states 1 and 2 are recurrent, since they are reached every second period. But p_{ii} = 0 and therefore \sum_{n=0}^{\infty} p_{ii}^{(n)} = 0 < \infty, which means by the statement above no state should be recurrent. Did I get something wrong? —The preceding unsigned comment was added by 212.201.78.127 (talk) 09:45, 15 January 2007 (UTC).

I think I can help with this - the superscript in the original notation p_{ii}^{(n)} represents the number of steps or transitions in the path from i back to i. Thus in your example, again using the original notation, p_{ii}^{(1)} = 0 and p_{ii}^{(2)} = 1. More generally, p_{ii}^{(2n)} = 1, thus the sum over all n would indeed be infinite. Trog23 07:20, 7 February 2007 (UTC)
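
Spelling out the arithmetic of that reply (my restatement): in the two-state example, p_{ii}^{(2m)} = 1 and p_{ii}^{(2m+1)} = 0 for every m, so

\sum_{n=0}^{\infty} p_{ii}^{(n)} = \sum_{m=0}^{\infty} p_{ii}^{(2m)} = 1 + 1 + 1 + \cdots = \infty,

consistent with both states being recurrent.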

[edit] text manglers

Has anyone ever made a web page showing how simple Markov chains are? Whenever I look them up on the web, I find sigmas and all sorts of stuff I know nothing about, yet I understand Markov chains and can teach any fairly bright person how to create a Markov chain text mangler on paper, without even the aid of a computer program, in 2 hours at most. They're simple. I can think of many many other frivolous non-text uses for them and suspect they will eventually be the solution to the Turing test given time and creativity, but they are always presented as something that needs BIG math skills. Any links that explain how to do a text mangler Markov chain would be appreciated. Thaddeus Slamp 06:55, 9 February 2007 (UTC)
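
For what it's worth, the text-mangler idea really is small enough to sketch in a few lines. A minimal word-level version in Python (order 1, i.e. the next word depends only on the current word; the sample text and names are made up for illustration):

    import random
    from collections import defaultdict

    def build_chain(text):
        # map each word to the list of words that followed it in the source text
        words = text.split()
        chain = defaultdict(list)
        for a, b in zip(words, words[1:]):
            chain[a].append(b)
        return chain

    def mangle(chain, start, length=20):
        word, out = start, [start]
        for _ in range(length - 1):
            followers = chain.get(word)
            if not followers:
                break                          # dead end: no word ever followed this one
            word = random.choice(followers)    # pick a follower at random
            out.append(word)
        return " ".join(out)

    source = "the cat sat on the mat and the cat ran off the mat"
    print(mangle(build_chain(source), "the"))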

[edit] ergodic definitions

Does anyone else find these two statements in the article to be a bit contradictory? We first have

A finite states irreducible Markov chain is said to be ergodic if its states are periodic. 

and then

A state i is said to be ergodic if it is aperiodic and positive recurrent. 
If all states in a Markov chain are ergodic, then the chain is said to be ergodic.

I assume we want "aperiodic" and not "periodic" in the first statement? --Lavaka 23:22, 30 April 2007 (UTC)

[edit] 0.7 pass

I have passed this article for Wikipedia 0.7. It meets the criteria for approval by being a B-Class article of High importance or higher. It is of Top importance, resulting in the passing of this article. Funpika 22:15, 5 June 2007 (UTC)

[edit] Discrete-parameter Markov process vs. discrete-state Markov process

(This is my first Wikipedia discussion/talk page entry. So, please be patient. ;) )

I'm somewhat puzzled by the following statement of the Markov chain page: "In mathematics, a Markov chain, named after Andrey Markov, is a discrete-time stochastic process with the Markov property."

To the best of my knowledge, I cannot agree here. As far as I know, stochastic processes with a discrete state space are referred to as "chains". On the other hand, we use, e.g., "discrete-time" and "continuous-time" to distinguish between stochastic processes that have a discrete or continuous parameter, respectively.

See, e.g.,
- G. Bolch et al.: "Queueing Networks and Markov Chains", 2nd Edition, Wiley, 2006, page 53:

[...], we consider Markov processes with discrete state spaces only, that is, Markov chains, [...]

- K.S. Trivedi: "Probability and Statistics with Reliability, Queueing and Computer Science Applications", 2nd Edition, Wiley, 2001, page 296:

Although this definition applies to Markov processes with continuous state space, we will mostly be concerned with discrete-state Markov processes — specifically, Markov chains. We will study both discrete-time and continuous-time Markov chains.

Of course, in Markov chains, state changes (transitions) between the different states happen instantaneously, i.e. at single points in time. Still, the transitions may happen at arbitrary points in time (CTMCs). Only in DTMCs are these time points fixed.

Thus, I am neither able to comprehend the statement given in the Markov chain article ("[...] a Markov chain [...] is a discrete-time stochastic process") nor the fact that "DTMC" is redirected to "Markov chain" and "CTMC" is redirected to "Continuous-time Markov process". Both DTMCs and CTMCs are Markov chains.

I'm happy to receive any pointer towards work that states that Markov chains always have a discrete parameter/time. MuesLee 13:00, 19 June 2007 (UTC)

I believe that your understanding is correct, MuesLee, and offer two more references that support your view:
- J.R. Norris. "Markov Chains", Cambridge University Press, 1997 (ISBN 0-521-63396-6). Introduction: "...the case where the process can assume only a finite or countable number of states, when it is usual to refer to it as a Markov Chain."
- William J Stewart. "Introduction to the Numerical Solution of Markov Chains", Princeton University Press (ISBN 0-691-03699-3) Page 5: "If the state space of a Markov Process is discrete, the Markov process is referred to as a Markov Chain."
Trog23 08:04, 24 June 2007 (UTC)

[edit] Periods

"A state i has period k if any return to state i must occur in some multiple of k time steps and k is the largest number with this property. For example, if it is only possible to return to state i in an even number of steps, then i is periodic with period 2. Formally, the period of a state is defined as" I'm a bit confused here - shouldn't it be "smallest number with this property"? The example with even numbers suggests that this is the case as well. I won't change it, in case I'm just wrong. 66.216.172.3 18:05, 12 July 2007 (UTC)

Never mind, I'm just dense.
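
For the record, since this question comes up often: the period of state i is k = \operatorname{gcd}\{n \ge 1 : p_{ii}^{(n)} > 0\}. If returns are possible only at steps 4, 6, 8, ..., the period is \operatorname{gcd}\{4, 6, 8, \dots\} = 2, which is indeed the largest k such that every possible return time is a multiple of k; it need not be the smallest possible return time.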

[edit] Removed (Bad?) link

I removed this external link:

  • A Markov text generator generates nonsense in the style of another work, because the probability of spitting out each word depends only on the n words before it

Because it didn't work when I ran it. I moved the source code into this utility and linked to that now: http://www.utilitymill.com/utility/Markov_Chain_Parody_Text_Generator —Preceding unsigned comment added by 69.255.197.177 (talk) 20:44, 2 September 2007 (UTC)

[edit] MCML - Markov Chain Markup Language

Has anyone seen a definition of this language anywhere? A Google search just found some references to it from scientific papers, but I'm hoping there's an official or semi-official site with the definition and tools to work with it. —Preceding unsigned comment added by Engmark (talkcontribs) 13:53, 11 June 2008 (UTC)