Post correspondence problem

From Wikipedia, the free encyclopedia

The Post correspondence problem is an undecidable decision problem that was introduced by Emil Post in 1946. Because it is simpler than the halting problem and the Entscheidungsproblem it is often used in proofs of undecidability.

Contents

[edit] Definition of the problem

The input of the problem consists of two finite lists u1,...,un and v1,...,vn of words over some alphabet A; with at least two symbols. A solution to this problem is a sequence of indexes i_1, ..., i_k, 1 \le i_j \le n, such that

u_{i_1} \cdots u_{i_k} = v_{i_1} \cdots v_{i_k}.

The decision problem then is to decide whether such a solution exists or not.

[edit] Example of an instance of the problem

Consider the following two lists:

u1 u2 u3 u4
aba bbb aab bb
,
v1 v2 v3 v4
a aaa abab babba

A solution to this problem would be the sequence 1, 4, 3, 1 because

u1u4u3u1 = aba + bb + aab + aba = ababbaababa = a + babba + abab + a = v1v4v3v1

However, if the two lists had consisted of only u1,u2,u3 and v1,v2,v3, then there would have been no solution.

A convenient way to view an instance of a Post correspondence problem is as a collection of blocks of the form

ui
vi

Thus the above example is viewed as

aba
a
,
bbb
aaa
,
aab
abab
,
bb
babba
i = 1

i = 2

i = 3

i = 4

A solution corresponds to some way of laying blocks next to each other so that the string in the top cells corresponds to the string in the bottom cells. Then the solution to the above example corresponds to:

aba
a
,
bb
babba
,
aab
abab
,
aba
a
i1 = 1

i2 = 4

i3 = 3

i4 = 1

[edit] Proof sketch of undecidability

The most common proof for the undecidability of PCP describes an instance of PCP that can simulate the computation of a Turing machine on a particular input. A match will only occur if the input would be accepted by the Turing machine. Because deciding if a Turing machine will accept an input is a basic undecidable problem, PCP cannot be decidable either. The following discussion is based on the proof from section 5.2 of Michael Sipser's seminal textbook Introduction to the Theory of Computation.

In more detail, the idea is that the string along the top and bottom will be a computation history of the Turing machine's computation. This means it will list a string describing the initial state, followed by a string describing the next state, and so on until it ends with a string describing an accepting state. The state strings are separated by some separator symbol (usually written #). According to the definition of a Turing machine, the full state of the machine consists of three parts:

  • The current contents of the tape.
  • The current state of the finite state machine which operates the tape head.
  • The current position of the tape head on the tape.

Although the tape has infinitely many cells, only some finite prefix of these will be non-blank. We write these down as part of our state. To describe the state of the finite control, we create new symbols, labelled q1 through qk, for each of the finite state machine's k states. We insert the correct symbol into the string describing the tape's contents at the position of the tape head, thereby indicating both the tape head's position and the current state of the finite control. For the alphabet {0,1}, a typical state might look something like:

101101110q700110

A simple computation history would then look something like this:

q0101#1q401#11q21#1q810

We start out with this block, where x is the input string and q0 is the start state:

 
q0x#

The top starts out with a "lead" of one state, and keeps this lead until the very end. Next, for each symbol a in the tape alphabet, as well as #, we have a "copy" block, which copies it unmodified from one state to the next:

a
a

We also have a block for each position transition the machine can make, showing how the tape head moves, how the finite state changes, and what happens to the surrounding symbols. For example, here the tape head is over a 0 in state 4, and then writes a 1 and moves right, changing to state 7:

q40
1q7

Finally, when the top reaches an accepting state, the bottom needs a chance to finally catch up to complete the match. To allow this, we extend the computation so that once an accepting state is reached, each subsequent machine step will cause a symbol near the tape head to vanish, one at a time, until none remain. If qf is an accepting state, we can represent this with the following transition blocks, where a is a tape alphabet symbol:

qfa
q'f
a'qf
qf

There are a number of details to work out, such as dealing with boundaries between states, making sure that our initial tile goes first in the match, and so on, but this shows the general idea of how a static tile puzzle can simulate a Turing machine computation.

[edit] Variants

A simple variant is to fix n, the number of tiles. This problem is decidable if n ≤ 2, but remains undecidable for n ≥ 7. It is unknown whether the problem is decidable for 3 ≤ n ≤ 6.

One of the most important variants of the Post correspondence problem is the bounded Post correspondence problem, which asks if we can find a match using no more than k tiles, including repeated tiles. A brute force search solves the problem in time O(2k), but this may be difficult to improve upon, since the problem is NP-complete. Unlike some NP-complete problems like the boolean satisfiability problem, a small variation of the bounded problem was also shown to be complete for RNP, which means that it remains hard even if the inputs are chosen at random (it is hard on average over uniformly distributed inputs).[1] In 2000, solving the problem biologically using DNA was explored, yielding promising results about the capabilities of DNA computation systems [2].

Another variant of PCP is called the marked Post Correspondence Problem, in which each ui must begin with a different symbol, and each vi must also begin with a different symbol. Halava, Hirvensalo, and de Wolf showed in 1999 that this variation is decidable in exponential time. Moreover, they showed that if this requirement is slightly loosened so that only the two-character prefixes need to differ (the so-called 2-marked Post Correspondence Problem), the problem becomes undecidable again.[3]

The condition that the alphabet A have at least two symbols is required since the problem is decidable if A has only one symbol.

[edit] References

  • Emil L. Post. A variant of a recursively unsolvable problem. Bulletin of the American Mathematical Society, volume 52. 1946.

[edit] External links

In other languages