Talk:Unreachable code

From Wikipedia, the free encyclopedia

[edit] Merging in Dead code

I'm merging in the contents from Dead code. I'll post an update here when I'm done. Jamie 21:54, 13 November 2005 (UTC)

Merge done. Jamie 22:48, 13 November 2005 (UTC)

[edit] Perfect unreachability rules are impossible

The article claims that "it is infeasible to choose rules that are equivalent to true unreachability". In fact, it's worse than that — it is impossible in principle. One cannot have a ruleset which, in finite time, correcly finds all unreachable code. This is equivalent to the halting problem. I'm going to change the article to reflect this, once I can figure out how to do so without disrupting the narrative.

[edit] Unreachable vs dead code

I'm a decompilation researcher, and I see unreachable and dead code as completely different. To me, dead code is code generating a definition for which there are no further uses. This happens all the time in decompilation, through the effects of expression propagation (the original expression assignment becomes dead code), and the fact that many machine instructions have semantics that are unused (e.g. it is rare to use both the remainder and the quotient after a divide instruction; it is common to ignore the assignments to the flags after an arithmetic or logical instruction, and so on). By contrast, unreachable code is less common, and this requires control flow analysis.

Surely compilers see dead code (according to my definition), just perhaps less frequently. Let's suppose that your machine code target is a RISC machine, which has three address instructions and no explicit move instructions. Suppose the source code is

 x = a+b;
 printf("I'm about to use %d\n", x);
 useit(x);

Here before optimisation the compiler needs to move x to the register where the second argument is passed, then to the first one. But each move instruction is probably an add or or instruction anyway; if a and b are already available in registers, you would probably optimise this, in the IR (intermediate representation), to effectively

 x = a+b;          /* Now dead code */
 printf("I'm about to use %d\n", a+b);
 useit(a+b); 

This is expression propagation, not used much in compilers, as the idea is usually to make the expressions simpler, not more complex, so that individial machine instructions can match to the simple semantics. But in this case, an add is just as easy as a move, so expression propagation is useful. Now the code for x=a+b is dead. Note it is dead, not unreachable; it effectively does get executed (twice, in fact), but the IR for the original definition of x can be removed, since the result in x is no longer used.

My other concern is that you state right near the start of the article that dead code (through its equivalence to unreachable code) is source code. Surely dead code can arise in the intermediate code of a compiler or decompiler, which has little relationship to the original source code. In the example above, it is really the intermediate code defining x that is dead, and only after an optimisation; by most definitions, the definition of x is not dead at the source code level. Note that when rearranging basic blocks to take care of delay slot instructions and the like you can end up with unreachable code in the IR that is not in the original code (source or binary code). So I would prefer "is source code or intermediate representation", or the like, for both dead and unreachable code. I think it's OK to keep dead and unreachable code in the same article, since they are very similar, and you do the same thing with both (delete them).

Now, I don't want to make a change that only makes sense to a minority of readers; compilation is obviously much more commonly researched than decompilation. So I'd value some comments from the compiler community before making changes to the article. --Mike Van Emmerik 23:22, 12 February 2006 (UTC)