Talk:Very long instruction word

Does the VLIW architecture solve branch prediction problems? The article implies that it does, but as far as I know, VLIW does not help at all in that regard.

Sort of. Most VLIW processors will never make an incorrect branch prediction. Not because the branch-prediction hardware is supernaturally intelligent, but because it doesn't *have* branch-prediction hardware. Branch prediction has been moved into the compiler. Ordinary superscalar machines predict branches to keep all their functional units busy. VLIW machines, on the other hand, rely on the compiler to explicitly tell the processor exactly what every functional unit is doing at any instant -- all packed into a single instruction (the Very Long Instruction Word).
In particular, when a program has an if-then-else, normal superscalar machines will guess whether the condition is true or false and start speculatively executing the appropriate instructions. If the machine later finds out it guessed wrong, it cancels the effect of all those instructions and starts the other side from the beginning.
A compiler for a VLIW machine will schedule the instructions for the "true" side of the condition into some of the functional units, and the "false" side of the condition into other functional units, so both sides get executed simultaneously. Later the compiler explicitly schedules instructions that cancel the effect of the "wrong" side.
--DavidCary 05:44, 20 Jul 2004 (UTC)
David, you're half right.  :-) What you describe is "if-conversion," and is intended to eliminate short branches by replacing them with compiler-controlled speculation. On machines with many "predicate registers" (the registers that hold the conditions that determine whether the instructions execute), you can have many independent streams of execution scheduled in parallel. Scheduling algorithms such as trace scheduling rely on this heavily.
That doesn't mean there are no branches, nor that the remaining branches can't be candidates for hardware prediction. Branch prediction addresses the inherent pipeline latency involved in a branch. There are some number of cycles between when the machine first computes the condition that controls a conditional branch and when the branch target instructions arrive at the functional units. Flushing the pipeline is the simplest and least-performant strategy for handling these branches. Many VLIWs (such as the one I'm most familiar with, TI's C6x family of DSPs) expose the branch delay, allowing code to execute in those slots. The code may be code that logically appeared before the branch prior to instruction scheduling, or some mixture of "fall-thru" vs. "branch-target" code, suitably predicated. Still other variations on VLIW, such as Intel's EPIC, rely on branch prediction and static prediction hints to attempt to eliminate stalls due to the branch delay.
The exposed delay slot case is an interesting case, because it leaves everything to the compiler. In many cases, the compiler can fill the delay slots of the branch with code that resides at the branch target--a practice sometimes referred to as branch delay slot stuffing. There are some very important cases where the compiler cannot do this easily: Function calls and returns. If the compiler has available the text of the called function, it may be able to pull portions of the target function's code into the call's delay slots. On the return path, however, unless the compiler can prove that the function always returns to the same place (which may be true in the case of tail-call optimization), it has no text available to pull into those delay slots.
So, to sum up, I'd say that it's not fair at all to say that VLIWs have "solved branch prediction" or that "they don't need/have it," but rather have made the problem "different."
--Mr z 23:24, 16 May 2006 (UTC)
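To make the "exposed branch delay" idea above concrete, here is a minimal sketch of a toy machine in which a taken branch only redirects control after a fixed number of delay slots. Everything in it is an assumption made for illustration: the `run` function, the three-opcode instruction set, and the `delay_slots` parameter are hypothetical and do not model the C6x or any real ISA. It handles only unconditional branches and assumes no branch sits inside another branch's delay slots.

```python
def run(program, delay_slots, max_steps=100):
    """Toy interpreter (hypothetical): a branch takes effect after `delay_slots` more instructions."""
    regs = {}
    pc = 0
    countdown = None  # delay slots still to execute before the pending branch lands
    target = None
    trace = []        # program counters in execution order
    for _ in range(max_steps):
        op = program[pc]
        trace.append(pc)
        if op[0] == "halt":
            break
        next_pc = pc + 1
        if countdown is not None:
            countdown -= 1  # this instruction occupies one delay slot
        if op[0] == "add":
            regs[op[1]] = regs.get(op[1], 0) + op[2]
        elif op[0] == "br":
            countdown, target = delay_slots, op[1]
        if countdown == 0:
            next_pc = target  # the branch finally redirects control
            countdown = None
        pc = next_pc
    return regs, trace


program = [
    ("br", 4),          # 0: branch to instruction 4
    ("add", "x", 1),    # 1: runs only if it falls in a delay slot
    ("add", "x", 10),   # 2: runs only if it falls in a delay slot
    ("add", "x", 100),  # 3: skipped in both runs below
    ("add", "y", 7),    # 4: branch target
    ("halt",),          # 5
]

regs_exposed, trace_exposed = run(program, delay_slots=2)
regs_hidden, trace_hidden = run(program, delay_slots=0)
print(trace_exposed)  # [0, 1, 2, 4, 5]
print(trace_hidden)   # [0, 4, 5]
```

With two delay slots, instructions 1 and 2 execute even though they follow a taken branch, which is the work a compiler has to find when it fills those slots. With zero delay slots the same program jumps straight to the target.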
On reading David's segment a little closer, I should provide a minor mea culpa: There are two common ways that I am aware of to implement if-conversion: Speculative computation with fixup code, and predicated execution. In the "fixup-code" case, you might have a conditional move instruction or similar that commits the speculatively computed number to its final destination. In the "predication" case, you have an additional field in the instruction opcode or similar that specifies that the instruction is conditional based on some condition register (also known as a predicate register). Predication effectively adds an "if (cond)" in front of the instruction.
CPUs such as TI's C6000 DSP and Intel's EPIC implement predication. Other machines (including x86--a non-VLIW CPU) implement conditional moves. Predication is usually more general, since it can be applied to memory references as well as normal computation.
--Mr z 23:40, 16 May 2006 (UTC)
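The two ways of implementing if-conversion described above can be sketched side by side. This is only an illustration under stated assumptions: the function names (`branchy`, `if_converted_fixup`, `run_predicated`, `predicated`), the register names, and the tuple encoding of a guarded instruction are all hypothetical, and Python here stands in for what a compiler would emit as machine instructions.

```python
def branchy(cond, a, b):
    # Original control flow: only one side of the if-then-else executes.
    if cond:
        return a + b
    return a - b


def if_converted_fixup(cond, a, b):
    # Speculative computation with fixup: both sides are computed
    # unconditionally, then a conditional-move-style select commits one result.
    t = a + b                        # "true" side, computed speculatively
    f = a - b                        # "false" side, computed speculatively
    result = f
    result = t if cond else result   # models a conditional move
    return result


def run_predicated(program, regs, preds):
    # Predicated execution: every instruction is issued, but an instruction
    # whose guard does not match its predicate register has no effect.
    for guard, expected, dest, fn, srcs in program:
        if guard is None or preds[guard] == expected:
            regs[dest] = fn(*(regs[s] for s in srcs))
    return regs


def predicated(cond, a, b):
    regs = {"a": a, "b": b, "r": 0}
    preds = {"p0": bool(cond)}  # predicate register holding the condition
    program = [
        ("p0", True, "r", lambda x, y: x + y, ("a", "b")),   # [p0]  r = a + b
        ("p0", False, "r", lambda x, y: x - y, ("a", "b")),  # [!p0] r = a - b
    ]
    return run_predicated(program, regs, preds)["r"]


for cond in (True, False):
    assert branchy(cond, 7, 3) == if_converted_fixup(cond, 7, 3) == predicated(cond, 7, 3)
```

All three give the same answer, but the branch is gone from the last two. In the fixup version the select is a separate step after the speculative work; in the predicated version the guard travels with each instruction, which is why the same mechanism can in principle guard a store or load as well as arithmetic.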

This page is missing any mention of Cydrome, the other company that was pioneering VLIW concepts. Dyl 21:38, Aug 14, 2004 (UTC)

Since a Multiflow article was recently created, it might be better to move company specific information there, leaving the VLIW article a little more general. As I previously mentioned, Cydrome was a company pioneering VLIW concepts in the same timeframe as Multiflow, but there's no mention of that in the article. Dyl 21:29, August 9, 2005 (UTC)

I think it could make sense to capitalize the title, as it is usually seen. Nicolas1981 20:37, 30 January 2006 (UTC)

Ugh... I wish I had more time. I would love to step back, refactor this article, and write a more comprehensive survey of VLIW architectures and related technologies. I have been involved with TI's C6000 DSP architecture development over the last several years, and I feel I have concrete expertise to contribute in this space. So far, I've made a few edits around the edges. Perhaps I'll find time to make more later. --Mr z 01:22, 17 May 2006 (UTC)

This article really could use a rewrite. Anyway, Pizzadeliveryboy added "also known as static superscalar or compile-time superscalar" right upfront. I'm taking the liberty of removing this for several reasons. (1) "static superscalar" is an all-but-trademarked term for the TigerSharc VLIW, and one doesn't hear "compiled superscalar". One could make up lots of good terms, many better than VLIW, but that doesn't mean they're used. (2) More importantly, these terms make VLIW sound like a variant of superscalar--just as one wouldn't say "dynamic VLIW" for superscalar. It's its own thing, with similarities and differences. --Josh 18:53, 6 June 2006 (UTC)

Someone changed VLIW to stand for Very Large Instruction Word. It's long, not large. See the term as initially defined in the 1983 ISCA paper. Some people do say "large", but it is incorrect. Really, this article could use a total rewrite. People have been making it incrementally better in the Wiki way, but, man, this needs a fresh start. I'll try to get to it sometime.