Cell software development

From Wikipedia, the free encyclopedia

Software development for the Cell microprocessor involves a mixture of conventional development practices for the POWER architecture-compatible PPU core and novel software development challenges with regard to the functionally reduced SPU coprocessors.

Cell SDK

Full system simulator

GNU compiler toolchain

IBM XL C/C++

IBM Octopiler

References

Linux on Cell

Software portability

Adapting VMX for SPU

Differences between VMX and SPU

VMX technology is conceptually similar to the vector model provided by the SPU processors, but there are many significant differences.

VMX to SPU comparison[1] (unfinished)

| Feature                | VMX                    | SPU                          |
|------------------------|------------------------|------------------------------|
| Word size              | 32 bits                | 32 bits                      |
| Number of registers    | 32                     | 128                          |
| Register width         | 128-bit quadword       | 128-bit quadword             |
| Integer formats (bits) | 8, 16, 32              | 8, 16, 32                    |
| Saturation support     | yes                    | no                           |
| Byte ordering          | big (default), little  | big endian                   |
| Floating point modes   | Java, non-Java         | single precision, IEEE double |
| Memory alignment       | quadword only          | quadword only                |

The VMX Java mode conforms to the Java Language Specification 1 subset of the default IEEE standard, extended to include IEEE and C9X compliance where the Java standard falls silent. In a typical implementation, non-Java mode converts denormal values to zero, whereas Java mode traps into an emulator when the processor encounters such a value. Depending on the implementation, non-Java mode may or may not be faster, and may or may not be non-compliant.
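The flush-to-zero behaviour described above can be illustrated in portable scalar C. This is only a sketch: the helper name `flush_denormal` is hypothetical, it is not a VMX instruction or intrinsic, and keeping the sign of the flushed zero is an assumption rather than something the text specifies.

```c
#include <float.h>

/* Hypothetical helper (not part of any VMX or SPU API): models what a
 * flush-to-zero mode does to one single-precision value.
 * A denormal is any non-zero value whose magnitude is below FLT_MIN,
 * the smallest normalized float. */
static float flush_denormal(float x)
{
    if (x != 0.0f && x > -FLT_MIN && x < FLT_MIN) {
        /* Assumption: the sign of the input is kept on the zero result. */
        return (x < 0.0f) ? -0.0f : 0.0f;
    }
    return x; /* normal values, zeros, infinities and NaNs pass through */
}
```

Applied to `FLT_MIN / 2.0f` the helper returns zero, while `FLT_MIN` itself and ordinary values such as `1.5f` are returned unchanged. A Java-mode implementation would instead have to compute with the denormal value, which is where the emulator trap comes in.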

A quadword is four 32-bit words, or 128 bits. Quadword alignment is on 16-byte (128-bit) boundaries, i.e. the low four address bits are zero.
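The alignment rule amounts to a test on the low four address bits. A minimal sketch in C follows; the function names are illustrative only and do not come from any Cell toolkit.

```c
#include <stdint.h>

/* Illustrative helpers, not part of any Cell SDK. */

/* Returns 1 if addr lies on a 16-byte (quadword) boundary,
 * i.e. its low four bits are all zero. */
static int is_quadword_aligned(uintptr_t addr)
{
    return (addr & (uintptr_t)0xF) == 0;
}

/* Returns the address of the quadword that contains addr,
 * obtained by clearing the low four bits. */
static uintptr_t quadword_base(uintptr_t addr)
{
    return addr & ~(uintptr_t)0xF;
}
```

For example, address 0x1000 is quadword aligned, 0x1008 is not, and both fall within the quadword starting at 0x1000.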

The IBM PPE Vector/SIMD manual does not define operations for double precision floating point, though IBM has published material implying certain double precision performance numbers associated with the Cell PPE VMX technology.

Intrinsics

to do!

Porting VMX code for SPU

There is a great body of code which has been developed for other IBM Power processors that could potentially be adapted and recompiled to run on the SPU. This code base includes VMX code that runs under the PowerPC version of Apple's Mac OS X, where it is better known as AltiVec. Depending on how many VMX-specific features are involved, the adaptation can range anywhere from straightforward, to onerous, to completely impractical. The most important workloads for the SPU generally map quite well.

In some cases it is possible to port existing VMX code directly. If the VMX code is highly generic (makes few assumptions about the execution environment), the translation can be relatively straightforward. The two processors specify different binary code formats, so recompilation is required at a minimum. Even where instructions exist with the same behaviour, they do not have the same names, so these must be mapped as well. IBM provides compiler intrinsics, as part of the development toolkit, which take care of this mapping transparently.

In many cases, however, a directly equivalent instruction does not exist. The workaround might be obvious or it might not. For example, if saturation behaviour is required on the SPU, it can be coded with additional SPU instructions (at some loss of efficiency). At the other extreme, if Java floating point semantics are required, this is almost impossible to achieve on the SPU processor. Achieving the same computation on the SPU might require an entirely different algorithm, written from scratch.
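The saturation workaround can be sketched in portable C as a modular add followed by an explicit overflow test and merge. This is an illustration of the idea only: `sat_add_u8x16` is a hypothetical name, the loop stands in for a single 16-lane vector operation, and the actual SPU instruction sequence is not shown here.

```c
#include <stdint.h>

#define QUADWORD_BYTES 16

/* Hypothetical helper: unsigned saturating add over the sixteen 8-bit
 * lanes of a quadword, built only from operations that wrap around.
 * Each lane needs three steps (add, compare, merge) where a native
 * saturating add would need one. */
static void sat_add_u8x16(const uint8_t a[QUADWORD_BYTES],
                          const uint8_t b[QUADWORD_BYTES],
                          uint8_t out[QUADWORD_BYTES])
{
    for (int i = 0; i < QUADWORD_BYTES; i++) {
        /* Step 1: modular add, which wraps past 255. */
        uint8_t sum = (uint8_t)(a[i] + b[i]);
        /* Step 2: a wrapped sum is smaller than either operand;
         * build a mask of 0xFF on overflow and 0x00 otherwise. */
        uint8_t overflow = (uint8_t)-(sum < a[i]);
        /* Step 3: OR with the mask forces overflowed lanes to 255. */
        out[i] = (uint8_t)(sum | overflow);
    }
}
```

With every lane of `a` set to 200 and lane `i` of `b` set to `10 * i`, lanes 0 to 5 hold the exact sums 200 to 250 and every later lane is clamped to 255. The extra compare and merge steps are the loss of efficiency mentioned above.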

The most important conceptual similarity between VMX and the SPU architecture is that both support the same vectorization model. For this reason, most algorithms successfully adapted to AltiVec will usually adapt successfully to the SPU architecture as well.

Compiler-mediated parallelism

References