XDR DRAM or extreme data rate dynamic random access memory is a high-performance RAM interface and successor to the Rambus RDRAM it is based on, competing with the rival DDR2 SDRAM and GDDR4 technology. XDR was designed to be effective in small, high-bandwidth consumer systems, high-performance memory applications, and high-end GPUs. It eliminates the unusually high latency problems that plagued early forms of RDRAM. Also, the XDR DRAM have heavy emphasis on per pin bandwidth, which can benefit further cost control on PCB production. This is because fewer lanes are needed for the same amount of bandwidth. Rambus owns the rights to the technology. XDR is used by Sony in the PlayStation 3 console.[1]
Contents |
An XDR RAM chip's high-speed signals are a differential clock input (clock from master, CFM/CFMN), a 12-bit single-ended request/command bus (RQ11..0), and a bidirectional differential data bus up to 16 bits wide (DQ15..0/DQN15..0). The request bus may be connected to several memory chips in parallel, but the data bus is point to point; only one RAM chip may be connected to it. To support different amounts of memory with a fixed-width memory controller, the chips have a programmable interface width. A 32-bit-wide DRAM controller may support 2 16-bit chips, or be connected to 4 memory chips each of which supplies 8 bits of data, or up to 16 chips configured with 2-bit interfaces.
In addition, each chip has a low-speed serial bus used to determine its capabilities and configure its interface. This consists of three shared inputs: a reset line (RST), a serial command input (CMD) and a serial clock (SCK), and serial data in/out lines (SDI and SDO) that are daisy-chained together and eventually connect to a single pin on the memory controller.
All single-ended lines are active-low; an asserted signal or logical 1 is represented by a low voltage.
The request bus operates at double data rate relative to the clock input. Two consecutive 12-bit transfers (beginning with the falling edge of CFM) make a 24-bit command packet.
The data bus operates at 8x the speed of the clock; a 400 MHz clock generates 3200 MT/s. All data reads and writes operate in 16-transfer bursts lasting 2 clock cycles.
Request packet formats are as follows:
Clock edge |
Bit | NOP | Column read/write | Calibrate/power-down | Precharge/refresh | Row Activate | Masked write | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Bit | Bit | Description | Bit | Description | Bit | Description | Bit | Description | Bit | Description | ||||||||
↓ | RQ11 | 0 | 0 | COL opcode | 0 | COLX opcode | 0 | ROWP opcode | 0 | ROWA opcode | 1 | COLM opcode | ||||||
↓ | RQ10 | 0 | 0 | 0 | 0 | 1 | M3 | Write mask low bits |
||||||||||
↓ | RQ9 | 0 | 0 | 1 | 1 | R9 | Row address high bits |
M2 | ||||||||||
↓ | RQ8 | 0 | 1 | 0 | 1 | R10 | M1 | |||||||||||
↓ | RQ7 | x | WRX | Write/Read bit | x | reserved | POP1 | Precharge delay (0..3) |
R11 | M0 | ||||||||
↓ | RQ6 | x | C8 | Column address high bits |
x | POP0 | R12 | reserved | C8 | Column address high bits |
||||||||
↓ | RQ5 | x | C9 | x | x | reserved | R13 | C9 | ||||||||||
↓ | RQ4 | x | C10 | reserved | x | x | R14 | C10 | reserved | |||||||||
↓ | RQ3 | x | C11 | XOP3 | Subopcode | x | R15 | C11 | ||||||||||
↓ | RQ2 | x | BC2 | Bank address | XOP2 | BP2 | Precharge bank | BA2 | Bank address | BC2 | Bank address | |||||||
↓ | RQ1 | x | BC1 | XOP1 | BP1 | BA1 | BC1 | |||||||||||
↓ | RQ0 | x | BC0 | XOP0 | BP0 | BA0 | BC0 | |||||||||||
↑ | RQ11 | x | DELC | Command delay (0..1) | x | reserved | POP2 | Precharge enable | DELA | Command delay (0..1) | M7 | Write mask high bits |
||||||
↑ | RQ10 | x | x | reserved | x | ROP2 | Refresh command | R8 | Row address low bits |
M6 | ||||||||
↑ | RQ9 | x | x | x | ROP1 | R7 | M5 | |||||||||||
↑ | RQ8 | x | x | x | ROP0 | R6 | M4 | |||||||||||
↑ | RQ7 | x | C7 | Column address low bits |
x | DELR1 | Refresh delay (0..3) |
R5 | C7 | Column address low bits |
||||||||
↑ | RQ6 | x | C6 | x | DELR0 | R4 | C6 | |||||||||||
↑ | RQ5 | x | C5 | x | x | reserved | R3 | C5 | ||||||||||
↑ | RQ4 | x | C4 | x | x | R2 | C4 | |||||||||||
↑ | RQ3 | x | SC3 | Sub-column address | x | x | R1 | SC3 | Sub-column address | |||||||||
↑ | RQ2 | x | SC2 | x | BR2 | Refresh bank | R0 | SC2 | ||||||||||
↑ | RQ1 | x | SC1 | x | BR1 | SR1 | Sub-row address | SC1 | ||||||||||
↑ | RQ0 | x | SC0 | x | BR0 | SR0 | SC0 |
There are a large number of timing constraints giving minimum times that must elapse between various commands (see Dynamic random access memory: Memory timing); the DRAM controller sending them must ensure they are all met.
Some commands contain delay fields. These delay the effect of the command by the given number of clock cycles. This permits multiple commands (to different banks) to take effect on the same clock cycle.
This operates equivalently to standard SDRAM's activate command, specifying a row address to be loaded into the bank's sense amplifier array. To save power, a chip may be configured to only activate a portion of the sense amplifier array. In this case, the SR1..0 bits specify the half or quarter of the row to activate, and following read/write commands' column addresses are required to be limited to that portion. (Refresh operations always use the full row.)
These operate analogously to a standard SDRAM's read or write commands, specifying a column address. Data is provided to the chip a few cycles after a write command (typically 3), and is output by the chip several cycles after a read command (typically 6). Just as with other forms of SDRAM, the DRAM controller is responsible for ensuring that the data bus is not scheduled for use in both directions at the same time. Data is always transferred in 16-transfer bursts, lasting 2 clock cycles. Thus, for a ×16 device, 256 bits (32 bytes) are transferred per burst.
If the chip is using a data bus less than 16 bits wide, one or more of the sub-column address bits are used to select the portion of the column to be presented on the data bus. If the data bus is 8 bits wide, SC3 is used to identify which half of the read data to access; if the data bus is 4 bits wide, SC3 and SC2 are used, etc.
Unlike conventional SDRAM, there is no provision for choosing the order in which the data is supplied within a burst. Thus, it is not possible to perform critical-word-first reads.
The masked write command is similar to a normal write, but no command delay is permitted and a mask byte is supplied, which permits controlling which 8-bit fields are written. This is not a bitmap indicating which bytes are to be written; it would not be large enough for the 32 bytes in a write burst. Rather, it is a bit pattern which the DRAM controller fills unwritten bytes with. The DRAM controller is responsible for finding a pattern which does not appear in the other bytes that are to be written. Because there are 256 possible patterns and only 32 bytes in the burst, it is straightforward to find one. Even when multiple devices are connected in parallel, a mask byte can always be found when the bus is at most 128 bits wide. (This would produce 256 bytes per burst, but a masked write command is only used if at least one of them is not to be written.)
Each byte is the 8 consecutive bits transferred across one data line during a particular clock cycle. M0 is matched to the first data bit transferred during a clock cycle, and M7 is matched to the last bit.
This convention also interferes with performing critical-word-first reads; any word must include bits from at least the first 8 bits transferred.
This command is similar to a combination of a conventional SDRAM's precharge and refresh commands. The POPx and BPx bits specify a precharge operation, while the ROPx, DELRx, and BRx bits specify a refresh operation. Each may be separately enabled. If enabled, each may have a different command delay and must be addressed to a different bank.
Precharge commands may only be sent to one bank at a time; unlike a conventional SDRAM, there is no "precharge all banks" command.
Refresh commands are also different from a conventional SDRAM. There is no "refresh all banks" command, and the refresh operation is divided into separate activate and precharge operations so the timing is determined by the memory controller. The refresh counter is also programmable by the controller. Operations are:
This command performs a number of miscellaneous functions, as determined by the XOPx field. Although there are 16 possibilities, only 4 are actually used. Three subcommands start and stop output driver calibration (which must be performed periodically, every 100 ms).
The fourth subcommand places the chip in power-down mode. In this mode, it performs internal refresh and ignores the high-speed data lines. It must be woken up using the low-speed serial bus.
XDR DRAMs are probed and configured using a low-speed serial bus. The RST, SCK, and CMD signals are driven by the controller to every chip in parallel. The SDI and SDO lines are daisy-chained together, with the last SDO output connected to the controller, and the first SDI input tied high (logic 0).
On reset, each chip drives its SDO pin low (1). When reset is released, a series of SCK pulses are sent to the chips. Each chip drives its SDO output high (0) one cycle after seeing its SDI input high (0). Further, it counts the number of cycles that elapse between releasing reset and seeing its SDI input high, and copies that count to an internal chip ID register. Commands sent by the controller over the CMD line include an address which must match the chip ID field.
Each command either reads or writes a single 8-bit register, using an 8-bit address. This allows up to 256 registers, but only the range 1–31 is currently assigned.
Normally, the CMD line is left high (logic 0) and SCK pulses have no effect. To send a command, a sequence of 32 bits is clocked out over the CMD lines:
1100
, a command start signal.
|