Tagged Command Queuing
From Wikipedia, the free encyclopedia
Tagged Command Queuing (TCQ) is a technology built into certain ATA and SCSI hard drives. It allows the operating system to send multiple read and write requests to a hard drive. ATA TCQ is not identical in function to the more efficient Native Command Queuing (NCQ) used by SATA II drives[1]. SCSI TCQ does not suffer from the same limitations as ATA TCQ.
Before TCQ, an operating system could send only one request at a time. To boost performance, it had to decide the order of the requests based on its own, possibly incorrect, idea of what the hard drive was doing. With TCQ, the drive makes its own decisions about how to order the requests, relieving the operating system of that task. The result is that TCQ can improve the overall performance of a hard drive.
Overview
Typically, sectors nearest the current location of the drive's read/write head are serviced first. The second-nearest location is serviced second, and so on. The queue is constantly extended and re-ordered as new read/write requests arrive and as the read/write head moves to a new position after each operation. The reordering algorithm is drive-dependent, and the host computer does not need to know what the algorithm is or how it works.
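The nearest-first behaviour described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: positions are plain integers, the function name `nearest_first` is hypothetical, and a real drive's firmware may also weigh rotational position and other factors that this model ignores.

```python
def nearest_first(head, pending):
    """Return the order in which a nearest-first policy would service requests.

    `head` is the current head position and `pending` a list of requested
    positions; both are abstract integers used for illustration only.
    """
    remaining = list(pending)
    order = []
    while remaining:
        # Pick the pending request closest to the current head position.
        nxt = min(remaining, key=lambda pos: abs(pos - head))
        remaining.remove(nxt)
        order.append(nxt)
        head = nxt  # the head now rests where the last request was serviced
    return order


print(nearest_first(50, [95, 10, 60, 45, 52]))  # [52, 45, 60, 95, 10]
```

Because the choice is re-evaluated after every serviced request, a request that arrives later can still be inserted ahead of older ones simply by appending it to `pending` before the next pick.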
This queuing mechanism is sometimes referred to as "elevator seeking". An elevator with a very limited maximum speed, trying to service multiple requests without excessive travel up and down, illustrates the idea rather well.
For example, assume an elevator is servicing a building with five floors, with the elevator currently residing on the bottom floor. Suppose tenants on the second, fourth, and fifth floors push the elevator call button to leave the building, but in the following order: 5-2-4. If the elevator serviced these requests in the order they were received, it would go to the top floor, then back down to the second floor, then back up to the fourth floor, and then finally back to the first floor to deliver the passengers. This is clearly very inefficient, because floors with waiting passengers are bypassed on the way to other floors. Furthermore, the elevator travels much further overall, increasing wear and tear. Modern elevators will realize this and re-order the service queue to a more efficient order such as 5-4-2.
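The saving in the elevator example can be checked with a short calculation. The sketch below assumes the bottom floor is floor 1 and that every passenger is delivered back to floor 1; the helper name `floors_travelled` is made up for this illustration.

```python
def floors_travelled(start, stops, final):
    """Total floors moved when visiting `stops` in order, then going to `final`."""
    total, position = 0, start
    for floor in list(stops) + [final]:
        total += abs(floor - position)
        position = floor
    return total


arrival_order = [5, 2, 4]
# Ride to the top once, then collect passengers on the way down.
reordered = sorted(arrival_order, reverse=True)

print(floors_travelled(1, arrival_order, 1))  # 12
print(floors_travelled(1, reordered, 1))      # 8
```

In this small case, servicing the calls in arrival order costs 12 floors of travel, while the 5-4-2 order costs 8.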
Comparison of SCSI TCQ, ATA TCQ, and SATA NCQ
SCSI TCQ is the first popular version of TCQ and is still popular today. The SCSI-3 protocol itself permits 64 bits to be used in the tag field, allowing up to 2^64 tasks in one task set to be outstanding before some of them must complete so that more commands can be issued[2]. However, different protocols that implement the SCSI protocol might not permit the use of all 64 bits. For example, traditional parallel SCSI permits 8 tag bits, iSCSI permits up to 32 tag bits, and Fibre Channel permits up to 16 tag bits, with tag FFFF reserved. This flexibility allows the designer of a protocol to trade off queuing ability against cost. Networks that can be large, such as iSCSI networks, benefit from more tag bits to deal with the larger number of disks in the network and the longer latencies such networks generate. Small-scale networks such as parallel SCSI chains have neither enough disks nor enough latency to benefit from many tag bits, and are therefore cost-reduced by allowing fewer.
SCSI TCQ also allows tasks to be entered into a queue using one of three modes: head of queue, ordered, and simple[2]. In head of queue mode, a task must be pushed to the front of the queue, even ahead of other head of queue tasks[2]. This mode is unique to SCSI TCQ, and is not found in either ATA TCQ or SATA NCQ[3][1]. However, it is not used much because it can starve other tasks when abused. In ordered mode, a task must execute after all older tasks and before all newer tasks[2]. In simple mode, tasks may execute in any order that does not violate the constraints on tasks in the other two modes[2].
After a command in a task is completed, the device that completed it sends a notification to the host bus adapter[2]. Whether SCSI TCQ causes heavy interrupt overhead depends on the bus used to connect the SCSI host bus adapter. On PCI, PCI-X, PCI Express, and other buses that permit first-party DMA, interrupt overhead is low. On ISA, the SCSI host bus adapter generated high CPU overhead, because the ISA bus requires the CPU to be interrupted so that it can program the third-party DMA engine to perform transfers, and then requires another interrupt to notify the CPU that a task in the queue has finished[1].
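The three SCSI queuing modes can be illustrated with a small Python model. This is a simplified sketch, not the SAM-3 state machine: it assumes no task is already executing when a new one arrives, and the names `Attr`, `enqueue`, `eligible`, and `complete` are hypothetical helpers invented for this example.

```python
from enum import Enum


class Attr(Enum):
    SIMPLE = "simple"
    ORDERED = "ordered"
    HEAD_OF_QUEUE = "head of queue"


def enqueue(queue, tag, attr):
    """Add a task; a head of queue task goes to the very front of the queue."""
    if attr is Attr.HEAD_OF_QUEUE:
        queue.insert(0, (tag, attr))
    else:
        queue.append((tag, attr))


def eligible(queue):
    """Tags the device may start next without violating the ordering rules."""
    if not queue:
        return []
    first_tag, first_attr = queue[0]
    if first_attr is not Attr.SIMPLE:
        # A head of queue or ordered task at the front must run before anything else.
        return [first_tag]
    # Leading simple tasks may run in any order, up to the next non-simple task.
    tags = []
    for tag, attr in queue:
        if attr is not Attr.SIMPLE:
            break
        tags.append(tag)
    return tags


def complete(queue, tag):
    """Remove a finished task from the queue."""
    queue[:] = [(t, a) for t, a in queue if t != tag]


q = []
enqueue(q, 1, Attr.SIMPLE)
enqueue(q, 2, Attr.SIMPLE)
enqueue(q, 3, Attr.ORDERED)
enqueue(q, 4, Attr.SIMPLE)
enqueue(q, 5, Attr.HEAD_OF_QUEUE)
print(eligible(q))  # [5]
complete(q, 5)
print(eligible(q))  # [1, 2]
```

In this model the ordered task 3 acts as a barrier: tasks 1 and 2 may be reordered freely between themselves, but task 4 cannot start until task 3 has completed.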
ATA TCQ was developed to try to bring the same benefits to ATA drives. It is available in both Parallel ATA and Serial ATA. This effort was not very successful because the ATA bus started out as a reduced pin count ISA bus. The demand for software compatibility required ATA host bus adapters to act like ISA bus devices, which meant they lacked first-party DMA. Therefore, when a drive was ready for a transfer, it had to interrupt the CPU, wait for the CPU to ask the disk which command was ready to execute, respond to that query, wait for the CPU to program the host bus adapter's third-party DMA engine based on the response, wait for the third-party DMA engine to execute the transfer, and then interrupt the CPU again when the DMA engine finished so that the CPU could notify the thread that had requested the task[1]. Since responding to interrupts causes much overhead, CPU utilization rose quickly when ATA TCQ was enabled[1]. Also, since interrupt service time can be unpredictable, there are times when the disk is ready to transfer data but cannot because it must wait for the CPU[1]. This standard was therefore rarely implemented, because it caused high CPU utilization without improving performance enough to make it worthwhile[1]. The standard allows up to 32 outstanding commands per device[3].
SATA NCQ is a totally reworked standard compared to ATA TCQ. Like ATA TCQ, it allows up to 32 outstanding commands per device[1]. However, because SATA host bus adapters can support first-party DMA, the protocol was rewritten to take advantage of that fact[1]. Instead of interrupting the CPU before the task, the hard drive itself programs the first-party DMA engine integrated into the host bus adapter, and the DMA engine then performs the task[1]. To further reduce interrupt overhead, the drive can withhold the interrupt carrying task-completed messages until it has gathered many of them to send at once, allowing the CPU to notify many threads at once that their tasks are done[1]. If another task completes after such an interrupt is sent, the host bus adapter can amend the completion messages if they have not yet been delivered to the CPU[1]. By choosing when to withhold completion messages and when to send them, the hard disk firmware programmer can trade off high disk performance against lower CPU utilization[1].
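A rough way to see the difference in interrupt load is to count interrupts for a full queue of 32 commands. The Python sketch below is a back-of-the-envelope model based only on the descriptions above: it assumes two interrupts per command for ATA TCQ and one interrupt per batch of completions for NCQ. The `batch_size` parameter is a hypothetical stand-in for whatever coalescing policy a drive's firmware applies; it is not a value taken from any specification.

```python
import math


def ata_tcq_interrupts(commands):
    """Two interrupts per command: one when the drive is ready to transfer,
    one when the third-party DMA engine has finished."""
    return 2 * commands


def ncq_interrupts(commands, batch_size):
    """One interrupt per batch of completion messages.

    `batch_size` is a hypothetical firmware setting used for illustration.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return math.ceil(commands / batch_size)


for batch in (1, 4, 8):
    print(batch, ata_tcq_interrupts(32), ncq_interrupts(32, batch))
# 1 64 32
# 4 64 8
# 8 64 4
```

Larger batches mean fewer interrupts but a longer wait before the host learns that an individual command has finished, which is the trade-off left to the firmware programmer.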
References
1. Dees, Brian (November/December 2005). "Native command queuing - advanced performance in desktop storage" (PDF, fee required). IEEE Potentials 24 (4): 4-7. DOI:10.1109/MP.2005.1549750. ISSN 0278-6648.
2. SCSI Architecture Model - 3 (SAM-3) (PDF). Retrieved on 2007-02-24.
3. 1532D: AT Attachment with Packet Interface - 7 Volume 1 (PDF). Retrieved on 2007-01-02.
External links
- Can Command Queuing Turbo Charge SATA? by Patrick Schmid and Achim Roos of Tom's Hardware Guide