RDTSC

From Wikipedia, the free encyclopedia

In the x86 assembly language, RDTSC is the mnemonic for the "read time-stamp counter" instruction.[1] The instruction returns a 64-bit value in registers EDX:EAX that represents the count of ticks since processor reset. It was introduced in the Pentium line of processors, and its opcode is 0F 31. At the time of the Pentium, the instruction was not formally part of the x86 assembly language and was recommended for use by expert users or system programmers only (Intel reference required). Pentium competitors such as the Cyrix 6x86 did not always have a TSC and may treat the instruction as illegal. Use of this instruction in Linux distributions prevents Linux from booting on CPUs that do not support RDTSC. Cyrix included a time-stamp counter in its MII CPU architecture, as RDTSC was formally included in the x86 assembly language with the Pentium II.
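
Because some processors lack a time-stamp counter, software can test for one before executing RDTSC. The following is a minimal sketch in C, not a definitive implementation. It assumes the feature flags have already been read from EDX after executing CPUID with leaf 1, where bit 4 reports TSC support. The helper names cpuid_edx_has_tsc and tsc_combine are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Bit 4 of the EDX value returned by CPUID leaf 1 is the TSC feature flag. */
#define CPUID_EDX_TSC_BIT (1u << 4)

/* Hypothetical helper: returns 1 if the CPUID leaf 1 EDX flags report a TSC. */
int cpuid_edx_has_tsc(uint32_t cpuid1_edx)
{
    return (cpuid1_edx & CPUID_EDX_TSC_BIT) != 0;
}

/* Hypothetical helper: joins the two 32-bit halves that RDTSC leaves in
   EDX (high half) and EAX (low half) into one 64-bit tick count. */
uint64_t tsc_combine(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}
```

A caller would obtain the EDX value from CPUID first and execute RDTSC only when cpuid_edx_has_tsc returns 1.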

The RDTSC instruction was long an excellent high-resolution, low-overhead way of getting CPU timing information. With the advent of multi-core and hyper-threaded CPUs, systems with multiple CPUs, and "hibernating" operating systems, RDTSC often no longer provides reliable results. The issue has two components: the rate at which the counter ticks, and whether all cores (processors) hold identical values in their time-keeping registers. There is no longer any guarantee that the time-stamp counters of multiple CPUs on a single motherboard are synchronized, so a program cannot get reliable time-stamp values unless it is locked to a single CPU. Even then, the CPU speed may change due to power-saving measures taken by the OS or BIOS, or the system may be hibernated and later resumed (resetting the time-stamp counter). Relying on RDTSC also makes a program non-portable to architectures other than x86.

Recent Intel processors include a constant-rate TSC (identified by the constant_tsc flag in Linux's /proc/cpuinfo). On these processors the TSC ticks at the processor's maximum rate regardless of the actual CPU running rate. While this makes time keeping more consistent, it can skew benchmarks in which some spin-up time is spent at a lower clock rate before the OS switches the processor to the higher rate: the code then appears to require more processor cycles than it normally would.
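
On Linux, a program can look for the constant_tsc flag mentioned above before trusting TSC deltas as elapsed time. The following is a minimal sketch in C. It assumes the usual /proc/cpuinfo layout, in which a line beginning with "flags" lists space-separated feature names. The function names line_has_flag and cpu_has_constant_tsc are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if `flag` appears in `line` as a whole word, 0 otherwise.
   A word is delimited by spaces, tabs, a colon, a newline, or the line ends. */
int line_has_flag(const char *line, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = line;

    if (n == 0)
        return 0;
    while ((p = strstr(p, flag)) != NULL) {
        int starts = (p == line) || p[-1] == ' ' || p[-1] == '\t' || p[-1] == ':';
        char end = p[n];
        int ends = (end == '\0' || end == ' ' || end == '\t' || end == '\n');
        if (starts && ends)
            return 1;
        p += 1;  /* keep searching past this partial match */
    }
    return 0;
}

/* Checks the first "flags" line of /proc/cpuinfo.
   Returns 1 if constant_tsc is listed, 0 if it is not,
   and -1 if the file cannot be opened (for example on a non-Linux system). */
int cpu_has_constant_tsc(void)
{
    char line[8192];  /* assumed large enough for one flags line */
    int result = 0;
    FILE *f = fopen("/proc/cpuinfo", "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "flags", 5) == 0) {
            result = line_has_flag(line, "constant_tsc");
            break;
        }
    }
    fclose(f);
    return result;
}
```

The whole-word match matters because the shorter flag name tsc is a substring of constant_tsc, so a plain substring search would report false positives.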

On Windows platforms, Microsoft strongly discourages using RDTSC for high-resolution timing for exactly these reasons, and instead provides the Windows APIs QueryPerformanceCounter and QueryPerformanceFrequency.[2]

Starting with the Pentium Pro, Intel processors have supported out-of-order execution, where instructions are not necessarily performed in the order they appear in the executable. This can cause RDTSC to be executed later than expected, producing a misleading cycle count.[3] This problem can be solved by executing a serializing instruction, such as CPUID, to force every preceding instruction to complete before allowing the program to continue.

Implementation in Various Processors

Intel processor families increment the time-stamp counter differently:[4]

  • For Pentium M processors (family [06H], models [09H, 0DH]); for Pentium 4 processors, Intel Xeon processors (family [0FH], models [00H, 01H, or 02H]); and for P6 family processors: the time-stamp counter increments with every internal processor clock cycle. The internal processor clock cycle is determined by the current core-clock to bus-clock ratio. Intel SpeedStep technology transitions may also impact the processor clock.
  • For Pentium 4 processors, Intel Xeon processors (family [0FH], models [03H and higher]); for Intel Core Solo and Intel Core Duo processors (family [06H], model [0EH]); for the Intel Xeon processor 5100 series and Intel Core 2 Duo processors (family [06H], model [0FH]): the time-stamp counter increments at a constant rate. That rate may be set by the maximum core-clock to bus-clock ratio of the processor or may be set by the maximum resolved frequency at which the processor is booted. The maximum resolved frequency may differ from the maximum qualified frequency of the processor; see Section 18.17.5 of the manual for more detail.

    The specific processor configuration determines the behavior. Constant TSC behavior ensures that the duration of each clock tick is uniform and supports the use of the TSC as a wall clock timer even if the processor core changes frequency. This is the architectural behavior moving forward.
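
The family and model rules listed above can be expressed as a small lookup. The sketch below, in C, assumes the processor signature has been read from EAX after executing CPUID with leaf 1, and that the display family and model are decoded with the extended-family and extended-model fields in the way Intel documents. The names tsc_rate_kind and classify_intel_tsc are hypothetical. Any processor the list does not name by family and model, including the P6 family models it leaves unenumerated, is reported as unknown rather than guessed.

```c
#include <stdint.h>

enum tsc_rate_kind {
    TSC_RATE_UNKNOWN,     /* not covered by the family/model list above */
    TSC_RATE_CORE_CLOCK,  /* increments with every internal processor clock cycle */
    TSC_RATE_CONSTANT     /* increments at a constant rate */
};

/* Hypothetical helper: classify TSC behavior from a CPUID leaf 1 EAX signature. */
enum tsc_rate_kind classify_intel_tsc(uint32_t signature)
{
    uint32_t family     = (signature >> 8)  & 0xF;
    uint32_t model      = (signature >> 4)  & 0xF;
    uint32_t ext_model  = (signature >> 16) & 0xF;
    uint32_t ext_family = (signature >> 20) & 0xFF;

    /* The extended model applies only when the base family is 06H or 0FH. */
    if (family == 0x6 || family == 0xF)
        model |= ext_model << 4;
    /* The extended family applies only when the base family is 0FH. */
    if (family == 0xF)
        family += ext_family;

    if (family == 0x06) {
        if (model == 0x09 || model == 0x0D)   /* Pentium M */
            return TSC_RATE_CORE_CLOCK;
        if (model == 0x0E || model == 0x0F)   /* Core Solo/Duo, Core 2 Duo, Xeon 5100 */
            return TSC_RATE_CONSTANT;
        return TSC_RATE_UNKNOWN;
    }
    if (family == 0x0F)                       /* Pentium 4, Xeon */
        return (model <= 0x02) ? TSC_RATE_CORE_CLOCK : TSC_RATE_CONSTANT;

    return TSC_RATE_UNKNOWN;
}
```

Returning an explicit unknown value leaves it to the caller to decide how to treat processors outside the list, instead of silently assuming either behavior.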

AMD processors always increment the time-stamp counter every clock cycle.[5] Thus, power management features can change the number of increments per second, and the values can get out of sync between different cores or processors in the same system. For Windows, AMD provides a utility[6] to periodically synchronize the counters on multi-core CPUs.

Examples of use

C and C++

GNU C++

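#include <stdint.h>
extern "C" {
   __inline__ uint64_t rdtsc() {
      uint32_t lo, hi;
      /* Serialize: CPUID (leaf 0) forces every preceding instruction to complete. */
      __asm__ __volatile__ (
         "xorl %%eax,%%eax \n\t"
         "cpuid"
         /* 32-bit register names are accepted on both i386 and x86_64 */
         ::: "%eax", "%ebx", "%ecx", "%edx");
      /* We cannot use "=A", since this would use %rax on x86_64 */
      __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
      /* EDX holds the high 32 bits, EAX the low 32 bits */
      return ((uint64_t)hi << 32) | lo;
   }
}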

Microsoft Visual C++

__declspec(naked)
unsigned __int64 __cdecl rdtsc(void)
{
   __asm
   {
      rdtsc
      ret       ; return value at EDX:EAX
   }
}

D

ulong rdtsc()
{
    asm
    {
        naked;
        rdtsc;
        ret;
    }
}

Pascal / Delphi

function RDTSC: comp;
var TimeStamp: record case byte of
                 1: (Whole: comp);
                 2: (Lo, Hi: Longint);
               end;
begin
  asm
    db $0F; db $31;
    mov [TimeStamp.Lo], eax
    mov [TimeStamp.Hi], edx
  end;
  Result := TimeStamp.Whole;
end;

In more recent versions of Delphi you can also use:

function RDTSC: Int64; register;
asm
  rdtsc
end;

FreeBASIC

 Function ReadTSC() As uLongInt
   Asm
     rdtsc
     mov [function], eax
     mov [function+4], edx
   End Asm
 End Function

References

  1. Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2B: Instruction Set Reference, N-Z, pp. 251–252.
  2. Game Timing and Multicore Processors.
  3. Using the RDTSC Instruction for Performance Monitoring.
  4. Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3B, Chapter 18.
  5. AMD64 Architecture Programmer's Manual, Volume 3.
  6. AMD Dual-Core Optimizer.

External links

  • cycle.h - C code to read the high-resolution timer on many CPUs and compilers.