Time Stamp Counter

The Time Stamp Counter (TSC) is a 64-bit register present on all x86 processors since the Pentium. It counts the number of cycles since reset. The RDTSC instruction returns the TSC in EDX:EAX. In x86-64 mode, RDTSC also clears the upper 32 bits of both RAX and RDX. Its opcode is 0F 31.[1] Pentium competitors such as the Cyrix 6x86 did not always have a TSC and may treat RDTSC as an illegal instruction. Cyrix added a Time Stamp Counter in the MII.

The time stamp counter was long an excellent high-resolution, low-overhead way of getting CPU timing information. With the advent of multi-core and hyper-threaded CPUs, multi-processor systems, and operating systems that hibernate, the TSC can no longer be relied on for accurate results unless great care is taken to correct two possible flaws: a varying tick rate, and cores (or processors) whose counters hold different values. There is no guarantee that the time stamp counters of multiple CPUs on a single motherboard are synchronized. In such cases, programmers can only get reliable results by locking their code to a single CPU. Even then, the CPU speed may change due to power-saving measures taken by the OS or BIOS, or the system may be hibernated and later resumed, which resets the time stamp counter. In those cases the counter must be recalibrated periodically, at an interval set by the time resolution the application requires.
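Locking code to a single CPU can be sketched on Linux with sched_setaffinity. This is a minimal illustration, not a definitive implementation: it assumes Linux with glibc, and the helper names first_allowed_cpu and pin_to_cpu are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* needed for cpu_set_t and the CPU_* macros in glibc */
#endif
#include <assert.h>
#include <sched.h>

/* Return the lowest-numbered CPU the calling thread may run on, or -1 on failure. */
int first_allowed_cpu(void) {
    cpu_set_t allowed;
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed))
            return cpu;
    }
    return -1;
}

/* Restrict the calling thread to one CPU so every TSC read comes from the same counter.
 * Returns 0 on success, -1 on failure (errno is set by sched_setaffinity). */
int pin_to_cpu(int cpu) {
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t one;
    CPU_ZERO(&one);
    CPU_SET(cpu, &one);
    return sched_setaffinity(0, sizeof(one), &one);
}
```

A caller would typically run pin_to_cpu(first_allowed_cpu()) once before taking any TSC readings. Pinning removes cross-core skew only; it does nothing about frequency changes or hibernation.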

Reliance on the time stamp counter also reduces portability, as other processors may not have a similar feature. Recent Intel processors include a constant-rate TSC (identified by the constant_tsc flag in Linux's /proc/cpuinfo). On these processors, the TSC ticks at the processor's maximum rate regardless of the actual CPU running rate. While this makes timekeeping more consistent, it can skew benchmarks: if the code starts running at a lower clock rate before the OS switches the processor to the higher rate, the TSC keeps ticking at the full rate during that spin-up time, so the code appears to need more processor cycles than it normally would.
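With a constant-rate TSC, tick counts can be converted to time once the tick rate is known. The sketch below estimates that rate by counting ticks across an interval measured with CLOCK_MONOTONIC. It assumes an x86 target, GCC or Clang (for the __rdtsc() intrinsic in x86intrin.h), and a POSIX system; the function names are hypothetical, and the result is only an estimate.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>
#include <x86intrin.h>   /* __rdtsc() on GCC and Clang */

/* Current CLOCK_MONOTONIC reading as a nanosecond count. */
static uint64_t monotonic_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Estimate TSC ticks per second by busy-waiting for at least interval_ns
 * and dividing the tick delta by the elapsed time actually observed. */
double tsc_ticks_per_second(uint64_t interval_ns) {
    assert(interval_ns > 0);
    uint64_t ns0 = monotonic_ns();
    uint64_t t0 = __rdtsc();
    uint64_t ns1;
    do {
        ns1 = monotonic_ns();
    } while (ns1 - ns0 < interval_ns);
    uint64_t t1 = __rdtsc();
    return (double)(t1 - t0) * 1e9 / (double)(ns1 - ns0);
}

/* Convert a tick delta to nanoseconds using a previously measured rate. */
double tsc_ticks_to_ns(uint64_t ticks, double ticks_per_second) {
    return (double)ticks * 1e9 / ticks_per_second;
}
```

A longer interval gives a steadier estimate. On a processor without a constant-rate TSC the measured rate is only valid until the next frequency change, which is why periodic recalibration is needed there.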

On Windows, Microsoft strongly discourages using the TSC for high-resolution timing for exactly these reasons and provides the QueryPerformanceCounter and QueryPerformanceFrequency APIs instead.[2] On Unix-like systems, similar functionality is available by reading the CLOCK_MONOTONIC clock with the POSIX clock_gettime function.
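The POSIX route can be sketched as follows. The helper names monotonic_now and timespec_diff_ns are hypothetical; only clock_gettime and CLOCK_MONOTONIC come from POSIX.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read the monotonic clock. CLOCK_MONOTONIC is assumed to be supported. */
struct timespec monotonic_now(void) {
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;   /* silence the unused-variable warning when NDEBUG is defined */
    return ts;
}

/* Nanoseconds elapsed from start to end. */
int64_t timespec_diff_ns(struct timespec start, struct timespec end) {
    return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
         + (int64_t)(end.tv_nsec - start.tv_nsec);
}
```

To time a region, call monotonic_now() before and after it and pass both readings to timespec_diff_ns. Unlike raw TSC values, the result is already in time units and does not depend on which core the thread ran on.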

Starting with the Pentium Pro, Intel processors have supported out-of-order execution, where instructions are not necessarily performed in the order they appear in the executable. This can cause RDTSC to be executed later than expected, producing a misleading cycle count.[3] This problem can be solved by executing a serializing instruction, such as CPUID, to force every preceding instruction to complete before allowing the program to continue, or by using the RDTSCP instruction, which waits until all preceding instructions have executed before reading the counter. RDTSCP is not itself a fully serializing instruction, since later instructions may begin executing before the read is performed. It is available on Intel processors starting from the Core i7[4] and on AMD processors starting from the Athlon 64 X2 with Socket AM2 (Windsor and Brisbane).
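One common way to combine the two instructions is to serialize with CPUID before an opening RDTSC, and to close with RDTSCP followed by CPUID. The sketch below assumes x86-64, GCC or Clang inline assembly, and a processor that supports RDTSCP; tsc_begin and tsc_end are hypothetical names.

```c
#include <stdint.h>

/* Opening timestamp: CPUID forces earlier instructions to complete,
 * then RDTSC reads the counter. */
static inline uint64_t tsc_begin(void) {
    uint32_t lo, hi;
    __asm__ __volatile__(
        "cpuid\n\t"
        "rdtsc"
        : "=a"(lo), "=d"(hi)
        : "a"(0)                      /* CPUID leaf 0 */
        : "rbx", "rcx", "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Closing timestamp: RDTSCP waits for the measured code to finish, the result
 * is copied out, and CPUID keeps later instructions from starting early. */
static inline uint64_t tsc_end(void) {
    uint32_t lo, hi;
    __asm__ __volatile__(
        "rdtscp\n\t"
        "mov %%eax, %0\n\t"
        "mov %%edx, %1\n\t"
        "xor %%eax, %%eax\n\t"
        "cpuid"
        : "=r"(lo), "=r"(hi)
        :
        : "rax", "rbx", "rcx", "rdx", "memory");
    return ((uint64_t)hi << 32) | lo;
}
```

The code under test goes between a call to tsc_begin() and a call to tsc_end(); the difference of the two return values is the tick count. CPUID is slow, especially under virtualization, so the fixed overhead of an empty measurement should be subtracted for short regions.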

Implementation in various processors

Intel processor families increment the time-stamp counter differently;[5] the specific processor configuration determines the behavior. Constant TSC behavior ensures that the duration of each clock tick is uniform and supports the use of the TSC as a wall-clock timer even if the processor core changes frequency. This is the architectural behavior moving forward for all Intel processors.

AMD processors up to the K8 core always incremented the time-stamp counter every clock cycle.[6] Thus, power management features were able to change the number of increments per second, and the values could get out of sync between different cores or processors in the same system. For Windows, AMD provides a utility[7] to periodically synchronize the counters on multi-core CPUs. Since family 10h (Barcelona/Phenom), AMD chips feature a constant TSC, which can be driven either by the HyperTransport speed or by the highest P-state. A CPUID bit (Fn8000_0007:EDX_8) advertises this.
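Testing that CPUID bit can be sketched with the __get_cpuid helper from cpuid.h, which GCC and Clang provide. The function name has_invariant_tsc is hypothetical. Intel documents the same bit position (CPUID leaf 80000007H, EDX bit 8) for its invariant TSC.

```c
#include <cpuid.h>   /* __get_cpuid(), provided by GCC and Clang on x86 */

/* Returns 1 if CPUID leaf 0x80000007 reports EDX bit 8 set,
 * and 0 if the bit is clear or the leaf is not available. */
int has_invariant_tsc(void) {
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((edx >> 8) & 1u);
}
```

A hypervisor may hide or alter this bit, so a result of 0 inside a virtual machine does not necessarily describe the underlying hardware.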

Other processors also have registers which count CPU clock cycles, but with different names. For instance, on the AVR32, it is called the "Performance Clock Counter" (PCCNT) register. SPARCv9 provides the TICK register.

Operating system support

The RDTSC instruction can be enabled or disabled by operating systems. For example, on some versions of the Linux kernel, seccomp sandboxing mode disables RDTSC.[8] It can also be disabled using the PR_SET_TSC argument to the prctl() syscall.[9]
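The prctl() interface can be sketched as below, assuming Linux on x86. The wrapper names are hypothetical; PR_GET_TSC, PR_SET_TSC, PR_TSC_ENABLE and PR_TSC_SIGSEGV are the constants exposed through sys/prctl.h.

```c
#include <sys/prctl.h>   /* prctl() and the PR_*_TSC constants */

/* Query whether the calling thread may execute RDTSC.
 * Returns PR_TSC_ENABLE or PR_TSC_SIGSEGV, or -1 if the query fails. */
int tsc_access_mode(void) {
    int mode = 0;
    if (prctl(PR_GET_TSC, &mode, 0, 0, 0) != 0)
        return -1;
    return mode;
}

/* Allow RDTSC (allow != 0) or make it raise SIGSEGV (allow == 0).
 * Returns 0 on success, -1 on failure. */
int set_tsc_access(int allow) {
    return prctl(PR_SET_TSC, allow ? PR_TSC_ENABLE : PR_TSC_SIGSEGV, 0, 0, 0);
}
```

After set_tsc_access(0), any RDTSC executed by the thread delivers SIGSEGV, so code that reads the TSC directly should check tsc_access_mode() first or be prepared to fall back to clock_gettime.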

Examples

C++

GNU C++

#ifdef __cplusplus
#include <cstdint>
extern "C" {
#else
#include <stdint.h>
#endif

  /* x86-64 only: the clobber list below names 64-bit registers. */
  static __inline__ uint64_t rdtsc(void) {
    uint32_t lo, hi;
    /* CPUID serializes: all earlier instructions complete before RDTSC runs. */
    __asm__ __volatile__ (
    "xorl %%eax,%%eax \n        cpuid"
    ::: "%rax", "%rbx", "%rcx", "%rdx");
    /* We cannot use "=A", since this would use %rax on x86_64 and return only the lower 32 bits of the TSC */
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return (uint64_t)hi << 32 | lo;
  }

#ifdef __cplusplus
}
#endif

Microsoft Visual C++

#include <intrin.h>
unsigned __int64 rdtsc(void)
{
  return __rdtsc();
}

D

ulong rdtsc()
{
  asm
  {
    naked;
    rdtsc;
    ret;
  }
}

Delphi / Object Pascal

{$asmmode intel}
function RDTSC: comp;
var
  TimeStamp: record case byte of
    1: (Whole: comp);
    2: (Lo, Hi: Longint);
  end;
begin
  asm
    db $0F; db $31;
    mov [TimeStamp.Lo], eax
    mov [TimeStamp.Hi], edx
  end;
  RDTSC := TimeStamp.Whole;
end;

In more recent versions of Delphi or Free Pascal you can also use:

function RDTSC: Int64; register;
asm
  rdtsc
end;

FreeBasic

Function ReadTSC() As uLongInt
  Asm
    rdtsc
    mov [function], eax
    mov [function+4], edx
  End Asm
End Function
