x86 calling conventions
This article describes the calling conventions used on the x86 architecture.
Calling conventions describe the interface of called code:
- in what order parameters are allocated
- where parameters are placed (pushed on the stack or placed in registers)
- whether the caller or the callee is responsible for unwinding the stack on return
A closely related topic is name mangling, which determines how symbol names in the code map to symbol names used by the linker.
Compilers often differ subtly in how they implement these conventions, so it is often difficult to interface code compiled by different compilers. On the other hand, conventions that are used as an API standard (such as stdcall) are necessarily implemented very uniformly.
Historical background
In the times of UNIX mainframes, the machine manufacturer also used to provide an OS for it and most (if not all) of the software, including a C compiler. So there used to be only one calling convention: the one implemented by the "official" compiler. The IBM PC case was totally different. One firm (IBM) provided the hardware, another (Intel) made the processor, a third (Microsoft) was responsible for the OS (MS-DOS), and many others wrote compilers for quite a number of programming languages. Different, mutually exclusive calling schemes were thus designed to satisfy their different requirements.
Caller Clean-Up
In these conventions the caller cleans the arguments from the stack, which allows for variable argument lists, e.g. printf().
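A minimal sketch of why variable argument lists pair naturally with caller clean-up: only the call site knows how many arguments it supplied. The function name sum_ints is hypothetical, and the code is plain standard C rather than anything specific to one x86 compiler.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical variadic function: adds up `count` int arguments.
   The callee cannot know at compile time how many bytes of arguments
   a given call site passes, so under a caller-clean-up convention the
   caller removes them after the call returns. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    assert(count >= 0);
    va_start(args, count);
    for (int i = 0; i < count; i++) {
        total += va_arg(args, int);
    }
    va_end(args);
    return total;
}
```

Each call site may pass a different number of values, for example sum_ints(3, 1, 2, 3) and sum_ints(0), and each one cleans up exactly what it pushed.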
cdecl
The cdecl calling convention is used by many C and C++ systems for the x86 architecture. In cdecl, function parameters are pushed on the stack in right-to-left order. Integer and pointer return values are returned in the EAX register. Registers EAX, ECX, and EDX are available for use in the function; the callee need not preserve them.
For instance, the following C code function prototype and function call:
    int function(int, int, int);
    int a, b, c, x;
    ...
    x = function(a, b, c);
will produce the following x86 Assembly code (written in MASM syntax):
    push c
    push b
    push a
    call function
    add esp, 12     ;Stack clearing
    mov x, eax
The calling function cleans the stack after the function call returns.
The cdecl calling convention is usually the default calling convention for x86 C compilers, although many compilers provide options to automatically change the calling conventions used. To manually define a function to be cdecl, some support the following syntax:
    void _cdecl function(params);
The _cdecl modifier must be included in the function prototype and in the function definition to override any other settings that might be in place.
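The spelling of the modifier differs between compilers, so portable code often hides it behind a macro. The sketch below assumes the Microsoft spelling __cdecl and the GCC spelling __attribute__((cdecl)), both applied only when targeting 32-bit x86; the macro CALL_CDECL and the functions add3 and call_ternary are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical portability macro: expands to the compiler's spelling of
   the cdecl modifier on 32-bit x86, and to nothing on other targets. */
#if defined(_MSC_VER) && defined(_M_IX86)
#  define CALL_CDECL __cdecl
#elif defined(__GNUC__) && defined(__i386__)
#  define CALL_CDECL __attribute__((cdecl))
#else
#  define CALL_CDECL
#endif

/* The modifier appears on the prototype ... */
int CALL_CDECL add3(int a, int b, int c);

/* ... and again on the definition. */
int CALL_CDECL add3(int a, int b, int c)
{
    return a + b + c;
}

/* The convention is also part of a function pointer's type. */
typedef int (CALL_CDECL *ternary_fn)(int, int, int);

int call_ternary(ternary_fn fn, int a, int b, int c)
{
    assert(fn != NULL);
    return fn(a, b, c);
}
```

Keeping the modifier in the pointer type lets the compiler reject an attempt to call a function through a pointer that assumes a different convention.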
syscall
This is similar to cdecl in that arguments are pushed right to left. EAX, ECX, and EDX are not preserved.
Syscall was the standard calling convention of the 32-bit OS/2 API.
Callee Clean-Up
When the callee cleans the arguments from the stack, the number of bytes by which the stack must be adjusted has to be known at compile time. Therefore, these calling conventions are not compatible with variable argument lists, e.g. printf(). They may, however, be slightly more efficient, as the code needed to unwind the stack does not have to be generated at every call site.
Functions that use these conventions are easy to recognize in assembly code because they unwind the stack as they return. The x86 ret instruction accepts an optional 16-bit immediate operand that specifies the number of bytes to remove from the stack after the return address has been popped. Such code looks like this:
    ret 12
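As an illustration of where the operand in ret 12 comes from, the sketch below adds up the stack space taken by a function's arguments. It assumes 32-bit code in which every argument occupies a whole number of 4-byte stack slots; the helper name callee_cleanup_bytes is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of bytes a 32-bit callee-clean-up function
   would release with `ret N`, assuming each argument is rounded up to a
   multiple of 4 bytes on the stack. */
size_t callee_cleanup_bytes(const size_t *arg_sizes, size_t arg_count)
{
    size_t total = 0;

    assert(arg_sizes != NULL || arg_count == 0);
    for (size_t i = 0; i < arg_count; i++) {
        total += (arg_sizes[i] + 3u) & ~(size_t)3u;  /* round up to 4 */
    }
    return total;
}
```

Three int arguments of 4 bytes each give 12, which matches the ret 12 shown above.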
pascal
The parameters are pushed on the stack in left-to-right order (opposite of cdecl), and the callee is responsible for balancing the stack before return.
This calling convention was common in the following 16-bit APIs: OS/2 1.x, Microsoft Windows 3.x, and Borland Delphi version 1.x.
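The difference in push order between pascal and cdecl can be modelled without any assembly. The sketch below is only a model under simple assumptions (one stack slot per argument, nothing passed in registers); the names push_order and stack_layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum push_order { RIGHT_TO_LEFT, LEFT_TO_RIGHT };

/* Writes into top_first[] the index of the argument found in each stack
   slot, starting with the slot nearest the top of the stack (the slot
   that was pushed last). */
void stack_layout(enum push_order order, size_t arg_count, size_t *top_first)
{
    assert(top_first != NULL || arg_count == 0);
    for (size_t slot = 0; slot < arg_count; slot++) {
        /* Right-to-left pushes argument 0 last, so it ends up on top;
           left-to-right pushes the final argument last instead. */
        top_first[slot] = (order == RIGHT_TO_LEFT) ? slot
                                                   : arg_count - 1 - slot;
    }
}
```

With right-to-left order the first argument always sits at the same distance from the top of the stack, however many arguments follow it. With left-to-right order its position depends on the argument count.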
register
This convention is similar to pascal except that up to three registers (EAX, EDX, ECX) are used for the parameters in preference to the stack.
The register calling convention is the default in Borland Delphi.
stdcall
The stdcall[1] calling convention is a variation on the pascal calling convention in which parameters are pushed right-to-left, as in cdecl, while the callee remains responsible for cleaning the stack. Registers EAX, ECX, and EDX are designated for use within the function and need not be preserved. Return values are stored in the EAX register.
Stdcall is the standard calling convention for the Microsoft Win32 API.
fastcall
Conventions named fastcall have not been standardized and are implemented differently by different compiler vendors.
Microsoft fastcall
The Microsoft or GCC[2] __fastcall[3] convention (also known as __msfastcall) passes the first two arguments that fit (evaluated left to right) in ECX and EDX. Remaining arguments are pushed onto the stack from right to left.
Borland fastcall
Evaluating arguments from left to right, it passes the first three arguments in EAX, EDX, and ECX. Remaining arguments are pushed onto the stack, also left to right.
Watcom register based calling convention
Watcom does not support the __fastcall keyword except to alias it to null. The register calling convention may be selected by a command-line switch. (IDA, however, uses __fastcall anyway for uniformity.)
Up to 4 registers are assigned to arguments in the order eax, edx, ebx, ecx. Arguments are assigned to registers from left to right. If any argument cannot be assigned to a register (say it is too large) it, and all subsequent arguments, are assigned to the stack. Arguments assigned to the stack are pushed from right to left. Names are mangled by adding a suffixed underscore.
Variadic functions fall back to the Watcom stack based calling convention.
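The register assignment rule described above can be sketched as a small allocation routine. This is a simplified model, not Watcom's actual implementation: it assumes an argument fits in a register only if it is at most 4 bytes, and it ignores variadic functions. The names watcom_assign and watcom_regs are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { WATCOM_REG_COUNT = 4, REGISTER_BYTES = 4 };

/* Register names in the order given in the text. */
static const char *const watcom_regs[WATCOM_REG_COUNT] = {
    "eax", "edx", "ebx", "ecx"
};

/* For each argument (left to right), stores a register name in
   locations[i], or NULL if the argument goes on the stack.
   Returns the number of arguments placed in registers.
   Assumption: "fits" means the size is at most REGISTER_BYTES. */
size_t watcom_assign(const size_t *arg_sizes, size_t arg_count,
                     const char **locations)
{
    size_t next_reg = 0;
    int spilled = 0;  /* once one argument is on the stack, the rest follow */

    assert(arg_count == 0 || (arg_sizes != NULL && locations != NULL));
    for (size_t i = 0; i < arg_count; i++) {
        if (!spilled && next_reg < WATCOM_REG_COUNT
                && arg_sizes[i] <= REGISTER_BYTES) {
            locations[i] = watcom_regs[next_reg++];
        } else {
            spilled = 1;
            locations[i] = NULL;
        }
    }
    return next_reg;
}
```

For argument sizes 4, 4, 8, 4 the model places the first two in eax and edx and sends both the 8-byte argument and the one after it to the stack, even though registers remain free.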
The Watcom C/C++ compiler also uses the #pragma aux[4] directive that allows you to specify your own calling convention. As its manual states, "Very few users are likely to need this method, but if it is needed, it can be a lifesaver".
safecall
In Borland Delphi on Microsoft Windows, the safecall calling convention encapsulates COM (Component Object Model) error handling, so that exceptions aren't leaked out to the caller, but are reported in the HRESULT return value, as required by COM/OLE. When calling a safecall function from Delphi code, Delphi also automatically checks the returned HRESULT and raises an exception if necessary. Together with language-level support for COM interfaces and automatic IUnknown handling (implicit AddRef/Release/QueryInterface calls), the safecall calling convention removes much of the error-handling boilerplate from COM/OLE programming in Delphi.
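The pattern that safecall hides can be sketched in C. The code assumes the commonly described mapping in which the function returns a status code and delivers its logical result through a trailing out-parameter. The type hresult_t, the constants HR_OK and HR_FAIL, and both function names are stand-ins invented for this example, not the real COM headers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t hresult_t;                  /* stand-in for COM's HRESULT */
#define HR_OK   ((hresult_t)0)              /* value of S_OK */
#define HR_FAIL ((hresult_t)-2147467259)    /* 0x80004005, value of E_FAIL */

/* Roughly what a method declared in Delphi as
       function Divide(a, b: Integer): Integer; safecall;
   looks like to a non-Delphi caller: the status is the return value and
   the logical result travels through the out-parameter. */
hresult_t divide_raw(int a, int b, int *result)
{
    assert(result != NULL);  /* a null out-pointer is a caller bug */
    if (b == 0 || (a == INT_MIN && b == -1)) {
        return HR_FAIL;      /* where an exception would become a code */
    }
    *result = a / b;
    return HR_OK;
}

/* The check a Delphi caller gets for free: inspect the status before
   trusting the result. C has no exceptions, so this sketch returns a
   fallback value instead of raising one. */
int divide_checked(int a, int b, int fallback)
{
    int result = 0;
    hresult_t hr = divide_raw(a, b, &result);

    return (hr < 0) ? fallback : result;
}
```

A negative status signals failure, which is why the caller-side test is simply hr < 0.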
Either Caller or Callee Clean-Up
thiscall
This calling convention is used for calling C++ non-static member functions. There are two primary versions of thiscall used depending on the compiler and whether or not the function uses variable arguments.
For the GCC compiler, thiscall is almost identical to cdecl: the calling function cleans the stack, and the parameters are passed in right-to-left order. The difference is the addition of the this pointer, which is pushed onto the stack last, as if it were the first parameter in the function prototype.
On the Microsoft Visual C++ compiler, the this pointer is passed in ECX and it is the callee that cleans the stack, mirroring the stdcall convention used in C for this compiler and in Windows API functions. When functions use a variable number of arguments, it is the caller that cleans the stack (cf. cdecl).
The thiscall calling convention can be explicitly specified only on Microsoft Visual C++ 2005 and later. On any other compiler thiscall is not a keyword. (Disassemblers such as IDA have to specify it anyway, so IDA uses the keyword __thiscall__ for this.)
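The GCC behaviour described above, where the this pointer is treated as if it were the first parameter, can be pictured by writing a member function as a plain C function. The struct counter and the function counter_add are hypothetical; the C++ member function they stand for would be declared as int counter::add(int n).

```c
#include <assert.h>
#include <stddef.h>

struct counter {
    int value;
};

/* C rendering of a C++ member function `int counter::add(int n)`.
   Under the GCC scheme, `self` plays the role of the this pointer and
   is passed like an ordinary first parameter. */
int counter_add(struct counter *self, int n)
{
    assert(self != NULL);
    self->value += n;
    return self->value;
}
```

Under the Microsoft scheme the same pointer would arrive in ECX instead of on the stack, which is a difference C source code cannot express without compiler extensions.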
Intel ABI
The Intel Application Binary Interface is a computer programming standard that most compilers and languages follow. According to the Intel ABI, the EAX, EDX, and ECX registers are free for use within a procedure or function and need not be preserved.
Microsoft x64 calling convention
The x86-64 or x64 calling convention takes advantage of additional register space in the AMD64 / Intel64 platform. The registers RCX, RDX, R8, R9 are used for integer and pointer arguments, and XMM0, XMM1, XMM2, XMM3 are used for floating-point arguments. Additional arguments are pushed onto the stack. The return value is stored in RAX.
When compiling for the x64 architecture using Microsoft tools, there is only one calling convention, the one described here, so stdcall, thiscall, cdecl, fastcall, etc., are all one and the same.
On x86, one could create thunks that convert any function call from stdcall to thiscall by placing the 'this' pointer in ECX and jumping to the member function address. On x64 a universal stdcall-to-thiscall thunk cannot be written, except for functions that take no arguments, because putting the implicit 'this' in place requires shifting all the arguments, whose number and sizes are unknown.
In the Microsoft x64 calling convention, it's the caller's responsibility to allocate 32 bytes of "shadow space" on the stack right before calling the function (regardless of the actual number of parameters used), and to pop the stack after the call. The shadow space is used to spill RCX, RDX, R8, and R9.
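The shadow-space rule can be illustrated with a small calculation. The sketch below assumes every argument occupies one 8-byte slot. It also rounds the area up to a multiple of 16 bytes, which reflects the stack alignment the Microsoft convention requires at a call and is an assumption not stated in the text above. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes a caller reserves for outgoing arguments
   before a call under the Microsoft x64 convention.
   Assumptions: one 8-byte slot per argument, and the total rounded up
   to a multiple of 16 bytes. */
size_t ms_x64_call_area_bytes(size_t arg_count)
{
    size_t slots;
    size_t bytes;

    assert(arg_count <= (SIZE_MAX - 15u) / 8u);  /* guard against overflow */

    /* The four shadow slots (32 bytes) are reserved even when the
       function takes fewer than four parameters. */
    slots = (arg_count < 4) ? 4 : arg_count;
    bytes = slots * 8u;
    return (bytes + 15u) & ~(size_t)15u;
}
```

A call with no arguments still reserves 32 bytes, and a call with five arguments reserves 48 bytes under these assumptions.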
AMD64 ABI convention
The calling convention of the AMD64 application binary interface is followed on Linux and other non-Microsoft operating systems. The registers RDI, RSI, RDX, RCX, R8 and R9 are used for integer and pointer arguments while XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6 and XMM7 are used for floating point arguments. As in the Microsoft x64 calling convention, additional arguments are pushed onto the stack and the return value is stored in RAX.
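The two 64-bit conventions differ in how they pick a register, not only in which registers they use. The sketch below models that difference for simple scalar arguments only. It relies on two details taken from the published conventions rather than from the text above: the Microsoft convention selects a register by argument position, while the AMD64 ABI keeps separate counters for integer and floating-point arguments. Structures and variadic calls are ignored, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum arg_class { ARG_INTEGER, ARG_FLOAT };

static const char *const ms_int[4]   = { "rcx", "rdx", "r8", "r9" };
static const char *const ms_float[4] = { "xmm0", "xmm1", "xmm2", "xmm3" };
static const char *const abi_int[6]  = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
static const char *const abi_float[8] = {
    "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7"
};

/* Microsoft x64: the argument's position selects the register, so the
   second argument uses RDX or XMM1 regardless of what came first.
   out[i] is NULL for arguments passed on the stack. */
void ms_x64_assign(const enum arg_class *args, size_t count, const char **out)
{
    assert(count == 0 || (args != NULL && out != NULL));
    for (size_t i = 0; i < count; i++) {
        if (i >= 4) {
            out[i] = NULL;
        } else {
            out[i] = (args[i] == ARG_FLOAT) ? ms_float[i] : ms_int[i];
        }
    }
}

/* AMD64 ABI: integer and floating-point arguments draw from two
   independent register sequences. */
void amd64_abi_assign(const enum arg_class *args, size_t count, const char **out)
{
    size_t next_int = 0;
    size_t next_float = 0;

    assert(count == 0 || (args != NULL && out != NULL));
    for (size_t i = 0; i < count; i++) {
        if (args[i] == ARG_FLOAT) {
            out[i] = (next_float < 8) ? abi_float[next_float++] : NULL;
        } else {
            out[i] = (next_int < 6) ? abi_int[next_int++] : NULL;
        }
    }
}
```

For a function taking (integer, float, integer), this model yields rcx, xmm1, r8 under the Microsoft convention and rdi, xmm0, rsi under the AMD64 ABI.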
Standard Exit and Entry Sequences for C Code
The Standard Entry Sequence to a function is as follows:
    _function:
        push ebp        ;store the old base pointer
        mov ebp, esp    ;make the base pointer point to the current stack
                        ;location - at the top of the stack is the old ebp,
                        ;followed by the return address and then the
                        ;parameters.
        sub esp, x      ;x is the size, in bytes, of all "automatic
                        ;variables" in the function
This sequence preserves the original base pointer ebp; points ebp to the current stack pointer (which points at the old ebp, followed by the return address and then the function parameters); and then creates space for automatic variables on the stack. Local variables are created on the stack with each call to the function, and are cleaned up at the end of each function. This behavior allows for functions to be called recursively. In C and C++, variables declared "automatic" are created in this way.
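The claim that every call gets its own automatic variables can be observed from C. The sketch below records the address of a local variable at each level of a recursive call and checks that no two levels share storage. The names record_frames, frames_are_distinct, and MAX_DEPTH are hypothetical, and the code deliberately makes no assumption about the direction in which the stack grows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_DEPTH = 8 };

static uintptr_t frame_addresses[MAX_DEPTH];

/* Records where this call's automatic variable lives, then recurses.
   The variable is read again after the nested calls return, so all
   levels are alive at the same time. */
int record_frames(int depth)
{
    volatile int local = depth;

    assert(depth >= 0 && depth < MAX_DEPTH);
    frame_addresses[depth] = (uintptr_t)&local;
    if (depth + 1 < MAX_DEPTH) {
        record_frames(depth + 1);
    }
    return local;  /* unchanged by the deeper calls */
}

/* Returns 1 if every recorded address differs from all the others. */
int frames_are_distinct(void)
{
    for (int i = 0; i < MAX_DEPTH; i++) {
        for (int j = i + 1; j < MAX_DEPTH; j++) {
            if (frame_addresses[i] == frame_addresses[j]) {
                return 0;
            }
        }
    }
    return 1;
}
```

Calling record_frames(0) followed by frames_are_distinct() shows that each level of the recursion had a separate copy of the local variable.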
The Standard Exit Sequence goes as follows:
    mov esp, ebp    ;reset the stack to "clean" away the local variables
    pop ebp         ;restore the original base pointer
    ret             ;return from the function
The following C function:
    int _cdecl MyFunction(int i){
        int k;          /* automatic variable, stored at [ebp - 4] */
        return i + k;
    }
would produce the equivalent asm code:
    ;entry sequence
    push ebp
    mov ebp, esp
    sub esp, 4          ;create function stack frame
    ;function code
    mov eax, [ebp + 8]  ;move parameter i to accumulator
    add eax, [ebp - 4]  ;add k to i
                        ;answer is returned in eax
    ;exit sequence
    mov esp, ebp
    pop ebp
    ret
Note that many compilers can optimize these standard sequences away when they are not needed (often called "no stack frame generation"). If you require them, e.g. for interlanguage interfacing, search your compiler manual for a directive (or pragma) that turns this optimization off locally.