
x86-64 Code Models

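The AMD64 architecture usually does not allow an instruction to encode an arbitrary 64-bit constant as an immediate operand. Most instructions accept 32-bit immediates that are sign-extended to 64 bits. The main exception is movabs, which loads a full 64-bit immediate into a register.

  • 32-bit operations with register destinations implicitly zero-extend the result to 64 bits, so a 64-bit constant whose upper half is zero can be loaded with a cheaper 32-bit mov.

  • Branch instructions accept 32-bit immediate operands that are sign-extended and added to the instruction pointer, so a direct branch can reach targets within about ±2 GB.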
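A minimal C sketch of these three rules. The function names are my own hypothetical helpers, not part of any ABI or library. A constant works as a sign-extended 32-bit immediate only if truncating it to 32 bits and sign-extending it gives back the same value; a 32-bit mov can produce it only if its upper half is zero; and a rel32 branch can reach a target only if the signed distance from the next instruction fits in 32 bits. The casts assume the two's-complement conversion behaviour that GCC and Clang define.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if v survives truncation to 32 bits followed by sign extension,
 * i.e. it can be encoded as a sign-extended imm32 (e.g. addq $imm, %rax). */
bool fits_simm32(int64_t v) {
    return v == (int64_t)(int32_t)v;
}

/* True if v has its upper 32 bits clear, i.e. a 32-bit mov such as
 * movl $imm, %eax produces it through implicit zero extension. */
bool fits_zext32(uint64_t v) {
    return v == (uint64_t)(uint32_t)v;
}

/* True if a rel32 branch whose next instruction starts at next_ip can
 * reach target: the signed distance must fit in a sign-extended imm32. */
bool branch_reachable(uint64_t next_ip, uint64_t target) {
    int64_t delta = (int64_t)(target - next_ip); /* wraps modulo 2^64 */
    return fits_simm32(delta);
}
```

For example, 0xffffffff passes fits_zext32 but fails fits_simm32: a 32-bit movl $0xffffffff, %eax can produce it, while a sign-extended 32-bit immediate cannot.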

This could be due to the following reasons (I could not find the exact reason documented):

  1. The immediate operands of instructions have not been extended to 64 bits, to keep instructions short; they remain 32-bit and sign-extended (Source).
  2. With a 64-bit immediate, there might be no room left in the instruction for a displacement, so instructions like jmp would not work (this is what I got from ChatGPT; I have not verified this claim yet!).

So, depending on the program's requirements, and to reduce code size and improve performance, the ABI defines several code models.
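To sanity-check reason 2 a little (this is still my own back-of-the-envelope reasoning, not a documented rationale), we can count bytes against the architectural limit of 15 bytes per x86 instruction. The struct and functions below are hypothetical helpers. An instruction like movq $imm32, disp32(%rip) consists of REX.W, opcode C7, ModRM, a 4-byte displacement and a 4-byte immediate, i.e. 11 bytes. Widening the immediate to 8 bytes reaches exactly 15, and any extra prefix or SIB byte would overflow.

```c
#include <stdbool.h>

/* Architectural maximum length of one x86 instruction, in bytes. */
enum { MAX_INSN_LEN = 15 };

/* Byte budget of a "store immediate to memory" instruction such as
 * movq $imm, disp(%base,%index,scale). All sizes are in bytes. */
struct insn_layout {
    int prefixes; /* legacy prefixes, e.g. a segment override */
    int rex;      /* REX prefix, 1 for a 64-bit operand size */
    int opcode;
    int modrm;
    int sib;      /* present only for base+index addressing */
    int disp;     /* displacement: 0, 1 or 4 bytes */
    int imm;      /* immediate: 4 today, 8 in the hypothetical case */
};

int insn_length(struct insn_layout l) {
    return l.prefixes + l.rex + l.opcode + l.modrm + l.sib + l.disp + l.imm;
}

bool insn_encodable(struct insn_layout l) {
    return insn_length(l) <= MAX_INSN_LEN;
}
```

Consistent with this, the instruction that does take a 64-bit immediate, movabs $imm64, %reg, has no ModRM byte and therefore no memory operand or displacement at all.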

Code Models

Code models tell the compiler how far apart code and data may be, and where in the address space they may be placed. They define constraints on symbolic values (the addresses of functions and variables) that allow the compiler to generate better code.

  • The models differ in addressing mode, code size, data size and address range. To select a code model in GCC, compile your program with the -mcmodel= flag (Source).
  • In terms of generated instructions, the addressing modes and instruction sequences used for mov and call change from model to model.
  • The System V ABI describes the following code models: Small, Medium, Large, Kernel, Small PIC, Medium PIC, Large PIC.
Model | Address Range / Layout | Addressing
--- | --- | ---
Small (default) | Code and data in the lower 2 GB of the address space | RIP-relative addressing for code and data (within ±2 GB); absolute addresses also fit in 32-bit sign-extended immediates. Pointers are still 64-bit
Kernel | Code and data in the upper (negative) 2 GB of the address space | RIP-relative (same limits as the small model); absolute addresses fit in 32-bit sign-extended immediates
Medium | Code and small data in the lower 2 GB; large data (objects larger than -mlarge-data-threshold) in large data sections such as .ldata, .lrodata and .lbss, which can be anywhere | Mixed: RIP-relative for code and small data, movabs (64-bit absolute) for large data
Large | Code and data anywhere in the address space | Absolute addressing using movabs
Small PIC | Load address known only at run time; code and data within 2 GB of each other | RIP-relative for local symbols; GOT-based indirect addressing for globals
Medium PIC | Like small PIC for code and small data; large data can be anywhere | Code: RIP-relative; large data: via the GOT, with its base in %r15 (the register used in the ABI's examples)
Large PIC | Code and data anywhere | All addressing via the GOT/PLT; the GOT address is built from %rip plus a 64-bit (movabs) offset
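The first two rows can be made concrete with the symbol ranges the System V ABI gives for them: 0 to 2^31 - 2^24 - 1 (0x7effffff) for the small model, and 2^64 - 2^31 to 2^64 - 2^24 for the kernel model. In the small C sketch below, the macro and function names are hypothetical. It shows that every address in either range fits in a sign-extended 32-bit immediate, which is what lets the compiler write movq $sym, ... in these two models.

```c
#include <stdbool.h>
#include <stdint.h>

/* Symbol ranges for the two non-PIC models with 32-bit addresses. */
#define SMALL_MODEL_MAX  UINT64_C(0x7effffff)         /* 2^31 - 2^24 - 1 */
#define KERNEL_MODEL_MIN UINT64_C(0xffffffff80000000) /* 2^64 - 2^31 */
#define KERNEL_MODEL_MAX UINT64_C(0xffffffffff000000) /* 2^64 - 2^24 */

bool in_small_model(uint64_t addr) {
    return addr <= SMALL_MODEL_MAX;
}

bool in_kernel_model(uint64_t addr) {
    return addr >= KERNEL_MODEL_MIN && addr <= KERNEL_MODEL_MAX;
}

/* Can addr be written directly as a sign-extended imm32 (movq $sym, ...)?
 * Reinterprets the address as signed (two's complement, as GCC/Clang do). */
bool addr_fits_simm32(uint64_t addr) {
    int64_t s = (int64_t)addr;
    return s >= INT32_MIN && s <= INT32_MAX;
}
```

The 16 MB left unused just below 2 GB presumably leaves room for symbol+offset expressions, such as the src+400 seen later, to stay in range.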

Addressing in Action

Let's compile the following program on Godbolt (Compiler Explorer) with x86-64 gcc 15.2, using the arguments given with each listing below. We will focus on only a small part of the generated code.

Program (taken from the System V ABI manual):

extern int src[65536];
extern int dst[65536];
extern int *ptr;

static int lsrc[65536];
static int ldst[65536];
static int *lptr;

int main() {
    dst[0] = src[0];
    ptr = dst;
    *ptr = src[100];
    ldst[0] = lsrc[0];
    lptr = ldst;
    *lptr = lsrc[0];
}
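The threshold used below can be checked with a little arithmetic: each array holds 65536 ints of 4 bytes each on x86-64, i.e. 262144 bytes, well above 65535, while ptr and lptr are 8-byte pointers. GCC documents -mlarge-data-threshold for -mcmodel=medium (objects larger than the threshold go to the large data sections), so with -mcmodel=small it should not change the output. The helper below is a hypothetical sketch of that size rule, not GCC's actual code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: under -mcmodel=medium, GCC places data objects
 * larger than -mlarge-data-threshold in .ldata/.lrodata/.lbss. */
bool is_large_data(size_t object_size, size_t threshold) {
    return object_size > threshold;
}
```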

Small, Medium and Large Model

  • Arguments: -mcmodel=small -mlarge-data-threshold=65535
main:
    ...
    # ptr = dst
    movq    $dst, ptr(%rip)

    # *ptr = src[100]
    movq    ptr(%rip), %rax
    movl    src+400(%rip), %edx
    movl    %edx, (%rax)
    ...
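The memory operands written as sym(%rip) are RIP-relative: the CPU adds a signed 32-bit displacement to the address of the next instruction. The linker fills in that displacement through a PC-relative relocation (R_X86_64_PC32, computed as S + A - P: symbol address plus addend minus the address of the 4-byte field). The sketch below models that arithmetic with hypothetical helper names. The addend of 396 used in the test (src[100] is at byte offset 100 * 4 = 400, minus the 4-byte field that ends the instruction) is my assumption about what the assembler emits for movl src+400(%rip), %edx; objdump -dr on the object file would confirm it.

```c
#include <stdbool.h>
#include <stdint.h>

/* R_X86_64_PC32: store S + A - P in the 4-byte field, where
 * sym = S (symbol address), addend = A, field_addr = P.
 * Returns false when the result does not fit in a signed 32-bit field,
 * the case a linker reports as "relocation truncated to fit". */
bool resolve_pc32(uint64_t sym, int64_t addend, uint64_t field_addr,
                  int32_t *disp_out) {
    int64_t value = (int64_t)(sym + (uint64_t)addend - field_addr);
    if (value < INT32_MIN || value > INT32_MAX)
        return false;
    *disp_out = (int32_t)value;
    return true;
}

/* What the CPU computes at run time: address of the next instruction
 * plus the sign-extended displacement. */
uint64_t rip_relative_target(uint64_t next_ip, int32_t disp) {
    return next_ip + (uint64_t)(int64_t)disp;
}
```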
    

Small, Medium and Large PIC Model

  • Arguments: -mcmodel=small -fPIC -mlarge-data-threshold=65535
main:
    ...
    # ptr = dst
    movq    ptr@GOTPCREL(%rip), %rax
    movq    dst@GOTPCREL(%rip), %rdx
    movq    %rdx, (%rax)

    # *ptr = src[100]
    movq    ptr@GOTPCREL(%rip), %rax
    movq    (%rax), %rax
    movq    src@GOTPCREL(%rip), %rdx
    movl    400(%rdx), %edx
    movl    %edx, (%rax)
    ...

    # ldst[0] = lsrc[0];
    movl    lsrc(%rip), %eax
    movl    %eax, ldst(%rip)
    ...
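The extra loads in the PIC listing come from the Global Offset Table: the code first loads a symbol's address from its GOT slot (RIP-relative, so the code stays position independent) and only then accesses the object. The static lsrc and ldst need no GOT slot because they are local to this file and cannot be preempted by another shared object, so plain RIP-relative addressing suffices. Below is a toy C model of the GOT path. Here got, the GOT_* slot names and the *_storage arrays are hypothetical stand-ins, and the table is filled at compile time rather than by the dynamic linker.

```c
#include <stddef.h>

/* Stand-ins for the extern arrays src, dst and the pointer ptr. */
static int src_storage[65536];
static int dst_storage[65536];
static int *ptr_storage;

/* Toy GOT: one slot per global symbol, holding that symbol's address.
 * In a real process the dynamic linker fills these slots at load time. */
enum { GOT_SRC, GOT_DST, GOT_PTR, GOT_SLOTS };
static void *got[GOT_SLOTS] = {
    [GOT_SRC] = src_storage,
    [GOT_DST] = dst_storage,
    [GOT_PTR] = &ptr_storage,
};

/* ptr = dst; */
void store_ptr_via_got(void) {
    int **ptr_addr = got[GOT_PTR]; /* movq ptr@GOTPCREL(%rip), %rax */
    int *dst_addr = got[GOT_DST];  /* movq dst@GOTPCREL(%rip), %rdx */
    *ptr_addr = dst_addr;          /* movq %rdx, (%rax) */
}

/* *ptr = src[100]; */
void copy_via_got(void) {
    int **ptr_addr = got[GOT_PTR]; /* movq ptr@GOTPCREL(%rip), %rax */
    int *p = *ptr_addr;            /* movq (%rax), %rax */
    int *src_addr = got[GOT_SRC];  /* movq src@GOTPCREL(%rip), %rdx */
    *p = src_addr[100];            /* movl 400(%rdx), %edx; movl %edx, (%rax) */
}
```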