# Supported platforms

IBM POWER processors, 64-bit, little-endian, ELF ABI v2.
(Called `ppc64el` in Debian.)

No longer supported:
* 32 bits, ELF ABI (Debian's `powerpc`)
* 64 bits big-endian, ELF ABI v1 (Debian's `ppc64`)
* AIX
* MacOS X.

# Reference documents

* Instruction set architecture:
  _Power ISA User Instruction Set Architecture_,
  book 1 of _Power Instruction Set_
  (https://openpowerfoundation.org/specifications/isa/)

* ELF ABI 64 bits version 2:
  _OpenPOWER ABI for Linux Supplement for the Power Architecture 64-bit ELF V2 ABI_
  (https://openpowerfoundation.org/specifications/64bitelfabi/)

* _The PowerPC Compiler Writer's Guide_, Warthman Associates, 1996.
  (PDF available from various sources on the Web.)

# Notes on calling conventions

## The minimal stack frame

The stack pointer SP (register 1) is always 16-byte aligned.

Every stack frame contains a 32-byte reserved area at SP to SP + 31.

* The 64-bit word at SP + 0 is reserved for a back pointer to the previous frame. ocamlopt-generated code does not maintain this back link.
* The 32-bit word at SP + 8 is reserved for saving the CR register. ocamlopt-generated code does not use this word.
* The 32-bit word at SP + 12 is reserved for future use.
* The 64-bit word at SP + 16 is used by the *callee* to save its return address (the value of the LR register on entry to the callee). So, a function saves its return address in the caller's frame, at offset 16 from the SP on entry to the function.
* The 64-bit word at SP + 24 is used by the *caller* or by linker-generated call *veneers* to save the TOC register r2 across calls.

A typical function prologue that allocates N bytes of stack and saves LR:
```
    addi 1, 1, -N
    mflr 0
    std 0, (N + 16)(1)     ; in caller's frame!
```

If the back pointer is to be maintained as well:
```
    stdu 1, -N(1)
    mflr 0
    std 0, (N + 16)(1)     ; in caller's frame!
```

If the function needs to stack-allocate M extra bytes, e.g. for an exception handler, it should leave the bottom 32 bytes reserved.
So, it decrements SP by M and uses bytes SP + 32 ... SP + 32 + M - 1 as the stack-allocated space:
```
    addi 1, 1, -32       ; allocate exception handler
    std 29, 40(1)        ; save previous trap ptr at handler + 8
```

Because the bottom 32 bytes are used by linker-generated call veneers, we need to reserve them both in C stack frames and in OCaml stack frames.

To speed up stack switching between OCaml and C, the stack pointers saved in struct stack_info and struct c_stack_link point to the bottom of the stack frame and include the reserved 32-byte area.

## TOC references and register r2

Register r2 points to a table of contents (TOC) that contains absolute addresses for symbols.

Different compilation units can have different TOCs. Hence, register r2 must be properly initialized at the beginning of functions that access the TOC, and properly saved and restored around function calls.

The general protocol is as follows:

* When a function is called, its address (the address of the first instruction of the function) must be in register r12.
* Every function that makes TOC references starts with two instructions that initialize r2 to a fixed offset from r12, the offset being determined by the linker:
  ```
  0:  addis 2, 12, (.TOC. - 0b)@ha
      addi 2, 2, (.TOC. - 0b)@l
  ```
* Register r2 is caller-save. Hence, if the caller makes TOC references, it must save r2 before the call and restore it after. The reserved area of the current frame is typically used to save r2.

For example, here is a call to a function pointer found at offset 0 from register r3:
```
    std 2, 24(1)         ; save r2
    ld 12, 0(3)          ; address of function in r12
    mtctr 12
    bctrl                ; call the function
    ld 2, 24(1)          ; restore r2
```

If the caller does not make TOC references, the `std 2` and `ld 2` can be omitted. Likewise for a tail call:
```
    ld 12, 0(3)          ; address of function in r12
    mtctr 12
    bctr                 ; tail call the function
```

For calls to known functions `bl function_name`, a special protocol is implemented by the linker.
The object file contains the `bl` instruction followed by a `nop` instruction:
```
    bl function_name
    nop
```

If the linker determines that `function_name` and the caller function share the same TOC, it rewrites the code as
```
    bl function_name+8   ; skip the prologue that initializes r2
    nop
```
So, the callee does not recompute r2 from r12; it just uses the same r2 value as the caller. For this reason, r12 need not be initialized by the caller.

If `function_name` uses a different TOC than the caller function, the linker generates a branch-and-link to a veneer function. The veneer function saves r2, recomputes r12, and branches to `function_name`. The `nop` after the branch-and-link becomes a reload of the caller's TOC pointer in r2.
```
    bl veneer_function
    ld 2, 24(1)          ; restore r2

veneer_function:
    std 2, 24(1)         ; save caller's r2
    <load r12 with absolute address of function_name>
    mtctr 12
    bctr
```

However, this doesn't quite work for tail calls. Consider:
- current function f1 calls f2 that has the same TOC;
- f2 tailcalls f3 that has a different TOC.

Because f1 and f2 have the same TOC, the linker inserted no code in f1 to save and restore r2 around the call to f2. Because f2 tailcalls f3, r2 will not be restored to f2's TOC when f3 returns. So, we are back in f1, with the wrong TOC in r2.

To avoid this problem, ocamlopt always reloads r2 after a direct call. One possibility would be to save r2 and reload r2 around the direct call, using the canonical slot in the current frame:
```
    std 2, 24(1)
    bl function_name
    nop
    ld 2, 24(1)
```

However, this adds 2 instructions to every direct call.

We can avoid generating the store if the current function saves its r2 value somewhere on the stack at the beginning. Then, after each direct or indirect call, we can reload r2 from there.

What is "there"? Which location is used for saving r2 at the beginning of the function? A natural choice would be the word at SP + 24, i.e. the TOC-saving word for the current function.
However, SP can vary inside the function, e.g. to install and remove exception handlers, so additional r2 saves would need to be generated.

The code currently generated by ocamlopt saves its r2 at SP + (N + 8) where N is the size of the stack frame, that is, in the word of the caller's frame reserved for the callee to save the CR register, which ocamlopt-generated code does not use otherwise. This avoids allocating one more word in the current frame just to save r2.
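
The offsets quoted in these notes can be sanity-checked with a small model. The sketch below is illustrative Python, not code from the OCaml sources: every constant and function name is hypothetical, and it assumes that N and M are positive multiples of 16 so that SP stays 16-byte aligned. It computes, relative to the SP after the prologue, where the LR save word and the r2 save word described above end up, and which bytes an M-byte stack allocation occupies.

```python
# Illustrative model of the offsets described in these notes.
# All names are hypothetical; none come from the OCaml sources.

RESERVED_AREA = 32  # bytes reserved at the bottom of every frame
BACK_CHAIN = 0      # 64-bit back pointer (not maintained by ocamlopt)
CR_SAVE = 8         # CR save word (ocamlopt does not save CR there)
LR_SAVE = 16        # 64-bit return address save word
TOC_SAVE = 24       # 64-bit r2 save word


def check_size(nbytes):
    """Return nbytes if it keeps SP 16-byte aligned, else raise ValueError."""
    if nbytes <= 0 or nbytes % 16 != 0:
        raise ValueError("size must be a positive multiple of 16")
    return nbytes


def lr_save_offset(n):
    """Offset from the new SP of the LR save word in the caller's frame."""
    return check_size(n) + LR_SAVE


def r2_save_offset(n):
    """Offset from the new SP of the caller's CR word, reused to save r2."""
    return check_size(n) + CR_SAVE


def handler_area(m):
    """First and last byte offsets, from the new SP, of M extra bytes."""
    check_size(m)
    return (RESERVED_AREA, RESERVED_AREA + m - 1)


if __name__ == "__main__":
    n = 48  # example frame size, chosen arbitrarily
    print("LR save slot: %d(1)" % lr_save_offset(n))
    print("r2 save slot: %d(1)" % r2_save_offset(n))
    print("handler bytes:", handler_area(32))
```

With N = 48, for instance, the LR slot is at `64(1)` and the r2 slot at `56(1)`; both fall inside the caller's reserved area, which spans offsets 48 to 79 from the new SP.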