A barebones Linux binary for x86-64
Coming to Rust from a C background, many newcomers are unpleasantly surprised by the size of even a very basic Rust binary. Much of this is caused by linking to the standard library. There are already lots of articles on techniques for shrinking your binaries after you've built them, but I want to discuss approaching the problem from the opposite end. Starting from as little as possible, how much do we actually need for a standard binary to 'run'?
Disaster Tourism: Running with a libc
Let's entertain for a moment the idea of using a target with a 'platform' implementation. We quickly run into a lot of rubble. As it turns out, the setup actually does quite a lot. When we want to execute a main, which code actually transfers control? Which code will handle abnormal termination? While the latter was stabilized in the form of the panic handler attribute, running a main function is still complex. Let's just try to do nothing.
#![no_std]
use core::panic::PanicInfo;
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
fn main() {}
error: requires `start` lang_item
error: could not compile `no_std_hi` (bin "no_std_hi") due to previous error
Huh? You see, the platform does not only come with an initialization; each
such C platform also comes with its own mechanism for transferring control.
The contract that tells Rust which function to match to said mechanism is a
language item, #[start]. And since the signature and all ABI concerns depend
on the platform, outside of Rust's own control, none of this is remotely stable.
We now have two paths: we can skip Rust's mechanism of transferring control, or
we can use nightly. As it turns out, they lead to similar outcomes on
Linux/libc at least.
For completeness' sake, here is the nightly #[start] variant:
#![no_std]
#![feature(start)]
use core::panic::PanicInfo;
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
#[start]
fn main(_: isize, _: *const*const u8) -> isize { 0 }
#[no_mangle] extern "C" fn __libc_start_main() -> ! {
    main(0, core::ptr::null());
    loop {}
}
The stable path skips Rust's mechanism of transferring control with #![no_main] instead:
#![no_std]
#![no_main]
use core::panic::PanicInfo;
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
#[no_mangle]
fn main() { }
error: linking with `cc` failed: exit status: 1
  |
  = note: LC_ALL="C" PATH="/r/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/bin:/r/.opam/5.0.0/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/opt/cuda/bin:/opt/cuda/nsight_compute:/opt/cuda/nsight_systems/bin:/usr/lib/emscripten:/usr/lib/jvm/default/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:/usr/lib/rustup/bin:/r/.cargo/bin:/r/intel/bin:/r/.local/bin:/r/bin:/r/build/x10-2.5.4/x10.dist/bin:/r/tree/usr/bin:/r/tree/bin:/r/tree/arm-linux-musleabihf/bin:/r/.cargo/bin:/r/.local/bin:/r/.local/bin" VSLANG="1033" "cc" "-m64" "/tmp/rustcAEfotu/symbols.o" "/tmp/no_std_hi/target/release/deps/no_std_hi-7ac8a610109fff56.no_std_hi.59b3cadcf3826e40-cgu.0.rcgu.o" "-Wl,--as-needed" "-L" "/tmp/no_std_hi/target/release/deps" "-L" "/r/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib" "-Wl,-Bstatic" "/r/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc_std_workspace_core-0577018320f99037.rlib" "/r/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcore-193cf992125ccd4c.rlib" "/r/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcompiler_builtins-8e138eaf26ebb4a8.rlib" "-Wl,-Bdynamic" "-Wl,--eh-frame-hdr" "-Wl,-z,noexecstack" "-L" "/r/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib" "-o" "/tmp/no_std_hi/target/release/deps/no_std_hi-7ac8a610109fff56" "-Wl,--gc-sections" "-pie" "-Wl,-z,relro,-z,now" "-Wl,-O1" "-nodefaultlibs" "-fuse-ld=lld"
  = note: ld.lld: error: undefined symbol: __libc_start_main
          >>> referenced by /usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../lib/Scrt1.o:(_start)
          collect2: error: ld returned 1 exit status
This issue arises as part of the control transfer. The exact sequence allows
libc to provide an extension point for 'standard library' setup. Let's supply
this ourselves. Getting here took quite a lot of trial and error, and you should
not trust me in the slightest that any of it is stable. But the function is
intended to end in a call to main with the standard argc/argv arguments, and
never return. Since we control both sides, we can cheat a little and skip the arguments:
#![no_std]
#![no_main]
use core::panic::PanicInfo;
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
#[no_mangle]
fn main() { }
#[no_mangle] extern "C" fn __libc_start_main() -> ! {
    main();
    loop {}
}
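For comparison, here is a sketch of the hand-over without the cheat. The parameter list is my understanding of the __libc_start_main contract as documented in the Linux Standard Base, so treat it as an assumption rather than a guarantee. The names start_main_sketch, CMain, Hook, demo_main and run_demo are hypothetical. Unlike the real function, which exits the process and never returns, the sketch hands back main's result so that it can be tried in an ordinary hosted program.

```rust
use std::ffi::{c_char, c_int, c_void};
use std::ptr;

/// Shape of a C `main`, including the environment pointer passed as a third argument.
type CMain = unsafe extern "C" fn(c_int, *const *const c_char, *const *const c_char) -> c_int;
/// Optional init/fini hooks handed over by the startup object (assumed shape).
type Hook = Option<unsafe extern "C" fn()>;

/// Hypothetical stand-in for `__libc_start_main`: forwards argc/argv/envp to `main`.
/// It ignores the hooks and returns main's result instead of exiting.
unsafe extern "C" fn start_main_sketch(
    main: CMain,
    argc: c_int,
    argv: *const *const c_char,
    _init: Hook,
    _fini: Hook,
    _rtld_fini: Hook,
    _stack_end: *mut c_void,
) -> c_int {
    // The environment array starts right after argv's terminating null pointer.
    let envp = unsafe { argv.add(argc as usize + 1) };
    unsafe { main(argc, argv, envp) }
}

/// Demo `main`: returns argc if argv is null-terminated where argc says it is.
unsafe extern "C" fn demo_main(
    argc: c_int,
    argv: *const *const c_char,
    _envp: *const *const c_char,
) -> c_int {
    let terminator = unsafe { *argv.add(argc as usize) };
    if terminator.is_null() { argc } else { -1 }
}

/// Builds a one-element argv plus an empty environment and runs the sketch.
fn run_demo() -> c_int {
    let arg0 = b"demo\0";
    // Layout: [argv[0], argv terminator, envp terminator]
    let argv = [arg0.as_ptr() as *const c_char, ptr::null(), ptr::null()];
    unsafe {
        start_main_sketch(demo_main, 1, argv.as_ptr(), None, None, None, ptr::null_mut())
    }
}

fn main() {
    assert_eq!(run_demo(), 1);
}
```

Our cheat works only because the _start code in Scrt1.o sets up these arguments and our replacement simply never looks at them.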
Running a 'minimal' GNU libc target
The last code fragment from the prior section finally compiles and
loops infinitely just fine. Let's break out strace and take a look under the
hood: what's actually involved in setting up the C environment implied by our
target?
execve("./target/release/no_std_hi", ["./target/release/no_std_hi"], 0x7ffd703dad40 /* 74 vars */) = 0
brk(NULL)                               = 0x64c4a81de000
arch_prctl(0x3001 /* ARCH_??? */, 0x7ffd9b4196d0) = -1 EINVAL (Invalid argument)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x72d0a8aa1000
arch_prctl(ARCH_SET_FS, 0x72d0a8aa1a80) = 0
set_tid_address(0x72d0a8aa1d50)         = 101321
set_robust_list(0x72d0a8aa1d60, 24)     = 0
rseq(0x72d0a8aa23a0, 0x20, 0, 0x53053053) = 0
mprotect(0x64c4a6257000, 4096, PROT_READ) = 0
Woah. Scary stuff. Line-by-line:
- execve is part of strace switching into the actual binary.
- brk is used to discover the data segment address.
- arch_prctl just tries to initialize Control-Flow Enforcement Technology, and fails. I can't recall asking rustc or libc to do this.
- access is part of glibc's convoluted and massive system for dynamically overriding symbols that would have been dynamically linked. No one asked for this, either, and we shouldn't need to override a symbol.
- mmap grabs some memory pages from the operating system to use for thread-local storage. We're not going to use it.
- arch_prctl, set_tid_address and set_robust_list all inform the kernel how to interact with our thread-local memory. That is, where to put and read structures that differ for each thread. Note these all point into the dynamically allocated memory.
- rseq, restartable sequence, is a sadly niche feature where a Linux thread can run a section of instructions without having been interrupted by the scheduler (or rather, re-entering at a different program address if the thread has been pre-empted while inside the block). The call configures one structure for the whole thread to interact with the kernel, i.e. it tells the kernel how to verify whether such a critical section is running. Note how glibc has taken the liberty of making this choice for us, and pointlessly so. We're not using rseq.
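- mprotect marks one page of our binary read-only. The address lies inside the loaded executable and the new protection is PROT_READ rather than PROT_NONE, so this looks like the loader sealing relocated data (RELRO, as requested by the -z,relro linker flag in the command above) rather than a stack guard page. We have nothing that needs sealing.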
Running without libc
The next logical dependency to avoid after #![no_std] is clearly the link to
libc. Not only is it costly to link, even if we consider dynamic linking
allowable, but it is not fully transparent to rustc to do so. And as we've
seen, even the initialization has costs. There's a target which does not have
these costs: x86_64-unknown-none. Just drop this in .cargo/config.toml and
install the target via rustup.
# Requires: rustup target install x86_64-unknown-none
[build]
target = "x86_64-unknown-none"
Helpfully, ELF linkers look for a default entry point symbol that we can use
immediately, without having to involve linker scripts at all: _start. Since we're
not initializing any platform control transfer, we now need to use
#![no_main]. Note how main refers only to the control transfer part; we still
have our 'platform's' semantics of an initial entry point. We just get control.
#![no_std]
#![no_main]
#[no_mangle]
pub extern "C" fn _start() -> ! {
    loop {}
}
use core::panic::PanicInfo;
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
And that comes to: 640 bytes after a strip (which you can even configure in
the Cargo profile).
Doing a hello world
We've broken the chains that bind us and cast away the cords of libc linkage, so let's start doing useful stuff. First on the list is interacting with the operating system. For this we need to break out our assembly. The interaction with the kernel occurs by moving values into the right registers, yielding control with the right instruction, and resuming control with a different process state.
How exactly we do the assembly interaction is completely up to us here: an
inline asm! macro, linking another object, or blindly re-interpreting a byte
array of instructions as if it were a function symbol. (The latter is my favorite
bit of magic, but pick your poison.) The Linux system call ABI is very similar to
the System V C ABI, which affords us a great deal of leniency if we declare our
function properly. Most interestingly, the system call identifier, a simple
register-sized integer, is not passed like a first argument, despite libc
defining its signature this way. This means our Rust function should end up
more efficient than C once we've convinced the compiler to inline the setup to
our syscall primitive.
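// Pseudocode: the string literals stand for the instructions of the body.
pub unsafe extern "C" fn call3(_: isize, _: isize, _: isize, nr: SysNr) -> isize {
    // Move the fourth argument from %rcx into %rax, a scratch register
    // (AT&T operand order: source first, destination second).
    "mov %rcx,%rax";
    // Other arguments already set up by the SysV ABI
    "syscall";
    // Return value delivered by the kernel in %rax already.
    "ret";
}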
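The same primitive can also be written with the inline asm! route mentioned above. This is a minimal sketch, assuming x86-64 Linux. The name syscall3_inline is hypothetical, and unlike call3 it takes the call number first, because the register constraints let the compiler place every value directly. The numbers 1 (write) and 39 (getpid) are the x86-64 system call numbers.

```rust
use core::arch::asm;

/// Hypothetical inline-assembly counterpart to `call3`; x86-64 Linux only.
///
/// # Safety
/// The arguments must be valid for the requested system call.
pub unsafe fn syscall3_inline(nr: isize, a: isize, b: isize, c: isize) -> isize {
    let ret: isize;
    unsafe {
        asm!(
            "syscall",
            // The call number goes in and the result comes out through rax.
            inlateout("rax") nr => ret,
            // The first three arguments use the same registers as the C ABI.
            in("rdi") a,
            in("rsi") b,
            in("rdx") c,
            // The syscall instruction itself overwrites rcx and r11.
            lateout("rcx") _,
            lateout("r11") _,
            options(nostack),
        );
    }
    ret
}

fn main() {
    let msg = b"Hello from a raw syscall\n";
    // write(1, msg, len): file descriptor 1 is standard output.
    let written = unsafe { syscall3_inline(1, 1, msg.as_ptr() as isize, msg.len() as isize) };
    // A negative value would signal an error from the kernel.
    let _ = written;
    // getpid ignores its arguments, so it makes for an easy cross-check.
    let pid = unsafe { syscall3_inline(39, 0, 0, 0) };
    assert_eq!(pid as u32, std::process::id());
}
```

Since the compiler sees the constraints, it can inline the whole wrapper and load the call number straight into rax, without the extra mov.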
The system calls for 0 through 2 arguments are similar, with a different
register for the call number argument in the System V ABI. From 4 arguments
upwards we need to reshuffle a few registers, because the kernel expects its
fourth argument in %r10 rather than %rcx, which the syscall instruction
overwrites. Now we just need to find the magic system call numbers and cast all
our arguments to register-sized values:
#![no_std]
#![no_main]
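fn exit() -> ! {
    unsafe {
        // System call 60 is exit on x86-64 Linux; the argument is the exit status.
        syscall_linux_raw::syscall1(syscall_linux_raw::SysNr(60), 0);
        // exit never returns, so this point cannot be reached.
        core::hint::unreachable_unchecked();
    }
}
fn write(fd: usize, buf: &[u8]) {
    unsafe {
        // System call 1 is write(fd, buf, len); its return value is ignored here.
        syscall_linux_raw::syscall3(
            syscall_linux_raw::SysNr(1),
            fd as isize,
            buf.as_ptr() as isize,
            buf.len() as isize,
        );
    }
}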
#[no_mangle]
pub extern "C" fn _start() -> ! {
    write(1, "Hello, world!\n".as_bytes());
    exit();
}
use core::panic::PanicInfo;
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
Segmentation fault
One final push
WAT! But Rust works if it compiles, so what is going on? We forgot something that
the platform does provide, after all. Inspecting the failure with gdb reveals
we're actually segfaulting on the call instruction to syscall3. And we're
segfaulting at address 0x0? It's as if the symbol is just missing. A call
to objdump reveals the answer here, if we ask it to disassemble for us:
$ objdump -dx target/x86_64-unknown-none/release/examples/simple
0000000000001260 <_ZN6simple5write17ha62c0a9d479e57fcE>:
    1260:	48 8d 35 d9 ef ff ff 	lea    -0x1027(%rip),%rsi        # 240 <_ZN6simple4exit17h5340d43b981301c5E-0x1010>
    1267:	bf 01 00 00 00       	mov    $0x1,%edi
    126c:	ba 0e 00 00 00       	mov    $0xe,%edx
    1271:	b9 01 00 00 00       	mov    $0x1,%ecx
    1276:	ff 25 5c 11 00 00    	jmp    *0x115c(%rip)        # 23d8 <_DYNAMIC+0xe8>
The annotation on the last line, # 23d8 <_DYNAMIC+0xe8>, is a marker inserted by
objdump to inform us of a relocation. A relocation is an entry in the ELF file
which references an address that was not fully determined by the linker. Instead,
the location of the relocation contains only an offset relative to something else
(there are actually several formats for this offset), which is supposed to be
replaced with the final symbol location at a later point. This is done so
that the file and memory layout can be determined separately for subsections of
a binary file. Another use case is to make code position independent, that is, to
make the assembled section independent of its base address in memory, which allows
the section to be reshuffled and allocated at runtime. For instance, it may be
necessary to accommodate other sections not known ahead of time, as in
dynamically loaded code.
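To make the 'replaced at a later point' step concrete, here is a minimal sketch of what a loader does for the simplest kind of relocation, where a slot receives the load address plus a constant. RelativeReloc, apply_relocations and read_slot are hypothetical names, and the record is a simplification modelled on ELF's base-plus-addend entries (R_X86_64_RELATIVE), not a real parser.

```rust
/// Hypothetical, simplified relocation record: "store base + addend at offset".
struct RelativeReloc {
    offset: usize,
    addend: u64,
}

/// Patch every recorded slot in `image` once the load address `base` is known.
fn apply_relocations(image: &mut [u8], base: u64, relocs: &[RelativeReloc]) {
    for reloc in relocs {
        let target = base.wrapping_add(reloc.addend);
        image[reloc.offset..reloc.offset + 8].copy_from_slice(&target.to_le_bytes());
    }
}

/// Read an 8-byte little-endian slot, the way an indirect jump would.
fn read_slot(image: &[u8], offset: usize) -> u64 {
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(&image[offset..offset + 8]);
    u64::from_le_bytes(bytes)
}

fn main() {
    // A pretend binary image with one pointer-sized slot at offset 16.
    let mut image = vec![0u8; 32];
    let relocs = [RelativeReloc { offset: 16, addend: 0x1240 }];

    // Before anyone processes the relocation, the slot still holds zero.
    assert_eq!(read_slot(&image, 16), 0);

    // The loader picks a base address and fills the slot in.
    apply_relocations(&mut image, 0x5555_0000_0000, &relocs);
    assert_eq!(read_slot(&image, 16), 0x5555_0000_1240);
}
```

On our bare target nothing performs this step, which would explain the jump through a slot that still reads 0x0.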
We don't want to fix ourselves up at runtime! We linked statically precisely so
that we would not have to deal with this problem, damn it. Well, let's tell rustc
about this explicitly. For this we add one final entry to .cargo/config.toml.
[build]
target = "x86_64-unknown-none"
rustflags = ["-C", "relocation-model=static"]
And drum roll for 888 bytes of binary:
Hello, world!
execve("../target/x86_64-unknown-none/release/examples/simple", ["../target/x86_64-unknown-none/re"...], 0x7ffd369ba040 /* 74 vars */) = 0
write(1, "Hello, world!\n", 14Hello, world!
)         = 14
exit(0)                                 = ?
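The trace shows write returning 14, a value our write wrapper simply throws away. A minimal sketch of decoding such a raw return value, assuming the Linux convention that results from -4095 to -1 are negated errno codes. The name decode is hypothetical.

```rust
/// Decode the raw return value of a Linux system call.
/// Assumption: values from -4095 to -1 carry a negated errno code.
fn decode(ret: isize) -> Result<usize, i32> {
    if (-4095..0).contains(&ret) {
        Err(-ret as i32)
    } else {
        Ok(ret as usize)
    }
}

fn main() {
    // The successful write from the trace: 14 bytes written.
    assert_eq!(decode(14), Ok(14));
    // A failed call, e.g. EBADF (errno 9) for an invalid file descriptor.
    assert_eq!(decode(-9), Err(9));
}
```

The function only needs core, so it would fit into the #![no_std] binary unchanged.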