Writing a Rust OS for a Single Board Computer

Hello and welcome to this tutorial on writing a simple operating system kernel for single board computers (SBCs) in Rust!

This tutorial is aimed at everyone who wants to learn about basic principles of operating systems and build one hands-on. It will guide you, step by step, with detailed explanations of what is being implemented and why.

Who is this tutorial for?

This tutorial doesn't assume any prior OS development knowledge. That being said, it is expected that you have a basic understanding of what operating systems are and what their purpose is.

In addition to that, you should have a basic understanding of the Rust programming language. If you are new to Rust, check out The Rust Book before continuing with this tutorial.

As operating systems are developed on bare metal, it will be beneficial (but not required) to first read The Rust Embedded Book - it is a fairly quick read and as a benefit, it will guide you through the setup of basic tools like GDB or QEMU.

In the first two chapters of the tutorial, we will write a few (18) lines of AArch64 assembly. No prior knowledge of assembly language is required and every line will be explained, both in terms of what is happening and why.

How to follow this tutorial?

The tutorial consists of 20 chapters in which we will build a simple OS kernel, step by step. In general, you should read along with this book, which will explain every line of code we write and will often include explanations of the design decisions we are making, as well as (hopefully) helpful tips.

In addition to the text of the book, you can find the source code of the kernel being built in its own directory. The code will be versioned in git, each chapter having its own corresponding branch.

You can go ahead and clone the repository:

git clone https://github.com/matej-almasi/rust-sbc-os-book.git

After finishing a chapter of the book you can switch to the corresponding branch, compare your code with the code in the branch, then continue reading the next chapter.
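For example, after finishing chapter 1 (the branches follow the chapter-N naming pattern mentioned at the end of each chapter):

git switch chapter-1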

Attribution

This book is based on Operating System development tutorials in Rust on the Raspberry Pi by Andre Richter. His work is truly the giant on whose shoulders this book stands.

Help

In case you need any help, find a bug, or encounter any other issue, feel free to open an issue or contact me directly via email.

Have fun!

Chapter 1 - Baby Steps

Every adventure starts with a step. Let's start our adventure!

Setting Up the Project

Every Rust adventure starts with cargo new. Go ahead and open your favourite directory (mine is Projects) and open the terminal. Pick a name for your kernel (I decided to go with ferros) and run cargo new <kernel_name>.
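For example, with the kernel name used throughout this book (adjust the directory and the name to your liking):

cd Projects
cargo new ferros
cd ferros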

No Main, No Standards

Since we are creating an Operating System, and since an Operating System, unlike a typical application, doesn't have an Operating System to run on top of ('duh...), we have to give up the comfort of std and even the comfort of the main function.

This is mainly because std relies on an operating system for much of its functionality and happily uses "high level" concepts like the heap for dynamic data allocation, which are often not desirable or available in bare metal environments. main, on the other hand, is not available because Rust's main secretly does runtime setup for your program (like making the arguments passed to your application available when it is invoked).

To tell rustc we won't be using std or main, we put two crate-level attributes at the top of our main.rs (which can happily stay named main.rs):

#![no_main]
#![no_std]

// ... rest of main.rs

This has the unfortunate effect of losing println! (which is part of std), which immediately causes our code to not compile, since cargo new scaffolded our project with a single println in our main. So, with some sorrow, we delete the call to println!:

#![no_main]
#![no_std]

fn main() {
}

More Trouble Without std

In addition to that, we suffered another loss - the default panic handler (the function invoked to print the panic message) is no longer available either. Fortunately, it is not too difficult to write one ourselves:

// main.rs
use core::panic::PanicInfo;

// ...

#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    unimplemented!()
}

PanicInfo is a data structure contained in the core crate (the subset of std that is always available, on all targets, including bare metal) that contains some useful info we might want to use later, when our panic handler becomes more mature.
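As a small, purely illustrative preview (we have no way to print anything yet), PanicInfo exposes, among other things, the source location of the panic:

// illustrative only - a more mature handler could inspect where the panic happened
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    if let Some(location) = info.location() {
        let _file = location.file(); // e.g. "src/main.rs"
        let _line = location.line();
    }
    loop {}
}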

Even More Trouble

Our troubles that started when we dropped std have not yet ended. If you use rust-analyzer (as you should, btw), you will notice that it reports an error "can't find crate for test" - this is because the scaffolding needed for cargo test to build and execute any tests we might have written also depends on... std. This means we need to tell cargo there will be no tests (at least for now) for our package. We announce this to cargo (and rust-analyzer) by adding the following entries to Cargo.toml:

[[bin]]
name = "ferros"
test = false
bench = false

The bench entry is required as benchmark tests rely on the test scaffolding and thus also need to be explicitly turned off.

At this point, running cargo check gives us a successful build and rust-analyzer reports no more errors.

Setting the Target

At this point, we should decide what kind of Single Board Computer we are actually building our kernel for. Since we are not very decisive, since we would like to defer such a decision until later (say, until an SBC arrives at our doorstep...), and since QEMU is a small miracle, we decide to only make a much smaller decision - what kind of CPU architecture do we want to support?

There is no right or wrong answer, but after some deliberation1 we decide to go with 64-bit ARM (AArch64). ARM is a particularly good choice because:

  • ARM Assembly is (relatively) simple2
  • ARM boards are readily available in good quality
  • ARM boards have good emulation support in QEMU

All that being said, each chapter will have an appendix containing modifications required to run our code on a RISC-V machine.

With that decision out of the way, we promise ourselves to write code as generic as possible (so we will have the ability to choose an SBC later) and pick some SBC board that we will emulate in QEMU. Since most of you will probably grab a Raspberry Pi, and since the Raspberry Pi is a very decent choice anyway1, we will use it as the target of QEMU emulation.

Back to Code

Having decided the target architecture, we shall now focus on telling rustc to actually compile our kernel for that architecture. We consult the Cargo Book and learn that we should create a config file .cargo/config.toml:

mkdir .cargo
touch .cargo/config.toml

and configure cargo with the desired target:

[build]
target = [?????]

But what should our target be? Obviously, we want to target bare metal AArch64, but that gives us two options: aarch64-unknown-none and aarch64-unknown-none-softfloat - but which is the one we need?

Floating

The difference between the two target variants comes down to whether or not our kernel assumes the availability of a hardware floating point unit (FPU). For the purposes of developing a kernel, we will want to stay away from FP altogether and thus not make any assumption as to whether an FPU will or will not be available. Therefore, we will pick the -softfloat option, which simply means that any FP operations would be done by software emulation instead of by an FPU.

Thus, our .cargo/config.toml will look like this:

[build]
target = ["aarch64-unknown-none-softfloat"]

But Where To Start

Even though we fixed all of the compiler errors that haunted us so far, running cargo check gives us a somewhat disconcerting warning:

warning: function `main` is never used
 --> src/main.rs:6:4
  |
6 | fn main() {}
  |    ^^^^
  |

As you may remember, we told rustc that there will be no main (#![no_main]), so our fn main actually is unused - it is not invoked by anything in our program, and cargo doesn't automatically make it the entry point of our program.

On bare metal, it is up to us to manually configure the binary being built with an entry point.

Linkin Time

As a quick refresher, once rustc (and then LLVM, under the hood) does its job compiling our source code into actual instructions for the processor, it ends up with a bunch of "object" (.o) files that it needs to wire up together to form the resulting binary executable (or binary library, if we were building a lib crate).

For this, it calls a special program - the linker, which stitches all the objects together, makes sure all symbols are defined (what that means will be described in a short while) and sets a bunch of crucial metadata for programs that will eventually use the executable (this will be especially important in the next chapter).

To give a brief overview of what the linker does: every function (and every global constant...) is labeled by a symbol. A function definition (fn foo() {/*...*/}) defines the symbol, and calls to other functions are actually calls to the symbols that represent them (so let x = foo(y, z) is internally a call to the symbol foo, which is in this case expected to be a function).

Usually, the configuration rustc passes to the linker and the linker's default settings are more than enough to create a viable binary without any input from us, the developers. That being said, the linker offers us a way to configure its behavior, in case we need such control.

The way to configure the linker for linking a specific binary is through a linker script - a simple file that, among other things, allows us to tell the linker where in the resulting binary to place different parts of the program and where the ENTRY of its execution is.

Let's write one ourselves, and let's try to tell the linker that main is the symbol that denotes the entry of our program.

touch kernel.ld

The name of the linker script doesn't matter much, but it is customary to give it a .ld extension and name it sensibly, thus the name kernel.ld.

We place the following line inside the script:

/* kernel.ld */
ENTRY(main)

ENTRY is a keyword that does what it sounds like - tells the linker that symbol main is the ENTRY of our program.

External Help

Now, we didn't quite get rid of the compiler warning, because 1. cargo doesn't really know that there is some linker script (and neither does rustc) and 2. even if cargo knew we have this linker script in place, cargo can't really read it and understand that the main defined in main.rs is now "used" by the linker.

We will fix problem 2. first. We can't quite teach cargo to understand the linker script, but we can tell cargo that something outside our crate will use fn main, ridding us of the warning we encountered above. We achieve this by adding pub extern "C" before declaring fn main:

// main.rs
// ...

pub extern "C" fn main() {}

// ...

extern here means main should be a symbol available to external users - in this case that means us when we write our linker script. The "C" part tells the compiler that main shall adhere to the C language calling convention. It is not very important for us right now, and we could have used a different calling convention if we desired (we could happily use "Rust", for example).

No Mangle

There is just one more thing we need to take care of - name mangling. By default, rustc "mangles" (adds lots of not very readable characters to) the names of our functions, and this is true for main as well. To disable this for our main function (so that the linker will be able to find the symbol main when it looks for the ENTRY(main) we defined above), we need to add the #[unsafe(no_mangle)] attribute to our main:

// main.rs
// ...

#[unsafe(no_mangle)]
pub extern "C" fn main() {}  

// ...

Building

We turn our attention to problem 1. mentioned above – how do we tell cargo to use our linker script? One good way we can achieve this is to create a build script (the last script we are making for some time, I promise) named build.rs.

In the root of our project:

touch build.rs

cargo automatically picks up a build.rs file, provided it exists in the same place as Cargo.toml, and executes it before building the crate with cargo build. There are many great uses for the build script, but for now, we will suffice with writing the following lines in the script:

// build.rs

fn main() {
    println!("cargo:rustc-link-search={}", env!("CARGO_MANIFEST_DIR"));
    println!("cargo:rustc-link-arg=--script=kernel.ld");
}

The two lines inside the script's main are read by cargo, which in turn is told to pass link-search and link-arg as parameters to rustc when it is invoked to compile our kernel. link-search={CARGO_MANIFEST_DIR} tells rustc to tell the linker to look for a linker script in the directory where Cargo.toml lives (as we created it there) and link-arg=--script=kernel.ld tells rustc to tell the linker that it should use kernel.ld as its linker script.

There is one small issue with build.rs as it stands however. When we build a Rust crate and call cargo build without any changes to Cargo.toml or our actual source code, cargo is smart enough to skip the entire build process, knowing there is nothing that could affect the resulting binary, which has previously been built.

We would like to tell cargo to treat changes to build.rs and kernel.ld as changes that affect the resulting binary (i.e. to treat them as it treats Cargo.toml or *.rs files in src). This is possible by adding the following lines to build.rs:

// build.rs

fn main() {
    println!("cargo::rerun-if-changed=build.rs");
    println!("cargo::rerun-if-changed=kernel.ld");

    // ...
}

If you have run cargo build before adding those two lines, make sure to run cargo clean before your next call to cargo, as cargo wouldn't know that it should rerun build.rs when build.rs changes until build.rs with the lines above runs for the first time.
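When in doubt, a clean rebuild makes sure the new build script instructions take effect:

cargo clean
cargo build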

Waiting for Events

Now that we have most of the build infrastructure ready, we can proceed and actually implement some code! For starters, we should implement something really small, just to make sure that our code actually is executed when we eventually run our kernel. For this, we will implement a parking loop - the processor will wait for events (what these events are doesn't matter right now) and when an event occurs, it will loop back to waiting again.

To get our kernel up and running, we will have to roll up our sleeves and write a few lines of 64 bit ARM assembly. Fortunately, we can write inline assembly in .rs files and ARM assembly is not too complicated (at least, not for simple purposes like ours).

There are a few ways we can write inline assembly in Rust. Right now, we want to make use of Rust naked functions - functions that consist only of inline assembly and for which rustc doesn't automatically generate function prologues and epilogues (small bits of assembly boilerplate at the beginning and end of a function that do some setup and teardown for the function) as we are actually going to implement this setup and teardown ourselves (in the next chapter - in fact, this setup will be the sole objective of the next chapter).

We create a naked function by adding a #[unsafe(naked)] attribute before the function declaration (unsafe is there precisely because it is up to us to do the setup and teardown properly - any mistake could corrupt program state and crash it) and including a single core::arch::naked_asm!() call in the fn:

// main.rs
// ...

#[unsafe(naked)]
pub extern "C" fn main() {
    core::arch::naked_asm!("");
}

// ...

To implement the parking loop itself, we write the following lines of assembly:

// main.rs
// ...

#[unsafe(naked)]
pub extern "C" fn main() {
    core::arch::naked_asm!(
        "1:",
        "   wfe",
        "   b 1b"
    );
}

// ...

The lines of assembly above do the following:

  • 1: - declares a label (symbol) that we can reference from other assembly code by its number (in this case, the number is 1)
  • wfe - is an instruction to wait for events, as mentioned above
  • b 1b - is a branch instruction - an instruction to jump back to the instruction labeled with 1 (if we wanted to jump to a hypothetical instruction labeled 1 in the forward direction, we would use b 1f)

As you can see, this is indeed an (infinite) wait-for-event loop.

Running the Kernel

With our parking loop in place, it is finally time to run our code. Since we don't have too much functionality yet, we will make do with emulating the Raspberry Pi with QEMU. If you haven't done so yet, now is the time to install qemu-system-aarch64 which is capable of emulating the whole RPi device.
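How exactly you install it depends on your platform; for example (package names may differ on your system):

# Debian/Ubuntu: qemu-system-aarch64 ships in the qemu-system-arm package
sudo apt install qemu-system-arm

# macOS with Homebrew
brew install qemu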

We first build our kernel using cargo build. Then, we invoke qemu like this (don't forget to change the name of your project!):

qemu-system-aarch64 -machine raspi4b \
                    -d in_asm        \
                    -display none    \
                    -kernel target/aarch64-unknown-none-softfloat/debug/ferros

This tells qemu to emulate a Raspberry Pi 4B, print out the executed ARM assembly, use no display output (as our kernel doesn't support any display output...) and use the file target/.../debug/ferros as the kernel for the emulated machine.

You should see output similar to this, 4 times:

----------------
IN: main
0x00210120:  d503205f  wfe
0x00210124:  17ffffff  b        #0x210120

What you see are the four cores, parked, waiting for events.

There is an important note to be made here - the binary produced by cargo is in the ELF file format, which is normally used for applications running on UNIX-like systems. This file format wouldn't normally be executable as a kernel on a real Raspberry Pi - for that we will need to turn it into a "pure binary" - strip all the headers and sections with debuginfo, etc. For now, we will happily continue using ELF, until our first attempt to flash and run our kernel on real hardware later in the book.
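As a small preview (nothing we need yet - we will set this up properly later in the book), the conversion is typically done with an objcopy-style tool. For example, with cargo-binutils installed, something like the following would produce a raw binary (the output name kernel8.img is just the name the Raspberry Pi firmware conventionally looks for):

rust-objcopy --strip-all -O binary \
    target/aarch64-unknown-none-softfloat/debug/ferros kernel8.img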

Since we are going to use qemu like this a lot in this book, and since the command above is a little annoying to type every time, and since we really like the cargo run we know and love from application development, we are going to set up a custom runner that will invoke qemu in the correct way. In our .cargo/config.toml:

# .cargo/config.toml
# ...

[target.aarch64-unknown-none-softfloat]
runner = """\
  qemu-system-aarch64 -machine raspi4b \
                      -d in_asm \
                      -display none \
                      -kernel
"""
    

Now, cargo run will actually invoke qemu with our kernel, rebuilding it when necessary.

Congrats!

Congrats reading all the way here. I hope you had fun and learned new things! You can check out and cross-reference the source code we built together in the branch chapter-1.

Now, let's move on and continue on our kernel journey...


  1. Ok, I admit. This book is an adaptation of a tutorial for Raspberry Pi, which is an ARM system. But for now, let's pretend we were making a decision.

  2. Don't worry, there won't be much assembly written and all 18 lines of it will be properly explained. We need to resort to asm because we have to do some prep work before the first line of Rust code can be executed in our kernel.

Chapter 2 - Jumping to Rust

In this chapter, we will do the necessary setup that prepares the system for calling a normal Rust function. Specifically, we will set up the stack and zero any data that our Rust code expects to start zeroed.

Setting up the Stack

First, we set up the stack. What the stack actually is is out of scope of this book, but for now it should suffice to say that the stack is a data structure in memory where functions place data they need to store for a while (by pushing it onto the stack) and from which they then retrieve it when they need it again (by popping it off the stack).

It is the responsibility of the kernel to set up the stack for use by its own functions (and later on, to set up stacks for programs that will run on the operating system using the kernel). Let's set it up then!

The stack, being a data structure, needs room to "grow" - as data is pushed onto it, it will increase in size. It is customary (and assumed by rustc when compiling code) that the stack grows "downwards", i.e. from larger addresses to smaller addresses.

A Nice Place for the Stack

Since the stack grows downwards, it would not be a bad idea to place it just below the top of RAM, that is, at (almost) the highest available RAM address. Most SBCs have relatively little RAM, and since our kernel will not be too complex, we will assume 512 MiB of RAM. Bare metal developers usually don't place the stack at the extreme end of RAM (for good reasons) and leave a few unused bytes as a buffer zone (some systems may even use the reserved buffer for some useful data - not important to us in this tutorial). For the purpose of this tutorial, we will go ahead with a 64 KiB buffer, which in turn gives us an address of 0x1FFF0000 (512 MiB - 64 KiB buffer).
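If you want to double-check that arithmetic, a tiny throwaway host-side snippet (not part of the kernel, names are just for illustration) confirms the value:

// throwaway sanity check, run on your host machine
fn main() {
    let ram_size: usize = 512 * 1024 * 1024; // 0x2000_0000
    let buffer: usize = 64 * 1024;           // 0x0001_0000
    assert_eq!(ram_size - buffer, 0x1FFF_0000);
    println!("stack top = {:#x}", ram_size - buffer); // stack top = 0x1fff0000
}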

Meet sp, the Stack Pointer

On 64 bit ARM, the position of the stack is stored in a special register, sp - the stack pointer. We would like to set it to point to 0x1FFF0000. To achieve this, we add mov sp, #0x1FFF0000 at the beginning of our inline assembly.

// main.rs
// ...

#[unsafe(naked)]
#[unsafe(no_mangle)]
pub extern "C" fn main() {
    core::arch::naked_asm!(
        "mov sp, #0x1FFF0000",
        "1:",
        "wfe",
        "b 1b"
    );
}

// ...  

Zero BSS

In general, ELF files have four very important sections:

  • .text - contains executable machine code instructions
  • .data - contains (mutable) global variables that start initialized to a non-zero value
  • .rodata - same as .data but for constants (ro = readonly)
  • .bss - same as .data but for uninitialized or zero-initialized data1

It is expected that all data in .bss is zero at program start (even variables that were not explicitly initialized to zero in source code). You could think that the linker or the compiler would handle this automatically, but that is actually not the case. The .bss section is special in the sense that no actual data for it is included in the binary, so there is no data to be initialized to zero by either the compiler or the linker. The reason for not storing the data is entirely practical - it would be a colossal waste of space to include a bunch of zeros in the binary.
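As a quick illustration (the variable names are made up), this is roughly where Rust statics typically end up:

// illustrative only - roughly where these statics would land:
static GREETING: &str = "hello";  // .rodata - read-only data
static mut BOOT_COUNT: u64 = 42;  // .data   - mutable, non-zero initial value
static mut TICKS: u64 = 0;        // .bss    - mutable, zero-initialized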

For these reasons, it is normally up to the loader to zero the .bss section at program startup. But this is only possible if the program to be executed includes headers that tell the loader where in memory .bss will be located - information that is included in the ELF headers but not included in raw binary format, which is the format expected by the RPi firmware. So, even though we have been feeding Qemu our kernel in the ELF format, in the near future when we finally run our kernel on real hardware, there will be no loader that would zero .bss for us before our kernel starts. This means it is up to us to zero the .bss section manually as part of our startup code.

To pass the memory addresses of the .bss start and end to our assembly code, we set two more symbols in our linker script and use linker script keywords to obtain the addresses of interest:

ENTRY(start)

SECTIONS {}

bss_start = ADDR(.bss);
bss_end = bss_start + SIZEOF(.bss);

We then write a loop that will implement the following pseudocode:

let x0 = bss_start

while x0 != bss_end:
    *x0 = 0
    x0 = x0 + 8 bytes  // 8, because 8 bytes is the size of a pointer on
                       // a 64-bit arch.

Which translates into the following assembly:

  ldr  x0, =bss_start
  ldr  x1, =bss_end
1:
  cmp  x0, x1
  b.eq 1f
  str  xzr, [x0], #8
  b    1b
1:
// code after the zero .bss loop

The lines of assembly above do the following:

  • ldr x0, =bss_start - loads the value of bss_start into reg. x0
  • ldr x1, =bss_end - loads the value of bss_end into reg. x1
  • cmp x0, x1 - compares registers x0 and x1
  • b.eq 1f - if the result of the previous comparison is "values equal", branch to label 1 in the forward direction
  • str xzr, [x0], #8 - store the value of register xzr (a utility register, always set to zero) at the memory address pointed to by x0, then increment x0 by 8 bytes - you can think of [x0] as a dereference of x0
  • b 1b - branch to label 1 in the backward direction

Which we inline in our Rust setup code like this:

// main.rs
// ...

#[unsafe(naked)]
#[unsafe(no_mangle)]
pub extern "C" fn main() {
    core::arch::naked_asm!(
        // setup the stack pointer
        "mov sp, #0x1FFF0000",
        // zero the .bss section
        "  ldr  x0, =bss_start",
        "  ldr  x1, =bss_end",
        "1:",
        "  cmp  x0, x1",
        "  b.eq 1f",
        "  str  xzr, [x0], #8",
        "  b    1b",
        // parking loop
        "1:",
        "  wfe",
        "  b 1b"
    );
}

// ...  

Running with cargo run yields this output:

----------------
IN: start
0x00000000:  b27033ff  mov      sp, #0x1fff0000
0x00000004:  58000120  ldr      x0, #0x28
0x00000008:  58000141  ldr      x1, #0x30
0x0000000c:  eb01001f  cmp      x0, x1
0x00000010:  54000060  b.eq     #0x1c

... four times

If you noticed that the ldr operations seem to use "weird addresses" (like #0x28 when setting x0) - don't worry. The linker actually placed the values of the symbols we defined in the linker script at those places in memory, so ldr x0, #0x28 actually loads the value stored at address 0x28, that is, the value of bss_start we set in the linker script.

Four Times

At this point you may notice one significant issue with the code - we actually have a race condition! QEMU starts all four cores and sets them to execute our kernel, and all four cores try to zero the .bss data stored in RAM. This is because even though each processor core has its own set of registers, they all share the same RAM and thus the same memory space.

Right now, the race condition is fairly innocent - all the cores do is write zeros to the same memory addresses. But in the future, when our kernel becomes more complex, we would most certainly run into a real race condition that would corrupt the state of our kernel and lead to incorrect behavior or, if we are lucky, a crash. This is not to mention that running the kernel on all four cores is not desired at all - we want our kernel to execute on a single core, and perhaps eventually delegate some tasks to other cores, if we so desire.

To fix this problem, we choose a single core for the execution of our kernel. Each core has its own ID (ranging from 0 to 3 on a 4-core system). Since we would like to keep our kernel as portable as possible, let's choose core 0, since every processor is guaranteed to have at least one core. Then, we will adjust our setup code to check the core it's executed on and proceed only if the core ID is 0, otherwise jump to the parking loop.

To get the ID of the core, we have to read from a special register MPIDR_EL1 that contains various data about "core affinity" (besides the core's ID, it contains information about higher level groupings of the core, such as the core's cluster in a multi-cluster system, etc.). To read the value of MPIDR_EL1 into some register, we have to use a special instruction mrs, mask out only the bits that contain the core ID (we are not interested in the higher level groupings) and continue only on the core with ID 0, otherwise we jump straight to the parking loop:

mrs  x0, MPIDR_EL1 // read core affinity data into x0
and  x0, x0, 0b11  // bitwise and: x0 = x0 & 0b11
cmp  x0, 0         // compare x0 with 0
b.ne 2f            // if not equal, branch to the parking loop, whose label we
                   // have to change to 2

Which we put at the start of our startup code in main:

// main.rs
// ...

#[unsafe(naked)]
#[unsafe(no_mangle)]
pub extern "C" fn main() {
   core::arch::naked_asm!(
       // check core ID, proceed only on core 0
       "mrs x0, MPIDR_EL1",
       "and x0, x0, 0b11",
       "cmp x0, 0",
       "b.ne 2f",
       // setup the stack pointer
       "mov sp, #0x1FFF0000",
       // zero the .bss section
       "ldr  x0, =bss_start",
       "ldr  x1, =bss_end",
       "1:",
       "cmp  x0, x1",
       "b.eq 2f",
       "str  xzr, [x0], #8",
       "b    1b",
       // parking loop
       "2:",
       "wfe",
       "b 2b"
   );
}
// ...

Having the parking loop at the end is now becoming a little awkward, so let's rearrange it and place it just after the core ID check:

// main.rs
// ...

#[unsafe(naked)]
#[unsafe(no_mangle)]
pub extern "C" fn start() {
   core::arch::naked_asm!(
       // check core ID, proceed only on core 0
       "mrs x0, MPIDR_EL1",
       "and x0, x0, 0b11",
       "cmp x0, 0",
       "b.eq 2f", // if this is core 1, jump to stack pointer setup
       // otherwise, fall into the infinite parking loop
       "1:",
       "wfe",
       "b 1b",
       // setup the stack pointer
       "2:",
       "mov sp, #0x1FFF0000",
       // zero the .bss section
       "ldr  x0, =bss_start",
       "ldr  x1, =bss_end",
       "1:",
       "cmp  x0, x1",
       "b.eq 1f",
       "str  xzr, [x0], #8",
       "b    1b",
       "1:",
       "nop" // no operation just yet...
   );
}
// ...

Running this in QEMU, you will be able to see that the stack and .bss setup code is run only once - the other cores jump straight to the parking loop.

Jumping to Rust

We now finally have everything in place to jump to our first "normal" Rust function and leave the world of assembly.

Before we do that, we do a quick rename of main to start, so that we can use main as a name for a "normal" Rust function and tuck the startup assembly code into its own module "start":

// main.rs
#![no_main]
#![no_std]

use core::panic::PanicInfo;

mod start;

#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
   unimplemented!()
}

// start.rs

#[unsafe(naked)]
#[unsafe(no_mangle)]
pub extern "C" fn start() {    // notice the function name changed to start
    core::arch::naked_asm!(
        // check core ID, proceed only on core 0
        "mrs x0, MPIDR_EL1",
        "and x0, x0, 0b11",
        "cmp x0, 0",
        "b.eq 2f", // if this is core 1, jump to stack pointer setup
        // otherwise, fall into the infinite parking loop
        "1:",
        "wfe",
        "b 1b",
        // setup the stack pointer
        "2:",
        "mov sp, #0x1FFF0000",
        // zero the .bss section
        "ldr  x0, =bss_start",
        "ldr  x1, =bss_end",
        "1:",
        "cmp  x0, x1",
        "b.eq 1f",
        "str  xzr, [x0], #8",
        "b    1b",
        // jump to Rust main!
        "1:",
        "nop" // no operation just yet...
    );
}    

We now create a very simple fn main that will immediately panic!:

// main.rs
#![no_main]
#![no_std]

use core::panic::PanicInfo;

mod start;

fn main() -> ! {
   panic!();
}

#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
   unimplemented!()
}

...and then jump to it from our start function:

// start.rs

use super::main;

#[unsafe(naked)]
#[unsafe(no_mangle)]
pub extern "C" fn start() {
    core::arch::naked_asm!(
    
        // ...

        "b.eq 1f",
        "str  xzr, [x0], #8",
        "b    1b",
        "1:",
        "b {}", sym main
    );
}    

The sym main is a special operand in the format-string-like syntax Rust uses for inline assembly. It means we pass whatever symbol main is assigned during compilation (remember, rustc mangles symbol names by default) to the b instruction in the inline assembly.

Back in the Safe Waters Again

Running this yields the following QEMU output (some repeated blocks from the idle cores are left out for brevity):

----------------
IN: start
0x00000058:  d53800a0  mrs      x0, mpidr_el1
0x0000005c:  92400400  and      x0, x0, #3
0x00000060:  f100001f  cmp      x0, #0
0x00000064:  54000060  b.eq     #0x70

----------------
IN: start
0x00000058:  d53800a0  mrs      x0, mpidr_el1
0x0000005c:  92400400  and      x0, x0, #3
0x00000060:  f100001f  cmp      x0, #0
0x00000064:  54000060  b.eq     #0x70

----------------
IN: start
0x00000068:  d503205f  wfe
0x0000006c:  17ffffff  b        #0x68

----------------
IN: start
0x00000058:  d53800a0  mrs      x0, mpidr_el1
0x0000005c:  92400400  and      x0, x0, #3
0x00000060:  f100001f  cmp      x0, #0
0x00000064:  54000060  b.eq     #0x70

----------------
IN: start
0x00000058:  d53800a0  mrs      x0, mpidr_el1
0x0000005c:  92400400  and      x0, x0, #3
0x00000060:  f100001f  cmp      x0, #0
0x00000064:  54000060  b.eq     #0x70

----------------
IN: start
0x00000068:  d503205f  wfe
0x0000006c:  17ffffff  b        #0x68

----------------
IN: start
0x00000070:  b27033ff  mov      sp, #0x1fff0000
0x00000074:  580000e0  ldr      x0, #0x90
0x00000078:  58000101  ldr      x1, #0x98
0x0000007c:  eb01001f  cmp      x0, x1
0x00000080:  54000060  b.eq     #0x8c

----------------
IN: start
0x0000008c:  1400000a  b        #0xb4

----------------
IN: _ZN6ferros4main17h66b48a6dfdde69deE
0x000000b4:  d503201f  nop
0x000000b8:  10fffac0  adr      x0, #0x10
0x000000bc:  94000001  bl       #0xc0

----------------
IN: _ZN6ferros4main19panic_cold_explicit17hb892d9c16d9d0380E
0x000000c0:  94000009  bl       #0xe4

----------------
IN: _ZN4core9panicking14panic_explicit17h80c39b8a630a2655E
0x000000e4:  d10143ff  sub      sp, sp, #0x50
0x000000e8:  a9047bfd  stp      x29, x30, [sp, #0x40]
0x000000ec:  910103fd  add      x29, sp, #0x40
0x000000f0:  d503201f  nop
0x000000f4:  10fffaa8  adr      x8, #0x48
0x000000f8:  d503201f  nop
0x000000fc:  10002cc9  adr      x9, #0x694
0x00000134:  aa0003e1  mov      x1, x0
0x00000138:  00000148  udf      #0x148

----------------
IN: _ZN4core3fmt9Formatter3pad17hdc1fc7a515466962E
0x00000200:  00000058  udf      #0x58

As an interesting aside, notice the mangled symbols for main and the panic chain - you should be able to visually parse out the original function names from the mangled ones.

Besides that, there are two more important things to notice here:

  • The code jumps from our start to our main, then to a chain of panic handlers that eventually loop on themselves (the unimplemented! call in our panic handler actually wraps another panic! in itself).
  • The execution ends with a udf = undefined instruction, signalling something went wrong and the core ended up in a situation it doesn't know what to do about.

The corrupted state at the end of execution is the result of the recursive calls to panic! within our current panic handling implementation. We can fix it by changing our panic handler to enter an infinite wfe loop instead, now without direct use of inline assembly:

// main.rs
#![no_main]
#![no_std]

use aarch64_cpu::asm;
use core::panic::PanicInfo;

mod start;

fn main() -> ! {
    panic!();
}

#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {
        asm::wfe();
    }
}

Here, we have used an assembly wrapper from the aarch64_cpu crate, which provides low level access to AArch64 processor functionality.
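If you haven't added it yet, pull the crate into the project as a dependency (it is published on crates.io as aarch64-cpu and imported as aarch64_cpu):

cargo add aarch64-cpu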

Now, running the kernel will result in the same chain of functions as before, with a single difference in the eventual execution result - we no longer end up with a udf but instead park the core in a wfe loop.

Finally, we make one last cosmetic change - to keep our main modules neat and clean, we move the panic handler into its own submodule:

// main.rs
#![no_main]
#![no_std]

mod panic_handler;
mod start;

fn main() -> ! {
    panic!();
}

// panic_handler.rs
use aarch64_cpu::asm;
use core::panic::PanicInfo;

#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {
        asm::wfe();
    }
}

On to debugging!

Congratulations on getting through to the end of chapter 2! Our kernel is properly set up for executing compiled Rust code, and from now on we will mostly move away from assembly and finally start writing some Rust! You can check out and cross-reference the source code we built together in the branch chapter-2.

In the next chapter, we will resurrect println!(), which will allow us a primitive form of debugging, and talk about debugging our kernel in general. See you!


  1. .bss stands for "block starting symbol" - a rather historical name...