October 24, 2021

Pointer Fu: An adventure in the Tokio code base

In an effort to understand the internals of asynchronous runtimes, I’ve been spending time reading Tokio’s source code. I’ve still got a long way to go but it has been a great journey so far.

Raw pointers are used all over Tokio. In one particular instance, the way they used them really blew my mind 🤯. So here I am, writing about it. Buckle in 💺.

Setting the scene

We have a type Cell that is defined as (taken directly from Tokio source)

#[repr(C)]
pub(super) struct Cell<T: Future, S> {
    /// Hot task state data
    pub(super) header: Header,

    /// Either the future or output, depending on the execution stage.
    pub(super) core: Core<T, S>,

    /// Cold data
    pub(super) trailer: Trailer,
}

It’s used when initialising a struct RawTask

struct RawTask {
  ptr: NonNull<Header>
}

// This is pseudocode
impl RawTask {
    fn new() -> RawTask {
        let ptr = Cell::new(); // returns a raw pointer to a Cell

        // Cast pointer to one that points to a Header
        let header_ptr = ptr as *mut Header;
        let ptr = NonNull::new(header_ptr); 
        RawTask { ptr }
    }
}

What is so interesting?

First, a necessary detour! Rust can represent structs in memory in multiple ways. It’s covered in detail in the The Rust Reference. By default, Rust offers no guarantee on the memory layout of your struct; it is free to modify the layout however it wants. From a developer perspective, it means you cannot write any code that makes assumptions on the memory layout of your struct. To change the representation of the struct, you can use the repr attribute (as shown in the above definition of Cell).

Now, here’s where it gets good! I omitted the comment for the Cell struct, which is

/// The task cell. Contains the components of the task.
///
/// It is critical for `Header` to be the first field as the task structure will
/// be referenced by both *mut Cell and *mut Header.

Hopefully some alarm bells started ringing in your head 🚨. Header has to be the first field in the struct, so that you can dereference a pointer to Cell into either a Cell or a Header (if this doesn’t make sense to you, check out the addendum). However, Rust’s default representation provides no guarantee that Header will remain the first struct field! Instead, as you can see, they’ve changed the representation to the C layout. In C, struct fields are stored in the order they are declared. This gives us the guarantee we need. Now we can go around dereferencing the pointer to Header without any worry!

let cell = Cell::new();
let cell_ptr = &cell as *mut Cell;
let header_ptr = cell_ptr as *mut Header;

let header = unsafe { *header_ptr };

Addendum

View in Rust playground

#[derive(Debug, Clone, Copy)]
struct Header {
    x: u32
}

#[derive(Debug, Clone, Copy)]
struct Core {
    y: u8
}

#[derive(Debug, Clone, Copy)]
#[repr(C)]
struct Task {
    header: Header,
    core: Core
}

fn main() {
    let header = Header { x: 256 };
    let core = Core { y: 10 };
    let task = Task { header, core };
    
    // Create raw pointer to task
    let task_ptr = &task as *const Task;
    // Cast pointer to point to Header
    let header_ptr = task_ptr as *const Header;
    
    // Dereference header_ptr
    // We expect this to give us our Header struct as defined above.
    // Because Header is the first field and the pointer is pointing
    // to the beginning of the Task struct, it is valid to perform
    // this dereference
    let header_from_ptr_deref = unsafe { *header_ptr };
    
    // Dereference task_ptr
    // We expect this to give us our Task struct as defined above
    let task_from_ptr_deref = unsafe { *task_ptr };
    
    println!("Header from deref: {:#?}", header_from_ptr_deref);
    println!("Task from deref: {:#?}", task_from_ptr_deref);
    
    // Bonus: we can also cast task_ptr into a pointer to Core
    // However, because its struct member is a u8 and Header's member
    // is storing 256, which u8 cannot represent, dereferencing
    // core_ptr will produce a corrupted Core struct
    let core_ptr = task_ptr as *const Core;
    let core_from_ptr_deref = unsafe { *core_ptr };
    println!("Core from deref: {:#?}", core_from_ptr_deref);
}

The output of running this is shown below. Note that Core we obtained from dereferencing has y=0 instead of y=10.

Header from deref: Header {
    x: 256,
}

Task from deref: Task {
    header: Header {
        x: 256,
    },
    core: Core {
        y: 10,
    },
}

Core from deref: Core {
    y: 0,
}

Shoutouts to Ana for reviewing this post.

Powered by Hugo & Kiss.