Fil-C

Memory SafetyC/C++ CompatibilityModern Tooling

InvisiCaps: The Fil-C Capability Model

Fil-C ensures memory safety of all operations in the C and C++ language. The hardest part of C memory safety is pointer safety. Fil-C achieves pointer safety using a capability system for pointers. Specifically, each pointer dynamically tracks what object in memory it should be allowed to access, and using that pointer to access any memory that is not that object is dynamically prohibited.

Achieving memory safety using a pointer capability model means:

In addition to memory safety, Fil-C's other goal is fanatical compatibility. This means that Fil-C's capability model also has to allow most (or ideally all) "safe" uses of C pointers; that is, idioms that are in widespread use and don't lead to exploitation unless they violate the above rules. This means even supporting uses of pointers that the C spec deems to be undefined behavior. And it means preserving sizeof(T*) to be the same as what you'd expect - i.e. 8 bytes on 64-bit platforms.

This article describes how Fil-C's capability mode, called InvisiCaps (Invisible Capapabilities), achieves all of these goals while also providing reasonable performance and memory usage.

How Hard Can It Be?

Designing a capability model that satisfies these requirements is sufficiently hard that it is an area of active research. Fil-C has now gone through multiple capability systems; InvisiCaps are just the latest. To appreciate the difficulty of this journey, let's review the previous models and why they were abandoned:

InvisiCaps are a major improvement over MonoCaps:

InvisiCaps achieve this without sacrificing thread safety of the capability model.

What Are InvisiCaps Most Like?

InvisiCaps can be thought of as a practical-to-implement and totally thread-safe variant of SoftBound. InvisiCaps are heavily inspired by that work. Unlike SoftBound, InvisiCaps don't allow memory safety escapes by racing on pointers, and they allow for making all features of C and C++ safe, including subtle stuff like function pointer usage. The greatest similarity between SoftBound and InvisiCaps is that they both place the capability metadata "outside" of the address space visible to the program.

InvisiCaps can also be thought of as a software implementation of CHERI. Unlike CHERI, InvisiCaps are more compatible (pointers are 64-bit, not 128-bit or 256-bit) and InvisiCaps have a more deterministic use-after-free story.

The Intuition Of InvisiCaps

Let's start with a simple intuition for how InvisiCaps work, without worrying about how a pointer and its capability are stored in memory. Let's just consider:

For our example, let's consider a pointer that points into the middle of an object.

Layout of a Fil-C object and flight pointer

The flight pointer has two parts:

We refer to the two parts of the flight ptr as the lower and the intval for short (though be warned, the Fil-C source code usually refers to the intval as the "ptr").

The lower cannot be modified by the C program; you get a lower when you allocate an object and then any pointer derived from the pointer that came out of the allocator has that lower. The lower is trusted by the Fil-C runtime, since it's used for bounds checks (both directly, for the lower bounds check, and indirectly, for all checks that rely on the object header just below where the lower points).

The intval can be modified by the C program and isn't trusted by the Fil-C runtime at all.

Every memory access involves an untrusted intval, indicating the location that the program wishes to access, and a trusted lower, indicating what addresses this particular pointer can access (and how).

As a bonus feature, the lower can be NULL. Pointers with NULL lower cannot be accessed (the access always traps with an error saying you have a NULL lower).

If pointers could only ever live in local variables and could not be stored to the heap, then that would be the whole story!

InvisiCaps At Rest

When a pointer is stored to the heap, we call this a pointer at rest. Like a pointer in flight, a pointer at rest must know its lower and intval. InvisiCaps for pointers at rest achieve the following goals:

The key insight behind how this is achieved is the InvisiCaps inductive hypothesis:

Every flight pointer's lower points to the top of an object header whose aux word contains a way to get the lowers for all pointers stored to that object's payload.

The aux word of an object that has no pointers is simply NULL. But if an object had any pointers stored into it, an aux allocation is allocated, which has the same size as the object's payload. The aux word then points at the aux allocation. If we ignore atomic pointers for now, all pointers at rest in the object have their lower in the aux allocation and their intval in the object payload.

Layout of a Fil-C object and flight pointer

In this example, let's consider two objects. Object #1 has a pointer at rest in its payload, which points at Object #2. So, its aux word points at an aux alloation that contains that pointer's lower. The pointer's intval is in the object's payload. Note that the flight pointer in our example points at the pointer at rest in object #1.

Hence, all nonatomic pointer loads and stores access both the aux allocation (via the aux word, which is just below where lower points) and the object payload. Stores may cause the aux allocation to be lazily created.

Note that this is approach is very friendly to garbage collection:

Atomic InvisiCaps

But what if a pointer at rest is _Atomic, volatile, std::atomic, or was stored using any other type or mechanism that clang considers to require atomicity?

Layout of a Fil-C object and flight pointer

The aux allocation can contain either lowers or atomic box pointers. The type of entry in the aux allocation is determined by the low bit of that entry (zero indicates lower, one indicates atomic box pointer). Atomic boxes are 16-byte, 16-byte-aligned allocations that contain a flight pointer that we store using 128-bit atomics. Or, if the system does not support double-CAS, we could use 64-bit atomics on the atomic box pointer itself and allocate a new atomic box every time an atomic mutation happens (luckily both X86_64 and ARM64 have 128-bit atomics, so that's not necessary).

Note that when a pointer at rest is atomic, the payload contains a copy of the intval, purely for allowing integer loads to see the intval. However, racing an integer access against a pointer access may result in time travel, but cannot result in the atomic box being corrupted -- it'll just be a logic error.

This approach allows Fil-C to support atomic InvisiCaps when the program wants atomicity, while using a cheaper approach to pointers when atomicity is not requested.

Additional Considerations

The aux word uses only its lower 48 bits for storing the aux allocation pointer; the upper 16 bits are used for flags and additional data. That additional data is helpful for nonstandard object allocations as well as special objects like functions, threads, and other builtin types provided by the Fil-C runtime. Let's consider some examples.

Aligned Allocations

If an object is allocated using posix_memalign or similar API and a sufficiently large alignment is requested, then the the lower minus the object header size points inside an aligned allocation. So, the GC can't work out what the base of the allocation is just simply by subtracting the object header size from the lower. The extra flags in the aux word have enough space to encode the object alignment, which can be used to work out the true base of the allocation.

Memory Mapping

mmap and Sys-V shared memory require special treatment. The GC knows how to manage even those allocations, but special care must be taken with them. The aux word has flags that are set if an object requires such special care.

Function Pointers

Function pointers have an intval that points at the actual function entrypoint, while the lower points to a function capability that:

Function pointer casts are handled dynamically (the Fil-C calling convention dynamically resolves mismatches in types passed by the caller and types expected by the callee).

Threads

Fil-C's internal thread abstraction, called zthread, is used internally by musl's pthread implementation. Because the zthread contains a lot of internal runtime state, it needs to be specially protected. A pointer to a zthread - or to any of the other builtin types - has the following structure:

All builtin functions that take a zthread pointer check that the pointer they are given is really a zthread pointer by looking at the aux word's flags.

Freed Objects

Freeing an object results in deterministic panics when accessing the freed memory. This is simple to achieve:

Conclusion

Fil-C uses a capability model for pointers to ensure memory safety. To ensure maximum compatibility, Fil-C's InvisiCap capability model gives the illusion that pointers are just 64-bit on 64-bit systems and allows flexible reinterpretation of memory type (including violations of the active member union rule) while preserving the soundness of the capability model itself. No matter how evil the program is, it will either get a memory safe outcome (all pointer accesses are in bounds of a capability that the pointer legitimately had) or a Fil-C panic.

Additional reading: