Foundations
- Constant items can be completely computed at compile time, and any code that refers to them is replaced with the constant’s computed value during compilation.
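A small sketch of constants being evaluated at compile time. The names `KIB`, `BUFFER_SIZE`, `square`, `AREA` and `make_buffer` are made up for illustration. A `const fn` called in a const context is also evaluated by the compiler, and a constant can be used where a compile-time value is required, such as an array length.

```rust
/// Computed at compile time; every use is replaced by the value itself.
const KIB: usize = 1024;
const BUFFER_SIZE: usize = 4 * KIB;

/// A `const fn` can be evaluated at compile time when used in a const context.
const fn square(x: u32) -> u32 {
    x * x
}

const AREA: u32 = square(12);

/// Array lengths must be known at compile time, so a constant works here.
fn make_buffer() -> [u8; BUFFER_SIZE] {
    [0; BUFFER_SIZE]
}
```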
Types
Alignment
- All values, no matter their type, must start at a byte boundary. We say that all values must be at least byte-aligned.
- In the CPU and the memory system, memory is often accessed in blocks larger than a single byte. For example, on a 64-bit CPU, most values are accessed in chunks of 8 bytes (64 bits), with each operation starting at an 8-byte-aligned address. This chunk size is referred to as the CPU’s word size.
- Operations on data that is not aligned are referred to as misaligned accesses and can lead to poor performance and bad concurrency problems. For this reason, many CPU operations require, or strongly prefer, that their arguments are naturally aligned. A naturally aligned value is one whose alignment matches its size. So, for example, for an 8-byte load, the provided address would need to be 8-byte-aligned.
- Built-in values are usually aligned to their size: a u8 is byte-aligned, a u16 is 2-byte-aligned, a u32 is 4-byte-aligned, and a u64 is 8-byte-aligned. Complex types (types that contain other types) are typically assigned the largest alignment of any type they contain. For example, a type that contains a u8 and a u32 will be 4-byte-aligned because of the u32.
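The alignment rules above can be checked with `std::mem::align_of` and `size_of`. `Mixed` is a hypothetical type; u64 is left out because its alignment is 8 on typical 64-bit targets but can be smaller on some 32-bit ones.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical compound type mixing a 1-byte and a 4-byte field.
#[allow(dead_code)]
struct Mixed {
    flag: u8,
    count: u32,
}

/// (name, size in bytes, alignment in bytes) for a few types.
fn report() -> Vec<(&'static str, usize, usize)> {
    vec![
        ("u8", size_of::<u8>(), align_of::<u8>()),
        ("u16", size_of::<u16>(), align_of::<u16>()),
        ("u32", size_of::<u32>(), align_of::<u32>()),
        // Mixed takes the u32's alignment, so its size is padded from 5 to 8 bytes.
        ("Mixed", size_of::<Mixed>(), align_of::<Mixed>()),
    ]
}
```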
Wide Pointers
- Wide (or fat) pointers are two words wide. For example, slice references are wide pointers: one word holds the location of the data in memory and the second holds its length.
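A quick way to see wide pointers is to measure pointer types in machine words. `Shape` and `words` are hypothetical helpers; trait objects are included because `&dyn Trait` is also a wide pointer (data pointer plus vtable pointer).

```rust
use std::mem::size_of;

#[allow(dead_code)]
trait Shape {
    fn area(&self) -> f64;
}

/// Size of `T` measured in machine words (usize).
fn words<T>() -> usize {
    size_of::<T>() / size_of::<usize>()
}

fn pointer_widths() -> [usize; 4] {
    [
        words::<&u8>(),        // thin pointer: just an address
        words::<&[u8]>(),      // address + number of elements
        words::<&str>(),       // address + length in bytes
        words::<&dyn Shape>(), // address + vtable pointer
    ]
}
```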
The orphan rule
- You can implement a trait for a type only if the trait or the type (or both) is declared in your crate.
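A sketch of what the orphan rule allows, with hypothetical names `Meters` and `Describe` standing in for items of "our" crate. The last impl is commented out because the compiler rejects it.

```rust
use std::fmt;

// Local type and local trait, both defined in this crate.
struct Meters(f64);

trait Describe {
    fn describe(&self) -> String;
}

// Allowed: foreign trait (fmt::Display) for a local type.
impl fmt::Display for Meters {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} m", self.0)
    }
}

// Allowed: local trait for a foreign type.
impl Describe for Vec<i32> {
    fn describe(&self) -> String {
        format!("{} integers", self.len())
    }
}

// Rejected by the orphan rule: both the trait and the type are foreign.
// impl fmt::Display for Vec<i32> { ... }
```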
Designing Interfaces
Traits
Auto-traits
- Auto-traits (such as Send and Sync) are implemented by the compiler automatically whenever all of a type’s fields implement them. The flip side is that if you add a field that is, say, !Sync, your type silently becomes !Sync too. So if you are building a library, you may ship breaking changes to your users without noticing.
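One common guard against accidentally losing an auto-trait is a function that only compiles if the type is still Send + Sync. `Config`, `assert_send_sync` and `config_is_send_sync` are hypothetical names for this sketch.

```rust
use std::sync::Arc;

/// Hypothetical public library type. All fields are Send + Sync today,
/// so the compiler implements both auto-traits for it automatically.
pub struct Config {
    pub name: String,
    pub shared: Arc<Vec<u8>>,
}

/// Compiles only if `T` is both Send and Sync.
fn assert_send_sync<T: Send + Sync>() {}

/// Compile-time guard: if a later change adds a field such as `std::rc::Rc<u8>`,
/// `Config` stops being Send + Sync and this function no longer compiles,
/// surfacing the breaking change before users hit it.
#[allow(dead_code)]
fn config_is_send_sync() {
    assert_send_sync::<Config>();
}
```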
Object-safe (&dyn Trait)
- If the trait is object-safe, users can treat different types that implement your trait as a single common type using dyn Trait.
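- To be object-safe, none of a trait’s methods can be generic or use the Self type (other than as the receiver, e.g. &self). Furthermore, the trait cannot have any static methods (methods without a self receiver). A method that breaks these rules can still be excluded from the trait object with a where Self: Sized bound, keeping the rest of the trait object-safe.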
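A sketch of an object-safe trait used through `dyn`. `Shape`, `Square`, `Circle` and `total_area` are hypothetical; the `unit` constructor would normally make the trait non-object-safe, but `where Self: Sized` keeps it out of the trait object.

```rust
trait Shape {
    fn area(&self) -> f64;

    // Returns Self and has no receiver, so it is excluded from `dyn Shape`
    // via `where Self: Sized`; the trait stays object-safe.
    fn unit() -> Self
    where
        Self: Sized;
}

struct Square(f64);
struct Circle(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
    fn unit() -> Self {
        Square(1.0)
    }
}

impl Shape for Circle {
    fn area(&self) -> f64 {
        std::f64::consts::PI * self.0 * self.0
    }
    fn unit() -> Self {
        Circle(1.0)
    }
}

/// Different concrete types handled through one common type, `dyn Shape`.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}
```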
Common traits for types
- Debug, Clone, Default, PartialEq, PartialOrd, Hash, Eq and Ord.
- Behind an optional Cargo feature: serde::{Serialize, Deserialize}
- Error types should implement std::error::Error.
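A sketch of a public type deriving the common traits, plus an error type implementing `std::error::Error`. `Version`, `ParseVersionError` and `parse_version` are hypothetical, and the serde derives assume a Cargo feature named `serde` (they are compiled out when it is not enabled).

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug, Clone, Default, PartialEq, Eq, PartialOrd, Ord, Hash)]
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
pub struct Version {
    pub major: u32,
    pub minor: u32,
}

/// std::error::Error requires Debug and Display.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ParseVersionError {
    MissingDot,
    BadNumber,
}

impl fmt::Display for ParseVersionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseVersionError::MissingDot => write!(f, "expected MAJOR.MINOR"),
            ParseVersionError::BadNumber => write!(f, "version component is not a number"),
        }
    }
}

impl Error for ParseVersionError {}

pub fn parse_version(s: &str) -> Result<Version, ParseVersionError> {
    let (major, minor) = s.split_once('.').ok_or(ParseVersionError::MissingDot)?;
    let parse = |p: &str| p.parse::<u32>().map_err(|_| ParseVersionError::BadNumber);
    Ok(Version { major: parse(major)?, minor: parse(minor)? })
}
```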
Wrapper types
- Deref lets a value of type T expose the methods of some type U: if T implements Deref<Target = U>, you can call U’s methods directly on a T. For example, Box<MyType> allows us to call methods of MyType directly.
- AsRef allows users to easily use a &WrapperType as an &InnerType.
- Borrow is tailored for a much narrower use case: allowing the caller to supply any one of multiple essentially identical variants of the same type. It could, perhaps, have been called Equivalent instead. For example, for a HashSet<String>, Borrow allows the caller to supply either a &str or a &String when looking up a value. While the same could have been achieved with AsRef, that would not be safe without Borrow’s additional requirement that the target type implements Hash, Eq and Ord exactly the same way as the original type.
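A sketch of the three wrapper-related traits side by side. `Username`, `shout` and `has_user` are hypothetical; the `HashSet::contains` lookup relies on the standard library's `String: Borrow<str>` impl.

```rust
use std::collections::HashSet;
use std::ops::Deref;

/// Hypothetical wrapper around a String (e.g. to enforce some invariant).
pub struct Username(String);

// Deref: String (and, transitively, str) methods can be called directly on a Username.
impl Deref for Username {
    type Target = String;
    fn deref(&self) -> &String {
        &self.0
    }
}

// AsRef: a &Username can be passed wherever an `AsRef<str>` is accepted.
impl AsRef<str> for Username {
    fn as_ref(&self) -> &str {
        &self.0
    }
}

fn shout<S: AsRef<str>>(s: S) -> String {
    s.as_ref().to_uppercase()
}

// Borrow: HashSet<String>::contains accepts any &Q where String: Borrow<Q>,
// so a &str key works without allocating a String.
fn has_user(users: &HashSet<String>, name: &str) -> bool {
    users.contains(name)
}
```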
Borrowed vs Owned
- If the code you write needs ownership of the data, it should generally also make the caller provide owned data, rather than taking values by reference and cloning them.
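A sketch contrasting the two signatures, using a hypothetical `User` type. Taking a `String` lets a caller that already owns one move it in without any copy.

```rust
/// Hypothetical type that needs to own the name it stores.
pub struct User {
    name: String,
}

impl User {
    // Less flexible: always allocates a copy, even when the caller had a
    // String it no longer needed.
    pub fn with_name_cloning(name: &str) -> Self {
        User { name: name.to_string() }
    }

    // Preferred: ask for ownership up front. A caller with a String moves it
    // in for free; a caller with a &str decides when to pay for to_string().
    pub fn new(name: String) -> Self {
        User { name }
    }

    pub fn name(&self) -> &str {
        &self.name
    }
}
```

Moving a `String` never reallocates, so in the test below the stored name points at the same heap buffer the caller created.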
Fallible and Blocking Destructors
- See page 46 on how to create explicit destructors (since Drop cannot return anything, any error during cleanup is swallowed, and some users may want to handle such errors more carefully).
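A minimal sketch of an explicit destructor, assuming a hypothetical `Buffered` writer that must flush on teardown. `close` consumes `self` and reports errors to the caller; `Drop` stays as a best-effort fallback that has to ignore errors.

```rust
use std::io::{self, Write};

/// Hypothetical wrapper that buffers bytes and must flush them on teardown.
pub struct Buffered<W: Write> {
    inner: Option<W>,
    buf: Vec<u8>,
}

impl<W: Write> Buffered<W> {
    pub fn new(inner: W) -> Self {
        Buffered { inner: Some(inner), buf: Vec::new() }
    }

    pub fn push(&mut self, bytes: &[u8]) {
        self.buf.extend_from_slice(bytes);
    }

    fn flush_buf(&mut self) -> io::Result<()> {
        if let Some(w) = self.inner.as_mut() {
            w.write_all(&self.buf)?;
            w.flush()?;
            self.buf.clear();
        }
        Ok(())
    }

    /// Explicit destructor: consumes self, surfaces any error, returns the writer.
    pub fn close(mut self) -> io::Result<W> {
        self.flush_buf()?;
        // Taking the writer out leaves None, so the Drop fallback does nothing.
        Ok(self.inner.take().expect("writer is present until close or drop"))
    }
}

impl<W: Write> Drop for Buffered<W> {
    fn drop(&mut self) {
        // Best effort only: Drop cannot return the error, so it is discarded.
        let _ = self.flush_buf();
    }
}
```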
Type system
- We can use the type system to guide users: if some methods should only be available in certain states, we can encode the state as a zero-sized type used as a generic parameter:
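use std::marker::PhantomData;
struct Grounded;
struct Launched;
#[derive(Debug, PartialEq)]
enum Color { White } // placeholder type, just so color() compiles
struct Rocket<Stage = Grounded> {
    // PhantomData is zero-sized: Stage exists only at the type level
    // and is eliminated at compile time.
    stage: PhantomData<Stage>,
}
impl Default for Rocket<Grounded> {
    fn default() -> Self {
        Rocket { stage: PhantomData }
    }
}
impl Rocket<Grounded> {
    // Consumes the grounded rocket; only a launched rocket comes back.
    pub fn launch(self) -> Rocket<Launched> {
        Rocket { stage: PhantomData }
    }
}
impl Rocket<Launched> {
    // Only available once the rocket has launched.
    pub fn accelerate(&mut self) {}
}
// Everything that is stage-independent and should always be available
// goes here.
impl<Stage> Rocket<Stage> {
    pub fn color(&self) -> Color {
        Color::White
    }
}
// and so on...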
Tests
Fuzzers
- Fuzzers keep running your code with random inputs until it panics. A good tool for this is cargo-fuzz. Example:
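#![no_main]
use libfuzzer_sys::fuzz_target;
fuzz_target!(|data: &[u8]| {
    // Only valid UTF-8 can be a URL string; other inputs are skipped.
    if let Ok(s) = std::str::from_utf8(data) {
        // The result is ignored on purpose: we only care whether parsing panics.
        let _ = url::Url::parse(s);
    }
});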
- The fuzzer will generate semi-random inputs to the closure. Notice that the code here doesn’t check whether the parsing succeeds or fails. Instead, it’s looking for cases where the parser panics or otherwise crashes because an internal invariant was violated.
Property-based testing
- A good library for it is proptest.
- For times when you want to check not only that your program doesn’t crash but also that it does what it’s expected to do.
- It’s the same idea as fuzzing, but you normally also give the input to a naive implementation that you know is correct and compare its result with your production code. If they diverge, you have a bug.
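A dependency-free sketch of the "compare against a naive implementation" idea. proptest would normally generate the inputs (and shrink failing cases); here a tiny hand-rolled xorshift generator is swapped in instead. Kadane's algorithm plays the production code and a brute-force loop plays the reference; all function names are hypothetical.

```rust
/// Naive reference: obviously correct, O(n^2). Assumes non-empty input.
fn max_subarray_naive(xs: &[i64]) -> i64 {
    let mut best = i64::MIN;
    for i in 0..xs.len() {
        let mut sum = 0;
        for x in &xs[i..] {
            sum += x;
            best = best.max(sum);
        }
    }
    best
}

/// "Production" version (Kadane's algorithm), O(n). Assumes non-empty input.
fn max_subarray_fast(xs: &[i64]) -> i64 {
    let mut best = xs[0];
    let mut cur = xs[0];
    for &x in &xs[1..] {
        cur = x.max(cur + x);
        best = best.max(cur);
    }
    best
}

/// Minimal xorshift PRNG so the sketch needs no external crates (seed must be non-zero).
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Checks the property "fast == naive" on `cases` random non-empty inputs
/// and returns the first counterexample, if any.
fn check_property(seed: u64, cases: usize) -> Option<Vec<i64>> {
    let mut rng = XorShift(seed);
    for _ in 0..cases {
        let len = 1 + (rng.next() % 20) as usize;
        let xs: Vec<i64> = (0..len).map(|_| (rng.next() % 201) as i64 - 100).collect();
        if max_subarray_fast(&xs) != max_subarray_naive(&xs) {
            return Some(xs);
        }
    }
    None
}
```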
Macros
Declarative Macros
- Declarative macros are those defined using the macro_rules! syntax. They are useful when you find yourself writing the same code over and over.
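A sketch of removing repetition with `macro_rules!`: one invocation generates the same trait impl for several types. `Describe` and `impl_describe!` are hypothetical names.

```rust
trait Describe {
    fn describe(&self) -> String;
}

// For each type passed in, generate an identical impl of Describe.
macro_rules! impl_describe {
    ($($t:ty),* $(,)?) => {
        $(
            impl Describe for $t {
                fn describe(&self) -> String {
                    // stringify! turns the type tokens into a string literal.
                    format!("{}: {}", stringify!($t), self)
                }
            }
        )*
    };
}

impl_describe!(u8, i32, f64);
```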
Procedural Macros
- You define how to generate code given some input tokens rather than just writing what code gets generated.
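- There are three types of procedural macros: (1) function-like macros, which are invoked like macro_rules! macros, as name!(...) (note that println! itself is a declarative macro, not a procedural one); (2) attribute macros, like #[test]; and (3) derive macros, like #[derive(Serialize)].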
Asynchronous Programming
- In Rust, an async interface is a method that returns a Poll:
enum Poll<T> {
Ready(T),
Pending,
}
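A minimal sketch of returning `Poll` by hand, with no external runtime. `YieldOnce`, `noop_raw_waker` and `block_on` are hypothetical names; the busy-polling executor with a do-nothing waker is only meant for illustration, since real executors park until a waker fires.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Returns Poll::Pending on its first poll and Poll::Ready(value) on the next.
struct YieldOnce {
    polled: bool,
    value: u32,
}

impl Future for YieldOnce {
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        if self.polled {
            Poll::Ready(self.value)
        } else {
            self.polled = true;
            // Ask to be polled again. A real future would store the waker
            // and call it once the event it waits for has happened.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// A waker whose operations do nothing.
fn noop_raw_waker() -> RawWaker {
    fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    RawWaker::new(std::ptr::null(), &VTABLE)
}

/// Polls `fut` until it is Ready; returns the output and the number of polls.
fn block_on<F: Future>(fut: F) -> (F::Output, usize) {
    // SAFETY: every vtable function ignores the (null) data pointer.
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return (out, polls);
        }
    }
}
```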
Pin
- It is sometimes useful to have objects that are guaranteed not to move, in the sense that their placement in memory does not change, and can thus be relied upon. A prime example of such a scenario would be building self-referential structs, as moving an object with pointers to itself will invalidate them, which could cause undefined behavior. (From the std::pin docs.)
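A sketch of a self-referential struct kept in place with `Pin`, modeled loosely on the example in the std::pin docs. `SelfRef` and its methods are hypothetical; `PhantomPinned` makes the type `!Unpin`, so safe code cannot move it out of the pin.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

/// `ptr` points at `data` inside the same struct.
struct SelfRef {
    data: String,
    ptr: *const String,
    _pin: PhantomPinned, // opts out of Unpin
}

impl SelfRef {
    fn new(data: &str) -> Pin<Box<SelfRef>> {
        let mut boxed = Box::pin(SelfRef {
            data: data.to_string(),
            ptr: std::ptr::null(),
            _pin: PhantomPinned,
        });
        let ptr: *const String = &boxed.data;
        // SAFETY: we only write a field; the value is never moved out of the pin.
        unsafe {
            boxed.as_mut().get_unchecked_mut().ptr = ptr;
        }
        boxed
    }

    fn data_via_ptr<'a>(self: Pin<&'a Self>) -> &'a str {
        // SAFETY: the value is pinned, so `data` has not moved since `ptr` was set.
        unsafe { (&*self.ptr).as_str() }
    }
}
```

Moving the `Pin<Box<SelfRef>>` only moves the box pointer; the heap value (and therefore the internal pointer) stays valid.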
Wakers
- A Waker is a handle for waking up a task by notifying its executor that it is ready to be run. (From the std::task::Waker docs.)
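A sketch of building a `Waker` from the standard `std::task::Wake` trait. `CountingWaker` and `make_waker` are hypothetical; instead of rescheduling a task, this waker just counts how often it was asked to.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Wake, Waker};

/// Counts wake-ups instead of notifying a real executor.
struct CountingWaker {
    wakes: AtomicUsize,
}

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.wake_by_ref();
    }
    fn wake_by_ref(self: &Arc<Self>) {
        self.wakes.fetch_add(1, Ordering::SeqCst);
    }
}

fn make_waker() -> (Arc<CountingWaker>, Waker) {
    let counter = Arc::new(CountingWaker { wakes: AtomicUsize::new(0) });
    let waker = Waker::from(counter.clone());
    (counter, waker)
}
```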
Blocking in async code
- In general, you should be very careful with executing compute-intensive operations or calling functions that could block in an asynchronous context. Such operations should either be converted to async where possible or executed on dedicated threads.
- A rule of thumb is: no future should be able to run for more than 1 ms without returning Poll::Pending
Unsafe Rust
Terms
- Invariants: something that must be true for your program to be correct
- unsafe fn signals to callers that they must uphold the function’s invariants (and be careful) when invoking it.
- unsafe {} allows you to perform unsafe operations inside the block.
- unsafe fn also allows unsafe operations anywhere in the function body, but this was a mistake and is being changed: the unsafe_op_in_unsafe_fn lint (warn-by-default in the 2024 edition) asks for explicit unsafe blocks instead.
- Unsafe traits are not unsafe to use, but unsafe to implement.
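A sketch of the three forms together. `get_unchecked_u32`, `KnownLen` and `last_of` are hypothetical; the safe wrapper establishes the invariant that the `unsafe fn` documents, and the unsafe trait's contract is what the wrapper relies on.

```rust
/// Returns `xs[idx]` without a bounds check.
///
/// # Safety
/// The caller must guarantee that `idx < xs.len()`.
pub unsafe fn get_unchecked_u32(xs: &[u32], idx: usize) -> u32 {
    // Explicit unsafe block even inside an unsafe fn, as unsafe_op_in_unsafe_fn asks.
    unsafe { *xs.get_unchecked(idx) }
}

/// Implementors promise that `known_len` is exactly the number of elements.
/// Calling it is safe; only writing an impl requires `unsafe`.
pub unsafe trait KnownLen {
    fn known_len(&self) -> usize;
}

// SAFETY: Vec::len is always the number of initialized elements.
unsafe impl<T> KnownLen for Vec<T> {
    fn known_len(&self) -> usize {
        self.len()
    }
}

/// Safe wrapper: checks the invariant the unsafe fn needs before calling it.
pub fn last_of(xs: &Vec<u32>) -> Option<u32> {
    let n = xs.known_len();
    if n == 0 {
        return None;
    }
    // SAFETY: n - 1 < n == xs.len(), relying on KnownLen's contract.
    Some(unsafe { get_unchecked_u32(xs, n - 1) })
}
```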
Interfaces
- MaybeUninit is a mechanism for working with values that aren’t (yet) valid.
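A sketch of initializing values through `MaybeUninit`, first element by element for an array and then for a single value. `squares` and `init_later` are hypothetical names.

```rust
use std::mem::MaybeUninit;

/// Builds an array element by element without filling it with dummy values first.
fn squares() -> [u64; 4] {
    // Uninitialized slots: reading any of them before writing would be UB.
    let mut slots: [MaybeUninit<u64>; 4] = [MaybeUninit::uninit(); 4];
    for (i, slot) in slots.iter_mut().enumerate() {
        let i = i as u64;
        slot.write(i * i);
    }
    // SAFETY: every slot was written above, and MaybeUninit<u64> has the
    // same layout as u64, so the two array types have the same layout.
    unsafe { std::mem::transmute::<[MaybeUninit<u64>; 4], [u64; 4]>(slots) }
}

/// The single-value version of the same idea.
fn init_later() -> u32 {
    let mut x = MaybeUninit::<u32>::uninit();
    x.write(7);
    // SAFETY: `x` was initialized by the write above.
    unsafe { x.assume_init() }
}
```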
Concurrency (And Parallelism)
Tips / Terms
- Mutual Exclusion is the most obvious barrier to parallel speedup.
- Shared Resource Exhaustion: the kernel can handle only so many sends on a given TCP socket per second, the memory bus can do only so many reads at once and so on.
- False Sharing occurs when two operations that shouldn’t contend with one another contend anyway (for example, because they touch different variables that share a cache line), preventing efficient simultaneous execution.
- Worker Pool model: many identical threads receive jobs from a shared job queue, which they can execute entirely independently.
- Work Stealing is a key feature of most worker pools. The basic premise is that if one thread finishes its work early, and there’s no more unassigned work available, that thread can steal jobs that have already been assigned to a different worker thread but haven’t been started yet.
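A minimal std-only sketch of a worker pool with work stealing, where each "job" is just a number to add. Each worker owns a `Mutex<VecDeque>`, takes from the front of its own queue, and steals from the back of another's when it runs dry. `work_stealing_sum` is a hypothetical name, and the mutex-per-deque design is a simplification: production schedulers typically use lock-free deques.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;
use std::thread;

/// Returns (sum of all jobs, number of steals).
fn work_stealing_sum(queues: Vec<Vec<u64>>) -> (u64, u64) {
    let deques: Vec<Mutex<VecDeque<u64>>> =
        queues.into_iter().map(|q| Mutex::new(q.into())).collect();
    let total = AtomicU64::new(0);
    let steals = AtomicU64::new(0);

    thread::scope(|s| {
        for me in 0..deques.len() {
            let (deques, total, steals) = (&deques, &total, &steals);
            s.spawn(move || loop {
                // Own queue first; the lock is released at the end of this statement.
                let own = deques[me].lock().unwrap().pop_front();
                let job = match own {
                    Some(job) => job,
                    None => {
                        // Own queue empty: try the back of every other worker's queue.
                        let stolen = (0..deques.len())
                            .filter(|&other| other != me)
                            .find_map(|other| deques[other].lock().unwrap().pop_back());
                        match stolen {
                            Some(job) => {
                                steals.fetch_add(1, Ordering::Relaxed);
                                job
                            }
                            // No jobs anywhere, and none are added later: done.
                            None => break,
                        }
                    }
                };
                total.fetch_add(job, Ordering::Relaxed);
            });
        }
    });

    (total.into_inner(), steals.into_inner())
}
```

How many steals happen depends on thread scheduling, so only the total is deterministic.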
Ordering
The CPU and the compiler can reorder instructions to optimize execution. As long as two instructions don’t depend on each other (e.g. the output of one is used as the input of the other), the reordered program behaves the same from the point of view of a single thread.
The issue is that with multiple threads, we may reason about the result assuming the instructions run in source order, while another thread can actually observe them in a different order.
In Rust, we control which reorderings are allowed around atomic operations using the std::sync::atomic::Ordering enum.
- Relaxed ordering essentially guarantees nothing about concurrent access to the value beyond the fact that the access is atomic.
- Acquire/Release ordering is used, for example, to make sure that any loads and stores a thread performs while it holds a mutual exclusion lock are visible to any thread that takes the lock afterward. Acquire can be applied only to loads, Release only to stores, and AcqRel to operations that both load and store.
- SeqCst (sequentially consistent ordering) is the strongest memory ordering. It requires not only that each thread sees results consistent with Acquire/Release, but also that all threads see the same ordering as one another.
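A sketch of the classic Release/Acquire "message passing" pattern. `publish_and_read` is a hypothetical name. The writer stores the data with Relaxed and then sets a flag with Release; once the reader sees the flag via an Acquire load, it is guaranteed to see the data too.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::thread;

fn publish_and_read() -> u64 {
    let data = AtomicU64::new(0);
    let ready = AtomicBool::new(false);
    thread::scope(|s| {
        s.spawn(|| {
            data.store(42, Ordering::Relaxed);
            // Release: writes before this store become visible to any thread
            // whose Acquire load observes `true`.
            ready.store(true, Ordering::Release);
        });
        let reader = s.spawn(|| {
            // Acquire pairs with the Release store above.
            while !ready.load(Ordering::Acquire) {
                std::hint::spin_loop();
            }
            // Guaranteed to observe 42, even though this load is Relaxed.
            data.load(Ordering::Relaxed)
        });
        reader.join().unwrap()
    })
}
```

If both flag operations were Relaxed, the reader could in principle see `ready == true` and still read `0`.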
Methods
- compare_exchange: you provide the last value you observed for an atomic variable and the new value you want to replace it with. It replaces the value only if it is still the same as when you last observed it; otherwise it returns an error containing the current value. It’s used in most synchronization constructs.
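A minimal spinlock sketch built on `compare_exchange`. `SpinLock` and `with_lock` are hypothetical; it is not fair, has no poisoning, and a panic inside `f` leaves it locked, so it only illustrates the technique.

```rust
use std::cell::UnsafeCell;
use std::hint;
use std::sync::atomic::{AtomicBool, Ordering};

pub struct SpinLock<T> {
    locked: AtomicBool,
    value: UnsafeCell<T>,
}

// SAFETY: access to `value` is serialized by `locked`.
unsafe impl<T: Send> Sync for SpinLock<T> {}

impl<T> SpinLock<T> {
    pub fn new(value: T) -> Self {
        SpinLock { locked: AtomicBool::new(false), value: UnsafeCell::new(value) }
    }

    pub fn with_lock<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        // "I last saw `false` (unlocked); replace it with `true`." If another
        // thread got there first, the exchange fails and we retry.
        while self
            .locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            hint::spin_loop();
        }
        // SAFETY: we hold the lock, so no other thread accesses `value`.
        let result = f(unsafe { &mut *self.value.get() });
        // Release publishes our writes to the next thread that takes the lock.
        self.locked.store(false, Ordering::Release);
        result
    }
}
```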
Tests
- https://docs.rs/loom/latest/loom/ - Loom is a tool for testing concurrent programs. At a high level, it runs tests many times, permuting the possible concurrent executions of each test according to what constitutes valid executions under the C11 memory model. It then uses state reduction techniques to avoid combinatorial explosion of the number of possible executions.
FFI
Compiler crash course
Compilers are split into different components and have three high-level phases: compilation, code generation and linking.
The first phase deals with type checking, borrow checking, monomorphization and other features we associate with a given programming language. It generates a low-level representation of the code. The code generation phase takes that representation and generates machine code that can actually run on a CPU.
In the third phase, the linker uses symbols to link together the separately compiled pieces produced by the previous step into a final binary. It normally uses static linking for Rust-generated code and dynamic linking for FFI.
Code
#[no_mangle]
pub static RS_DEBUG: bool = true;
The no_mangle attribute ensures that RS_DEBUG retains that name during compilation rather than having the compiler assign it another symbol name.
This lets code in other languages that is linked against the binary access it.
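The same idea applies to functions: `#[no_mangle]` plus the C calling convention makes a Rust function callable from C. `rs_add` is a hypothetical name, and this assumes the 2021 edition (in the 2024 edition the attribute must be written `#[unsafe(no_mangle)]`).

```rust
/// Exported with an unmangled symbol and the C calling convention, so a C
/// program linked against this library could declare it as:
///     uint32_t rs_add(uint32_t a, uint32_t b);
#[no_mangle]
pub extern "C" fn rs_add(a: u32, b: u32) -> u32 {
    // Wrapping add: overflow wraps around (as in C unsigned arithmetic) instead of panicking.
    a.wrapping_add(b)
}
```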