Design Rationale

This page explains the key design decisions in Corosio and the tradeoffs they represent. Understanding these decisions helps users make informed choices about when to use Corosio and how to extend it.

Coroutine-First Design

Corosio is designed from the ground up for coroutines. Unlike frameworks that adapt callback-based operations for coroutines, every I/O operation in Corosio returns an awaitable.

Why Not Callbacks?

Traditional callback-based frameworks like Boost.Asio use templates extensively:

// Callback-based: templates everywhere
template<class Protocol, class Executor,
         class MutableBufferSequence, class Handler>
void async_read(basic_socket<Protocol, Executor>& s,
                MutableBufferSequence const& buffers,
                Handler&& handler);

This creates several problems:

  • N×M template instantiations for N operations × M executor/handler combinations

  • Binary size growth that can reach megabytes

  • Compile times measured in minutes for moderate codebases

  • Nested move-construction overhead at runtime

The Coroutine Alternative

With Corosio’s coroutine-first approach, the same operation looks like this:

// Coroutine-first: uniform types
capy::task<void> read_data(corosio::socket& s, buffer buf);

This approach provides several benefits:

  • Clean public interfaces: No templates, no allocators, just task

  • Hidden platform types: I/O state lives in translation units

  • Fast compilation: Type erasure at boundaries

  • ABI stability: Platform-specific types never appear in headers
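
For illustration, a call site under this model might look like the sketch below. The operation names read_some and write_some, and their signatures, are assumptions made for the example rather than the documented API; the point is that each operation is simply awaited, with no handler or executor template parameters in sight.

// Hypothetical usage sketch: read_some/write_some are illustrative names
capy::task<void> echo_once(corosio::socket& s, buffer buf)
{
    auto n = co_await s.read_some(buf);  // suspends; no callback object, no allocator
    co_await s.write_some(buf, n);       // resumes on the caller's executor
}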

Affine Awaitable Protocol

The central innovation in Corosio is the affine awaitable protocol, which propagates executor affinity through coroutine chains without embedding executor types in public interfaces.

The Lost Context Problem

Consider this scenario:

capy::task<void> ui_handler()
{
    auto data = co_await fetch();  // Completes on network thread
    update_ui(data);               // Where are we now?
}

When fetch() completes, the coroutine might resume on a different thread than expected. This is the scheduler affinity problem.

The Solution: Forward Propagation

Corosio solves this by passing the dispatcher forward through await_suspend:

template<capy::dispatcher Dispatcher>
auto await_suspend(std::coroutine_handle<> h, Dispatcher const& d)
{
    // Store dispatcher, start I/O
    // When complete, resume via: d(h)
}

The dispatcher flows from caller to callee through await_transform, not through backward queries. When I/O completes, the awaitable resumes the coroutine through the stored dispatcher, guaranteeing it runs on the correct executor.
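
Put together, an affine awaitable might look roughly like the sketch below. The names (read_awaitable, start_io, on_complete) and the use of std::function to hold the resume step are assumptions made to keep the example short; the stored dispatcher pointer stays valid because the dispatcher belongs to the awaiting coroutine’s context.

// Illustrative sketch of forward propagation (hypothetical names)
struct read_awaitable
{
    bool await_ready() const noexcept { return false; }

    template<capy::dispatcher Dispatcher>
    void await_suspend(std::coroutine_handle<> h, Dispatcher const& d)
    {
        // Remember how to resume: through the caller's dispatcher, not directly
        resume_ = [h, pd = &d] { (*pd)(h); };
        start_io();  // platform-specific; defined in a translation unit
    }

    std::size_t await_resume() const noexcept { return bytes_; }

    // Called by the I/O backend when the operation completes
    void on_complete(std::size_t n) { bytes_ = n; resume_(); }

private:
    std::function<void()> resume_;
    std::size_t bytes_ = 0;
    void start_io();
};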

Type Erasure Strategy

Corosio uses type erasure strategically to balance performance against API simplicity.

Where Type Erasure Happens

Component                    | Erasure                 | Rationale
-----------------------------|-------------------------|----------------------------------
Executor at call site        | None                    | Full type preserved for inlining
Executor in coroutine chain  | any_executor const*     | Single pointer, no templates
Buffer sequences             | any_bufref              | One implementation, not N×M
Platform I/O state           | Preallocated in socket  | Hidden from headers entirely

The Encapsulation Tradeoff

We pay one pointer indirection per I/O operation in exchange for translation-unit hiding. This avoids the template tax described above while keeping overhead negligible compared to actual I/O latency:

  • Network RTT: 100,000+ ns

  • Disk access: 10,000+ ns

  • Dispatch overhead: 4–60 ns (depth dependent)

The indirection cost (~1–2 ns) is invisible in I/O-bound workloads.

Platform I/O Hiding

Platform-specific types (OVERLAPPED, io_uring_sqe, file descriptors) do not appear in public headers.

How It Works

Each socket preallocates its operation state:

struct socket
{
    struct state : work
    {
        any_coro h_;
        any_executor const* ex_;
        // OVERLAPPED, HANDLE, etc. — hidden here
    };

    std::unique_ptr<state> op_;  // Allocated once
};

The state structure:

  1. Inherits from work, enabling intrusive queuing

  2. Stores coroutine handle and executor reference for completion

  3. Contains platform-specific members invisible to callers

  4. Is allocated once at socket construction, not per-operation
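
For illustration, suspension might wire that state up roughly as follows. The start_platform_read function and the exact member types are assumptions made for the sketch, not the actual implementation:

// Illustrative only: an awaitable reusing the socket's preallocated state
void start_platform_read(socket::state& st);  // fills OVERLAPPED / io_uring_sqe in the .cpp

struct read_awaitable
{
    socket& s_;

    bool await_ready() const noexcept { return false; }

    void await_suspend(std::coroutine_handle<> h, any_executor const& ex)
    {
        auto& st = *s_.op_;       // state allocated once, at socket construction
        st.h_  = h;               // who to resume on completion
        st.ex_ = &ex;             // where to resume (the caller's executor)
        start_platform_read(st);  // no per-operation allocation
    }

    std::size_t await_resume() const;
};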

Comparison with Frame Embedding

An alternative approach embeds operation state in the coroutine frame:

// Hypothetical: types exposed, state in frame
template<class Socket>
task<size_t> async_read(Socket& s, buffer buf) {
    typename Socket::read_op op{s, buf};  // In frame
    co_return co_await op;
}

This eliminates indirection but exposes platform types in headers. We chose encapsulation for the following reasons:

  • ABI stability across library versions

  • Fast compilation with minimal header parsing

  • Single implementation per operation (not N×M)

  • Clean refactoring by changing one translation unit

Executor Model

Corosio uses the term executor rather than scheduler deliberately.

Why Not Scheduler?

In std::execution, schedulers are designed for heterogeneous computing: selecting GPU vs CPU algorithms, managing completion domains, dispatching to hardware accelerators.

Networking has different needs:

  • Strand serialization for ordering guarantees

  • I/O completion contexts (IOCP, epoll, io_uring)

  • Thread affinity to ensure handlers run on correct threads

By using "executor" we signal this is a distinct concept tailored to networking’s requirements.

Executor Operations

An executor supports three operations:

Operation    | Behavior
-------------|--------------------------------------------------------------------------
dispatch(h)  | Run inline if allowed, else queue. Use when crossing context boundaries.
post(h)      | Always queue. Use when guaranteed asynchrony is required.
defer(h)     | Queue as continuation (optimization hint).

For coroutines, symmetric transfer is preferred when caller and callee share the same executor. The compiler generates tail calls between frames with zero executor involvement.
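
As a concrete illustration of that fast path (standard C++20 symmetric transfer, with hypothetical names), a task’s final awaiter can return the continuation’s handle from await_suspend, and the compiler turns the hand-off into a tail call:

#include <coroutine>

// Sketch: symmetric transfer in a task's final awaiter (hypothetical names)
struct final_awaiter
{
    std::coroutine_handle<> continuation_;  // the coroutine awaiting this task

    bool await_ready() const noexcept { return false; }

    std::coroutine_handle<>
    await_suspend(std::coroutine_handle<>) const noexcept
    {
        // Same-executor case: transfer control directly, no dispatch or post
        if (continuation_)
            return continuation_;
        return std::noop_coroutine();
    }

    void await_resume() const noexcept {}
};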

Allocation Strategy

With proper recycling, both callbacks and coroutines achieve zero steady-state allocations.

Frame Pooling

Coroutine frames are pooled per-I/O-object:

// In socket's frame allocator
void* allocate(std::size_t n)
{
    // 1. Try thread-local free list
    // 2. Try global pool (mutex-protected)
    // 3. Fall back to heap
}

After the first iteration, frames are recycled without syscalls.
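
A minimal, single-threaded sketch of the recycling idea is shown below; it collapses the thread-local and global tiers above into one free list, and frame_pool is a hypothetical type used only to make the steady-state behavior concrete, not Corosio’s allocator. A coroutine promise can route its frames through such a pool via the usual operator new/operator delete customization point.

#include <cstddef>
#include <new>
#include <vector>

// Hypothetical single-size frame pool: heap-allocates once, then recycles
struct frame_pool
{
    std::vector<void*> free_;   // recycled frames, all of size size_
    std::size_t size_ = 0;

    void* allocate(std::size_t n)
    {
        if (n == size_ && !free_.empty()) {  // steady state: reuse a frame
            void* p = free_.back();
            free_.pop_back();
            return p;
        }
        if (size_ == 0)
            size_ = n;                       // first allocation fixes the size
        return ::operator new(n);            // cold path: hit the heap
    }

    void deallocate(void* p, std::size_t n)
    {
        if (n == size_)
            free_.push_back(p);              // recycle for the next operation
        else
            ::operator delete(p);
    }
};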

Why Per-Object Pools?

  • Locality: Operations on the same socket produce similar frame sizes

  • Lifetime: Socket outlives its operations; allocator follows naturally

  • Cache efficiency: Thread-local lists avoid cross-thread coordination

Comparison with std::execution

Corosio diverges significantly from std::execution (P2300).

Different Design Drivers

Aspect                   | std::execution            | Corosio
-------------------------|---------------------------|----------------------------------------------
Primary use case         | GPU/parallel algorithms   | Networking/I/O
Context flow             | Backward queries          | Forward propagation
Algorithm customization  | Domain transforms         | Not needed (one implementation per platform)
Type exposure            | connect_result_t in APIs  | Hidden in translation units

What We Need That They Don’t Have

Corosio requires features absent from std::execution:

  • Strand serialization

  • Platform I/O integration (IOCP, io_uring)

  • Buffer lifetime management

What They Have That We Don’t Need

std::execution provides features unnecessary for networking:

  • Domain-based algorithm dispatch

  • Completion domain queries

  • Sender transforms

When to Use Corosio

Corosio is well-suited for projects where:

  • Coroutines are the primary programming model

  • Public APIs must hide implementation details

  • Compile time and binary size matter

  • ABI stability is required across library boundaries

  • Clean, simple interfaces are prioritized

Consider alternatives for projects where:

  • You need callback-based APIs for C compatibility

  • Maximum performance with zero abstraction overhead is required

  • You’re already invested in the Asio ecosystem