Synchronization Primitives in .NET: User Mode, Kernel Mode, and Thread Affinity

Post 3 of the series: Advanced C# for Your Next Interview

In the previous post, we fixed a race condition with SemaphoreSlim. Knowing which class solves a problem is useful, but a strong senior-level interview answer should go further: why are some synchronization primitives considered lightweight, why are others kernel-backed, and why is lock often described as a hybrid mechanism?

Synchronization primitives coordinate concurrent execution and protect shared resources. Their purpose is to prevent multiple threads from changing data in ways that cause race conditions, broken invariants, or lost updates.

They do not all solve this problem in the same way. The most important difference is not the API, but what happens while execution is waiting for a resource to become available.

Some primitives use inexpensive atomic CPU instructions or brief active waiting. Others rely on operating-system mechanisms such as wait handles, system calls, and the thread scheduler.

To understand the cost of those choices, we need to start with User Mode and Kernel Mode.

User Mode and Kernel Mode

Modern operating systems execute code at different privilege levels.

User Mode is where normal application code runs: C# applications, ASP.NET Core services, background workers, and console programs.

User Mode code cannot directly access kernel memory, hardware, or privileged processor instructions. This isolation is important. If an application crashes because of an unhandled exception, the process normally terminates without taking down the operating system.

Kernel Mode is the privileged environment used by the OS kernel, device drivers, the thread scheduler, and other low-level system components.

When application code needs an operation that cannot be performed directly in User Mode, it asks the operating system to perform it. Examples include creating a thread, reading a file, waiting on an OS synchronization object, or performing low-level network I/O.

For synchronization, this distinction explains the cost of waiting:

Solving the problem inside the process with atomic CPU instructions and runtime-managed state is usually cheaper.
Asking the OS to park a thread, enqueue it as a waiter, and later wake it through the scheduler is more expensive.

This is the basis for the distinction between lightweight user-space approaches and kernel-backed primitives.

Lightweight User-Space Approaches

Common examples include:

Interlocked
SpinLock
SpinWait

These tools do not create an OS wait handle and do not immediately ask the operating system to suspend the current thread.

Interlocked, for example, provides atomic operations for simple state changes such as incrementing a counter, replacing a value, or performing compare-and-swap:

Interlocked.Increment(ref counter);

The operation is atomic. Multiple threads can increment the counter concurrently without losing updates to a read-modify-write race.
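As a minimal sketch of both ideas, the hypothetical CounterDemo class below increments a counter from parallel iterations and builds a small "store the maximum" helper on top of compare-and-swap. The class and method names are illustrative assumptions; only the Interlocked, Volatile, and Parallel calls are real APIs.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo type; not part of any library.
public static class CounterDemo
{
    // Increments a shared counter from many parallel iterations.
    // Interlocked.Increment makes each read-modify-write step atomic.
    public static int CountWithInterlocked(int iterations)
    {
        int counter = 0;
        Parallel.For(0, iterations, _ => Interlocked.Increment(ref counter));
        return counter;
    }

    // Stores `candidate` into `target` only if it is larger than the current value.
    // The compare-and-swap is retried when another thread changed `target` in between.
    public static void UpdateMax(ref int target, int candidate)
    {
        int observed;
        do
        {
            observed = Volatile.Read(ref target);
            if (candidate <= observed)
            {
                return; // nothing to update
            }
        }
        while (Interlocked.CompareExchange(ref target, candidate, observed) != observed);
    }
}
```

The retry loop in UpdateMax is the usual shape of lock-free updates: read, compute, and publish only if nobody else changed the value in the meantime.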

SpinLock and SpinWait use a different idea. If a resource is busy, the thread can remain active for a short time and repeatedly check whether it has become available. This is called spinning.

The thread is not suspended immediately. It continues to run on a CPU core while it waits.

That sounds wasteful, and it can be. However, if a lock is held for only a very short time, spinning briefly may cost less than:

Entering the operating system.
Parking the thread.
Switching execution to another thread.
Waking the original thread through the scheduler.
Restoring its execution context.

The tradeoff changes when the wait becomes longer. A spinning thread consumes CPU without doing useful work and competes with threads that could make progress.

Spin-based synchronization is therefore most appropriate for extremely short waits and low-level code. Its central tradeoff is:

Avoiding an expensive transition to OS-managed waiting can come at the cost of actively consuming CPU.
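
A rough sketch of a short spin-wait, assuming a hypothetical SpinDemo class and a flag that another thread sets almost immediately. SpinWait.SpinOnce() starts with busy spinning and, after a number of iterations, begins yielding the thread, so the loop does not burn a core forever if the signal is late.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo type; not part of any library.
public static class SpinDemo
{
    private static int _ready; // 0 = not ready, 1 = ready

    // Waits for a flag that a producer task is expected to set very soon.
    public static bool WaitBrieflyForSignal()
    {
        Volatile.Write(ref _ready, 0);
        Task producer = Task.Run(() => Volatile.Write(ref _ready, 1));

        var spinner = new SpinWait();
        while (Volatile.Read(ref _ready) == 0)
        {
            // Busy-spins at first; later iterations yield or sleep briefly.
            spinner.SpinOnce();
        }

        producer.Wait();
        return Volatile.Read(ref _ready) == 1;
    }
}
```

This only makes sense when the expected wait is tiny. For anything longer, a blocking or awaitable primitive is the safer default.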

Kernel-Backed Primitives and OS Handles

Another group of synchronization primitives is built on operating-system waiting mechanisms:

Mutex
Semaphore
AutoResetEvent
ManualResetEvent
EventWaitHandle

In .NET, these types derive from WaitHandle and represent OS synchronization objects.

If a thread calls WaitOne() on a Mutex that is already owned, the operating system can suspend the waiting thread until the mutex becomes available. The thread no longer spins and does not consume CPU merely to check the resource.

This is useful when a wait may be long. Suspending a thread is better than wasting processor time indefinitely, but the transition is not free. System calls, scheduler work, context switches, and waking a parked thread cost more than a simple atomic user-space operation.

Kernel-backed primitives also have an important capability: some can coordinate more than one process.

A named Mutex, for example, can enforce that only one process on a machine performs a particular operation. This is useful for single-instance applications or cross-process access to a shared resource.
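
The single-instance case could be sketched roughly as follows. SingleInstance and TryRun are hypothetical names, and the mutex name is whatever the application chooses; the sketch assumes the caller wants to give up immediately rather than wait.

```csharp
using System;
using System.Threading;

// Hypothetical helper; not part of any library.
public static class SingleInstance
{
    // Runs `work` only if no other thread or process currently holds the named mutex.
    // Returns false when someone else already holds it.
    public static bool TryRun(string mutexName, Action work)
    {
        using var mutex = new Mutex(initiallyOwned: false, name: mutexName);

        bool acquired;
        try
        {
            acquired = mutex.WaitOne(TimeSpan.Zero); // poll once, do not block
        }
        catch (AbandonedMutexException)
        {
            // A previous owner exited without releasing. This thread now owns
            // the mutex, but any state the owner protected may be inconsistent.
            acquired = true;
        }

        if (!acquired)
        {
            return false;
        }

        try
        {
            work();
            return true;
        }
        finally
        {
            mutex.ReleaseMutex(); // must run on the acquiring thread
        }
    }
}
```

Because Mutex is thread-affine, the acquire and release in this sketch deliberately stay in one synchronous method with no await in between.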

That capability is rarely needed in ordinary ASP.NET Core code. For synchronization inside one backend process, lighter and more focused tools are usually a better fit.

Why Modern Primitives Are Often Hybrid

The line between lightweight and heavyweight synchronization is not always strict.

Modern .NET primitives commonly optimize for a fast path when no contention exists and move to more expensive waiting only when necessary.

Monitor, normally used through the lock keyword, is a good example:

lock (_sync)
{
    UpdateSharedState();
}

lock is the standard choice for protecting short synchronous critical sections inside a process, but it is not simply a “kernel lock.”

When the lock is free, the runtime can acquire it through a very fast path that stays in User Mode. Under contention, the runtime typically spins briefly; if the lock is still unavailable, it blocks the thread on a kernel-level wait until the owner releases the lock.

That makes Monitor a hybrid mechanism:

It aims to stay lightweight in the uncontended case but can escalate its waiting strategy when threads compete for the lock.
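
To make the relationship between lock and Monitor concrete, here is an approximate hand-written equivalent of what the compiler generates for lock on a plain object, plus a Monitor.TryEnter variant with a timeout. The Account class is a hypothetical example, and the expansion is a sketch of the documented pattern rather than the exact generated code.

```csharp
using System.Threading;

// Hypothetical example type.
public sealed class Account
{
    private readonly object _sync = new();
    private decimal _balance;

    // Roughly what `lock (_sync) { _balance += amount; }` turns into.
    public void Deposit(decimal amount)
    {
        bool lockTaken = false;
        try
        {
            Monitor.Enter(_sync, ref lockTaken);
            _balance += amount;
        }
        finally
        {
            if (lockTaken)
            {
                Monitor.Exit(_sync);
            }
        }
    }

    // Something `lock` cannot express: give up if the lock stays contended.
    public bool TryDeposit(decimal amount, int timeoutMilliseconds)
    {
        if (!Monitor.TryEnter(_sync, timeoutMilliseconds))
        {
            return false;
        }

        try
        {
            _balance += amount;
            return true;
        }
        finally
        {
            Monitor.Exit(_sync);
        }
    }

    public decimal Balance
    {
        get { lock (_sync) { return _balance; } }
    }
}
```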

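SemaphoreSlim is another important example. It keeps its count in managed state instead of wrapping an OS semaphore, its synchronous Wait() spins briefly before blocking, and it creates a kernel wait handle only if AvailableWaitHandle is accessed. It limits concurrency inside a process and supports WaitAsync():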

private readonly SemaphoreSlim _gate = new(5, 5);

await _gate.WaitAsync(cancellationToken);
try
{
    await CallExternalApiAsync(cancellationToken);
}
finally
{
    _gate.Release();
}

Here, no more than five operations can call the external API at the same time. An operation that cannot enter immediately can await a Task rather than blocking a thread for the duration of the wait.

Unlike Semaphore, SemaphoreSlim is not intended for cross-process synchronization. It is designed for in-process scenarios such as limiting concurrent operations and protecting async critical sections.
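
One way to convince yourself that the gate really caps concurrency is to measure it. The sketch below uses a hypothetical ThrottleDemo class, with Task.Delay standing in for the external call, and records the highest number of operations that were inside the gate at the same time.

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo type; not part of any library.
public static class ThrottleDemo
{
    // Starts `totalOperations` simulated calls with at most `limit` in flight
    // and returns the highest concurrency that was actually observed.
    public static async Task<int> RunAsync(int totalOperations, int limit)
    {
        using var gate = new SemaphoreSlim(limit, limit);
        int inFlight = 0;
        int maxObserved = 0;

        async Task OneCallAsync()
        {
            await gate.WaitAsync();
            try
            {
                int now = Interlocked.Increment(ref inFlight);

                // Record the maximum with a compare-and-swap retry loop.
                int seen;
                do
                {
                    seen = Volatile.Read(ref maxObserved);
                }
                while (now > seen &&
                       Interlocked.CompareExchange(ref maxObserved, now, seen) != seen);

                await Task.Delay(20); // stand-in for the external API call
            }
            finally
            {
                Interlocked.Decrement(ref inFlight);
                gate.Release();
            }
        }

        await Task.WhenAll(Enumerable.Range(0, totalOperations).Select(_ => OneCallAsync()));
        return maxObserved;
    }
}
```

With a limit of 5, the returned value should never exceed 5, no matter how many operations are started.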

Ownership and Thread Affinity

Waiting cost is only one criterion for choosing a primitive. Another is ownership:

Who is allowed to release the primitive after it has been acquired?

This leads to the concept of thread affinity.

A thread-affine primitive records which thread owns it. The thread that enters the critical section must also be the thread that exits it.

Thread-affine primitives include:

Monitor / lock
Mutex
ReaderWriterLockSlim

Primitives without thread affinity include:

Semaphore
SemaphoreSlim
AutoResetEvent
ManualResetEvent
EventWaitHandle

If one thread enters a lock, another thread cannot correctly call Monitor.Exit() on its behalf. Attempting to release a monitor owned by another thread throws SynchronizationLockException.

This ownership rule protects shared state. One thread cannot accidentally release a lock acquired by another thread and allow concurrent access before the original critical section is complete.
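
The rule is easy to observe directly. In this small sketch (OwnershipDemo is a hypothetical name), the calling thread enters a monitor and a second thread tries to exit it on its behalf.

```csharp
using System.Threading;

// Hypothetical demo type; not part of any library.
public static class OwnershipDemo
{
    // Returns true if a non-owning thread is refused when it tries to exit the monitor.
    public static bool ForeignExitIsRejected()
    {
        var sync = new object();
        bool rejected = false;

        Monitor.Enter(sync); // the calling thread is now the owner
        try
        {
            var other = new Thread(() =>
            {
                try
                {
                    Monitor.Exit(sync); // this thread does not own the monitor
                }
                catch (SynchronizationLockException)
                {
                    rejected = true;
                }
            });
            other.Start();
            other.Join();
        }
        finally
        {
            Monitor.Exit(sync); // the owner releases normally
        }

        return rejected;
    }
}
```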

The model fits synchronous critical sections naturally:

A thread enters.
It updates the protected state.
The same thread exits.

Mutex also tracks ownership. Because it is an OS synchronization object, the operating system can detect when the owning thread or process terminates without releasing it. The next thread to acquire the mutex then receives AbandonedMutexException, and it does own the mutex at that point.

That exception does not mean the protected data is safe. It means the previous owner disappeared and may have left the state inconsistent. Still, it is better than waiting forever for an owner that no longer exists.
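
A minimal sketch of abandonment, assuming an unnamed in-process Mutex and a hypothetical AbandonedDemo class. A worker thread acquires the mutex and exits without releasing it; the next acquirer is expected to see the exception while still ending up as the owner.

```csharp
using System.Threading;

// Hypothetical demo type; not part of any library.
public static class AbandonedDemo
{
    // Returns true if the next acquirer is told that the previous owner vanished.
    public static bool NextOwnerSeesAbandonment()
    {
        using var mutex = new Mutex();

        // This thread acquires the mutex and then exits without releasing it.
        var careless = new Thread(() => mutex.WaitOne());
        careless.Start();
        careless.Join();

        try
        {
            mutex.WaitOne();
            mutex.ReleaseMutex();
            return false; // no abandonment was reported
        }
        catch (AbandonedMutexException)
        {
            // The current thread owns the mutex now and must release it.
            // Real code should validate the protected state before trusting it.
            mutex.ReleaseMutex();
            return true;
        }
    }
}
```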

ReaderWriterLockSlim is also thread-affine. It distinguishes between read, write, and upgradeable read locks. It can be useful, but it should not be selected automatically just because an application performs more reads than writes. Its additional complexity only pays off for suitable contention and workload patterns.
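
For reference, basic usage looks roughly like this. SettingsCache is a hypothetical read-mostly cache; whether it would outperform a plain lock in a given service is exactly the kind of question that needs measurement.

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical example type.
public sealed class SettingsCache
{
    private readonly ReaderWriterLockSlim _rw = new();
    private readonly Dictionary<string, string> _values = new();

    // Many threads may hold the read lock at the same time.
    public bool TryGet(string key, out string value)
    {
        _rw.EnterReadLock();
        try
        {
            return _values.TryGetValue(key, out value);
        }
        finally
        {
            _rw.ExitReadLock(); // must run on the thread that entered
        }
    }

    // The write lock is exclusive: it waits for readers to leave and blocks new ones.
    public void Set(string key, string value)
    {
        _rw.EnterWriteLock();
        try
        {
            _values[key] = value;
        }
        finally
        {
            _rw.ExitWriteLock();
        }
    }
}
```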

Reentrancy Is a Separate Property

Thread affinity and reentrancy are related, but they are not the same thing.

Thread affinity answers: who owns the lock?

Reentrancy answers: can the owner acquire the same lock again?

Monitor is reentrant, so the following code works:

lock (_sync)
{
    lock (_sync)
    {
        // The same thread can enter again.
    }
}

Monitor tracks both the owner and an acquisition count. The thread must exit the monitor as many times as it entered.

Mutex is also reentrant. ReaderWriterLockSlim, however, does not allow recursion by default. It has a configurable recursion policy, but enabling recursion increases complexity and should be done deliberately.

SpinLock should not be treated as an ordinary reentrant lock either. It is not reentrant: with thread-owner tracking enabled, a second Enter by the owning thread throws LockRecursionException; with tracking disabled, the thread spins indefinitely waiting for itself.
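
These differences can be checked with a few lines each. The ReentrancyDemo class below is a hypothetical sketch: it reenters a monitor, attempts a recursive read lock on a default ReaderWriterLockSlim, and attempts to reenter a SpinLock created with owner tracking enabled.

```csharp
using System.Threading;

// Hypothetical demo type; not part of any library.
public static class ReentrancyDemo
{
    // Monitor: the owning thread may enter again.
    public static bool MonitorAllowsReentry()
    {
        var sync = new object();
        lock (sync)
        {
            lock (sync)
            {
                return Monitor.IsEntered(sync);
            }
        }
    }

    // ReaderWriterLockSlim: the default policy is LockRecursionPolicy.NoRecursion.
    public static bool ReaderWriterLockRejectsRecursionByDefault()
    {
        using var rw = new ReaderWriterLockSlim();
        rw.EnterReadLock();
        try
        {
            rw.EnterReadLock(); // expected to throw under NoRecursion
            rw.ExitReadLock();
            return false;
        }
        catch (LockRecursionException)
        {
            return true;
        }
        finally
        {
            rw.ExitReadLock(); // release the first, successful acquisition
        }
    }

    // SpinLock with owner tracking: reentry is reported instead of spinning forever.
    public static bool SpinLockWithOwnerTrackingRejectsReentry()
    {
        var spin = new SpinLock(enableThreadOwnerTracking: true);
        bool first = false;
        spin.Enter(ref first);
        try
        {
            bool second = false;
            spin.Enter(ref second); // expected to throw
            return false;
        }
        catch (LockRecursionException)
        {
            return true;
        }
        finally
        {
            if (first)
            {
                spin.Exit();
            }
        }
    }
}
```

The sketch deliberately does not try reentry on a SpinLock without owner tracking, because that call would never return.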

Why Thread Affinity Conflicts with async/await

The C# compiler does not allow await inside a lock block and reports error CS1996.

lock (_sync)
{
    await SaveAsync(); // CS1996
}

This is not an arbitrary syntax restriction.

lock is based on Monitor, and Monitor has thread affinity. The thread that enters must be the thread that exits.

An async method follows a different execution model. When it reaches an incomplete await, the method is suspended and its thread is free to do other work. When the awaited operation completes, the continuation may run on a different thread.

In ASP.NET Core, which has no SynchronizationContext, continuations normally run on thread-pool threads, so there is no guarantee that execution after an await will resume on the same thread.

If await were allowed inside lock, one thread could acquire the monitor, suspend at the await, and later attempt to release the monitor from another thread. That would violate the monitor’s ownership rule.

For async critical sections, SemaphoreSlim is commonly used instead:

await _gate.WaitAsync(cancellationToken);
try
{
    await SaveAsync(cancellationToken);
}
finally
{
    _gate.Release();
}

SemaphoreSlim has no thread affinity. The continuation that calls Release() does not need to run on the same thread that called WaitAsync().
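
If the try/finally pattern is repeated in many places, it can be wrapped in a small helper. The AsyncLock class below is a hypothetical sketch built on SemaphoreSlim(1, 1), not a framework type. It returns an IDisposable so the release can be expressed with a using statement.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; not part of the framework.
public sealed class AsyncLock
{
    private readonly SemaphoreSlim _gate = new(1, 1);

    // Completes once the caller holds the lock. Dispose the result to release it.
    public async Task<IDisposable> AcquireAsync(CancellationToken cancellationToken = default)
    {
        await _gate.WaitAsync(cancellationToken).ConfigureAwait(false);
        return new Releaser(_gate);
    }

    private sealed class Releaser : IDisposable
    {
        private readonly SemaphoreSlim _gate;
        private int _released; // 0 = still held, 1 = already released

        public Releaser(SemaphoreSlim gate) => _gate = gate;

        public void Dispose()
        {
            // Guard against double disposal releasing the semaphore twice.
            if (Interlocked.Exchange(ref _released, 1) == 0)
            {
                _gate.Release();
            }
        }
    }
}
```

A caller would write something like using (await asyncLock.AcquireAsync(cancellationToken)) { await SaveAsync(cancellationToken); }. Unlike lock, this helper is not reentrant: acquiring it again before the first handle is disposed waits forever.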

That does not make lack of thread affinity universally better. It is simply a different ownership model:

SemaphoreSlim fits async waiting and concurrency limits.
lock fits short synchronous protection of shared state.
Mutex fits cross-process synchronization.
Interlocked fits simple atomic state transitions.

Every primitive has its own waiting cost and ownership semantics.

A Practical Selection Guide

Requirement	Typical choice
Protect a short synchronous critical section inside one process	`lock`
Increment, exchange, or compare-and-swap a single value atomically	`Interlocked`
Protect an async critical section	`SemaphoreSlim(1, 1)`
Limit async concurrency to N operations	`SemaphoreSlim(N, N)`
Coordinate multiple processes	Named `Mutex` or another OS primitive
Allow concurrent readers and exclusive writers	`ReaderWriterLockSlim`, after measuring
Optimize an extremely short wait in low-level code	`SpinWait` or `SpinLock`, with care

The important qualification is “after measuring.” A more specialized primitive is not automatically faster. Its benefit depends on contention, critical-section duration, workload shape, and whether the code is synchronous or asynchronous.

The Main Takeaway

Synchronization primitives differ in more than their names and APIs. When comparing them, ask:

How does the caller wait, and what does that waiting cost?
Who owns the primitive and who is allowed to release it?
Does its execution model work with async code?

Those questions reveal why lock, SemaphoreSlim, Mutex, Interlocked, and SpinWait solve different problems.

A senior-level answer should explain not only what a primitive does, but also the tradeoffs behind its waiting strategy, ownership model, and scope.