Building a Database in a Text File — Fewer Allocations with Span<T>

Post 4 of the series: Advanced C# for Your Next Interview

In the previous posts we fixed data corruption with SemaphoreSlim. The writes are safe now. But every operation still does this:

var json = await File.ReadAllTextAsync(_filePath, ct);
var records = JsonSerializer.Deserialize<List<FileRecord>>(json) ?? [];

That means a big string for the entire file content, a list with all the records, and all the deserialized objects behind it. When the method returns, GC has to clean most of it up. Under load this creates constant pressure on the garbage collector.
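Part of that pressure comes from sheer size. The .NET GC allocates objects of 85,000 bytes or more on the Large Object Heap (LOH), which is collected only together with generation 2. A string needs two bytes per char, so reading any file longer than roughly 42,500 characters with ReadAllTextAsync produces an LOH string. The minimal sketch below lets you check this yourself. It uses a 50,000-char string as a stand-in for file content and assumes the default LOH threshold has not been reconfigured.

```csharp
using System;

// A short string is allocated on the small object heap, in generation 0.
string small = new string('x', 100);
int smallGeneration = GC.GetGeneration(small);

// 50,000 chars is roughly 100,000 bytes: above the default 85,000-byte
// Large Object Heap threshold. The GC reports LOH objects as generation 2,
// so they are reclaimed only by full (gen 2) collections.
string large = new string('x', 50_000);
int largeGeneration = GC.GetGeneration(large);

Console.WriteLine($"small string: gen {smallGeneration}, large string: gen {largeGeneration}");
```

The large string should report generation 2. The small one is normally still in generation 0 right after allocation.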

Today we fix the biggest part of that.

Step 1 - Change the File Format

The first problem is bigger than allocations. Look at what WriteAsync does:

var json = await File.ReadAllTextAsync(_filePath, ct);
var records = JsonSerializer.Deserialize<List<FileRecord>>(json) ?? [];

records.Add(record);

await File.WriteAllTextAsync(_filePath, JsonSerializer.Serialize(records), ct);

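To add one record we read and rewrite the entire file every time. Each write costs time proportional to the current file size, so the total IO for N inserts grows quadratically. With 10,000 records this is a lot of wasted IO.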
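The following back-of-the-envelope sketch puts a number on that waste. It assumes the file starts empty and that every WriteAsync reads all k existing records and writes back k + 1. Summed over N inserts, that is N(N-1)/2 records read and N(N+1)/2 records written. An append-only file writes only N records.

```csharp
using System;

// Records touched when N records are inserted one at a time with the
// read-everything / rewrite-everything WriteAsync, starting from an empty file.
const long N = 10_000;
long recordsRead = 0;
long recordsWritten = 0;
for (long k = 0; k < N; k++)
{
    recordsRead += k;        // the file already holds k records
    recordsWritten += k + 1; // they are written back together with the new one
}

// The append-only version writes each record exactly once and reads nothing.
long appendWritten = N;

Console.WriteLine($"rewrite: {recordsRead} read, {recordsWritten} written");
Console.WriteLine($"append:  0 read, {appendWritten} written");
// prints:
// rewrite: 49995000 read, 50005000 written
// append:  0 read, 10000 written
```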

We switch to an append-only format. Instead of one big JSON array, each record gets its own line:

{"Id":"a1b2...","Name":"Record-1","Payload":"...","CreatedAt":"2026-01-01"}
{"Id":"c3d4...","Name":"Record-2","Payload":"...","CreatedAt":"2026-01-01"}
{"Id":"e5f6...","Name":"Record-3","Payload":"...","CreatedAt":"2026-01-01"}

This is called NDJSON: Newline Delimited JSON. Now WriteAsync just appends one line:

public async Task WriteAsync(FileRecord record, CancellationToken ct = default)
{
    await _writeLock.WaitAsync(ct);
    try
    {
        var line = JsonSerializer.Serialize(record) + "\n";
        await File.AppendAllTextAsync(_filePath, line, ct);
    }
    finally
    {
        _writeLock.Release();
    }
}

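No more reading the entire file to add one record. Just one append at the end. This relies on JsonSerializer producing compact, single-line output by default (WriteIndented is false) and escaping control characters such as newlines inside string values. Together, these guarantee that a record can never spill onto a second line.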
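The sketch below is a small check of the one-record-per-line property. It appends three records to a hypothetical temp file using default JsonSerializer options, then reads the lines back. Anonymous objects with the same shape as FileRecord stand in for the real type, so the snippet needs no extra declarations. The second payload deliberately contains a newline.

```csharp
using System;
using System.IO;
using System.Text.Json;

// Hypothetical temp file for the demo.
string path = Path.Combine(Path.GetTempPath(), $"ndjson-demo-{Guid.NewGuid():N}.ndjson");

// Anonymous objects shaped like FileRecord (an assumption for the demo).
var records = new[]
{
    new { Id = Guid.NewGuid(), Name = "Record-1", Payload = "plain", CreatedAt = DateTime.UtcNow },
    new { Id = Guid.NewGuid(), Name = "Record-2", Payload = "line1\nline2", CreatedAt = DateTime.UtcNow },
    new { Id = Guid.NewGuid(), Name = "Record-3", Payload = "tab\there", CreatedAt = DateTime.UtcNow },
};

foreach (var record in records)
{
    // Default options: compact output (WriteIndented = false), and control
    // characters inside strings are escaped, so each record is one line.
    string line = JsonSerializer.Serialize(record) + "\n";
    await File.AppendAllTextAsync(path, line);
}

string[] lines = await File.ReadAllLinesAsync(path);
File.Delete(path);

Console.WriteLine($"records: {records.Length}, lines: {lines.Length}");
// prints: records: 3, lines: 3
```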

Step 2 - Span for Parsing

Now let’s look at how we read. The old approach allocates a string for the entire file, then deserializes all records from that one big JSON array. All of this lands on the heap and has to be collected by GC.

With NDJSON we can read line by line. File.ReadLinesAsync (available since .NET 7) still gives us a string for each line, but we avoid creating extra strings when trimming or slicing the line. We use ReadOnlySpan<char>: a read-only view over existing memory. It is the read-only variant because the underlying string is immutable.

await foreach (var line in File.ReadLinesAsync(_filePath, ct))
{
    var span = line.AsSpan().Trim();
    if (span.IsEmpty) continue;

    var record = JsonSerializer.Deserialize<FileRecord>(span);
    if (record is not null)
        result.Add(record);
}

line.AsSpan() does not allocate. It creates a ReadOnlySpan<char> that points to the same memory as the string. Trim() on a span also does not allocate; it just adjusts the start and end of the view. We pass the span directly to JsonSerializer.Deserialize, which has an overload that accepts ReadOnlySpan<char>.
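Here is a minimal sketch of the difference between trimming a span and trimming a string. MemoryExtensions.Overlaps confirms that the trimmed span still points into the original string's memory. By contrast, string.Trim() has to return a new string whenever there is whitespace to remove, and returns the same instance when there is none.

```csharp
using System;

string line = "  {\"Name\":\"Record-1\"}  ";

// AsSpan().Trim() returns a narrower view over the same chars; nothing is copied.
ReadOnlySpan<char> trimmedSpan = line.AsSpan().Trim();
bool sharesMemory = trimmedSpan.Overlaps(line.AsSpan());

// string.Trim() has to return a new string when there is whitespace to remove...
string trimmedString = line.Trim();
bool newStringAllocated = !ReferenceEquals(line, trimmedString);

// ...but returns the same instance when there is nothing to trim.
string clean = "{\"Name\":\"Record-1\"}";
bool sameInstance = ReferenceEquals(clean, clean.Trim());

Console.WriteLine($"span length: {trimmedSpan.Length}, shares memory: {sharesMemory}");
Console.WriteLine($"Trim() allocated: {newStringAllocated}, no-op Trim() same instance: {sameInstance}");
// prints:
// span length: 19, shares memory: True
// Trim() allocated: True, no-op Trim() same instance: True
```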

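Span<T> and ReadOnlySpan<T> are ref structs, which makes them stack-only types. A span cannot be boxed or stored in a field of a class, so it never becomes another heap object for GC to collect. That restriction has a cost in async code: declaring a span local inside an async method, as this loop does, compiles only on C# 13 (.NET 9) or later. Even then, the span must not be used across an await. Here each span is created and consumed within a single loop iteration, so the rule holds.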
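C# 12 and earlier do not allow span locals in async methods at all. On those versions, one workaround is to keep the span inside a synchronous helper so the async loop only ever holds a string. The minimal sketch below illustrates this. ParseLine is a hypothetical helper name, and the demo writes its own small temp file. T is Dictionary<string, string> here only to keep the snippet self-contained; in the series' code it would be FileRecord.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Hypothetical demo file: two records, one blank line, one padded line.
string path = Path.Combine(Path.GetTempPath(), $"parse-demo-{Guid.NewGuid():N}.ndjson");
await File.WriteAllTextAsync(path, "{\"Name\":\"Record-1\"}\n\n  {\"Name\":\"Record-2\"}  \n");

var result = new List<Dictionary<string, string>>();
await foreach (string line in File.ReadLinesAsync(path))
{
    // Only a string lives in the async method; the span stays inside ParseLine.
    var record = ParseLine<Dictionary<string, string>>(line);
    if (record is not null)
        result.Add(record);
}
File.Delete(path);

Console.WriteLine($"parsed {result.Count} records");
// prints: parsed 2 records

// Synchronous helper: span locals are allowed here on any C# version
// because this method is not async. Blank lines yield null.
static T? ParseLine<T>(string line) where T : class
{
    ReadOnlySpan<char> span = line.AsSpan().Trim();
    return span.IsEmpty ? null : JsonSerializer.Deserialize<T>(span);
}
```

The cost is one extra method call per line. The allocation profile matches the span-in-async version, since the span itself never allocates in either case.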

The Full ReadAllAsync

public async Task<List<FileRecord>> ReadAllAsync(CancellationToken ct = default)
{
    if (!File.Exists(_filePath))
        return [];

    var result = new List<FileRecord>();

    await foreach (var line in File.ReadLinesAsync(_filePath, ct))
    {
        var span = line.AsSpan().Trim();
        if (span.IsEmpty) continue;

        var record = JsonSerializer.Deserialize<FileRecord>(span);
        if (record is not null)
            result.Add(record);
    }

    return result;
}

Clean and simple. There is no file-sized string and no JSON array that has to be parsed as one large document. We still allocate one string per line, the returned List<FileRecord>, and the records themselves, because this method returns a full list. But we removed the largest avoidable allocation from the read path.

Does It Actually Help?

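Let’s measure it with BenchmarkDotNet instead of guessing. Both benchmarks read the same 10,000 records, once from a JSON-array file and once from an NDJSON file. With [MemoryDiagnoser], the Gen1 and Gen2 columns show garbage collections per 1,000 operations, and Allocated shows managed memory allocated per operation.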

Method	Mean	Gen1	Gen2	Allocated
ReadAll_JsonArray	14.74 ms	578.13	171.88	7.99 MB
ReadAll_NdjsonWithSpan	15.76 ms	312.50	125.00	6.34 MB

using BenchmarkDotNet.Attributes;
using System.Text.Json;

[MemoryDiagnoser]
[SimpleJob]
public class ReadBenchmarks
{
    private const int RecordCount = 10_000;
    private string _jsonFilePath = null!;
    private string _ndjsonFilePath = null!;

    [GlobalSetup]
    public async Task Setup()
    {
        _jsonFilePath = Path.GetTempFileName();
        _ndjsonFilePath = Path.GetTempFileName();
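        // NaiveFileStorage (JSON array) needs a valid empty array to start from;
        // MemoryOptimizedStorage (NDJSON) can append to the empty temp file.
        await File.WriteAllTextAsync(_jsonFilePath, "[]");

        NaiveFileStorage jsonStorage = new(_jsonFilePath);
        MemoryOptimizedStorage ndjsonStorage = new(_ndjsonFilePath);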

        for (int i = 0; i < RecordCount; i++)
        {
            FileRecord record = new(
                Guid.NewGuid(),
                $"Record-{i}",
                $"Payload-{i}",
                DateTime.UtcNow);

            await jsonStorage.WriteAsync(record);
            await ndjsonStorage.WriteAsync(record);
        }
    }

    [Benchmark(Baseline = true)]
    public async Task<List<FileRecord>> ReadAll_JsonArray()
    {
        string json = await File.ReadAllTextAsync(_jsonFilePath);
        return JsonSerializer.Deserialize<List<FileRecord>>(json) ?? [];
    }

    [Benchmark]
    public async Task<List<FileRecord>> ReadAll_NdjsonWithSpan()
    {
        List<FileRecord> result = [];

        await foreach (string line in File.ReadLinesAsync(_ndjsonFilePath))
        {
            // ref struct local in an async method: needs C# 13+ and must not be used across an await.
            ReadOnlySpan<char> span = line.AsSpan().Trim();
            if (span.IsEmpty)
            {
                continue;
            }

            FileRecord? record = JsonSerializer.Deserialize<FileRecord>(span);
            if (record is not null)
            {
                result.Add(record);
            }
        }

        return result;
    }

    [GlobalCleanup]
    public void Cleanup()
    {
        File.Delete(_jsonFilePath);
        File.Delete(_ndjsonFilePath);
    }
}

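Allocated memory dropped by 21% (from 7.99 MB to 6.34 MB per call). Gen1 collections were cut almost in half, and Gen2 collections fell by about 27%. But notice the Mean time: it did not improve. The optimized version is about 7% slower (15.76 ms vs 14.74 ms), probably because it pays per-line costs (10,000 separate Deserialize calls and 10,000 async line reads) instead of one large parse. This is the part many people get wrong: fewer allocations do not automatically mean faster code right now. They mean less work for GC later, so the benefit should show up under sustained load, not in a single isolated call.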

What Changed

Operation	Before	After
Write	Read and rewrite entire file	Append one line
Read	One big string plus one full result list	Line by line, `ReadOnlySpan<char>` for trimming and parsing
GC pressure	High	Lower: 21% less allocated, about half the Gen1 collections

The file is still read fully on every ReadAllAsync call. That is the next problem. In the following post we introduce System.IO.Pipelines and IAsyncEnumerable: reading the file in chunks and streaming records to the caller without loading everything into memory first. We also bring in ArrayPool<T> there, where it actually belongs.


Source code: GitHub - sigmade/sigmade.github.io