Skip to content

False sharing – low level performance trap

Introduction

False sharing is another low level programming “feature” that can cause the performance degradation. Long story short, false sharing is a situation when many threads run at the same time on many processors and they share data. However, the fact of sharing, looking like a booster at first glance, turns out to be performance trap. In General, being fully aware of the false sharing is highly recommended as it can solve performance issues in some particular cases. Without further ado, let me show you my point of view supported by code samples.

Theory

Let’s consider the following scenario:

  • we have 4 threads that run on 4 processors.
  • each processor has its own cache.
  • each thread modifies the corresponding (shared) array index (1st thread array [0], 2nd thread array [1] and so on).

Such a scenario is a perfect example of a potential false sharing.

Each cache has its own loaded array. If the thread wants to increment its element in the array it tries with the cached one. BUT if other thread changed the array in a meantime (by incrementing its own element) the thread’s corresponding cache line has been marked as invalid, hence the cache misses and reaching out to the main memory (additional cost here!). Instead of benefits from sharing something it turns out that the threads disturb each other work.

Can we overcome that? Sure we can!

Example

I’ve created a simple FalseSharingExecutor class:

public class FalseSharingExecutor
{
    private readonly int _processorCount = Environment.ProcessorCount;
    private readonly int _iterations;

    public readonly int[] _list, _list2, _list3;

    public FalseSharingExecutor(int iterations)
    {
        _list = new int[_processorCount];
        _list2 = new int[_processorCount * 16];
        _list3 = new int[_processorCount * 16 + 16];
        _iterations = iterations;
    }

    public async Task ExecuteWithFalseSharing()...
    public async Task ExecuteFalseSharingWithImprovement()
...
    public async Task ExecuteFalseSharingWithBetterImprovement()...
}

As you can see there are three lists. First list (_list) has 4 elements (Environment.ProcessorCount equals four on my laptop), second list (_list2) has 64 elements and the last one (_list3) has 80 elements (it feels uncomfortable if something is not the power of 2). There are also three methods. ExecuteWithFalseSharing is implemented in a false-sharing manner, ExecuteFalseSharingWithImprovement, as name suggests, is improved, and ExecuteFalseSharingWithBetterImprovementis improved even better (proofs in a benchmark soon).

Let’s look at the methods’ implementations.

public async Task ExecuteWithFalseSharing()
{
    var tasks = new Task[_processorCount];
    for (int i = 0; i < _processorCount; i++)
    {
        var index = i;
        tasks[index] = Task.Run(() =>
        {
            for (int j = 0; j < _iterations; j++)
                ++_list[index];
        });
    }
    await Task.WhenAll(tasks);
}

public async Task ExecuteFalseSharingWithImprovement()
{
    var tasks = new Task[_processorCount];

    for (int i = 0; i < _processorCount; i++)
    {
        var index = i;

        tasks[index] = Task.Run(() =>
        {
            for (int j = 0; j < _iterations; j++)
                ++_list2[index * 16];
        });
    }

    await Task.WhenAll(tasks);
}

public async Task ExecuteFalseSharingWithBetterImprovement()
{
    var tasks = new Task[_processorCount];

    for (int i = 0; i < _processorCount; i++)
    {
       var index = i;

       tasks[i] = Task.Run(() =>
       {
            for (int j = 0; j < _iterations; j++)
                ++_list3[index * 16 + 16];
        });
    }

    await Task.WhenAll(tasks);
}

All the methods do is to run four tasks. Each task increments the corresponding array element. The only difference is the array size and the corresponding array element that is being incremented.

Benchmark

As you can see the methods` names and the corresponding execution time/cache misses match perfectly.

Explanation

It turns out that the larger array overcomes the false sharing problem.

Cache is loaded via cache line that is 64 bytes. In the ExecuteWithFalseSharing the array object size is 40 bytes size (24 initial bytes + 4 * 4 bytes per integer, worth to mention that on 64-bit architecture .NET reference objects always start with 24 bytes = 8 bytes header + 8 bytes method table + 8 bytes inner data). All that fits in a single cache line and can be easily invalidated by another running thread. ExecuteFalseSharingWithImprovement uses 280 bytes (24 bytes + 64*4 bytes per integer). 280 bytes means 5 cache lines, therefore corresponding threads’ elements are located in the different cache lines – and that’s the point. Having corresponding elements in different cache lines causes the undisturbed thread work. In other words, the threads do not invalidate each others caches. To get even better result we can take a look at the ExecuteFalseSharingWithBetterImprovement. The idea is the same (corresponding elements are located in different cache lines) except that the object’s method table (second octet of the 24 initial bytes), which is the object’s address (in other words, if something refers to the object it uses the address of the method table – method table also stores the reference to the type description), is in the different cache line comparing to the first thread corresponding element. Thanks to that first thread incrementation operation does not invalidate the cache line where the method table is stored.

Summary

To sum up, false sharing seems to be a little tricky and not quite obvious. If the performance especially matters in your case, then false sharing might be the solution to speed up the execution time. I haven’t found many articles about that but here is one.

As always I’ve prepared an example and you can find it on my github, project: false-sharinng, test project: false-sharing-tests, benchmark project: false-sharing-benchmark.
I encourage you to go through the code, debug it and try to test your own scenarios.

Have a nice day, bye!

Published in.NET

Be First to Comment

Leave a Reply

Your email address will not be published. Required fields are marked *