Skip to content

Branch misprediction – slow down

Introduction

Branch misprediction, as title suggests, might slow down program execution time. If you have never heard about the topic I recommend to read the stackoverflow post. Long story short, your processor sometimes (for branch instructions, namely “if” instruction) tries to predict what will be executed in the future to speed up execution time. As you can imagine, correct prediction is beneficial, whereas incorrect one costs additional time. Does it have significant impact? Let’s find out!

Example

Let’s start with a simple code snippet.

public class BranchMispredictionExecutor
{
    private readonly List<int> _sortedCollection;
    private readonly List<int> _unsortedCollection;

    public BranchMispredictionExecutor(int count)
    {
        _sortedCollection = Enumerable.Range(0, count).ToList();
        _unsortedCollection = Enumerable.Range(0, count).ToList();
        _unsortedCollection.Shuffle();
    }

    public int SortedSumLessThan(int threshold)
    {
        var sum = 0;

        for (int i = 0; i < _sortedCollection.Count; i++)
        {
            var element = _sortedCollection[i];
            if (element < threshold)
                sum += element;
        }

        return sum;
    }

    public int UnsortedSumLessThan(int threshold)
    {
        var sum = 0;

        for (int i = 0; i < _unsortedCollection.Count; i++)
        {
            var element = _unsortedCollection[i];
            if (element < threshold)
                sum += element;
        }

        return sum;
    }
}

To measure the branch misprediction slow down I’ve created BranchMispredictionExecutor class. The class has sorted and unsorted collection: _sortedCollection and _unsortedCollection. There’re also two methods: SortedSumLessThan and UnsortedSumLessThan. First one sums elements from the sorted collection that are less than a passed argument. Second one does the some but against unsorted collection. The collections consist of the same elements. The only difference is the order, hence the methods’ returned value is always the same.

WARNING: In C#, at least at the time of writing this post, there’s not a method to shuffle a collection. The above Shuffle() method is a custom implementation taken from this post.

You might ask what is the point of iterating through a sorted/unsorted collection. Well, the key point is the if (element < threshold) but let’s see the benchmark first and then go to the explanation.

Benchmark

The benchmark code looks like this:

public BranchMispredictionExecutorBenchmarks()
{
    _executor = new BranchMispredictionExecutor(_size);
}

[Benchmark]
public void SortedSumLessThan()
{
    var sum = _executor.SortedSumLessThan(_size / 2);
}

[HardwareCounters(HardwareCounter.BranchMispredictions, HardwareCounter.BranchInstructions)]
public class BranchMispredictionExecutorBenchmarks
{
    private readonly BranchMispredictionExecutor _executor;
    private readonly int _size = 100000;

    public BranchMispredictionExecutorBenchmarks()
    {
        _executor = new BranchMispredictionExecutor(_size);
    }

    [Benchmark]
    public void SortedSumLessThan()
    {
        var sum = _executor.SortedSumLessThan(_size / 2);
    }

    [Benchmark]
    public void UnsortedSumLessThan()
    {
        var sum = _executor.UnsortedSumLessThan(_size / 2);
    }
}

Comparison of execution time of the aforementioned methods looks quite impressive:

As the results present, the sorted array turns out to be much faster (~6.3 times faster) than the unsorted one. What is more (as expected), the unsorted array caused great amount of branch mispredictions and therefore the execution time was so slow.

Explanation

It is high time to go back to the if (element < threshold). You can think of the if statement as of the branch instruction (but at processor level). When it comes to the sorted collection the _executor.SortedSumLessThan(_size / 2) iterated through the collection of the consecutive elements, i.e. 0, 1, 2, … , 99999. The for loop summed first 50000 elements because all of them were less than passed argument, i.e. 50000 (_size/2) So, first 50000 if evaluations were the same (resulted true), thus the processor were able to predict correctly the consecutive executions. On the other hand, unsorted collection didn’t have such a predictive flow. Due the unsorted array the processor weren’t able to correctly predict the consecutive executions as it happened for the sorted collection.

Summary

To sum up, branch misprediction is definitely low level stuff. Presented example was written in C# but of course it applies to other programming languages as well. It is processor-realted feature, which is definitely lower in the abstraction hierarchy. Benchmark results showed that the execution time might me much faster, even if the methods looks the same at first glance. If you are interested in performance counters in BenchmarkDotNet I recommend to read this post.

As always I’ve prepared an example and you can find it on my github, project: branch-misprediction, benchmark project: branch-misprediction-benchmark. I encourage you to go through the code, debug it and try to test your own scenarios.

Have a nice day, bye!

Published in.NET

Be First to Comment

Leave a Reply

Your email address will not be published. Required fields are marked *