{"id":281,"date":"2022-03-26T23:18:17","date_gmt":"2022-03-26T23:18:17","guid":{"rendered":"https:\/\/baldsolutions.com\/?p=281"},"modified":"2022-04-10T10:36:11","modified_gmt":"2022-04-10T10:36:11","slug":"branch-misprediction-slow-down","status":"publish","type":"post","link":"https:\/\/baldsolutions.com\/index.php\/2022\/03\/26\/branch-misprediction-slow-down\/","title":{"rendered":"Branch misprediction &#8211; slow down"},"content":{"rendered":"\n<h1 class=\"wp-block-heading\">Introduction<\/h1>\n\n\n\n<p>Branch misprediction, as title suggests, might slow down program execution time. If you have never heard about the topic I recommend to read the stackoverflow <a href=\"https:\/\/stackoverflow.com\/questions\/11227809\/why-is-processing-a-sorted-array-faster-than-processing-an-unsorted-array\/11227902#11227902\" target=\"_blank\" rel=\"noreferrer noopener\" title=\"post\">post<\/a>. Long story short, your processor sometimes (for branch instructions, namely &#8220;if&#8221; instruction) tries to predict what will be executed in the future to speed up execution time. As you can imagine, correct prediction is beneficial, whereas incorrect one costs additional time. Does it have significant impact? Let&#8217;s find out!<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Example<\/h1>\n\n\n\n<p>Let&#8217;s start with a simple code snippet.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>public class BranchMispredictionExecutor\n{\n    private readonly List&lt;int&gt; _sortedCollection;\n    private readonly List&lt;int&gt; _unsortedCollection;\n\n    public BranchMispredictionExecutor(int count)\n    {\n        _sortedCollection = Enumerable.Range(0, count).ToList();\n        _unsortedCollection = Enumerable.Range(0, count).ToList();\n        _unsortedCollection.Shuffle();\n    }\n\n    public int SortedSumLessThan(int threshold)\n    {\n        var sum = 0;\n\n        for (int i = 0; i &lt; _sortedCollection.Count; i++)\n        {\n            var element = _sortedCollection&#91;i];\n            if (element &lt; threshold)\n                sum += element;\n        }\n\n        return sum;\n    }\n\n    public int UnsortedSumLessThan(int threshold)\n    {\n        var sum = 0;\n\n        for (int i = 0; i &lt; _unsortedCollection.Count; i++)\n        {\n            var element = _unsortedCollection&#91;i];\n            if (element &lt; threshold)\n                sum += element;\n        }\n\n        return sum;\n    }\n}<\/code><\/pre>\n\n\n\n<p>To measure the branch misprediction slow down I&#8217;ve created <code>BranchMispredictionExecutor<\/code> class.  The class has sorted and unsorted collection: <code>_sortedCollection<\/code> and <code>_unsortedCollection<\/code>. There&#8217;re also two methods: <code>SortedSumLessThan<\/code> and <code>UnsortedSumLessThan<\/code>. First one sums elements from the sorted collection that are less than a passed argument. Second one does the some but against unsorted collection. The collections consist of the same elements. The only difference is the order, hence the methods&#8217; returned value is always the same.<\/p>\n\n\n\n<p>WARNING: In C#, at least at the time of writing this post, there&#8217;s not a method to shuffle a collection. The above <code>Shuffle()<\/code> method  is a custom implementation taken from this <a href=\"https:\/\/stackoverflow.com\/questions\/273313\/randomize-a-listt\" target=\"_blank\" rel=\"noreferrer noopener\" title=\"post\">post<\/a>.<\/p>\n\n\n\n<p>You might ask what is the point of iterating through a sorted\/unsorted collection. Well, the key point is the <code>if (element &lt; threshold) <\/code>but let&#8217;s see the benchmark first and then go to the explanation.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Benchmark<\/h1>\n\n\n\n<p>The benchmark code looks like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>public BranchMispredictionExecutorBenchmarks()\n{\n    _executor = new BranchMispredictionExecutor(_size);\n}\n\n&#91;Benchmark]\npublic void SortedSumLessThan()\n{\n    var sum = _executor.SortedSumLessThan(_size \/ 2);\n}\n\n&#91;HardwareCounters(HardwareCounter.BranchMispredictions, HardwareCounter.BranchInstructions)]\npublic class BranchMispredictionExecutorBenchmarks\n{\n    private readonly BranchMispredictionExecutor _executor;\n    private readonly int _size = 100000;\n\n    public BranchMispredictionExecutorBenchmarks()\n    {\n        _executor = new BranchMispredictionExecutor(_size);\n    }\n\n    &#91;Benchmark]\n    public void SortedSumLessThan()\n    {\n        var sum = _executor.SortedSumLessThan(_size \/ 2);\n    }\n\n    &#91;Benchmark]\n    public void UnsortedSumLessThan()\n    {\n        var sum = _executor.UnsortedSumLessThan(_size \/ 2);\n    }\n}<\/code><\/pre>\n\n\n\n<p>Comparison of execution time of the aforementioned methods looks quite impressive:<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"832\" height=\"175\" src=\"https:\/\/baldsolutions.com\/wp-content\/uploads\/2022\/03\/image-7.png\" alt=\"\" class=\"wp-image-285\" srcset=\"https:\/\/baldsolutions.com\/wp-content\/uploads\/2022\/03\/image-7.png 832w, https:\/\/baldsolutions.com\/wp-content\/uploads\/2022\/03\/image-7-300x63.png 300w, https:\/\/baldsolutions.com\/wp-content\/uploads\/2022\/03\/image-7-768x162.png 768w\" sizes=\"auto, (max-width: 832px) 100vw, 832px\" \/><\/figure><\/div>\n\n\n\n<p>As the results present, the sorted array turns out to be much faster (~6.3 times faster) than the unsorted one.  What is more (as expected), the unsorted array caused great amount of branch mispredictions and therefore the execution time was so slow.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Explanation<\/h1>\n\n\n\n<p>It is high time to go back to the <code>if (element &lt; threshold)<\/code>.  You can think of the <code>if<\/code> statement as of the branch instruction (but at processor level). When it comes to the sorted collection the <code>_executor.SortedSumLessThan(_size \/ 2)<\/code> iterated through the collection of the consecutive elements, i.e. 0, 1, 2, &#8230; , 99999. The <code>for<\/code> loop summed first 50000 elements because all of them were less than passed argument, i.e. 50000 (<code>_size\/2<\/code>) So, first 50000 <code>if<\/code> evaluations were the same (resulted <code>true<\/code>), thus the processor were able to predict correctly the consecutive executions.  On the other hand, unsorted collection didn&#8217;t have such a predictive flow. Due the unsorted array the processor weren&#8217;t able to correctly predict the consecutive executions as it happened for the sorted collection.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Summary<\/h1>\n\n\n\n<p>To sum up, branch misprediction is definitely low level stuff. Presented example was written in C# but of course it applies to other programming languages as well. It is processor-realted feature, which is definitely lower in the abstraction hierarchy. Benchmark results showed that the execution time might me much faster, even if the methods looks the same at first glance. If you are interested in performance counters in BenchmarkDotNet I recommend to read this <a href=\"https:\/\/adamsitnik.com\/Hardware-Counters-Diagnoser\/\" target=\"_blank\" rel=\"noreferrer noopener\" title=\"post\">post<\/a>.<\/p>\n\n\n\n<p>As always I&#8217;ve prepared an example and you can find it on my <a href=\"https:\/\/github.com\/tglowka\/baldsolutions\/tree\/master\/.NET\" target=\"_blank\" rel=\"noreferrer noopener\" title=\"github\">github<\/a>, project: <a href=\"https:\/\/github.com\/tglowka\/baldsolutions\/tree\/master\/.NET\/branch-misprediction\" target=\"_blank\" rel=\"noreferrer noopener\">branch-misprediction<\/a>, benchmark project: <a href=\"https:\/\/github.com\/tglowka\/baldsolutions\/tree\/master\/.NET\/branch-misprediction-benchmark\" target=\"_blank\" rel=\"noreferrer noopener\">branch-misprediction-benchmark.<\/a> I encourage you to go through the code, debug it and try to test your own scenarios.<\/p>\n\n\n\n<p>Have a nice day, bye!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction Branch misprediction, as title suggests, might slow down program execution time. If you have never heard about the topic I recommend to read the&#8230;<\/p>\n<div class=\"more-link-wrapper\"><a class=\"more-link\" href=\"https:\/\/baldsolutions.com\/index.php\/2022\/03\/26\/branch-misprediction-slow-down\/\">Continue reading<span class=\"screen-reader-text\">Branch misprediction &#8211; slow down<\/span><\/a><\/div>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[2],"tags":[],"class_list":["post-281","post","type-post","status-publish","format-standard","hentry","category-net","entry"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/baldsolutions.com\/index.php\/wp-json\/wp\/v2\/posts\/281","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/baldsolutions.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/baldsolutions.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/baldsolutions.com\/index.php\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/baldsolutions.com\/index.php\/wp-json\/wp\/v2\/comments?post=281"}],"version-history":[{"count":14,"href":"https:\/\/baldsolutions.com\/index.php\/wp-json\/wp\/v2\/posts\/281\/revisions"}],"predecessor-version":[{"id":325,"href":"https:\/\/baldsolutions.com\/index.php\/wp-json\/wp\/v2\/posts\/281\/revisions\/325"}],"wp:attachment":[{"href":"https:\/\/baldsolutions.com\/index.php\/wp-json\/wp\/v2\/media?parent=281"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/baldsolutions.com\/index.php\/wp-json\/wp\/v2\/categories?post=281"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/baldsolutions.com\/index.php\/wp-json\/wp\/v2\/tags?post=281"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}