{"id":145,"date":"2022-02-13T13:06:49","date_gmt":"2022-02-13T13:06:49","guid":{"rendered":"https:\/\/baldsolutions.com\/?p=145"},"modified":"2023-04-17T14:37:06","modified_gmt":"2023-04-17T14:37:06","slug":"data-locality-what-makes-your-code-slow","status":"publish","type":"post","link":"https:\/\/baldsolutions.com\/index.php\/2022\/02\/13\/data-locality-what-makes-your-code-slow\/","title":{"rendered":"Data locality &#8211; what makes your code slow?"},"content":{"rendered":"\n<h1 class=\"wp-block-heading\" id=\"introduction\">Introduction<\/h1>\n\n\n\n<p>In this post I won&#8217;t be describing stuff strictly related to the .NET. The data locality topic is more about the processor cache and how it can affect the execution time of your application. Don&#8217;t worry though, the proper benchmark and the code itself will be written in C#, so read on! In the previous <a href=\"https:\/\/baldsolutions.com\/index.php\/2022\/02\/06\/async-eliding-part-2-drawbacks\/\" title=\"post\">post<\/a> I had unit tests but no benchmark. This time both the benchmark and unit tests are with us! <\/p>\n\n\n\n<h1 class=\"wp-block-heading\" id=\"quick-test\">Quick test<\/h1>\n\n\n\n<p>Before we deep dive into the data locality I have a question for you. Let&#8217;s consider the following code snippets:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>public void IterateWriteSequential()\n{\n    for (int i = 0; i &lt; _size; i++)\n    {\n        for (int j = 0; j &lt; _size; j++)\n        {\n            _array&#91;i, j] = 5;\n        }\n    }\n}<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>public void IterateWriteNonSequential()\n{\n    for (int i = 0; i &lt; _size; i++)\n    {\n        for (int j = 0; j &lt; _size; j++)\n        {\n            _array&#91;j, i] = 5;\n        }\n    }\n}<\/code><\/pre>\n\n\n\n<p>The code snippets are pretty straightforward (no tricky stuff here). Both of them do the same &#8211; assignment of &#8220;5&#8221; to all fields of the multidimensional array. The only difference is that the <code>IterateWriteSequential<\/code> method does it in row-by-row and the <code>IterateWriteNonSequential<\/code> method does it in a column-by-column way. In the test project you can ensure that the methods do the same. <\/p>\n\n\n\n<p>So, how the way the iteration goes through the multidimensional array can affect the execution time of your application?<\/p>\n\n\n\n<p>There are three options you can choose from: <\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>You have no idea.<\/li>\n\n\n\n<li>You exactly know what&#8217;s going on.<\/li>\n\n\n\n<li>You are not sure, but you guess.<\/li>\n<\/ol>\n\n\n\n<p>No matter what your answer is you should read on either to find out or verify my explanation of the topic.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\" id=\"benchamrk-execution-time\">Benchamrk &#8211; execution time<\/h1>\n\n\n\n<p>Let&#8217;s take a look at the benchmark results of the aforementioned methods (size means the rows\/columns count, size = 100 means an array 100&#215;100):<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"623\" height=\"277\" src=\"https:\/\/baldsolutions.com\/wp-content\/uploads\/2022\/02\/image-3.png\" alt=\"\" class=\"wp-image-150\" srcset=\"https:\/\/baldsolutions.com\/wp-content\/uploads\/2022\/02\/image-3.png 623w, https:\/\/baldsolutions.com\/wp-content\/uploads\/2022\/02\/image-3-300x133.png 300w\" sizes=\"auto, (max-width: 623px) 100vw, 623px\" \/><figcaption class=\"wp-element-caption\">Multidimensional array benchmark.<\/figcaption><\/figure>\n\n\n\n<p>I also ran similar benchmark but for the case with jagged array (i.e. int[][] instead of int[,]). Here are the benchmark results:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"644\" height=\"284\" src=\"https:\/\/baldsolutions.com\/wp-content\/uploads\/2022\/02\/image-4.png\" alt=\"\" class=\"wp-image-151\" srcset=\"https:\/\/baldsolutions.com\/wp-content\/uploads\/2022\/02\/image-4.png 644w, https:\/\/baldsolutions.com\/wp-content\/uploads\/2022\/02\/image-4-300x132.png 300w\" sizes=\"auto, (max-width: 644px) 100vw, 644px\" \/><figcaption class=\"wp-element-caption\">Jagged array benchmark.<\/figcaption><\/figure>\n\n\n\n<p>As you can see for the small arrays (size = 100) it does not matter, but as the array size increases, increases the execution time difference (significantly). Let&#8217;s focus on the greatest size, i.e. 10_000 (100_000_000 elements). For multidimensional array the <code>IterateWriteSequential<\/code> method execution time is almost 3 times faster than the <code>IterateWriteNonSequential<\/code>. For the jagged array the difference is even bigger. The <code>IterateWriteSequential<\/code> method is more than 6 times faster! Pretty impressive, isn&#8217;t it?<\/p>\n\n\n\n<h1 class=\"wp-block-heading\" id=\"explanation\">Explanation<\/h1>\n\n\n\n<p>First of all let&#8217;s take a look at how the DRAM memory is organised. The DRAM cells are packed into matrices. To read the cell the row and the column in required. Sequential read (row number the same, iterate through columns) is much faster than reading the data by frequently changing row (not in a sequential manner).<\/p>\n\n\n\n<p>Secondly, there is the processor cache. When the processor needs to read form the memory it firstly tries to take it from its cache. If it is not present in the cache it fetches the data from the DRAM memory. It is obvious that the cache data fetching is much faster that the fetching from the RAM. To transfer data from DRAM to the cache a cache line is used. The cache line is fixed size (most commonly 64 bytes). It means you cannot transfer only 1 byte to your cache, all 64 bytes will be fetched. With this in mind we can imagine how bad &#8220;DRAM->cache&#8221; data transfer looks like when it comes to iterate through array in a non-sequential way. Reading consecutive elements (in non-sequential way) we ask for data that is not next to each other thereby we frequently use &#8220;DRAM->cache&#8221; transfer. Thanks to this your application, even if semantics is the same, the underlying operations might be way far from optimal.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\" id=\"summary\">Summary<\/h1>\n\n\n\n<p>Data locality is something that you probably not think of when it comes to write an application. The consequences might never come into play as the data locality don&#8217;t matter for small data. However, once in a while the hot-paths of your application might need be improved and the data locality might turn out to be the key to success.<\/p>\n\n\n\n<p>As always I&#8217;ve prepared an example and you can find it on my <a href=\"https:\/\/github.com\/tglowka\/baldsolutions\/tree\/master\/.NET\" target=\"_blank\" rel=\"noreferrer noopener\" title=\"github\">github<\/a>, project: data-locality, test project: data-locality-tests, benchmark project: data-locality-benchmark. I encourage you to go through the code, debug it and try to test your own scenarios.<\/p>\n\n\n\n<p>Have a nice day, bye!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction In this post I won&#8217;t be describing stuff strictly related to the .NET. The data locality topic is more about the processor cache and&#8230;<\/p>\n<div class=\"more-link-wrapper\"><a class=\"more-link\" href=\"https:\/\/baldsolutions.com\/index.php\/2022\/02\/13\/data-locality-what-makes-your-code-slow\/\">Continue reading<span class=\"screen-reader-text\">Data locality &#8211; what makes your code slow?<\/span><\/a><\/div>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[2],"tags":[],"class_list":["post-145","post","type-post","status-publish","format-standard","hentry","category-net","entry"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/baldsolutions.com\/index.php\/wp-json\/wp\/v2\/posts\/145","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/baldsolutions.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/baldsolutions.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/baldsolutions.com\/index.php\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/baldsolutions.com\/index.php\/wp-json\/wp\/v2\/comments?post=145"}],"version-history":[{"count":14,"href":"https:\/\/baldsolutions.com\/index.php\/wp-json\/wp\/v2\/posts\/145\/revisions"}],"predecessor-version":[{"id":696,"href":"https:\/\/baldsolutions.com\/index.php\/wp-json\/wp\/v2\/posts\/145\/revisions\/696"}],"wp:attachment":[{"href":"https:\/\/baldsolutions.com\/index.php\/wp-json\/wp\/v2\/media?parent=145"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/baldsolutions.com\/index.php\/wp-json\/wp\/v2\/categories?post=145"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/baldsolutions.com\/index.php\/wp-json\/wp\/v2\/tags?post=145"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}