{"id":434,"date":"2022-10-09T20:19:18","date_gmt":"2022-10-09T20:19:18","guid":{"rendered":"https:\/\/baldsolutions.com\/?p=434"},"modified":"2022-10-14T15:34:22","modified_gmt":"2022-10-14T15:34:22","slug":"vectorization-simd-performance-booster","status":"publish","type":"post","link":"https:\/\/baldsolutions.com\/index.php\/2022\/10\/09\/vectorization-simd-performance-booster\/","title":{"rendered":"Vectorization (SIMD) &#8211; performance booster"},"content":{"rendered":"\n<h1 class=\"wp-block-heading\">Introduction<\/h1>\n\n\n\n<p class=\"wp-block-paragraph\">How would you implement a <em>Max() <\/em>method that returns the maximum value element in the array of int values? Well, the most obvious approach would be to iterate through the array and compare each element to the so far max value. And this is 100% correct. However, there is another way to do the same with significantly better performance. The faster approach can be implemented thanks to the concept of the vectorization (SIMD &#8211; Single Instruction Multiple Data is referred to as vectorization where operations are applied to all of the elements in a \u201cvector\u201d at the same time &#8211; as mentioned <a href=\"https:\/\/devblogs.microsoft.com\/dotnet\/performance_improvements_in_net_7\/#vectorization\" target=\"_blank\" rel=\"noopener\" title=\"\">here<\/a>). We can achieve such a vectorization by using e.g. the <a href=\"https:\/\/learn.microsoft.com\/en-us\/dotnet\/api\/system.numerics.vector-1?view=net-6.0\" target=\"_blank\" rel=\"noopener\">Vector<\/a> struct.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Alright, enough talk, it&#8217;s benchmark time! <\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Benchmark<\/h1>\n\n\n\n<p class=\"wp-block-paragraph\">I prepared an extension method:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>public static class VectorizationExtension\n{\n\n    public static int Max_Vectorized(this IEnumerable&lt;int&gt; source)\n...<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The following benchmark results show how the <code>Max_Vectorized <\/code>method performs in comparison to the <code>Max<\/code> extension method provided by the LINQ.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"966\" height=\"406\" src=\"https:\/\/baldsolutions.com\/wp-content\/uploads\/2022\/10\/image.png\" alt=\"\" class=\"wp-image-440\" srcset=\"https:\/\/baldsolutions.com\/wp-content\/uploads\/2022\/10\/image.png 966w, https:\/\/baldsolutions.com\/wp-content\/uploads\/2022\/10\/image-300x126.png 300w, https:\/\/baldsolutions.com\/wp-content\/uploads\/2022\/10\/image-768x323.png 768w\" sizes=\"auto, (max-width: 966px) 100vw, 966px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">As you can see the the <code>Max_Vectorized <\/code>method performance is amazing! So here is the question, if the Vector-based Max method  is so much faster, why it is not used in LINQ then? Well, it is, but in <a href=\"https:\/\/devblogs.microsoft.com\/dotnet\/performance_improvements_in_net_7\/#linq\" target=\"_blank\" rel=\"noopener\">.NET 7<\/a>!<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"969\" height=\"412\" src=\"https:\/\/baldsolutions.com\/wp-content\/uploads\/2022\/10\/image-1.png\" alt=\"\" class=\"wp-image-441\" srcset=\"https:\/\/baldsolutions.com\/wp-content\/uploads\/2022\/10\/image-1.png 969w, https:\/\/baldsolutions.com\/wp-content\/uploads\/2022\/10\/image-1-300x128.png 300w, https:\/\/baldsolutions.com\/wp-content\/uploads\/2022\/10\/image-1-768x327.png 768w\" sizes=\"auto, (max-width: 969px) 100vw, 969px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">If you compare .NET 7  results you can see that the <code>Max<\/code> method is roughly as fast as  <code>Max_Vectorized<\/code> method. At this point I have no idea why the <code>Max_Vectorized<\/code> is faster as it&#8217;s based on the same concept but it&#8217;s  probably because of the advanced level of the<code>Max <\/code>method implementation (see the implementation <a href=\"https:\/\/github.com\/dotnet\/runtime\/blob\/main\/src\/libraries\/System.Linq\/src\/System\/Linq\/MaxMin.cs\" target=\"_blank\" rel=\"noopener\">here<\/a>).<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Summary<\/h1>\n\n\n\n<p class=\"wp-block-paragraph\">To sum up, vectorization is a powerful way to improve the performance in some cases. It is definitely good to know that .NET has a support of Vectors as some hot paths may require to speed them up by using this little trick. I highly recommend to look at the <a href=\"https:\/\/devblogs.microsoft.com\/dotnet\/performance_improvements_in_net_7\/\" target=\"_blank\" rel=\"noopener\">Performance Improvements in .NET 7<\/a> (or at least some part of this huuuge document) as it&#8217;s a great reference in terms of further experiments and searches. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">As always I&#8217;ve prepared an example and you can find it on my <a href=\"https:\/\/github.com\/tglowka\/baldsolutions\/tree\/master\/.NET\" target=\"_blank\" rel=\"noreferrer noopener\">github<\/a>, project: <a href=\"https:\/\/github.com\/tglowka\/baldsolutions\/tree\/master\/.NET\/Vectorization\" target=\"_blank\" rel=\"noopener\">Vectorization<\/a>, test project: <a href=\"https:\/\/github.com\/tglowka\/baldsolutions\/tree\/master\/.NET\/Vectorization-tests\" target=\"_blank\" rel=\"noopener\">Vectorization-tests<\/a>, benchmark project:  <a href=\"https:\/\/github.com\/tglowka\/baldsolutions\/tree\/master\/.NET\/Vectorization-benchmark\" target=\"_blank\" rel=\"noopener\">Vectorization-benchmark<\/a>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Have a nice day, bye!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction How would you implement a Max() method that returns the maximum value element in the array of int values? Well, the most obvious approach&#8230;<\/p>\n<div class=\"more-link-wrapper\"><a class=\"more-link\" href=\"https:\/\/baldsolutions.com\/index.php\/2022\/10\/09\/vectorization-simd-performance-booster\/\">Continue reading<span class=\"screen-reader-text\">Vectorization (SIMD) &#8211; performance booster<\/span><\/a><\/div>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[2],"tags":[],"class_list":["post-434","post","type-post","status-publish","format-standard","hentry","category-net","entry"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/baldsolutions.com\/index.php\/wp-json\/wp\/v2\/posts\/434","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/baldsolutions.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/baldsolutions.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/baldsolutions.com\/index.php\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/baldsolutions.com\/index.php\/wp-json\/wp\/v2\/comments?post=434"}],"version-history":[{"count":12,"href":"https:\/\/baldsolutions.com\/index.php\/wp-json\/wp\/v2\/posts\/434\/revisions"}],"predecessor-version":[{"id":455,"href":"https:\/\/baldsolutions.com\/index.php\/wp-json\/wp\/v2\/posts\/434\/revisions\/455"}],"wp:attachment":[{"href":"https:\/\/baldsolutions.com\/index.php\/wp-json\/wp\/v2\/media?parent=434"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/baldsolutions.com\/index.php\/wp-json\/wp\/v2\/categories?post=434"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/baldsolutions.com\/index.php\/wp-json\/wp\/v2\/tags?post=434"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}