All posts tagged with benchmark

Bubble Sort Benchmark in Rust

July 01, 2020 - Søren Alsbjerg Hørup

One of my colleagues have updated my Bubble Sort Benchmark with a Rust implementation, and the results are in!

> ./start.bat

> clang -O -fsanitize=address main.cpp   && a.exe
   Creating library a.lib and object a.exp
30125ms to sort 50000 elements 10 times (Vector)
22985ms to sort 50000 elements 10 times (Array)

> clang -O main.cpp   && a.exe
10281ms to sort 50000 elements 10 times (Vector)
9906ms to sort 50000 elements 10 times (Array)

> dotnet run --configuration=Release
32547ms to sort 50000 elements 10 times  (List)
15531ms to sort 50000 elements 10 times (Array)

> node index.js
28133ms to sort 50000 elements 10 times

> rustc -O -o rust.exe

> rust.exe
Took: 10.5915778s to sort 50000 elements 10 times

As seen, sorting 50k elements 10 times took 10.59 seconds using Rust. The C++ implementation was a tad faster at 9.9 seconds when using C arrays and 10.28 seconds when using std::vector. However, the C++ implementation does not guarantee against memory corruption, which is the case with Rust. The paid memory safety overhead of 3-7% in Rust, is in my opinion worth it.

Comparing the address sanitized version of the C++ array implementation, Rust is twice as fast fast: 10.59 vs 22.98 seconds. I believe this is due to the implementation of the sanitizer in clang, i.e. it needs protect every memory access, since C++ allows pointer arithmetic. Rust does not allow such programming behavior, unless wrapped in unsafe {}.

Compared with the node and DotNet core implementations of Bubble sort, the Rust implementation is between 50% (DotNet Array) and 300% (DotNet List) faster, which again is not surprising due to the restrictions of the Rust language: which allows for tight optimizations without introducing runtime checks (except for array out of bounds).

Rust is an awesome language and I definitely hope to see it succeed!

Bubble Sort Benchmark: C++ vs .NET Core vs Node

June 12, 2020 - Søren Alsbjerg Hørup

We recently hit a bottleneck in a visualization application that we are working on, specifically in the backed of the visualization app. The app consist of an ASP .NET Core backend, a CosmosDB for hot storage, Blob storage for cold storage and a React frontend to visualize both the hot and cold data.

We have the non-functional requirement (NFR) that data must be visible within 5 seconds (easy to test, hard to achieve). Our backend collects the data both from hot and cold storage and exposes this for the React fronted through a REST API. Works fine, but the NFR of 5 seconds are not achievable using this approach since the frontend will “hang” / “freeze” with a spinner until data is ready, which can easily take 20-30 seconds.

I had the idea of streaming data from the backend to the frontend and let the frontend do the processing of the data instead of the back end. While it would still take 20-30 seconds before the data was 100% visible in the frontend, we could at-least achieve the 5 second NFR since we are able to show the data as it is ready, which would greatly enhance the user experience.

The team was not fond of my idea, due to “JavaScript not being fast enough to process the data”. While this might have been true in the nineties, it is no longer true with the V8 engine powering the JavaScript of today. I decided to convince the team through a quick experimental benchmark where I would Implement Bubble Sort in C#/.NET Core and JavaScript/Node and compare the results. Oh, and just for the kicks, I did an implementation in C++ using Clang as compiler.

The source-code of the benchmark can be found on my Github: BubblesortBenchmark

For the benchmark, 50.000 elements are generated and bubble-sorted 10 times to avoid measuring “cold start”.

For the C++ benchmark, the clang compiler is used with optimization and without the -fsanitize=address flag. This flag will introduce bounds checking to make the C++ runtime comparable to that of C# and Node. The -O flag is also used to optimize the compiled code. Two bubble sort implementations have been done for C++, one using std:vector and another using arrays.

For the C# benchmark, DotNet Core 3.1 have been used with -configuration=Release flag to produce a optimized binary. Also for C#, two implementations have been one, one using List and another using a C# array.

For the JavaScript benchmark, a single implementation has been done using JavaScript arrays. Node v12 is used as runtime.

Now for the results!
Taken directly from the shell:

> clang -O -fsanitize=address main.cpp && a.exe 
Creating library a.lib and object a.exp 
30156ms to sort 50000 elements 10 times (Vector) 
22360ms to sort 50000 elements 10 times (Array)

> clang -O main.cpp && a.exe 
9984ms to sort 50000 elements 10 times (Vector) 
8859ms to sort 50000 elements 10 times (Array)

> dotnet run --configuration=Release 
28766ms to sort 50000 elements 10 times (List) 
14687ms to sort 50000 elements 10 times (Array)

> node index.js 
29131ms to sort 50000 elements 10 times

And visualized as bar graphs (lower is better):

2020 06 12 06 06 15

The results are not that surprising.

C++ without address sanitizer outperforms all other implementations. The reason is simple, the program does not check for out of bounds when accessing the array and will therefore save one or more instructions per array access, which for Bubblesort is O(n^2). std:vector and array are also comparable in performance since very little overhead is introduced with the std:vector abstraction.

C# .NET Core using List is nearly two times slower than C# .NET using arrays. This could be attributed to the fact that an array of Integers in .NET is guaranteed to be a continuous set of integers where a List boxes the Integers into objects introducing additional indirect access.

JavaScripts approach to arrays are similar to that of C# lists, which is also evident in the benchmark. Comparing the C# List and JavaScript Array implementations, they are nearly 100% identical! Compared this with the C++ std:vector implementation using address sanitize, all three “list like” implementations performs the same.

One surprising aspect is that C# .NET Core array outperforms all C++ implementations when guarded with the address sanitize flag. I believe this is due to the just-in-time compilation nature of .NET Core, since .NET might deduce that no range-checks are needed during the JIT compilation process and thus save several instructions.

All in all, a nice little benchmark that has convinced the team that we can surely due data processing using JavaScript if needed.

Next step could be, to look into the JavaScript implementation and see if it can be improved by using e.g. Object.seal to increase performance. Another aspect could be to introduce a similar implementation in Rust which I would expect performed the same as the C++ arrays implementation using address sanitizer.

Speed comparison of the new and old blog

May 25, 2020 - Søren Alsbjerg Hørup

Before migrating from to my GatsbyJS site, I did a website speed comparison using which runs PageSpeed and YSlow tests to determine the speed score of a site.

Performance report of my Wordpress blog

As seen, my site was not exactly a high scoring blog on the Internet. To be fair, I never did anything to improve the performance of the site.

Performance report of my new GatsbyJS blog

The new GatsbyJS site performs extremely well out of the box, much better than my Wordpress site. The comparison is not even an apple to apple comparison, since for the Wordpress site, I only loaded a few of the blog posts on the initial load. For the Gatsby site, I generate an Index file containing ALL my blog posts! The latter allows searching on the site using the browser search, which is just plain awesome.

For the blog, blog posts were lazy loaded as the browser scrolls down. For the Gatsby blog, images are lazy loaded as they are shown. Looks a bit strange, but the responsiveness (as in performance) is a huge win.

The YSlow score could be better, buuut I think I will put that on my endless todo list for another time :-)

Azure Blob Storage Benchmark

July 25, 2019 - Søren Alsbjerg Hørup

Another quick benchmark I did regarding azure was the transfer of 65K messages to an Azure Blob Storage.

My test application was a .NET Core Console application that:

  • Created 1000 tasks.
  • Each tasks would instantiate a message with a unique GUID and populate the message with a payload of 65K.
  • Each task would then push the message to the Blob Storage.
  • This process would be repeated 1000 times.

In total, 1000 x 1000 Blobs were created, transmitted and stored during the run of the application.

The results were that my app was consuming 28% of my CPU and using 600Mbit/s worth of bandwidth - which is the limit of my upload.

Clearly, Azure is not the limiting factor here, neither is my CPU.