Bubble Sort Benchmark: C++ vs .NET Core vs Node
June 12, 2020 - Søren Alsbjerg HørupWe recently hit a bottleneck in a visualization application that we are working on, specifically in the backed of the visualization app. The app consist of an ASP .NET Core backend, a CosmosDB for hot storage, Blob storage for cold storage and a React frontend to visualize both the hot and cold data.
We have the non-functional requirement (NFR) that data must be visible within 5 seconds (easy to test, hard to achieve). Our backend collects the data both from hot and cold storage and exposes this for the React fronted through a REST API. Works fine, but the NFR of 5 seconds are not achievable using this approach since the frontend will “hang” / “freeze” with a spinner until data is ready, which can easily take 20-30 seconds.
I had the idea of streaming data from the backend to the frontend and let the frontend do the processing of the data instead of the back end. While it would still take 20-30 seconds before the data was 100% visible in the frontend, we could at-least achieve the 5 second NFR since we are able to show the data as it is ready, which would greatly enhance the user experience.
The team was not fond of my idea, due to “JavaScript not being fast enough to process the data”. While this might have been true in the nineties, it is no longer true with the V8 engine powering the JavaScript of today. I decided to convince the team through a quick experimental benchmark where I would Implement Bubble Sort in C#/.NET Core and JavaScript/Node and compare the results. Oh, and just for the kicks, I did an implementation in C++ using Clang as compiler.
The source-code of the benchmark can be found on my Github: BubblesortBenchmark
For the benchmark, 50.000 elements are generated and bubble-sorted 10 times to avoid measuring “cold start”.
For the C++ benchmark, the clang compiler is used with optimization and without the -fsanitize=address flag. This flag will introduce bounds checking to make the C++ runtime comparable to that of C# and Node. The -O flag is also used to optimize the compiled code. Two bubble sort implementations have been done for C++, one using std:vector and another using arrays.
For the C# benchmark, DotNet Core 3.1 have been used with -configuration=Release flag to produce a optimized binary. Also for C#, two implementations have been one, one using List and another using a C# array.
For the JavaScript benchmark, a single implementation has been done using JavaScript arrays. Node v12 is used as runtime.
Now for the results!
Taken directly from the shell:
> clang -O -fsanitize=address main.cpp && a.exe
Creating library a.lib and object a.exp
30156ms to sort 50000 elements 10 times (Vector)
22360ms to sort 50000 elements 10 times (Array)
> clang -O main.cpp && a.exe
9984ms to sort 50000 elements 10 times (Vector)
8859ms to sort 50000 elements 10 times (Array)
> dotnet run --configuration=Release
28766ms to sort 50000 elements 10 times (List)
14687ms to sort 50000 elements 10 times (Array)
> node index.js
29131ms to sort 50000 elements 10 times
And visualized as bar graphs (lower is better):
The results are not that surprising.
C++ without address sanitizer outperforms all other implementations. The reason is simple, the program does not check for out of bounds when accessing the array and will therefore save one or more instructions per array access, which for Bubblesort is O(n^2). std:vector and array are also comparable in performance since very little overhead is introduced with the std:vector abstraction.
C# .NET Core using List is nearly two times slower than C# .NET using arrays. This could be attributed to the fact that an array of Integers in .NET is guaranteed to be a continuous set of integers where a List boxes the Integers into objects introducing additional indirect access.
JavaScripts approach to arrays are similar to that of C# lists, which is also evident in the benchmark. Comparing the C# List and JavaScript Array implementations, they are nearly 100% identical! Compared this with the C++ std:vector implementation using address sanitize, all three “list like” implementations performs the same.
One surprising aspect is that C# .NET Core array outperforms all C++ implementations when guarded with the address sanitize flag. I believe this is due to the just-in-time compilation nature of .NET Core, since .NET might deduce that no range-checks are needed during the JIT compilation process and thus save several instructions.
All in all, a nice little benchmark that has convinced the team that we can surely due data processing using JavaScript if needed.
Next step could be, to look into the JavaScript implementation and see if it can be improved by using e.g. Object.seal to increase performance. Another aspect could be to introduce a similar implementation in Rust which I would expect performed the same as the C++ arrays implementation using address sanitizer.