We recently hit a bottleneck in a visualization application that we are working on, specifically in the backed of the visualization app. The app consist of an ASP .NET Core backend, a CosmosDB for hot storage, Blob storage for cold storage and a React frontend to visualize both the hot and cold data.
We have the non-functional requirement (NFR) that data must be visible within 5 seconds (easy to test, hard to achieve). Our backend collects the data both from hot and cold storage and exposes this for the React fronted through a REST API. Works fine, but the NFR of 5 seconds are not achievable using this approach since the frontend will “hang” / “freeze” with a spinner until data is ready, which can easily take 20-30 seconds.
I had the idea of streaming data from the backend to the frontend and let the frontend do the processing of the data instead of the back end. While it would still take 20-30 seconds before the data was 100% visible in the frontend, we could at-least achieve the 5 second NFR since we are able to show the data as it is ready, which would greatly enhance the user experience.
The source-code of the benchmark can be found on my Github: BubblesortBenchmark
For the benchmark, 50.000 elements are generated and bubble-sorted 10 times to avoid measuring “cold start”.
For the C++ benchmark, the clang compiler is used with optimization and without the -fsanitize=address flag. This flag will introduce bounds checking to make the C++ runtime comparable to that of C# and Node. The -O flag is also used to optimize the compiled code. Two bubble sort implementations have been done for C++, one using std:vector and another using arrays.
For the C# benchmark, DotNet Core 3.1 have been used with -configuration=Release flag to produce a optimized binary. Also for C#, two implementations have been one, one using List and another using a C# array.
Now for the results!
Taken directly from the shell:
> clang -O -fsanitize=address main.cpp && a.exe
Creating library a.lib and object a.exp
30156ms to sort 50000 elements 10 times (Vector)
22360ms to sort 50000 elements 10 times (Array)
> clang -O main.cpp && a.exe
9984ms to sort 50000 elements 10 times (Vector)
8859ms to sort 50000 elements 10 times (Array)
> dotnet run --configuration=Release
28766ms to sort 50000 elements 10 times (List)
14687ms to sort 50000 elements 10 times (Array)
> node index.js
29131ms to sort 50000 elements 10 times
And visualized as bar graphs (lower is better):
The results are not that surprising.
C++ without address sanitizer outperforms all other implementations. The reason is simple, the program does not check for out of bounds when accessing the array and will therefore save one or more instructions per array access, which for Bubblesort is O(n^2). std:vector and array are also comparable in performance since very little overhead is introduced with the std:vector abstraction.
C# .NET Core using List is nearly two times slower than C# .NET using arrays. This could be attributed to the fact that an array of Integers in .NET is guaranteed to be a continuous set of integers where a List boxes the Integers into objects introducing additional indirect access.
One surprising aspect is that C# .NET Core array outperforms all C++ implementations when guarded with the address sanitize flag. I believe this is due to the just-in-time compilation nature of .NET Core, since .NET might deduce that no range-checks are needed during the JIT compilation process and thus save several instructions.