Enum Matching vs. Type Matching: Benchmarked
Recently, I have come to the realization that the event delivery system we designed for the new PAL2 API for OpenTK was not exactly comfortable to use. The API was designed with an older, more C like approach, where there is an enum with a listing of all the possible event types, where you would switch on the event type enum, and cast the event arguments as the correct type manually.
void OnEventRaised(PalHandle? handle, PlatformEventType type, EventArgs args)
{
switch (type)
{
case PlatformEventType.MouseDown:
{
MouseButtonDownEventArgs down = (MouseButtonDownEventArgs)args;
// Mouse handling code...
}
break;
// ....
}
}
Whilst this approach is pretty conventional, we have to recognize that we are not writing pure C. The cleanest approach, in my opinion, would be to use a tagged union, but, tagged unions are not available as a language feature in C#. The second cleanest approach would be to use the newer type matching features in C#.
void OnEventRaised(PalHandle? handle, PlatformEventType type, EventArgs args)
{
switch (args)
{
case MouseButtonDownEventArgs mouseButtonDown:
// Mouse handling code...
break;
// ....
}
}
Although window events should ideally never be a bottleneck in regular applications, this did raise a question of whichever approach is the fastest. I feel like the results will intrigue you as much as it did me. Whilst the findings might not affect our final decision, it might be important for something else in another field or in another application where performance is a priority.
The Premise
Imagine a scenario where you are receiving a significant number of events into an event queue per second. Unfortunately for our poor program, it is very important that these events be handled as quickly as possible. Each event has something to identify its type, and must be processed by different code paths.
For our premise, I will make each unique branch actually do the same work This keeps the benchmark code cleaner and is one less variable to account for.
Different Approaches
1. Matching with an enum with a concrete member defined in the base type
If there happen to be any seasoned C/C++ developers reading this, this is likely
the approach you are most familiar with. Since these languages don’t have any
way to identify types at runtime1, most codebases which require this behavior
have created their own tagged union type using union and struct, or perhaps
using C++ inheritance with class.
C++.
enum MyEventType { Foo, Bar, Baz }
// Fancy primary constructor to keep the psuedocode short and concise.
class MyEvent(MyEventType type) : EventArgs
{
public MyEventType Type { get; } = type;
}
// Later
switch (args.Type)
{
case MyEventType.Foo: // ...
}
2. Matching with an enum with a abstract/virtual member defined in the base type
A more object oriented programming approach might be to introduce a virtual member instead of passing the value in a constructor. The reasoning is simple: what if we want to delegate the exact type of the event to a class in the inheritance hierarchy without using a constructor every step of the way?
abstract class MyEvent : EventArgs
{
public abstract MyEventType Type { get; }
}
3. Type Matching
The final approach would be to use the newest language features with type matching in switch case statements. This is probably the prettiest solution out of the bunch, and it is less prone to human error because there is no secondary language construct involved.
class FooEventArgs : EventArgs {}
class BarEventArgs : EventArgs {}
class BazEventArgs : EventArgs {}
// Later
switch (args)
{
case FooEventArgs foo: // ...
}
Methodology
In order to test the speed of the different approaches I wrote a BenchmarkDotnet test case which I can run on my own computer as well as share to other people for them to test. Publishing the findings on Discord and refining with multiple people on the server we have come up with the following method:
- Create a base enum to use for the enum based approaches.
- Create a base type to derive all the unique event types.
- Define one field with a concrete member.
- Define another field with an abstract member of the same enum. This virtual member is created in order to factor in the cost of a virtual function call within a tight loop.
- Create a set of non-sealed unique classes which represent each event type.
- For each unique class create a new member which can be used to simulate a workload. In our testing we used a float value to sum.
NOTE
I have chosen to make the workload be a member of the child class to prevent the C# and JIT compiler from doing any sort of optimization feasibly. If it were a member of the parent class, the compiler could just chose to not do any type resolution at all.
- Create a
sealedvariant of each unique class to test against. - For each test case:
- Generate a large number of the events randomly, cache into a list. The generated events are of the sealed kind, so the same objects may be use for all tests.
- The events are dispatched through delegate invocation in a loop to simulate a more realistic code base.
- The event handler is expected to sum all of the numbers defined in the events.
- The test case exits with the final result printed out to standard output.
With our test case, we have chosen to use 16 unique events with approximately 32 million instances. However, feel free to extend our example to more event types and share the results with us on your own time.
The source used for the benchmarks can be found here: https://github.com/utkumaden/OpenTK.EventDeliveryTest.
Testing and Results
BenchmarkDotNet v0.15.8, Linux Fedora Linux 43 (Workstation Edition) AMD Ryzen 7 8700G w/ Radeon 780M Graphics 2.91GHz, 1 CPU, 16 logical and 8 physical cores [Host] : .NET 10.0.1 (10.0.1, 10.0.125.57005), X64 RyuJIT x86-64-v4
Method Mean Error StdDev Median Ratio RatioSD Allocated Alloc Ratio MatchOnEnum 258.5 ms 1.86 ms 1.74 ms 258.1 ms 1.03 0.03 168 B 1.17 MatchOnEnumVirtual 457.9 ms 1.76 ms 1.56 ms 458.1 ms 1.82 0.05 144 B 1.00 MatchOnType 654.4 ms 3.94 ms 3.68 ms 653.7 ms 2.60 0.07 144 B 1.00 MatchOnEnumSealed 252.2 ms 4.99 ms 6.67 ms 247.9 ms 1.00 0.04 144 B 1.00 MatchOnEnumVirtualSealed 454.4 ms 2.05 ms 1.92 ms 453.6 ms 1.80 0.05 144 B 1.00 MatchOnTypeSealed 266.0 ms 1.24 ms 1.16 ms 265.4 ms 1.06 0.03 168 B 1.17 Mean : Arithmetic mean of all measurements Error : Half of 99.9% confidence interval StdDev : Standard deviation of all measurements Median : Value separating the higher half of all measurements (50th percentile) Ratio : Mean of the ratio distribution ([Current]/[Baseline]) RatioSD : Standard deviation of the ratio distribution ([Current]/[Baseline]) Allocated : Allocated memory per single operation (managed only, inclusive, 1KB = 1024B) Alloc Ratio : Allocated memory ratio distribution ([Current]/[Baseline]) 1 ms : 1 Millisecond (0.001 sec)
As you can see in the results, using sealed on the types makes a huge
difference in the performance. Making the type sealed enables the compiler to
generate code which is much more efficient at matching the type. It is also
important to recognize that it is not always possible or beneficial to make
types sealed, especially in a toolkit like OpenTK. It might be valuable to have
a platform specific event structure which may have details specific to each
driver.
Although the non-sealed type matching syntax is somewhat slower, matching on types is significantly more advantageous to the human writing the code. If you can guarantee that the types will be sealed, there is no need to mess around with writing an enum as it has very diminishing returns for applications without significant performance considerations.
The results are way clearer when you consider the test cases where a virtual member is used. The runtime has to either resolve the actual method to run, or resolve the type and then execute the concrete method, if it even is concrete to begin with. The key factor to keep in mind when writing any code is how can you help the runtime eliminate which code paths are not possible in your code base, and an enum is an excellent model for that. Although your code base may never run into certain cases, the runtime does not see the code as you mentally model it. It only has a limited view into your code to inspect and a list of optimizations it can do safely when certain conditions are satisfied.
A very close runner up, the type itself happens to be the second best option. The current runtime implementation assigns each type a unqique type ID, which is used to match types. The type IDs, being integers, are incredibly fast to compare. If the type being matched is sealed, the code only has to compare the ID of the type itself as it cannot have any inheritors. Otherwise, it will have to find all decendent types and compare their hash codes as well.
Performance for .NET 8.0 and likely older versions.
When I started writing this blogpost, I tested in .NET 8.0. The .NET team works very hard to improve the performance of the generated code through the releases. Interestingly, matching on enums with non-sealed types is signigicantly slower than type matching on sealed types in the older version.
Method Mean Error StdDev Ratio Allocated Alloc Ratio MatchOnEnum 290.6 ms 0.62 ms 0.49 ms 1.22 168 B 1.17 MatchOnEnumVirtual 483.9 ms 3.78 ms 3.54 ms 2.03 144 B 1.00 MatchOnType 677.3 ms 1.84 ms 1.72 ms 2.85 144 B 1.00 MatchOnEnumSealed 238.0 ms 0.51 ms 0.47 ms 1.00 144 B 1.00 MatchOnEnumVirtualSealed 447.5 ms 0.28 ms 0.23 ms 1.88 144 B 1.00 MatchOnTypeSealed 273.5 ms 1.09 ms 1.02 ms 1.15 168 B 1.17
Conclusions
Comparing the results, when it comes to absolute raw performance, the clear
winner is still the good old fashioned enum based matching method. When you
factor in sealed types, the type matching switch expression which has many
syntactic advantages and is easier to maintain overall becomes a very
competitive option. The biggest takeaway here is that you should avoid virtual
methods in performance critical code, which I believe is well known amongst the
performance concious developers in the field. Please note that benchmarks like
this should not be used to make architectural decisions unless it is absolutely
necessary to improve the performance of a bottleneck.
It is also important to highlight how little the performance of the event matching matters in the use case of OpenTK. In the constraint of a 16ms frame time budget (for 60fps), you can handle hundreds of thousands of events, meanwhile realistically you will recieve less 20 to 30 events at most under most circumstances. This experiement was more to satisfy my curiousity and I find the results surprisingly close between some test cases.
I highly suggest reading the source code, as well as the assembly output of the benchmark code in SharpLab.io for further insights into this test, the results and more information. Feel free to try and modify the code at home as well as discuss further improvements in the issue attached to this article.
Further Reading
-
assuming you don’t use RTTI constructs like
typeid()available in modern ↩
Comments