There doesn’t appear to be a high performance concurrent counter available in the .NET framework. I timed various simple implementations on my Core 2 Duo laptop:
- A single thread can increment a static field 1.5 billion times in 4 seconds.
- Two threads that update a static field runs slightly slower than 4 seconds probably due to the locking overhead.
- Two threads that use Interlocked.Increment to update a shared field takes 52 seconds!! Why is it so slow?
- Two threads that update a ThreadStatic field takes 2.5 seconds. Of course this isn’t a global counter, but it sets a baseline for performance.
- Two threads that update separate fields takes 3.7 seconds. Why? Probably because of cache contention.
- Two threads that update separate fields on different cache lines takes 2 seconds. A cache line is 64 bytes, or 8 long fields. So I added 7 long fields to force the 2 counter fields on different cache lines.
- Same as #6, but use an array of counters rather than fields. Now takes 8 seconds. Is this because of bounds checking?
- Same as #6, but use CompareExchange (CAS) to update fields, takes 22 seconds. There’s no contention and it always succeeds. Why is this operation so insanely expensive? It’s supposed to be a single machine op on x86.
Notice how small implementation differences result in giant performance gaps. Updating separate cache lines is important for speed, but I still need to sum the values to return the current count. Cliff Click has a clever implementation in Java here. However, it still relies on CAS to perform updates which will, on .NET at least, slow down performance by 10X. This post from Joe Duffy confirms that CAS operations are expensive. I suppose if these designs scale to a large number of threads then it’s worthwhile. But if it sucks for a few threads then it’s unusable for most programmers.