Microsoft has an excellent library for parallel programming in .NET, the Task Parallel Library. However, it has a strange problem with nested parallel loops. Consider this sample code:
    void MultiplyMatrices(int size, double[,] m1, double[,] m2, double[,] result)
    {
        Parallel.For(0, size, i =>
        {
            for (int j = 0; j < size; j++)
            {
                result[i, j] = 0;
                for (int k = 0; k < size; k++)
                {
                    result[i, j] += m1[i, k] * m2[k, j];
                }
            }
        });
    }
Why aren’t all the loops turned into parallel loops? The documentation says that you should not convert the inner loops into parallel loops because it will degrade performance. This makes sense: every iteration that goes through the scheduler carries overhead, and in this example the loop body is only a cheap multiply-add, so the overhead would dwarf the useful work. But what happens if the work in the inner loop is expensive? Then you would want to run the inner loops in parallel, too. So it really depends on how expensive the work is relative to the cost of using Parallel.For.
I would like to write a library of matrix operations that I can reuse in my code. Therefore, this MultiplyMatrices method should take a delegate parameter and use that in the loop body. But then the library can’t know how expensive the delegate is, so I won’t know whether or not I should use parallel loops for the inner loops. Or, what if I compose several matrix operations, each of which uses some parallel loops? Performance may actually degrade because I’m using too much parallelism.
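To make the dilemma concrete, here is a minimal sketch of what the delegate-taking version might look like. The names MatrixOps, Multiply and combine are made up for illustration and are not part of any library. Only the outer loop is parallel, and whether that is the right choice depends entirely on what the caller passes in.

```csharp
using System;
using System.Threading.Tasks;

static class MatrixOps
{
    // Hypothetical generalization of MultiplyMatrices: the caller supplies
    // the per-element step, so the library cannot know how costly it is.
    public static void Multiply(
        int size, double[,] m1, double[,] m2, double[,] result,
        Func<double, double, double> combine)
    {
        // Only the outer loop is parallel, as in the original sample.
        Parallel.For(0, size, i =>
        {
            for (int j = 0; j < size; j++)
            {
                double sum = 0;
                for (int k = 0; k < size; k++)
                {
                    // Cheap for (x, y) => x * y, possibly expensive otherwise.
                    sum += combine(m1[i, k], m2[k, j]);
                }
                result[i, j] = sum;
            }
        });
    }
}
```

Calling MatrixOps.Multiply(size, m1, m2, result, (x, y) => x * y) reproduces the ordinary matrix product. With an expensive combine delegate, the serial inner loops could leave cores idle whenever size is smaller than the number of CPUs.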
The problem with the parallel library is that the operations aren’t composable. To be fair, I would have shipped the same parallel library because I don’t know how to solve this problem either. One crude idea is to limit parallelism to a small number based on the number of available CPUs. The library would track the nesting level of these parallel loops in the environment somehow (like nested transactions). Past some nesting level, it’s not worth doing things in parallel anymore, so the parallel loops would run as normal, serial loops. I assume the Fortran folks have already figured out some solution. I’ll post it when I find it.
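The crude idea could be sketched roughly like this. NestedParallel, MaxParallelDepth and CurrentDepth are hypothetical names, the cutoff of 1 is an arbitrary assumption (the post suggests deriving it from the number of CPUs, for example via Environment.ProcessorCount), and the nesting level is kept in a thread-static field that is set around each loop body. This is one possible way to track the level, not a known or recommended solution.

```csharp
using System;
using System.Threading.Tasks;

static class NestedParallel
{
    // Hypothetical cutoff: loops nested at this depth or deeper run serially.
    public const int MaxParallelDepth = 1;

    // Nesting level of the loop body currently running on this thread.
    [ThreadStatic] private static int depth;

    public static int CurrentDepth => depth;

    public static void For(int from, int to, Action<int> body)
    {
        int parentDepth = depth;

        if (parentDepth >= MaxParallelDepth)
        {
            // Deep enough already: fall back to a normal, serial loop.
            for (int i = from; i < to; i++) body(i);
            return;
        }

        Parallel.For(from, to, i =>
        {
            // Record the new nesting level on whichever thread runs this
            // iteration, and restore the old value afterwards.
            int saved = depth;
            depth = parentDepth + 1;
            try { body(i); }
            finally { depth = saved; }
        });
    }
}
```

A library method would then call NestedParallel.For for every loop and let the nesting level decide which ones actually run in parallel. It still ignores how expensive the loop body is, so it only addresses the "too much parallelism" half of the problem.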