Errors in the implementation often show up only for specific preconditioners and specific numbers of preconditioner iterations. In #817 (closed), SSOR with PreconditionerIterations=1 showed no significant differences compared to the new version with the fixed cclocalassembler. Since preconditioning is sensitive to the problem, a test will be limited to preconditioners that work well for the chosen problem.
I know it’s difficult to test exactly the case of #817 (closed). But we should still have a parallel regression test for unstructured grids, preferably one where a lot happens at the processor boundary. The Richards lens test is actually not that good in that respect.
You could also do more preconditioner iterations. As I recall, that went badly in your case.
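For instance, a minimal sketch of the input-file change, assuming the test reads its solver settings from the `[LinearSolver]` group and uses the `PreconditionerIterations` key mentioned above (the value 3 is only an illustration, not a recommendation):

```ini
[LinearSolver]
# Assumed key, as referred to in this thread; the value is illustrative only.
PreconditionerIterations = 3
```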
For a shallow water test we have to wait until the parallel solver factory is merged in dune-istl and Markus's other work is available in dumux master. I would be glad to add a parallel test for the SWEs with UGGrid.
Currently I have a version of DuMux which is based on 99e5972c and 35ca769e. My applications now scale excellently to hundreds of cores (using SOR, SSOR and GS with 3-5 preconditioning steps).
My DuMux-based application is about 2.4 times faster than my old dune-swf application. I'm not sure how such a performance boost is possible, and I'm still wondering whether I made a fundamental mistake somewhere. Maybe there were lots of improvements in the Dune modules over the last years (dune-swf uses older versions).
I'm very happy to hear this. If the linear solver was already the bottleneck before, obviously Dune should get the credit; otherwise we're happy to take it...