Investigate parallel assembly with Python
Calling Python code from C++ (which is what we do for the problem, for example) requires acquiring the GIL, because the CPython interpreter executes Python bytecode in only one thread at a time. We should test whether this cancels out all of the speed-up from multithreading. If it does, we should think about other options. There might be ways around this, such as doing the multithreading on the Python side already (?)
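
As a first rough check, the effect of the GIL can be measured in pure Python, without the C++ side. The sketch below is only an illustration: `element_contribution` and `assemble` are hypothetical names and not part of our code base, and the per-cell work is a dummy integer loop standing in for a Python callback. It splits the cells across worker threads and times the run for different thread counts.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def element_contribution(cell):
    # Hypothetical stand-in for a Python callback invoked once per cell.
    # Pure-Python arithmetic, so the thread holds the GIL while it runs.
    total = 0
    for q in range(200):
        total += (cell + 1) * q
    return total


def assemble(cells, n_threads):
    """Sum all cell contributions using n_threads threads.

    Returns (result, elapsed_seconds).
    """
    # Round-robin split of the cells, one chunk per thread.
    chunks = [cells[i::n_threads] for i in range(n_threads)]

    def work(chunk):
        return sum(element_contribution(c) for c in chunk)

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partial_sums = list(pool.map(work, chunks))
    elapsed = time.perf_counter() - start
    return sum(partial_sums), elapsed


if __name__ == "__main__":
    cells = list(range(20000))
    for n_threads in (1, 2, 4):
        result, elapsed = assemble(cells, n_threads)
        print(f"{n_threads} thread(s): result={result}, time={elapsed:.3f} s")
```

If the timings do not improve with more threads, the Python callbacks are being serialized by the GIL. The real test still has to be done through the C++ assembly loop, since there only the part inside the Python call needs the GIL and the rest can run in parallel.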