\section{Parallel Computation}
\label{sec:parallelcomputation}
Multicore processors are standard nowadays and parallel programming is the key to gaining
performance from modern computers. This section explains how \Dumux can be used
on multicore / multinode systems, ranging from the user's desktop computer or laptop to
high-performance computing clusters.
There are different concepts and methods for parallel programming, which are
often grouped into \textit{shared-memory} and \textit{distributed-memory}
approaches. The parallelization in \Dumux follows the model supported by \Dune,
which is currently based on the \textit{Message Passing Interface} (MPI),
a distributed-memory approach. MPI parallelization allows the user to run
\Dumux applications in parallel, provided that the chosen \Dumux model supports
parallel computations. This is the case for most \Dumux applications, with the
exception of the multidomain and free-flow models.

The main idea behind the MPI parallelization is the concept of \textit{domain
decomposition}. For parallel simulations, the computational domain is split into
subdomains and one process (\textit{rank}) solves the local problem on each
subdomain. During the global solution process, some data exchange between the
ranks/subdomains is needed. MPI is used to send data to and receive data from
other ranks. The domain decomposition in \Dune is handled by the grid managers:
the grid is partitioned and distributed over several nodes. Most grid managers
provide their own domain decomposition methods to split the computational
domain into subdomains. Some grid managers also support external
tools like METIS, ParMETIS, PTScotch or ZOLTAN for partitioning.
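To illustrate the concept, the following minimal sketch shows domain decomposition at the
\Dune level, assuming a structured \texttt{Dune::YaspGrid} created directly in the main
function (in a typical \Dumux application, the grid manager takes care of these steps):
\begin{lstlisting}[style=DumuxCode]
#include <iostream>
#include <dune/common/parallel/mpihelper.hh>
#include <dune/grid/yaspgrid.hh>

int main(int argc, char** argv)
{
    // initialize MPI (a no-op for sequential runs)
    const auto& mpiHelper = Dune::MPIHelper::instance(argc, argv);

    // structured 2D grid on the unit square with 100x100 cells
    using Grid = Dune::YaspGrid<2>;
    Grid grid({1.0, 1.0}, {100, 100});

    // distribute/re-balance the grid among the available ranks;
    // YaspGrid is already distributed at construction, so this call
    // mainly matters for grids that are read in on a single rank
    grid.loadBalance();

    // each rank now only sees its own subdomain
    std::cout << "rank " << mpiHelper.rank() << " holds "
              << grid.leafGridView().size(0) << " elements" << std::endl;
    return 0;
}
\end{lstlisting}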
Linear algebra types such as matrices and vectors, on the other hand, are not
aware that they are used in a parallel environment. Communication is instead handled by the
components of the parallel solvers. Currently, the only parallel solver backend is \texttt{Dumux::AMGBackend},
a parallel AMG-preconditioned BiCGSTAB solver.
In order for \Dumux simulations to run in parallel, an
MPI library implementation (e.g. OpenMPI, MPICH or IntelMPI)
must be installed on the system and all \Dune modules and \Dumux must be
recompiled against it.
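For example, after the MPI library has been installed, the \Dune modules and \Dumux can be
reconfigured and rebuilt with \texttt{dunecontrol}; the options file name below is only a
placeholder and depends on your installation:
\begin{lstlisting}[style=Bash]
# check that the MPI compiler wrappers are available
mpicc --version

# reconfigure and rebuild all Dune modules and Dumux
./dune-common/bin/dunecontrol --opts=<options_file>.opts all
\end{lstlisting}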
\subsection{Prepare a Parallel Application}
Not all parts of \Dumux can be used in parallel. One example are the linear solvers
of the sequential backend. However, with \texttt{Dumux::AMGBackend}, \Dumux provides
a parallel solver backend based on Algebraic Multigrid (AMG).
If an application does not already use this backend, the user must switch to it
in order to run the application in parallel.
First, the header file for the parallel AMG backend must be included
\begin{lstlisting}[style=DumuxCode]
#include <dumux/linear/amgbackend.hh>
\end{lstlisting}
so that the backend can be used. The header file of the sequential backend
\begin{lstlisting}[style=DumuxCode]
#include <dumux/linear/seqsolverbackend.hh>
\end{lstlisting}
can be removed.
Second, the linear solver must be switched to the AMG backend
\begin{lstlisting}[style=DumuxCode]
using LinearSolver = Dumux::AMGBackend<TypeTag>;
\end{lstlisting}
The parallel \texttt{Dumux::AMGBackend} instance has to be
constructed with a \texttt{Dune::GridView} object and a mapper, in order to build the
parallel index set needed for communication:
\begin{lstlisting}[style=DumuxCode]
auto linearSolver = std::make_shared<LinearSolver>(leafGridView, fvGridGeometry->dofMapper());
\end{lstlisting}
Afterwards, the application must be recompiled.
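For orientation, the following sketch shows how the parallel backend is then typically
passed on to the nonlinear solver in the main file; the \texttt{assembler} and the solution
vector \texttt{x} are assumed to be set up as usual and are not specific to the parallel case:
\begin{lstlisting}[style=DumuxCode]
#include <dumux/nonlinear/newtonsolver.hh>

// the parallel linear solver is simply handed to the Newton solver
using NewtonSolver = Dumux::NewtonSolver<Assembler, LinearSolver>;
NewtonSolver nonLinearSolver(assembler, linearSolver);
nonLinearSolver.solve(x);
\end{lstlisting}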
\subsection{Run a Parallel Application}
The starting procedure for parallel simulations depends on the chosen MPI library.
Most MPI implementations use the \textbf{mpirun} command
\begin{lstlisting}[style=Bash]
mpirun -np <n_cores> <executable_name>
\end{lstlisting}
where \textit{-np} sets the number of cores (\texttt{n\_cores}) that should be used for the
computation. On a cluster you usually have to use a queuing system (e.g. slurm) to
submit a job. Check with your cluster administrator how to run parallel applications on the cluster.
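As a rough illustration, a job script for the slurm queuing system could look like the
following; the resource requests and the executable name are placeholders and the exact
directives depend on the cluster:
\begin{lstlisting}[style=Bash]
#!/bin/bash
#SBATCH --job-name=dumux-parallel   # placeholder job name
#SBATCH --ntasks=32                 # number of MPI ranks
#SBATCH --time=02:00:00             # wall-clock time limit

# start one MPI process per requested task
mpirun -np $SLURM_NTASKS ./<executable_name>
\end{lstlisting}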
\subsection{Handling Parallel Results}
For most models, the results should not differ between parallel and serial
runs. However, parallel computations are not naturally deterministic.
A typical case where one cannot assume deterministic behavior are models in which
small differences in the solution can cause large differences in the results
(e.g. for some turbulent flow problems). Nevertheless, it is reasonable to expect that
the simulation results do not depend on the number of cores, so you should double-check
whether the model is really non-deterministic. Typical reasons for wrong non-deterministic
behavior are errors in the parallel computation of boundary conditions or missing/reduced
data exchange in higher-order gradient approximations. Also keep in mind that
for iterative solvers, differences in the solution can occur because the iteration is only
converged up to a prescribed error threshold.
For serial computations, \Dumux produces single vtu files as the default output format.
During a simulation, one vtu file is written for every output step.
In the parallel case, one vtu file for each step and processor is created.
For parallel computations, an additional variable \texttt{"process rank"} is written
into the file. The process rank allows the user to inspect the subdomains
after the computation. The parallel vtu files are combined in a single pvd file
like in sequential simulations, which can be opened with e.g. ParaView.
\subsection{MPI scaling}
For parallel computations, the number of cores must be chosen
carefully. Using too many cores will not always lead to more performance, but
can instead lead to inefficiency. One reason is that for small subdomains, the
communication between the subdomains becomes the limiting factor for parallel computations.
The user should test the MPI scaling (the relation between the number of cores and the computation time)
for each specific application to ensure fast and efficient use of the given resources.
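A simple way to get a first impression of the scaling behavior is to time the same run
with an increasing number of cores, for example (the executable name is a placeholder):
\begin{lstlisting}[style=Bash]
# run the same simulation with 1, 2, 4 and 8 cores and record the runtime
for n in 1 2 4 8; do
    echo "running with $n cores"
    time mpirun -np $n ./<executable_name>
done
\end{lstlisting}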