### Zhukova M.V., Kalinkin A.A.

## Cluster kernels based on Intel® MKL Poisson Solver

### Reporter: Zhukova M.V.

*M. Zhukova, A. Kalinkin*

The Helmholtz equation plays an important role in mathematical models of various physical processes. The Intel® Math Kernel Library (Intel® MKL) Poisson solver was developed for 2D and 3D Helmholtz problems in Cartesian and spherical coordinates with three kinds of boundary conditions. The existing solver has several disadvantages. One is that the scalability of the solution time is limited by the number of OpenMP threads used. Another is the amount of memory required. Finally, the solver was designed only for shared-memory architectures, which can be a significant constraint for large-scale applications.

To address these difficulties, an extension of the Poisson solver was implemented using hybrid MPI and OpenMP parallelization. This paper is a logical continuation of earlier work [1], [2].

This paper proposes two approaches to parallelization. Both are based on 1D Fourier transforms combined with a tridiagonal matrix algorithm, as in the existing Poisson solver implementation. A 2D decomposition distributes the data among MPI processes, so data must be exchanged between steps of the algorithm using the MPI_Alltoall collective operation. This parallel implementation is the first approach. The second approach reduces the data transfer overhead before and after the tridiagonal solve step by using a modification of the distributed tridiagonal matrix algorithm [3].
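The combination of 1D Fourier transforms with a tridiagonal solve can be illustrated by a minimal serial sketch. It is not the Intel MKL implementation and contains no MPI. It assumes a 2D problem of the form -Δu + qu = f on the unit square with homogeneous Dirichlet boundary conditions and a five-point stencil, and it uses a naive O(n²) sine transform in place of a fast one. The function names `dst1`, `thomas`, and `solve_helmholtz_2d` are hypothetical. In a distributed version, the data exchanges described above would take place between steps 1 and 2 and between steps 2 and 3.

```python
import math


def dst1(v):
    """Naive O(n^2) discrete sine transform (type I) of a list of floats."""
    n = len(v)
    return [
        sum(v[i] * math.sin(math.pi * (i + 1) * (k + 1) / (n + 1)) for i in range(n))
        for k in range(n)
    ]


def thomas(a, b, c, d):
    """Tridiagonal matrix (Thomas) algorithm for constant scalar diagonals.

    a, b, c are the sub-, main, and super-diagonal values; d is the right-hand side.
    """
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c / b
    dp[0] = d[0] / b
    for j in range(1, n):  # forward elimination
        denom = b - a * cp[j - 1]
        cp[j] = c / denom
        dp[j] = (d[j] - a * dp[j - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for j in range(n - 2, -1, -1):  # back substitution
        x[j] = dp[j] - cp[j] * x[j + 1]
    return x


def solve_helmholtz_2d(f, q):
    """Solve the discretized -Laplace(u) + q*u = f; f[i][j] holds interior values."""
    nx, ny = len(f), len(f[0])
    hx, hy = 1.0 / (nx + 1), 1.0 / (ny + 1)

    # Step 1: forward sine transform along x for every grid line in y.
    fhat = [[0.0] * ny for _ in range(nx)]
    for j in range(ny):
        line = dst1([f[i][j] for i in range(nx)])
        for k in range(nx):
            fhat[k][j] = line[k]

    # Step 2: one independent tridiagonal solve along y per Fourier mode k.
    off = -1.0 / hy**2
    uhat = []
    for k in range(nx):
        lam = 4.0 / hx**2 * math.sin(math.pi * (k + 1) * hx / 2.0) ** 2
        uhat.append(thomas(off, 2.0 / hy**2 + lam + q, off, fhat[k]))

    # Step 3: inverse sine transform along x (DST-I is its own inverse up to scaling).
    u = [[0.0] * ny for _ in range(nx)]
    scale = 2.0 / (nx + 1)
    for j in range(ny):
        line = dst1([uhat[k][j] for k in range(nx)])
        for i in range(nx):
            u[i][j] = scale * line[i]
    return u
```

One way to check the sketch is to take u = sin(πx) sin(πy) on the grid, build f by applying the discrete operator to it, and confirm that `solve_helmholtz_2d(f, q)` recovers u to round-off. Because the tridiagonal systems in step 2 are independent across modes, they are a natural unit of work to distribute among processes or threads.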

Performance results are presented for both implementations and compared with those of the existing solver, including a chart of the speed-up relative to the existing Intel MKL Poisson solver. The scalability of the parallel algorithms is tested with up to 64 MPI processes on the Helmholtz problem for meshes with 1024^3 and 2048^3 points. The results show almost linear scalability and a significant improvement over the existing Intel MKL Poisson solver. Based on these results, we conclude that the proposed hybrid implementations can solve large-scale problems efficiently in terms of both performance and memory usage, benefiting from modern multicore and manycore architectures.

REFERENCES:

1. A. A. Kalinkin, Y. M. Laevsky, S. V. Gololobov, 2D Fast Poisson Solver for High-Performance Computing, in Proceedings of the 10th International Conference on Parallel Computing Technologies (PaCT '09), Victor Malyshkin (Ed.), Springer-Verlag, Berlin, Heidelberg, 112-120.

2. A. Kalinkin, A. Kuzmin, Intel® MKL Poisson Library for scalable and efficient solution of elliptic problems with separable variables, Parallel Computation Technologies (PCT '12), ISBN: 978-5-696-04237-4, 336-341.

3. A. N. Konovalov, A. N. Bugrov, V. V. Elinov, Algorithm of Parallel Solution for Grid Problems, Modern Problems of Computational and Applied Mathematics, Novosibirsk (1979) (in Russian).
