A team led by Paderborn scientists Professor Thomas D. Kühne and Professor Christian Plessl is the first group in the world to break the major “exaflop” barrier (more than a quintillion, or 10¹⁸, floating-point operations per second) in a computational science application. With this accomplishment, they have set a new world record.

The two professors – Plessl is a computer scientist, Kühne a chemist – crossed the exaflop threshold while simulating the SARS-CoV-2 spike protein in a real-world scientific computing application. They achieved the breakthrough on the Perlmutter supercomputer at the National Energy Research Scientific Computing Center (NERSC) in the USA, currently the fifth-fastest computer in the world. The result builds on a new simulation method that Plessl and Kühne have developed in recent years and integrated into the open-source quantum chemistry program CP2K.

**Race for the fastest computer in the world**

Some background: in high-performance computing, the number of floating-point operations a supercomputer performs per second in double (64-bit) precision is a key measure of its performance. In 1984, the mark of one billion operations per second was reached for the first time, a figure that every smartphone surpasses today. “The growing importance of high-performance computing for science has given rise to an increasingly competitive technological race for the fastest computer. Since then, an updated ranking of the 500 fastest supercomputers worldwide has been published every six months,” explains Plessl, Chairman and Managing Director of the Paderborn Center for Parallel Computing (PC2), which operates the Noctua supercomputer at the university. Further milestones followed in 1997 (one trillion operations per second, or one teraflop) and 2008 (one quadrillion operations per second, or one petaflop), and the race for the next symbolic mark of one exaflop immediately heated up. Plessl: “The dimension of this number becomes clearer when you consider that the universe is about 10¹⁸ seconds old. That means that if a human had performed one calculation every second since the Big Bang, an exascale computer could do the same work in a single second.”

As technological leadership in the digitalization of science has increasingly become a matter of international competitiveness, the race for the first exascale computer has evolved into a global contest, sometimes called the “space race of the 21st century.” As Plessl explains, “We are currently on the cusp of the exascale era. It is widely expected that the first supercomputer to break the exascale threshold for 64-bit floating-point arithmetic operations will be publicly announced at the International Conference on High Performance Computing, the ISC, taking place in Hamburg in late May.”

**New method for massively parallel quantum chemistry simulation**

The standard benchmark used to rank supercomputers for the TOP500 list, the LINPACK benchmark (HPL), measures how quickly a machine solves a very large dense system of linear equations. Plessl elaborates: “Because this program parallelizes extremely well, supercomputers can exploit a very high proportion of their theoretical peak computing power. One criticism of this measurement method is that real-world scientific applications can often use only a small fraction of that peak. This is because distributing the computational tasks, transferring data, and coordinating execution across hundreds of thousands of computing elements usually incurs considerable overhead.” Developing optimized simulation methods and algorithms that harness the computing power of massively parallel supercomputers more efficiently is therefore a key research topic in computational science. Plessl, Kühne and their team have taken on this challenge. With a view to using exascale computers to simulate chemical systems, in 2020 they presented the “Submatrix Method for the Approximate Calculation of Matrix Functions,” a new method tailored to the requirements of exascale supercomputers. At its core, the method performs many independent calculations on small dense matrices. “It is precisely these kinds of operations that can be executed with very high computing power and energy efficiency on extremely powerful supercomputers equipped with GPU acceleration hardware,” adds Kühne.
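To make the core idea more concrete, here is a minimal Python/NumPy sketch of a submatrix-style approximation of a matrix function f(A) for a sparse symmetric matrix A. It assumes the commonly described form of the idea: for each column i, gather the rows and columns where column i has nonzero entries, evaluate f densely on that small principal submatrix, and copy back the column that belongs to i. The function names, the choice of f (an inverse square root) and the test matrix are illustrative assumptions. This is not the CP2K implementation, whose sparse data structures, GPU kernels and specific matrix function are not covered here.

```python
import numpy as np

def dense_matrix_function(s, f):
    # Evaluate f(S) for a small dense symmetric matrix S = V diag(w) V^T
    w, v = np.linalg.eigh(s)
    return (v * f(w)) @ v.T  # scales eigenvector column j by f(w[j])

def submatrix_method(a, f, tol=0.0):
    """Approximate f(A) column by column from small dense submatrices."""
    n = a.shape[0]
    result = np.zeros((n, n))
    for i in range(n):
        # Rows where column i is nonzero (always including i itself)
        idx = np.union1d(np.flatnonzero(np.abs(a[:, i]) > tol), [i])
        # Dense principal submatrix on those indices; each such evaluation
        # is independent of all others
        f_sub = dense_matrix_function(a[np.ix_(idx, idx)], f)
        # Copy back the column of f(submatrix) that belongs to column i
        local = int(np.searchsorted(idx, i))
        result[idx, i] = f_sub[:, local]
    return result

def inv_sqrt(w):
    # Example matrix function f(x) = x^(-1/2); assumes positive eigenvalues
    return 1.0 / np.sqrt(w)

# Hypothetical symmetric positive definite test matrix made of two dense blocks
a = np.zeros((5, 5))
a[:3, :3] = [[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]]
a[3:, 3:] = [[5.0, 2.0], [2.0, 3.0]]

approx = submatrix_method(a, inv_sqrt)
exact = dense_matrix_function(a, inv_sqrt)
print(np.max(np.abs(approx - exact)))
```

Because the test matrix is block-diagonal, every submatrix coincides with a full block and the result matches the dense reference up to rounding. For general sparse matrices the method yields an approximation. Since each per-column dense evaluation is independent, the work lends itself to being spread across many GPUs.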

**Record-size simulation on JUWELS Booster supercomputer**

In 2021, the Paderborn scientists had already simulated HIV with up to 102 million atoms on the “JUWELS Booster” at the Jülich Supercomputing Centre, then the fastest supercomputer in Europe and now ranked eighth worldwide, setting a record for the largest ab initio molecular dynamics simulation based on electronic structure. This simulation achieved 324 petaflops in mixed-precision floating-point arithmetic at an efficiency of 67.7 percent of the theoretically available computing power, a remarkable figure for this application domain. Since the Jülich record, the method has been continuously optimized to use the GPU hardware accelerators more efficiently. To test its exascale capability in practice, the team obtained access to the Perlmutter supercomputer at NERSC. Perlmutter has enough resources to break the exascale barrier when computing in mixed 32-/16-bit precision instead of 64-bit precision. The method can therefore be classified as approximate computing, which, in simplified terms, works with approximate rather than exact values.
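The trade of precision for speed can be illustrated with a small NumPy experiment. The sketch below multiplies two random matrices three ways: in 64-bit precision as a reference, in 32-bit precision, and in an assumed mixed 16/32-bit scheme where the inputs are rounded to 16-bit floats and the products are accumulated in 32-bit floats, loosely mimicking how GPU tensor cores are commonly used. The matrix size, random seed and error measure are arbitrary choices for illustration; the sketch says nothing about the actual kernels run on Perlmutter.

```python
import numpy as np

def relative_error(approx, exact):
    # Frobenius-norm relative error against the float64 reference
    return np.linalg.norm(approx - exact) / np.linalg.norm(exact)

rng = np.random.default_rng(0)
a = rng.random((64, 64))
b = rng.random((64, 64))

exact = a @ b  # 64-bit reference

# 32-bit precision throughout
fp32 = a.astype(np.float32) @ b.astype(np.float32)

# Assumed mixed scheme: round inputs to float16, then multiply and
# accumulate in float32
a16 = a.astype(np.float16).astype(np.float32)
b16 = b.astype(np.float16).astype(np.float32)
mixed = a16 @ b16

err_fp32 = relative_error(fp32, exact)
err_mixed = relative_error(mixed, exact)
print(f"float32 error:     {err_fp32:.1e}")
print(f"mixed 16/32 error: {err_mixed:.1e}")
```

The mixed-precision result carries a noticeably larger error than the 32-bit one, yet it stays small relative to the entries themselves. Approximate-computing methods are designed to tolerate errors of this kind in exchange for the typically much higher arithmetic throughput of low-precision GPU hardware.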

“Then in April, when simulating the SARS-CoV-2 spike protein using 4,400 GPU accelerators, we broke the exaflop threshold and reached 1.1 exaflops in mixed-precision arithmetic in the time-critical part of the computation,” Plessl says. He adds: “As a frame of reference, a single simulation step for 83 million atoms takes 42 seconds and involves approximately 47 × 10¹⁸ floating-point operations. Not accounting for memory requirements, such a calculation would have taken about 13 hours on the first petaflops system, the Roadrunner supercomputer, in 2008, and about 1.5 years on the first teraflops system, ASCI Red, in 1997.”
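The figures in Plessl's comparison can be checked with a few lines of Python. This sketch assumes the earlier systems ran at exactly one petaflop/s and one teraflop/s, uses a 365.25-day year and, like the quote, ignores memory limits.

```python
# Figures quoted in the text
flops_per_step = 47e18   # floating-point operations per simulation step
step_time_s = 42.0       # seconds per step on Perlmutter

# Sustained rate implied by the two figures
sustained = flops_per_step / step_time_s
print(f"sustained rate: {sustained / 1e18:.2f} exaflops")

# Time for the same work on systems at the earlier milestones
# (assumption: exactly 1 petaflop/s and 1 teraflop/s sustained)
petaflop_s = 1e15
teraflop_s = 1e12
seconds_per_year = 365.25 * 24 * 3600

hours_petaflop = flops_per_step / petaflop_s / 3600
years_teraflop = flops_per_step / teraflop_s / seconds_per_year
print(f"1 petaflop/s system: {hours_petaflop:.1f} hours")
print(f"1 teraflop/s system: {years_teraflop:.2f} years")
```

The results (about 1.12 exaflops, 13.1 hours and 1.49 years) agree with the rounded values in the quote.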

The Paderborn scientists are already busy working on their next coup: “The gold standard for atomistic simulations in chemistry and solid-state physics is the density functional theory method. We are very confident that we will succeed in applying the submatrix method in this area as well,” affirms Kühne.