Event summary: Exascale computers

DOI: https://www.doi.org/10.53289/XQUR8281

How can we push the boundaries of exascale computers in the UK?

The UK has provided a succession of increasingly powerful high-performance computing facilities for its researchers for many years. Do recent developments in AI change what researchers need from high-performance computing? What are the implications of expected developments in quantum computing for any future provision of exascale computing? How can the environmental impact of exascale computing be reduced? On Thursday 29th May, the Foundation held a discussion event in collaboration with the University of Edinburgh, home of UK national high-performance computing research systems such as ARCHER2, Cirrus and DiRAC-Tursa.

Speakers included:

  • The Rt Hon the Lord Willetts FRS, Chair of The Foundation for Science and Technology [Chair]
  • Professor Mark Wilkinson, Professor of Theoretical Astrophysics and Director of the DiRAC High Performance Computing Facility at the University of Leicester
  • Professor Mark Parsons, EPCC Director and Dean of Research Computing, College of Science & Engineering at the University of Edinburgh
  • Professor Katherine Royse, Director at Hartree Centre, STFC.

Professor Mark Wilkinson spoke first, making the case for putting people front and centre.

He began with some thoughts on how things have changed with regard to large-scale computers. Large-scale computing has become a crucial requirement that underpins research across all fields, whether in science, industry, academia or government. We should view these large-scale machines as essential research instruments rather than just bigger laptops; they are integral to the research process. One key challenge is that research continually evolves: as research questions change, we need to refine our tools, both hardware and software. There are several reasons why changing research questions lead to changes in computational requirements. For instance, researchers might need increased resolution to conduct more accurate calculations, or might need to incorporate new processes into their studies. In the social sciences there might be a need to consider different types of actors, while in fields like meteorology researchers may need to couple small-scale effects, such as ocean temperatures, with larger-scale phenomena. Another vital aspect is the quantification of uncertainty.
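As a rough illustration of why increased resolution drives computational demand (a textbook back-of-the-envelope estimate, not a figure quoted at the event): for a three-dimensional, time-dependent simulation whose time step must shrink in proportion to the grid spacing, the cost grows as the fourth power of the refinement.

```latex
% Illustrative scaling only: a 3D, time-dependent simulation on a grid of
% spacing \Delta x, with the time step tied to the grid (\Delta t \propto \Delta x).
\[
  \text{cost} \;\propto\;
  \underbrace{\left(\tfrac{L}{\Delta x}\right)^{3}}_{\text{grid points}}
  \times
  \underbrace{\tfrac{T}{\Delta t}}_{\text{time steps}}
  \;\propto\; \Delta x^{-4},
  \qquad\text{so halving } \Delta x \text{ costs } 2^{4}=16 \text{ times more.}
\]
```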

Professor Wilkinson said that, ultimately, researchers seek actionable outcomes: information that can be used to inform decisions. This requires an understanding of the associated uncertainties, which often calls for more computational resources. Regardless of efficiency, larger and more complex questions consistently demand greater computational power. He presented a case in point: the GPU-based Tursa system, part of the DiRAC facility and hosted in Edinburgh, which had been designed for particle physics theory research. When large language models became prominent, this system turned out to be perfectly suited to training them; indeed, the first large language model trained entirely on UK-based computing resources was developed on Tursa. This flexibility highlights the importance of having a diverse ecosystem of services that can evolve with research needs.
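To make the link between uncertainty quantification and computational demand concrete, here is a minimal sketch (an illustrative example, not code from the talk) of ensemble-style uncertainty quantification in Python. The `toy_simulation` function is a hypothetical stand-in for an expensive model; the point is simply that the total cost scales linearly with the number of ensemble members.

```python
import numpy as np

def toy_simulation(parameter: float) -> float:
    """Hypothetical stand-in for an expensive simulation code."""
    return np.sin(parameter) + 0.1 * parameter**2

def ensemble_uq(n_members: int, mean: float, spread: float, seed: int = 0):
    """Run the model once per ensemble member with perturbed inputs.

    Total cost is n_members times the cost of a single run, which is
    why quantifying uncertainty multiplies the demand for compute.
    """
    rng = np.random.default_rng(seed)
    inputs = rng.normal(mean, spread, size=n_members)
    outputs = np.array([toy_simulation(x) for x in inputs])
    return outputs.mean(), outputs.std()

if __name__ == "__main__":
    estimate, uncertainty = ensemble_uq(n_members=1000, mean=1.0, spread=0.05)
    print(f"estimate = {estimate:.3f} +/- {uncertainty:.3f}")
```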

So, what are the components of a productive ecosystem? Firstly, people are crucial. It takes talented individuals to design, build, and run systems, as well as to write software and conduct research. The significance of the human element is often overlooked, yet it is vital.

The second speaker, Professor Katherine Royse, focused on the adoption of exascale technologies by the SME community.

According to Professor Royce, the final column (Accelerated Discovery) in the diagram above is where research is at, regardless of the discipline. This is where new discoveries will be made. The combination of big data and high-performance computing is generating new paradigms, much more quickly than in previous generations.

The IBM diagram she showed set out what the future of computing looks like, according to Professor Royse. It is going to be about qubits (the quantum mechanical analogue of the classical bit), neurons, and neuromorphics (computing inspired by the function and structure of the human brain). She predicts that the convergence of high-performance computing, AI and quantum computing is where the next big discoveries will be made. For SMEs, a variety of tools will be essential for early adoption, so composable infrastructures (where compute, storage and networking resources are abstracted from their physical locations and managed by software through a web-based interface) are going to become increasingly important for running things efficiently.

Because SMEs are smaller, the challenges of adoption can come from funding, skills and knowledge, and not having big AI or data teams, meaning that they will be reliant on collaboration with others, whether with the big hyperscalers or with other providers of HPC and AI algorithms. We must understand that there is a community out there with training needs. We must also understand the ethical and regulatory considerations. People often approach the Hartree Centre with ideas for which they want to use quantum computing, but quantum computing is not always the answer to every problem. For example, data might not be in a usable format, or there might not be a suitable data policy or governance in place, which could lead to questions around data bias and its impact on decisions.

It could be that machine learning or AI would be more useful than quantum. SMEs need to have a breadth of understanding across emerging digital technologies, so that they can piece them together to resolve challenges.

Professor Mark Parsons, the final speaker, began by passing around the audience a GPU typically found inside ‘Frontier’ - the world's first exascale computer.

The UK has no shortage of plans around supercomputing, such as the Future of Compute review linked to the Science and Technology Framework, the AI Opportunities Action Plan, and the government's response to it.

Professor Parsons explained that long timelines for investment decisions have presented problems. In his view, the UK spends far too long just getting to the point where somebody provides funding; if that could be shortened, the value of supercomputing to the UK could be realised much more quickly. There are some good examples of complex systems built with Chinese technology: the Sunway system has 41 million cores, whereas a typical laptop has around four. The world's first (publicly acknowledged) exascale supercomputer was the Frontier system, followed by Aurora and then El Capitan.

The chip being passed around the audience came out of a programme called Fast Forward, funded in part by the US government about 12 years before Frontier came into being. He explained that the US government realised early on that science can push the boundaries of computing forward, and that it was also fortunate to have companies able to commercialise that technology, generating significant amounts of money for the American economy and tax receipts for the government. Giving some European examples, including the Jupiter system in Germany (expected June 2025), he suggested that the UK was not yet playing the right game.

What is next for UK national supercomputers? The ARCHER2 service has now been extended until November 2026. Professor Parsons noted that the UK is not going to have 'an exascale system' as such; instead, it will have the next National Supercomputer Service, because we are now in the 'post-exascale age'. He said that the UK should aim for two exaflops - an exaflop being a measure of performance for a supercomputer that can carry out at least one quintillion floating-point operations per second. We should also be planning for 2035 and looking at the next system, beyond exascale.
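For scale, the unit arithmetic behind that target is straightforward:

```latex
\[
  1~\text{exaflop/s} = 10^{18}~\text{FLOP/s},
  \qquad
  2~\text{exaflop/s} = 2\times 10^{18}~\text{FLOP/s}
  \approx 7.2\times 10^{21}~\text{floating-point operations per hour.}
\]
```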

With reference to AI and quantum computing, Professor Parsons said that he had viewed AI as an application of supercomputing for many years. He does not think that playing exascale or numerical supercomputing off against AI makes sense: they are the same thing. As a professor based at Edinburgh, he is around quantum computing work a great deal, particularly through collaborations with the National Quantum Computing Centre and the Quantum Software Lab. We should explore how we programme these devices, what sorts of problems they could solve and what languages are needed to programme them. Supercomputers could be used to simulate quantum computers, allowing us to start solving some of these challenges in parallel with developing the quantum computers themselves. Professor Parsons noted that quantum simulation was already happening regularly and that supercomputing was going to be critical to delivering the commercial application of quantum computing over the next five to ten years.
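As an illustration of what simulating a quantum computer on classical hardware involves (a minimal sketch, not the specific approach discussed at the event), the state-vector method below stores one complex amplitude per basis state, so memory doubles with every added qubit. That exponential growth is exactly why serious circuit simulation becomes a supercomputing workload.

```python
import numpy as np

def apply_single_qubit_gate(state: np.ndarray, gate: np.ndarray,
                            target: int, n_qubits: int) -> np.ndarray:
    """Apply a 2x2 gate to one qubit of an n-qubit state vector."""
    # View the length-2**n vector as n axes of size 2, contract the target
    # axis with the gate, then restore the original axis order.
    state = state.reshape([2] * n_qubits)
    state = np.tensordot(gate, state, axes=([1], [target]))
    state = np.moveaxis(state, 0, target)
    return state.reshape(-1)

def uniform_superposition(n_qubits: int) -> np.ndarray:
    """Start in |00...0> and apply a Hadamard gate to every qubit."""
    hadamard = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    state = np.zeros(2**n_qubits, dtype=complex)
    state[0] = 1.0
    for target in range(n_qubits):
        state = apply_single_qubit_gate(state, hadamard, target, n_qubits)
    return state

if __name__ == "__main__":
    n = 20  # 2**20 complex amplitudes: about 17 MB
    state = uniform_superposition(n)
    print(f"{n} qubits -> {state.size} amplitudes, {state.nbytes / 1e6:.0f} MB")
    # Each extra qubit doubles the memory; around 45 qubits already needs
    # hundreds of terabytes, which is where supercomputers come in.
```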

With regard to sustainability, Professor Parsons said that ARCHER2 is already operationally net zero, and that there were plans to reuse the heat created by exascale computing by warming the water beneath the data centre. The heat would flow towards built-up areas in Edinburgh, where it would be taken back out with heat pumps.

Overall, he concluded that we should be pushing the boundaries of exascale, AI and quantum and competing again as a country on the world stage.

To listen to the Q&A and debate that followed the presentations, view the full event recording here.