C++ 23 Standard Won’t Have a Key Parallelism Feature
The next version of the C++ standard, due out next year, will lack a key feature that makes it easier to write code for parallel computing environments.
The C++ 2023 standard will lack an asynchronous algorithm feature called senders and receivers, which would allow the simultaneous execution of code on a system with multiple chips such as CPUs and GPUs.
“The goal there is maybe to try to get it in the working draft next year — the [C++ 26] working concept — so once it’s there, then people will take it much more seriously,” said Nevin Liber, a computer scientist at the Argonne Leadership Computing Facility at Argonne National Laboratory and a C++ committee member, during a breakout session at last month’s Supercomputing 2022 Conference in Dallas.
Software applications written in C++ are going through fundamental changes, with computers, servers and mobile devices executing code simultaneously on multiple chips. The goal with senders and receivers is to update the standard C++ framework so that programmers find it easier to write applications that take advantage of the new execution environments.
Programmers are increasingly writing code for CPUs and accelerators such as GPUs and AI chips, which are important for faster execution of applications.
“While the C++ Standard Library has a rich set of concurrency primitives … and lower-level building blocks … we lack a Standard vocabulary and framework for asynchrony and parallelism that C++ programmers desperately need,” says a document presenting the proposal.
Senders and Receivers
Currently, C++ code must be optimized for specific hardware. But senders and receivers will add an abstraction layer for standard C++ code to run across multiple parallel environments. The goal is to add portability, so that the code works across different installations.
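To make the idea concrete, here is a minimal sketch of the pattern: a sender describes a unit of work, a receiver accepts its result, and senders are chained before anything runs. This is not the API from the proposal. Every name in the toy namespace is hypothetical, and the pipeline runs synchronously on the calling thread. The actual proposal also covers schedulers, error handling and cancellation, none of which is modeled here.

```cpp
#include <cassert>
#include <utility>

// Hypothetical toy types for illustration only; not the proposed standard API.
namespace toy {

// A sender that, once started, hands a fixed value to its receiver.
template <class T>
struct just_sender {
    T value;

    template <class Receiver>
    void start(Receiver receiver) const {
        receiver.set_value(value);
    }
};

// Receiver adaptor: applies fn to the incoming value and forwards the result.
template <class Receiver, class Fn>
struct then_receiver {
    Receiver downstream;
    Fn fn;

    template <class V>
    void set_value(V v) {
        downstream.set_value(fn(std::move(v)));
    }
};

// A sender that wraps another sender and transforms its result with fn.
template <class Sender, class Fn>
struct then_sender {
    Sender upstream;
    Fn fn;

    template <class Receiver>
    void start(Receiver receiver) const {
        upstream.start(then_receiver<Receiver, Fn>{std::move(receiver), fn});
    }
};

template <class T>
just_sender<T> just(T value) {
    return {std::move(value)};
}

template <class Sender, class Fn>
then_sender<Sender, Fn> then(Sender upstream, Fn fn) {
    return {std::move(upstream), std::move(fn)};
}

// Final receiver: writes the result into caller-owned storage.
template <class T>
struct store_receiver {
    T* out;
    void set_value(T v) { *out = std::move(v); }
};

// Runs the whole pipeline inline on the calling thread and returns its result.
template <class T, class Sender>
T run_inline(Sender sender) {
    T result{};
    sender.start(store_receiver<T>{&result});
    return result;
}

}  // namespace toy

// Builds a two-step pipeline first, then executes it.
int demo() {
    auto work = toy::then(toy::then(toy::just(20), [](int x) { return x + 1; }),
                          [](int x) { return x * 2; });
    int result = toy::run_inline<int>(work);
    assert(result == 42);  // (20 + 1) * 2
    return result;
}
```

The point of the design is that the pipeline in demo() says what to compute without saying where. In this sketch only run_inline decides how the work executes, so a different runner could in principle dispatch the same description to other hardware.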
“We certainly have ideas how to connect it with algorithms. My hope would be that we can do this for C++26. You have a good way of connecting these things and also have … algorithms that can do asynchronous work,” said Christian Trott, a principal staff member at Sandia National Laboratories and also a C++ standards committee member.
The asynchronous communication feature is largely driven by Nvidia, whose CUDA parallel programming framework is widely used in machine learning, where it relies on the concurrency of CPUs and GPUs to reduce training time.
Nvidia has open sourced its libcu++ C++ library. The company also released the CUDA 12.0 parallel programming framework last week, which supports the C++20 standard and host compilers such as GCC 10, Clang 11 and Arm C/C++ 22.x.
Senders/receivers may not make it into C++ 23, but they will make life easier for coders in the future, Stephen Jones, CUDA architect at Nvidia, told The New Stack.
“I feel pretty confident about 2026, but senders/receivers — that’s a big shift in C++. It’s a very new thing for them to try to embrace asynchronous kind of pipelined execution,” Jones said.
Mature Technology Required
While delaying a key feature might not look good on paper, C++ committee members said it’s better to wait for a technology to mature before adding it as a standard. Computing with accelerators is in its early days, with chip designs, memory and storage requirements constantly changing.
“I think we need to see more accelerators,” said James Reinders, a software engineer at Intel, adding, “I think it needs a little more time to play out.”
Intel offers a tool called SYCLomatic that makes code portable across hardware by migrating CUDA code, which ties applications to Nvidia GPUs, to SYCL. Reinders said that GPUs won’t be the only accelerators.
Reinders also pointed to a strong debate about whether hooks for technologies such as remote memory belong permanently in standard C++. Some are better as extensions, he said.
“Give it some time to play out and we’ll see if it’s the right thing to put in C++ or if it’s better as an extension. OpenMP has been very strong for a long time. It has never been incorporated into Fortran or C. It is appropriate not to complicate a core language,” said Reinders.