Nvidia Blackwell Reigns Supreme in MLPerf Training Benchmark

For those who root for the underdog, the latest MLPerf benchmark results will be disappointing: Nvidia's GPUs dominated the competition yet again. This includes chart-topping performance on the newest and most demanding benchmark, pretraining the Llama 3.1 405B large language model. That said, computers built around AMD's newest GPU, the Instinct MI325X, matched the performance of Nvidia's H200, Blackwell's predecessor, on the most popular LLM benchmark, fine-tuning. This suggests AMD is one generation behind Nvidia.

MLPerf Training is one of the machine learning competitions run by the MLCommons consortium. "AI performance can sometimes be sort of a Wild West. MLPerf seeks to bring order to that chaos," says Dave Salvator, director of accelerated computing products at Nvidia. "This is not an easy task."

The competition consists of six benchmarks, each probing a different industry-relevant machine learning task. The benchmarks are content recommendation, large language model pretraining, large language model fine-tuning, object detection for machine vision applications, image generation, and graph node classification for applications such as fraud detection and drug discovery.

The large language model pretraining task is the most resource intensive, and this round it was updated to be even more so. The term "pretraining" is somewhat misleading; it might give the impression that it is followed by a phase called "training." It isn't. Pretraining is where most of the number crunching happens, and what follows is usually fine-tuning, which refines the model for specific tasks.

In previous iterations, the training was performed on the GPT-3 model. This iteration, it was replaced by Meta's Llama 3.1 405B, which is more than twice the size of GPT-3 and uses a four-times-larger context window. The context window is the amount of input text a model can process at once. The larger benchmark reflects the industry trend toward ever-bigger models, and it also includes some architectural updates.

Blackwell tops the charts, AMD on its tail

On all six benchmarks, the fastest training time was achieved on Nvidia's Blackwell GPUs. Nvidia itself submitted to every benchmark (other companies also submitted using various computers built around Nvidia GPUs). Nvidia's Salvator emphasized that this is the first at-scale deployment of Blackwell in the benchmark, and that its performance is likely only to improve. "We are still somewhat early in the Blackwell development cycle."

This is the first time AMD has submitted to the training benchmark, although in previous years other companies have submitted computers that included AMD GPUs. On the most popular benchmark, LLM fine-tuning, AMD showed that its latest Instinct MI325X GPU performed on par with Nvidia's H200s. Additionally, the Instinct MI325X showed a 30 percent improvement over its predecessor, the Instinct MI300X. (The main difference between the two is that the MI325X comes with 30 percent more high-bandwidth memory than the MI300X.)

For its part, Google submitted to a single benchmark, the image generation task, with its Trillium TPU.

The importance of networking

Among all the submissions to the LLM fine-tuning benchmark, the system with the largest number of GPUs was submitted by Nvidia, a computer connecting 512 B200s. At this scale, communication between GPUs begins to play a significant role. Ideally, adding more GPUs would divide the training time by the number of GPUs added. In reality, scaling is always less efficient than that, because some time is lost to communication. Minimizing that loss is the key to efficiently training the largest models.
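The scaling loss described above is commonly expressed as an efficiency: the measured speedup divided by the ideal linear speedup. A minimal sketch, using illustrative numbers rather than actual MLPerf results:

```python
def scaling_efficiency(t_base: float, n_base: int,
                       t_scaled: float, n_scaled: int) -> float:
    """Fraction of the ideal (linear) speedup achieved when scaling
    a training run from n_base to n_scaled accelerators."""
    ideal_speedup = n_scaled / n_base    # e.g. 4x the GPUs -> ideally 4x faster
    actual_speedup = t_base / t_scaled   # measured from wall-clock training times
    return actual_speedup / ideal_speedup

# Hypothetical numbers: quadrupling GPUs cuts training time by about 3.6x,
# so roughly 10 percent of the potential speedup is lost to communication.
print(scaling_efficiency(t_base=100.0, n_base=128, t_scaled=27.8, n_scaled=512))
```

An efficiency of 1.0 would mean perfectly linear scaling; real systems fall short of that, and closing the gap is what the networking hardware discussed below is for.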


This becomes even more significant on the pretraining benchmark, where the smallest submission used 512 GPUs and the largest used 8,192. For this new benchmark, performance scaling with additional GPUs was notably close to linear, achieving 90 percent of ideal performance.

This was accomplished with Nvidia's NVL72 system, which uses NVLink to act as "a single massive GPU," its data sheet claims. Multiple NVL72s were then connected with InfiniBand networking technology.


Notably, the largest submission in this round of MLPerf, at 8,192 GPUs, is not the largest ever, despite the increased demands of the pretraining benchmark. Previous rounds saw submissions with more than 10,000 GPUs. Kenneth Leach, principal AI and machine learning engineer at Hewlett Packard Enterprise, attributes the decrease to improvements in GPUs, as well as the networking between them. "Previously, we needed 16 server nodes [to pretrain LLMs], but today we are able to do it with 4. I think that is one reason we are not seeing so many huge systems, because we are getting a lot of efficient scaling."

One way to avoid networking-related losses is to put many AI accelerators on the same enormous wafer, as Cerebras has done. Cerebras recently claimed to beat Nvidia's Blackwell GPUs by more than a factor of two on inference tasks. However, that result was measured by Artificial Analysis, which queries different service providers without controlling how the workload is implemented. So it is not the apples-to-apples comparison that the MLPerf benchmark guarantees.

A lack of power numbers

The MLPERF standard also includes an energy test, measuring the amount of energy consumed to achieve each training task. This tour, only one introduction – Lenovo – was able to measure energy in its introduction, making it impossible to make comparisons through artists. The energy it took to adjust LLM to the Blackwell 6.11 Gigajoules graphics units, or 1698 kilowatt hours, or almost the energy that it would take to heat a small winter house. With growth Fears About the use of artificial intelligence energy, Energy efficiency Training is crucial, and this author may not be alone in the hope that more companies provide these results in future rounds.
