NVIDIA Launches NCCL 2.22: Boosted Memory Efficiency & Speed

marcel.mihalic@gmail.com
Last updated: September 21, 2024 3:51 pm

Caroline Bishop
Sep 21, 2024 13:38

NVIDIA has unveiled NCCL 2.22, emphasizing memory optimization, quicker initialization, and cost estimation to enhance HPC and AI applications.

NVIDIA Launches NCCL 2.22 with Improved Memory Optimization and Speedier Initialization

NVIDIA has released NCCL 2.22, the latest version of the NVIDIA Collective Communications Library (NCCL). The release focuses on reducing memory use, speeding up initialization, and adding a cost estimation API. According to the NVIDIA Technical Blog, these changes target high-performance computing (HPC) and artificial intelligence (AI) applications.

Release Highlights

NVIDIA Magnum IO NCCL is a library designed to optimize inter-GPU and multi-node communication, which is crucial for efficient parallel computing. The NCCL 2.22 release includes the following key features:

  • Lazy Connection Establishment: Connections are created only when they are needed, which significantly lowers GPU memory consumption.
  • New API for Cost Estimation: A new API helps developers optimize compute and communication overlap, or explore the NCCL cost model.
  • Optimizations for ncclCommInitRank: Redundant topology queries are eliminated, making initialization up to 90% faster for applications that create multiple communicators.
  • Support for Multiple Subnets via IB Router: Jobs can communicate across different InfiniBand subnets, which allows larger deep learning training jobs.

Features Explained

Lazy Connection Establishment

NCCL 2.22 introduces lazy connection establishment, which reduces GPU memory usage by creating connections only when they are actually needed. This is especially useful for applications that use NCCL in a narrow way, such as those that repeatedly run the same algorithm. The feature is enabled by default and can be disabled by setting the environment variable NCCL_RUNTIME_CONNECT=0.
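As a minimal sketch, the variable can be set from a Python launcher before anything initializes NCCL. The helper `lazy_connect_enabled` is hypothetical and only reports what this process would pass on; it does not query NCCL itself.

```python
import os

# Set the variable before the first communicator is created, for example
# before a framework sets up its NCCL backend. "0" disables lazy connection
# establishment, per the release notes.
os.environ["NCCL_RUNTIME_CONNECT"] = "0"


def lazy_connect_enabled() -> bool:
    """Hypothetical helper (not part of NCCL): report the configured setting.

    An unset variable is treated as enabled, matching the default
    described for NCCL 2.22.
    """
    return os.environ.get("NCCL_RUNTIME_CONNECT", "1") != "0"


print("lazy connection establishment enabled:", lazy_connect_enabled())
```

Setting the variable in the job script or scheduler environment works just as well; the point is only that it has to be in place before NCCL starts.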

New Cost Model API

The new API, ncclGroupSimulateEnd, lets developers estimate how long a group of operations will take, which helps when tuning the overlap of compute and communication. The estimates may not match measured performance exactly, but they provide a useful reference point for tuning.
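To illustrate how such an estimate might feed into overlap tuning, here is a small sketch. It does not call NCCL: the time values, the microsecond unit, the bucket names, and the function `exposed_comm_time` are all assumptions made for illustration. In a real program the communication estimates would come from the simulation API and the compute time from profiling.

```python
def exposed_comm_time(estimated_comm_us: float, overlappable_compute_us: float) -> float:
    """Communication time left on the critical path after overlap.

    If the compute that can run concurrently takes at least as long as the
    estimated communication, the communication is fully hidden.
    """
    return max(0.0, estimated_comm_us - overlappable_compute_us)


# Placeholder estimates for two hypothetical gradient bucket sizes.
estimates_us = {"bucket_small": 900.0, "bucket_large": 1600.0}
compute_us = 1200.0  # hypothetical compute available to overlap with

for name, comm_us in estimates_us.items():
    print(name, exposed_comm_time(comm_us, compute_us))
```

A configuration whose exposed time is zero is fully hidden behind compute under these assumed numbers, so comparing candidates this way is one possible use of the estimates.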

Initialization Enhancements

To reduce initialization overhead, the NCCL team has introduced several optimizations, including lazy connection establishment and intra-node topology fusion. These improvements can cut ncclCommInitRank execution time by up to 90%, which matters most for applications that create multiple communicators.

New Tuner Plugin Interface

The new tuner plugin interface (version 3) provides a per-collective 2D cost table that reports the estimated time for an operation under each combination of algorithm and protocol. This allows external tuners to adjust algorithm and protocol choices for better performance.
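The idea of a 2D cost table can be sketched in a few lines. This is a conceptual model only, not the plugin interface itself: the cost values are invented, the algorithm and protocol lists are a subset chosen for illustration, and the use of `math.inf` to mark an unavailable combination is an assumption of this sketch.

```python
import math

ALGORITHMS = ["tree", "ring"]          # illustrative subset
PROTOCOLS = ["LL", "LL128", "Simple"]  # illustrative subset

# cost_table[a][p]: hypothetical estimated time (microseconds) for one
# collective using ALGORITHMS[a] with PROTOCOLS[p].
cost_table = [
    [12.0, 9.5, 20.0],
    [15.0, math.inf, 8.0],  # math.inf: combination treated as unavailable
]


def pick_fastest(table):
    """Return the (algorithm, protocol) pair with the lowest estimated time."""
    cells = ((a, p) for a in range(len(table)) for p in range(len(table[a])))
    a, p = min(cells, key=lambda ap: table[ap[0]][ap[1]])
    return ALGORITHMS[a], PROTOCOLS[p]


print(pick_fastest(cost_table))
```

An external tuner could, in this model, lower the entry for a combination it prefers so that it wins the selection.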

Static Plugin Linking

To make deployment easier and avoid plugin loading problems, NCCL 2.22 supports static linking of network or tuner plugins. Applications select this option by setting NCCL_NET_PLUGIN or NCCL_TUNER_PLUGIN to STATIC_PLUGIN.
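A launcher script might prepare the environment along these lines. This is a sketch that assumes the application binary already has the plugin linked in; the helper `with_static_plugins` is hypothetical, and only the variable names and the STATIC_PLUGIN value come from the release description.

```python
import os


def with_static_plugins(env, net=True, tuner=False):
    """Return a copy of env that requests statically linked NCCL plugins.

    Hypothetical helper: it only sets the environment variables and does
    not check that a plugin is actually linked into the application.
    """
    new_env = dict(env)
    if net:
        new_env["NCCL_NET_PLUGIN"] = "STATIC_PLUGIN"
    if tuner:
        new_env["NCCL_TUNER_PLUGIN"] = "STATIC_PLUGIN"
    return new_env


launch_env = with_static_plugins(os.environ, net=True, tuner=True)
print(launch_env["NCCL_NET_PLUGIN"], launch_env["NCCL_TUNER_PLUGIN"])
```

The resulting dictionary could then be passed as the `env` argument of `subprocess.run` when starting the training job, leaving the parent process environment unchanged.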

Group Semantics for Abort or Destroy

NCCL 2.22 adds group semantics for ncclCommDestroy and ncclCommAbort, so that multiple communicators can be destroyed at once. This helps avoid deadlocks and makes communicator teardown easier for users.

IB Router Support

This release enables NCCL to operate across different InfiniBand subnets, which extends communication to larger networks. The library automatically detects when endpoints are on different subnets and establishes connections between them, using FLID for higher performance and adaptive routing.

Bug Fixes and Minor Changes

The NCCL 2.22 release also includes several bug fixes and minor changes:

  • Added support for the allreduce tree algorithm on DGX Google Cloud.
  • Added logging of NIC names in IB async errors.
  • Improved performance of registered send and receive operations.
  • Added infrastructure code for NVIDIA Trusted Computing Solutions.
  • Added a separate traffic class for IB and RoCE control messages to enable advanced QoS.
  • Added support for PCI peer-to-peer communications across partitioned Broadcom PCI switches.

Conclusion

The NCCL 2.22 release introduces a range of features and optimizations aimed at improving performance and efficiency for HPC and AI applications. Alongside the memory and initialization improvements, it adds a new tuner plugin interface, support for static linking of plugins, and group semantics for abort and destroy that help avoid deadlocks.


