Microsoft Bing Visual Search, a platform that lets users worldwide search with images, has been significantly accelerated through a partnership with NVIDIA. As reported in the NVIDIA Technical Blog, integrating NVIDIA’s TensorRT, CV-CUDA, and nvImageCodec into Bing’s TuringMM visual embedding model yielded a 5.13x throughput improvement for offline indexing, reducing both energy use and cost.
Multimodal AI and Visual Search
Multimodal AI technologies such as Microsoft’s TuringMM are crucial for applications that require fluid interaction between data types like text and images. A widely used model for joint image-text understanding is CLIP, which trains a dual-encoder architecture on hundreds of millions of image-caption pairs. Models of this kind underpin applications including text-based visual search, zero-shot image classification, and image captioning.
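The dual-encoder idea can be illustrated with a small sketch: the image encoder and text encoder map their inputs into a shared vector space, and captions are ranked by cosine similarity to the image embedding. The vectors below are hypothetical toy embeddings for illustration; a real model such as CLIP or TuringMM would produce them from pixels and tokens.

```python
import math

def normalize(v):
    """L2-normalize a vector so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_similarity(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def zero_shot_classify(image_embedding, caption_embeddings):
    """Return the index of the caption embedding closest to the image."""
    scores = [cosine_similarity(image_embedding, t) for t in caption_embeddings]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy embeddings standing in for real encoder outputs.
image_emb = [0.9, 0.1, 0.2]
caption_embs = [
    [0.1, 0.9, 0.1],    # e.g. "a photo of a dog"
    [0.88, 0.12, 0.2],  # e.g. "a photo of a cat"
]
best = zero_shot_classify(image_emb, caption_embs)  # picks the closer caption
```

The same similarity scoring drives text-based visual search: query text is embedded once, then compared against a precomputed index of image embeddings.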
Optimization Efforts
Bing’s visual embedding pipeline was accelerated using NVIDIA’s GPU libraries on two fronts. First, NVIDIA TensorRT replaced the previous runtime for model execution, optimizing the computation-intensive layers of the transformer architecture. Second, nvImageCodec and CV-CUDA accelerated the image decoding and preprocessing stages, substantially reducing per-image latency.
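Structurally, the accelerated pipeline has three GPU-resident stages: decode (nvImageCodec), preprocess (CV-CUDA), and inference (a TensorRT engine). The sketch below mirrors that staging with pure-Python stand-ins; the function bodies are placeholders for illustration only, not the real library APIs.

```python
# Schematic of the three-stage embedding pipeline. Each stage body is a
# pure-Python stand-in; the production pipeline runs these stages on the
# GPU with nvImageCodec, CV-CUDA, and TensorRT respectively.

def decode(raw_bytes):
    # Stand-in for nvImageCodec: compressed JPEG bytes -> decoded image.
    return {"pixels": raw_bytes, "decoded": True}

def preprocess(image, size=(224, 224)):
    # Stand-in for CV-CUDA: resize, layout conversion, normalization.
    return {**image, "size": size, "normalized": True}

def infer(batch):
    # Stand-in for a TensorRT engine running the visual encoder:
    # one fixed-length embedding per input image (512 is illustrative).
    return [[0.0] * 512 for _ in batch]

def embed_images(raw_images):
    # Keeping every stage on the GPU avoids host<->device copies between
    # decode, preprocess, and inference -- a key source of the latency win.
    batch = [preprocess(decode(b)) for b in raw_images]
    return infer(batch)

embeddings = embed_images([b"img0", b"img1", b"img2"])
```

The design point is that the intermediate images never leave device memory between stages, which is what the CPU-decode baseline could not avoid.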
Implementation and Results
Before the optimization, Bing’s visual embedding model ran on a GPU server cluster that serves inference for various deep learning services within Microsoft. The initial setup used ONNXRuntime with the CUDA Execution Provider, but was bottlenecked by CPU-based image decoding in OpenCV. Integrating NVIDIA’s libraries raised the pipeline’s throughput from 88 queries per second (QPS) to 452 QPS, a 5.13x speedup.
These improvements not only increased processing speed but also relieved the CPUs by offloading work to the GPUs, improving energy efficiency. NVIDIA TensorRT was the primary contributor to the gains, while nvImageCodec and CV-CUDA provided an additional 27% boost.
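The reported figures can be decomposed with simple arithmetic. Treating the 27% image-library gain as a multiplicative factor (an assumption for estimation, not a stated measurement), TensorRT alone accounts for roughly a 4x speedup:

```python
# Back-of-the-envelope decomposition using the figures from the article.
baseline_qps = 88     # ONNXRuntime + CPU-based OpenCV decoding
optimized_qps = 452   # TensorRT + nvImageCodec + CV-CUDA

total_speedup = optimized_qps / baseline_qps        # ~5.1x overall
image_lib_factor = 1.27                             # +27% from nvImageCodec/CV-CUDA
tensorrt_factor = total_speedup / image_lib_factor  # TensorRT's implied share

print(f"total {total_speedup:.2f}x, TensorRT alone ~{tensorrt_factor:.2f}x")
```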
Conclusion
The effective optimization of Microsoft Bing Visual Search underscores the immense potential of NVIDIA’s accelerated libraries in refining AI-driven applications. This collaboration exemplifies how GPU resources can be leveraged to expedite deep learning and image processing tasks, even within existing systems that already utilize GPU acceleration. These advancements set the stage for more efficient and agile visual search functionalities, benefiting both users and service providers.
For deeper insights into the optimization journey, please visit the original NVIDIA Technical Blog.
Image source: Shutterstock