NVIDIA has announced a new partnership with Amazon Web Services (AWS) to enhance its product offerings for generative AI applications.
In this latest collaboration, AWS becomes the first cloud service provider to offer Team Green’s GH200 Grace Hopper Superchips with multi-node NVLink. The integration uses AWS’s third-generation Elastic Fabric Adapter (EFA) interconnect, delivering up to 400 Gbps of low-latency, high-bandwidth networking throughput per Superchip, and allows EC2 UltraClusters to scale out to thousands of GH200 Superchips.
Specifically, the GH200-powered EC2 instances will include 4.5 TB of HBM3e memory, 7.2x the capacity of the H100-powered EC2 P5d instances. The CPU-to-GPU memory interconnect provides up to 7x the bandwidth of PCIe, and liquid cooling will be deployed to maximize efficiency and rack density. Additionally, the AWS Nitro System offloads I/O functions to dedicated hardware, ensuring consistent performance and a secure execution environment.
Meanwhile, NVIDIA DGX Cloud, integrated with NVIDIA AI Enterprise, will be hosted on AWS, providing seamless access to LLM and generative AI model training.
The collaboration also brings improvements to other AWS instances:
- AWS P5e: H200 GPU with 141 GB of HBM3e memory (1.8x more capacity, 1.4x faster, up to 3,200 Gbps of EFA networking).
- AWS EC2 G6e: L40S GPU (optimized for video and graphics workloads, cost-effective, and energy-efficient).
- AWS EC2 G6: L40 GPU (optimized for video and graphics workloads, cost-effective, and energy-efficient).
Additionally, a new NVIDIA NeMo Retriever microservice platform has been introduced, giving developers the building blocks for highly accurate chatbots and summarization services. Alongside it, BioNeMo, a generative AI platform specialized for drug discovery, will soon be available on AWS via NVIDIA DGX Cloud and Amazon SageMaker.
Focusing on NeMo Retriever: the microservice employs NVIDIA-optimized algorithms for generative AI applications, enabling responses grounded in business data stored in the cloud or in on-premises data centers.
NVIDIA is developing this retrieval-augmented generation (RAG) capability in collaboration with Cadence, Dropbox, SAP, and ServiceNow to create production-ready models that let businesses quickly build custom generative AI applications and services.
While open-source RAG toolkits exist, NVIDIA’s NeMo Retriever stands out for its commercially licensed models, API stability, security patches, and enterprise support. It includes optimized embedding models that capture relationships between words for higher retrieval accuracy, and it supports data types such as images, videos, and PDFs.
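To make the RAG idea concrete, here is a minimal sketch of the pattern: embed a user query and a set of business documents, retrieve the most similar documents, and splice them into the prompt sent to an LLM. This is purely illustrative and uses toy bag-of-words vectors; it is not NeMo Retriever’s API, which relies on learned embedding models and GPU-accelerated search.

```python
# Minimal retrieval-augmented generation (RAG) sketch.
# Illustrative only: the embed() below is a toy bag-of-words vector,
# standing in for a real learned embedding model.
from collections import Counter
from math import sqrt


def embed(text: str) -> Counter:
    """Toy 'embedding': a term-frequency vector over lowercase tokens."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query: str, docs: list[str]) -> str:
    """Ground the LLM prompt in the retrieved business data."""
    context = "\n".join(retrieve(query, docs, k=2))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


docs = [
    "Q3 revenue grew 12 percent year over year.",
    "The office closes at 6 pm on Fridays.",
    "Support tickets are answered within 24 hours.",
]
print(build_prompt("When does the office close?", docs))
```

The key design point is that the model is never asked to answer from its parametric memory alone: the retrieval step grounds each answer in current business data, which is what makes RAG attractive for enterprise chatbots and summarization.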