Ever wondered how Google’s latest LLM family, Gemma, manages to run so well in its 2B and 7B parameter sizes? NVIDIA has a big hand in it.
The joint statement from the two companies further revealed that TensorRT-LLM, the same NVIDIA library that has powered various popular LLMs around the tech world including Google Gemini, is the foundation that makes much of this possible.
And of course, being partners means Gemma also gets dedicated optimizations for quicker, more efficient inference on NVIDIA GPUs, whether in data centers, in the cloud, or on local PCs equipped with NVIDIA RTX GPUs. That puts Gemma within reach of users at every tier and makes it easier to get familiar with the model across all kinds of applications.
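For the curious, here is a rough, unofficial sketch of what running Gemma through TensorRT-LLM’s high-level Python interface could look like. The `LLM`/`SamplingParams` API and the `google/gemma-2b-it` checkpoint name are assumptions based on recent TensorRT-LLM releases and Hugging Face listings, so treat this as illustrative rather than NVIDIA’s official recipe:

```python
# Illustrative sketch only: assumes a recent TensorRT-LLM release that ships the
# high-level Python "LLM" API, an NVIDIA GPU, and access to the Hugging Face
# checkpoint "google/gemma-2b-it". Check NVIDIA's docs for the supported workflow.
from tensorrt_llm import LLM, SamplingParams

def main():
    # Building the optimized TensorRT engine for Gemma is the heavy, one-time step;
    # after that, generation runs through the compiled engine on the GPU.
    llm = LLM(model="google/gemma-2b-it")

    # Basic sampling settings (assumed parameter names, mirroring common APIs).
    params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=128)

    outputs = llm.generate(
        ["Explain what TensorRT-LLM does in one sentence."], params
    )
    for out in outputs:
        print(out.outputs[0].text)

if __name__ == "__main__":
    main()
```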
Furthermore, NVIDIA’s popular Chat with RTX tech demo is expected to soon integrate Gemma as a supported model. This integration will allow users of Chat with RTX to interact directly with the powerful Gemma language model on their local RTX-powered Windows PCs. This personalized approach empowers users to create custom chatbots using their own data, ensuring privacy as everything remains on their device.
As for when it will reach the general public: Chat with RTX is still in an early phase, so it might not be soon, but hey, who knows.