Although Microsoft is launching its new Copilot+ AI PC category in partnership with Qualcomm, using the new Snapdragon X series processors, any device that meets the requirements will eventually qualify as a Copilot+ AI PC, including those equipped with an NVIDIA RTX GPU.
Let’s get straight to the developer side, since the impact on gaming isn’t significant. The effort starts with Microsoft’s recent ONNX Runtime (ORT) Generative AI extension, a cross-platform AI inference library. The extension supports optimization techniques such as quantization for large language models (LLMs) including Phi-3, Llama 3, Gemma, and Mistral, and it runs across a range of hardware and software stacks through execution providers such as DirectML.
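To make the execution-provider idea more concrete, here is a minimal sketch of how an ONNX Runtime application might prefer DirectML and fall back to the CPU. The provider names `DmlExecutionProvider` and `CPUExecutionProvider` are the ones ONNX Runtime uses; the helpers `pick_providers` and `create_session` are hypothetical names chosen for illustration, and the sketch uses the base ONNX Runtime session API rather than the Generative AI extension's own interface.

```python
from typing import List, Sequence

# Preference order (an assumption for this sketch): DirectML first,
# then the CPU provider as a fallback.
PREFERRED = ("DmlExecutionProvider", "CPUExecutionProvider")


def pick_providers(available: Sequence[str]) -> List[str]:
    """Return the preferred providers that this install actually offers."""
    chosen = [name for name in PREFERRED if name in available]
    if not chosen:
        raise RuntimeError("no supported execution provider found")
    return chosen


def create_session(model_path: str):
    """Create an inference session; requires the onnxruntime package."""
    # Imported lazily so pick_providers stays usable without onnxruntime.
    import onnxruntime as ort

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)
```

On a Windows machine with a DirectML-enabled build of ONNX Runtime installed, calling `create_session` with the path to an ONNX model would be expected to select DirectML; elsewhere it falls back to the CPU provider.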
Because DirectML is also developed by Microsoft, Windows-based AI developers can rely less on Linux-specific environments and libraries and do more of their work within the Windows ecosystem. For its part, NVIDIA says the optimizations delivered through the R555 drivers (GeForce Game Ready, NVIDIA Studio, and NVIDIA RTX Enterprise) improve performance by up to 3x compared with previous driver versions. See details below.
Additional benefits of the new R555 driver include:
– Support for the DQ-GEMM metacommand to handle INT4 weight-only quantization for LLMs
– New RMSNorm normalization methods for Llama 2, Llama 3, Mistral, and Phi-3 models
– Grouped-query and multi-query attention mechanisms, plus sliding window attention to support Mistral
– In-place KV updates to improve attention performance
– Support for GEMM of non-multiple-of-8 tensors to enhance context phase performance
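Two of the items above, INT4 weight-only quantization and RMSNorm, are easy to illustrate in a few lines. The sketch below is a plain-Python illustration of the concepts, not the driver's DQ-GEMM metacommand or any DirectML code path. The symmetric per-group scheme, the group size of 4, and the function names `quantize_int4`, `dequant_dot`, and `rms_norm` are all assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple


def quantize_int4(weights: Sequence[float], group_size: int = 4) -> Tuple[List[int], List[float]]:
    """Symmetric per-group quantization of weights to integers in [-8, 7]."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to 7; avoid a zero scale.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            codes.append(max(-8, min(7, round(w / scale))))
    return codes, scales


def dequant_dot(x: Sequence[float], codes: Sequence[int],
                scales: Sequence[float], group_size: int = 4) -> float:
    """Dot product of float activations with INT4 weights, dequantized on the fly."""
    total = 0.0
    for i, (xi, q) in enumerate(zip(x, codes)):
        total += xi * (q * scales[i // group_size])
    return total


def rms_norm(x: Sequence[float], gain: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Scale x by the reciprocal of its root mean square, then apply a per-element gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [g * v * inv_rms for v, g in zip(x, gain)]
```

Only the weights are stored in 4 bits here; the activations stay in floating point, which is what "weight-only" means. A real kernel would pack two 4-bit codes per byte and fuse the dequantization into the matrix multiply, which this sketch does not attempt.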
AI workloads that run in the browser through WebNN have also seen improvements, and WebNN is now available in Developer Preview builds for developers who want to test it and get familiar with it.