NVIDIA recently released its Ampere architecture-based graphics cards. Known as the NVIDIA GeForce RTX 30 Series, these graphics cards deliver what NVIDIA dubs “the greatest-ever generational leap” in GeForce history.
Based on initial benchmarks, these new graphics cards are currently among the fastest consumer graphics cards available. What makes them so special? In this article, we take a deeper look at the Ampere architecture to understand what really contributes to the performance of the new NVIDIA GeForce RTX 30 Series.
An Introduction – NVIDIA GeForce RTX 30 Series
On 1 September 2020, NVIDIA co-founder and CEO Jensen Huang introduced three new NVIDIA GeForce RTX graphics cards: the RTX 3090, RTX 3080 and RTX 3070.
NVIDIA GeForce RTX 3090
At the top of the line is the GeForce RTX 3090. It is known as the “Big Ferocious GPU”, as it is currently the fastest graphics card in the world.
The GeForce RTX 3090 is massive both physically and in performance gains. It uses a triple-slot design to house its cooling solution. It is also the first graphics card that can effectively support RTX On with DLSS at 8K resolution.
The GeForce RTX 3090 starts at USD1499 and is already available from NVIDIA and its partners.
NVIDIA GeForce RTX 3080
Known as the flagship of the series, the GeForce RTX 3080 is expected to be up to twice as fast as the RTX 2080. NVIDIA has kept the USD699 price for this new graphics card, the same as the launch price of the RTX 2080. It also comes with the new Dual Axial Flow design, which is said to provide 55% more airflow while being 3x quieter and 30% more efficient.
The GeForce RTX 3080 starts at USD699 and is already available from NVIDIA and its partners.
NVIDIA GeForce RTX 3070
Unlike the higher-end RTX 3090 and RTX 3080, the RTX 3070 comes with GDDR6 memory instead of GDDR6X. However, don’t underestimate this USD499 card, as NVIDIA says it outperforms the RTX 2080 Ti. This graphics card is coming to the market on 29 October 2020.
Specification | GeForce RTX 3090 | GeForce RTX 3080 | GeForce RTX 3070 |
Key Point | Big Ferocious GPU – Enables 8K Gaming with DLSS | 2x Performance of RTX 2080 | Faster than RTX 2080 Ti |
CUDA Cores | 10,496 | 8,704 | 5,888 |
Shader Performance | 36 Shader-TFLOPS | 30 Shader-TFLOPS | 20 Shader-TFLOPS |
RT Performance | 69 RT-TFLOPS | 58 RT-TFLOPS | 40 RT-TFLOPS |
Tensor Performance | 285 Tensor-TFLOPS | 238 Tensor-TFLOPS | 163 Tensor-TFLOPS |
Memory | 24GB GDDR6X | 10GB GDDR6X | 8GB GDDR6 |
Price | USD 1499 | USD 699 | USD 499 |
Availability | 24 September 2020 | 17 September 2020 | 29 October 2020 |
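The Shader-TFLOPS row in the table can be roughly reproduced from the CUDA core counts. The sketch below assumes the usual peak-throughput formula (cores × 2 operations per clock for a fused multiply-add × clock speed). The boost clocks are not part of the table above; they are assumed reference values, and partner cards may ship with different clocks.

```python
# Peak FP32 throughput = CUDA cores x 2 operations per clock (fused multiply-add) x clock.
# CUDA core counts come from the table above. The boost clocks are ASSUMED
# reference values and are not stated in this article.
CARDS = {
    "GeForce RTX 3090": {"cuda_cores": 10496, "boost_ghz": 1.695},
    "GeForce RTX 3080": {"cuda_cores": 8704, "boost_ghz": 1.710},
    "GeForce RTX 3070": {"cuda_cores": 5888, "boost_ghz": 1.725},
}

OPS_PER_CORE_PER_CLOCK = 2  # one fused multiply-add counts as two operations


def shader_tflops(cuda_cores: int, boost_ghz: float) -> float:
    """Return the theoretical peak FP32 throughput in TFLOPS."""
    return cuda_cores * OPS_PER_CORE_PER_CLOCK * boost_ghz / 1000.0


if __name__ == "__main__":
    for name, spec in CARDS.items():
        print(f"{name}: {shader_tflops(**spec):.1f} TFLOPS")
```

Under these assumptions the results round to the 36, 30 and 20 Shader-TFLOPS listed in the table. These are theoretical peaks, not measured gaming performance.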
What makes the new RTX 30 series perform so well? All About the new Ampere Architecture
The new Ampere architecture lays the foundation for the 2nd generation of RTX graphics cards. By combining Samsung’s 8nm NVIDIA custom process with improvements to the GPU architecture, NVIDIA has brought a new level of performance to the Ampere-based graphics cards.
So, what are the contributing factors to these performance gains? The key improvements in the Ampere architecture are discussed below.
- Ampere Streaming Multiprocessor
FP32, also known as 32-bit floating point, is the number format used for most calculations in 3D games. The more FP32 calculations a GPU can process per clock, the higher its performance.
NVIDIA has doubled the number of FP32 ALUs (32-bit floating point arithmetic-logic units) per Streaming Multiprocessor (SM) in the new Ampere GPU architecture. The additional FP32 ALUs also require a data path to feed them. Tony Tamasi, VP of Technical Marketing at NVIDIA, explained the changes in the Ampere architecture as part of a Q&A on reddit:
One of the key design goals for the Ampere 30-series SM was to achieve twice the throughput for FP32 operations compared to the Turing SM. To accomplish this goal, the Ampere SM includes new datapath designs for FP32 and INT32 operations. One datapath in each partition consists of 16 FP32 CUDA Cores capable of executing 16 FP32 operations per clock. Another datapath consists of both 16 FP32 CUDA Cores and 16 INT32 Cores. As a result of this new design, each Ampere SM partition is capable of executing either 32 FP32 operations per clock, or 16 FP32 and 16 INT32 operations per clock. All four SM partitions combined can execute 128 FP32 operations per clock, which is double the FP32 rate of the Turing SM, or 64 FP32 and 64 INT32 operations per clock
The new Ampere architecture also delivers higher performance per watt, promising gamers a level of performance not seen in previous generations.
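The arithmetic in Tamasi's explanation can be written out as a small sketch. The function and constant names below are hypothetical and chosen for illustration only; the numbers (four partitions, two 16-lane datapaths per partition) come from the quote, and the Turing figure is the one implied by "double the FP32 rate".

```python
SM_PARTITIONS = 4          # partitions per Ampere SM (from the quote)
LANES_PER_DATAPATH = 16    # operations per clock per datapath (from the quote)
TURING_FP32_PER_SM = 64    # implied by "double the FP32 rate of the Turing SM"
FP32_CORES_PER_SM = 128    # 4 partitions x 2 datapaths x 16 FP32 lanes


def ampere_sm_ops_per_clock(shared_path_runs_int32: bool) -> dict:
    """Per-SM operations per clock for the two modes described in the quote.

    Each partition has one FP32-only datapath and one datapath that can
    execute either FP32 or INT32 operations in a given clock.
    """
    fp32 = LANES_PER_DATAPATH  # the dedicated FP32 datapath
    int32 = 0
    if shared_path_runs_int32:
        int32 += LANES_PER_DATAPATH
    else:
        fp32 += LANES_PER_DATAPATH
    return {"fp32": fp32 * SM_PARTITIONS, "int32": int32 * SM_PARTITIONS}


def sm_count(cuda_cores: int) -> int:
    """Number of SMs implied by a CUDA core count at 128 FP32 cores per SM."""
    return cuda_cores // FP32_CORES_PER_SM


if __name__ == "__main__":
    print(ampere_sm_ops_per_clock(shared_path_runs_int32=False))
    print(ampere_sm_ops_per_clock(shared_path_runs_int32=True))
    print(sm_count(10496), sm_count(8704), sm_count(5888))
```

The doubled FP32 rate only applies when the shared datapath is not busy with INT32 work, which is why real-world gains depend on the instruction mix of each game.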
- 2nd Generation Ray Tracing Cores
NVIDIA first introduced Ray Tracing Cores in the Turing-based first generation of RTX GPUs. Ray Tracing Cores, also known as RT Cores, are accelerator units dedicated to performing ray tracing operations efficiently. In the new RTX 30 Series graphics cards, NVIDIA’s 2nd Generation Ray Tracing Cores deliver up to 2x the throughput of their predecessors.
- 3rd Generation Tensor Cores
On top of the improved Ray Tracing Cores, NVIDIA has also built upon the Tensor Cores in the RTX 30 Series graphics cards. The Tensor Cores are now in their third generation and deliver up to 2x the throughput of Turing-architecture Tensor Cores.
Tensor Cores are designed specifically for deep learning and AI applications. By accelerating AI training and inference, the Tensor Cores found on the RTX 30 Series graphics cards also help improve gaming performance.
NVIDIA’s latest DLSS technology uses a deep learning neural network to boost frame rates while retaining image quality. This process is accelerated by the Tensor Cores, and it can even enable games to run in Ultra Performance mode at up to 8K on the new GeForce RTX 3090.
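To get a feel for why Ultra Performance mode makes 8K feasible, the sketch below works out the internal render resolution. It assumes the mode upscales by 3x along each axis (9x in pixel count), which is how NVIDIA has described it elsewhere; that factor is not stated in this article, and the helper name is hypothetical.

```python
# ASSUMPTION: DLSS Ultra Performance renders at one third of the output
# resolution along each axis, i.e. one ninth of the output pixels.
ULTRA_PERFORMANCE_AXIS_SCALE = 3


def internal_resolution(out_width: int, out_height: int, axis_scale: int) -> tuple:
    """Return the (width, height) the game renders before upscaling."""
    return round(out_width / axis_scale), round(out_height / axis_scale)


def pixel_ratio(axis_scale: int) -> int:
    """How many output pixels are produced per rendered pixel."""
    return axis_scale * axis_scale


if __name__ == "__main__":
    width, height = internal_resolution(7680, 4320, ULTRA_PERFORMANCE_AXIS_SCALE)
    print(f"8K output is rendered internally at {width}x{height}")
    print(f"Output pixels per rendered pixel: {pixel_ratio(ULTRA_PERFORMANCE_AXIS_SCALE)}")
```

Under this assumption, the GPU shades a 1440p-sized frame and the neural network, running on the Tensor Cores, reconstructs the remaining detail.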
- GDDR6X
Working with Micron, NVIDIA is also bringing the latest graphics memory to the new RTX 30 series graphics cards. GDDR6X uses PAM4 signaling, which encodes two bits per symbol using four voltage levels, to double the data rate per clock cycle compared with GDDR6 memory. This higher data rate improves graphics memory performance for a multitude of workloads, such as gaming, professional visualization and AI inference.
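A toy encoder makes the PAM4 idea concrete: with four signal levels, each symbol carries two bits, so the same bit stream needs half as many symbols as two-level signaling. The Gray-coded level mapping below is an illustrative assumption, not the mapping GDDR6X actually uses on the wire.

```python
# ASSUMED, illustrative mapping of bit pairs to four amplitude levels.
PAM4_LEVELS = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}
PAM4_BITS = {level: pair for pair, level in PAM4_LEVELS.items()}


def encode_nrz(bits: list) -> list:
    """Two-level signaling: one symbol per bit."""
    return list(bits)


def encode_pam4(bits: list) -> list:
    """Four-level signaling: one symbol per pair of bits."""
    if len(bits) % 2 != 0:
        raise ValueError("PAM4 needs an even number of bits")
    return [PAM4_LEVELS[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def decode_pam4(symbols: list) -> list:
    """Recover the original bit stream from PAM4 symbols."""
    bits = []
    for symbol in symbols:
        bits.extend(PAM4_BITS[symbol])
    return bits


if __name__ == "__main__":
    data = [1, 0, 0, 1, 1, 1, 0, 0]
    print("Two-level symbols:", len(encode_nrz(data)))
    print("PAM4 symbols:", len(encode_pam4(data)))
```

The trade-off is that four levels sit closer together than two, so the signal is more sensitive to noise.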
- HDMI 2.1 Support
In the past, a single HDMI 2.0 cable topped out at 4K at 60Hz (the often-quoted 4K HDR at 98Hz limit applies to DisplayPort 1.4 without compression). With HDMI 2.1 support, the new NVIDIA GeForce RTX 30 series can now drive 4K at higher refresh rates, and even 8K resolution, over a single cable.
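A back-of-the-envelope calculation shows why the extra bandwidth matters. The sketch below compares the raw pixel data rate of a video mode against the maximum link rates of HDMI 2.0 (18 Gbps) and HDMI 2.1 (48 Gbps). These link rates are not given in this article, and the calculation deliberately ignores blanking intervals and link encoding overhead, so real requirements are somewhat higher.

```python
HDMI_2_0_GBPS = 18.0  # maximum link rate, ASSUMED from the HDMI 2.0 spec
HDMI_2_1_GBPS = 48.0  # maximum link rate, ASSUMED from the HDMI 2.1 spec


def active_video_gbps(width: int, height: int, refresh_hz: int,
                      bits_per_channel: int, channels: int = 3) -> float:
    """Raw pixel data rate in Gbps (no blanking, no encoding overhead)."""
    return width * height * refresh_hz * bits_per_channel * channels / 1e9


def fits(link_gbps: float, required_gbps: float) -> bool:
    """Optimistic check: the raw pixel data must at least fit the link."""
    return required_gbps <= link_gbps


if __name__ == "__main__":
    modes = {
        "4K 60Hz 8-bit": (3840, 2160, 60, 8),
        "4K 120Hz 10-bit": (3840, 2160, 120, 10),
        "8K 60Hz 10-bit": (7680, 4320, 60, 10),
    }
    for name, mode in modes.items():
        rate = active_video_gbps(*mode)
        print(f"{name}: {rate:.1f} Gbps, "
              f"HDMI 2.0: {fits(HDMI_2_0_GBPS, rate)}, "
              f"HDMI 2.1: {fits(HDMI_2_1_GBPS, rate)}")
```

By this rough estimate, 4K at 120Hz with 10-bit color fits HDMI 2.1 but not HDMI 2.0, while 8K at 60Hz with 10-bit color exceeds even 48 Gbps uncompressed. Such modes would have to rely on chroma subsampling or Display Stream Compression, which HDMI 2.1 supports.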
Conclusion
NVIDIA was able to take advantage of not only an upgrade to its GPU architecture, but also process and graphics memory improvements. These bring about a generational leap in performance to the new GeForce RTX 30 series graphics cards. Furthermore, these graphics cards are also priced aggressively to ward off competition in the space, making the RTX 30 series an easy choice for many gamers.
NVIDIA continues to pave the way in gaming graphics performance and technology. We’re excited about what the future holds with NVIDIA taking the lead.