NVIDIA TensorRT 3
Dramatically Accelerates AI Inference for Hyperscale Data Centres
Alibaba, Baidu, Tencent, JD.com and Hikvision Adopt NVIDIA TensorRT for Programmable Inference Acceleration
SINGAPORE—September 26, 2017—NVIDIA today unveiled new NVIDIA® TensorRT 3 AI inference software that sharply boosts the performance and slashes the cost of inferencing from the cloud to edge devices, including self-driving cars and robots.
The combination of TensorRT 3 with NVIDIA GPUs delivers ultra-fast and efficient inferencing across all frameworks for AI-enabled services — such as image and speech recognition, natural language processing, visual search and personalised recommendations. TensorRT and NVIDIA Tesla® GPU accelerators are up to 40 times faster than CPUs(1) at one-tenth the cost of CPU-based solutions.(2)
“Internet companies are racing to infuse AI into services used by billions of people. As a result, AI inference workloads are growing exponentially,” said NVIDIA founder and CEO Jensen Huang. “NVIDIA TensorRT is the world’s first programmable inference accelerator. With CUDA programmability, TensorRT will be able to accelerate the growing diversity and complexity of deep neural networks. And with TensorRT’s dramatic speed-up, service providers can affordably deploy these compute-intensive AI workloads.”
More than 1,200 companies have already begun using NVIDIA’s inference platform across a wide spectrum of industries to discover new insights from data and deploy intelligent services to businesses and consumers. Among them are Amazon, Microsoft, Facebook and Google, as well as leading Chinese enterprise companies like Alibaba, Baidu, JD.com, iFLYTEK, Hikvision, Tencent and WeChat.
“NVIDIA’s AI platform, using TensorRT software on Tesla GPUs, is an outstanding technology at the forefront of enabling SAP’s growing requirements for inferencing,” said Juergen Mueller, chief information officer at SAP. “TensorRT and NVIDIA GPUs make real-time service delivery possible, with maximum machine learning performance and versatility to meet our customers’ needs.”
“JD.com relies on NVIDIA GPUs and software for inferencing in our data centres,” said Andy Chen, senior director of AI and Big Data at JD. “Using NVIDIA’s TensorRT on Tesla GPUs, we can simultaneously inference 1,000 HD video streams in real time, with 20 times fewer servers. NVIDIA’s deep learning platform provides outstanding performance and efficiency for JD.”
TensorRT 3 is a high-performance optimising compiler and runtime engine for production deployment of AI applications. It can rapidly optimise, validate and deploy trained neural networks for inference to hyperscale data centres, embedded or automotive GPU platforms.
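As an illustrative sketch only (not NVIDIA sample code), the following shows how a developer might compile a trained network into a serialised TensorRT engine. It is written against the modern TensorRT C++ API — the ONNX parser and builder-config interfaces shown here postdate TensorRT 3, which shipped with Caffe and UFF importers — and "model.onnx" / "model.plan" are placeholder file names:

    // Hedged sketch: import a trained network and compile it into a
    // serialised TensorRT engine. API names follow recent TensorRT
    // releases, not the TensorRT 3 interfaces this release describes.
    #include <NvInfer.h>
    #include <NvOnnxParser.h>
    #include <fstream>
    #include <iostream>
    #include <memory>

    struct Logger : nvinfer1::ILogger {
        void log(Severity severity, const char* msg) noexcept override {
            if (severity <= Severity::kWARNING) std::cerr << msg << "\n";
        }
    };

    int main() {
        Logger logger;
        auto builder = std::unique_ptr<nvinfer1::IBuilder>(
            nvinfer1::createInferBuilder(logger));
        auto network = std::unique_ptr<nvinfer1::INetworkDefinition>(
            builder->createNetworkV2(1U << static_cast<int>(
                nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH)));
        auto parser = std::unique_ptr<nvonnxparser::IParser>(
            nvonnxparser::createParser(*network, logger));

        // "model.onnx" is a placeholder for a network exported from any
        // training framework.
        if (!parser->parseFromFile("model.onnx",
                static_cast<int>(nvinfer1::ILogger::Severity::kWARNING)))
            return 1;

        auto config = std::unique_ptr<nvinfer1::IBuilderConfig>(
            builder->createBuilderConfig());
        config->setFlag(nvinfer1::BuilderFlag::kFP16);  // reduced precision

        // Optimise the network and serialise the engine for deployment.
        auto engine = std::unique_ptr<nvinfer1::IHostMemory>(
            builder->buildSerializedNetwork(*network, *config));
        std::ofstream out("model.plan", std::ios::binary);
        out.write(static_cast<const char*>(engine->data()),
                  static_cast<std::streamsize>(engine->size()));
        return 0;
    }

The serialised plan is then loaded by the TensorRT runtime on the target GPU — in a data centre, on an embedded board or in a vehicle.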
It offers highly accurate INT8 and FP16 network execution, which can save data centre operators tens of millions of dollars in acquisition and annual energy costs. A developer can use it to take a trained neural network and, in just one day, create a deployable inference solution that runs 3-5x faster than their training framework.
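INT8 execution adds one step at build time: a calibrator feeds representative batches so TensorRT can choose per-tensor dynamic ranges. Below is a hedged stub, assuming the IInt8EntropyCalibrator2 interface from later TensorRT releases; the data loading is deliberately left unimplemented:

    // Hedged stub of an INT8 calibrator; real code would copy calibration
    // batches to device memory inside getBatch().
    #include <NvInfer.h>
    #include <cstddef>

    class StubCalibrator : public nvinfer1::IInt8EntropyCalibrator2 {
    public:
        int getBatchSize() const noexcept override { return 8; }

        // Fill the device buffers named in names[] with the next batch;
        // returning false tells TensorRT the calibration data is exhausted.
        bool getBatch(void* bindings[], const char* names[],
                      int nbBindings) noexcept override {
            return false;  // placeholder: no data wired up in this sketch
        }

        // Optional cache so the (slow) calibration pass runs only once.
        const void* readCalibrationCache(std::size_t& length) noexcept override {
            length = 0;
            return nullptr;
        }
        void writeCalibrationCache(const void*, std::size_t) noexcept override {}
    };

    // Used with the builder config from the sketch above:
    //   StubCalibrator calibrator;
    //   config->setFlag(nvinfer1::BuilderFlag::kINT8);
    //   config->setInt8Calibrator(&calibrator);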
To further accelerate AI, NVIDIA introduced additional software, including:
• DeepStream SDK: NVIDIA DeepStream SDK delivers real-time, low-latency video analytics at scale. It helps developers integrate advanced video inference capabilities, including INT8 precision and GPU-accelerated transcoding, to support AI-powered services like object classification and scene understanding for up to 30 HD streams in real time on a single Tesla P4 GPU accelerator.
• CUDA 9: The latest version of CUDA®, NVIDIA’s accelerated computing software platform, speeds up HPC and deep learning applications with support for NVIDIA Volta architecture-based GPUs, up to 5x faster libraries, a new programming model for thread management (see the sketch after this list) and updates to debugging and profiling tools. CUDA 9 is optimised to deliver maximum performance on Tesla V100 GPU accelerators.
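CUDA 9’s new thread-management model is cooperative groups. As a brief illustration (assumed for this rewrite, not taken from NVIDIA’s materials), this kernel expresses a block-wide sum reduction against an explicit thread-group object rather than bare __syncthreads():

    // Sketch: CUDA 9 cooperative groups. Synchronisation is scoped to a
    // named group object instead of being implied for the whole block.
    #include <cooperative_groups.h>
    namespace cg = cooperative_groups;

    __global__ void block_sum(const float* in, float* out, int n) {
        extern __shared__ float scratch[];
        cg::thread_block block = cg::this_thread_block();

        int i = blockIdx.x * blockDim.x + threadIdx.x;
        scratch[block.thread_rank()] = (i < n) ? in[i] : 0.0f;
        block.sync();  // group-scoped barrier

        // Tree reduction over shared memory.
        for (unsigned stride = block.size() / 2; stride > 0; stride /= 2) {
            if (block.thread_rank() < stride)
                scratch[block.thread_rank()] += scratch[block.thread_rank() + stride];
            block.sync();
        }
        if (block.thread_rank() == 0) out[blockIdx.x] = scratch[0];
    }

    // Launch, e.g.: block_sum<<<blocks, 256, 256 * sizeof(float)>>>(in, out, n);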
Inference for the Data Centre
Data centre managers constantly balance performance and efficiency to keep their server fleets at maximum productivity. Tesla GPU-accelerated servers can replace over a hundred hyperscale CPU servers for deep learning inference applications and services, freeing up precious rack space, reducing energy and cooling requirements, and reducing cost by as much as 90 percent.
NVIDIA Tesla GPU accelerators provide the optimal inference solution — combining the highest throughput, best efficiency and lowest latency on deep learning inference workloads to power new AI-driven experiences.
Inference for Self-Driving Cars and Embedded Applications
With NVIDIA’s unified architecture, deep neural networks on every deep learning framework can be trained on NVIDIA DGX™ systems in the data centre, and then deployed into all types of devices — from robots to autonomous vehicles — for real-time inferencing at the edge.
TuSimple, a startup developing autonomous trucking technology, increased inferencing performance by 30 percent after TensorRT optimisation. In June, the company successfully completed a 170-mile Level 4 test drive from San Diego to Yuma, Arizona, using NVIDIA GPUs and cameras as the primary sensor. The performance gains from TensorRT allow TuSimple to analyse additional camera data and add new AI algorithms to its autonomous trucks without sacrificing response time.
Keep Current on NVIDIA
Subscribe to the NVIDIA blog, follow us on Facebook, Google+, Twitter, LinkedIn and Instagram, and view NVIDIA videos on YouTube and images on Flickr.
About NVIDIA
NVIDIA’s (NASDAQ: NVDA) invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics and revolutionised parallel computing. More recently, GPU deep learning ignited modern AI — the next era of computing — with the GPU acting as the brain of computers, robots and self-driving cars that can perceive and understand the world. More information at http://nvidianews.nvidia.com/.