
Making Waves at CVPR: Inception Startups Exhibit GPU-Powered Work in Long Beach

Computer vision technology that can identify items in a shopping bag. Deep learning tools that inspect train tracks for defects. An AI model that automatically labels street-view imagery.

These are just a few of the AI breakthroughs being showcased this week by the dozens of NVIDIA Inception startups at the annual Computer Vision and Pattern Recognition conference, one of the world’s top AI research events.

The NVIDIA Inception virtual accelerator program supports startups harnessing GPUs for AI and data science applications. Since its launch in 2016, the program has expanded over tenfold in size, to over 4,000 companies. More than 50 of them can be found in the CVPR expo hall — exhibiting GPU-powered work spanning retail, robotics, healthcare and beyond.

Malong Technologies: Giving Retailers an Edge with AI

From self-serve weighing stations that automatically identify fresh produce items in a plastic shopping bag, to smart vending machines that can recognize when a shopper takes a beverage out of a cooler — product recognition AI developed by Malong Technologies is enabling frictionless shopping experiences.

Malong’s computer vision solutions are transforming traditional retail equipment into smarter devices, enabling machines to see the products within them to improve operational efficiency, security and the customer experience.

Using the NVIDIA Metropolis platform for smart cities, the company is building product recognition AI models that enable highly accurate, real-time decisions at the edge. Malong develops powerful, scalable intelligent video analytics tools that can accurately recognize hundreds of thousands of retail products in real time. The company researches weakly-supervised learning to significantly reduce the effort to retrain their models as product packaging and store environments change.

Malong was able to speed its inferencing by more than 40x compared with CPUs by using the DeepStream and TensorRT software libraries on the NVIDIA T4 GPU. The company uses NVIDIA V100 GPUs in the cloud for training, and the Jetson TX2 supercomputer-on-a-module to bring true AI computing to the edge.

At CVPR, the company is at booth 1316 on the show floor and is presenting research that achieves a new gold standard for image retrieval, outperforming prior methods by a significant margin. Malong is also co-hosting the Fine-Grained Visual Categorization Workshop and organized the first ever retail product recognition challenge at CVPR.

ABEJA: Keeping Singapore’s Metros on Track

Manually inspecting railway tracks is a dangerous task, often done by workers at night when trains aren’t running. But with high-speed cameras, transportation companies can instead capture images of the tracks and use AI to automatically detect defects for railway maintenance.

ABEJA, based in Japan, is developing deep learning models that detect anomalies on tracks with more than 90 percent accuracy, a significant improvement over other automated inspection methods. The startup works with SMRT, Singapore’s leading public transport operator, to examine rail defects.

Founded in 2012, ABEJA builds deep learning tools for multiple industries, including retail, manufacturing and infrastructure. Other use cases include an AI to measure efficiency in car factories and a natural language processing model to provide insights for call centers.

The company uses NVIDIA GPUs on premises and in the cloud for training its AI models. For inference, ABEJA has used GPUs for real-time data processing and high-performance image segmentation projects. It has also deployed projects using NVIDIA Jetson TX2 for AI inference at the edge.

The startup is showing a demo of the ABEJA annotation model in its CVPR booth.

Mapillary: AI in the Streets

Sweden-based Mapillary uses computer vision to automate mapping. Its AI models break down and classify street-level images, segmenting and labeling elements like roads, lane markings, street lights and sidewalks. The company has to date processed hundreds of millions of images submitted by individual contributors, nonprofit organizations, companies and governments worldwide.

These labeled datasets can be used for various purposes, including to create useful maps for local governments, train self-driving cars, or build tools for people with disabilities.

Mapillary is presenting four papers at CVPR this year, including one titled Seamless Scene Segmentation. The model described in the research — a new approach that joins two AI models into one, setting a new state-of-the-art for performance — was trained on eight NVIDIA V100 GPUs.

The segmentation models featured in Mapillary’s CVPR booth were also trained using V100 GPUs. By adopting the NVIDIA TensorRT inference software stack in 2017, Mapillary was able to speed up its segmentation algorithms by up to 27x when running on the Amazon Web Services cloud.

Companies interested in the NVIDIA Inception virtual accelerator can visit the program website and apply to join. Inception members are eligible for a 20 percent discount on up to six NVIDIA TITAN RTX GPUs until Oct. 26.

Startups based in the following countries can request a discount code by email: Australia, Austria, Belgium, Canada, the Czech Republic, Denmark, Finland, France, Germany, Ireland, Italy, Luxembourg, the Netherlands, Norway, Poland, Spain, Sweden, the United Kingdom and the United States.

The post Making Waves at CVPR: Inception Startups Exhibit GPU-Powered Work in Long Beach appeared first on The Official NVIDIA Blog.

Japan’s Fastest Supercomputer Adopts NGC, Enabling Easy Access to Deep Learning Frameworks

From discovering drugs, to locating black holes, to finding safer nuclear energy sources, high performance computing systems around the world have enabled breakthroughs across all scientific domains.

Japan’s fastest supercomputer, ABCI, powered by NVIDIA Tensor Core GPUs, enables similar breakthroughs by taking advantage of AI. The system is the world’s first large-scale, open AI infrastructure serving researchers, engineers and industrial users to advance their science.

The software used to drive these advances is as critical as the servers the software runs on. However, installing an application on an HPC cluster is complex and time consuming. Researchers and engineers are unproductive as they wait to access the software, and their requests to have applications installed distract system admins from completing mission-critical tasks.

Containers — packages that contain software and relevant dependencies — allow users to pull and run the software on a system without actually installing the software. They’re a win-win for users and system admins.

NGC: Driving Ease of Use of AI, Machine Learning and HPC Software

NGC offers over 50 GPU-optimized containers for deep learning frameworks, machine learning algorithms and HPC applications that run on both Docker and Singularity.

The HPC applications provide scalable performance on GPUs within and across nodes. NVIDIA continuously optimizes key deep learning frameworks and libraries, with updates released monthly. This provides users access to top performance for training and inference for all their AI projects.

ABCI Runs NGC Containers

Researchers and industrial users are taking advantage of ABCI to run AI-powered scientific workloads across domains, from nuclear physics to manufacturing. Others are taking advantage of the system’s distributed computing to push the limits on speeding AI training.

To achieve this, the right set of software and hardware tools must be in place, which is why ABCI has adopted NGC.

“Installing deep learning frameworks from the source is complicated and upgrading the software to keep up with the frequent releases is a resource drain,” said Hirotaka Ogawa, team leader of the Artificial Intelligence Research Center at AIST. “NGC allows us to support our users with the latest AI frameworks and the users enjoy the best performance they can achieve on NVIDIA GPUs.”

ABCI has turned to containers to address another user need — portability.

“Most of our users are from industrial segments who are looking for portability between their on-prem systems and ABCI,” said Ogawa. “Thanks to NGC and Singularity, the users can develop, test, and deploy at scale across different platforms. Our sampling data showed that NGC containers were used by 80 percent of the over 100,000 jobs that ran on Singularity.”

NGC Container Replicator Simplifies Ease of Use for System Admins and Users

System admins managing HPC systems at supercomputing centers and universities can now download and save NGC containers on their clusters. This gives users faster access to the software, reduces network traffic, and saves storage space.

NVIDIA offers NGC Container Replicator, which automatically checks and downloads the latest versions of NGC containers.

NGC container replicator chart

Without lifting a finger, system admins can ensure that their users benefit from the superior performance and newest features from the latest software.

More Than Application Containers

In addition to deep learning containers, NGC hosts 60 pre-trained models and 17 model scripts for popular use cases like object detection, natural language processing and text to speech.

It’s much faster to tune a pre-trained model for a use case than to start from scratch. The pre-trained models allow researchers to quickly fine-tune a neural network or build on top of an already optimized network for specific use-case needs.
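To see why fine-tuning is cheaper, consider a toy version in plain Python (illustrative only: the frozen weights and data are invented, and real workflows use a deep learning framework with models from NGC). The pre-trained layer stays untouched while gradient descent updates only a small task-specific head:

```python
import math

# "Pre-trained" feature extractor with frozen weights, standing in for a
# network downloaded from a model registry (values invented for illustration).
FROZEN_W = [[0.9, -0.4], [-0.3, 0.8]]

def features(x):
    # Frozen layer: fixed linear map followed by tanh; never updated below.
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Only this small task-specific head is trained -- far fewer parameters
# than training the whole network from scratch.
head_w, head_b = [0.0, 0.0], 0.0

# Tiny labeled dataset for the new task.
data = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([0.9, 0.1], 1), ([0.1, 0.9], 0)]

lr = 1.0
for _ in range(200):
    for x, y in data:
        f = features(x)
        p = sigmoid(sum(w * fi for w, fi in zip(head_w, f)) + head_b)
        g = p - y  # gradient of the log loss with respect to the logit
        head_w = [w - lr * g * fi for w, fi in zip(head_w, f)]
        head_b -= lr * g

correct = sum(
    (sigmoid(sum(w * fi for w, fi in zip(head_w, features(x))) + head_b) > 0.5)
    == (y == 1)
    for x, y in data
)
print(f"{correct}/{len(data)} correct after training only the head")
```

Because the frozen layer already produces useful features, the head converges in a handful of passes, which is the same economy that makes fine-tuning a pre-trained NGC model faster than training from scratch.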

The model training scripts follow best practices, have state-of-the-art accuracy and deliver superior performance. They’re ideal for researchers and data scientists planning to build a network from scratch and customize it to their liking.

The models and scripts take advantage of mixed precision powered by NVIDIA Tensor Core GPUs to deliver up to 3x deep learning performance speedups over previous generations.
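The speedup comes from storing most tensors in 16-bit floating point, which halves memory traffic and feeds the Tensor Cores, while FP32 accumulation preserves accuracy. Python's stdlib `struct` module can round-trip a value through IEEE 754 half precision, showing both the reduced precision and the narrow range (a sketch of the number format, not NGC code):

```python
import struct

def to_fp16(x):
    # Round-trip a float through IEEE 754 half precision, the storage
    # format mixed-precision training uses for most tensors.
    return struct.unpack('<e', struct.pack('<e', x))[0]

print(to_fp16(0.1))  # 0.0999755859375 -- only ~3 decimal digits survive

# FP16's range tops out near 65504, which is why mixed-precision recipes
# scale the loss to keep values inside the representable range.
try:
    struct.pack('<e', 70000.0)
except OverflowError:
    print("70000.0 does not fit in FP16")
```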

Take NGC for a Spin

NGC containers are built and tested to run on-prem and in the cloud, and they also support hybrid and multi-cloud deployments. Visit NGC, pull your application container on any GPU-powered system or major cloud instance, and see how easy it is to get up and running for your next research project.


Four Surprising Ways Inference Is Putting AI into Action

From voice assistants like Alexa and Google Maps navigation to Bing’s conversational search, AI has become a part of daily life for many.

These tasks are performing deep learning inference, which might be thought of as AI put into action.

The deep learning neural networks that power AI are trained on massive amounts of data. Putting this training to work in the digital world — to recognize spoken words, images or street signs, or to suggest the shirt you might want to buy or the next movie to watch — is inferencing.
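That distinction can be made concrete with a toy model (the weights and inputs here are invented for illustration): inference is nothing more than running new inputs through parameters that training already fixed.

```python
import math

# Weights a (hypothetical) training phase already produced. Inference
# just applies them: one forward pass, no gradients, no weight updates.
WEIGHTS = [2.1, -1.3, 0.4]
BIAS = -0.2

def infer(features):
    """Score one input with the frozen model (a single logistic unit)."""
    z = sum(w * f for w, f in zip(WEIGHTS, features)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))

# E.g., deciding whether to recommend an item from three user signals.
score = infer([0.8, 0.1, 0.5])
print("recommend" if score > 0.5 else "skip", round(score, 3))
```

A production model has millions of such multiply-accumulate operations per input, which is why GPUs, built for exactly that arithmetic, dominate inference serving.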

And the breadth of inference applications on GPUs may surprise you. It’s pervasive in everything from the lumber industry to research that delves into reading ancient Japanese texts.

Below are four diverse ways inference running on GPUs is already making a difference.

Fighting Fraud

PayPal is using deep learning inference on GPUs to pinpoint fraudulent transactions — and help ensure they don’t happen again.

The company processes millions of transactions every day. Advances in AI — specifically logistic regression-powered neural network models — have allowed it to filter out deceptive merchants and crack down on sales of illegal products.

The deep learning models also help PayPal optimize its operations by identifying why some transactions fail and spotting opportunities to work more efficiently.

And since the models are always learning, they can personalize user experiences by serving up relevant advertisements based on people’s interests.

Weather Insight

Boston-based ClimaCell is working to bring unprecedented speed, precision and accuracy to weather forecasting by listening closely to a powerful voice: Mother Nature’s.

The company uses inference on GPUs to offer so-called “nowcasting” — hyper-local, high-resolution forecasts that can help businesses and people make better decisions about everything from paving projects to wind generation to planning a daily commute to avoid bad weather. The company also offers forecasting and historic data.

ClimaCell’s nowcasting GPU model in action.

To achieve this, the company writes software that turns the signals in existing communication networks into sensors that can analyze the surrounding environment and extract real-time weather data.

ClimaCell’s network quickly analyzes the signals, integrates them with data from the National Oceanic and Atmospheric Administration and then weaves it all together using predictive models run on NVIDIA GPU accelerators.

Detecting Cancer

Mammogram machines are effective at detecting breast cancer, but expensive. In many developing countries, this makes them rare outside of large cities.

Mayo Clinic researcher Viksit Kumar is leading an effort to use GPU-powered inferencing to more accurately classify breast cancer images using ultrasound machines, which are much cheaper and more accessible around the world.

Kumar and his team have been able to detect and segment breast cancer masses with very good accuracy and few false positives, according to their research paper.

Mayo Clinic ultrasound deep learning research
The red outline shows the manually segmented boundary of a carcinoma, while the deep learning-predicted boundaries are shown in blue, green and cyan.

The team does its local processing using the TensorFlow deep learning framework container from the NGC registry on NVIDIA GPUs. It also uses NVIDIA V100 Tensor Core GPUs on AWS using the same container.

Eventually, Kumar hopes to use ultrasound images for the early detection of other forms of the disease, such as thyroid and ovarian cancer.

Making Music

MuseNet is a deep learning algorithm demo from AI research organization OpenAI that automatically generates music using 10 kinds of instruments and a host of different styles — everything from pop to classical.

People can create entirely new tracks by applying different instruments and sounds to music the algorithm generates. The demo uses NVIDIA V100 Tensor Core GPUs for this inferencing task.

Using the demo, you can spin up twists on your favorite songs. Add guitars, leave out the piano, go big on drums. Or change its style to sound like jazz or classic rock.

The algorithm wasn’t programmed to mimic the human understanding of music. Instead, it was trained on hundreds of thousands of songs so it could learn the patterns of harmony, rhythm and style prevalent within music.

Its 72-layer network was trained using NVIDIA V100 Tensor Core GPUs with the cuDNN-accelerated TensorFlow deep learning framework.

Read more stories about deep learning inferencing.


Sea of Green: NVIDIA Floods ISC with AI and HPC

The intersection of HPC and AI is extending the reach of science and accelerating the pace of innovation like never before. It’s driving discovery in astrophysics, weather forecasting, energy exploration, molecular dynamics and many other fields.

That’s why over 3,000 people will flock to ISC High Performance 2019, in Frankfurt, Germany, next week. Attendees will descend on the annual supercomputing conference, running from June 16-20, for scores of talks, demos and workshops to explore the latest HPC breakthroughs.

Hear from NVIDIA Experts at ISC

GPUs are at the heart of accelerating HPC. That’s why you’ll find NVIDIA technology featured in a number of talks and workshops across the show.

Make sure not to miss:

Witness Groundbreaking Technology in Action

GPU computing is the most accessible and energy-efficient path forward for HPC and the data center.

At ISC, dozens of NVIDIA partners will demonstrate the importance of GPU acceleration through a range of exhibits and demos.

Look out for “NVIDIA partner” signs at booths including those from Dell EMC, HPE, Mellanox, Boston, One Stop Systems and Supermicro to discover GPU-powered demos. Across the show, you’ll also find:

  • The AI “Emoji” demo — Pass by one of these demo stations at our partners’ booths and get your emotion read in real time. The Emoji demo performs real-time face detection and can identify a whole range of emotions, including “neutral,” “happiness,” “surprise,” “sadness,” “anger,” “disgust,” “fear”  and “contempt.”
  • The Index Supernova demo — Large, 3D scientific simulations typically take about four months to create and generate over a terabyte of visualization data. With the NVIDIA IndeX SDK running on NGC, researchers can now view and interact with their data, make modifications and focus on the most pertinent parts of the data — all in real time.

Students Battle It Out in Cluster Challenge

For this year’s Student Cluster Competition, half of the teams have chosen to build their clusters around NVIDIA V100 Tensor Core GPUs.

Over the course of three days, a total of 14 teams will have the chance to showcase systems of their own design and compete to achieve the highest performance across a series of standard HPC benchmarks and applications.

The winner will be announced on Wednesday, June 19, at 5:15 p.m. in Panorama 2.

Keep up to date on all things HPC and AI by following our social handles @NVIDIAEU and #ISC19.





What’s the Difference Between Hardware and Software Accelerated Ray Tracing?

You don’t need specialized hardware to do ray tracing, but you want it.

Software-based ray tracing, of course, is decades old. And it looks great: movie makers have been using ray tracing for decades now.

But it’s now clear that specialized hardware — like the RT Cores built into NVIDIA’s Turing architecture — makes a huge difference if you’re doing ray tracing in real time. Games require real-time ray tracing.

Once considered the “holy grail” of graphics, real-time ray tracing brings the same techniques long used by movie makers to gamers and creators.

Thanks to a raft of new AAA games developers have introduced this year — and the introduction last year of NVIDIA GeForce RTX GPUs — this once wild idea is mainstream.

Millions are now firing up PCs that benefit from the RT Cores and Tensor Cores built into RTX. And they’re enjoying ray-tracing enhanced experiences many thought would be years, even decades, away.

Real-time ray tracing, however, is possible without dedicated hardware. That’s because — while ray tracing has been around since the 1970s — the real trend is much newer: GPU-accelerated ray tracing with dedicated cores.

The use of GPUs to accelerate ray-tracing algorithms gained fresh momentum last year with the introduction of Microsoft’s DirectX Raytracing (DXR) API. And that’s great news for gamers and creators.

Ray Tracing Isn’t New

So what is ray tracing? Look around you. The objects you’re seeing are illuminated by beams of light. Now follow the path of those beams backwards from your eye to the objects that light interacts with. That’s ray tracing.
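In code, the core operation is finding where such a backwards-traced ray first meets an object. Here is a minimal ray-sphere intersection in Python (a textbook sketch with made-up scene values, not production renderer code):

```python
import math

def ray_sphere(origin, direction, center, radius):
    """Distance along a normalized ray to the nearest sphere hit, or None."""
    # Solve |origin + t*direction - center|^2 = radius^2, a quadratic in t.
    oc = [o - c for o, c in zip(origin, center)]
    b = 2.0 * sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4.0 * c  # a = 1 because direction is normalized
    if disc < 0:
        return None  # the ray misses the sphere
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t > 0 else None

# Trace one ray backwards from the eye straight into the scene.
hit = ray_sphere((0, 0, 0), (0, 0, -1), center=(0, 0, -5), radius=1.0)
print(hit)  # 4.0 -- the near surface of the sphere is 4 units away
```

A renderer repeats this test for millions of rays per frame, then spawns more rays at each hit point for shadows and reflections, which is where the enormous compute cost comes from.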

It’s a technique first described by IBM’s Arthur Appel in 1968, in “Some Techniques for Shading Machine Renderings of Solids.” Thanks to pioneers such as Turner Whitted, Lucasfilm’s Robert Cook, Thomas Porter and Loren Carpenter, Caltech’s Jim Kajiya, and a host of others, ray tracing is now the standard in the film and computer graphics industries for creating lifelike lighting and images.

However, until last year, almost all ray tracing was done offline. It’s very compute intensive. Even today, the effects you see at movie theaters require sprawling, CPU-equipped server farms. Gamers want to play interactive, real-time games. They won’t wait minutes or hours per frame.

GPUs, by contrast, can move much faster, thanks to the fact they rely on larger numbers of computing cores to get complex tasks done more quickly. And, traditionally, they’ve used another rendering technique, known as “rasterization,” to display three-dimensional objects on a two-dimensional screen.

With rasterization, objects on the screen are created from a mesh of virtual triangles, or polygons, that create 3D models of objects. In this virtual mesh, the corners of each triangle — known as vertices — intersect with the vertices of other triangles of different sizes and shapes. It’s fast and the results have gotten very good, even if it’s still not always as good as what ray tracing can do.
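The test at the heart of rasterization fits in a few lines. This Python sketch (illustrative only; real GPUs run this massively in parallel with many refinements) uses edge functions to check each pixel center against a triangle's three edges:

```python
def edge(a, b, p):
    # Signed-area test: which side of the edge a->b the point p lies on.
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(tri, width, height):
    """Return the pixel centers covered by one screen-space triangle."""
    a, b, c = tri
    covered = []
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)
            w0, w1, w2 = edge(b, c, p), edge(c, a, p), edge(a, b, p)
            # Inside if p is on the same side of all three edges.
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                covered.append((x, y))
    return covered

pixels = rasterize(((0, 0), (8, 0), (0, 8)), 8, 8)
print(len(pixels), "of", 8 * 8, "pixels covered")
```

Because each pixel is tested independently, the work parallelizes perfectly, which is exactly why rasterization has been such a natural fit for GPUs.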

GPUs Take on Ray Tracing

But what if you used these GPUs — and their parallel processing capabilities — to accelerate ray tracing? This is where GPU-accelerated software ray tracing comes in. NVIDIA OptiX, introduced in 2009, targeted design professionals with GPU-accelerated ray tracing. Over the next decade, OptiX rode the steady advance in speed delivered by successive generations of NVIDIA GPUs.

By 2015, NVIDIA was demonstrating at SIGGRAPH how ray tracing could turn a CAD model into a photorealistic image — indistinguishable from a photograph — in seconds, speeding up the work of architects, product designers and graphic artists.

That approach — GPU-accelerated software ray tracing — was endorsed by Microsoft early last year, with the introduction of DXR, which enables full support of NVIDIA RTX ray-tracing software through Microsoft’s DXR API.

Delivering high-performance, real-time ray tracing required two innovations: dedicated ray-tracing hardware, RT Cores; and Tensor Cores for high-performance AI processing for advanced denoising, anti-aliasing and super resolution.

RT Cores accelerate ray tracing by speeding up the process of finding out where a ray intersects with the 3D geometry of a scene. These specialized cores accelerate a tree-based ray-tracing structure called a bounding volume hierarchy, or BVH, used to calculate where rays and the triangles that comprise a computer-generated image intersect.
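The slab test that underpins BVH traversal is compact. This plain-Python sketch (an illustration of the algorithm, not how RT Cores are implemented in silicon) shows how a single hit test against a parent box can prune an entire subtree:

```python
def ray_box(origin, inv_dir, lo, hi):
    """Slab test: does the ray hit the axis-aligned bounding box?"""
    tmin, tmax = 0.0, float("inf")
    for o, inv, l, h in zip(origin, inv_dir, lo, hi):
        t1, t2 = (l - o) * inv, (h - o) * inv
        tmin = max(tmin, min(t1, t2))
        tmax = min(tmax, max(t1, t2))
    return tmin <= tmax

# A two-level hierarchy: one root box enclosing two child boxes. If a ray
# misses the root, no child (and no triangle inside it) needs testing --
# the pruning that dedicated ray-tracing hardware performs at high speed.
root = ((-2.0, -2.0, -10.0), (2.0, 2.0, -4.0))
children = [((-2.0, -2.0, -10.0), (0.0, 0.0, -4.0)),
            ((0.0, 0.0, -10.0), (2.0, 2.0, -4.0))]

origin, direction = (1.0, 1.0, 0.0), (0.0, 0.0, -1.0)
inv = tuple(1.0 / d if d != 0 else float("inf") for d in direction)

hits = []
if ray_box(origin, inv, *root):  # root hit: descend into the children
    hits = [ray_box(origin, inv, lo, hi) for lo, hi in children]
print("child boxes hit:", hits)
```

In a real scene the hierarchy is many levels deep, so each ray skips the vast majority of triangles without ever testing them.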

Tensor Cores — first introduced with NVIDIA’s Volta architecture, aimed at enterprise and scientific computing, in 2017 to accelerate AI algorithms — further accelerate graphically intense workloads. That’s through a special AI technique called NVIDIA DLSS, short for Deep Learning Super Sampling, which RTX’s Tensor Cores make possible.

Turing at Work

You can see how this works by comparing how quickly Turing and our previous generation Pascal architecture render a single frame of Metro Exodus.

Top: One frame of Metro Exodus rendered on Pascal, with the time in the middle spent on ray tracing.

On Turing, you can see several things happening at once. The green segment shows the RT Cores kicking in. And the same ray tracing done on a Pascal GPU completes in one-fifth of the time on Turing.

Reinventing graphics, NVIDIA and our partners have been driving Turing to market through a stack of products that now ranges from the highest-performance GPU, at $999, all the way down to an entry-level gaming GPU, at $149. The RTX products, with RT Cores and Tensor Cores, start at $349.

Broad Support

There’s no question that real-time ray tracing is the next generation of gaming.

Some of the most important ecosystem partners have announced their support and are now opening the floodgates for real-time ray tracing in games.

Inside Microsoft’s DirectX 12 multimedia programming interfaces is a ray-tracing component called DirectX Raytracing (DXR). So any PC with a capable GPU can run accelerated ray tracing.

At the Game Developers Conference in March, we turned on DXR-accelerated ray tracing on our Pascal and Turing GTX GPUs.

To be sure, earlier GPU architectures, such as Pascal, were designed to accelerate DirectX 12. So on this hardware, these calculations are performed on the programmable shader cores, a resource shared with many other graphics functions of the GPU.

So while your mileage will vary — since there are many ways ray tracing can be implemented — Turing will consistently perform better when playing games that make use of ray-tracing effects.

And that performance advantage on the most popular games is only going to grow.

EA’s AAA engine, Frostbite, supports ray tracing. Unity and Unreal, which together power 90 percent of the world’s games, now support Microsoft’s DirectX ray tracing in the engine.

Collectively, that opens up an easy path for thousands and thousands of game developers to implement ray tracing in their games.

All told, NVIDIA has engaged more than 100 developers who are working on ray-traced games.

To date, millions of gamers are playing on RTX hardware: GPU-accelerated hardware with RT Cores.

And — thanks to ray tracing — that number is growing every week.


AI Potcast: A Joint Discussion on AI, Agtech with Grownetics CEO

The grass really is greener on the AI side. Grownetics CEO and co-founder Vince Harkiewicz would know. He helps grow it.

AI isn’t new to agtech, of course. But Grownetics, an intelligent cultivation management system for indoor farms and greenhouses, has a very specific focus: cannabis.

“We’ve specifically targeted the cannabis industry because there’s a lack of tools built for them and indoor agriculture as a whole,” Harkiewicz explained in a conversation with AI Podcast host Noah Kravitz.

Grownetics handles every step of the cultivation process, Harkiewicz says.  Using harvest data, an open sensor network and a deep learning recommendation engine, the company provides a “specific recipe leading to an ideal yield for that particular variety” of cannabis, or any crop.

While he’s focused on cannabis for now, Harkiewicz believes Grownetics’ work in the industry will support growth in the broader indoor agriculture market.

“I’d argue that [the cannabis industry is] leading the indoor-ag field,” Harkiewicz said. “The two industries don’t really communicate too much yet, and that’s really what we’re striving to do, is to bridge precision agriculture, indoor agriculture with what’s been going on in the cannabis space.”

Based in Boulder, Colo., Grownetics began running beta tests at the end of last year. Since then, the company has gained eight clients and hopes to commercially launch at the end of this year.

“It’s been intense,” Harkiewicz said. “Not only are we a startup, but we’re a startup in a startup industry.”

The cannabis industry has seen immense growth in recent years as multiple countries and U.S. states have legalized cannabis for medicinal and recreational use. However, for Grownetics’ operations in the U.S., the federal legal status of the crop poses an extra hurdle.

“Because of the federal legality, or illegality of the cannabis industry, it’s artificially suppressing the market and making it extremely hard for our customers to grow and scale great businesses,” Harkiewicz said.

Even with these legal challenges, Harkiewicz is optimistic about the future of the cannabis industry and its influence on agriculture.

“This is a systemic evolution that we’re looking at, from producing the unique medicinal product in cannabis, and doing it in a pharmaceutical high-quality, clean way,” Harkiewicz said.

“And then taking those same traits to leafy greens and produce to be growing them indoors without any pesticides, and very, very efficiently.”

Help Make the AI Podcast Better

Have a few minutes to spare? It’d help us if you’d fill out this short listener survey.

Your answers will help us learn more about our audience, what we can do better and what we’re doing right, so we can deliver podcasts that meet your needs.

How to Tune into the AI Podcast

Our AI Podcast is available through iTunes, Castbox, DoggCatcher, Google Play Music, Overcast, PlayerFM, Podbay, PodBean, Pocket Casts, PodCruncher, PodKicker, Stitcher, SoundCloud and TuneIn.

If your favorite isn’t listed here, email us at aipodcast [at] nvidia [dot] com.


GPU Computing 101: Why University Educators Are Pulling NVIDIA Teaching Kits into Their Classrooms

Along with the usual elements of university curriculums — lectures, assignments, lab exercises — there’s a new tool that educators are increasingly leaning into: NVIDIA Teaching Kits.

University educators around the world are tapping into these kits, which include downloadable teaching materials and online courses that provide the foundation to understand and build hands-on expertise in areas like deep learning, accelerated computing and robotics.

The kits are offered by the NVIDIA Deep Learning Institute, a hands-on training program in AI, accelerated computing, and data science to help technologists solve challenging problems.

Co-developed with university faculty, NVIDIA Teaching Kits provide content to enhance a university curriculum, including lecture slides, videos, hands-on labs, online DLI certificate courses, e-books and GPU cloud resources.

Accelerated Computing at University of California, Riverside

Daniel Wong, an assistant professor of electrical and computer engineering at the University of California, Riverside, used the Accelerated Computing Teaching Kit for two GPU-centric computer science courses — a graduate course and an undergrad course on “GPU Computing and Programming.”

“The teaching kit presented a very well structured way to teach GPU programming, especially given the way many of our students come from very diverse backgrounds,” Wong said.

Wong’s undergrad course took place over 10 weeks with an enrollment of about three dozen students and is currently in its second offering. The kit was central in teaching the basics of CUDA, such as CUDA threading models, parallel patterns, common optimizations and other important parallel programming primitives, Wong said.
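The global-index arithmetic at the heart of the CUDA threading model can be previewed without a GPU. This plain-Python sketch (an illustration, not material from the teaching kit) mimics how a vector-add kernel maps block and thread indices to array elements:

```python
# Names mirror CUDA's blockIdx, blockDim and threadIdx; each simulated
# "thread" computes exactly one output element, as in a vector-add kernel.
def vector_add(a, b):
    n = len(a)
    block_dim = 4                                # threads per block
    grid_dim = (n + block_dim - 1) // block_dim  # blocks needed to cover n
    out = [0] * n
    for block_idx in range(grid_dim):            # the grid of blocks...
        for thread_idx in range(block_dim):      # ...each full of threads
            i = block_idx * block_dim + thread_idx  # global thread index
            if i < n:  # bounds guard for the ragged final block
                out[i] = a[i] + b[i]
    return out

print(vector_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))  # [11, 22, 33, 44, 55]
```

On a GPU the two loops disappear: every (block, thread) pair runs concurrently, which is why the bounds guard on the final, partially filled block matters.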

“Students know that the material we present is state of the art and up to date so it gives them confidence in the material and drew a lot of excitement,” he said.

The course built up to a final project with students accelerating an application of their choice, such as implementations and performance comparison of CNNs in cuDNN, TensorFlow, Keras, facial recognition on NVIDIA Jetson boards, and fluid dynamics and visualization. In addition, several of Wong’s undergraduate students have gone on to pursue GPU-related undergraduate research.

Deep Learning at University Hospital Erlangen

At the Institute of Neuropathology of the University Hospital Erlangen in Germany, a deep learning morphology research group applies deep learning algorithms to various problems around histopathologic brain tumors.

The university’s medical students have little background in computer science, so principal investigator Samir Jabari uses the NVIDIA Teaching Kit as part of sessions he conducts every few weeks on the field of computer vision.

Through lecture slides on convolutional neural networks and lab assignments, the teaching kit helps provide insights into the field of computer vision and its specific challenges toward histopathology.

Robotics at Georgia State University

Georgia State University’s Computer Science department used the Robotics Teaching Kit in its “Introduction to Robotics” course, first introduced in spring 2018.

The course grouped two to three students per kit to engage them in learning basic sensor interaction and path-planning experiments. At the end of the class, students presented projects during the department’s biannual poster and demonstration day.

The course was a hit. When first taught, it registered 32 students. The upcoming fall course has already received 60 registration requests — nearly double the registration capacity.

Beyond the classroom, Georgia State faculty and students are using NVIDIA Teaching Kits to facilitate projects in the greater community in interdisciplinary areas such as environmental sensing and cybersecurity.

“This kind of in-class hardware kit-based teaching is new to the department,” said Ashwin Ashok, assistant professor of computer science at Georgia State. “These kits have really gained a lot of traction for potential uses in courses as well as research at Georgia State.”

The post GPU Computing 101: Why University Educators Are Pulling NVIDIA Teaching Kits into Their Classrooms appeared first on The Official NVIDIA Blog.

Intel Highlighted Why NVIDIA Tensor Core GPUs Are Great for Inference

It’s not every day that one of the world’s leading tech companies highlights the benefits of your products.

Intel did just that last week, comparing the inference performance of two of its most expensive CPUs to NVIDIA GPUs.

To achieve the performance of a single mainstream NVIDIA V100 GPU, Intel combined two power-hungry, highest-end CPUs with an estimated price of $50,000-$100,000, according to Anandtech. Intel’s performance comparison also highlighted the clear advantage of NVIDIA T4 GPUs, which are built for inference. When compared to a single highest-end CPU, they’re not only faster but also 7x more energy-efficient and an order of magnitude more cost-efficient.

Inference performance is crucial, as AI-powered services are growing exponentially. Intel’s latest Cascade Lake CPUs include new instructions that improve inference, making them Intel’s best CPUs for inference. Even so, they’re hardly competitive with NVIDIA’s deep learning-optimized Tensor Core GPUs.

Inference (also known as prediction), in simple terms, is the “pattern recognition” that a neural network does after being trained. It’s where AI models provide intelligent capabilities in applications, like detecting fraud in financial transactions, conversing in natural language to search the internet, and predictive analytics to fix manufacturing breakdowns before they even happen.

While most AI inference today happens on CPUs, NVIDIA Tensor Core GPUs are rapidly being adopted across the full range of AI models. Tensor Cores, a breakthrough innovation, have transformed NVIDIA GPUs into highly efficient and versatile AI processors. They perform multi-precision calculations at high rates, providing optimal precision for diverse AI models, and are supported automatically in popular AI frameworks.

It’s why a growing list of consumer internet companies — Microsoft, Paypal, Pinterest, Snap and Twitter among them — are adopting GPUs for inference.

Compelling Value of Tensor Core GPUs for Computer Vision

First introduced with the NVIDIA Volta architecture, Tensor Core GPUs are now in their second generation with NVIDIA Turing. Tensor Cores perform extremely efficient AI computations across a full range of precisions: from 16-bit floating point with 32-bit accumulate down to 8-bit and even 4-bit integer operations with 32-bit accumulate.
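The value of pairing a low-precision multiply with a wide accumulator can be illustrated in plain Python, using the `struct` module's IEEE-754 half-precision format to simulate FP16 storage. This is a toy model of the hardware behavior, not NVIDIA code:

```python
import struct

def fp16(x):
    """Round x to IEEE-754 half precision and back, simulating FP16 storage."""
    return struct.unpack('e', struct.pack('e', x))[0]

# Start from 2048 (exactly representable in FP16) and add 1.0 a hundred times.
half_acc = fp16(2048.0)
wide_acc = fp16(2048.0)
for _ in range(100):
    half_acc = fp16(half_acc + 1.0)  # narrow accumulator: the increment rounds away
    wide_acc = wide_acc + 1.0        # wide accumulator keeps every add

print(half_acc, wide_acc)  # 2048.0 2148.0
```

Because FP16 has only a 10-bit mantissa, increments of 1.0 vanish once the accumulator reaches 2,048; accumulating in a wider type avoids this, which is why Tensor Cores pair low-precision multiplies with 32-bit accumulation.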

They’re designed to accelerate both AI training and inference, and are easily enabled using automatic mixed precision features in the TensorFlow and PyTorch frameworks. Developers can achieve 3x training speedups by adding just two lines of code to their TensorFlow projects.
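As a sketch of what those two lines can look like, assuming the TensorFlow 1.14-era graph-rewrite API (the optimizer and learning rate here are placeholders, not part of the original post):

```python
import tensorflow as tf  # assumes TensorFlow 1.14+ with an NVIDIA GPU build

# An existing training setup; the optimizer choice is a placeholder.
opt = tf.train.AdamOptimizer(learning_rate=1e-3)

# The rewrite inserts FP16 casts and dynamic loss scaling automatically,
# letting Tensor Cores handle the FP16 math while keeping FP32 master weights.
opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)
```

This is configuration rather than algorithm code; the rest of the training loop is unchanged.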

On computer vision, as the table below shows, the NVIDIA T4 is faster processor for processor, 7x more power-efficient and far more affordable. The NVIDIA V100, designed for AI training, is 2x faster and 2x more energy-efficient than CPUs on inference.

Table 1: Inference on ResNet-50.

                                        Dual Intel      NVIDIA          NVIDIA
                                        Xeon 9282       V100            T4
ResNet-50 Inference (images/sec)        7,878           7,844           4,944
# of Processors                         2               1               1
Total Processor TDP                     800 W           350 W           70 W
Energy Efficiency (using TDP)           10 img/sec/W    22 img/sec/W    71 img/sec/W
Performance per Processor (images/sec)  3,939           7,844           4,944
GPU Performance Advantage               1.0 (baseline)  2.0x            1.3x
GPU Energy-Efficiency Advantage         1.0 (baseline)  2.3x            7.2x
Source: Intel Xeon performance; NVIDIA GPU performance
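The derived rows in Table 1 follow directly from the raw throughput and TDP figures; a quick check in Python:

```python
# Recompute Table 1's derived rows from the raw throughput and TDP numbers.
systems = {
    "Dual Xeon 9282": {"imgs_per_sec": 7878, "tdp_w": 800, "n_proc": 2},
    "NVIDIA V100":    {"imgs_per_sec": 7844, "tdp_w": 350, "n_proc": 1},
    "NVIDIA T4":      {"imgs_per_sec": 4944, "tdp_w": 70,  "n_proc": 1},
}

baseline_per_proc = systems["Dual Xeon 9282"]["imgs_per_sec"] / 2
for name, s in systems.items():
    efficiency = s["imgs_per_sec"] / s["tdp_w"]    # images/sec per watt
    per_proc = s["imgs_per_sec"] / s["n_proc"]     # images/sec per processor
    speedup = per_proc / baseline_per_proc         # per-processor advantage
    print(f"{name}: {efficiency:.0f} img/sec/W, "
          f"{per_proc:,.0f} img/sec/processor, {speedup:.1f}x")
```

The rounded results (10, 22 and 71 img/sec/W; 2.0x and 1.3x per-processor advantage) match the table.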

Compelling Value of Tensor Core GPUs for Understanding Natural Language

AI has been moving at a frenetic pace. This rapid progress is fueled by teams of AI researchers and data scientists who continue to innovate and create highly accurate and exponentially more complex AI models.

Over four years ago, computer vision became one of the first applications in which AI achieved superhuman accuracy, using models like Microsoft’s ResNet-50. Today’s advanced models perform even more complex tasks, like understanding language and speech. BERT, a highly complex AI model open-sourced by Google last year, can now understand prose and answer questions with superhuman accuracy.

A measure of the complexity of AI models is the number of parameters they have. Parameters in an AI model are the variables that store information the model has learned. While ResNet-50 has 25 million parameters, BERT has 340 million, a 13x increase.

On an advanced model like BERT, a single NVIDIA T4 GPU is 59x faster than a dual-socket CPU server and 240x more power-efficient.

Table 2: Inference on BERT. Workload: question-answering inference on a fine-tuned BERT-Large model.

                                                    Dual Intel Xeon        NVIDIA
                                                    Gold 6240              T4
BERT Inference, Question-Answering (sentences/sec)  2                      118
Processor TDP                                       300 W (150 W x 2)      70 W
Energy Efficiency (using TDP)                       0.007 sentences/sec/W  1.7 sentences/sec/W
GPU Performance Advantage                           1.0 (baseline)         59x
GPU Energy-Efficiency Advantage                     1.0 (baseline)         240x

CPU server: Dual-socket Xeon Gold 6240@2.6GHz; 384GB system RAM; FP32 precision; with Intel’s TF Docker container v. 1.13.1. Note: Batch-size 4 results yielded the best CPU score.

GPU results: T4: Dual-socket Xeon Gold 6240@2.6GHz; 384GB system RAM; mixed precision; CUDA 10.1.105; NCCL 2.4.3, cuDNN, cuBLAS 10.1.105; NVIDIA driver 418.67; on TensorFlow using automatic mixed precision and XLA compiler; batch-size 4 and sequence length 128 used for all platforms tested. 

Compelling Value of Tensor Core GPUs for Recommender Systems

Another key usage of AI is in recommendation systems, which are used to provide relevant content recommendations on video sharing sites, news feeds on social sites and product recommendations on e-commerce sites.

Neural collaborative filtering, or NCF, is a recommender system that uses the prior interactions of users with items to provide recommendations. When running inference on the NCF model from the MLPerf 0.5 training benchmark, the NVIDIA T4 delivers nearly 10x more performance and 20x higher energy efficiency than CPUs.
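To make the idea concrete, here is a toy, pure-Python sketch of the GMF (generalized matrix factorization) branch of NCF. The embeddings are random stand-ins for what a real model would learn from interaction data, and all names here are illustrative:

```python
import math
import random

random.seed(0)
DIM = 8
# Toy embedding tables; a real NCF model learns these from user-item interactions.
users = {u: [random.gauss(0, 1) for _ in range(DIM)] for u in range(4)}
items = {i: [random.gauss(0, 1) for _ in range(DIM)] for i in range(6)}

def score(u, i):
    """GMF-style branch of NCF: elementwise product of user and item
    embeddings, summed by an (untrained) output layer, then a sigmoid."""
    logit = sum(a * b for a, b in zip(users[u], items[i]))
    return 1.0 / (1.0 + math.exp(-logit))

def recommend(u, k=3):
    """Rank all items for user u by predicted interaction probability."""
    return sorted(items, key=lambda i: score(u, i), reverse=True)[:k]

print(recommend(0))  # top-3 item ids for user 0
```

Inference here is just many independent embedding lookups and dot products per user, which is why the workload parallelizes so well on GPUs.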

Table 3: Inference on NCF.

                                              Single Intel Xeon  NVIDIA
                                              Gold 6140          T4
Recommender Inference Throughput, MovieLens
  (thousands of samples/sec)                  2,860              27,800
Processor TDP                                 150 W              70 W
Energy Efficiency (using TDP)                 19 samples/sec/W   397 samples/sec/W
GPU Performance Advantage                     1.0 (baseline)     10x
GPU Energy-Efficiency Advantage               1.0 (baseline)     20x

CPU server: Single-socket Xeon Gold 6240@2.6GHz; 384GB system RAM; Used Intel Benchmark for NCF on TensorFlow with Intel’s TF Docker container version 1.13.1; FP32 precision. Note: Single-socket CPU config used for CPU tests as it yielded a better score than dual-socket.

GPU results: T4: Single-socket Xeon Gold 6140@2.3GHz; 384GB system RAM; CUDA 10.1.105; NCCL 2.4.3, cuDNN, cuBLAS 10.1.105; NVIDIA driver 418.40.04; on TensorFlow using automatic mixed precision and XLA compiler; batch-size: 2,048 for CPU, 1,048,576 for T4; precision: FP32 for CPU, mixed precision for T4. 

Unified Platform for AI Training and Inference

The use of AI models in applications is an iterative process designed to continuously improve their performance. Data scientist teams constantly update their models with new data and algorithms to improve accuracy. These models are then updated in applications by developers.

Updates can happen monthly, weekly or even daily. Having a single platform for both AI training and inference can dramatically simplify and accelerate the process of deploying and updating AI in applications.

NVIDIA’s data center GPU computing platform leads the industry in performance by a large margin for AI training, as demonstrated by the standard AI benchmark, MLPerf. And the NVIDIA platform provides compelling value for inference, as the data presented here attests. That value increases with the growing complexity and progress of modern AI.

To help fuel rapid progress in AI, NVIDIA engages deeply with the ecosystem and constantly optimizes software, including key frameworks like TensorFlow, PyTorch and MXNet, as well as inference software like TensorRT and TensorRT Inference Server.

NVIDIA also regularly publishes pre-trained AI models for inference and model scripts for training models using your own data. All of this software is freely made available as containers, ready to download and run from NGC, NVIDIA’s hub for GPU-accelerated software.

Get the full story about our comprehensive AI platform.

The post Intel Highlighted Why NVIDIA Tensor Core GPUs Are Great for Inference appeared first on The Official NVIDIA Blog.

ACR AI-LAB and NVIDIA Make AI in Hospitals Easy on IT, Accessible to Every Radiologist

For radiology to benefit from AI, there needs to be an easy, consistent and scalable way for hospital IT departments to implement the technology. This calls for a return to a service-oriented architecture, in which logical components are separated and can each scale individually, and for efficient use of the additional compute power these tools require.

AI is coming from dozens of vendors as well as internal innovation groups, and needs a place within the hospital network to thrive. That’s why NVIDIA and the American College of Radiology (ACR) have published a Hospital AI Reference Architecture Framework. It helps hospitals easily get started with AI initiatives.

A Cookbook to Make AI Easy

The Hospital AI Reference Architecture Framework was published at yesterday’s annual ACR meeting for public comment. This follows the recent launch of the ACR AI-LAB, which aims to standardize and democratize AI in radiology. The ACR AI-LAB uses infrastructure such as NVIDIA GPUs and the NVIDIA Clara AI toolkit, as well as GE Healthcare’s Edison platform, which helps bring AI from research into FDA-cleared smart devices.

The Hospital AI Reference Architecture Framework outlines how hospitals and researchers can easily get started with AI initiatives. It includes descriptions of the steps required to build and deploy AI systems, and provides guidance on the infrastructure needed for each step.

Hospital AI Architecture Framework

To drive an effective AI program within a healthcare institution, there must first be an understanding of the workflows involved, the compute needs and the data required. The foundation is enabling better insights from patient data with easy-to-deploy compute at the edge.

Using a transfer client, seed models can be downloaded from a centralized model store. A clinical champion uses an annotation tool to locally create data that can be used for fine-tuning the seed model or training a new model. Then, using the training system with the annotated data, a localized model is instantiated. Finally, an inference engine is used to conduct validation and ultimately inference on data within the institution.

These four workflows sit atop AI compute infrastructure, which can be accelerated with NVIDIA GPU technology for best performance, alongside storage for models and annotated studies. These workflows tie back into other hospital systems such as PACS, where medical images are archived.
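The four workflows above can be sketched schematically in Python. All function, URL and field names below are illustrative only; they are not part of any actual ACR AI-LAB or NVIDIA Clara API:

```python
# Schematic of the four hospital AI workflows; names are hypothetical.
def download_seed_model(store_url, model_name):
    """Transfer client: fetch a seed model from the centralized model store."""
    return {"name": model_name, "weights": "seed", "source": store_url}

def annotate_local_studies(studies):
    """Annotation tool: a clinical champion labels local imaging data."""
    return [{"study": s, "label": "expert-reviewed"} for s in studies]

def fine_tune(model, annotated):
    """Training system: adapt the seed model to the institution's own data."""
    return dict(model, weights="fine-tuned", n_examples=len(annotated))

def validate_and_infer(model, study):
    """Inference engine: validate, then run the localized model in-house."""
    return {"study": study, "model": model["name"], "result": "prediction"}

seed = download_seed_model("https://models.example/acr", "chest-ct-seed")
annotated = annotate_local_studies(["study-001", "study-002"])
local = fine_tune(seed, annotated)
print(validate_and_infer(local, "study-003"))
```

The key design point is that data and the localized model never leave the institution; only the seed model crosses the network boundary.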

Three Magic Ingredients: Hospital Data, Clinical AI Workflows, AI Computing

Healthcare institutions don’t have to build the systems to deploy AI tools themselves.

This scalable architecture is designed to support and provide computing power to solutions from different sources. GE Healthcare’s Edison platform now uses NVIDIA’s TensorRT Inference Server (TRT-IS) to help AI run in an optimized way within GPU-powered software and medical devices. This integration makes it easier to deliver AI from multiple vendors into clinical workflows, and it is the first example of the AI-LAB’s efforts to help hospitals adopt solutions from different vendors.

Together, Edison with TRT-IS offers a ready-made device inferencing platform that is optimized for GPU-compliant AI, so models built anywhere can be deployed in an existing healthcare workflow.

Hospitals and researchers are empowered to embrace AI technologies without building their own standalone technology or yielding their data to the cloud, which has privacy implications.

The post ACR AI-LAB and NVIDIA Make AI in Hospitals Easy on IT, Accessible to Every Radiologist appeared first on The Official NVIDIA Blog.

By the Book: AI Making Millions of Ancient Japanese Texts More Accessible

Natural disasters aren’t just threats to people and buildings; they can also erase history by destroying rare archival documents. As a safeguard, scholars in Japan are digitizing the country’s centuries-old paper records, typically by taking a scan or photo of each page.

But while this method preserves the content in digital form, it doesn’t mean researchers will be able to read it. Millions of physical books and documents were written in an obsolete script called Kuzushiji, legible to fewer than 10 percent of Japanese humanities professors.

“We end up with billions of images which will take researchers hundreds of years to look through,” said Tarin Clanuwat, researcher at Japan’s ROIS-DS Center for Open Data in the Humanities. “There is no easy way to access the information contained inside those images yet.”

Extracting the words on each page into machine-readable, searchable form takes an extra step: transcription, which can be done either by hand or through a computer vision method called optical character recognition, or OCR.

Clanuwat and her colleagues are developing a deep learning OCR system to transcribe Kuzushiji writing — used for most Japanese texts from the 8th century to the start of the 20th — into modern Kanji characters.

Clanuwat said GPUs are essential for both training and inference of the AI.

“Doing it without GPUs would have been inconceivable,” she said. “GPU not only helps speed up the work, but it makes this research possible.”

Parsing a Forgotten Script

Before the standardization of the Japanese language in 1900 and the advent of modern printing, Kuzushiji was widely used for books and other documents. Though millions of historical texts were written in the cursive script, just a few experts can read it today.

Only a tiny fraction of Kuzushiji texts have been converted to modern scripts — and it’s time-consuming and expensive for an expert to transcribe books by hand. With an AI-powered OCR system, Clanuwat hopes a larger body of work can be made readable and searchable by scholars.

She collaborated on the OCR system with Asanobu Kitamoto from her research organization and Japan’s National Institute of Informatics, and Alex Lamb of the Montreal Institute for Learning Algorithms. Their paper was accepted in 2018 to the Machine Learning for Creativity and Design workshop at the prestigious NeurIPS conference.

Using a labeled dataset of 17th to 19th century books from the National Institute of Japanese Literature, the researchers trained their deep learning model on NVIDIA GPUs, including the TITAN Xp. Training the model took about a week, Clanuwat said, but “would be impossible” to train on CPU.

Kuzushiji has thousands of characters, with many occurring so rarely in datasets that it is difficult for deep learning models to recognize them. Still, the average accuracy of the researchers’ KuroNet document recognition model is 85 percent — outperforming prior models.

The newest version of the neural network can recognize more than 2,000 characters. For easier documents with fewer than 300 character types, accuracy jumps to about 95 percent, Clanuwat said. “One of the hardest documents in our dataset is a dictionary, because it contains many rare and unusual words.”

One challenge the researchers faced was finding training data representative of the long history of Kuzushiji. The script changed over the hundreds of years it was used, while the training data came from the more recent Edo period.

Clanuwat hopes the deep learning model could expand access to Japanese classical literature, historical documents and climatology records to a wider audience.

The post By the Book: AI Making Millions of Ancient Japanese Texts More Accessible appeared first on The Official NVIDIA Blog.
