Author: torontoai

Introducing the Next Generation of On-Device Vision Models: MobileNetV3 and MobileNetEdgeTPU

Written on November 12, 2019. Posted in Google.

Posted by Andrew Howard, Software Engineer and Suyog Gupta, Silicon Engineer, Google Research

On-device machine learning (ML) is an essential component in enabling privacy-preserving, always-available and responsive intelligence. This need to bring on-device machine learning to compute and power-limited devices has spurred the development of algorithmically-efficient neural network models and hardware capable of performing billions of math operations per second, while consuming only a few milliwatts of power. The recently launched Google Pixel 4 exemplifies this trend, and ships with the Pixel Neural Core that contains an instantiation of the Edge TPU architecture, Google’s machine learning accelerator for edge computing devices, and powers Pixel 4 experiences such as face unlock, a faster Google Assistant and unique camera features. Similarly, algorithms, such as MobileNets, have been critical for the success of on-device ML by providing compact and efficient neural network models for mobile vision applications.

Today we are pleased to announce the release of source code and checkpoints for MobileNetV3 and the Pixel 4 Edge TPU-optimized counterpart MobileNetEdgeTPU model. These models are the culmination of the latest advances in hardware-aware AutoML techniques as well as several advances in architecture design. On mobile CPUs, MobileNetV3 is twice as fast as MobileNetV2 with equivalent accuracy, and advances the state-of-the-art for mobile computer vision networks. On the Pixel 4 Edge TPU hardware accelerator, the MobileNetEdgeTPU model pushes the boundary further by improving model accuracy while simultaneously reducing the runtime and power consumption.

Building MobileNetV3
In contrast with the hand-designed previous version of MobileNet, MobileNetV3 relies on AutoML to find the best possible architecture in a search space friendly to mobile computer vision tasks. To most effectively exploit the search space we deploy two techniques in sequence — MnasNet and NetAdapt. First, we search for a coarse architecture using MnasNet, which uses reinforcement learning to select the optimal configuration from a discrete set of choices. Then we fine-tune the architecture using NetAdapt, a complementary technique that trims under-utilized activation channels in small decrements. To provide the best possible performance under different conditions we have produced both large and small models.

Comparison of accuracy vs. latency for mobile models on the ImageNet classification task using the Google Pixel 4 CPU.

MobileNetV3 Search Space
The MobileNetV3 search space builds on multiple recent advances in architecture design that we adapt for the mobile environment. First, we introduce a new activation function called hard-swish (h-swish) which is based on the Swish nonlinearity function. The critical drawback of the Swish function is that it is very inefficient to compute on mobile hardware. So, instead we use an approximation that can be efficiently expressed as a product of two piecewise linear functions.
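As a minimal sketch of the approximation described above: the MobileNetV3 paper defines h-swish as x * ReLU6(x + 3) / 6, that is, the product of the identity and a clipped linear ramp. The scalar helper names below are illustrative and are not taken from the released code.

```python
import math


def relu6(x):
    # Clamp x to the range [0, 6].
    return min(max(x, 0.0), 6.0)


def swish(x):
    # Original Swish: x * sigmoid(x). Needs an exponential per activation.
    return x / (1.0 + math.exp(-x))


def hard_swish(x):
    # h-swish: x times a piecewise linear stand-in for sigmoid(x).
    return x * relu6(x + 3.0) / 6.0


# Largest gap between the two functions on a grid over [-6, 6].
grid = [i / 10.0 for i in range(-60, 61)]
max_gap = max(abs(hard_swish(x) - swish(x)) for x in grid)
```

On this grid the two functions stay within about 0.15 of each other, with the largest gap near x = 3 and x = -3, where the clipped ramp saturates.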

Next we introduce the mobile-friendly squeeze-and-excitation block, which replaces the classical sigmoid function with a piecewise linear approximation.
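A rough sketch of such a block, in plain Python on nested lists, might look as follows. It assumes the piecewise linear gate is the hard sigmoid ReLU6(x + 3) / 6 from the MobileNetV3 paper; the weight matrices are hypothetical and biases are omitted for brevity.

```python
def relu6(x):
    # Clamp x to the range [0, 6].
    return min(max(x, 0.0), 6.0)


def hard_sigmoid(x):
    # Piecewise linear replacement for the classical sigmoid.
    return relu6(x + 3.0) / 6.0


def squeeze_excite(feature_map, w_reduce, w_expand):
    """Rescale each channel of feature_map by a learned gate in [0, 1].

    feature_map: list of channels, each a flat list of activations.
    w_reduce:    hidden x channels weight matrix (hypothetical values).
    w_expand:    channels x hidden weight matrix (hypothetical values).
    """
    # Squeeze: global average pool per channel.
    pooled = [sum(ch) / len(ch) for ch in feature_map]
    # Excite: bottleneck layer with ReLU, then one gate per channel.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled)))
              for row in w_reduce]
    gates = [hard_sigmoid(sum(w * h for w, h in zip(row, hidden)))
             for row in w_expand]
    # Scale every activation in a channel by that channel's gate.
    return [[g * v for v in ch] for g, ch in zip(gates, feature_map)]


scaled = squeeze_excite(
    feature_map=[[1.0, 3.0], [2.0, 2.0]],
    w_reduce=[[1.0, 1.0]],
    w_expand=[[1.5], [-1.5]],
)
```

In this toy call the first channel's gate saturates at 1 and the second at 0, so the first channel passes through unchanged and the second is zeroed out.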

Combining h-swish plus mobile-friendly squeeze-and-excitation with a modified version of the inverted bottleneck structure introduced in MobileNetV2 yielded a new building block for MobileNetV3.

MobileNetV3 extends the MobileNetV2 inverted bottleneck structure by adding h-swish and mobile-friendly squeeze-and-excitation as searchable options.

These parameters defined the search space used in constructing MobileNetV3:

Size of expansion layer
Degree of squeeze-excite compression
Choice of activation function: h-swish or ReLU
Number of layers for each resolution block
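
To make the four parameters above concrete, here is a small sketch that treats them as a discrete configuration space and enumerates every combination. Only the two activation choices come from the post; the other candidate values are hypothetical placeholders, not the values used in the actual search.

```python
from itertools import product

# Candidate values per searchable parameter.
SEARCH_SPACE = {
    "expansion_size": [3, 4, 6],        # hypothetical expansion factors
    "se_ratio": [0.0, 0.25],            # hypothetical; 0.0 means no squeeze-excite
    "activation": ["relu", "h-swish"],  # the two choices named in the post
    "num_layers": [1, 2, 3, 4],         # hypothetical layers per resolution block
}


def enumerate_blocks(space):
    # Cartesian product of all parameter choices, one dict per configuration.
    keys = list(space)
    return [dict(zip(keys, values))
            for values in product(*(space[k] for k in keys))]


candidates = enumerate_blocks(SEARCH_SPACE)
# 3 * 2 * 2 * 4 = 48 configurations for a single block
```

Because the number of combinations multiplies across blocks, exhaustive enumeration quickly becomes impractical for a whole network, which is why a guided search such as MnasNet is used instead.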

We also introduced a new efficient last stage at the end of the network that further reduced latency by 15%.

MobileNetV3 Object Detection and Semantic Segmentation
In addition to classification models, we also introduced MobileNetV3 object detection models, which reduced detection latency by 25% relative to MobileNetV2 at the same accuracy for the COCO dataset.

In order to optimize MobileNetV3 for efficient semantic segmentation, we introduced a low-latency segmentation decoder called Lite Reduced Atrous Spatial Pyramid Pooling (LR-ASPP). This new decoder contains three branches: one for low-resolution semantic features, one for higher-resolution details, and one for lightweight attention. The combination of LR-ASPP and MobileNetV3 reduces latency by over 35% on the high-resolution Cityscapes dataset.

MobileNet for Edge TPUs
The Edge TPU in Pixel 4 is similar in architecture to the Edge TPU in the Coral line of products, but customized to meet the requirements of key camera features in Pixel 4. The accelerator-aware AutoML approach substantially reduces the manual process involved in designing and optimizing neural networks for hardware accelerators. Crafting the neural architecture search space is an important part of this approach and centers around the inclusion of neural network operations that are known to improve hardware utilization. While operations such as squeeze-and-excite and swish non-linearity have been shown to be essential in building compact and fast CPU models, these operations tend to perform suboptimally on Edge TPU and hence are excluded from the search space. The minimalistic variants of MobileNetV3 also forgo the use of these operations (i.e., squeeze-and-excite, swish, and 5×5 convolutions) to allow easier portability to a variety of other hardware accelerators such as DSPs and GPUs.

The neural network architecture search, incentivized to jointly optimize the model accuracy and Edge TPU latency, produces the MobileNetEdgeTPU model that achieves lower latency for a fixed accuracy (or higher accuracy for a fixed latency) than existing mobile models such as MobileNetV2 and minimalistic MobileNetV3. Compared with the EfficientNet-EdgeTPU model (optimized for the Edge TPU in Coral), these models are designed to run at a much lower latency on Pixel 4, albeit at the cost of some loss in accuracy.

Although reducing the model’s power consumption was not a part of the search objective, the lower latency of the MobileNetEdgeTPU models also helps reduce the average Edge TPU power use. The MobileNetEdgeTPU model consumes less than 50% of the power of the minimalistic MobileNetV3 model at comparable accuracy.

Left: Comparison of the accuracy on the ImageNet classification task between MobileNetEdgeTPU and other image classification networks designed for mobile when running on the Pixel 4 Edge TPU. MobileNetEdgeTPU achieves higher accuracy and lower latency compared with other models. Right: Average Edge TPU power in Watts for different classification models running at 30 frames per second (fps).

Object Detection Using MobileNetEdgeTPU
The MobileNetEdgeTPU classification model also serves as an effective feature extractor for object detection tasks. Compared with MobileNetV2-based detection models, MobileNetEdgeTPU models offer a significant improvement in model quality (measured as the mean average precision; mAP) on the COCO14 minival dataset at comparable runtimes on the Edge TPU. The MobileNetEdgeTPU detection model has a latency of 6.6 ms and achieves an mAP score of 24.3, while MobileNetV2-based detection models achieve an mAP of 22 and take 6.8 ms per inference.
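
For a sense of scale, the figures quoted above can be turned into relative changes with a few lines of arithmetic. The helper name is illustrative; the inputs are the four numbers from the paragraph.

```python
def relative_change(new, baseline):
    # Percentage change of `new` relative to `baseline`.
    return (new - baseline) / baseline * 100.0


# mAP: 24.3 (MobileNetEdgeTPU) vs. 22 (MobileNetV2-based)
map_gain = relative_change(24.3, 22.0)       # roughly +10.5%

# Latency in ms: 6.6 (MobileNetEdgeTPU) vs. 6.8 (MobileNetV2-based)
latency_change = relative_change(6.6, 6.8)   # roughly -2.9%
```

In other words, the quality gain of about 10% comes together with a slightly lower latency, not at its expense.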

The Need for Hardware-Aware Models
While the results shown above highlight the power, performance, and quality benefits of MobileNetEdgeTPU models, it is important to note that the improvements arise because these models have been customized to run on the Edge TPU accelerator.
When running on a mobile CPU, MobileNetEdgeTPU delivers inferior performance compared with models that have been tuned specifically for mobile CPUs (MobileNetV3). MobileNetEdgeTPU models perform a much greater number of operations, so it is not surprising that they run slower on mobile CPUs, which exhibit a more linear relationship between a model’s compute requirements and its runtime.

MobileNetV3 is still the best performing network when using mobile CPU as the deployment target.

For Researchers and Developers
The MobileNetV3 and MobileNetEdgeTPU code, as well as both floating point and quantized checkpoints for ImageNet classification, are available at the MobileNet GitHub page. An open source implementation of MobileNetV3 and MobileNetEdgeTPU object detection is available in the TensorFlow Object Detection API. An open source implementation of MobileNetV3 semantic segmentation is available in TensorFlow through DeepLab.

Acknowledgements:
This work is made possible through a collaboration spanning several teams across Google. We’d like to acknowledge contributions from Berkin Akin, Okan Arikan, Gabriel Bender, Bo Chen, Liang-Chieh Chen, Grace Chu, Eddy Hsu, John Joseph, Pieter-jan Kindermans, Quoc Le, Owen Lin, Hanxiao Liu, Yun Long, Ravi Narayanaswami, Ruoming Pang, Mark Sandler, Mingxing Tan, Vijay Vasudevan, Weijun Wang, Dong Hyuk Woo, Dmitry Kalenichenko, Yunyang Xiong, Yukun Zhu and support from Hartwig Adam, Blaise Agüera y Arcas, Chidu Krishnan and Steve Molloy.

At RSNA, Healthcare Startups Shine Spotlight on AI for Radiology

Written on November 12, 2019. Posted in NVIDIA.

Radiology leaders have gathered for over 100 years running at RSNA, the annual meeting of the Radiological Society of North America, to discuss the industry’s latest challenges and opportunities. In recent years, AI in medical imaging has become a key focus — with startups at the center of the conversation.

Startups around the world are building AI solutions for a universal problem in medical imaging: limited time. Faced with rising numbers of patients being imaged, as well as the growing size of MRI and CT scans, radiologists must interpret one image every three or four seconds to keep up with the workload.

Agile startups are well-suited to tackle the demands of a rapidly evolving field like deep learning. In medical imaging, many are using AI to develop applications that target areas that slow radiologists down.

Healthcare startups raised more than $26 billion in venture capital funding last year and are partnering with major research institutions, hospitals and medical instrument manufacturers. They’re also receiving regulatory validation for clinical use: over three dozen healthcare AI startups have FDA clearance for algorithms that detect conditions including cancer, stroke and brain hemorrhages from medical scans.

At RSNA 2019, taking place in Chicago, Dec. 1-6, more than 50 attending startups are part of the NVIDIA Inception virtual accelerator program, which provides AI training and tools to fuel the growth of thousands of companies building GPU-powered applications, including over 700 healthcare startups.

Scan the Show for NVIDIA Inception Startups

Accelerated by NVIDIA GPUs, AI can speed up the acquisition, annotation and analysis of medical images to more quickly spot critical cases. It can also give experts quantitative insights that are too time-consuming to acquire using traditional methods.

Dozens of Inception companies will share their medical imaging applications for every phase of the radiology workflow at the RSNA AI Theater and the NVIDIA booth, including:

Higher-quality scans: Subtle Medical has developed the first and only AI software solutions FDA-cleared for medical imaging enhancement — SubtlePET for faster PET exams and SubtleMR for higher-quality MRI exams. Its software smoothly integrates with any scanner to enhance images during acquisition without altering the existing workflow, increasing efficiency and patient comfort. The company uses the NVIDIA DGX Station and NVIDIA DGX-1 to accelerate training, and NVIDIA T4 GPUs for inference.
Enabling AI-assisted annotation: TrainingData.io’s web platform helps researchers and companies manage their data labeling workflows, running on NVIDIA T4 GPUs for inference in Google Cloud. The startup leverages AI-assisted segmentation tools through the NVIDIA Clara Train SDK to label medical images that in turn train deep learning models for radiologists. And Palo Alto-based Fovia Ai, Inc. provides its customers with AI-assisted annotation powered by the NVIDIA Clara SDK in its tools for 2D and 3D visualization of medical images, which can seamlessly integrate into the clinical workflow.
Analyzing medical images: Tokyo startup LPIXEL develops deep learning image analysis tools using NVIDIA GPUs, including one to identify brain aneurysms from MRA, recently approved for clinical use in Japan. For lung tumor detection, China-based InferVISION’s AI tools identify and label lung nodules from CT scans in under 30 seconds. The company uses NVIDIA T4 GPUs for inference, achieving speedups of 4x over CPUs.
Processing surgical video: Doctors performing minimally invasive surgeries rely on live video feeds from tiny cameras to view the area they’re operating on. Kaliber Labs is building deep learning models that interpret these video feeds in real time for orthopedic surgery, identifying and measuring aspects of the patient’s anatomy and pathology, and providing intraoperative guidance to surgeons. The startup is using NVIDIA RTX GPUs for training and the NVIDIA Jetson AGX Xavier AI computing module for inference at the edge.

Rounding Out RSNA

In NVIDIA booth 10939 and beyond, we’ll be exhibiting the latest AI tools for medical imaging, from training to deployment.

We’ll also showcase demos of the NVIDIA Clara medical imaging platform, which combines NVIDIA GPU hardware and the NVIDIA Clara software development kit to accelerate the training and inference of deep learning applications for healthcare. The platform includes APIs for AI-assisted annotation of medical images, a transfer learning toolkit, a medical model development environment and tools for AI deployment at scale.

A Clara developer meetup will be held on Tuesday, Dec. 3 at 11:30 a.m. CT.

The following RSNA panels feature NVIDIA speakers:

Integrating the Healthcare Enterprise on Fast Healthcare Interoperability Resources — Monday, Dec. 2, at 8:30 a.m. CT
Deep Learning: How to Get Started — Thursday, Dec. 5, at 8:30 a.m. CT
Commercial Development and Deployment of Deep Learning Technology — Thursday, Dec. 5, at 4:30 p.m. CT

For more information, check out the full RSNA agenda.

The post At RSNA, Healthcare Startups Shine Spotlight on AI for Radiology appeared first on The Official NVIDIA Blog.

[Project] Deepdos

Written on November 12, 2019. Posted in Reddit MachineLearning.

Description

Hello, r/MachineLearning! Over the last 2 months I’ve been working on my first major machine learning project, called “Deepdos”, in my free time outside of school and work. Deepdos is a network tool that provides analysis, and in the future mitigation, of all network traffic coming over whatever network adapter you specify. The analysis uses a logistic regression model that classifies traffic as either safe or malicious based on packet capture data aggregated with CICFlowMeter (the people who created the tool also created the dataset used for training). The mitigation, which will only be for Linux-based systems, will create and manage firewall rules written directly to iptables. While the name includes “deep”, there is actually no deep learning involved at all (at least not yet).

The project source code can be found here: deepdos

Currently the project is listed as being in a pre-alpha state, as there are a lot of milestones that need to be hit before I can consider this a stable, production-ready project. Hopefully, some of you can help me get there! I’m looking for constructive feedback on the project’s current state, additions that I should be making, and really anything else that can help me grow this project into something that can be useful for companies. Here is a snapshot of the project so you don’t have to look at any of the code:

Where I’m at:

Currently using a logistic regression model trained on 200,000 samples of network traffic: 100,000 “normal” and 100,000 malicious.
Packet capture data aggregation via tcpdump. For development I currently listen for very short bursts of time, but I will be ramping this up to reflect the communication between two devices more accurately.
Published on PyPI (not stable yet).
I’ve rebuilt the structure of the application 3 times right now for scalability and think I finally developed a system
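
For readers unfamiliar with the approach, here is a minimal sketch of a logistic regression classifier over flow features, written in plain Python with batch gradient descent. It is not taken from the Deepdos code: the two features (a normalized packet rate and a SYN ratio) and the toy samples are hypothetical stand-ins for the CICFlowMeter output.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(rows, labels, lr=0.5, epochs=2000):
    # Batch gradient descent on the mean log loss.
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = sum(w * v for w, v in zip(weights, x)) + bias
            err = sigmoid(z) - y
            for i, v in enumerate(x):
                grad_w[i] += err * v
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    # 1 = malicious, 0 = safe.
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0


# Hypothetical flows: [normalized packet rate, SYN ratio]
flows = [[0.10, 0.10], [0.20, 0.05], [0.15, 0.20], [0.30, 0.10],
         [0.90, 0.80], [0.80, 0.90], [0.70, 0.95], [0.95, 0.70]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train(flows, labels)
```

On real, imbalanced traffic the features would need scaling and the 0.5 threshold would likely need tuning against the cost of blocking legitimate flows.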

Where I’m trying to go:

I’m currently thinking about how I can develop a robust testing system so that this project can continue to scale with reliability.
Training on the full data set which is comprised of roughly 57 million samples, as I’m currently only using 200,000 of those samples. :[
Experimenting with different machine and deep learning models to see how I can maximize performance of the classification and of the overall application.

Working on this project has been quite the learning experience and honestly, a really enjoyable time. I really appreciate those of you that took time out of your day to read this and hope that I can garner the opinions and expertise of those of you from this thread to make this into something awesome.

submitted by /u/C3NZ

[R] Learning to Few-Shot Learn Across Diverse Natural Language Classification Tasks

Written on November 12, 2019. Posted in Reddit MachineLearning.

Paper: https://arxiv.org/abs/1911.03863

Abstract: Self-supervised pre-training of transformer models has shown enormous success in improving performance on a number of downstream tasks. However, fine-tuning on a new task still requires large amounts of task-specific labelled data to achieve good performance. We consider this problem of learning to generalize to new tasks with few examples as a meta-learning problem. While meta-learning has shown tremendous progress in recent years, its application is still limited to simulated problems or problems with limited diversity across tasks. We develop a novel method, LEOPARD, which enables optimization-based meta-learning across tasks with different number of classes, and evaluate existing methods on generalization to diverse NLP classification tasks. LEOPARD is trained with the state-of-the-art transformer architecture and shows strong generalization to tasks not seen at all during training, with as few as 8 examples per label. On 16 NLP datasets, across a diverse task-set such as entity typing, relation extraction, natural language inference, sentiment analysis, and several other text categorization tasks, we show that LEOPARD learns better initial parameters for few-shot learning than self-supervised pre-training or multi-task training, outperforming many strong baselines, for example, increasing F1 from 49% to 72%.

submitted by /u/ai_reader

[1910.13051] ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels

Written on November 12, 2019. Posted in Reddit MachineLearning.

submitted by /u/jwuphysics

[P] Article: Curiosity through random network distillation with Montezuma’s revenge [Deep Reinforcement learning course]

Written on November 12, 2019. Posted in Reddit MachineLearning.

Hello everyone,

We’ve just published the new article of the Deep Reinforcement Learning Course, where we study OpenAI’s paper “Exploration by Random Network Distillation”.

THE ARTICLE: https://medium.com/data-from-the-trenches/curiosity-driven-learning-through-random-network-distillation-488ffd8e5938

As a bonus, we give you a model trained on Montezuma’s Revenge for 21 hours with 128 parallel environments on a Tesla K80.

Let me know what you think about this article.

PS: For people who follow me and the Deep Reinforcement Learning Course, I know that I’m very late with this article (it was supposed to be published some months ago…). To be totally transparent with you, the publication rate is slow because since March I’ve been an RL research scientist at Dataiku, so I have a lot of things to do. But stay tuned: I’m currently working very hard on updating everything (especially the PRs on GitHub) and things will be announced in the next 2 weeks (and yes, it will stay totally free and open source 🎉).

Cheers!

submitted by /u/cranthir_

[P] `gpt2-client` is now on Buy me a coffee!

Written on November 12, 2019. Posted in Reddit MachineLearning.

Hey y’all!

The past couple of days have been awesome. `gpt2-client` finally hit 12.5K downloads worldwide and we’re growing faster than ever. I’ve added a Buy me a coffee! link to the README in case any of you would like to donate to the project. Your continued support motivates me to keep building such nifty tools, and your donations mean a lot to me!

Again, if you haven’t checked gpt2-client out, do visit https://github.com/rish-16/gpt2client . I’m now accepting feature requests (some have already been incorporated!). So, feel free to drop me a message down below or use the Feature Requests template on GitHub.

Cheers!

submitted by /u/rish-16

UofT breaks ground on the Schwartz Reisman Innovation Centre, future Vector Institute home

Written on November 12, 2019. Posted in Toronto AI Organizations, Vector Institute.

[R] Finding a human-like classifier

Written on November 12, 2019. Posted in Reddit MachineLearning.

Paper: https://openreview.net/forum?id=BJeGFs9FsH

Abstract:

There were many attempts to explain the trade-off between accuracy and adversarial robustness. However, there was no clear understanding of the behaviors of a robust classifier which has human-like robustness.

We argue (1) why we need to consider adversarial robustness against varying magnitudes of perturbations rather than focusing only on a fixed perturbation threshold, (2) why we need to use a different method to generate adversarially perturbed samples that can be used to train a robust classifier and measure the robustness of classifiers and (3) why we need to prioritize adversarial accuracies with different magnitudes.

We introduce Lexicographical Genuine Robustness (LGR) of classifiers that combines the above requirements. We also suggest a candidate oracle classifier called “Optimal Lexicographically Genuinely Robust Classifier (OLGRC)” that prioritizes accuracy on meaningful adversarially perturbed examples generated by smaller magnitude perturbations. The training algorithm for estimating OLGRC requires lexicographical optimization, unlike existing adversarial training methods. To apply lexicographical optimization to neural networks, we utilize Gradient Episodic Memory (GEM), which was originally developed for continual learning by preventing catastrophic forgetting.

TL;DR: We try to design and train a classifier whose adversarial robustness more closely resembles the robustness of humans.

submitted by /u/hjk92r

[R] Europarl-ST: A Multilingual Corpus For Speech Translation Of Parliamentary Debates

Written on November 12, 2019. Posted in Reddit MachineLearning.

submitted by /u/IranzoSanchez
