Author: torontoai

[D] Named tensors in the new PyTorch version – what are the advantages compared to tsalib?

Written on October 11, 2019. Posted in Reddit MachineLearning.

The new PyTorch version adds experimental support for named tensors, which looks like a big deal, for example when vectorizing a pipeline or something of the sort. The idea has been floating around the community for a while, and I think it will greatly help with axis bugs. What I am not sure about is what advantages it brings compared to, say, tsalib.

https://pytorch.org/docs/stable/named_tensor.html

http://nlp.seas.harvard.edu/NamedTensor

https://github.com/ofnote/tsalib

submitted by /u/dev-ai

[D] Why is L2 preferred over L1 Regularization?

Written on October 11, 2019. Posted in Reddit MachineLearning.

I understand that L1 regularization induces sparsity and is thus good for cases where sparsity is required.

But in normal use cases, what are the benefits of using L2 over L1? If the point is just that weights should be smaller, why can’t we use L4, for example?

I’ve seen mentions of L2 capturing energy, being tied to Euclidean distance, and being rotation invariant. Could someone explain more explicitly how this happens?
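A small numerical sketch of the points raised above, in plain Python (all function names are illustrative, not from any library). A rotation preserves dot products, so the L2 norm, and its square (the "energy", a sum of squares), is unchanged when the weight axes are rotated, while L1 and L4 both change; that is one concrete reason L4 is not a drop-in substitute. The gradients show where L1's sparsity comes from: its pull toward zero has constant magnitude even for tiny weights, whereas L2's pull shrinks in proportion to the weight.

```python
import math
from typing import List

def lp_norm(w: List[float], p: float) -> float:
    """p-norm (sum |x|^p)^(1/p); p=1 is L1, p=2 is L2 (Euclidean), p=4 is L4."""
    return sum(abs(x) ** p for x in w) ** (1.0 / p)

def rotate2d(w: List[float], theta: float) -> List[float]:
    """Rotate a 2-D weight vector by theta radians (an orthogonal change of basis)."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * w[0] - s * w[1], s * w[0] + c * w[1]]

def l1_grad(w: List[float], lam: float) -> List[float]:
    """(Sub)gradient of lam * ||w||_1: magnitude lam no matter how small the weight."""
    return [lam * math.copysign(1.0, x) if x != 0 else 0.0 for x in w]

def l2_grad(w: List[float], lam: float) -> List[float]:
    """Gradient of lam * ||w||_2^2 (the "energy"): proportional to the weight itself."""
    return [2 * lam * x for x in w]

w = [1.0, 0.0]
w_rot = rotate2d(w, math.pi / 4)  # same vector, axes turned by 45 degrees
for p in (1, 2, 4):
    print(f"L{p}: before={lp_norm(w, p):.4f} after={lp_norm(w_rot, p):.4f}")

small_and_large = [0.01, -3.0]
print("L1 grad:", l1_grad(small_and_large, 0.1))  # same magnitude on both weights
print("L2 grad:", l2_grad(small_and_large, 0.1))  # tiny pull on the tiny weight
```

Only the L2 line prints the same value before and after the rotation.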

submitted by /u/tshrjn

[D] Learning with “noisy data” (but perfect labels)

Written on October 11, 2019. Posted in Reddit MachineLearning.

There are many works that deal with noisy labels, but has the problem of unreliable data (but reliable labels) been studied? By this I mean problems where the data to be classified is imperfect and not always sufficient to determine the class label.

An example would be a model that predicts the city in which a photo was taken. Ground truth labels would be perfect thanks to GPS metadata. If the photo contains the Eiffel Tower, we can predict that the city is Paris. But many pictures contain no useful information; for example a photo of a dog or a McDonald’s is nearly useless for determining the city.

How best to train a classifier when such “noisy examples” (for lack of a better term) are very common?
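One way to frame the question: when an input carries no information about the label, the best any model can do on it is output a constant distribution, and under cross-entropy the best constant is the empirical label frequency of such inputs (a consequence of Gibbs' inequality). A toy check in plain Python; the labels and the three candidate predictions are made up for illustration.

```python
import math
from collections import Counter
from typing import Dict, List

def mean_cross_entropy(labels: List[str], q: Dict[str, float]) -> float:
    """Average negative log-likelihood of the labels under a fixed prediction q."""
    return -sum(math.log(q[y]) for y in labels) / len(labels)

# Hypothetical GPS labels of "uninformative" photos (dogs, a McDonald's, ...):
# the image says nothing about the city, so the model can only predict a constant q.
labels = ["paris"] * 5 + ["london"] * 3 + ["tokyo"] * 2

counts = Counter(labels)
empirical = {c: n / len(labels) for c, n in counts.items()}
uniform = {c: 1 / len(counts) for c in counts}
overconfident = {"paris": 0.9, "london": 0.05, "tokyo": 0.05}

for name, q in [("empirical", empirical), ("uniform", uniform), ("overconfident", overconfident)]:
    print(name, round(mean_cross_entropy(labels, q), 4))
```

In this idealized view, uninformative examples push a cross-entropy-trained classifier toward a spread-out, prior-like output rather than a confident guess; whether they should also be down-weighted or detected is the open part of the question.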

submitted by /u/viviandefeater

[R] Looking for an ML platform that also allows for integration with business users?

Written on October 10, 2019. Posted in Reddit MachineLearning.

I have the following overall requirements for an ML platform:

Ability for the data engineering team to build pipelines, integrate with ERP apps (hybrid on-prem/cloud), run and monitor models in production, and store the models’ results.
Ability for the data science team to perform EDA and run experiments, and then push models to production as needed.
Ability for business users (who are not technical and do not know how to code) to interact with the data, not just view reports: perform Excel-like calculations, override predictions they disagree with, run what-if simulations, etc., and then commit any changes back to the data store.

I have seen multiple ML platforms that provide the first two components, but the business user part is always just a dashboarding capability, not a real interface like the one I described.

Does anybody provide anything like this?

submitted by /u/AlexSnakeKing

[R] [D] NLP: Any papers on summarization of very long (arbitrary-length) text?

Written on October 10, 2019. Posted in Reddit MachineLearning.

Hi, I’m catching up on the text summarization scene, and most of the papers I have seen use the CNN, Newsroom and XSum datasets, but the maximum document size in any of these seems to be ~1000 tokens. Are there any papers that deal with very long (or arbitrary) document lengths?

As I understand it, most of the SOTA is now Transformer-based, and these models are bounded by the number of positional embeddings in use.
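A common workaround when a model only has a fixed number of positions is to split the document into overlapping windows, summarize each window, and then summarize or concatenate the partial summaries. That chunked approach is an assumption here, not something from the post; `chunk_tokens` is a hypothetical helper and the summarizer itself is left out.

```python
from typing import List, TypeVar

T = TypeVar("T")

def chunk_tokens(tokens: List[T], max_len: int, stride: int) -> List[List[T]]:
    """Split a token sequence into windows of at most max_len tokens.

    A new window starts every `stride` tokens, so consecutive windows overlap by
    max_len - stride tokens and text near a boundary also appears with context.
    """
    if not 0 < stride <= max_len:
        raise ValueError("need 0 < stride <= max_len")
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):  # this window reaches the end
            break
        start += stride
    return chunks

# Toy example: a 10-token "document" and a hypothetical 4-position limit.
doc = "the quick brown fox jumps over the lazy sleeping dog".split()
for chunk in chunk_tokens(doc, max_len=4, stride=3):
    print(chunk)
```

The overlap trades extra compute for fewer sentences split across window boundaries without context.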

submitted by /u/natural_language_guy

Exploring Massively Multilingual, Massive Neural Machine Translation

Written on October 10, 2019. Posted in Google.

Posted by Ankur Bapna, Software Engineer and Orhan Firat, Research Scientist, Google Research

“… perhaps the way [of translation] is to descend, from each language, down to the common base of human communication — the real but as yet undiscovered universal language — and then re-emerge by whatever particular route is convenient.” — Warren Weaver, 1949

Over the last few years there has been enormous progress in the quality of machine translation (MT) systems, breaking language barriers around the world thanks to the developments in neural machine translation (NMT). The success of NMT, however, owes largely to great amounts of supervised training data. But what about languages where data is scarce, or even absent? Multilingual NMT, with the inductive bias that “the learning signal from one language should benefit the quality of translation to other languages”, is a potential remedy.

Multilingual machine translation processes multiple languages using a single translation model. The success of multilingual training for data-scarce languages has been demonstrated for automatic speech recognition and text-to-speech systems, and by prior research on multilingual translation [1,2,3]. We previously studied the effect of scaling up the number of languages that can be learned in a single neural network, while controlling the amount of training data per language. But what happens once all constraints are removed? Can we train a single model using all of the available data, despite the huge differences across languages in data size, scripts, complexity and domains?
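The post does not spell out how a single model is told which language to produce. In Google's earlier multilingual NMT system (Johnson et al., 2017), this was done by prepending an artificial target-language token such as `<2es>` to each source sentence, leaving the architecture unchanged. A minimal sketch of that data-preparation step, with a hypothetical helper name and toy sentences; M4's exact input format may differ.

```python
from typing import List, Tuple

def add_target_token(source: str, target_lang: str) -> str:
    """Prepend an artificial token that tells a shared model which language to produce."""
    return f"<2{target_lang}> {source}"

# Toy parallel data for several directions, all fed to ONE model.
# Each tuple: (source sentence, target language code, reference translation).
pairs: List[Tuple[str, str, str]] = [
    ("Hello, world", "es", "Hola, mundo"),
    ("Hello, world", "fr", "Bonjour le monde"),
    ("Hola, mundo", "en", "Hello, world"),
]
batch = [(add_target_token(src, tgt), ref) for src, tgt, ref in pairs]
for src, ref in batch:
    print(f"{src!r} -> {ref!r}")
```

Because the target language is just another input token, the encoder, decoder and vocabulary are shared across all pairs, which is what makes transfer between languages possible.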

In “Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges” and follow-up papers [4,5,6,7], we push the limits of research on multilingual NMT by training a single NMT model on 25+ billion sentence pairs, from 100+ languages to and from English, with 50+ billion parameters. The result is an approach for massively multilingual, massive neural machine translation (M4) that demonstrates large quality improvements on both low- and high-resource languages and can be easily adapted to individual domains/languages, while showing great efficacy on cross-lingual downstream transfer tasks.

Massively Multilingual Machine Translation
Though data skew across language pairs is a great challenge in NMT, it also creates an ideal scenario in which to study transfer, where insights gained through training on one language can be applied to the translation of other languages. On one end of the distribution, there are high-resource languages like French, German and Spanish, with billions of parallel examples, while on the other end, supervised data for low-resource languages such as Yoruba, Sindhi and Hawaiian is limited to a few tens of thousands of examples.

The data distribution over all language pairs (in log scale) and the relative translation quality (BLEU score) of the bilingual baselines trained on each one of these specific language pairs.
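The post does not say how training examples are drawn from such a skewed mixture. One widely used option in multilingual training is temperature-based sampling, where the probability of picking language pair l is proportional to (n_l / N)^(1/T): T = 1 reproduces the raw distribution, and larger T flattens it toward uniform, upsampling low-resource pairs. A sketch with a hypothetical function name and made-up corpus sizes:

```python
from typing import Dict

def sampling_probs(sizes: Dict[str, int], temperature: float) -> Dict[str, float]:
    """Temperature-based sampling: p_l is proportional to (n_l / N) ** (1 / T)."""
    total = sum(sizes.values())
    weights = {lang: (n / total) ** (1.0 / temperature) for lang, n in sizes.items()}
    z = sum(weights.values())
    return {lang: w / z for lang, w in weights.items()}

# Hypothetical corpus sizes (parallel sentences) with the kind of skew described above.
sizes = {"fr": 1_000_000_000, "de": 1_000_000_000, "yo": 30_000}
for t in (1, 5, 100):
    probs = sampling_probs(sizes, t)
    print(f"T={t}: " + ", ".join(f"{k}={v:.4f}" for k, v in probs.items()))
```

The temperature controls the tension the post describes: too little upsampling starves low-resource pairs, too much overfits them and takes capacity away from high-resource pairs.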

Once trained using all of the available data (25+ billion examples from 103 languages), we observe strong positive transfer towards low-resource languages, dramatically improving the translation quality of 30+ languages at the tail of the distribution by an average of 5 BLEU points. This effect is already known, but it is surprisingly encouraging, considering the comparison is between bilingual baselines (i.e., models trained only on specific language pairs) and a single multilingual model with representational capacity similar to a single bilingual model. This finding hints that massively multilingual models are effective at generalization, and capable of capturing the representational similarity across a large body of languages.

Translation quality comparison of a single massively multilingual model against bilingual baselines that are trained for each one of the 103 language pairs.

In our EMNLP’19 paper [5], we compare the representations of multilingual models across different languages. We find that multilingual models learn shared representations for linguistically similar languages without the need for external constraints, validating long-standing intuitions and empirical results that exploit these similarities. In [6], we further demonstrate the effectiveness of these learned representations on cross-lingual transfer on downstream tasks.

Visualization of the clustering of the encoded representations of all 103 languages, based on representational similarity. Languages are color-coded by their linguistic family.

Building Massive Neural Networks
As we increase the number of low-resource languages in the model, the quality of high-resource language translations starts to decline. This regression is well known in multi-task setups, arising from inter-task competition and the unidirectional nature of transfer (i.e., from high- to low-resource). While working on better learning and capacity control algorithms to mitigate this negative transfer, we also extend the representational capacity of our neural networks by making them bigger, increasing the number of model parameters to improve the quality of translation for high-resource languages.

Numerous design choices can be made to scale neural network capacity, including adding more layers or making the hidden representations wider. Continuing our study on training deeper networks for translation, we utilized GPipe [4] to train 128-layer Transformers with over 6 billion parameters. Increasing the model capacity significantly improved performance across all languages, by an average of 5 BLEU points. We also studied other properties of very deep networks, including the depth-width trade-off, trainability challenges and design choices for scaling Transformers to over 1500 layers with 84 billion parameters.
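A back-of-the-envelope sketch of why depth and width scale capacity differently, assuming the standard Transformer layer layout and ignoring embeddings, biases and layer norms. The dimensions below are hypothetical and are not the configurations used in the post.

```python
def attention_params(d_model: int) -> int:
    """Q, K, V and output projections, each d_model x d_model (biases ignored)."""
    return 4 * d_model * d_model

def ffn_params(d_model: int, d_ff: int) -> int:
    """Two feed-forward matrices: d_model x d_ff and d_ff x d_model."""
    return 2 * d_model * d_ff

def transformer_params(n_enc: int, n_dec: int, d_model: int, d_ff: int) -> int:
    """Rough count for an encoder-decoder Transformer; decoder layers add cross-attention."""
    enc_layer = attention_params(d_model) + ffn_params(d_model, d_ff)
    dec_layer = 2 * attention_params(d_model) + ffn_params(d_model, d_ff)
    return n_enc * enc_layer + n_dec * dec_layer

base = transformer_params(6, 6, 1024, 4096)      # hypothetical baseline
deeper = transformer_params(12, 12, 1024, 4096)  # 2x depth
wider = transformer_params(6, 6, 2048, 8192)     # 2x width
print(f"base={base:,} deeper={deeper:,} ({deeper / base:.1f}x) wider={wider:,} ({wider / base:.1f}x)")
```

Parameters grow linearly with depth and quadratically with width. Which of the two trains and generalizes better is the trade-off the post studies empirically, and a parameter count alone cannot answer it.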

While scaling depth is one approach to increasing model capacity, exploring architectures that can exploit the multi-task nature of the problem is a very plausible complementary way forward. By modifying the Transformer architecture through the substitution of the vanilla feed-forward layers with sparsely-gated mixture of experts, we drastically scale up the model capacity, allowing us to successfully train models with more than 50 billion parameters, which further improved translation quality across the board.
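The post does not give the layer's details. The sketch below follows the general top-k gating idea of the sparsely-gated mixture-of-experts layer (Shazeer et al., 2017) in simplified form, without the noise term or load-balancing loss, and with toy scaling functions standing in for the feed-forward experts. All names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]

def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(x: Vector, gate_weights: List[Vector], experts: List[Expert], k: int = 2) -> Vector:
    """Route x to the k experts with the highest gate scores and mix their outputs.

    The mixing weights are a softmax renormalized over the selected experts only.
    """
    logits = [sum(wi * xi for wi, xi in zip(w, x)) for w in gate_weights]
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    mix = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for weight, i in zip(mix, top):
        y = experts[i](x)  # the unselected experts are never called for this input
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out

def make_scaling_expert(c: float) -> Expert:
    """Toy stand-in for a feed-forward expert: multiplies its input by c."""
    return lambda v: [c * t for t in v]

experts = [make_scaling_expert(c) for c in (10.0, 20.0, 30.0, 40.0)]
gate_weights = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [3.0, 0.0]]
print(moe_layer([1.0, 0.0], gate_weights, experts, k=2))
```

Total parameters grow with the number of experts, while the computation per input grows only with k. That decoupling is what lets capacity scale this far.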

Translation quality improvement of a single massively multilingual model as we increase the capacity (number of parameters) compared to 103 individual bilingual baselines.

Making M4 Practical
It is inefficient to train large models with extremely high computational costs for every individual language, domain or transfer task. Instead, we present methods [7] to make these models more practical by using capacity-tunable layers to adapt a new model to specific languages or domains without altering the original.
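One concrete form such layers can take is a small residual adapter (layer norm, down-projection, ReLU, up-projection, plus a residual connection) that is trained while the base model stays frozen. The sizes, weights and function names below are hypothetical, and the exact design in [7] may differ.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # list of rows

def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def layer_norm(v: Vector, eps: float = 1e-6) -> Vector:
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]

def adapter(h: Vector, w_down: Matrix, w_up: Matrix) -> Vector:
    """Residual adapter: h + W_up @ ReLU(W_down @ LayerNorm(h)).

    w_down is b x d and w_up is d x b with a small bottleneck b, so the adapter
    adds only 2*d*b trainable weights; h comes from the frozen base model.
    """
    z = [max(0.0, t) for t in matvec(w_down, layer_norm(h))]  # project down, ReLU
    delta = matvec(w_up, z)                                   # project back up to d
    return [hi + di for hi, di in zip(h, delta)]              # residual connection

h = [1.0, 2.0, 3.0, 4.0]                                 # hypothetical hidden state, d = 4
w_down = [[0.1, 0.2, 0.3, 0.4], [-0.1, 0.0, 0.1, 0.0]]   # bottleneck b = 2
w_up_zero = [[0.0, 0.0] for _ in range(4)]
print(adapter(h, w_down, w_up_zero))  # zero up-projection: output equals h
```

Initializing the up-projection at or near zero is a common choice because the adapted model then starts out identical to the original, and one such small module per language or domain can be swapped in without touching the shared weights.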

Next Steps
At least half of the 7,000 languages currently spoken will no longer exist by the end of this century*. Can multilingual machine translation come to the rescue? We see the M4 approach as a stepping stone towards serving the next 1,000 languages; starting from such multilingual models will allow us to easily extend to new languages, domains and downstream tasks, even when parallel data is unavailable. Indeed, the path is rocky, and on the road to universal MT many promising solutions appear to be interdisciplinary. This makes multilingual NMT a plausible test bed for machine learning practitioners and theoreticians interested in exploring the annals of multi-task learning, meta-learning, training dynamics of deep nets and much more. We still have a long way to go.

Acknowledgements
This effort is built on contributions from Naveen Arivazhagan, Dmitry Lepikhin, Melvin Johnson, Maxim Krikun, Mia Chen, Yuan Cao, Yanping Huang, Sneha Kudugunta, Isaac Caswell, Aditya Siddhant, Wei Wang, Roee Aharoni, Sébastien Jean, George Foster, Colin Cherry, Wolfgang Macherey, Zhifeng Chen and Yonghui Wu. We would also like to acknowledge support from the Google Translate, Brain, and Lingvo development teams, Jakob Uszkoreit, Noam Shazeer, Hyouk Joong Lee, Dehao Chen, Youlong Cheng, David Grangier, Colin Raffel, Katherine Lee, Thang Luong, Geoffrey Hinton, Manisha Jain, Pendar Yousefi and Macduff Hughes.

* The Cambridge Handbook of Endangered Languages (Austin and Sallabank, 2011).

[D] What to expect from technical case study interview?

Written on October 10, 2019. Posted in Reddit MachineLearning.

I was just offered a phone interview for a machine learning internship, and part of it is a technical case study. I have looked up examples online and they seem complex. All of my machine learning knowledge is self-taught and hands-on, so I really only know HOW to apply machine learning techniques and don’t know much about WHEN to apply them. Could people share what interviewers look for in this kind of case study, and perhaps some material I should learn beforehand? Thank you!

submitted by /u/the_lonk55

NVIDIA Collaborates with UCSF on AI Center for Radiology

Written on October 10, 2019. Posted in NVIDIA.

University of California, San Francisco, one of the world’s top medical schools for research, today unveiled a center to develop AI tools for clinical radiology, leveraging the NVIDIA Clara healthcare toolkit and the powerful NVIDIA DGX-2 AI system.

As a founding partner of the Center for Intelligent Imaging, known as ci2, NVIDIA is working with UCSF to foster an ecosystem of industry and academic collaboration in healthcare. In addition to contributing technology tools, NVIDIA developers will work with UCSF researchers on several AI projects, including brain tumor segmentation, liver segmentation and clinical deployment.

Integrating AI into the radiology workflow can help medical institutions keep pace with an ever-growing stream of medical imaging data. The number of images acquired during common studies like MRI and CT scans has swelled in recent years from tens of images each to hundreds or thousands. It’s a challenge compounded by a rise in the number of patients being imaged.

“It makes for an absolutely overwhelming volume of information to digest,” said Christopher Hess, chair of the UCSF Department of Radiology and Biomedical Imaging. “We’re hoping to use AI to help radiologists better navigate and interact with data, to derive more meaning out of images, and to improve the value of medical imaging for the individual patient.”

Hess says the university also plans to use AI for quantitative imaging, predictive analytics and resource scheduling — giving medical professionals access to insights that were once too time-consuming to calculate or impossible to find without deep learning methods.

UCSF Adopts NVIDIA Clara and DGX Systems

UCSF’s Center for Intelligent Imaging will use the NVIDIA DGX-2 AI system to power several radiology tools. From right to left: the author, UCSF’s Hess, Sharmila Majumdar, a professor and vice chair of the radiology department at UCSF, and Mona Flores, global lead for hospitals and clinical partnerships at NVIDIA.

A leading healthcare institution with more than a century of work in radiology, UCSF has long been an innovator in medical imaging. Its radiology department collaborated with industry partners in the 1970s to develop the first MRI systems, now used worldwide to diagnose a variety of conditions, including spinal fractures and brain and heart diseases.

Close to half a million imaging studies are performed at UCSF annually. The medical center has amassed at least a petabyte of imaging data over the years — ranging from small X-ray images to much larger PET/MRI studies. These bigger files can take up gigabytes or now even terabytes of data storage.

Training deep learning models on these massive datasets requires immense computational power. By adopting the high-performance NVIDIA DGX-2, Hess estimates UCSF researchers could cut the time to train AI models from months or days down to hours or even minutes.

The DGX-2 will also enable UCSF to harness multimodal data sources to develop more sophisticated deep learning models to accelerate the radiology workflow.

“We’re interested in integrating data from not only imaging, but also from medical records, genetics and other information sources in the healthcare system,” said Hess. “When we talk about computation at scale, we need access to a high-throughput, highly efficient and computationally sophisticated platform like DGX-2 to accelerate our development cycle.”

UCSF has also adopted the NVIDIA Clara developer toolkit for medical imaging. Its researchers are using the Clara Train SDK to train deep learning models that reconstruct and analyze CT and MRI scans, and the Clara Deploy SDK to optimize integration with the center’s clinical infrastructure.

“We’re really focusing on developing ways in which to implement algorithms from the modality to the reading room,” said Hess. “NVIDIA Clara will be an essential platform to create this ecosystem to implement, validate and use AI algorithms.”

Weaving AI Into the Clinical Workflow

NVIDIA and UCSF are working together to develop AI models that can be deployed into the medical center’s imaging workflow, starting with deep learning models to analyze scans of the brain and liver.

When doctors treat brain cancer patients, MRI scans provide critical information about how a tumor is responding to radiation treatment and chemotherapy. Today, radiologists analyze scans visually with manual tools. AI can instead provide a quantitative measurement, calculating the precise volume of a tumor. By tracking how a tumor’s volume changes from scan to scan, clinicians can better assess how a patient is responding to treatment over time.
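Once a model has produced a segmentation, the quantitative measurement described here is simple arithmetic: count the voxels labeled as tumor and multiply by the physical volume of one voxel, taken from the scan's voxel spacing. A minimal sketch with a hypothetical mask format (nested lists of 0/1 in slice, row, column order) and made-up spacing; a real pipeline would read the spacing from the image header. The same computation would apply to per-lobe liver volumes.

```python
from typing import List, Tuple

Mask = List[List[List[int]]]  # [slice][row][column], 1 = voxel labeled as tumor

def segmented_volume_ml(mask: Mask, spacing_mm: Tuple[float, float, float]) -> float:
    """Volume of a binary segmentation mask in millilitres.

    Voxel count x volume of one voxel (slice thickness x row spacing x column
    spacing, in mm^3), divided by 1000 because 1 mL = 1000 mm^3.
    """
    voxels = sum(v for sl in mask for row in sl for v in row)
    dz, dy, dx = spacing_mm
    return voxels * dz * dy * dx / 1000.0

def percent_change(before: float, after: float) -> float:
    return 100.0 * (after - before) / before

# Toy masks from two hypothetical follow-up scans (real masks would come from the model).
spacing = (5.0, 1.0, 1.0)  # hypothetical voxel spacing in mm
scan_1 = [[[1, 1], [1, 0]], [[1, 0], [0, 0]]]  # 4 tumor voxels
scan_2 = [[[1, 0], [0, 0]], [[1, 0], [0, 0]]]  # 2 tumor voxels
v1, v2 = segmented_volume_ml(scan_1, spacing), segmented_volume_ml(scan_2, spacing)
print(f"{v1:.3f} mL -> {v2:.3f} mL ({percent_change(v1, v2):+.0f}%)")
```

Tracking this number across scans gives the scan-to-scan trend described above. Its accuracy depends entirely on the quality of the segmentation.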

The team is also developing an AI model that can segment and measure the left and right lobes of an organ donor’s liver from CT images. These metrics are critical for doctors planning liver transplants from a living donor to a patient, and take up to two hours to delineate and compute by hand. With deep learning, Hess estimates, it could be done in seconds.

UCSF and NVIDIA will also collaborate on tools that could improve the quality, efficiency and reproducibility of medical imaging exams. AI can be used to denoise medical images so that scans can be taken faster and are less susceptible to patient motion during scanning.

Beyond the day-to-day medical imaging workflow, the collaboration will explore predictive analytics tools to provide radiologists and other physicians with insights from imaging scans, medical records and even patient sensors.

Additional deep learning algorithms will be created to improve operational efficiency at UCSF, helping its technologists optimize how the medical center’s fleet of imaging scanners is used.

The post NVIDIA Collaborates with UCSF on AI Center for Radiology appeared first on The Official NVIDIA Blog.