Author: torontoai

[N] Open-Unmix for Music Separation

📜Paper: https://joss.theoj.org/papers/571753bc54c5d6dd36382c3d801de41d
🔊Demo: https://open.unmix.app
🔥PyTorch: https://github.com/sigsep/open-unmix-pytorch
🔻NNabla: https://github.com/sigsep/open-unmix-nnabla
🔶TF2: t.b.a.
📓Colab: https://colab.research.google.com/drive/1mijF0zGWxN-KaxTnd0q6hayAlrID5fEQ

It is our great pleasure to announce the release of Open-unmix, an MIT-licensed Python implementation of DNN-based music separation.

In recent years, deep learning-based systems have broken a long-standing glass ceiling and finally made high-quality music separation possible. This has sparked rising interest from both industry and the machine learning community (like /r/ML).

However, until now, no open-source implementation matched the performance of the best systems proposed more than four years ago. This led to wasted effort in both performance optimization and scientific comparison with the state of the art. Without a way to reproduce state-of-the-art performance, it is difficult to identify the sources of discrepancies and the room for improvement.

In this context, we release Open-Unmix (UMX) to close this gap by providing a reference implementation for DNN-based music separation. It serves two main purposes. First, it gives academic researchers a baseline method that is easy to compare against and build upon. Second, the availability of a pre-trained model brings music separation to enthusiastic end users and artists.

Paper

Open-unmix is presented in a paper that has just been published in the Journal of Open Source Software. You may download the paper PDF here

Code

Open-unmix comes in several DNN frameworks:

  • PyTorch
  • NNabla
  • A TensorFlow version will be released as soon as TensorFlow 2.0 is out.

Website

  • We provide extended documentation and further demos on the sigsep website.

https://sigsep.github.io/open-unmix/

Datasets

Open-unmix was designed to work well with the following datasets:

  • MUSDB18 has become one of the most popular datasets in source separation and MIR. It provides full-length music tracks (~10h duration) of different genres along with their isolated drums, bass, vocals and "other" stems.
  • MUSDB18-HQ: together with Open-Unmix, we also released an additional flavor of the dataset for models that aim to predict the full bandwidth of up to 22 kHz. Other than that, MUSDB18-HQ is identical to MUSDB18.

=> Both datasets are available at https://sigsep.github.io/datasets/musdb.html

  • Open-unmix also offers a variety of template dataset structures that should be appropriate for many other use cases

Note:

If you want to compare separation models to the existing source separation literature or to SiSEC 2018 participants, please use the standard MUSDB18 dataset instead.

Pre-trained models

We provide pre-trained models trained on both MUSDB18 and MUSDB18-HQ that reach state-of-the-art performance of 6.32 dB SDR (median of medians) on vocals on MUSDB18 test data. This significantly outperforms any model we are aware of that was trained on MUSDB18 only.
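To make the "median of medians" aggregation concrete, here is a minimal sketch. It assumes the usual reading of the term (median over frames within each track, then median over tracks) and uses a simplified energy-ratio SDR rather than the BSSEval metric that museval computes. The frame scores are made-up numbers for illustration only.

```python
import math
from statistics import median


def simple_sdr(reference, estimate):
    """Simplified SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Assumption: plain energy ratio, not the BSSEval variant used by museval.
    """
    signal = sum(r * r for r in reference)
    error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10(signal / error)


def median_of_medians(frame_scores_per_track):
    """Median over frames within each track, then median over tracks."""
    return median(median(frames) for frames in frame_scores_per_track)


# Hypothetical per-frame SDR values (dB) for three tracks.
scores = [
    [5.0, 6.0, 7.0],
    [4.0, 6.5, 9.0],
    [2.0, 3.0, 10.0],
]
track_medians = [median(frames) for frames in scores]
overall = median_of_medians(scores)
```

Taking medians at both levels keeps a few silent or very easy frames from dominating the reported score.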

The pre-trained models are automatically bundled/downloaded when using the PyTorch implementation.

Further information for both models such as evaluation scores can be downloaded from zenodo:

Tutorial

Open-unmix was recently presented during a tutorial held at EUSIPCO 2019. The tutorial features:

  • An up-to-date overview of current source separation methods with a focus on deep learning
  • A lecture on spectrogram models and Wiener filtering
  • Visualizations and results of Open-Unmix compared to state-of-the-art

The slides of the tutorial as well as self-contained colab notebooks can be found on the tutorial site.

Related tools

Open-unmix is part of a whole ecosystem enabling easy research on source separation for Python users. Several distinct and independent projects were released in recent years in this effort to make it possible for researchers to reproduce state-of-the-art performance in this domain.

norbert

A reliable Python package that implements the multichannel Wiener filter and related filtering methods.

https://github.com/sigsep/norbert
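As a rough illustration of the idea behind Wiener-style filtering, the sketch below builds single-channel soft masks from estimated source magnitudes and applies one to the mixture. This is a simplified ratio mask, not the multichannel filter norbert implements and not norbert's API. The function names and the magnitudes are hypothetical.

```python
def wiener_like_masks(source_magnitudes, eps=1e-10):
    """Per-bin soft masks: mask_j = |S_j|^2 / sum_k |S_k|^2.

    source_magnitudes holds one list of per-bin magnitudes per source.
    eps guards against division by zero in silent bins.
    """
    n_bins = len(source_magnitudes[0])
    totals = [
        sum(src[b] ** 2 for src in source_magnitudes) + eps
        for b in range(n_bins)
    ]
    return [
        [src[b] ** 2 / totals[b] for b in range(n_bins)]
        for src in source_magnitudes
    ]


def apply_mask(mask, mixture_bins):
    """Scale complex mixture STFT bins by a real-valued mask."""
    return [m * x for m, x in zip(mask, mixture_bins)]


# Hypothetical magnitudes for two sources over three time-frequency bins.
vocals = [3.0, 1.0, 0.0]
accompaniment = [4.0, 1.0, 2.0]
masks = wiener_like_masks([vocals, accompaniment])

mixture = [5 + 0j, 1 + 1j, 0 + 2j]
vocal_estimate = apply_mask(masks[0], mixture)
```

Because the masks sum to roughly one in every bin, the separated sources add back up to the mixture.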

musdb

We released the new version 0.3.0 of our popular musdb tools. This release makes it simpler to use musdb inside your data loading framework thus we pro

https://github.com/sigsep/sigsep-mus-db

museval

museval makes it easy to compare the performance of any new method under investigation to both Open-unmix and the participants of SiSEC18.

https://github.com/sigsep/sigsep-mus-eval

UMX-Pro

Please note that we are also working on a version of Open-unmix that has been trained on a significantly larger dataset and that achieves unprecedented separation performance. Please feel free to contact us for demonstrations, industrial collaborations, or licensing on this matter.

We look forward to your feedback and we hope that you will find Open-unmix useful!

submitted by /u/faroit

[P] Does any framework have native Fourier-based CNNs?

I’m looking to do some experiments using the Fast Fourier Transform to do CNNs. From what I’ve seen, many common frameworks (Chainer, Keras, PyTorch, TensorFlow) don’t provide support for this. They typically implement an FFT or DFT function but not an FT convolutional layer. I could implement it from scratch, but there are some finicky implementation aspects I was hoping to avoid worrying about.

Does any framework have native Fourier-based CNNs? Alternatively, pointers to SOTA implementations on GitHub I can use for reference would be highly appreciated. Ideally in Chainer, as that’s the framework I have experience with.
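For reference, the identity such a layer would rely on is the convolution theorem: circular convolution in the signal domain equals pointwise multiplication in the Fourier domain. The sketch below checks this in one dimension with a naive DFT standing in for an FFT. It is not taken from any of the frameworks named above, and all function names are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def circular_conv_direct(signal, kernel):
    """Circular convolution computed directly in the signal domain."""
    n = len(signal)
    return [
        sum(signal[(i - j) % n] * kernel[j] for j in range(len(kernel)))
        for i in range(n)
    ]


def circular_conv_fourier(signal, kernel):
    """Same result via zero-padded kernel, pointwise product, inverse DFT."""
    n = len(signal)
    padded = list(kernel) + [0.0] * (n - len(kernel))
    product = [a * b for a, b in zip(dft(signal), dft(padded))]
    return [v.real for v in dft(product, inverse=True)]


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
kernel = [1.0, 0.0, -1.0]
direct = circular_conv_direct(signal, kernel)
fourier = circular_conv_fourier(signal, kernel)
```

Some of the finicky parts are visible even here: the result is circular rather than zero-padded, and CNN layers usually compute cross-correlation, so a real layer would also have to handle padding and kernel flipping.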

submitted by /u/StellaAthena

[P] learn2learn: A PyTorch Meta Learning Library

Hello /r/ML,

We are pleased to share with you our meta-learning library, which started as a project at the PyTorch hackathon.

learn2learn is a PyTorch library for all things meta-learning. Our goal is to support as many meta-learning algorithms as possible (be it few-shot, meta-descent, or meta-RL) and to enable researchers to develop better methods and easily compare against existing literature.

Our current features include:

  • Modular API: implement your own training loops with our low-level utilities.
  • Provides various meta-learning algorithms (e.g. MAML, FOMAML, MetaSGD, ProtoNets, DiCE)
  • Task generator with unified API, compatible with torchvision, torchtext, torchaudio, and cherry
  • Provides standardized meta-learning tasks for vision (Omniglot, mini-ImageNet), reinforcement learning (Particles, Mujoco), and even text (news classification).
  • 100% compatible with PyTorch — use your own modules, datasets, or libraries!
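To give a feel for what an algorithm like FOMAML does, here is a toy sketch in plain Python. It does not use learn2learn's API. The model is a single scalar weight fitting `y = slope * x`, the two tasks and all names are hypothetical, and the same points serve as both support and query set to keep the example short.

```python
def loss_grad(w, slope, xs):
    """Gradient of mean((w*x - slope*x)^2) with respect to w."""
    return sum(2.0 * (w - slope) * x * x for x in xs) / len(xs)


def fomaml(task_slopes, xs, inner_lr=0.1, outer_lr=0.1, steps=200):
    """First-order MAML on a one-parameter regression problem."""
    w = 0.0  # meta-parameter shared across tasks
    for _ in range(steps):
        meta_grad = 0.0
        for slope in task_slopes:
            # Inner loop: one gradient step adapts w to this task.
            adapted = w - inner_lr * loss_grad(w, slope, xs)
            # First-order approximation: take the gradient at the adapted
            # parameter and ignore second derivatives.
            meta_grad += loss_grad(adapted, slope, xs)
        # Outer loop: move the meta-parameter along the averaged gradient.
        w -= outer_lr * meta_grad / len(task_slopes)
    return w


meta_w = fomaml(task_slopes=[1.0, 3.0], xs=[-1.0, 1.0])
```

With two symmetric tasks, the meta-parameter settles midway between the task slopes, which is the starting point that adapts equally fast to either task.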

If this is of interest to you, have a look at the following links:

Let us know what you think and how we can help you in your research!

PS: learn2learn was also accepted as a poster to the PyTorch Dev Conference, so you’ll know all about it there!

submitted by /u/praat33k

[R] Recent Advances in Object Detection in the Age of Deep Convolutional Neural Networks

Abstract – Object detection, the computer vision task dealing with detecting instances of objects of a certain class (e.g., ’car’, ’plane’, etc.) in images, attracted a lot of attention from the community during the last six years. This strong interest can be explained not only by the importance this task has for many applications but also by the phenomenal advances in this area since the arrival of deep convolutional neural networks (DCNNs). This article reviews the recent literature on object detection with deep CNN, in a comprehensive way. This study covers not only the design decisions made in modern deep (CNN) object detectors, but also provides an in-depth perspective on the set of challenges currently faced by the computer vision community, as well as some complementary and new directions on how to overcome them. In its last part it goes on to show how object detection can be extended to other modalities and conducted under different constraints. This survey also reviews in its appendix the public datasets and associated state-of-the-art algorithms.

Page -> https://arxiv.org/abs/1809.03193

PDF -> https://arxiv.org/pdf/1809.03193.pdf

submitted by /u/gnavihs

[P] The age of transformers & Understanding text with BERT

This is a two-part blog post on a project that aims to do question answering using a pretrained BERT.

The first part covers Transformers and the history that led up to this architecture. -> https://blog.scaleway.com/2019/building-a-machine-reading-comprehension-system-using-the-latest-advances-in-deep-learning-for-nlp/

The second part focuses on using a pre-trained BERT (in PyTorch) and how to do question answering. There’s code and you can try it on your own dataset easily 🙂

-> https://blog.scaleway.com/2019/understanding-text-with-bert/

submitted by /u/ilnmtlbnm

Cure for the Common Code: San Francisco Startup Uses AI to Automate Medical Coding

Doctors’ handwriting is notoriously difficult to read. Even more cryptic is medical coding — the process of turning a clinician’s notes into a set of alphanumeric codes representing every diagnosis and procedure.

Although this system is used in over 100 countries worldwide, accurate coding is of particular significance in the U.S., where medical codes form the basis for the bills doctors, clinics and hospitals issue to insurance providers and patients.

More than 150,000 codes are used in the U.S.’s adaptation of the International Classification of Diseases, a cataloging standard developed by the World Health Organization.

The diagnostic code for a pedestrian hit by a pickup truck? V03.10XA. Type 2 diabetes diagnosis? E11.9. There is also a set of procedural codes for everything a doctor might do, like put a cast on a patient’s broken right forearm (2W3CX2Z) or insert a pacemaker into a coronary vein (02H40NZ).

After every doctor’s appointment or procedure, a clinician’s summary of the interaction is converted into these codes. When done by humans, the turnaround time for medical chart coding — within a healthcare organization or at a private firm — is often two days or more. Natural language processing AI, accelerated by GPUs, can shrink that time to minutes or seconds.

San Francisco-based Fathom is developing deep learning tools to automate the painstaking medical coding process while increasing accuracy. The startup’s tools can help address the shortage of trained clinical coders, improve the speed and precision of billing, and allow human coders to focus on complex cases and follow-up queries.

“Sometimes you have to go back to the doctor to ask for clarification,” said Christopher Bockman, co-founder and chief technology officer of Fathom, a member of the NVIDIA Inception virtual accelerator program. “The longer that process takes, the harder it is for the doctor to remember what happened.”

Fathom uses NVIDIA P100 and V100 Tensor Core GPUs in Google Cloud for both training and inference of its deep learning algorithms. Founded in 2016, the company now works with several of the largest medical coding operations in the U.S., representing more than 200 million annual patient encounters. Its tools can reduce human time spent on medical coding by as much as 90 percent.

Deciphering the Doctor

At any doctor’s appointment, emergency room visit or surgical procedure, healthcare providers type up notes describing the interaction. While there are some standardized formats, these medical records differ by hospital, by type of appointment or procedure, and by whether the note is written during the patient interaction or after.

Medical coders make sense of this unstructured text, categorizing every test, treatment and procedure into a list of codes. Once coded, a healthcare provider’s billing department turns the reports into an invoice to collect payments from insurance providers and patients.

It’s a messy process — for a human or an AI. Human coders agree with each other less than two-thirds of the time in key scenarios, studies show. And research has found that half or more medical charts have coding errors.

“The challenge for us is these notes can vary quite a bit,” Bockman said. “There’s a push to standardize, but that tends to make the doctor’s job a lot harder. Human health is complex, so it’s hard to come up with a format that works for every case.”

Coding an AI that Codes

As a machine learning problem, medical coding shares elements of two kinds of tasks: multilabel classification and sequence-to-sequence NLP. An effective AI must understand the text in a doctor’s note and accurately tag it with a list of diagnoses and procedures organized in the right order for billing.
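The multilabel half of that description can be sketched in a few lines of Python. Each code gets its own independent sigmoid score, so a single note can receive several codes at once. This is an illustrative sketch, not Fathom's system: the logits are made-up values, and the function names and threshold are assumptions.

```python
import math


def sigmoid(z):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_codes(logits_by_code, threshold=0.5):
    """Return every code whose independent probability passes the threshold."""
    return sorted(
        code
        for code, logit in logits_by_code.items()
        if sigmoid(logit) >= threshold
    )


# Hypothetical model outputs for one clinical note.
note_logits = {"E11.9": 2.3, "V03.10XA": -4.0, "02H40NZ": 0.4}
predicted = predict_codes(note_logits)
```

The alphabetical sort here is only for a stable result. Putting codes in the right order for billing is the sequence-to-sequence part of the problem, which this sketch leaves out.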

Fathom is tackling this challenge, aided by tools such as NVIDIA’s GPU-optimized version of BERT, a leading natural language understanding model. The team uses the TensorFlow deep learning framework and relies on the mixed-precision training provided by Tensor Cores to accelerate the large-scale processing of medical documents that vary widely in size.

Using NVIDIA GPUs for inference allows Fathom to easily scale up to process upwards of millions of healthcare encounters per hour.

“While lowering costs matter, the ability to instantly add the capacity of thousands of medical coders to their operations has been the game-changer for our clients,” said Andrew Lockhart, Fathom’s co-founder and CEO.

Relying on NVIDIA GPUs on Google Cloud helps the team ramp its usage up and down based on demand.

“We have very bursty needs,” Bockman said, referring to the team’s fluctuating computational workload. “Sometimes we might be trying to retrain different variants of the same large model, while other times we’re doing a lot of experimentation or just doing inference. We might need a single GPU or many dozens of them.”

The startup chose Google Cloud, Bockman said, in part because the data is encrypted by default, one of the requirements for compliance with the HIPAA and SOC 2 standards.

While medical coding is the main activity done today with doctor’s notes, unlocking the information contained in these health records could enable a wide range of use cases beyond billing and reimbursement, Bockman says.

AI that quickly and accurately analyzes medical charts and appointment records at scale can help doctors spot patient illnesses that may otherwise have been missed, predict likely patient outcomes, suggest treatment options — and even identify promising patient candidates for clinical trials.

The post Cure for the Common Code: San Francisco Startup Uses AI to Automate Medical Coding appeared first on The Official NVIDIA Blog.

[Discussion] Google Patents “Generating output sequences from input sequences using neural networks”

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output sequences from input sequences. One of the methods includes obtaining an input sequence having a first number of inputs arranged according to an input order; processing each input in the input sequence using an encoder recurrent neural network to generate a respective encoder hidden state for each input in the input sequence; and generating an output sequence having a second number of outputs arranged according to an output order, each output in the output sequence being selected from the inputs in the input sequence, comprising, for each position in the output order: generating a softmax output for the position using the encoder hidden states that is a pointer into the input sequence; and selecting an input from the input sequence as the output at the position using the softmax output.

http://www.freepatentsonline.com/10402719.html
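The mechanism in that abstract, a softmax over encoder hidden states used as a pointer into the input, can be illustrated with a toy sketch. The hidden states below are hand-written vectors rather than RNN outputs, and the dot-product scoring is an assumption, since the abstract does not say how the scores are computed.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def point(decoder_state, encoder_states, inputs):
    """Select one input element via a softmax over encoder hidden states."""
    scores = [
        sum(d * h for d, h in zip(decoder_state, state))
        for state in encoder_states
    ]
    probs = softmax(scores)
    index = max(range(len(probs)), key=probs.__getitem__)
    return inputs[index], probs


# Hypothetical input sequence with one hidden state per input.
inputs = ["c", "a", "b"]
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
chosen, probs = point([0.0, 2.0], encoder_states, inputs)
```

Repeating this step once per output position, with a new decoder state each time, yields an output sequence made up entirely of elements of the input.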

News from the UK is that the grave of some guy named Turing has been heard making noises since this came out.

What would happen if, by some stroke of luck, Google collapses and some company like Oracle buys its IP and then goes after any dude who installed PyTorch?

Why doesn’t Google come out with a systematic approach to secure these patents?

I am not too sure they are doing this *only* for defending against patent trolls anymore.

submitted by /u/metacurse