Category: Reddit MachineLearning

[R] Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unknown Cameras

Written on September 12, 2019. Posted in Reddit MachineLearning.

submitted by /u/tsauri

[P][D] Anyone working with a data pipeline of CPU -> GPU? I am developing a library of methods for faster transfer to GPU. In some cases, 370x faster than using PyTorch’s pinned CPU tensors. Let me know what your pipeline is and I’ll try to add methods for it. Just show me your code.

Written on September 11, 2019. Posted in Reddit MachineLearning.

I am developing methods for fast transfer from CPU to GPU and am currently coding them. Show me your code (a Colab notebook would be really helpful) and I’ll see how to incorporate the library into it for faster data transfer.

submitted by /u/BatmantoshReturns

[N] Open-Unmix for Music Separation

Written on September 11, 2019. Posted in Reddit MachineLearning.

📜Paper: https://joss.theoj.org/papers/571753bc54c5d6dd36382c3d801de41d
🔊Demo: https://open.unmix.app
🔥PyTorch: https://github.com/sigsep/open-unmix-pytorch
🔻NNabla: https://github.com/sigsep/open-unmix-nnabla
🔶TF2: t.b.a.
📓Colab: https://colab.research.google.com/drive/1mijF0zGWxN-KaxTnd0q6hayAlrID5fEQ

It is our great pleasure to announce the release of Open-Unmix, an MIT-licensed Python implementation for DNN-based music separation.

In recent years, deep learning-based systems have broken a long-standing glass ceiling and finally allow high-quality music separation. This has sparked rising interest from both industry and the machine learning community (like /r/ML).

However, until now, no open-source implementation was available that matches the performance of the best systems proposed more than four years ago. This led to a waste of time from the points of view of both sheer performance optimization and scientific comparison with the state of the art. Not being able to reproduce state-of-the-art performance makes it difficult to clearly identify the sources of discrepancies and the room for improvement.

In this context, we release Open-Unmix (UMX) to close this gap by providing a reference implementation for DNN-based music separation. It serves two main purposes. First, it is intended to serve academic researchers as a baseline method that is easy to compare to and build upon. Second, the availability of a pre-trained model brings music separation to enthusiastic end users and artists.

Paper

Open-Unmix is presented in a paper that has just been published in the Journal of Open Source Software. You may download the paper PDF from the paper link above.

Code

Open-Unmix comes in several DNN frameworks:

PyTorch
NNabla
TensorFlow: a version will be released as soon as TensorFlow 2.0 is out.

Website

We provide extended documentation and further demos on the sigsep website.

https://sigsep.github.io/open-unmix/

Datasets

Open-unmix has been especially designed to combine well with the following datasets:

MUSDB18 has become one of the most popular datasets in source separation and MIR. We provide full-length music tracks (~10 h duration) of different genres along with their isolated drums, bass, vocals and other stems.
MUSDB18-HQ: together with Open-Unmix, we also released an additional flavor of the dataset for models that aim to predict high bandwidth of up to 22 kHz. Other than that, MUSDB18-HQ is identical to MUSDB18.

=> Both datasets are available at https://sigsep.github.io/datasets/musdb.html

Open-Unmix also offers a variety of template dataset structures that should be appropriate for many other use cases.

Note:

If you want to compare separation models to the existing source separation literature or to SiSEC 2018 participants, please use the standard MUSDB18 dataset instead.

Pre-trained models

We provide pre-trained models trained on both MUSDB18 and MUSDB18-HQ that reach state-of-the-art performance of 6.32 dB SDR (median of medians) on vocals on MUSDB18 test data. This significantly outperforms any model we are aware of that was trained on MUSDB18 only.
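
The "median of medians" aggregation can be sketched as follows. This is a minimal illustration, assuming SDR is first computed per frame within each track, the median is taken over the frames of each track, and the median of those per-track values is then reported. The function name and the scores are hypothetical illustration values, not results from the release.

```python
from statistics import median


def median_of_medians(frame_scores_per_track):
    """Median over tracks of the per-track median over frames."""
    # Skip tracks without any frame scores (assumption for this sketch).
    per_track = [median(frames) for frames in frame_scores_per_track if frames]
    return median(per_track)


# Hypothetical per-frame SDR values (dB) for three tracks.
example_scores = [
    [5.0, 6.0, 7.0],
    [2.0, 9.0, 4.0, 6.0],
    [8.0, 7.5],
]
result = median_of_medians(example_scores)
print(f"{result:.2f} dB")
```

Using medians at both levels keeps a few very bad or very good frames or tracks from dominating the reported score.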

The pre-trained models are automatically bundled/downloaded when using the PyTorch implementation.

Further information for both models such as evaluation scores can be downloaded from Zenodo:

umx: https://doi.org/10.5281/zenodo.3370486
umxhq: https://doi.org/10.5281/zenodo.3370489

Tutorial

Open-Unmix was recently presented during a tutorial held at EUSIPCO 2019. The tutorial features:

An up-to-date overview of current source separation methods with a focus on deep learning
A lecture on spectrogram models and Wiener filtering
Visualizations and results of Open-Unmix compared to state-of-the-art

The slides of the tutorial as well as self-contained colab notebooks can be found on the tutorial site.

Related tools

Open-Unmix is part of a whole ecosystem enabling easy research on source separation for Python users. Several distinct and independent projects were released in recent years as part of this effort to make it possible for researchers to reproduce state-of-the-art performance in this domain.

norbert

A reliable Python package that implements the multichannel Wiener filter and related filtering methods.

https://github.com/sigsep/norbert
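
To give a rough idea of what Wiener-style filtering does, here is a simplified single-channel sketch. It is not norbert's API and not the multichannel filter the package implements. It only shows the basic idea of sharing one mixture time-frequency bin among sources in proportion to their estimated power. The function name and all values are hypothetical.

```python
def wiener_soft_mask(mixture_bin, source_magnitudes, eps=1e-10):
    """Split one complex STFT bin among sources using power-ratio gains."""
    # Estimated power of each source at this time-frequency bin.
    powers = [m * m for m in source_magnitudes]
    # eps avoids division by zero when all estimates are silent.
    total = sum(powers) + eps
    # Each source receives its share of the mixture, keeping the mixture phase.
    return [(p / total) * mixture_bin for p in powers]


mixture_bin = complex(3.0, 4.0)   # hypothetical mixture STFT value
magnitudes = [2.0, 1.0, 1.0]      # hypothetical model outputs for three sources
estimates = wiener_soft_mask(mixture_bin, magnitudes)
for estimate in estimates:
    print(estimate)
```

Because the gains sum to (almost exactly) one, the source estimates add back up to the mixture, which is one reason this kind of post-processing is popular on top of spectrogram models.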

musdb

We released the new version 0.3.0 of our popular musdb tools. This release makes it simpler to use musdb inside your data loading framework thus we pro

https://github.com/sigsep/sigsep-mus-db

museval

museval makes it easy to compare the performance of any new method under investigation to both Open-unmix and the participants of SiSEC18.

https://github.com/sigsep/sigsep-mus-eval

UMX-Pro

Please note that we are also working on a version of Open-Unmix that has been trained on a significantly larger dataset and that achieves unprecedented separation performance. Please feel free to contact us for demonstrations / industrial collaborations / licensing on this matter.

We look forward to your feedback and we hope that you will find Open-unmix useful!

submitted by /u/faroit

[P] Does any framework have native Fourier-based CNNs?

Written on September 11, 2019. Posted in Reddit MachineLearning.

I’m looking to do some experiments using the Fast Fourier Transform to implement CNNs. From what I’ve seen, many common frameworks (Chainer, Keras, PyTorch, TensorFlow) don’t provide support for this. They typically implement an FFT or DFT function but not a Fourier-transform convolutional layer. I could implement it from scratch, but there are some finicky implementation details I was hoping to avoid worrying about.

Does any framework have native Fourier-based CNNs? Alternatively, pointers to SOTA implementations on GitHub I can use for reference would be highly appreciated. Ideally in Chainer, as that’s the framework I have experience with.
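
For reference, the underlying idea (the convolution theorem) can be sketched in plain Python. This is a minimal 1-D illustration, not a framework layer: it uses a naive O(n^2) DFT for readability where a real implementation would call an FFT routine, and all function names are hypothetical. It also shows one of the finicky parts, namely zero-padding both inputs so that the circular convolution computed in the Fourier domain equals the linear one.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def direct_linear_conv(signal, kernel):
    """Reference: linear convolution computed directly in the time domain."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def fourier_linear_conv(signal, kernel):
    """Linear convolution via pointwise multiplication of spectra."""
    # Pad to the full output length to avoid circular wrap-around.
    n = len(signal) + len(kernel) - 1
    padded_signal = list(signal) + [0.0] * (n - len(signal))
    padded_kernel = list(kernel) + [0.0] * (n - len(kernel))
    spectrum = [a * b for a, b in zip(dft(padded_signal), dft(padded_kernel))]
    return [value.real for value in dft(spectrum, inverse=True)]


signal = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, -1.0, 0.25]
direct = direct_linear_conv(signal, kernel)
via_fourier = fourier_linear_conv(signal, kernel)
print(direct)
print([round(v, 6) for v in via_fourier])
```

Note that most CNN "convolution" layers actually compute cross-correlation, so matching them would additionally require flipping the kernel (or conjugating its spectrum).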

submitted by /u/StellaAthena

[P] learn2learn: A PyTorch Meta Learning Library

Written on September 11, 2019. Posted in Reddit MachineLearning.

Hello /r/ML,

We are pleased to share with you our meta-learning library, which started as a project at the PyTorch hackathon.

learn2learn is a PyTorch library for all things meta-learning. Our goal is to support as many meta-learning algorithms as possible (be it few-shot, meta-descent, or meta-RL) and to enable researchers to develop better methods and easily compare against existing literature.

Our current features include:

Modular API: implement your own training loops with our low-level utilities.
Provides various meta-learning algorithms (e.g. MAML, FOMAML, MetaSGD, ProtoNets, DiCE)
Task generator with unified API, compatible with torchvision, torchtext, torchaudio, and cherry
Provides standardized meta-learning tasks for vision (Omniglot, mini-ImageNet), reinforcement learning (Particles, MuJoCo), and even text (news classification).
100% compatible with PyTorch — use your own modules, datasets, or libraries!

If this is of interest to you, have a look at the following links:

Let us know what you think and how we can help you in your research!

PS: learn2learn was also accepted as a poster at the PyTorch Dev Conference, so you can learn all about it there!

submitted by /u/praat33k

[R] Recent Advances in Object Detection in the Age of Deep Convolutional Neural Networks

Written on September 11, 2019. Posted in Reddit MachineLearning.

Abstract – Object detection, the computer vision task dealing with detecting instances of objects of a certain class (e.g., ’car’, ’plane’, etc.) in images, attracted a lot of attention from the community during the last six years. This strong interest can be explained not only by the importance this task has for many applications but also by the phenomenal advances in this area since the arrival of deep convolutional neural networks (DCNNs). This article reviews the recent literature on object detection with deep CNN, in a comprehensive way. This study covers not only the design decisions made in modern deep (CNN) object detectors, but also provides an in-depth perspective on the set of challenges currently faced by the computer vision community, as well as some complementary and new directions on how to overcome them. In its last part it goes on to show how object detection can be extended to other modalities and conducted under different constraints. This survey also reviews in its appendix the public datasets and associated state-of-the-art algorithms.

Page -> https://arxiv.org/abs/1809.03193

PDF -> https://arxiv.org/pdf/1809.03193.pdf

submitted by /u/gnavihs

[P] The age of transformers & Understanding text with BERT

Written on September 11, 2019. Posted in Reddit MachineLearning.

This is a two-part blog post on a project that aims to do question answering using a pre-trained BERT.

The first part covers Transformers and the history that led up to this architecture. -> https://blog.scaleway.com/2019/building-a-machine-reading-comprehension-system-using-the-latest-advances-in-deep-learning-for-nlp/

The second part focuses on using a pre-trained BERT (in PyTorch) and how to do question answering. There’s code and you can try it on your own dataset easily 🙂

-> https://blog.scaleway.com/2019/understanding-text-with-bert/

submitted by /u/ilnmtlbnm

[Discussion] Google Patents “Generating output sequences from input sequences using neural networks”

Written on September 11, 2019. Posted in Reddit MachineLearning.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output sequences from input sequences. One of the methods includes obtaining an input sequence having a first number of inputs arranged according to an input order; processing each input in the input sequence using an encoder recurrent neural network to generate a respective encoder hidden state for each input in the input sequence; and generating an output sequence having a second number of outputs arranged according to an output order, each output in the output sequence being selected from the inputs in the input sequence, comprising, for each position in the output order: generating a softmax output for the position using the encoder hidden states that is a pointer into the input sequence; and selecting an input from the input sequence as the output at the position using the softmax output.
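
The selection step described in the abstract (a softmax over encoder hidden states that acts as a pointer into the input sequence) can be sketched roughly as below. This is only an illustration: the dot-product scoring, the function names, and the hidden-state values are assumptions, since the abstract does not say how the scores are computed.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def point_to_input(decoder_state, encoder_states, inputs):
    """Pick one element of `inputs` using a softmax over encoder states."""
    # Assumed scoring: dot product between decoder and encoder states.
    scores = [
        sum(d * h for d, h in zip(decoder_state, state))
        for state in encoder_states
    ]
    probs = softmax(scores)
    # The softmax output is the pointer; select the most likely position.
    best = max(range(len(inputs)), key=probs.__getitem__)
    return inputs[best], probs


inputs = ["a", "b", "c"]
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical values
decoder_state = [0.2, 2.0]                              # hypothetical values
chosen, probs = point_to_input(decoder_state, encoder_states, inputs)
print(chosen, [round(p, 3) for p in probs])
```

The key property is that the output vocabulary is the input sequence itself, so the softmax has one entry per input position rather than one per fixed vocabulary item.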

http://www.freepatentsonline.com/10402719.html

News from the UK is that the grave of some guy named Turing has been heard making noises since this came out.

What would happen if, by some stroke of luck, Google collapses and some company like Oracle buys its IP and then goes after any dude who installed PyTorch?

Why doesn’t Google come out with a systematic approach to secure these patents?

I am not too sure they are doing this *only* for defending against patent trolls anymore.

submitted by /u/metacurse

[R] Research Guide for Video Frame Interpolation with Deep Learning

Written on September 11, 2019. Posted in Reddit MachineLearning.

In this research guide, we’ll look at deep learning papers aimed at synthesizing video frames within an existing video. This could be in between video frames, known as interpolation, or after them, known as extrapolation.

“Research Guide for Video Frame Interpolation with Deep Learning” by Derrick Mwiti https://link.medium.com/DgU5ogTlVZ

submitted by /u/mwitiderrick

[D] training ASR checking assumptions

Written on September 11, 2019. Posted in Reddit MachineLearning.

Suppose you’re training an ASR system from scratch using audio books, where you have the plain text as well as the audio: one big mp3 file for the book (or maybe split into several per-chapter mp3 files), and several text files corresponding to the chapters.

The first stage is that you need labelled data, i.e. 1.wav with a transcript 1.txt for the first line of the book, then 2.wav with a transcript 2.txt for the second line, and so on. Once you have all those pairings, you can feed them pair by pair into your ASR system (e.g. Mozilla DeepSpeech). The algorithm won’t take bigger chunks, so a sentence seems to be the right way to go. You could try feeding it per word, but I don’t know if that would improve performance. I don’t even know how you would begin to detect and segment single words in an utterance; that seems to be a far harder problem than waiting for natural pauses, which are easier to detect by machine. Anyway, suppose it’s per line for simplicity, because you don’t really want a bazillion files of just one word uttered and transcribed.

But how exactly do you segment the bigger audio files into smaller line-sized pieces? You can try voice activity detection (VAD). I can stick a large mp3 in Audacity and have it do approximately per-line labelling via voice segment labelling; it’s easy to produce a bunch of 1-5 s long wav files that have been cut on silence. But now you don’t know exactly which foo.txt corresponds to each piece of audio (say foo.wav), since the narrator might blend two sentences together with too small a pause in some cases. If the narrator paused faithfully between sentences it would be easy. But you can’t assume VAD will give you neat 1:1 divisions of speech line to text sentence, so you can’t easily work out which sentence of text corresponds to which chunk of audio. Unfortunately, sentences will often spill over wav boundaries. In the best case you just have two whole sentences combined in one wav, which you need to split into two separate wavs; in the worst case a sentence is divided over two wav files. Then you need to edit the wavs by hand, moving a bit of one wav into the other, or just accept broken sentences.
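
As a rough illustration of cutting on silence, here is a minimal energy-threshold sketch. It is not what Audacity does internally; the function name, the thresholds, and the toy signal are all hypothetical. It also reproduces the problem described above: a pause shorter than the minimum silence length leaves two "sentences" merged into one segment.

```python
def split_on_silence(samples, sample_rate, frame_ms=20,
                     threshold=0.02, min_silence_ms=300):
    """Return (start, end) sample indices of segments separated by silence."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    min_silent_frames = max(1, min_silence_ms // frame_ms)
    segments = []
    start = None        # start index of the segment currently being built
    voiced_end = 0      # end index of the last voiced frame seen
    silent_run = 0      # consecutive silent frames inside a segment
    for offset in range(0, len(samples), frame_len):
        frame = samples[offset:offset + frame_len]
        rms = (sum(s * s for s in frame) / len(frame)) ** 0.5
        if rms >= threshold:
            if start is None:
                start = offset
            voiced_end = offset + len(frame)
            silent_run = 0
        elif start is not None:
            silent_run += 1
            # Close the segment only once the pause is long enough.
            if silent_run >= min_silent_frames:
                segments.append((start, voiced_end))
                start = None
                silent_run = 0
    if start is not None:
        segments.append((start, voiced_end))
    return segments


# Toy signal at 1000 Hz: speech, short pause, speech, long pause, speech.
toy = [0.5] * 200 + [0.0] * 100 + [0.5] * 200 + [0.0] * 400 + [0.5] * 200
segments = split_on_silence(toy, sample_rate=1000)
print(segments)
```

Lowering `min_silence_ms` splits more aggressively, at the cost of cutting sentences in the middle whenever the narrator hesitates.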

So your audio data is cut up into 1-5 s pieces by voice activity, unlabelled, and you need to label it. In this phase I think you’re supposed to use tools like forced alignment. But the problem is that we don’t really care about word alignment, which is what forced aligners do; we’re just after sentences, after all. And forced alignment needs line-by-line, per-wav transcripts to work in the first place, which we don’t have, and which is the problem we were originally faced with before anyone mentioned forced aligners. If you are working with broken lines, then labels don’t even correspond evenly to text sentences anymore, just to whatever words of the book are uttered in that given wav.

Apparently one solution to the labelling problem is to simply run ASR on those small wavs to generate rough transcripts. Then presumably you match them up approximately with the known text lines and align them that way? Sounds complicated, especially when we don’t have a decently performing ASR to use. So does that mean the ASR bootstrapping labelling is done mainly by hand? That’s where I’m stuck. Is there no way around bootstrapping by hand? For a low-resource language you’re just stuck with hand-labelling lots of data first, and without enough data the ASR isn’t going to be helpful since the error rate will be too high. You just have to build up a decent amount of hand-labelled data to train an okay ASR, which can then be used to bootstrap more efficient machine-driven training on new datasets. Are all my assumptions correct? Is this bootstrapping unavoidable?
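
The "match rough transcripts against the known text" step could look roughly like this, using fuzzy string matching from the Python standard library. It is a minimal sketch under strong assumptions: chunks arrive in book order, each chunk covers at most one sentence, and the look-ahead window and similarity threshold are arbitrary. The function name and the example sentences are hypothetical.

```python
import difflib


def match_transcripts(rough_transcripts, book_sentences,
                      min_ratio=0.6, window=3):
    """Greedy in-order matching of noisy ASR output to known sentences."""
    matches = []
    cursor = 0  # index of the first book sentence not yet matched
    for rough in rough_transcripts:
        best_index, best_ratio = None, 0.0
        # Only look a few sentences ahead, since chunks follow book order.
        for index in range(cursor, min(cursor + window, len(book_sentences))):
            ratio = difflib.SequenceMatcher(
                None, rough.lower(), book_sentences[index].lower()
            ).ratio()
            if ratio > best_ratio:
                best_index, best_ratio = index, ratio
        if best_index is not None and best_ratio >= min_ratio:
            matches.append((rough, book_sentences[best_index], best_ratio))
            cursor = best_index + 1
        else:
            # Too dissimilar: leave for manual labelling.
            matches.append((rough, None, best_ratio))
    return matches


book = [
    "The ship left the harbour at dawn.",
    "Nobody on the pier waved goodbye.",
    "By noon the coast had vanished.",
]
rough = [
    "the shipp left the harbor at don",
    "nobody on the peer waved good by",
    "zzzz qqqq",
]
matches = match_transcripts(rough, book)
for asr_text, sentence, ratio in matches:
    print(f"{ratio:.2f}  {asr_text!r} -> {sentence!r}")
```

Chunks that fall below the threshold are exactly the ones that would still need hand labelling, so the quality of the rough ASR directly determines how much manual work remains.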

I should probably mention there is an assumption that we’re training an end-to-end deep-net ASR system. I’m not sure how the classical systems worked, but they’re probably even harder to train, because you need a phoneme dictionary for the language and then have to train a model to discriminate and identify phones in a given speech fragment. That means you do need per-word alignment on the training data, a much harder problem than the per-sentence alignment needed by the end-to-end system.
