Author: torontoai

[D] Conversational AI

Written on November 1, 2019. Posted in Reddit MachineLearning.

I’m curious about the current state of conversational AI. To be more specific, by “conversation” I don’t mean something that takes orders to schedule appointments or buy tickets. I mean something like discussing movies, TV shows, or current events. I think I remember Amazon holding a contest for this kind of thing, but I haven’t seen anything like it implemented anywhere I can access. Does anyone have examples of what this kind of technology can do? Or better yet, anything I can play and experiment with on my own?

submitted by /u/Rioghasarig

[D] any principled reason for cross entropy instead of L2 in language modelling? (more details in post)

Written on November 1, 2019. Posted in Reddit MachineLearning.

Is there any principled reason for using softmax and cross entropy as the loss in, for example, transformers, rather than an L2 loss between the target embeddings and the output of the model?

When the output of your model is by necessity a dot product, as in shallow models, I understand why you need cross entropy loss. But for models such as RNNs and some variants of transformers, wouldn’t an L2 loss computed directly between the desired embedding and the output work as well or better?
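To make the two options in the question concrete, here is a minimal sketch in plain Python. The three-token vocabulary, the 2-d embedding values, and the function names are all made up for illustration, and the logits are assumed to be dot products between the model output and each token embedding.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    largest = max(logits)
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def cross_entropy_loss(output, embeddings, target_index):
    """Softmax over dot-product logits, then negative log-likelihood of the target."""
    logits = [sum(o * e for o, e in zip(output, emb)) for emb in embeddings]
    return -math.log(softmax(logits)[target_index])


def l2_loss(output, embeddings, target_index):
    """Squared Euclidean distance between the output and the target embedding."""
    target = embeddings[target_index]
    return sum((o - t) ** 2 for o, t in zip(output, target))


# Hypothetical toy vocabulary: three tokens with 2-d embeddings.
embeddings = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

# A model output that sits halfway between token 0 and token 1.
output = [0.5, 0.5]

print("cross entropy, target 0:", cross_entropy_loss(output, embeddings, 0))
print("L2, target 0:", l2_loss(output, embeddings, 0))

# Expected L2 loss when the true next token is 0 or 1 with equal probability.
def expected_l2(candidate):
    return 0.5 * l2_loss(candidate, embeddings, 0) + 0.5 * l2_loss(candidate, embeddings, 1)

print("expected L2 at the mean embedding:", expected_l2([0.5, 0.5]))
print("expected L2 at token 0's embedding:", expected_l2(embeddings[0]))
```

One difference this toy case shows: when the next token is ambiguous, the expected L2 loss is lowest at the mean of the candidate embeddings, a point that need not correspond to any token, whereas softmax with cross entropy scores a full distribution over the vocabulary.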

submitted by /u/mesmer_adama

[R] MGBPv2: Scaling Up Multi-Grid Back-Projection Networks (Winner of AIM ICCV19 Extreme-SR, Perceptual track)

Written on November 1, 2019. Posted in Reddit MachineLearning.

(16x upscaling example from paper)

Authors: Pablo Navarrete Michelini, Wenbin Chen, Hanwen Liu, Dan Zhu

Abstract: Here, we describe our solution for the AIM-2019 Extreme Super-Resolution Challenge, where we won the 1st place in terms of perceptual quality (MOS) similar to the ground truth and achieved the 5th place in terms of high-fidelity (PSNR). To tackle this challenge, we introduce the second generation of MultiGrid BackProjection networks (MGBPv2) whose major modifications make the system scalable and more general than its predecessor. It combines the scalability of the multigrid algorithm and the performance of iterative backprojections. In its original form, MGBP is limited to a small number of parameters due to a strongly recursive structure. In MGBPv2, we make full use of the multigrid recursion from the beginning of the network; we allow different parameters in every module of the network; we simplify the main modules; and finally, we allow adjustments of the number of network features based on the scale of operation. For inference tasks, we introduce an overlapping patch approach to further allow processing of very large images (e.g. 8K). Our training strategies make use of a multiscale loss, combining distortion and/or perception losses on the output as well as downscaled output images. The final system can balance between high quality and high performance.

PDF Link | Landing Page | Github
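The abstract mentions an overlapping patch approach for very large images. The sketch below is not the authors' implementation; it only illustrates the general idea of tiling an image with overlapping patches. The function names, the patch size, and the overlap are hypothetical.

```python
def patch_starts(length, patch, overlap):
    """Start offsets of patches of size `patch` that together cover [0, length)."""
    if overlap < 0 or patch <= overlap:
        raise ValueError("need 0 <= overlap < patch")
    if length <= patch:
        return [0]  # a single patch already covers the whole axis
    stride = patch - overlap
    starts = list(range(0, length - patch, stride))
    starts.append(length - patch)  # last patch sits flush with the border
    return starts


def patch_grid(height, width, patch, overlap):
    """Top-left corners (row, column) of square patches covering the image."""
    return [
        (top, left)
        for top in patch_starts(height, patch, overlap)
        for left in patch_starts(width, patch, overlap)
    ]


# Example: a 10 x 9 image tiled with 4 x 4 patches that overlap by 1 pixel.
for corner in patch_grid(10, 9, patch=4, overlap=1):
    print(corner)
```

Each patch would be processed separately and the outputs blended in the overlapping regions; how the blending is done is left open here.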

submitted by /u/pnavarre

[P] Person remover: image-to-image project

Written on November 1, 2019. Posted in Reddit MachineLearning.

Hi, during the summer I worked on a project with the objective of removing people or objects from photos. Person-remover uses a pretrained YOLO to detect them and then feeds the resulting bounding boxes to the generator of a pix2pix model, which I trained from scratch on the Paris dataset. Even though the generator wasn’t trained for the purpose of filling person-shaped objects, the results are pretty great and seem to generalize well to unseen photos or video.
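The step between detection and generation can be sketched roughly as follows. This is an assumption about how such a pipeline might blank out detected regions before inpainting, not code from the repo; the function name, the box format, and the fill value are hypothetical.

```python
def mask_boxes(image, boxes, fill=0):
    """Return a copy of `image` with every box region overwritten by `fill`.

    `image` is a list of rows of pixel values.
    `boxes` holds (x_min, y_min, x_max, y_max) tuples with exclusive maxima;
    boxes are clipped to the image borders.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    masked = [row[:] for row in image]  # leave the input untouched
    for x_min, y_min, x_max, y_max in boxes:
        for y in range(max(0, y_min), min(height, y_max)):
            for x in range(max(0, x_min), min(width, x_max)):
                masked[y][x] = fill
    return masked


# Toy 4 x 4 "image" of ones and one detection covering the central 2 x 2 area.
image = [[1] * 4 for _ in range(4)]
masked = mask_boxes(image, [(1, 1, 3, 3)])
for row in masked:
    print(row)
```

The masked image, with the holes where the detections were, is what an inpainting generator would then be asked to fill.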

Any ideas on how to improve the results even further?

Repo: https://github.com/javirk/Person_remover

submitted by /u/javirk

[D] What is SOTA in Multi-Task Learning

Written on November 1, 2019. Posted in Reddit MachineLearning.

Is it domain specific? Does Ruder’s blog/paper still hold up: https://ruder.io/multi-task/?

submitted by /u/searchingundergrad

[P] NER Tagger based on BERT + CRF (for Korean)

Written on November 1, 2019. Posted in Reddit MachineLearning.

Hi all,

I did a toy project: a Korean NER tagger (in progress). If you are interested in Korean Named Entity Recognition, try it. (This NER tagger is implemented in PyTorch.)

If you want to apply it to other languages, you don’t have to change the model architecture; you just change the vocab, the pretrained BERT (from huggingface), and the training dataset.

https://github.com/eagle705/pytorch-bert-crf-ner

submitted by /u/eagle705

[D] Why re-sampling imbalanced data isn’t always the best idea

Written on November 1, 2019. Posted in Reddit MachineLearning.

I often work with people (in medical studies) who have a huge “knowledge” of statistical methods but none of the required basics, or little understanding of what goes on inside some algorithms. That’s perfectly fine, because after all that’s not their job but mine.

But over time, I’ve come across a few problems where (due to not finding the “needed significance”) some really basic over-sampling was applied. I’ve thrown together a really simple example that anyone should be able to follow (without any deep statistical knowledge) to showcase what could happen. Maybe it helps you, or you can use it to your advantage:

https://stroemer.cc/resample-imbalanced-data/
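As a small illustration of one way naive over-sampling can manufacture “significance”, here is a sketch using only the standard library. It is not the example from the linked post; the data values and the function name are made up. Duplicating the minority group leaves its mean unchanged but shrinks the estimated standard error, because the test treats the copies as independent observations.

```python
import statistics


def welch_t(a, b):
    """Welch's t statistic for two samples, using sample variances."""
    standard_error = (
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    ) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Hypothetical measurements: a small minority group and a larger majority group.
minority = [5.1, 6.0, 4.2, 5.5]
majority = [4.0, 4.5, 5.0, 3.8, 4.9, 4.4, 5.2, 4.1, 4.7, 4.3, 4.6, 4.8]

# Naive over-sampling: repeat every minority observation five times.
oversampled = minority * 5

t_original = welch_t(minority, majority)
t_oversampled = welch_t(oversampled, majority)

print("t statistic, original data:    ", round(t_original, 2))
print("t statistic, over-sampled data:", round(t_oversampled, 2))
```

No new information was collected, yet the statistic grows, so a threshold that was missed on the real data can be crossed on the duplicated data.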

submitted by /u/kchnkrml

[P] Mask_RCNN for blurring advertisements on streets.

Written on November 1, 2019. Posted in Reddit MachineLearning.

https://github.com/WannaFIy/mask_AD

https://preview.redd.it/6dv43jvyy8w31.jpg?width=2048&format=pjpg&auto=webp&s=b9feedfeb74912aac43d53eb6fe03cd82cd7b08e

submitted by /u/wannafIy

MOBILEBERT: TASK-AGNOSTIC COMPRESSION OF BERT BY PROGRESSIVE KNOWLEDGE TRANSFER

Written on November 1, 2019. Posted in Reddit MachineLearning.

submitted by /u/I_ai_AI

[D] Momentum methods help to escape local minima, so what? It was never our objective.

Written on October 31, 2019. Posted in Reddit MachineLearning.

Something that seems to be under-discussed in machine learning is why we bother with momentum methods in the first place.

Suppose we are training a classifier and the loss function has two local minima, one of which is global. Suppose, by sheer bad luck, gradient descent gets stuck in the worse local minimum. If you ask around as to what can be done, you will hear answers like “oh just use the momentum method, it gets you out of the local minimum”.

First, there is no guarantee you will get out of the local minimum (you only have a chance if the difference between the current and previous iterate is large enough), and more importantly,

Second, great, you have found the global minimum and... you have just potentially overfitted your classifier.

In other words, we are looking for local minima (or even just some point associated with the loss function) with good generalization properties, and I don’t think momentum methods guarantee that.

Has there been any research on the generalization properties of the minima that you find, and on which algorithms get you the best minima, not in terms of how small the loss is, but in terms of how well they generalize?
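For anyone who wants to play with the first point, here is a minimal sketch of plain gradient descent versus heavy-ball momentum on a 1-d loss with two minima. The loss function, the start point, the learning rate, and the momentum coefficients are arbitrary choices made for illustration, and it says nothing about generalization.

```python
import math


def loss(x):
    """Toy double-well loss: a shallower minimum near x = 1, a deeper one near x = -1."""
    return x**4 - 2 * x**2 + 0.3 * x


def grad(x):
    """Derivative of the toy loss."""
    return 4 * x**3 - 4 * x + 0.3


def descend(x, lr=0.01, beta=0.0, steps=5000):
    """Heavy-ball momentum; beta = 0 reduces to plain gradient descent."""
    velocity = 0.0
    for _ in range(steps):
        velocity = beta * velocity - lr * grad(x)
        x += velocity
    return x


start = 2.0  # starts on the side of the shallower minimum

plain = descend(start)
print("plain GD ends at x =", round(plain, 3), "loss =", round(loss(plain), 3))

# Which basin each momentum run ends in depends on beta, lr and the start point.
results = {beta: descend(start, beta=beta) for beta in (0.5, 0.9, 0.99)}
for beta, x in results.items():
    print("beta =", beta, "ends at x =", round(x, 3), "loss =", round(loss(x), 3))
```

Plain gradient descent with this step size settles in the shallower basin it started in. Changing `beta`, `lr`, or `start` and rerunning shows how sensitive the final basin is to those choices.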

submitted by /u/fromnighttilldawn
