

Category: Reddit MachineLearning

[D] Trust t-SNE without PCA verification?

Hi all,
For my dataset, t-SNE produces beautiful clusters with some transitions between them, and the plot overall is just very exciting. PCA, on the other hand, produces very boring results.

Now, I'm aware that t-SNE tries much harder to form clusters than PCA does, so I'm not sure what to make of it.

Can I somehow verify that I'm not seeing artefacts produced by the way t-SNE works?

I can’t share the data, but here are some crudely drawn examples <3 https://imgur.com/a/7gQPrMA

Thanks!
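One hedged way to probe this is to run two checks. First, rerun t-SNE across several perplexities and random seeds, and score each embedding with scikit-learn's `trustworthiness`, which measures how well local neighbourhoods are preserved. Second, check whether the clusters also exist in the original high-dimensional space by comparing k-means labels from both spaces with the adjusted Rand index (ARI).

The helper names, the parameter grids, and the use of k-means are illustrative assumptions, not an established recipe. Treat the code as a minimal sketch:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, trustworthiness
from sklearn.metrics import adjusted_rand_score


def cluster_agreement(X, embedding, n_clusters, seed=0):
    """ARI between k-means labels in the original space and in the 2-D embedding (hypothetical helper)."""
    labels_orig = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    labels_emb = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(embedding)
    return adjusted_rand_score(labels_orig, labels_emb)


def check_embeddings(X, n_clusters, perplexities=(5, 30, 50), seeds=(0, 1, 2), n_neighbors=10):
    """Score a PCA baseline and several t-SNE runs (hypothetical helper).

    Returns {name: (trustworthiness, ARI)}; values close to 1 mean local
    neighbourhoods are kept and the embedding's clusters match clusters
    that k-means finds in the original space.
    """
    results = {}
    pca_emb = PCA(n_components=2).fit_transform(X)
    results["pca"] = (trustworthiness(X, pca_emb, n_neighbors=n_neighbors),
                      cluster_agreement(X, pca_emb, n_clusters))
    for perplexity in perplexities:
        for seed in seeds:
            emb = TSNE(n_components=2, perplexity=perplexity, init="pca",
                       random_state=seed).fit_transform(X)
            results[f"tsne_perp{perplexity}_seed{seed}"] = (
                trustworthiness(X, emb, n_neighbors=n_neighbors),
                cluster_agreement(X, emb, n_clusters),
            )
    return results
```

To use it, call `check_embeddings(X, n_clusters=k)` with `k` set to the number of clusters you see in the plot. Be suspicious of any of these outcomes:

- clusters that only appear at one perplexity;
- clusters that move apart or merge when the seed changes;
- clusters that k-means cannot recover in the original space (low ARI).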

submitted by /u/Seiteshyru
[link] [comments]

[DISCUSSION] Bert Token Embeddings

From the paper, it is easy to understand that BERT's input is composed of token embeddings, positional encodings, and sentence encodings. The last two are well defined in the BERT paper and in "Attention Is All You Need". But it is not clear how the token embeddings are built, and reading online I found different opinions. Tokenization is certainly performed using WordPiece tokens, and it's easy to understand how it splits words. But once you have the token ID, how does BERT convert it into an embedding?
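For what it's worth, in BERT the token embedding is a row lookup into a learned matrix with one row per WordPiece vocabulary entry. Position and segment embeddings are also learned tables; the position table is not the fixed sinusoids of "Attention Is All You Need". The three vectors are summed and then layer-normalized.

Below is a minimal NumPy sketch of that idea, built on these assumptions:

- toy sizes throughout;
- random stand-in tables instead of pre-trained weights;
- no learned LayerNorm scale/shift and no dropout.

It also shows that the lookup is equivalent to multiplying a one-hot vector by the table:

```python
import numpy as np

# Toy sizes for illustration; BERT-base (uncased) uses 30522 tokens, hidden size 768,
# 512 positions and 2 segments.
VOCAB_SIZE, HIDDEN, MAX_POS, N_SEGMENTS = 100, 16, 32, 2

rng = np.random.default_rng(0)
# In BERT these tables are learned parameters; here they are random stand-ins.
token_table = rng.normal(0.0, 0.02, size=(VOCAB_SIZE, HIDDEN))
position_table = rng.normal(0.0, 0.02, size=(MAX_POS, HIDDEN))
segment_table = rng.normal(0.0, 0.02, size=(N_SEGMENTS, HIDDEN))


def lookup_via_one_hot(token_ids):
    """Equivalent (but wasteful) view of the lookup: one-hot rows times the table."""
    one_hot = np.eye(VOCAB_SIZE)[np.asarray(token_ids)]
    return one_hot @ token_table


def layer_norm(x, eps=1e-12):
    """LayerNorm without the learned scale/shift, for brevity."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def embed(token_ids, segment_ids):
    """Token id -> row of token_table, plus position and segment rows, then LayerNorm."""
    token_ids = np.asarray(token_ids)
    positions = np.arange(len(token_ids))
    summed = (token_table[token_ids]
              + position_table[positions]
              + segment_table[np.asarray(segment_ids)])
    return layer_norm(summed)
```

Because the same token at a different position or in a different segment gets a different sum, `embed([5, 17, 5], [0, 0, 1])` returns two distinct vectors for token 5.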

submitted by /u/lor_v
[link] [comments]

[R] Weakly Supervised Disentanglement with Guarantees

We build a theoretical framework for analyzing disentanglement in the weakly supervised regime. We provide new definitions for disentanglement (sorry) that can be measured in a weakly supervised manner, and use these definitions as the cornerstone for developing a calculus and theory of disentanglement. We then analyze several weak supervision techniques and prove (and empirically demonstrate) their disentanglement guarantees (or lack thereof).

We hope that the concepts developed in this paper will help researchers frame their discussion and analysis of weakly supervised disentanglement in future work.

Paper: https://arxiv.org/abs/1910.09772

Cute gif: https://twitter.com/i/status/1187507675258486784

submitted by /u/approximately_wrong
[link] [comments]

[P] 10K Downloads Special 🎉: gpt2-client accepting all feature requests!

Hey everyone 👋🏻👋🏻!

First off, I want to thank all of you for your amazing support. gpt2-client just reached 10K+ downloads!!

Since this is my first open-source project, it's touching to see the positive experiences you share with me via email and DM. I've noticed a trend where many of you are using it for your NLP research and some of you for your side projects. No matter what you do, I'd love to know how I can improve it, whether in terms of functionality, extensibility, modularity, or efficiency. You name it.

We did it y’all! 10K in the bag 😀

The Good Stuff: You’re in control now

As a way of giving back, I'd love to hear what you'd want to see in gpt2-client. It can be any bombastic feature request!!! We could discuss this on any platform (or you can open a feature request here https://github.com/rish-16/gpt2client/issues/new/choose). This could touch on any aspect of gpt2-client that you feel should belong inside the module.

————————————

If you still aren’t sure what gpt2-client is, I urge you to check out https://github.com/rish-16/gpt2client/ and if you like what you’re seeing, do drop a ⭐. It means a lot to me and motivates me to continue building open-source technology.

Express your creativity down below in the comments!!! Grateful for your continuing support 🤘🏻

Cheers!

submitted by /u/rish-16
[link] [comments]

[R] Distributed self-supervising capsule network

One month ago I posted an introduction to what I was working on, and someone suggested I should "create *something*, anything that people can look at to try and understand". So I have now written an article. This is the link.

This article proposes a self-supervising machine learning architecture that is actually a two-step model. The first step constructs a causal representation model, and the second step promotes intentions for it to complete tasks. The output cannot be supervised directly, because it consists of action-control signals that sparsely encode information. In such situations, end-to-end supervised learning no longer applies.

Here is the earlier post from a month ago.

submitted by /u/tobby_liu
[link] [comments]

[R] Attenchilada: Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis

tl;dr: Using location-relative attention mechanisms allows Tacotron-based TTS systems to generalize to very long utterances.

Abstract:
Despite the ability to produce human-level speech for in-domain text, attention-based end-to-end text-to-speech (TTS) systems suffer from text alignment failures that increase in frequency for out-of-domain text. We show that these failures can be addressed using simple location-relative attention mechanisms that do away with content-based query/key comparisons. We compare two families of attention mechanisms: location-relative GMM-based mechanisms and additive energy-based mechanisms. We suggest simple modifications to GMM-based attention that allow it to align quickly and consistently during training, and introduce a new location-relative attention mechanism to the additive energy-based family, called Dynamic Convolution Attention (DCA). We compare the various mechanisms in terms of alignment speed and consistency during training, naturalness, and ability to generalize to long utterances, and conclude that GMM attention and DCA can generalize to very long utterances, while preserving naturalness for shorter, in-domain utterances.
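To illustrate what "location-relative" means here, the sketch below implements one decoder step of a generic Graves-style GMM attention in NumPy. It is a simplified illustration under assumptions, not the paper's exact GMM variant and not DCA.

The key properties are these:

- The mixture means advance by non-negative softplus steps, so the alignment can only move forward.
- The weights over encoder positions are computed from positions alone, with no query/key comparison against the encoder content.

The raw parameters would normally be predicted from the decoder state; here they are passed in directly.

```python
import numpy as np


def softplus(x):
    return np.logaddexp(0.0, x)  # numerically stable log(1 + exp(x))


def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()


def gmm_attention_step(prev_means, raw_params, memory):
    """One step of a Graves-style GMM attention (illustrative sketch).

    prev_means: (K,) mixture means from the previous step, in encoder-timestep units.
    raw_params: (3, K) unnormalized [weights, step sizes, scales]; in a real model
                these would be predicted from the decoder state (assumption).
    memory:     (T, D) encoder outputs. Only its length affects the alignment:
                there is no content-based query/key comparison.
    """
    raw_w, raw_delta, raw_sigma = raw_params
    weights = softmax(raw_w)
    means = prev_means + softplus(raw_delta)  # non-negative step: attention only moves forward
    sigmas = softplus(raw_sigma) + 1e-3       # keep scales strictly positive
    positions = np.arange(memory.shape[0])[:, None]  # (T, 1)
    densities = weights * np.exp(-0.5 * ((positions - means) / sigmas) ** 2) / (
        np.sqrt(2.0 * np.pi) * sigmas)
    alignment = densities.sum(axis=1)  # (T,) location-based attention weights
    context = alignment @ memory       # (D,) weighted sum of encoder outputs
    return alignment, context, means
```

Because each step can only push the means forward, the mechanism has no way to jump back or skip far ahead based on what the encoder outputs look like. This is one intuition for why such mechanisms avoid alignment failures on long, out-of-domain inputs.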

Paper: https://arxiv.org/abs/1910.10288

Audio Examples: https://google.github.io/tacotron/publications/location_relative_attention

submitted by /u/animus144
[link] [comments]

[R] AI Benchmark: All About Deep Learning on Smartphones in 2019

[arXiv Abstract]: The performance of mobile AI accelerators has been evolving rapidly in the past two years, nearly doubling with each new generation of SoCs. The current 4th generation of mobile NPUs is already approaching the results of CUDA-compatible Nvidia graphics cards presented not long ago, which together with the increased capabilities of mobile deep learning frameworks makes it possible to run complex and deep AI models on mobile devices. In this paper, we evaluate the performance and compare the results of all chipsets from Qualcomm, HiSilicon, Samsung, MediaTek and Unisoc that are providing hardware acceleration for AI inference. We also discuss the recent changes in the Android ML pipeline and provide an overview of the deployment of deep learning models on mobile devices. All numerical results provided in this paper can be found and are regularly updated on the official project website: http://ai-benchmark.com

Figure: Performance evolution of mobile AI accelerators: image throughput for the float Inception-V3 model.

The paper discusses the following topics:

  1. Four generations of mobile NPUs
  2. Hardware acceleration resources for AI inference on each Android mobile SoC platform
  3. Android ecosystem for running deep learning models
  4. Quantized and floating-point performance of all generations of mobile NPUs
  5. Performance comparison of FP inference on mobile NPUs vs. Intel CPUs vs. Nvidia GPUs

The full paper is available on arXiv: https://arxiv.org/pdf/1910.06663.pdf

submitted by /u/aiff22
[link] [comments]

[D] Do CNNs understand semantic relationships between their classes?

Hi all,

How do CNNs understand “compositional semantic relationships” between their classes? The problem exists in the entire field, but I’m referencing this paper in particular: http://gandissect.csail.mit.edu/

In the 3rd paragraph of the introduction of the paper, Bau et al. say (emphasis mine):

To a human observer, a well-trained GAN appears to have learned facts about the objects in the image: for example, a door can appear on a building but not on a tree. We wish to understand how a GAN represents such structure. Do the objects emerge as pure pixel patterns without any explicit representation of objects such as doors and trees, or does the GAN contain internal variables that correspond to the objects that humans perceive?

From my (limited) understanding of CNNs, they take in the input image (HxWx3 channels) and pass it through a series of convolutional filters and maxpool layers. Each maxpool layer reduces the HxW of the feature map, and each filter layer increases its depth (number of channels) as we move away from the image and toward the "highest levels of representation." In a sense, we're abstracting toward higher- and higher-level features as the receptive field of our neurons grows and we're able to model longer-distance relationships.

Finally, the last layer of the CNN is connected to the class output by a fully connected layer.
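The shape bookkeeping described above can be made concrete. This sketch assumes a small hypothetical VGG-style stack; the layer sizes are made up for illustration.

It tracks two quantities through the stack, using the standard formulas:

- the HxWxC shape, via the output-size formula floor((H + 2p - k) / s) + 1;
- the receptive field of one output unit, via the recurrence r <- r + (k - 1) * j and j <- j * s, where j is the spacing between adjacent outputs measured in input pixels.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Layer:
    name: str
    kernel: int
    stride: int
    padding: int
    out_channels: Optional[int] = None  # None keeps the channel count (e.g. pooling)


def trace_shapes(layers, height, width, channels):
    """Return (name, H, W, C, receptive_field) after each layer."""
    rf, jump = 1, 1  # receptive field in input pixels; input-pixel distance between adjacent outputs
    rows = []
    for layer in layers:
        height = (height + 2 * layer.padding - layer.kernel) // layer.stride + 1
        width = (width + 2 * layer.padding - layer.kernel) // layer.stride + 1
        if layer.out_channels is not None:
            channels = layer.out_channels
        rf += (layer.kernel - 1) * jump
        jump *= layer.stride
        rows.append((layer.name, height, width, channels, rf))
    return rows


# Hypothetical stack: conv(3x3, 64) -> maxpool(2x2) -> conv(3x3, 128) -> maxpool(2x2)
STACK = [
    Layer("conv1", 3, 1, 1, 64),
    Layer("pool1", 2, 2, 0),
    Layer("conv2", 3, 1, 1, 128),
    Layer("pool2", 2, 2, 0),
]
```

On a 32x32x3 input, the spatial size halves at each pool (32, 16, 8) while the channels grow (64, then 128). Meanwhile, the receptive field of a single unit grows from 3 to 10 pixels.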

From what I’ve read, it seems like the field is a bit split on this. You have this paper saying

oh look! GANs (and thus CNNs) can understand relationships between classes because doors don’t appear in the sky

But others say it's an explicit shortcoming of the convolution operation: the spatial equivariance of convolution means it inherently cannot understand these relationships.

A CNN only looks for 2 eyes, 1 nose, and 1 mouth. It doesn’t care that the eyes are parallel and above the nose, or that the nose is above the mouth!
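The property the second camp points to can be checked numerically. The NumPy sketch below demonstrates two things:

- Shifting the input shifts a convolution's output by the same amount (equivariance).
- Global max pooling then discards where the feature fired (invariance).

Circular padding is an assumption chosen so the identity holds exactly at the borders; real layers usually zero-pad, where it holds only away from the edges.

This illustrates a single layer only. It does not settle whether a deep stack with growing receptive fields can encode relative positions.

```python
import numpy as np


def circular_conv2d(image, kernel):
    """2-D cross-correlation (what DL 'conv' layers compute) with wrap-around borders."""
    out = np.zeros(image.shape, dtype=float)
    kh, kw = kernel.shape
    for di in range(kh):
        for dj in range(kw):
            # Rolling by (-di, -dj) aligns image[i + di, j + dj] with output position (i, j).
            out += kernel[di, dj] * np.roll(image, shift=(-di, -dj), axis=(0, 1))
    return out


def global_max_pool(feature_map):
    """Keep only the strongest response, discarding its location."""
    return feature_map.max()


rng = np.random.default_rng(0)
img = rng.normal(size=(8, 8))
kernel = rng.normal(size=(3, 3))
shift = (2, 3)

shift_then_conv = circular_conv2d(np.roll(img, shift, axis=(0, 1)), kernel)
conv_then_shift = np.roll(circular_conv2d(img, kernel), shift, axis=(0, 1))
```

Here `shift_then_conv` and `conv_then_shift` agree elementwise, and their global max equals that of the unshifted feature map.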

————

My take is that a CNN can broadly capture correlations between classes because of the final fully connected layer. For example, it can learn that standing is negatively correlated with beer, or that pens are correlated with paper. But it can't understand spatial relationships.

What do you guys think of this issue?

submitted by /u/sabot00
[link] [comments]

[D] A Unifying Framework of Bilinear LSTMs

Disclaimer: this is a paper I've been working on; if this sort of thing is not allowed on /r/ml, please let me know.

arXiv page: https://arxiv.org/abs/1910.10294

Abstract: This paper presents a novel unifying framework of bilinear LSTMs that can represent and utilize the nonlinear interaction of the input features present in sequence datasets for achieving superior performance over a linear LSTM and yet not incur more parameters to be learned. To realize this, our unifying framework allows the expressivity of the linear vs. bilinear terms to be balanced by correspondingly trading off between the hidden state vector size vs. approximation quality of the weight matrix in the bilinear term so as to optimize the performance of our bilinear LSTM, while not incurring more parameters to be learned. We empirically evaluate the performance of our bilinear LSTM in several language-based sequence learning tasks to demonstrate its general applicability.

Comments: This approach is novel because it uses bilinear neurons (essentially polynomial regression + nonlinearity) as a building block. This is rarely done in neural networks, since it is generally accepted that a linear neuron + nonlinearity is sufficient as a universal approximator. However, we find that bilinear neurons can improve performance without incurring additional learnable parameters. It should be noted that the original proof of the universal approximability of linear neurons (Cybenko, 1989) does not show that they are efficient.
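To make the "bilinear neuron" idea and the parameter trade-off concrete, here is a hypothetical NumPy sketch, which is not the paper's formulation. A single unit adds a bilinear interaction a^T W b between two input vectors (for example, an LSTM input x and hidden state h) to the usual linear terms.

The matrix W is factored as U V^T with rank r. The interaction then costs r * (d_a + d_b) parameters instead of d_a * d_b, so a lower rank (a worse approximation of W) frees parameters for a larger hidden state. The unit names, the tanh nonlinearity, and the low-rank factorization are all illustrative assumptions.

```python
import numpy as np


def bilinear_unit(a, b, U, V, w_a, w_b, bias):
    """Hypothetical bilinear neuron: tanh(w_a.a + w_b.b + a^T (U V^T) b + bias).

    U has shape (len(a), r) and V has shape (len(b), r). The bilinear term is
    computed as (U^T a).(V^T b), so the full len(a) x len(b) matrix is never formed.
    """
    bilinear = (U.T @ a) @ (V.T @ b)
    return np.tanh(w_a @ a + w_b @ b + bilinear + bias)


def param_count(d_a, d_b, rank=None):
    """Learnable parameters of one unit: linear weights + bias, plus the bilinear term.

    rank=None means a full d_a x d_b bilinear matrix; otherwise a rank-r factorization.
    """
    linear = d_a + d_b + 1
    if rank is None:
        return linear + d_a * d_b
    return linear + rank * (d_a + d_b)
```

For 10-dimensional inputs, a full bilinear matrix gives `param_count(10, 10) == 121` parameters per unit, whereas rank 2 gives 61.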

submitted by /u/ml_mohit
[link] [comments]