Category: Reddit MachineLearning

[Discussion] Is there any Deepfake Database?

Hi all. I was wondering whether any deepfake databases are available. I searched the web but couldn’t find anything except https://www.idiap.ch/dataset/deepfaketimit

I did stumble upon news that Facebook and others are investing in building a database, so my guess is that there isn’t one, but I thought I would ask just in case.

I am interested because I want to start research on deepfake recognition, but before manually (or semi-automatically) scraping data I wanted to check whether I am missing anything. Thank you!

submitted by /u/Alir_the_Neon

[D] What is the inductive bias in transformer architectures?

I’ve been thinking a lot about inductive biases recently: basically, equipping a model with a set of assumptions so that it prefers certain solutions. This can happen in different ways, for example through the model architecture, the loss, or regularization.

In NLP, RNNs are (still) very popular because their recurrence gives them an inductive bias that makes them temporally invariant, but recent work (like this) seems to suggest that they thereby also suffer from a recency bias, which might limit their application to language.

Now that transformers dominate the leaderboards in many NLP tasks, I was wondering which kind of inductive bias they might carry given their architecture?

submitted by /u/Kaleidophon

[D] Current issues with transfer learning in NLP

We have all been impressed with good results achieved by BERT and its brothers. But these models are not perfect and their results come at a cost. I have summarized a lot of my readings on current issues with these huge pre-trained models in this blog post: https://mohammadkhalifa.github.io/2019/09/06/Issues-With-Transfer-Learning-in-NLP/.

I have mainly discussed the following 6 issues:

  • Computational intensity
  • Difficult reproducibility
  • Leaderboard madness
  • Dissimilarity to how humans learn a language
  • Shallow language understanding
  • High carbon footprint

Any feedback on the content or the writing would be really appreciated.

submitted by /u/moyle

[P] Learning Digital Circuits: A Journey Through Weight Invariant Self-Pruning Neural Networks

TLDR:

  • Learns weight-agnostic topologies that mimic digital circuits using gradient descent.
  • Sparse nets with weights restricted to 0 or 1 can be learned by introducing Bernoulli initialization to the BinaryConnect framework.
  • In a binarized net, neuron + activation act like OR gates. A network composed purely of OR gates fails to solve complex problems. BatchNorm acts like a NOT gate, allowing the network to learn more complicated functions by forming NOR gates.
  • Learning topologies doesn’t necessarily need signed weights (refer to SuperMasks: https://arxiv.org/abs/1905.01067).
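To make the OR/NOR claim concrete, here is a minimal sketch, not the code from the paper. It assumes binary inputs, weights in {0, 1}, and a step activation that fires when the pre-activation is positive. BatchNorm is idealized as a fixed affine map with a negative scale (`-x + 0.5`), which is an assumption made purely for illustration; the names `or_neuron` and `nor_neuron` are hypothetical.

```python
from itertools import product

def step(x):
    """Binary activation: outputs 1 when the pre-activation is positive."""
    return 1 if x > 0 else 0

def or_neuron(inputs, weights):
    """Neuron with weights in {0, 1}: fires if any selected input is 1 (OR)."""
    return step(sum(w * x for w, x in zip(weights, inputs)))

def nor_neuron(inputs, weights):
    """Same neuron with an idealized BatchNorm in front of the activation.

    Assumption: BatchNorm is modelled as y = -x + 0.5 (negative scale,
    small positive shift), which inverts the OR output and gives NOR.
    """
    pre = sum(w * x for w, x in zip(weights, inputs))
    return step(-pre + 0.5)

# Truth table for two inputs, both connected (weight 1).
for a, b in product([0, 1], repeat=2):
    print(a, b, "OR:", or_neuron((a, b), (1, 1)), "NOR:", nor_neuron((a, b), (1, 1)))
```

Setting a weight to 0 simply disconnects that input, which is how sparsity shows up in this toy picture.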

Paper: https://arxiv.org/abs/1909.00052

Code: https://github.com/AgrawalAmey/learning-digital-net

Colab: https://colab.research.google.com/github/AgrawalAmey/learning-digital-net/

![](https://pbs.twimg.com/media/EDuoUq7UEAA3QPI?format=jpg&name=large)

submitted by /u/agrawalamey

[D] Approaches to array feature

Do you know of any approaches for handling array/bag features?

To be specific, let’s say we have a classification problem and want to sort things as good or bad. For example:

  • Feature1 – regular feature, limited set of values: A, B, C
  • Feature2 – array/bag feature, a container consisting of an unknown number of values

Feature1    Feature2           Good?
A           [A, B]             Yes
B           [B, A, A]          No
A           [B, C, B, C, A]    Yes

How does one encode Feature2 as a numeric vector?
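One common option, sketched here under assumptions, is a count (bag-of-words style) vector: one slot per possible value, holding how often that value occurs in the bag. The sketch assumes Feature2 draws from the same closed set A, B, C as Feature1, which the example table suggests but does not state; `encode_bag` and `one_hot` are hypothetical helper names.

```python
from collections import Counter

# Assumption: Feature2 values come from the same closed set as Feature1.
VOCAB = ["A", "B", "C"]

def encode_bag(bag, vocab=VOCAB, normalize=False):
    """Turn a variable-length bag into a fixed-length count vector.

    Values outside the vocabulary are ignored. With normalize=True the
    counts are divided by the bag length, so bags of different sizes
    become comparable.
    """
    counts = Counter(bag)
    vec = [float(counts[v]) for v in vocab]
    if normalize and bag:
        total = float(len(bag))
        vec = [c / total for c in vec]
    return vec

def one_hot(value, vocab=VOCAB):
    """Standard one-hot encoding for the regular categorical feature."""
    return [1.0 if value == v else 0.0 for v in vocab]

rows = [
    ("A", ["A", "B"]),
    ("B", ["B", "A", "A"]),
    ("A", ["B", "C", "B", "C", "A"]),
]
for feature1, feature2 in rows:
    # Final numeric row: one-hot of Feature1 followed by counts of Feature2.
    print(one_hot(feature1) + encode_bag(feature2))
```

A count vector discards the order of the elements. If order matters, the feature is really a sequence and needs a different encoding.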

submitted by /u/truri

[D] Compressing Neural Network

I’m curious whether this idea has been tried anywhere. I can’t seem to find results on arXiv, but I think I’m using the wrong search terms.

I’m wondering if a feasible way of reducing the number of layers of a deep network is to train another, much smaller network to approximate the behaviour of the large network between the input layer and the second-to-last layer.

As an example, say we have a network with 100 layers and we train it until it achieves the desired accuracy. After training, I create a smaller network with maybe 5-10 layers, whose input is my data and whose target output is the activations of the 99th layer of the first network. I hope this makes sense.
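The setup described above resembles what is usually called knowledge distillation, in its feature-matching ("hint") variant. The following is only a toy sketch of the training loop, assuming scalar layers, a randomly initialised "teacher" standing in for the trained 100-layer network, and a hand-written one-hidden-layer "student"; every name in it is hypothetical.

```python
import math
import random

DEPTH = 20  # toy stand-in for the 100-layer network

# "Teacher": a deep stack of scalar tanh layers with fixed random parameters.
# Assumption: it plays the role of the already-trained large network.
_rng = random.Random(1)
TEACHER = [(_rng.uniform(0.8, 1.2), _rng.uniform(-0.1, 0.1)) for _ in range(DEPTH)]

def teacher_penultimate(x):
    """Activation of the second-to-last layer (the final layer is skipped)."""
    h = x
    for w, b in TEACHER[:-1]:
        h = math.tanh(w * h + b)
    return h

class Student:
    """One hidden tanh layer plus a linear readout, trained with plain SGD."""

    def __init__(self, hidden=4, seed=0):
        rng = random.Random(seed)
        self.w1 = [rng.uniform(-1, 1) for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(w * x + b) for w, b in zip(self.w1, self.b1)]
        y = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        return y, h

    def sgd_step(self, x, target, lr):
        """One gradient step on 0.5 * (student(x) - target)^2."""
        y, h = self.forward(x)
        err = y - target
        for j, hj in enumerate(h):
            grad_pre = err * self.w2[j] * (1.0 - hj * hj)  # backprop through tanh
            self.w2[j] -= lr * err * hj
            self.w1[j] -= lr * grad_pre * x
            self.b1[j] -= lr * grad_pre
        self.b2 -= lr * err

def mse(student, xs, targets):
    return sum((student.forward(x)[0] - t) ** 2 for x, t in zip(xs, targets)) / len(xs)

# The regression targets are the teacher's penultimate activations, not labels.
xs = [i / 25.0 - 2.0 for i in range(101)]  # inputs in [-2, 2]
targets = [teacher_penultimate(x) for x in xs]

student = Student()
loss_before = mse(student, xs, targets)
for _ in range(200):
    for x, t in zip(xs, targets):
        student.sgd_step(x, t, lr=0.05)
loss_after = mse(student, xs, targets)
print(f"MSE before: {loss_before:.4f}, after: {loss_after:.4f}")
```

In a real setting the student would regress a whole activation vector, and the teacher's final layer could then be reused on top of the student's output.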

Has anyone come across references for something like this?

submitted by /u/Minimum_Zucchini

[R] Some idea for choosing action (instead of reinforcement learning)

I have heard that Reinforcement Learning (RL) has several issues that limit its use in real-life problems.

See “Deep Reinforcement Learning Doesn’t Work Yet” (alexirpan.com).

As I understand it, the problem is that RL is appropriate when the environment continuously delivers information in real time (as in real-time games or driving a car), but not when the environment consists of several features. And in many real-world problems, the environment consists of several features (as in turn-based games).

So, as I understand it, there is a lack of ML models for deciding which action to choose in such a simple environment in order to achieve as high a score as possible. This is difficult to solve with plain classification or regression.

The idea I am thinking of is this:
First, I want to select an action using only ML models for tabular data, so the data is laid out as a table. One row is one case; specific columns hold the Environment values, specific columns hold the Action values, and specific columns hold the Score.
Env + Action are used as the independent variables (inputs) and Score is the dependent variable (target).
A model is then fitted on this table.


Then, when a value for Env is given, append every combination of actions to it to form rows of tabular data, and use these rows as input to predict the score.
This gives a predicted score for each combination of actions. Select the Action whose predicted score is the highest.
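The procedure above can be sketched as follows. This is only an illustration under assumptions: the logged rows in `HISTORY` are made up, and a tiny hand-written k-nearest-neighbour regressor stands in for whatever tabular model would actually be used. `predict_score` and `best_action` are hypothetical names.

```python
from itertools import product

# Hypothetical logged table: each row is (env values, action values, score).
# The numbers are invented purely for illustration.
HISTORY = [
    ((0.0,), (0,), 1.0),
    ((0.0,), (1,), 3.0),
    ((1.0,), (0,), 4.0),
    ((1.0,), (1,), 2.0),
]

def predict_score(env, action, history=HISTORY, k=1):
    """k-nearest-neighbour regression on the concatenated Env + Action row.

    Assumption: kNN is a stand-in; any regressor for tabular data fits here.
    """
    query = env + action

    def squared_distance(row):
        features = row[0] + row[1]
        return sum((q - f) ** 2 for q, f in zip(query, features))

    nearest = sorted(history, key=squared_distance)[:k]
    return sum(score for _, _, score in nearest) / len(nearest)

def best_action(env, action_values, history=HISTORY):
    """Enumerate all action combinations and keep the best predicted one.

    action_values holds one iterable of allowed values per action column.
    """
    candidates = product(*action_values)
    return max(candidates, key=lambda action: predict_score(env, action, history))

# One action column with allowed values 0 and 1.
print(best_action((0.1,), [(0, 1)]))
print(best_action((0.9,), [(0, 1)]))
```

Enumerating every combination only works while the action space is small. With many action columns the number of candidate rows grows multiplicatively.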

Since the Score for each Action is a prediction rather than a real value, it may be necessary to select not the global maximum but the region with the most stable slope, or something similar. This reminded me of how a learning rate is picked from the plot in LR Finder. Bayesian optimization can also be used to select which Action values to test, or some kind of variance measure, or whatever works.


The same approach can also be used to decide what counts as a good score. In other words, when there are several stages, it can decide, in the same way, which score to aim for in each stage in order to get a high total score (the results obtained so far from the previous stages can probably serve as the environment).
That way, if you have a range of target scores, you can choose an Action that will give you that score.


Another consideration here is that Env values are often sequential data, so the sequence may need to be processed accordingly. One idea for the categorical case is described in “Some idea for sequential data”.
And in the case of numeric + sequential, I may need to use recursion and Deep Learning for tabular data together.

Please tell me if this idea is already obsolete, commonplace, or absurd! If it looks OK, I’ll dig into it.

submitted by /u/SunghoYahng

[D] Requirements for a fast model-building algorithm in one-shot model-based reinforcement learning

Comparison of algorithms for the fast extraction of a model from real-world observations, to be used for predicting rewards at different future timespans.

Requirements:

  • Time – Has memory of at least 20 steps so that it can handle temporal sequences
  • 1sht – Can learn from a single example so that it doesn’t need hundreds of training samples for each class
  • Hier – Is hierarchical so that it generalizes well (not just flat memorization)
  • Arch – Can learn the architecture from data so that it doesn’t need to be predefined by the developers
  • Curr – Has curriculum learning so that it can be trained successively and doesn’t suffer from catastrophic forgetting
  • Scal – Can be scaled up to at least 1 million inputs so that it’s not limited to toy environments

Algo Time 1sht Hier Arch Curr Scal
NNGP 🚫 🚫
GHSOM 🚫 🚫
THSOM 🚫 🚫 🚫
BPTT 🚫 🚫 🚫
GA 🚫 🚫 🚫
HTM 🚫 🚫

Candidate algorithms:

  • NNGP – Nearest Neighbor Gaussian Processes https://amstat.tandfonline.com/doi/abs/10.1080/01621459.2015.1044091
  • GHSOM – Growing Hierarchical Self-Organizing Map http://www.ifs.tuwien.ac.at/~andi/ghsom/
  • THSOM – Temporal Hebbian Self-organizing Map https://link.springer.com/chapter/10.1007/978-3-540-87536-9_65
  • BPTT – Recurrent Neural Networks trained with Backpropagation Through Time, for example https://en.wikipedia.org/wiki/Long_short-term_memory
  • GA – Genetic Algorithms https://en.wikipedia.org/wiki/Genetic_algorithm
  • HTM – Hierarchical Temporal Memory https://en.wikipedia.org/wiki/Hierarchical_temporal_memory or in German https://de.wikipedia.org/wiki/Hierarchischer_Temporalspeicher

The table probably has errors because I’m not an expert and just wanna watch progress in AGI. But the current backprop winter is boring me, and if no one else is taking the initiative then an outsider from the audience has to.

As I don’t understand the math in the paper for NNGPs, I’m assuming that they are just a hierarchical version of the simple nearest neighbor algorithm. Or that the two SOM-descendants are just standard self-organizing maps plus some fancy extensions for hierarchical architecture and time.

Drop a note if you find an error and I will fix the table.

submitted by /u/wlorenz65