Category: Reddit MachineLearning

[R] Teaching a neural network to use a calculator

Article by Reiichiro Nakano:

This article explores a seq2seq architecture for solving simple probability problems from Saxton et al.’s Mathematics Dataset. A transformer is used to map questions to intermediate steps, while an external symbolic calculator evaluates intermediate expressions. This approach emulates how a student might solve math problems: setting up intermediate equations, using a calculator to solve them, and using those results to construct further equations.
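The model-plus-calculator loop described above might be sketched roughly as follows. This is only an illustration, not the article's code: `scripted_model` is a hypothetical stand-in for the transformer, and the `ANSWER ` prefix and the `expr = result` context format are assumptions made up for this example.

```python
import ast
import operator
from fractions import Fraction

# Binary operators the toy calculator accepts.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculate(expression):
    """Evaluate a +, -, *, / expression over integers exactly, as a Fraction."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return Fraction(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported expression: {expression!r}")

    return walk(ast.parse(expression, mode="eval"))


def solve(question, next_step, max_steps=10):
    """Alternate between the model proposing a step and the calculator evaluating it."""
    context = question
    for _ in range(max_steps):
        step = next_step(context)
        if step.startswith("ANSWER "):  # assumed end-of-solution marker
            return step[len("ANSWER "):]
        # Append the evaluated step so the model can build on the result.
        context += f"\n{step} = {calculate(step)}"
    raise RuntimeError("no answer within max_steps")


def scripted_model(context):
    """Hypothetical stand-in for the transformer: replays one fixed step."""
    steps = ["3/5 * 2/4"]
    done = context.count(" = ")
    if done < len(steps):
        return steps[done]
    return "ANSWER " + context.rsplit(" = ", 1)[1]


question = "Two letters are picked without replacement from aaabb. What is the probability of picking 2 a?"
print(solve(question, scripted_model))  # 3/10
```

Keeping the arithmetic in `Fraction` means the calculator never introduces rounding error, so the model only has to get the expressions right, not the numbers.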

https://reiinakano.com/2019/11/12/solving-probability.html

submitted by /u/wei_jok

[D] BERT for non-textual sequence data

Hi there, I’m working on a deep learning solution for classifying sequence data that isn’t raw text but rather entities (which have already been extracted from the text). I am currently using word2vec-style embeddings to feed the entities to a CNN, but I was wondering whether a Transformer (à la BERT) would be a better alternative and a better way of capturing the semantics of the entities involved. I can’t seem to find any articles (let alone libraries) on applying something like BERT to non-textual sequence data. Does anybody know of any papers on this angle? I’ve thought about training a BERT model from scratch and treating the entities as if they were text. The issue with that, though, is that BERT is apparently slow when dealing with long sequences (sentences). My data often contains sequences of length 1000+, so I’m worried BERT won’t cut it. Any help, insights or references are very much appreciated! Thanks
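For what "treating the entities as if they were text" could look like in practice, here is a minimal preprocessing sketch. It assumes each entity is a string and becomes one token. The entity names, the `[PAD]`/`[UNK]` conventions and the window and stride sizes are illustrative choices, and overlapping windows are just one possible workaround for 1000+ length sequences (per-window predictions would still need to be combined somehow, for example by averaging).

```python
from collections import Counter

PAD, UNK = "[PAD]", "[UNK]"


def build_vocab(sequences, min_count=1):
    """Assign one integer ID per entity, reserving 0 for padding and 1 for unknowns."""
    counts = Counter(entity for seq in sequences for entity in seq)
    vocab = {PAD: 0, UNK: 1}
    for entity, count in sorted(counts.items()):
        if count >= min_count:
            vocab[entity] = len(vocab)
    return vocab


def encode(seq, vocab):
    """Map a sequence of entities to IDs; unseen entities fall back to UNK."""
    return [vocab.get(entity, vocab[UNK]) for entity in seq]


def windows(ids, size=512, stride=256, pad_id=0):
    """Split a long ID sequence into overlapping windows, padding the last one."""
    out = []
    start = 0
    while True:
        chunk = ids[start:start + size]
        out.append(chunk + [pad_id] * (size - len(chunk)))
        if start + size >= len(ids):
            break
        start += stride
    return out


# Hypothetical entity sequences, standing in for the extracted entities.
train = [["PERSON:Ada", "ORG:Acme"], ["ORG:Acme", "LOC:Toronto"]]
vocab = build_vocab(train)
print(encode(["ORG:Acme", "DATE:2019"], vocab))  # [3, 1]
print(len(windows(list(range(1000)))))  # 3
```

The resulting ID windows can be fed to any embedding layer followed by a Transformer encoder, the same way wordpiece IDs would be.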

submitted by /u/daanvdn

[D] Thoughts about this conversation?

This thread is on a public forum (Twitter) between two scientists.

Person1 – [Director of #AI #research @nvidia, Bren #Professor @Caltech, Fmr Principal scientist @awscloud]

Person2 – Research Scientist at Deepmind

Both are entitled to their own opinions. Here’s how the thread goes…

Person1 (talking about her newly published work): DeepLearning is only good at interpolation. But applications need extrapolation that can reason about more complex scenarios than it is trained on. With current methods, accuracy degrades rapidly when complexity of test instances grows. Our new work aims to overcome this…

Person2: This tweet really downplays prior work. NTM, memory nets, Neural GPU, MANN, graph nets, and many, many other related methods also degrade gracefully. Your work looks like an important next step, but this rhetoric is unhelpful.

Person1: What you are doing is rhetoric and rude. We have mentioned all prior work in our paper. You don’t want to engage in science. It is inevitable to get attacked online as a woman. #deepmind can engage in all kind of media hype that is unethical but I get attacked for stating facts. As a woman stating science, I get accused of engaging in rhetoric.

I personally find this response by Person1 to be extremely out of the blue. Putting aside the fact that Person1 is a Director at NVIDIA and a professor at Caltech, and that Person2 is a scientist at Google as well, let’s look at the simple conversation here. The thread started with a tweet about an interesting piece of work. That was followed by a criticism directed only at the tweet’s rhetoric. And that was then answered with something unimaginable. Am I the only one looking at all this confused?

Source post: https://twitter.com/AnimaAnandkumar/status/1194338388221972480

submitted by /u/GreySindrome

[D] John Carmack stepping down as Oculus CTO to work on artificial general intelligence (AGI)

Here is John’s post with more details:

https://www.facebook.com/permalink.php?story_fbid=2547632585471243&id=100006735798590

I’m curious what members here on MachineLearning think about this, especially the fact that he’s going after AGI and starting from his home in a “Victorian Gentleman Scientist” style. John Carmack is one of the smartest people alive in my opinion, and even as CTO at Oculus he’s answered several of my questions via Twitter despite never having met me or knowing who I am. A real stand-up guy.

submitted by /u/jd_3d

[R] NVIDIA’s Kaolin: A PyTorch Library for Accelerating 3D Deep Learning Research

Link to repo: https://github.com/NVIDIAGameWorks/kaolin

Link to arxiv paper: https://arxiv.org/abs/1911.05063

Abstract: We present Kaolin, a PyTorch library aiming to accelerate 3D deep learning research. Kaolin provides efficient implementations of differentiable 3D modules for use in deep learning systems. With functionality to load and preprocess several popular 3D datasets, and native functions to manipulate meshes, pointclouds, signed distance functions, and voxel grids, Kaolin mitigates the need to write wasteful boilerplate code. Kaolin packages together several differentiable graphics modules including rendering, lighting, shading, and view warping. Kaolin also supports an array of loss functions and evaluation metrics for seamless evaluation and provides visualization functionality to render the 3D results. Importantly, we curate a comprehensive model zoo comprising many state-of-the-art 3D deep learning architectures, to serve as a starting point for future research endeavours. Kaolin is available as open-source software at https://github.com/NVIDIAGameWorks/kaolin.

submitted by /u/edwardsmith1884

[D] How do you standardize/scale data for multi-input neural networks?

Books teach us that we have to standardize or scale data (when features have non-comparable scales) before feeding a neural network (or other algorithms such as linear or logistic regressions).

When you have multi-input neural networks (for example you have some “dense” inputs for tabular/structured data and some convolutional layers for unstructured data such as images… and then at some point you concatenate their output), how do you standardize/scale data? You have totally different input data, different NN layers and then at some point you merge them (e.g. https://keras.io/getting-started/functional-api-guide/#multi-input-and-multi-output-models). How do you handle data normalization in this case?
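One common way to approach this, offered here only as a sketch and not as a definitive answer, is to preprocess each input branch independently: standardize the tabular columns with statistics fitted on the training split only, and rescale image pixels to a fixed range. The function names and the toy data below are made up for illustration.

```python
from statistics import mean, pstdev


def fit_standardizer(rows):
    """Compute per-column mean and standard deviation from training rows only."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Constant columns have zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(rows, means, stds):
    """Apply previously fitted statistics to any split (train, validation, test)."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def scale_pixels(image, max_value=255):
    """Map integer pixel intensities to the range [0, 1]."""
    return [[p / max_value for p in row] for row in image]


# Tabular branch: fit on training rows, then reuse the same statistics everywhere.
train_rows = [[1.0, 100.0], [3.0, 300.0]]
means, stds = fit_standardizer(train_rows)
tabular_input = standardize(train_rows, means, stds)

# Image branch: scaled on its own, independent of the tabular statistics.
image_input = scale_pixels([[0, 255], [51, 102]])
```

Each branch then sees inputs on a sensible scale for its own layers, and the concatenation happens on learned features, not on raw data.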

submitted by /u/ekerazha

[Project] Deepdos

Description

Hello, r/MachineLearning! Over the course of the last 2 months I’ve been working on my first major machine learning project, called “Deepdos”, in my free time outside of school and work. Deepdos is a network tool that provides analysis, and in the future mitigation, of all network traffic coming over whichever network adapter you specify. The analysis utilizes a logistic regression model that classifies traffic as either safe or malicious based on packet capture data aggregated with CICFlowMeter (the people who created the tool also created the dataset used for training). The mitigation, which will only be for Linux-based systems, will create and manage firewall rules written directly to iptables. While the name includes “deep”, there is actually no deep learning involved at all. (At least not yet.)
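To make the classify-then-mitigate idea concrete, here is a rough sketch that does not reflect Deepdos’s actual code. The weights, bias, feature vectors and 0.5 threshold are hypothetical placeholders, and the firewall rule is only built as an argument list (using the standard `iptables -A INPUT -s <ip> -j DROP` form) and never executed.

```python
import ipaddress
import math


def malicious_probability(features, weights, bias):
    """Logistic regression score: sigmoid of a weighted sum of flow features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Branch on the sign of z so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def drop_rule(source_ip):
    """Build (but do not run) an iptables command that drops traffic from one IPv4 host."""
    ip = ipaddress.IPv4Address(source_ip)  # raises ValueError on malformed input
    return ["iptables", "-A", "INPUT", "-s", str(ip), "-j", "DROP"]


def plan_mitigation(flows, weights, bias, threshold=0.5):
    """Return one drop rule per flow whose malicious probability meets the threshold."""
    return [
        drop_rule(source_ip)
        for source_ip, features in flows
        if malicious_probability(features, weights, bias) >= threshold
    ]


# Hypothetical model parameters and aggregated flow features.
weights, bias = [2.0, -1.0], -1.0
flows = [
    ("203.0.113.7", [3.0, 1.0]),   # z = 4.0, scored as malicious
    ("198.51.100.2", [0.0, 1.0]),  # z = -2.0, scored as safe
]
rules = plan_mitigation(flows, weights, bias)
```

Validating the address before building the rule, and passing arguments as a list and not a shell string, keeps captured packet data from being interpreted as shell input if the rule is later handed to `subprocess.run`.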

The project source code can be found here: deepdos

Currently the project is listed as being in a pre-alpha state, as there are a lot of milestones that need to be hit before I can consider this a stable, production-ready project. Hopefully, some of you can help me get there! Currently, I’m looking for constructive feedback on the project’s current state, additions that I should be making, and really anything else that can help me grow this project into something that can be useful for companies. Here is a snapshot of the project without having to look at any of the code:

Where I’m at:

  • Currently utilize a logistic regression model that is trained on 200,000 samples of network traffic with 100,000 being “normal” network traffic and 100,000 being malicious.
  • Packet capture data aggregation via tcpdump. Currently, I listen for very short bursts of time for development but will be ramping this time up to reflect the communication between two devices more accurately.
  • Published on PyPI (not stable yet).
  • I’ve rebuilt the structure of the application 3 times right now for scalability and think I finally developed a system

Where I’m trying to go:

  • I’m currently thinking about how I can develop a robust testing system so that this project can continue to scale with reliability.
  • Training on the full dataset, which consists of roughly 57 million samples; I’m currently only using 200,000 of them. :[
  • Experimenting with different machine and deep learning models to see how I can maximize performance of the classification and of the overall application.

Working on this project has been quite the learning experience and, honestly, a really enjoyable time. I really appreciate those of you who took time out of your day to read this, and I hope to draw on the opinions and expertise of people in this thread to make this into something awesome.

submitted by /u/C3NZ