
Category: Reddit MachineLearning

[D] Classifying malware based on API calls

Hi guys,

I am new to machine learning. After trying out TensorFlow’s tutorial on building a classifier for IMDb reviews, I want to create my own classifier to do binary classification (malicious/benign) of .exe or .apk files.

I was wondering if I can follow the same approach as TensorFlow’s IMDb tutorial, i.e., train on a set of texts, each with a label (pos/neg).

In the context of classifying malware, those texts would be sequences of system API calls, e.g.:

Set 1: [func1() func2() func3() func4() func5() func6() …] → label: malicious

Set 2: [func1() func3() func4() func5()] → label: benign

The sequence of API calls matters, by the way, and I have heard that to capture ordering I will need a recurrent model such as an LSTM.
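
If that is the route you take, the IMDb recipe transfers almost directly: tokenize API call names instead of words, pad the sequences, and feed them to an embedding + LSTM stack. A minimal sketch, assuming TensorFlow/Keras; the toy traces, vocabulary, and maximum sequence length are hypothetical placeholders:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical toy data: each trace is a list of API call names,
# labelled 1 (malicious) or 0 (benign).
traces = [["func1", "func2", "func3", "func4"], ["func1", "func3", "func4", "func5"]]
labels = [1, 0]

# Map each API call name to an integer id, exactly as the IMDb tutorial
# maps words (index 0 is reserved for padding).
vocab = {name: i + 1 for i, name in enumerate(sorted({c for t in traces for c in t}))}
encoded = [[vocab[c] for c in t] for t in traces]
x = tf.keras.preprocessing.sequence.pad_sequences(encoded, maxlen=100)
y = tf.constant(labels)

model = tf.keras.Sequential([
    layers.Embedding(input_dim=len(vocab) + 1, output_dim=32),
    layers.LSTM(64),                        # order-sensitive, unlike a bag-of-words model
    layers.Dense(1, activation="sigmoid"),  # binary malicious/benign score
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x, y, epochs=3)
```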

I would love to hear from you whether this is the correct way to do things… I would most likely target Android applications.

submitted by /u/yourspeaker317

[Project] Curated list of computational narratology papers

Hi!

You might remember me from the blog post I shared a few days ago (link).

I received an absolute onslaught of emails (close to 30 emails!!!). The main question I got was “Wow computational narratology seems pretty cool! Where do I get started? I’ve only seen paper XYZ”

As such, I decided that rather than answering every email individually, I would create a curated list of papers!

https://github.com/LouisCastricato/Narratology-Papers

Feel free to contribute (PRs are welcome!). I’ll be working on this for the next few hours, so it should contain a couple dozen papers by tomorrow 🙂

submitted by /u/FerretDude

[N] 4 Months after Siraj was caught scamming he has still not refunded any victims based in India, Philippines, or any other countries with no legal recourse. He makes an apology video, and when his victims ask for their refund, his followers respond with “Be kind. He’s asking for your forgiveness”

This is fucking sick…

People based in India, the Philippines, and other countries without the resources to go after Siraj legally are the ones who need the money the most. $200 could be a month’s worth of salary, or several months’. And the types of people who get caught up in these scams are those who are genuinely looking to improve their financial situation and work hard for it. This is fucking cruel.

I’m having a hard time believing Siraj’s followers are that brainwashed. Most likely these are alt accounts controlled by Siraj.

https://i.imgur.com/6cUhQDO.png

https://i.imgur.com/TDx5ELA.png

submitted by /u/RelevantMarketing

[D] On Disentangling Disentanglement: Making sense of a dozen papers on disentanglement at NeurIPS 2019

While surveying the work on disentanglement published at NeurIPS 2019, I have curated some cheat sheets, a video lecture playlist, and a GitHub repo, which I would like to share with the ML community here. I hope you find them useful!

Github: https://github.com/vinayprabhu/Disentanglement_NEURIPS_2019

Blogpost: https://medium.com/@VinayPrabhu/disentangling-disentanglement-in-deep-learning-d405005c0741

submitted by /u/VinayUPrabhu

[D] Summaries of the best papers of NeurIPS, ACL, and EMNLP 2019

Hi,

The Vision and Language Group, a deep learning group at IIT Roorkee, has written summaries of various NeurIPS, ACL, and EMNLP 2019 papers:

  1. Putting an End to End-to-End: Gradient-Isolated Learning of Representations
    https://github.com/vlgiitr/papers_we_read/blob/master/summaries/infomax.md
  2. Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations
    https://github.com/vlgiitr/papers_we_read/blob/master/summaries/srn.md
  3. Specializing Word Embeddings (for Parsing) by Information Bottleneck
    https://github.com/vlgiitr/papers_we_read/blob/master/summaries/info_bottleneck.md
  4. Designing and Interpreting Probes with Control Tasks
    https://github.com/vlgiitr/papers_we_read/blob/master/summaries/control_tasks.md
  5. Bridging the Gap between Training and Inference for Neural Machine Translation
    https://github.com/vlgiitr/papers_we_read/blob/master/summaries/NMT_Gap.md
  6. Emotion-Cause Pair Extraction: A New Task to Emotion Analysis in Texts
    https://github.com/vlgiitr/papers_we_read/blob/master/summaries/ecpe.md
  7. Do you know that Florence is packed with visitors? Evaluating state-of-the-art models of speaker commitment
    https://github.com/vlgiitr/papers_we_read/blob/master/summaries/florence.md
  8. Zero-Shot Entity Linking by Reading Entity Descriptions
    https://github.com/vlgiitr/papers_we_read/blob/master/summaries/entity_linking.md

If you find the summaries useful, do star the repo. It will be continually updated with summaries of more papers from tier-1 conferences.

submitted by /u/vlg_iitr

[P] DQN Loss Function Question

Hi All,

I am attempting to implement my own DQN without looking at the source code. I am torn between two possible approaches for evaluating the loss.

The basic approach would be to take the gradient only through the selected action’s Q-value and to set the gradient of all other action-value outputs to 0. My worry is that updates that improve the Q-value of the selected action will incidentally change the outputs for the Q-values of other actions, so those estimates become less accurate. One way to address this would be to set the labels for the non-chosen actions to the current outputs of the Q-network, so that the network is incentivized not to change those values. However, I have not seen this approach discussed much on the forums I’ve checked, so I’m assuming it is bad for some reason.
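
To make the first approach concrete, here is a minimal sketch, assuming TensorFlow; the network callables and replay-batch tensors (states, actions, rewards, next_states, dones) are hypothetical placeholders:

```python
import tensorflow as tf

def dqn_loss(q_network, target_network, states, actions,
             rewards, next_states, dones, gamma=0.99):
    # TD target from the frozen target network: r + gamma * max_a' Q_target(s', a')
    next_q = tf.reduce_max(target_network(next_states), axis=1)
    targets = rewards + gamma * next_q * (1.0 - dones)

    q_values = q_network(states)                # shape [batch, n_actions]
    # actions: int32 vector of the actions actually taken
    idx = tf.stack([tf.range(tf.shape(actions)[0]), actions], axis=1)
    chosen_q = tf.gather_nd(q_values, idx)      # Q(s, a) for the taken actions only

    # The loss touches only the chosen action's output, so the gradient
    # through every other action output is exactly zero.
    return tf.reduce_mean(tf.square(tf.stop_gradient(targets) - chosen_q))
```

Note that under an MSE loss the two approaches coincide at the current parameters: a non-chosen action whose label equals the network’s current output contributes zero loss and zero gradient, which is exactly what you get by only ever touching the chosen action’s output.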

Can anyone shed some light on which approach is better to take?

submitted by /u/Braindoesntwork2

[D] Objective: Masked Language Model vs Autoencoding

Let’s say we have a simple “autoencoding transformer” architecture:

  • encoder
  • bottleneck (Z)
  • decoder

We can train the model either using:

  • the Masked Language Model (MLM) objective, where we mask random inputs / replace them with a null token and measure the reconstruction loss only on the masked inputs,
  • or the autoencoding objective, where we mask nothing and measure the reconstruction loss on all inputs (both losses are sketched below).
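
To make the comparison concrete, here is a minimal sketch of both losses, assuming TensorFlow and generic encoder/decoder callables; the mask token id and masking rate are hypothetical placeholders:

```python
import tensorflow as tf

MASK_ID = 0        # hypothetical null/mask token id
MASK_RATE = 0.15   # fraction of inputs corrupted for the MLM objective

def mlm_and_ae_losses(encoder, decoder, tokens):
    """tokens: [batch, seq_len] int32 ids; encoder/decoder are placeholders."""
    mask = tf.random.uniform(tf.shape(tokens)) < MASK_RATE
    masked_in = tf.where(mask, tf.fill(tf.shape(tokens), MASK_ID), tokens)

    # MLM objective: corrupt the input, score reconstruction only at masked slots.
    logits_mlm = decoder(encoder(masked_in))
    per_token = tf.keras.losses.sparse_categorical_crossentropy(
        tokens, logits_mlm, from_logits=True)
    n_masked = tf.reduce_sum(tf.cast(mask, tf.float32))
    mlm_loss = tf.reduce_sum(tf.where(mask, per_token, 0.0)) / tf.maximum(n_masked, 1.0)

    # Autoencoding objective: clean input, score reconstruction at every position.
    logits_ae = decoder(encoder(tokens))
    ae_loss = tf.reduce_mean(tf.keras.losses.sparse_categorical_crossentropy(
        tokens, logits_ae, from_logits=True))
    return mlm_loss, ae_loss
```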

Now we ask about the properties of Z – the latent representation of the data, after the model is trained. Will Z differ between the two objectives? How will it differ? Will it capture different information? Which loss will preserve more information in Z?

Does this have an obvious interpretation? Any intuitions?

submitted by /u/maskedlanguagemodel

[P] I have built a face detector to blur faces for videos

https://github.com/JeiKeiLim/noone_video

I needed to blur faces in a video dataset I had collected previously, for privacy reasons.

Surprisingly, I couldn’t find any related free services or programs.

First, I wrote simple code to automatically blur faces using a Haar cascade detector from OpenCV. I didn’t like the result: the Haar detector was fast but not accurate enough for me.

Then I tried the DNN face-detection model that OpenCV already includes. Performance got better, but because the DNN input is 300×300 and my input video is 1080×1920, faces far from the camera were often missed.

So I split each frame into 3 to 5 divisions horizontally and vertically, ran detection on each tile, and computed a face heatmap from the confidence of each detection result. Final face localization was done with contour detection from OpenCV.
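
For reference, a minimal sketch of the tiling idea, assuming OpenCV’s bundled res10 SSD face detector; the 3×3 grid, confidence threshold, and model file names are placeholders (see the repo for the actual implementation):

```python
import cv2
import numpy as np

# OpenCV's res10 SSD face detector; these are the usual file names from the
# OpenCV samples, but the files must exist locally.
net = cv2.dnn.readNetFromCaffe("deploy.prototxt",
                               "res10_300x300_ssd_iter_140000.caffemodel")

def blur_faces(frame, grid=3, conf_thresh=0.5):
    h, w = frame.shape[:2]
    heat = np.zeros((h, w), dtype=np.float32)
    th, tw = h // grid, w // grid
    # Run the 300x300 detector on each tile so small, distant faces
    # occupy more of the detector's input.
    for r in range(grid):
        for c in range(grid):
            tile = frame[r*th:(r+1)*th, c*tw:(c+1)*tw]
            blob = cv2.dnn.blobFromImage(cv2.resize(tile, (300, 300)), 1.0,
                                         (300, 300), (104.0, 177.0, 123.0))
            net.setInput(blob)
            det = net.forward()  # shape [1, 1, N, 7]
            for i in range(det.shape[2]):
                conf = float(det[0, 0, i, 2])
                if conf < conf_thresh:
                    continue
                x1, y1, x2, y2 = (np.clip(det[0, 0, i, 3:7], 0, 1)
                                  * [tw, th, tw, th]).astype(int)
                heat[r*th + y1:r*th + y2, c*tw + x1:c*tw + x2] += conf
    # Contours of the accumulated confidence heatmap give the final regions.
    contours, _ = cv2.findContours((heat > 0).astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for cnt in contours:
        x, y, bw, bh = cv2.boundingRect(cnt)
        frame[y:y+bh, x:x+bw] = cv2.GaussianBlur(frame[y:y+bh, x:x+bw], (51, 51), 0)
    return frame
```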

The result is that it can detect small faces in relatively high-resolution videos.

Any suggestions or related papers would be more than welcome!

submitted by /u/workout_JK

[D] Siraj Raval’s Apology

Siraj Raval recently produced a new apology video (https://youtu.be/1zZZjaYl4AA):

Doing high-quality, original work is what ultimately pushes society forward and improves people’s lives. If people started thinking that if they put something high quality and original on GitHub, a random YouTuber would come along and claim it as their own work, then people might just start putting fewer things on GitHub to begin with, and the world would be a worse place. That’s why it’s so critical not to plagiarize. The ideal world we want to live in is one where the people who actually do high-quality and original work are the ones who get the credit. It took my reputation blowing up in my face for me to realize that, both in the Neural Qubit paper case and in my content more generally. I hope my painful fall serves as a valuable lesson to everyone else. This is my apology video.

He put together a list of GitHub repos he took code from to produce his videos.

submitted by /u/milaworld