Category: Reddit MachineLearning

[R] Adversarial Attacks and Defenses in Images, Graphs and Text: A Review

Written on October 28, 2019. Posted in Reddit MachineLearning.

Hello Reddit! In our paper, we review state-of-the-art adversarial attacks and the defenses against them, covering the image, graph, and text domains.

I eagerly look forward to your comments!

Paper: https://arxiv.org/abs/1909.08072

submitted by /u/debayandeb3050

[P] Would like some ideas for a student project based on city data

Written on October 28, 2019. Posted in Reddit MachineLearning.

Hello everyone, I have a project for my AI machine learning course that will be based on city data, namely Vancouver’s. The link below lists the data sets that our project can be based on. We are free to use outside data sets but must relate them to our city.

https://opendata.vancouver.ca/explore

I would really appreciate any guidance or input. We’re having a difficult time coming up with ideas. The only ones we have so far are house property prediction, bike theft prediction, and general crime prediction, all of which would combine features from other data sets.

Thank you for reading my post!

submitted by /u/JohnMcClapperson

[D] I need to interpolate some of my data and have some design decisions about where in my pipeline this should happen.

Written on October 28, 2019. Posted in Reddit MachineLearning.

I am working with a database that is spread across 7 tables, and for ML work I need to join them together. However, some of the features are not sampled as frequently as others, which leaves a lot of nulls in the joined rows. I want to interpolate these values, but I’m not sure of the most efficient way to do so. For example, let’s say I have feature X sampled every 10 ms, feature Y sampled every 1 second, and a third feature Z sampled every 15 seconds. I could store the interpolated values in the database, but I don’t know if that much storage is feasible for us. Alternatively, I could calculate them for each row when I fetch batches for training, but I’m afraid that will become a bottleneck, depending on how fast the interpolation is. Is there some obvious way of interpolating this efficiently, which I’m not thinking of, that would let me save on space?
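One way to picture the second option (interpolating at batch time rather than storing dense rows) is sketched below. It is only a minimal illustration, assuming linear interpolation is acceptable and that each sparse feature's timestamps are kept sorted; the function name `interpolate_at` and the sample data are hypothetical, not taken from the post.

```python
from bisect import bisect_right


def interpolate_at(query_times, sample_times, sample_values):
    """Linearly interpolate a sparsely sampled feature at the given times.

    sample_times must be sorted in ascending order. Queries outside the
    sampled range are clamped to the first or last sample.
    """
    result = []
    last = len(sample_times) - 1
    for t in query_times:
        # Index of the first sample strictly after t.
        hi = bisect_right(sample_times, t)
        if hi == 0:
            result.append(sample_values[0])
        elif hi > last:
            result.append(sample_values[last])
        else:
            lo = hi - 1
            t0, t1 = sample_times[lo], sample_times[hi]
            v0, v1 = sample_values[lo], sample_values[hi]
            weight = (t - t0) / (t1 - t0)
            result.append(v0 + weight * (v1 - v0))
    return result


# Hypothetical data: Y is sampled once per second, while the batch rows
# carry the timestamps of the more frequently sampled feature X.
y_times = [0.0, 1.0, 2.0]
y_values = [10.0, 20.0, 40.0]
x_times = [0.0, 0.25, 0.5, 1.5, 2.0]
y_on_x_grid = interpolate_at(x_times, y_times, y_values)
```

With this layout only the raw, sparse samples are stored, and each lookup costs a binary search over the sparse feature. Whether that is fast enough for a training loop would need to be measured; a vectorized routine from a numerical library would likely be the next step if it is not.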

submitted by /u/zcleghern

[D] Speech Recognition Pretrained Model with LM

Written on October 28, 2019. Posted in Reddit MachineLearning.

Hi everyone,

I have a project where I have to do a quick POC of speech recognition in a noisy, multi-speaker setting. However, I am having a hard time finding a pretrained model with language model rescoring (or any other decoding helper). Any repos or links are welcome!

submitted by /u/nottakumasato

[D] Average conference or workshop held in conjunction with a highly reputed conference?

Written on October 28, 2019. Posted in Reddit MachineLearning.

For preliminary research work, should one submit their paper to an average conference (e.g. ACCV, IJCNN, WACV, BMVC) or to a workshop organised in conjunction with a highly reputed conference (e.g., ICML/NeurIPS/ICLR/AAAI workshop)? What are the pros and cons for each option?

[D] For GNNs, are gradients normally tracked on neighborhood aggregation operations (e.g. max, mean)?

Written on October 28, 2019. Posted in Reddit MachineLearning.

I am writing a GNN from scratch, to demonstrate to myself that I understand all the required concepts.

I am a bit confused about whether gradients need to be tracked through neighborhood aggregation operations such as the mean and max of neighbors’ embeddings. In my code, I currently perform these operations within a with torch.no_grad() block, because if I don’t, each epoch takes forever.

Here is my code for those operations:

    def neighborhood_aggregation(self, adj_lists, feat, agg_method):
        # adj_lists is a dict of neighbors for every node in graph
        # e.g. adj_list = {0:{1, 4, 5, 6}, 1: {2, 4, 5}, ...}
        # node 0 has neighbors 1, 4, 5, 6
        with torch.no_grad():
            # construct aggregated neighborhood embedding
            dim = list(feat.size())
            n_nodes = dim[0]
            feat_dim = dim[1]
            # aggregated embeddings for all nodes in graph.
            aggregated_embed = torch.Tensor(n_nodes, feat_dim)
            embed_element_vec = torch.arange(feat_dim)
            for node_id, neighbor_node_ids in adj_lists.items():
                neighborhood_embedding = feat[list(neighbor_node_ids), :]
                if agg_method == 'mean':
                    aggregated_neigborhood_embedding = torch.mean(neighborhood_embedding, 0)
                elif agg_method == 'pool':
                    aggregated_neigborhood_embedding = torch.max(neighborhood_embedding, 0)[0]
                else:
                    raise KeyError('Aggregator type {} not recognized.'.format(agg_method))
                aggregated_embed[node_id, embed_element_vec] = aggregated_neigborhood_embedding
        return aggregated_embed

Note: The above code works, and I am getting very good results with it. I am just not sure whether what I am doing is wrong. If it is wrong, I was thinking that I would need a 3D tensor for aggregated_embed, of shape [n_nodes, n_neighbors, embed_dim] (with requires_grad=False), and would perform the mean/max on that tensor, which would track gradients.
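Whether gradients have to flow through the aggregation depends on what feat holds. If feat is raw input features, nothing upstream is learnable and torch.no_grad() loses nothing. If feat is the output of an earlier learnable layer, no_grad stops gradients from reaching that layer's parameters. The sketch below shows one way a mean aggregation could stay differentiable without the per-node Python loop. It assumes a dense row-normalized adjacency matrix fits in memory, and the helper names build_mean_adjacency and mean_aggregate are hypothetical.

```python
import torch


def build_mean_adjacency(adj_lists, n_nodes):
    """Return a dense row-normalized adjacency matrix (hypothetical helper).

    Row i holds 1 / degree(i) at the columns of node i's neighbors, so
    multiplying by a feature matrix averages each node's neighbor features.
    """
    adj = torch.zeros(n_nodes, n_nodes)
    for node_id, neighbor_ids in adj_lists.items():
        if neighbor_ids:
            adj[node_id, list(neighbor_ids)] = 1.0 / len(neighbor_ids)
    return adj


def mean_aggregate(adj, feat):
    # A single matrix multiply replaces the per-node loop and is
    # differentiable with respect to feat.
    return adj @ feat


# Tiny example graph in the same dict-of-sets format as the post.
adj_lists = {0: {1, 2}, 1: {0}, 2: {0, 1}}
feat = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], requires_grad=True)

adj = build_mean_adjacency(adj_lists, n_nodes=3)  # built once, reused every epoch
aggregated = mean_aggregate(adj, feat)

# Gradients reach feat because the aggregation is not wrapped in no_grad.
aggregated.sum().backward()
```

The adjacency matrix itself needs no gradient, so it can be built once outside the training loop. For large graphs a sparse matrix would probably be needed in place of the dense one, and max pooling cannot be written as a matrix product, so it would need a different layout, such as the padded neighbor tensor suggested above.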

Thanks for any help.

submitted by /u/Muunich

[N] Even notes from Siraj Raval’s course turn out to be plagiarized.

Written on October 28, 2019. Posted in Reddit MachineLearning.

More odd paraphrasing and word replacements.

From this article: https://medium.com/@gantlaborde/siraj-rival-no-thanks-fe23092ecd20

Left is from Siraj Raval’s course; right is from the original article.

‘quick way’ -> ‘fast way’

‘reach out’ -> ‘reach’

‘know’ -> ‘probably familiar with’

‘existing’ -> ‘current’

The original article that Siraj plagiarized from is here: https://www.singlegrain.com/growth/14-ways-to-acquire-your-first-100-customers/

submitted by /u/Kitchen_Extreme

[P] Lyrics Generator Twitter Bot

Written on October 28, 2019. Posted in Reddit MachineLearning.

I fine-tuned two small GPT-2 models (124M parameters) and created Twitter bots that interact with Twitter users.

I have shared the code, along with useful things I learned and used, in the following repository, hoping it will help somebody:

https://jsalbert.github.io/lyrics-generator-twitter-bot/

The samples below are outputs of these models.

Eminem Bot Lyrics (@rap_god_bot)

https://preview.redd.it/anndufmguhv31.png?width=600&format=png&auto=webp&s=e027a50442f71b64fbcbe8821ed843c6d6823ead

Music Storytelling Bot Lyrics (@musicstorytell)

https://preview.redd.it/lo8qhzshuhv31.png?width=600&format=png&auto=webp&s=c0a609f649bb3daeeea18aa91c43165c7216f038

submitted by /u/jsalbert_

[1910.11908] Noisier2Noise: Learning to Denoise from Unpaired Noisy Data

Written on October 28, 2019. Posted in Reddit MachineLearning.

submitted by /u/Imnimo

[D] The roots of natural language processing can be traced back to Kabbalist mystics

Written on October 28, 2019. Posted in Reddit MachineLearning.

For people interested in the history of technology — here’s an eccentric essay arguing that the first examples of NLP happened in medieval times. Mystics studying the Kabbala devised “sacred rules” for combining letters to generate prophetic texts and, sometimes, to create golems.

https://spectrum.ieee.org/tech-talk/robotics/artificial-intelligence/natural-language-processing-dates-back-to-kabbalist-mystics

“While specific technologies have changed over time, the basic idea of treating language as a material that can be artificially manipulated by rule-based systems has been pursued by many people in many cultures and for many different reasons. These historical experiments reveal the promise and perils of attempting to simulate human language in non-human ways—and they hold lessons for today’s practitioners of cutting-edge NLP techniques.”

submitted by /u/newsbeagle
