Category: Reddit MachineLearning

[R] 3D Ken Burns Effect from a Single Image

Written on September 17, 2019. Posted in Reddit MachineLearning.

submitted by /u/sniklaus

[Discussion] What is the state-of-the-art for entity extraction and relation extraction?

Written on September 17, 2019. Posted in Reddit MachineLearning.

Hi,

I am looking for the state-of-the-art entity extraction/relation extraction algorithms that are practical to implement and use for commercial information extraction. An example:

“Mr. Wilken is the CEO of Foobar, Inc.”

The entities are Mr. Wilken, CEO, and Foobar, Inc. Mr. Wilken’s title is CEO, and Mr. Wilken works at Foobar, Inc. (so, transitively, he is the CEO of Foobar, Inc.).

In the past I’ve used a CRF with hand-crafted features for entity tagging, followed by a classifier, also using hand-crafted features, to determine relations between entities. This is a pretty old-school approach and does not leverage any of the advances in word embeddings (GloVe, BERT, etc.). I know there are also methods for doing joint entity + relation extraction.
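For readers unfamiliar with the “old school” setup described above, here is a minimal sketch of what hand-crafted token features for a CRF tagger might look like. The feature names and the tokenization of the example sentence are assumptions made for illustration, not the poster’s actual feature set; the resulting list of dicts is the general shape many CRF toolkits accept.

```python
def token_features(tokens, i):
    """Build hand-crafted features for the token at position i.

    The specific features here are illustrative assumptions.
    """
    tok = tokens[i]
    feats = {
        "lower": tok.lower(),         # case-normalized surface form
        "is_title": tok.istitle(),    # e.g. "Wilken", "Foobar"
        "is_upper": tok.isupper(),    # e.g. "CEO"
        "has_period": "." in tok,     # e.g. "Mr."
        "suffix3": tok[-3:].lower(),  # crude morphology cue
    }
    # Context features: neighbouring tokens, with sentence-boundary markers.
    feats["prev_lower"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    feats["next_lower"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>"
    return feats


def sentence_features(tokens):
    """Return one feature dict per token in the sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Hypothetical tokenization of the example sentence from the post.
tokens = ["Mr.", "Wilken", "is", "the", "CEO", "of", "Foobar", ",", "Inc", "."]
features = sentence_features(tokens)
```

A relation classifier in the same style would take a pair of tagged entities and build features from the tokens between them; embedding-based approaches replace these dicts with learned vectors.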

There are dozens of papers on Google Scholar, and I’m not sure which ones would be worth implementing. I’m looking for recommendations of recent papers to read that would get me started.

submitted by /u/Tash_is_Aslan

[R] Emergent Tool Use from Multi-Agent Interaction

Written on September 16, 2019. Posted in Reddit MachineLearning.

Paper: https://d4mucfpksywv.cloudfront.net/emergent-tool-use/paper/Multi_Agent_Emergence_2019.pdf

Blog: https://openai.com/blog/emergent-tool-use/

TLDR: A hide-and-seek game with moveable blocks and ramps. Hiders and seekers learn to use them to complete their task. Trained with PPO plus a transformer NN for representing objects.

Abstract: Through multi-agent competition, the simple objective of hide-and-seek, and standard reinforcement learning algorithms at scale, we find that agents create a self-supervised autocurriculum inducing multiple distinct rounds of emergent strategy, many of which require sophisticated tool use and coordination. We find clear evidence of six emergent phases in agent strategy in our environment, each of which creates a new pressure for the opposing team to adapt; for instance, agents learn to build multi-object shelters using moveable boxes which in turn leads to agents discovering that they can overcome obstacles using ramps. We further provide evidence that multi-agent competition may scale better with increasing environment complexity and leads to behavior that centers around far more human-relevant skills than other self-supervised reinforcement learning methods such as intrinsic motivation. Finally, we propose transfer and fine-tuning as a way to quantitatively evaluate targeted capabilities, and we compare hide-and-seek agents to both intrinsic motivation and random initialization baselines in a suite of domain-specific intelligence tests.

submitted by /u/ivalm

[D] Suggestions on good practice when merging k-means centroids

Written on September 16, 2019. Posted in Reddit MachineLearning.

Hi, I posted this on /r/datascience but thought I’d x-post for visibility.

I was wondering if I could get some feedback into whether my methodology is problematic or not.

I’m working with a pre-established set of 12 cluster centroids in a classification problem, based on the output of a 42-element 2D joint histogram. When classifying points, these histograms are collapsed so that the classification is only done on a 3-element vector representing the mean quantities of the data point.

The purpose of this is to identify cloud types, to feed into some of my cluster-specific analysis. Now, three adjacent cluster centroids all refer to the same cloud type; however, they have different ‘thicknesses’. In my final work, I’d like there to be just a single cluster to represent this type. I worry, though, that if I merge them by, for instance, taking the mean of these centroids, the classification step will miss many points that would’ve ordinarily been assigned to these clusters, because they might then be closer to another centroid which doesn’t represent the data point accurately.

My idea is to classify my datapoints with a codebook containing the three centroids (say, clusters 1, 2, and 3). After allocating all my points, I’d then merge the clusters together into a single classification. This would then result in a cluster which has been manually extended to capture points that wouldn’t ordinarily be in it.
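To make the two options concrete, here is a small sketch comparing “classify with the full codebook, then relabel” against “average the centroids first, then classify”. The centroid coordinates and the test point are made-up toy values, and the function names are hypothetical; only the nearest-centroid assignment itself follows the description above.

```python
import math


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))


def classify_then_merge(points, centroids, merge_map):
    """Assign points using every original centroid, then relabel merged clusters."""
    labels = [nearest_centroid(p, centroids) for p in points]
    return [merge_map.get(k, k) for k in labels]


def merge_by_mean(centroids, group):
    """Alternative: replace the centroids in group by their mean.

    The merged centroid is placed at index 0, followed by the untouched ones.
    """
    dim = len(centroids[0])
    mean = tuple(sum(centroids[k][d] for k in group) / len(group) for d in range(dim))
    kept = [c for k, c in enumerate(centroids) if k not in group]
    return [mean] + kept


# Toy 3-element centroids: 0, 1, 2 are the same cloud type at different
# 'thicknesses'; 3 is a different type. All values are illustrative.
centroids = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (8.0, 3.0, 0.0)]
points = [(8.0, 0.5, 0.0)]  # sits right next to centroid 2

# Option A: classify against all centroids, then map 1 and 2 onto 0.
merged_labels = classify_then_merge(points, centroids, {1: 0, 2: 0})

# Option B: average centroids 0-2 first, then classify against the reduced codebook.
reduced = merge_by_mean(centroids, group={0, 1, 2})
mean_labels = [nearest_centroid(p, reduced) for p in points]
```

In this toy case option A keeps the point in the merged cluster (label 0), while option B hands it to the other cloud type (index 1 of the reduced codebook), which is exactly the failure mode the post worries about.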

Is this a problematic way of merging clusters, as opposed to say, taking the mean of the three cluster centroids? Or are there better ways of doing this?

I’ve drawn out a basic diagram attempting to illustrate what I mean – https://i.imgur.com/aGIW3f5.jpg Thanks a lot in advance

EDIT: I’ve looked at agglomerative clustering but I’m working off an (almost) nicely defined set of clusters, aside from this issue. I tried merging cluster centroids using agglomerative but unfortunately it agglomerated together two which I didn’t want merged. (PS. is this how you do agglom clustering? Can you just train the agglom algorithm by passing it the original k-means centroids?)

submitted by /u/The_Foetus

[R] BERT fine-tuning and Contrastive Learning

Written on September 16, 2019. Posted in Reddit MachineLearning.

Hi, I wrote up some research I’ve been doing over the summer, mainly on combining transformer networks with contrastive learning in order to learn better sentence representations.

https://jcaip.github.io/Summer-Research/

Please lmk if you have any questions or comments!

submitted by /u/kingcai

[D] Consistency of Impact for Selected Features Across Model Types

Written on September 16, 2019. Posted in Reddit MachineLearning.

I’ve been experimenting with DataRobot as of late and I’ve noticed that given a fixed set of features, the choice of n (e.g. 10) “most impactful” features differs significantly from one model type to another. Given that different algorithms may have differing sensitivity to input data type and specific effects present in the training data set, how would one go about shortlisting a set of features that would be consistently impactful across a variety of algorithms?

Current train of thought has me stuck at two options:

1. Rank the top n features by model impact for various models (say the top 5 best performing models) and select features that show the least change in ranking; or
2. Examine pairwise correlation with the target for all features and, if variables with low correlation are selected in the final model, sanity check for a nonlinear relationship using contour plots.
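The first option can be sketched roughly as follows. This is one possible reading of “least change in ranking”: it keeps features whose mean rank falls within the top n and orders them by the spread (population standard deviation) of their ranks across models. The model names, feature names, and impact scores are invented for illustration and do not come from DataRobot.

```python
from statistics import pstdev


def rank_features(importances):
    """Map feature -> rank (1 = most impactful) from a feature -> score dict."""
    ordered = sorted(importances, key=importances.get, reverse=True)
    return {name: pos + 1 for pos, name in enumerate(ordered)}


def stable_top_features(per_model_importances, n):
    """Shortlist features with mean rank <= n, most rank-stable first."""
    ranks = [rank_features(imp) for imp in per_model_importances.values()]
    summary = {}
    for feature in ranks[0]:
        feature_ranks = [r[feature] for r in ranks]
        mean_rank = sum(feature_ranks) / len(feature_ranks)
        summary[feature] = (pstdev(feature_ranks), mean_rank)
    shortlisted = [f for f in summary if summary[f][1] <= n]
    # Sort by rank spread first, then by mean rank as a tie-breaker.
    return sorted(shortlisted, key=lambda f: summary[f])


# Hypothetical impact scores for three model types over the same features.
per_model_importances = {
    "gradient_boosting": {"age": 0.40, "income": 0.30, "tenure": 0.20, "region": 0.10},
    "random_forest": {"age": 0.30, "income": 0.35, "tenure": 0.15, "region": 0.20},
    "logistic_regression": {"age": 0.50, "income": 0.15, "tenure": 0.30, "region": 0.05},
}
stable = stable_top_features(per_model_importances, n=2)
```

Working with ranks rather than raw scores sidesteps the fact that impact scores from different algorithms are not on a comparable scale.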

Your thoughts/comments are much appreciated.

P. S. Discussion is motivated by my model validation team that insists on me benchmarking every model against logistic regression because that’s the only one they understand. Theme of the month is “Why are the variables selected in your model significantly different from those used in our traditional LR model?”.

submitted by /u/furyincarnate

[D] Machine Learning In Field of Cyber Security

Written on September 16, 2019. Posted in Reddit MachineLearning.

Hi, I found some Information Security and Cyber Security companies that implement Machine Learning, but I couldn’t understand in which field or sector of InfoSec they would be applying it. What I could think of is detecting anomalies in logs. Are there any other ways you think it can be applied?

submitted by /u/shivammehta007

[R] Meta-Learning with Implicit Gradients

Written on September 16, 2019. Posted in Reddit MachineLearning.

A new meta-learning approach that doesn’t require backpropagating through the inner problem dynamics (it uses the implicit function theorem). The paper has been accepted at NeurIPS:
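As a rough sketch of how the implicit function theorem enters, assume the inner problem is a proximally regularized loss around the meta-parameters. The symbols below ($\theta$ for meta-parameters, $\phi$ for task parameters, $\hat{\mathcal{L}}$ for the inner training loss, $\mathcal{L}$ for the outer test loss, $\lambda$ for the regularization strength) are notation chosen here for illustration; see the paper for the exact formulation.

```latex
% Inner problem: adapt \phi while staying close to \theta
\phi^*(\theta) = \arg\min_{\phi} \; \hat{\mathcal{L}}(\phi) + \frac{\lambda}{2}\,\lVert \phi - \theta \rVert^2

% Stationarity condition at the inner optimum
\nabla_\phi \hat{\mathcal{L}}(\phi^*) + \lambda\,(\phi^* - \theta) = 0

% Differentiating the condition with respect to \theta (implicit function theorem)
\frac{d\phi^*}{d\theta} = \Bigl( I + \tfrac{1}{\lambda}\,\nabla^2_\phi \hat{\mathcal{L}}(\phi^*) \Bigr)^{-1}

% Meta-gradient of the outer loss
\frac{d\,\mathcal{L}(\phi^*(\theta))}{d\theta} = \Bigl(\frac{d\phi^*}{d\theta}\Bigr)^{\!\top} \nabla_\phi \mathcal{L}(\phi^*)
```

The Jacobian depends only on the inner solution $\phi^*$, not on the optimization path that produced it, which is why no backpropagation through the inner dynamics is needed.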

https://arxiv.org/abs/1909.04630

submitted by /u/rikkajounin

[D] If you know SQL, you probably understand Transformer, BERT and GPT

Written on September 16, 2019. Posted in Reddit MachineLearning.

Distilling the Transformer Architecture into its First Principle. Link

submitted by /u/transformer_ML

[D] Keras output of simple network dependent on batch size

Written on September 15, 2019. Posted in Reddit MachineLearning.

https://github.com/keras-team/keras/issues/13328

Depending on the other data in the batch, Keras will return different results. My test shows very minor differences, but they should all be identical. Additionally, I have datasets where this error grows considerably. I am working on creating a test I can release showing the issue.
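One plausible contributor to very small differences of this kind (an assumption, not a confirmed diagnosis of the linked issue) is that floating-point addition is not associative, so a numerical backend that groups a reduction differently for different batch shapes can return slightly different values. The pure-Python sketch below shows the effect in isolation; the two summation functions are hypothetical stand-ins for different reduction orders.

```python
def sum_left_to_right(values):
    """Accumulate strictly in order: ((v0 + v1) + v2) + ..."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_pairwise(values):
    """Split a non-empty list in halves and add the partial sums."""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return sum_pairwise(values[:mid]) + sum_pairwise(values[mid:])


values = [0.1, 0.2, 0.3]
sequential = sum_left_to_right(values)  # groups as (0.1 + 0.2) + 0.3
pairwise = sum_pairwise(values)         # groups as 0.1 + (0.2 + 0.3)
difference = abs(sequential - pairwise)
```

Here the two results differ only in the last bits, which matches the “very minor differences” case; it would not by itself explain an error that grows considerably.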

EDIT 1: fixed typo.

submitted by /u/idg101
