Author: torontoai

[P] How Not to Fail Your Machine Learning Interview

Written on November 3, 2019. Posted in Reddit MachineLearning.

How Not to Fail Your Machine Learning Interview — Getting an interview is easy, not mucking it up can be hard

https://medium.com/ai%C2%B3-theory-practice-business/how-not-to-fail-your-machine-learning-interview-9545a67b35bc

submitted by /u/cdossman

[N] PyTorch-NLP 0.5.0 Released! Here's to contributing back to open source, hurrah! 🤗

Written on November 3, 2019. Posted in Reddit MachineLearning.

Hi There! 🍪

2 years into PyTorch-NLP and another 6 months from the previous release, I am releasing PyTorch-NLP 0.5.0. Also, with your help, we’ll break 50,000 downloads! Thank you 🙂 I love helping the community because I myself benefit from the hard work of other open source contributors!

As always, the theme of PyTorch-NLP is to be small, extensible and intuitive much like PyTorch is! And the goal is to extend PyTorch with basic NLP utilities.

Here are the release notes highlights: 🐕

Python 3.5 Support, Sampler Pipelining, Finer Control of Random State

Major Updates

- Updated my README emoji game to be more ambiguous while maintaining a fun and heartwarming vibe.
- Support for Python 3.5.
- Extensive rewrite of README.md to focus on new users and building an NLP pipeline.
- Support for PyTorch 1.2.
- Added `torchnlp.random` for finer-grained control of random state, building on PyTorch's `fork_rng`. This module controls the random state of `torch`, `numpy` and `random`.

```python
import random
import numpy
import torch

from torchnlp.random import fork_rng

with fork_rng(seed=123):  # Ensure determinism
    print('Random:', random.randint(1, 2**31))
    print('Numpy:', numpy.random.randint(1, 2**31))
    print('Torch:', int(torch.randint(1, 2**31, (1,))))
```

- Refactored `torchnlp.samplers` enabling pipelining. For example:

```python
from torchnlp.samplers import DeterministicSampler
from torchnlp.samplers import BalancedSampler

data = ['a', 'b', 'c'] + ['c'] * 100
sampler = BalancedSampler(data, num_samples=3)
sampler = DeterministicSampler(sampler, random_seed=12)
print([data[i] for i in sampler])  # ['c', 'b', 'a']
```

- Added `torchnlp.samplers.balanced_sampler` for balanced sampling extending PyTorch's `WeightedRandomSampler`.
- Added `torchnlp.samplers.deterministic_sampler` for deterministic sampling based on `torchnlp.random`.
- Added `torchnlp.samplers.distributed_batch_sampler` for distributed batch sampling that's more extensible and less restrictive than PyTorch's version.
- Added `torchnlp.samplers.oom_batch_sampler` to sample large batches first in order to force an out-of-memory error earlier rather than later into training.
- Added `torchnlp.utils.get_total_parameters` to measure the number of parameters in a model.
- Added `torchnlp.utils.get_tensors` to measure the size of an object in number of tensor elements. This is useful for dynamic batch sizing and for `torchnlp.samplers.oom_batch_sampler`.

```python
import torch

from torchnlp.utils import get_tensors

# A nested object holding two tensors: one inside a dict, one bare.
random_object = tuple([{'t': torch.tensor([1, 2])}, torch.tensor([2, 3])])
tensors = get_tensors(random_object)
assert len(tensors) == 2
```

submitted by /u/Deepblue129

[D] The “test set” is nonsense

Written on November 2, 2019. Posted in Reddit MachineLearning.

I often see ML practitioners, and even experts, pose the idea of the “test set” as the ultimate benchmark of a model’s performance. This is nonsense – and I’ll explain why.

Suppose you gather some data, label it, preprocess it, and compile it into a dataset. Now it’s time to split your data – train, validation, test; how will you do it?

Random selection; may work well if the dataset is large enough
Engineered selection – assign samples to each set according to some rationale

The ultimate goal of an ML model is generalization; as such, said ‘rationale’ could be:

The best test set is comprised of the most “realistic” or “difficult” samples. Problem: model performance is harmed by artificially biasing the train set to exclude realistic/difficult samples.
The best split gives each set data of the same quality. Justification: “poorer” quality usually means (a) noise; (b) low complexity (“too obvious”). Whatever the description, if you test the model on an information landscape (i.e. a probability distribution) that it wasn’t trained on, the model may perform poorly simply because it “learned” little that’s relevant to the test set.

Thus, “split equally” should work best. Onto the problem: why do we use a test set at all? Because – we “fit” the validation set with our hyperparameters, and we need to test on “never seen” data to avoid bias. Indeed, agreed – the test set does suppress said bias. But here’s its red line: variance.

Direct statistical theory: a sample is an approximation of the population distribution, with an uncertain mean, standard deviation, and other statistics. The more complex the problem, the greater the variation. So, is there a solution? Yes: K-Fold Cross-Validation. Per known theory, K-Fold CV can significantly slash the variance of the model performance estimate – the higher the “K”, the better. Without it, classification error can easily differ by 5-15%, if not 20-30%. When deciding what’s “SOTA”, every single percentage point can be a battle hard-fought – so a “mere 5%” is already astronomical.

One may counter-argue, “it’s fine if the test set is large enough”. Except it’s not fine; you get a “large enough” test set either by sacrificing train data, or by having a dataset large enough that you can make an even validation-test split. The former is undesirable for obvious reasons – and in the latter, unless you have a gargantuan dataset (extremely rare), your test samples are still subject to significant-enough variance; merely swapping test & validation samples can flip the tables.
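To put a rough number on how test-set size affects this variance, here is a minimal sketch. It assumes each test sample is an independent draw that the model classifies correctly with a fixed probability (a binomial model, which is a simplification and not something stated in the post); the function name `accuracy_standard_error` is made up for illustration.

```python
import math

def accuracy_standard_error(accuracy, n_test):
    """Binomial standard error of an accuracy estimate from n_test independent samples."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_test)

# Same underlying accuracy, different test-set sizes.
for n_test in (100, 1000, 10000):
    se = accuracy_standard_error(0.9, n_test)
    # 1.96 * SE is the usual normal-approximation half-width of a 95% interval.
    print(f"n_test={n_test}: SE={se:.4f}, 95% half-width={1.96 * se:.4f}")
```

Under these assumptions, a model with 90% true accuracy has a standard error of about 3 percentage points on 100 test samples and about 1 point on 1,000, so the uncertainty shrinks only with the square root of the test-set size.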

As a final punchline, note that the random seed can also substantially impact the final outcome, further amplifying variance. Consequence: you don’t know whether you did well because Dropout(0.5) works better than Dropout(0.2) or because the dice rolled nicely. K-Fold CV will also reduce seed variance as a side-effect, but ideally (though often prohibitively) you’d do “K seeds”.

Verdict: test set isn’t good for testing. Instead, use K-fold CV, which both better estimates generalization performance by reducing variance, and allows using more train data.
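For readers who want the mechanics, K-fold CV can be sketched in a few lines of plain Python. This is an illustrative sketch, not any library's API: `k_fold_indices`, `cross_validate` and the `fit_and_score` callback (which should train on the first index list and return a score on the second) are hypothetical names introduced here.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs; every sample lands in exactly one validation fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

def cross_validate(n_samples, k, fit_and_score, seed=0):
    """Average a user-supplied fit_and_score(train_idx, val_idx) over the k folds."""
    scores = [fit_and_score(train_idx, val_idx)
              for train_idx, val_idx in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores), scores
```

Changing `seed` reshuffles the folds, so repeating the loop over several seeds is one way to approximate the “K seeds” idea mentioned above, at K times the cost.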

Though I am knowledgeable on the topic, I’m not an “expert” – and even experts disagree. Thus, counterarguments welcome.

submitted by /u/OverLordGoldDragon

[D] How do you handle sparse features?

Written on November 2, 2019. Posted in Reddit MachineLearning.

I am working on a problem where I have a sequence of events. Every event generates a set of tokens (some tokens are shared between events, but not all), and the task is to categorize the behavior that generated this set of events.

Let me give you a simple example to have an understanding on the input.

event_type	order	value_type_1	value_1	value_type_2	value_2
E1	1	alpha 1	24	alpha 2	33
E2	2	beta	120
E1	3	alpha 1	234	alpha 2	56
E3	4	theta	150
E4	5

Notice, for example, that the token “theta” doesn’t exist in event type E2; it only exists in some event types.

If I want to do feature engineering in this case, what is the best way to vectorize my data? If I take the tokens and lay them out as columns like this, I end up with very sparse features.

event_type	order	alpha 1	alpha 2	beta	theta
E1	1	24	33
E2	2			120
E1	3	234	56
E3	4				150
E4	5

If I construct my features this way, they will be very sparse, and it doesn’t make sense to treat the empty cells as missing data (because the data doesn’t exist in the first place).

I don’t want to apply a data imputation method such as filling in the last seen value (see the example below, where the carried-over values are filled in). The reason is that some event types are very frequent, and some are not.

event_type	order	alpha 1	alpha 2	beta	theta
E1	1	24	33	0	0
E2	2	24	33	120	0
E1	3	234	56	120	0
E3	4	234	56	120	150
E4	5	234	56	120	150
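To make the two layouts above concrete, here is a small sketch that rebuilds them from the raw events in the example. The helper names `to_wide` and `forward_fill` are made up for illustration, and the choice of 0 as the starting value simply mirrors the last table.

```python
# Raw events from the example: (event_type, order, {token: value}).
events = [
    ("E1", 1, {"alpha 1": 24, "alpha 2": 33}),
    ("E2", 2, {"beta": 120}),
    ("E1", 3, {"alpha 1": 234, "alpha 2": 56}),
    ("E3", 4, {"theta": 150}),
    ("E4", 5, {}),
]
tokens = ["alpha 1", "alpha 2", "beta", "theta"]

def to_wide(events, tokens):
    """One row per event; None marks a token that the event did not emit."""
    return [[etype, order] + [values.get(t) for t in tokens]
            for etype, order, values in events]

def forward_fill(rows, n_key_cols=2, initial=0):
    """Carry the last seen value of each token forward, starting from `initial`."""
    state = [initial] * (len(rows[0]) - n_key_cols)
    filled = []
    for row in rows:
        # Keep the new value where one exists, otherwise reuse the previous one.
        state = [v if v is not None else s for v, s in zip(row[n_key_cols:], state)]
        filled.append(row[:n_key_cols] + state)
    return filled

wide = to_wide(events, tokens)
for row in forward_fill(wide):
    print(row)
```

Keeping `None` in the wide layout (rather than 0) preserves the distinction between "token not emitted by this event type" and "token emitted with value 0", which is exactly the information the forward-filled table throws away.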

If you were in my shoes, how would you treat this problem? Ideas and references are welcome.

If you are wondering what I want to do: I want to categorize the behavior that generated this set of events. I can experiment with any method once I get the feature engineering right (you can think of clustering as an example).

submitted by /u/__Julia

[D] Machine Learning – WAYR (What Are You Reading) – Week 74

Written on November 2, 2019. Posted in Reddit MachineLearning.

This is a place to share machine learning research papers, journals, and articles that you’re reading this week. If it relates to what you’re researching, by all means elaborate and give us your insight, otherwise it could just be an interesting paper you’ve read.

Please try to provide some insight from your understanding, and please don’t post things which are already present in the wiki.

Preferably you should link the arxiv page (not the PDF, you can easily access the PDF from the summary page but not the other way around) or any other pertinent links.

Previous weeks :

1-10	11-20	21-30	31-40	41-50	51-60	61-70	71-80
Week 1	Week 11	Week 21	Week 31	Week 41	Week 51	Week 61	Week 71
Week 2	Week 12	Week 22	Week 32	Week 42	Week 52	Week 62	Week 72
Week 3	Week 13	Week 23	Week 33	Week 43	Week 53	Week 63	Week 73
Week 4	Week 14	Week 24	Week 34	Week 44	Week 54	Week 64
Week 5	Week 15	Week 25	Week 35	Week 45	Week 55	Week 65
Week 6	Week 16	Week 26	Week 36	Week 46	Week 56	Week 66
Week 7	Week 17	Week 27	Week 37	Week 47	Week 57	Week 67
Week 8	Week 18	Week 28	Week 38	Week 48	Week 58	Week 68
Week 9	Week 19	Week 29	Week 39	Week 49	Week 59	Week 69
Week 10	Week 20	Week 30	Week 40	Week 50	Week 60	Week 70

Most upvoted papers two weeks ago:

/u/ecart33: https://arxiv.org/abs/1906.00817v1

Besides that, there are no rules, have fun.

submitted by /u/ML_WAYR_bot

[D] Heads up – MLPerf Inference results publishing Wednesday

Written on November 2, 2019. Posted in Reddit MachineLearning.

MLPerf, a project to benchmark machine learning hardware, is publishing their first round of Inference results this Wednesday.

Take some time to review the precise challenge they’re putting the hardware to: https://mlperf.org/inference-overview/ and the general rules for Inference submissions: mlperf/inference_policies: inference_rules.adoc

I’m excited to see some of the low-power chip results.

Source for date: #single-submission-round-schedule – The submission deadline for this cycle was October 11th, so Week 1 Monday is October 14th, and Week 4 Wednesday (publication day) is November 6th, 10AM US/Pacific time.
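The date arithmetic can be double-checked with a couple of lines of Python. This sketch only takes the Week 1 Monday date from the post; the variable names are illustrative.

```python
from datetime import date, timedelta

week1_monday = date(2019, 10, 14)
# Week 4 starts three weeks after Week 1; Wednesday is two days after Monday.
publication_day = week1_monday + timedelta(weeks=3, days=2)

print(publication_day.isoformat())  # 2019-11-06
print(publication_day.weekday())    # 2, i.e. Wednesday (Monday is 0)
```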

submitted by /u/riking27

[D] DeepMind’s PR regarding Alphastar is unbelievably baffling.

Written on November 2, 2019. Posted in Reddit MachineLearning.

David Silver hinted that DeepMind is done with Starcraft in a BBC news article saying “the lab may rest now” and that they have “completed the Starcraft challenge”.

I thought this was a little disappointing since the skill level Alphastar reached on ladder was not enough to beat professional players. I think we all wanted a real nice showdown between the human champion and the robot, right? That would have been pretty cool.

The Nature paper had a nice graph depicting Alphastar’s MMR, which is basically Blizzard’s version of an Elo rating. The Protoss agent had reached an MMR of ~6200 and the aggregate of all three races was 6030 iirc. The graph also had the MMRs of Alphastar’s opponents and information on whether the agent won or lost.

Basically Alphastar had lost all but 2 games against players who had higher than 6200 MMR. On ladder, it could not beat the professionals.

The agent from January was estimated to have been over 7000 MMR. I figured it’d be nice to estimate how well this newest agent would have fared against MaNa. Right now, MaNa’s MMR is ~6700.

So I looked at the EU ladder, found someone with an MMR of ~6200, popped him and MaNa into Aligulac (sc2 database) and let it estimate some odds. MaNa had ~75% chance of winning a Best of 5, and his 6200 MMR opponent had less than 1% chance of beating MaNa 5-0.
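For anyone who wants to play with series odds themselves, here is a rough sketch of how a per-game win probability turns into best-of-N odds. It does not reproduce Aligulac's rating model: the per-game probability is just an input you supply, games are assumed independent, and the function names are made up for illustration.

```python
from math import comb

def best_of_n_win_probability(p_game, n_games):
    """Chance of winning a best-of-n series (n odd) with a fixed per-game win chance."""
    wins_needed = n_games // 2 + 1
    # Sum over how many games are lost (0 .. wins_needed - 1) before the deciding win.
    return sum(
        comb(wins_needed - 1 + losses, losses)
        * p_game ** wins_needed
        * (1 - p_game) ** losses
        for losses in range(wins_needed)
    )

def sweep_probability(p_game, n_games):
    """Chance of winning the series without dropping a single game."""
    return p_game ** (n_games // 2 + 1)

print(best_of_n_win_probability(0.5, 5))  # 0.5 for evenly matched players
print(sweep_probability(0.5, 5))          # 0.125
```

A longer series amplifies any per-game edge, which is why series odds can look lopsided even when individual games are fairly close.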

At this point I became convinced that DeepMind was throwing in the towel on sc2 because the cost of further improving Alphastar was too high to justify the publicity they were getting from the project. The team looked to be moving on to different things and the showmatch vs the world champion had been cancelled.

But then something absolutely baffling happened which I don’t think anyone saw coming.

Blizzcon was this weekend. With little to no fanfare, DeepMind had brought Alphastar with them and let Blizzcon visitors play against it. Serral, one of the best players in the world, who had just finished top 4 in the biggest tournament of the year, wandered over to the arcade and played a few games against the bot. Serral’s MMR is over 7000.

He lost 0-3 to the Protoss agent. These games were not televised. All we have is some blurry smartphone footage. https://mobile.twitter.com/LiquidTLO/status/1190779241564000256

I don’t get it. If Alphastar was this strong why didn’t DeepMind let it play more on ladder and get a higher ranking? Why didn’t they organize a showmatch or something? They dropped the ball pretty hard on this one. This is so confusing to me.

First they beat two professional players but were hit with a huge, imo warranted, backlash due to the APM controversy.

Then they produced agents under more proper mechanical limitations and the agents turned out to be much weaker than the previous version.

Finally, they beat the best player in the world, seemingly accidentally while no one was looking.

From a PR standpoint, could this have gone any worse for DeepMind?

submitted by /u/SoulDrivenOlives

[Discussion] On what basis are anchors chosen in YOLO algorithm?

Written on November 2, 2019. Posted in Reddit MachineLearning.

So we had a project review, and our teacher asked us on what basis anchors are chosen in YOLO, Faster R-CNN and the like.

I have no idea what criterion they are based on, so if anyone has something to say on this, please do. I would appreciate it!

submitted by /u/kirasama16997

[D] Is finetuning on part of the evaluation dataset acceptable for publishing machine learning papers?

Written on November 2, 2019. Posted in Reddit MachineLearning.

Hello everyone,

I have been trying to reproduce the results of a SOTA object detection paper. I reimplemented their method and trained on the same dataset, following the paper, but I was not able to match their results on the datasets they use for evaluation, no matter what I tried.

Then I also studied their referenced papers and realised that many of them use a train-test split strategy for evaluating their models. This means that they use a part of the evaluation dataset for finetuning their already trained model and then evaluate it on the testing part of the same dataset. In the case of these papers, this fact was explicitly mentioned. I think that this also happened in the paper I tried to reproduce. However, they don’t mention it.

My question for discussion is, what do you think about this strategy? Is finetuning on part of the evaluation dataset a way to go? What about generalisation on totally unknown data? In my opinion it is ok if explicitly mentioned. Totally uncool in the opposite case, though.

EDIT: Just a clarification to be on the same page. What I mean by train, test and validation sets is a big dataset which is split in those three subsets.

By evaluation dataset I mean a benchmark dataset which researchers use to report their results on a specific task. So, finetuning on part of the evaluation dataset means retraining on one part of the benchmark dataset and then reporting results on the rest of it, which was not seen during finetuning.

submitted by /u/roset_ta

parameter sharing decoder pair for auto composing

Written on November 1, 2019. Posted in Reddit MachineLearning.

submitted by /u/edisonzhao
