[P] How Not to Fail Your Machine Learning Interview
How Not to Fail Your Machine Learning Interview — Getting an interview is easy, not mucking it up can be hard
submitted by /u/cdossman
[link] [comments]
Hi There! 🍪
2 years into PyTorch-NLP and another 6 months from the previous release, I am releasing PyTorch-NLP 0.5.0. Also, with your help, we’ll break 50,000 downloads! Thank you 🙂 I love helping the community because I myself benefit from the hard work of other open source contributors!
As always, the theme of PyTorch-NLP is to be small, extensible and intuitive much like PyTorch is! And the goal is to extend PyTorch with basic NLP utilities.
Here are the release notes highlights: 🐕
- `torchnlp.random` for finer-grained control of random state, building on PyTorch’s `fork_rng`. This module controls the random state of `torch`, `numpy` and `random`.

```python
import random
import numpy
import torch
from torchnlp.random import fork_rng

with fork_rng(seed=123):  # Ensure determinism
    print('Random:', random.randint(1, 2**31))
    print('Numpy:', numpy.random.randint(1, 2**31))
    print('Torch:', int(torch.randint(1, 2**31, (1,))))
```

- Refactored `torchnlp.samplers` enabling pipelining. For example:

```python
from torchnlp.samplers import DeterministicSampler
from torchnlp.samplers import BalancedSampler

data = ['a', 'b', 'c'] + ['c'] * 100
sampler = BalancedSampler(data, num_samples=3)
sampler = DeterministicSampler(sampler, random_seed=12)
print([data[i] for i in sampler])  # ['c', 'b', 'a']
```

- Added `torchnlp.samplers.balanced_sampler` for balanced sampling extending PyTorch's `WeightedRandomSampler`.
- Added `torchnlp.samplers.deterministic_sampler` for deterministic sampling based on `torchnlp.random`.
- Added `torchnlp.samplers.distributed_batch_sampler` for distributed batch sampling that's more extensible and less restrictive than PyTorch's version.
- Added `torchnlp.samplers.oom_batch_sampler` to sample large batches first in order to force an out-of-memory error earlier rather than later into training.
- Added `torchnlp.utils.get_total_parameters` to measure the number of parameters in a model.
- Added `torchnlp.utils.get_tensors` to measure the size of an object in number of tensor elements. This is useful for dynamic batch sizing and for `torchnlp.samplers.oom_batch_sampler`.

```python
import torch
from torchnlp.utils import get_tensors

random_object = tuple([{'t': torch.tensor([1, 2])}, torch.tensor([2, 3])])
tensors = get_tensors(random_object)
assert len(tensors) == 2
```
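To illustrate the idea behind forking random state, here is a minimal standard-library sketch. It is not how `torchnlp.random` is implemented: `fork_random_state` is a hypothetical helper that only handles Python's `random` module, whereas the release notes say the real module also covers `torch` and `numpy`.

```python
import random
from contextlib import contextmanager


@contextmanager
def fork_random_state(seed=None):
    """Hypothetical helper: run a block with its own random state, then restore the outer one."""
    saved_state = random.getstate()  # remember the caller's generator state
    try:
        if seed is not None:
            random.seed(seed)  # make the block deterministic
        yield
    finally:
        random.setstate(saved_state)  # hand the caller's stream back untouched


random.seed(0)
expected_next = random.random()  # what the outer stream produces right after seeding

random.seed(0)
with fork_random_state(seed=123):
    inside_first = random.randint(1, 2**31)
with fork_random_state(seed=123):
    inside_second = random.randint(1, 2**31)
after_fork = random.random()

print(inside_first == inside_second)  # same seed inside the fork, same draw
print(after_fork == expected_next)    # the outer stream is unaffected by the forks
```

The point of the pattern is that code inside the block can be seeded for reproducibility without disturbing whatever random stream the surrounding program was using.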
submitted by /u/Deepblue129
I often see ML practitioners, and even experts, pose the idea of the “test set” as the ultimate benchmark of a model’s performance. This is nonsense – and I’ll explain why.
Suppose you gather some data, label it, preprocess it, and compile it into a dataset. Now it’s time to split your data – train, validation, test; how will you do it?
The ultimate goal of an ML model is generalization; as such, said ‘rationale’ could be:
Thus, “split equally” should work best. Onto the problem: why do we use a test set at all? Because we “fit” the validation set with our hyperparameters, and we need to test on “never seen” data to avoid bias. Agreed: the test set does suppress that bias. But here’s its red line: variance.
Direct statistical theory: a sample is an approximation of the population distribution, with uncertainty in its mean, standard deviation, and other statistics. The more complex the problem, the greater the variation. So, is there a solution? Yes: K-Fold Cross-Validation. Per known theory, K-Fold CV can significantly slash the variance of the model performance estimate; the higher the “K”, the better. Without it, classification error can easily differ by 5-15%, if not 20-30%, from one split to another. When deciding what’s “SOTA”, every single percentage point can be a hard-fought battle, so a “mere 5%” is already astronomical.
One may counter-argue, “it’s fine if the test set is large enough”. Except it’s not fine: you get a “large enough” test set either by sacrificing train data, or by having a dataset large enough that you can make an even validation-test split. The former is undesirable for obvious reasons. In the latter case, unless you have a gargantuan dataset (extremely rare), your test samples are still subject to significant variance; merely swapping test and validation samples can turn the tables.
As a final punchline, note that the random seed can also substantially impact the final outcome, further amplifying variance. Consequence: you don’t know whether you did well because Dropout(0.5) works better than Dropout(0.2) or because the dice rolled nicely. K-Fold CV will also reduce seed variance as a side effect, but ideally (though often prohibitively) you’d do “K seeds”.
Verdict: the test set isn’t good for testing. Instead, use K-Fold CV, which both better estimates generalization performance by reducing variance and allows using more train data.
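The K-Fold procedure argued for above can be sketched in plain Python. Everything here is an illustrative assumption rather than part of the post: the data is synthetic, and the "model" is a toy one-dimensional threshold classifier chosen only to keep the example dependency-free. The repeated seeds mirror the “K seeds” idea.

```python
import random
import statistics


def k_fold_indices(n_samples, k, seed):
    """Shuffle indices once, then cut them into k nearly equal, disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_threshold(xs, ys):
    """Toy 1-D classifier: threshold halfway between the two class means."""
    mean0 = statistics.mean(x for x, y in zip(xs, ys) if y == 0)
    mean1 = statistics.mean(x for x, y in zip(xs, ys) if y == 1)
    return (mean0 + mean1) / 2


def accuracy(threshold, xs, ys):
    """Fraction of samples where 'x above threshold' agrees with the label."""
    return sum((x > threshold) == bool(y) for x, y in zip(xs, ys)) / len(xs)


def cross_validate(xs, ys, k=5, seed=0):
    """Return one held-out accuracy per fold; every sample is tested exactly once."""
    scores = []
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        threshold = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
        scores.append(accuracy(threshold, [xs[i] for i in fold], [ys[i] for i in fold]))
    return scores


# Synthetic, overlapping two-class data (purely illustrative).
rng = random.Random(42)
ys = [i % 2 for i in range(200)]
xs = [rng.gauss(1.0 if y else 0.0, 1.0) for y in ys]

for seed in range(3):  # repeat the whole procedure with several shuffling seeds
    scores = cross_validate(xs, ys, k=5, seed=seed)
    print(f"seed={seed} mean={statistics.mean(scores):.3f} stdev={statistics.stdev(scores):.3f}")
```

The per-fold standard deviation printed for each seed gives a rough sense of how much a single held-out split could have moved the reported number.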
Though I am knowledgeable on the topic, I’m not an “expert” – and even experts disagree. Thus, counterarguments welcome.
submitted by /u/OverLordGoldDragon
I am working on a problem where I have a sequence of events. Every event generates a set of tokens (some of the tokens are shared between events, but not all), and the task is to categorize the behavior that generated this set of events.
Let me give you a simple example of what the input looks like.
| event_type | order | value_type_1 | value_1 | value_type_2 | value_2 |
|---|---|---|---|---|---|
| E1 | 1 | alpha 1 | 24 | alpha 2 | 33 |
| E2 | 2 | beta | 120 | | |
| E1 | 3 | alpha 1 | 234 | alpha 2 | 56 |
| E3 | 4 | theta | 150 | | |
| E4 | 5 | | | | |
Notice, for example, that the token “theta” doesn’t exist in event_type E2; it only exists in some event types.
If I want to do feature engineering in this case, what is the best way to vectorize my data? If I turn each token into its own column like this, I end up with very sparse features.
| event_type | order | alpha 1 | alpha 2 | beta | theta |
|---|---|---|---|---|---|
| E1 | 1 | 24 | 33 | | |
| E2 | 2 | | | 120 | |
| E1 | 3 | 234 | 56 | | |
| E3 | 4 | | | | 150 |
| E4 | 5 | | | | |
If I construct my features this way, they will be very sparse, and it doesn’t make sense to treat the empty cells as missing data (because the data doesn’t exist in the first place).
I don’t want to apply a data imputation method such as filling in the last value (see the example below, where the imputed numbers are shown in bold). The reason is that some event types are very frequent and some are not.
| event_type | order | alpha 1 | alpha 2 | beta | theta |
|---|---|---|---|---|---|
| E1 | 1 | 24 | 33 | **0** | **0** |
| E2 | 2 | **24** | **33** | 120 | **0** |
| E1 | 3 | 234 | 56 | **120** | **0** |
| E3 | 4 | **234** | **56** | **120** | 150 |
| E4 | 5 | **234** | **56** | **120** | **150** |
If you were in my shoes, how would you treat this problem? Ideas and references are welcome.
In case you are wondering what I want to do: I want to categorize the behavior that generated this set of events. I can experiment with any method once I get the feature engineering right (think of clustering as an example).
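One possible direction, sketched here under assumptions and not as a definitive answer: since the goal is to categorize a whole sequence, each sequence can be summarized into one fixed-length vector instead of one sparse row per event. The feature names and the choice of aggregates (event-type counts, per-token observation counts and means) are illustrative, as are the `EVENT_TYPES` and `TOKENS` vocabularies taken from the example tables.

```python
from collections import Counter, defaultdict

# The example sequence from the tables above, in long format:
# (event_type, order, {token: value}).
events = [
    ("E1", 1, {"alpha 1": 24, "alpha 2": 33}),
    ("E2", 2, {"beta": 120}),
    ("E1", 3, {"alpha 1": 234, "alpha 2": 56}),
    ("E3", 4, {"theta": 150}),
    ("E4", 5, {}),
]

EVENT_TYPES = ["E1", "E2", "E3", "E4"]  # assumed fixed vocabulary
TOKENS = ["alpha 1", "alpha 2", "beta", "theta"]  # assumed fixed vocabulary


def vectorize_sequence(events):
    """Summarize one event sequence as a fixed-length feature dictionary."""
    type_counts = Counter(event_type for event_type, _, _ in events)
    values = defaultdict(list)
    for _, _, tokens in events:
        for token, value in tokens.items():
            values[token].append(value)

    features = {f"count[{t}]": type_counts.get(t, 0) for t in EVENT_TYPES}
    for token in TOKENS:
        observed = values.get(token, [])
        # The observation count doubles as a presence flag, so a token that
        # never occurred is marked explicitly instead of being imputed.
        features[f"n[{token}]"] = len(observed)
        # 0.0 is only a placeholder when n == 0; the flag above disambiguates it.
        features[f"mean[{token}]"] = sum(observed) / len(observed) if observed else 0.0
    return features


features = vectorize_sequence(events)
for name, value in features.items():
    print(name, value)
```

This trades away ordering information for density; if order matters, the same long format could instead be fed to a sequence model, which is outside the scope of this sketch.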
submitted by /u/__Julia
This is a place to share machine learning research papers, journals, and articles that you’re reading this week. If it relates to what you’re researching, by all means elaborate and give us your insight, otherwise it could just be an interesting paper you’ve read.
Please try to provide some insight from your own understanding, and please don’t post things that are already in the wiki.
Preferably you should link the arxiv page (not the PDF, you can easily access the PDF from the summary page but not the other way around) or any other pertinent links.
Previous weeks :
Most upvoted papers two weeks ago:
/u/ecart33: https://arxiv.org/abs/1906.00817v1
Besides that, there are no rules, have fun.
submitted by /u/ML_WAYR_bot
MLPerf, a project to benchmark machine learning hardware, is publishing their first round of Inference results this Wednesday.
Take some time to review the precise challenge they’re putting the hardware to: https://mlperf.org/inference-overview/ and the general rules for Inference submissions: mlperf/inference_policies: inference_rules.adoc
I’m excited to see some of the low-power chip results.
Source for date: #single-submission-round-schedule. Submission for this cycle was October 11th, so Week 1 Monday is October 14th and Week 4 Wednesday (publication day) is November 6th, 10 AM US/Pacific time.
submitted by /u/riking27
David Silver hinted that DeepMind is done with StarCraft in a BBC news article, saying “the lab may rest now” and that they have “completed the Starcraft challenge”.
I thought this was a little disappointing since the skill level AlphaStar reached on ladder was not enough to beat professional players. I think we all wanted a real nice showdown between the human champion and the robot, right? That would’ve been pretty cool.
The Nature paper had a nice graph depicting AlphaStar’s MMR, which is basically Blizzard’s version of an Elo rating. The Protoss agent had reached an MMR of ~6200 and the aggregate of all three races was 6030 iirc. The graph also had the MMRs of AlphaStar’s opponents and information on whether the agent won or lost.
Basically AlphaStar had lost all but 2 games against players who had higher than 6200 MMR. On ladder, it could not beat the professionals.
The agent from January was estimated to have been over 7000 MMR. I figured it’d be nice to estimate how well this newest agent would have fared against MaNa. Right now, MaNa’s MMR is ~6700.
So I looked at the EU ladder, found someone with an MMR of ~6200, popped him and MaNa into Aligulac (an SC2 database) and let it estimate some odds. MaNa had a ~75% chance of winning a Best of 5, and his 6200 MMR opponent had a less than 1% chance of beating MaNa 5-0.
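For intuition on how a per-game win probability turns into series odds, here is a small sketch. It is not Aligulac's model and does not reproduce its numbers: it assumes games are independent with a fixed per-game probability, and the probabilities fed in are illustrative values only.

```python
from math import comb


def series_win_probability(p_game, wins_needed=3):
    """Probability of taking a first-to-`wins_needed` series, assuming independent games."""
    q = 1.0 - p_game
    # Sum over how many games are dropped before the deciding win.
    return sum(
        comb(wins_needed - 1 + losses, losses) * p_game**wins_needed * q**losses
        for losses in range(wins_needed)
    )


def sweep_probability(p_game, wins_needed=3):
    """Probability of winning the series without dropping a single game."""
    return p_game**wins_needed


for p_game in (0.5, 0.6, 0.7):  # illustrative per-game odds, not Aligulac output
    favourite = series_win_probability(p_game)
    underdog_sweep = sweep_probability(1.0 - p_game)
    print(p_game, round(favourite, 3), round(underdog_sweep, 3))
```

A longer series amplifies a per-game edge, which is why series odds look more lopsided than single-game odds.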
At this point I became convinced that DeepMind was throwing in the towel on SC2 because the cost of further improving AlphaStar was too high to be justified by the publicity they were getting from the project. The team looked to be moving on to different things and the showmatch vs the world champion had been cancelled.
But then something absolutely baffling happened which I don’t think anyone saw coming.
BlizzCon was this weekend. With little to no fanfare, DeepMind had brought AlphaStar with them and let BlizzCon visitors play against it. Serral, one of the best players in the world, who had just finished top 4 in the biggest tournament of the year, wandered over to the arcade and played a few games against the bot. Serral’s MMR is over 7000.
He lost 0-3 to the Protoss agent. These games were not televised. All we have is some blurry smartphone footage. https://mobile.twitter.com/LiquidTLO/status/1190779241564000256
I don’t get it. If AlphaStar was this strong, why didn’t DeepMind let it play more on ladder and get a higher ranking? Why didn’t they organize a showmatch or something? They dropped the ball pretty hard on this one. This is so confusing to me.
First they beat two professional players but were hit with a huge, imo warranted, backlash due to the APM controversy.
Then they produced agents under more proper mechanical limitations and the agents turned out to be much weaker than the previous version.
Finally, they beat the best player in the world, seemingly accidentally, while no one was looking.
From a PR standpoint, could this have gone any worse for DeepMind?
submitted by /u/SoulDrivenOlives
So we had a project review, and our teacher asked us on what basis anchors are chosen in YOLO, Faster R-CNN and the like.
I have no idea what criterion the choice is based on, so if anyone has something to say on this, please do. I would appreciate it!
submitted by /u/kirasama16997
Hello everyone,
I have been trying to reproduce the results of a SOTA object detection paper. I reimplemented their method as described in the paper and trained on the same dataset, but I was not able to match their results on the datasets they use for evaluation, no matter what I tried.
Then I also studied their referenced papers and realised that many of them use a train-test split strategy for evaluating their models. This means that they use a part of the evaluation dataset for finetuning their already trained model and then evaluate it on the testing part of the same dataset. In the case of these papers, this fact was explicitly mentioned. I think that this also happened in the paper I tried to reproduce. However, they don’t mention it.
My question for discussion is, what do you think about this strategy? Is finetuning on part of the evaluation dataset a way to go? What about generalisation on totally unknown data? In my opinion it is ok if explicitly mentioned. Totally uncool in the opposite case, though.
EDIT: Just a clarification to be on the same page. What I mean by train, test and validation sets is a big dataset which is split in those three subsets.
By evaluation dataset I mean a benchmark dataset which researchers use to report their results on a specific task. So, finetuning on part of the evaluation dataset means retraining on a part of the benchmark dataset and then reporting the results on the rest of it, which was not seen during finetuning.
submitted by /u/roset_ta