
Category: Reddit MachineLearning

[D] Tensorflow GPU memory management (TF_FORCE_GPU_ALLOW_GROWTH)

So this is more of an exploratory question. I am deploying models using a TF Serving Docker image with the flag TF_FORCE_GPU_ALLOW_GROWTH. I am deploying a small Fashion-MNIST model, a ResNet model (99 MB), and an Inception v3 model (92 MB). Because of the flag, the TF model server initially occupies only ~300 MB; then, on sequential requests to the models, usage increases as follows (according to nvidia-smi):

~300 MB | after Inception request: ~4306 MB | after ResNet request: ~8402 MB

If I send a request to ResNet first, the GPU usage does not increase at all (even when I add more models):

~300 MB | after ResNet request: ~7888 MB | after Inception request: ~7888 MB

Why does the GPU usage not increase after adding more models? Are earlier models flushed from memory when new models are loaded for inference? How can I accurately estimate how many similarly sized models can be loaded on one GPU-enabled machine, without trial and error? Is there a pattern to what fraction of GPU memory is progressively allocated?

Note: This is run on an EC2 instance with 11441 MiB of available GPU memory (Tesla K80). When I try to run the same on a machine with lower capacity (Quadro P2000, 5059 MiB), I face a similar situation where there is no increase in memory usage. However, I also get the following in the logs:

2019-12-11 05:10:54.727985: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.25GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-12-11 05:10:54.736610: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.26GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
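On the estimation question: TF Serving's BFC allocator grows in large chunks and caches freed memory, so nvidia-smi tends to show peak reservation rather than live usage, which is consistent with the number not moving when a second model is loaded. Any capacity estimate has to budget for weights plus peak activation/workspace memory, not on-disk size alone. A minimal back-of-the-envelope sketch (the overhead multiplier is a hypothetical value you would calibrate empirically per model family, not a TF constant):

```python
def models_per_gpu(gpu_mem_mb, model_sizes_mb, overhead_multiplier=40.0,
                   runtime_reserve_mb=300):
    """Rough estimate of how many models fit on one GPU, assuming each
    model's peak GPU footprint is about overhead_multiplier times its
    on-disk weight size. The multiplier is illustrative: e.g. the post's
    ResNet is 99 MB on disk but peaks near 7.6 GB, so its measured
    multiplier would be closer to 77x. Measure before trusting this."""
    budget = gpu_mem_mb - runtime_reserve_mb  # leave room for the runtime itself
    count = 0
    for size in model_sizes_mb:
        peak = size * overhead_multiplier
        if peak <= budget:
            count += 1
            budget -= peak
    return count
```

With the multiplier calibrated from a single measured run of each model, this replaces pure trial and error with one profiling pass per model family.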

submitted by /u/annoyed_panda

[D] Experiment Management in Kubernetes

I’ve yet to see a solution that containerizes my laptop environment and replicates my experiment in the cloud. Suppose I run a Jupyter experiment on my laptop. Now, with only a modicum of fiddling, I want to fire off 10 instances of the same experiment in AWS with different learning rates (and have my results neatly collated, perhaps in Git branches).

I acknowledge the efforts of MLflow, Kubeflow, Comet.ml, DeepKit, Guild AI, DVC, Sacred, speedrun, and Trains, but none of these, at first glance, addresses basic replication in the cloud. Moreover, several of them, like Kubeflow, are prohibitively complex for academics on a budget.
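For what it's worth, the fan-out half of this can be approximated with a short script that stamps out one Kubernetes Job manifest per learning rate from a common container image. Everything below is a hypothetical placeholder (the image name, the `train.py` entrypoint, the flag name); collating results back into branches is left to the job itself:

```python
def job_manifest(experiment, image, lr):
    """Render a minimal Kubernetes Job manifest (as a YAML string) that
    runs one copy of an experiment container with a given learning rate.
    'image' and the train.py entrypoint are placeholders for whatever a
    containerized laptop environment would actually provide."""
    # Job names must be DNS-safe, so dots in the learning rate become dashes.
    name = f"{experiment}-lr-{str(lr).replace('.', '-')}"
    return f"""apiVersion: batch/v1
kind: Job
metadata:
  name: {name}
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: trainer
        image: {image}
        command: ["python", "train.py", "--learning-rate", "{lr}"]
"""

# One manifest per learning rate; pipe each to `kubectl apply -f -` to launch.
manifests = [job_manifest("mnist-sweep", "registry.example.com/my-exp:latest", lr)
             for lr in (1e-4, 3e-4, 1e-3)]
```

This is only the launch side, of course; it does nothing about capturing the laptop environment, which is the genuinely unsolved part of the complaint.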

Requesting comments and/or sympathies.

submitted by /u/fragglestickcar0

[Discussion] My boss is convinced you can do an SVM using ASCII integer codes as features

Where do I even begin this rant?

I am a machine learning intern. We have a labelling problem in which we want to classify strings into category “Something” and category “Not Something”. These are not sentences, so we can’t use any standard NLP library. My boss is convinced we should turn these strings into ASCII codes, in order to make them “non-categorical”, with each feature being the ASCII code of the character at that position.

I tried to gently point out that even though they’re numbers, that doesn’t mean they’re quantitative data: is the average of B and D, C? (He answered yes to that, by the way.)

I told him that if the word ‘apple’ appears at the beginning of one string and at the end of another, the two won’t necessarily be put in the same cluster. He says the SVM will pick up the pattern: say features 0, 1, and 2 have the values 65, 112, and 112 in one row, and features 10, 11, and 12 have the same values in another row; the SVM will “detect the pattern” and put them closer together. “That’s not how support vector machines work.” “Oh really, how many have you done?”

I ran it anyway. It gives 98% accuracy, but only because, in this case, “Something” and “Not Something” tend to have radically different lengths. To show him it doesn’t detect patterns, I appended a bunch of zeros to the string, and it predictably failed to recognise the label. He says that doesn’t prove anything; it’s just a “vulnerability”.

I am at a loss here. Does anyone have a source I can share with him? Or an alternative way of solving my problem?
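One concrete demonstration that may help the argument: with per-position ASCII features, the same word at a different offset produces a distant feature vector, whereas a position-free representation such as a bag of character n-grams keeps the overlap. A small self-contained sketch (pure illustration with made-up strings, not a fix for the actual pipeline):

```python
def ascii_features(s, width):
    """Per-position ASCII codes, zero-padded to a fixed width -- the
    boss's proposed encoding."""
    codes = [ord(c) for c in s[:width]]
    return codes + [0] * (width - len(codes))

def char_ngrams(s, n=3):
    """Set of character n-grams -- a position-invariant alternative
    that standard kernels and linear models handle well."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}

a = "apple-----"
b = "-----apple"

# Euclidean distance between the ASCII vectors is large even though
# both strings contain exactly the same word...
ascii_dist = sum((x - y) ** 2
                 for x, y in zip(ascii_features(a, 10),
                                 ascii_features(b, 10))) ** 0.5

# ...while under the bag-of-n-grams view the word's n-grams are shared
# regardless of where 'apple' sits in the string.
shared = char_ngrams(a) & char_ngrams(b)
```

Feeding character n-gram counts into the SVM (e.g. via a character-level count vectorizer) would also sidestep the “average of B and D is C” problem entirely, since each n-gram becomes its own dimension.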

submitted by /u/babyscully

[N] Dragonfire v1.1.0: DeepPavlov SQuAD BERT Integration as ODQA

We are happy to announce Dragonfire v1.1.0, which introduces significant improvements to the Open-Domain Question Answering (ODQA) feature.

Over two weeks of active development and refactoring, we have greatly improved code quality project-wide. We have added many CI/CD pipelines using GitHub Actions; Dragonfire may have the most GitHub Actions usage among open-source projects on GitHub right now. Check out this 2-hour-long workflow or this automatic Debian package publisher, for example.

We really care about Dragonfire’s ODQA feature and plan to improve its performance further. So please check out the new release of Dragonfire and share our excitement. PRs are highly welcome…

submitted by /u/mertyildiran

[D] What do you think were the most important open source libraries for ML to come out this year?

2019 has been yet another prosperous year for the open-source world, adding several new toys to our collection such as Streamlit, Detectron2, Transformers, and Metaflow.

We recently compiled our own list of the top Python libraries of 2019, including many ML libraries (and other tools useful for ML), and would love to know your opinions.

Did we miss any big releases this year? Which ones do you think are more likely to have a lasting impact in the community?

submitted by /u/tryo_labs

[D] Things to predict from human skeleton/posture data

I’m in charge of designing a new lab for a course we teach at our university. Students will attend multiple sessions in which they will learn applied data science / machine learning. My question: what would be a fun thing to predict from videos of humans?

In the first session, they will collect video data using some RealSense RGB-D cameras we have in the lab; we will record people, but the details aren’t set yet. In the second session, they will work on labeling and cleaning the data they collected, prepare it, and work on loading it into different ML frameworks (the current plan is scikit-learn and TF). We will then aggregate all that data into a decently sized dataset with ground truth. In the third session, students should use that dataset, build a model, and complete a small assignment.
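If the cameras yield keypoints, the third-session model can stay very simple: a couple of hand-crafted posture features plus a basic classifier fits in one lab session. A hypothetical sketch (the keypoint names, the "openness" feature, and the class means are all invented for illustration):

```python
def posture_openness(keypoints):
    """One hand-crafted posture feature: shoulder width relative to hip
    width, computed from a dict of (x, y) keypoints. The keypoint names
    are illustrative, not a specific skeleton-tracking API."""
    lsx, _ = keypoints["left_shoulder"]
    rsx, _ = keypoints["right_shoulder"]
    lhx, _ = keypoints["left_hip"]
    rhx, _ = keypoints["right_hip"]
    return abs(rsx - lsx) / max(abs(rhx - lhx), 1e-6)

def nearest_mean_classify(feature, class_means):
    """Assign the label whose mean feature value is closest -- a tiny
    stand-in for whatever scikit-learn model the students train."""
    return min(class_means, key=lambda label: abs(class_means[label] - feature))

# A single made-up frame of keypoints for illustration.
example = {"left_shoulder": (0.0, 1.0), "right_shoulder": (2.0, 1.0),
           "left_hip": (0.5, 0.0), "right_hip": (1.5, 0.0)}
```

Starting from features like this also gives students an obvious upgrade path: replace the hand-crafted feature with raw keypoint sequences once the aggregated dataset is large enough.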

Over the years, we should accumulate a substantial number of examples (the course is growing exponentially at the moment), which would make the dataset big enough to train deep networks on. I think that would be a fun lab for a university student.

The current idea is to give people a personality test (Big Five) before recording the video, and then predict traits like extroversion from their posture.

I’m not 100% sold on this though. So if anybody has suggestions, I’d love to hear them!

submitted by /u/FirefoxMetzger

[R] horse breed misclassification

I heard from my professor that a recent research article about classifying images of horse breeds with a CNN was debunked. The reason was that the input images had their corresponding labels printed on the images themselves, so the classifier achieved its high accuracy by reading the labels instead of looking at the actual horses. Apparently nobody bothered to take a closer look at the input files. Unfortunately, I cannot find the article or related ones online. Would anybody know where to find it? Thanks in advance!

submitted by /u/ccwpog

[D] Open Exposition Problems in Machine Learning

In his paper “A Beginner’s Guide to Forcing,” Timothy Chow introduced the idea of an “open exposition problem”: a concept for which no fully clear explanation yet exists, so that producing one is itself an open problem. The online journal Distill is trying to tackle open exposition problems in machine learning, which I feel is really important.

So what do you guys think still isn’t explained well in ML? What topics confuse you or your students?

Timothy Chow’s paper: http://timothychow.net/forcing.pdf

Distill: https://distill.pub/

submitted by /u/turing_machines