Category: Reddit MachineLearning

[N] Python Creator Guido van Rossum Quits Dropbox and Announces His Retirement

Written on November 6, 2019. Posted in Reddit MachineLearning.

https://preview.redd.it/oixo7lmzh6x31.png?width=793&format=png&auto=webp&s=148db18f71aadd6e77d6d01a1d24de28462c1c3b

Guido van Rossum, the creator of Python, one of the world's most popular programming languages, is set to start the second half of his life: he announced his retirement this week. Guido is stepping down from his current role at cloud file-storage firm Dropbox and heading into retirement.

Guido joined Dropbox in December 2012 and spent the last six and a half years there. Hiring Guido made sense because so much of Dropbox's functionality was built on Python.

Dropbox has about four million lines of Python code, and Python is the most heavily used language for its back-end services, desktop app, and other major operations. When Guido van Rossum started in 2012, Dropbox's server and desktop client were written “almost exclusively in Python”.


submitted by /u/navin49

[R] Announcing the release of StellarGraph version 0.8.1, an open-source Python Machine Learning Library for graphs

Written on November 6, 2019. Posted in Reddit MachineLearning.

StellarGraph is an open-source library implementing a variety of state-of-the-art graph machine learning algorithms. The project is delivered as part of CSIRO’s Data61.

We are happy to announce the 0.8.1 release of the library, which extends StellarGraph's capabilities with new algorithms and demos, improves interpretability through saliency maps for Graph Attention Networks (GAT), and further simplifies graph machine learning workflows through standardised model APIs and arguments.

In this release, we've fixed some bugs from the previous release and introduced new features and enhancements, including:

New directed GraphSAGE algorithm (a generalisation of GraphSAGE to directed graphs)
New Attri2vec algorithm
New PPNP and APPNP algorithms
New Graph Attention (GAT) saliency maps for interpreting node classification with Graph Attention Networks
Added directed SampledBFS walks on directed graphs
Unified the API of the GCN, GAT, GraphSAGE, and HinSAGE classes by adding a build() method to the GCN and GAT classes
Sped up unsupervised GraphSAGE via multithreading
Support for sparse generators in the GCN saliency map implementation
Unified activations and regularisation for GraphSAGE, HinSAGE, GCN and GAT
Switched from standalone keras to tensorflow.keras
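The PPNP and APPNP entries refer to personalized-PageRank propagation of node predictions. As a rough illustration in plain NumPy (not StellarGraph's API), APPNP repeatedly applies Z ← (1 − α) Â Z + α H, where Â is the symmetrically normalised adjacency matrix with self-loops and H holds per-node predictions from some base model, while PPNP computes the fixed point α (I − (1 − α) Â)⁻¹ H directly. The toy graph, α, and iteration count below are illustrative assumptions.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^-1/2 (A + I) D^-1/2, the GCN-style normalised adjacency."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def ppnp(adj: np.ndarray, h: np.ndarray, alpha: float = 0.1) -> np.ndarray:
    """Exact personalized-PageRank propagation: alpha * (I - (1 - alpha) A_hat)^-1 H."""
    a_norm = normalized_adjacency(adj)
    n = adj.shape[0]
    return alpha * np.linalg.solve(np.eye(n) - (1 - alpha) * a_norm, h)


def appnp(adj: np.ndarray, h: np.ndarray, alpha: float = 0.1, num_iters: int = 10) -> np.ndarray:
    """Power-iteration approximation of ppnp(), starting from Z = H."""
    a_norm = normalized_adjacency(adj)
    z = h.copy()
    for _ in range(num_iters):
        z = (1 - alpha) * a_norm @ z + alpha * h
    return z


# Hypothetical 4-node path graph 0-1-2-3 with 2-dimensional node predictions.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
h = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

exact = ppnp(adj, h)
approx = appnp(adj, h, num_iters=100)
print(np.abs(exact - approx).max())
```

Because the eigenvalues of (1 − α) Â have magnitude at most 1 − α, the APPNP iteration converges geometrically to the PPNP solution, which is why a small fixed number of steps is usually used in place of the matrix inverse on large graphs.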

We've also added new demos using real-world datasets to show how StellarGraph can be applied to these graph machine learning tasks.

Access the StellarGraph project and explore the new features on GitHub. StellarGraph is a Python 3 library.

We welcome your feedback and contributions.

With thanks, the StellarGraph team.

submitted by /u/StellarGraphLibrary

[D] What are good heuristics when choosing classes for image classification?

Written on November 5, 2019. Posted in Reddit MachineLearning.

For example, let's say I want to classify eggs. In images, eggs often appear on their own or in an egg carton, and the carton can be open or closed.

A naive approach would be to put all these images under a single class, eggs. But it might work better to have two classes, one for eggs and one for eggs in a carton, since training should be easier: a group of loose eggs looks very different from a closed carton of eggs. On the other hand, I feel that separating the classes could have unwanted effects, such as splitting up contexts. For example, the eggs-in-carton class could come to rely on the context of a kitchen or grocery store, so the model may be less accurate at predicting that an image contains eggs when it shows a carton of eggs on a farm.

Is my thinking correct on this?

What has been your experience with similar situations?

This question specifically focuses on image classification using neural nets.
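A common pattern, offered here as a suggestion rather than something from the original post, is to train the network on fine-grained classes (loose eggs, open carton, closed carton) and then score the coarse class "eggs" by summing the softmax probabilities of its subclasses at inference time. A minimal NumPy sketch, where the class names, the mapping, and the logits are all hypothetical:

```python
import numpy as np

# Hypothetical fine-grained classes the network is trained on.
FINE_CLASSES = ["egg_loose", "egg_carton_open", "egg_carton_closed", "no_egg"]

# Hypothetical mapping from each coarse label to the fine classes it groups.
COARSE_TO_FINE = {
    "eggs": ["egg_loose", "egg_carton_open", "egg_carton_closed"],
    "no_eggs": ["no_egg"],
}


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def coarse_probabilities(logits: np.ndarray) -> dict:
    """Sum fine-class probabilities into coarse-class probabilities."""
    probs = softmax(logits)
    index = {name: i for i, name in enumerate(FINE_CLASSES)}
    return {
        coarse: probs[..., [index[f] for f in fines]].sum(axis=-1)
        for coarse, fines in COARSE_TO_FINE.items()
    }


# The model is split between "open carton" and "closed carton",
# but the pooled "eggs" score is still high.
logits = np.array([0.1, 2.0, 2.0, 0.5])
print(coarse_probabilities(logits))
```

Pooling this way means confusion between visually distinct subclasses does not hurt the coarse decision, while training still treats them as separate targets. It does not by itself address the context concern raised above.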

submitted by /u/Kerlin_Michel

[P] New $10,000 ML Challenge: Mapping Disaster Risk from Aerial Imagery

Written on November 5, 2019. Posted in Reddit MachineLearning.

https://www.drivendata.org/competitions/58/disaster-response-roof-type/

Excited to launch a new machine learning competition! The goal is to help do a better job of creating disaster response plans based on detailed maps of communities. To do this, we need to understand the risk to structures, which we can assess by identifying what kind of roof each building has.

Come use your machine learning skills for a good cause! Plus it’s got interesting geo data, novel imagery, and the opportunity to develop new methods.

submitted by /u/dat-um

[R] GAN-based Synthetic Medical Image Augmentation for increased CNN Performance in Liver Lesion Classification

Written on November 5, 2019. Posted in Reddit MachineLearning.

submitted by /u/mjangle1985

[D] Andrew Ng’s thoughts on ‘robustness’ – looking for relevant resources

Written on November 5, 2019. Posted in Reddit MachineLearning.

For those of you unfamiliar, Andrew Ng runs a weekly newsletter called ‘The Batch’, where he shares thoughts and new developments in deep learning. I was very interested in something he said in today's newsletter (which can be read here), where he talks about how deep learning systems still fail in many real-world scenarios because they are not yet robust to changes in data quality or distribution:

One of the challenges of robustness is that it is hard to study systematically. How do we benchmark how well an algorithm trained on one distribution performs on a different distribution? Performance on brand-new data seems to involve a huge component of luck. That’s why the amount of academic work on robustness is significantly smaller than its practical importance. Better benchmarks will help drive academic research.
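A minimal way to make the benchmarking question concrete is to train on one distribution and report accuracy on both an in-distribution test set and a shifted one. The sketch below uses synthetic Gaussian data, a nearest-centroid classifier, and a simple translation of one feature as the shift; all of these are illustrative assumptions, not a benchmark from the newsletter.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample(n_per_class: int, shift: float = 0.0):
    """Two Gaussian classes centred at x=-1 and x=+1; `shift` translates the first feature."""
    x0 = rng.normal(loc=[-1.0, 0.0], scale=1.0, size=(n_per_class, 2))
    x1 = rng.normal(loc=[1.0, 0.0], scale=1.0, size=(n_per_class, 2))
    x = np.vstack([x0, x1])
    x[:, 0] += shift  # covariate shift: same labels, moved inputs
    y = np.concatenate([np.zeros(n_per_class, dtype=int), np.ones(n_per_class, dtype=int)])
    return x, y


def fit_centroids(x, y):
    """Nearest-centroid 'training': one mean vector per class."""
    return np.stack([x[y == c].mean(axis=0) for c in (0, 1)])


def accuracy(centroids, x, y) -> float:
    """Assign each point to its closest centroid and compare with the labels."""
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=-1)
    return float((dists.argmin(axis=1) == y).mean())


centroids = fit_centroids(*sample(1000))
acc_iid = accuracy(centroids, *sample(2000))
acc_shifted = accuracy(centroids, *sample(2000, shift=1.5))
print(f"in-distribution: {acc_iid:.3f}, shifted: {acc_shifted:.3f}, gap: {acc_iid - acc_shifted:.3f}")
```

The accuracy gap is one simple robustness measure; the hard part the quote points to is choosing shifts that reflect real-world changes in data quality and distribution, rather than synthetic ones like this.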

I am looking for more resources that study this type of robustness systematically. Is anyone aware of any key works on this topic? For example, work that looks at how real-world data, and performance on it, differ from the train/test datasets a model was developed on?

Thanks!

submitted by /u/deep-yearning

[D] Using UMAP for clustering

Written on November 5, 2019. Posted in Reddit MachineLearning.

UMAP (Uniform Manifold Approximation and Projection) is a relatively new dimensionality reduction technique, but it has already been used as part of a clustering pipeline in the paper https://arxiv.org/abs/1908.05968. It gives really nice results, but I'm not quite convinced that clustering on its output is correct. My concerns are similar to those about clustering on t-SNE output (there is a nice Stack Exchange discussion here: https://stats.stackexchange.com/questions/263539/clustering-on-the-output-of-t-sne). The documentation of the UMAP library for Python also touches on this issue: https://umap-learn.readthedocs.io/en/latest/clustering.html.
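One way to probe this concern (a suggestion, not something from the linked paper or docs) is to check whether the clusters found on the UMAP embedding agree with clusters found by another method on the original features. Agreement between two labelings can be measured with the Rand index: the fraction of point pairs on which the two labelings agree, either by grouping the pair together in both or by separating it in both. A small NumPy implementation, where the two label arrays are hypothetical stand-ins for, say, a density-based clustering of the embedding and k-means on the raw data:

```python
import numpy as np


def rand_index(labels_a, labels_b) -> float:
    """Fraction of point pairs on which two clusterings agree.

    A pair "agrees" if both clusterings put the two points in the same
    cluster, or both put them in different clusters. Label values themselves
    do not matter, only the grouping they induce.
    """
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    if a.shape != b.shape:
        raise ValueError("label arrays must have the same length")
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    # Count each unordered pair once (upper triangle, excluding the diagonal).
    iu = np.triu_indices(len(a), k=1)
    return float((same_a[iu] == same_b[iu]).mean())


# Hypothetical labels: clusters on a UMAP embedding vs. clusters on the raw features.
labels_embedding = [0, 0, 1, 1, 2, 2]
labels_raw = [1, 1, 0, 0, 0, 2]
print(rand_index(labels_embedding, labels_raw))
```

The plain Rand index tends to be high even for unrelated labelings when there are many clusters, so the chance-corrected adjusted Rand index (available as `sklearn.metrics.adjusted_rand_score`) is usually preferred in practice.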

What are your thoughts?

submitted by /u/Andrejkarp

[D] List of DL topics with resources for a quick brief, especially before interviews

Written on November 5, 2019. Posted in Reddit MachineLearning.

Vision and Language Group, a deep learning group at IIT Roorkee, has made a list of deep learning topics, with resources, that one should be familiar with and that can come in handy for brushing up before interviews.

https://github.com/vlgiitr/DL_Topics

Feel free to contribute any amazing resources that have been useful for a quick prep before your interviews, and star the repo if it is helpful to you!

submitted by /u/dakshit97

[P] Filtering data in a PySpark Pipeline without losing all the data?

Written on November 5, 2019. Posted in Reddit MachineLearning.

I have a project where I’m feeding a dataframe into a PipelineModel with two pretrained models inside. The flow goes something like this:

Input DF -> Preprocessing Transformers -> Model1 -> Model2 -> Output DF

The thing is, Model1 and Model2 predict on different subsets of the data (e.g. Male vs. Female records). I tried using the SQLTransformer to filter the data for each type, but the filter drops everything else, so the output of Model1 no longer contains the rows I need Model2 to predict on.

Is there a way to filter data to be fed into Model1, then filter data to be fed into Model2, and then concatenate the dataframes to be returned?

Please let me know if I can clarify anything!

submitted by /u/Octosaurus

[P] DepthAI hardware: RGBd, Myriad X VPU, Object-Tracking, Neural Network Accelerators for Raspberry Pi

Written on November 5, 2019. Posted in Reddit MachineLearning.

We wanted to share with you all some embedded, low-cost hardware we've been working on that combines disparity depth and AI via Intel's Myriad X VPU. We've developed a SoM not much bigger than a US quarter that takes direct image inputs from 3 cameras (2x OV9282, 1x IMX378), processes them, and sends the results back to the host via USB 3.1.

We wanted disparity + AI so we could get object localization outputs – an understanding of where and what objects are in our field of view, and we wanted this done fast, with as little latency as possible. Oh, and at the edge. And for low power. Our ultimate goal is actually to develop a rear-facing AI vision system that will alert cyclists of potential danger from distracted drivers. An ADAS for bikes!
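For a rectified stereo pair, depth follows from disparity as Z = f · B / d, where f is the focal length in pixels, B the baseline between the two cameras in metres, and d the disparity in pixels. Combining that with a detector's bounding box gives the "where and what" output described above. The sketch below takes the median depth inside a box and back-projects the box centre with a pinhole camera model; the focal length, baseline, principal point, and box are made-up values, not DepthAI specifications, and this is not the DepthAI API.

```python
import numpy as np


def depth_from_disparity(disparity_px, focal_px: float, baseline_m: float):
    """Z = f * B / d for a rectified stereo pair; non-positive disparity maps to inf."""
    d = np.asarray(disparity_px, dtype=float)
    with np.errstate(divide="ignore"):
        return np.where(d > 0, focal_px * baseline_m / d, np.inf)


def localize(box, depth_map, focal_px: float, cx: float, cy: float):
    """Return (X, Y, Z) in metres for the centre of a detection box.

    `box` is (x_min, y_min, x_max, y_max) in pixels. The median of the finite
    depths inside the box is used as the object's distance, which is less
    sensitive to stray background pixels than the mean.
    """
    x0, y0, x1, y1 = box
    patch = depth_map[y0:y1, x0:x1]
    z = float(np.median(patch[np.isfinite(patch)]))
    u, v = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    # Pinhole back-projection of the box centre.
    return ((u - cx) * z / focal_px, (v - cy) * z / focal_px, z)


# Hypothetical camera parameters and a synthetic 100x100 disparity map.
focal_px, baseline_m = 800.0, 0.075
disparity = np.full((100, 100), 10.0)  # background at 6 m
disparity[40:60, 60:80] = 40.0         # an object at 1.5 m
depth = depth_from_disparity(disparity, focal_px, baseline_m)
print(localize((60, 40, 80, 60), depth, focal_px, cx=50.0, cy=50.0))
```

Because depth is inversely proportional to disparity, a one-pixel disparity error matters far more for distant objects than for near ones, which is one reason baseline and resolution choices matter for a rear-facing system.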

There are some Myriad X solutions on the market already, but most use PCIe, so the data pipeline isn't as direct as Sensor -> Myriad -> Host, and the existing solutions also don't offer a three-camera setup for RGBd. So, we built it!

Hope the shameless plug is OK here (sorry mods!), and if anyone has any questions or comments, we'd love to hear them!

cnx-software article

hackster.io article

crowdsupply

hackaday https://hackaday.io/project/163679-luxonis-depthai

submitted by /u/Luxonis-Brian

