Category: Reddit MachineLearning

[D] Can I use tf.data to calculate new features as part of a pipeline, or should this be done before using the tf.data module?

Written on August 5, 2019. Posted in Reddit MachineLearning.

I am curious how much of my data processing I can refactor into a tf.data pipeline that feeds my model. My source data is used to calculate different features to create a dataset, and this dataset is then processed further before being input to my models. The process is basically this:

Source Data (structured JSON with text fields parsed from a raw document) ->
Dataset (these fields are used to calculate numerical features, categorical features, and sequence features) ->
Processed Dataset (standard techniques – scaling, encoding, tokenization, padding, etc.)

And then I have my input data for the model. I am wondering whether I can refactor this entire process into a tf.data pipeline, or will the tf.data pipeline only handle the processing done in the second step described above? I am using TF 2.0 Beta by the way.

Any insights or help will be greatly appreciated.
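For illustration only, here is a minimal pure-Python sketch of the three stages described above, written as a generator. The field names (`body`, `doc_type`), the vocabulary, and the fixed scaling constant are all hypothetical. A generator of this shape is the kind of thing that could be handed to `tf.data.Dataset.from_generator`, and arbitrary Python steps such as JSON parsing can generally be wrapped with `tf.py_function` inside `Dataset.map`; whether that fits this particular dataset is an assumption, not something the post confirms.

```python
import json


def compute_features(record):
    """Source data -> dataset: derive features from raw text fields.

    The field names "body" and "doc_type" are hypothetical.
    """
    body = record["body"]
    return {
        "num_chars": len(body),                   # numerical feature
        "doc_type": record["doc_type"].lower(),   # categorical feature
        "tokens": body.lower().split(),           # sequence feature
    }


def preprocess(features, vocab, doc_types, max_len):
    """Dataset -> processed dataset: scale, encode, tokenize, pad."""
    # Map tokens to integer ids; 1 is reserved for unknown tokens.
    ids = [vocab.get(tok, 1) for tok in features["tokens"]][:max_len]
    # Right-pad with 0 up to max_len.
    ids += [0] * (max_len - len(ids))
    return {
        "num_chars": features["num_chars"] / 1000.0,         # fixed, assumed scale
        "doc_type": doc_types.index(features["doc_type"]),   # integer encoding
        "token_ids": ids,
    }


def example_generator(json_lines, vocab, doc_types, max_len=6):
    """Yield one model-ready example per JSON line."""
    for line in json_lines:
        record = json.loads(line)
        yield preprocess(compute_features(record), vocab, doc_types, max_len)


lines = ['{"body": "Invoice total due", "doc_type": "Invoice"}']
vocab = {"invoice": 2, "total": 3}
examples = list(example_generator(lines, vocab, ["invoice", "receipt"]))
```

Keeping each stage as a separate function means the same code can run either ahead of time (writing a processed dataset to disk) or lazily inside an input pipeline.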

submitted by /u/that_one_ai_nerd

[D] How to compute Ablation Study Neural Networks p-value?

Written on August 4, 2019. Posted in Reddit MachineLearning.

Let me just first say that my knowledge of statistics is shit.

I want to conduct an ablation study of a deep neural network trained on a regression task.

I want to compare the performance (MSE) of the full neural network against the same network with a few layers removed. I see in the literature that people often report whether a result is statistically significant, i.e. whether its p-value falls below some threshold, say 0.05.

How would I measure this p-value? Would I be able to measure the p-value simply given the 2 MSEs of the 2 models?
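As a hedged illustration of why two aggregate MSEs alone are generally not enough: a p-value needs some measure of variability, for example the per-example squared errors of both models on the same test set, or the scores from several training runs. The sketch below assumes per-example errors are available and runs a paired sign-flip permutation test on them. The function name and defaults are made up for this example, and the test only accounts for test-set sampling variability, not for run-to-run training variance.

```python
import random


def paired_permutation_pvalue(errors_a, errors_b, n_resamples=10000, seed=0):
    """Two-sided p-value for the difference in mean error between two
    models evaluated on the same test examples (paired design)."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the two models are interchangeable,
        # so the sign of each paired difference is arbitrary.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / n) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)
```

Here `errors_a` and `errors_b` would be the squared errors of the full and the ablated model on each test example, in the same order.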

submitted by /u/temporal_templar

[D] Company HireVue provides “AI” for early-stage interview screening

Written on August 4, 2019. Posted in Reddit MachineLearning.

There’s been some (pretty universally negative) reaction on Twitter to a video put out by HireVue promoting their interview-related products, some of which use (unspecified) “AI”. They have candidates answer employer-specified questions to camera, and they claim to evaluate things like whether the candidate is enthusiastic, or making enough eye contact (with the camera lens, I guess).

I thought people might like to discuss here. The video and Twitter reactions are here: https://twitter.com/alvinfoo/status/1157793758806716417?s=19

Issues with this include baking existing hiring biases into poorly understood black boxes; disrespect towards candidates, the best of whom will refuse to be interviewed in this way; and the likelihood of weird predictions for anyone not well represented by the training set.

What other problems do people see with this? Is there any use of ML in the hiring process that you wouldn’t object to?

submitted by /u/grey–area

[P] PyTorch Implementation of Semantic Segmentation models

Written on August 4, 2019. Posted in Reddit MachineLearning.

Nothing fancy, but to get a handle on semantic segmentation methods, I re-implemented some well-known models with clearly structured code (following this PyTorch template). In particular:

Implemented models: Deeplab V3+ – GCN – PSPnet – Unet – Segnet and FCN,
Supported datasets: Pascal VOC, Cityscapes, ADE20K, COCO Stuff,
Losses: Dice Loss, CE Dice loss, Focal Loss and Lovasz Softmax,

with various data augmentations and learning rate schedulers (poly learning rate and one cycle).
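To make two of the items above concrete, here is a minimal plain-Python sketch of the commonly used "poly" learning rate schedule and a soft Dice loss for a binary mask. These follow the standard textbook formulas and are not taken from the linked repository; the default `power=0.9` and the `smooth` term are assumptions.

```python
def poly_lr(base_lr, step, max_steps, power=0.9):
    """Poly schedule: decay from base_lr at step 0 to 0 at max_steps."""
    return base_lr * (1.0 - step / max_steps) ** power


def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss over flattened predictions and binary targets.

    probs:   predicted foreground probabilities in [0, 1]
    targets: ground-truth labels, 0 or 1
    smooth:  avoids division by zero when both masks are empty
    """
    intersection = sum(p * t for p, t in zip(probs, targets))
    total = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (total + smooth)
```

A perfect prediction gives a Dice loss of 0, and the loss grows as the overlap between prediction and target shrinks.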

I thought I'd share this implementation in case anyone is interested. Here it is:

Github: https://github.com/yassouali/pytorch_segmentation

submitted by /u/youali

[D] Should beginner’s tutorials be banned?

Written on August 4, 2019. Posted in Reddit MachineLearning.

This sub is full of them. They rise to the top for some bizarre reason and reaffirm that this sub’s focus is on helping people start learning about a narrow subset of machine learning (neural networks / deep learning).

Allowing this content to be so prevalent drives the sub further from discussion of research and more into a place where spam links reside.

Furthermore, a lot of these beginner’s tutorials are written by beginners themselves. They contain mistakes which, when read by other beginners, cloud their understanding and slow their learning.

Can we ban this type of content and push it to /r/learnmachinelearning or something?

submitted by /u/NicolasGuacamole

[Project] GraphVite: A High-Performance CPU-GPU Hybrid System for Node Embedding

Written on August 4, 2019. Posted in Reddit MachineLearning.

We just released a general and high-performance graph embedding system, GraphVite.

Compared to existing machine learning systems that are mainly designed for data with regular structures (e.g., images, speech, and natural language), GraphVite is specifically designed for large-scale graphs. It runs on CPU-GPU hybrid architectures and scales linearly with the number of GPUs. The system is one or two orders of magnitude faster than existing implementations. For example, for a graph with one million nodes, it takes only around one minute to learn the node representations with 4 GPUs. Besides the superior efficiency, GraphVite also supports a variety of applications and models, including

Node Embedding: DeepWalk, LINE, node2vec
Knowledge Graph Embedding: TransE, DistMult, ComplEx, SimplE, RotatE
Graph and High-dimensional Data Visualization: LargeVis
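Of the node embedding methods listed above, DeepWalk is the simplest to sketch: it samples truncated uniform random walks over the graph and then treats each walk as a "sentence" for a skip-gram model. The plain-Python snippet below shows only the walk-sampling stage on a toy adjacency list. It is an illustrative sketch with hypothetical names, not GraphVite's API or its CPU-GPU implementation.

```python
import random


def random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Sample truncated uniform random walks starting from every node.

    adjacency: dict mapping each node to a list of its neighbours
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adjacency)
        rng.shuffle(nodes)  # visit start nodes in a random order each pass
        for start in nodes:
            walk = [start]
            # Stop early if the walk reaches a node with no neighbours.
            while len(walk) < walk_length and adjacency[walk[-1]]:
                walk.append(rng.choice(adjacency[walk[-1]]))
            walks.append(walk)
    return walks


toy_graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
walks = random_walks(toy_graph, walk_length=5, walks_per_node=2)
```

Each resulting walk would then be fed to a skip-gram style model so that nodes appearing in similar contexts end up with similar embeddings.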

There are already more than 30 configurations and benchmarks on standard datasets. We are actively developing new applications and models. The system is expected to support the graph embedding community or, more generally, deep learning for graphs.

Paper: https://arxiv.org/abs/1903.00757
Website: https://graphvite.io/
GitHub: https://github.com/DeepGraphLearning/graphvite

submitted by /u/kiddozhu

[P] Does anyone know where I could get a dataset for astronomical spectroscopy?

Written on August 4, 2019. Posted in Reddit MachineLearning.

I am currently starting a project where I apply ML to find the different characteristics of celestial bodies based on spectrum data. I would appreciate it if anyone could tell me where I can find such datasets. Thanks

submitted by /u/Keeeper-1

[D] Explaining Feedforward, Backpropagation and Optimization: The Math Explained Clearly with Visualizations. I took the time to write this long article (>5k words), and I hope it helps someone understand neural networks better.

Written on August 4, 2019. Posted in Reddit MachineLearning.

I have been studying Machine Learning for the last few months, and I wanted to really understand everything that goes on in a basic neural network (excluding the many architectures). Therefore, I took the time to write this long article to explain what I have learned. The post is deliberately extensive and goes into the smaller details, so that everything is in one place. As the site says, it is machine learning from scratch, and I share what I have learned.

The particular reason for posting here is that I hope someone else can learn from this. The goal is to share the knowledge in the most easily absorbed way possible. I tried to visualize much of the process going on in neural networks, but I also went through the math, down to the detail of the partial derivatives.

This was quite a journey: it took about one month to read everything, write it down, make it make sense, and create the graphics.

Regardless, here is the link. Any constructive feedback is appreciated.

https://mlfromscratch.com/neural-networks-explained/

submitted by /u/permalip

[N] Flatland Challenge – Multi-Agent Reinforcement Learning for Transportation Systems

Written on August 4, 2019. Posted in Reddit MachineLearning.

Hi all

We launched the Flatland Challenge, which is an official challenge of the Applied Machine Learning Days.

Flatland: Multi-Agent Reinforcement Learning Challenge

The Flatland Challenge is a competition to foster progress in multi-agent reinforcement learning for real world applications. The re-scheduling problem (RSP), which has traditionally been approached by operations research, serves as an excellent challenge to investigate the possibilities of deep learning for planning in stochastic environments. Different rounds with increasing difficulty and the presence of stochasticity in the environment encourage participants to look beyond classical planning algorithms and come up with solutions for the transport management systems of the future.

The Challenge

The challenge requires your creativity and savviness. In 2 submission rounds with increasing difficulty, you can prove that you have what it takes. We invite you to enter the race with your unique solution and to win great prizes – at the same time solving one of the key challenges in the world of transportation!

In contrast to most reinforcement learning challenges the focus of this challenge is not solely on the submission of great algorithms as controllers. We encourage the participants to come up with novel observation spaces for this challenge and share them with the community (community prize awarded) to improve performance on this task.

Real world applications

The Swiss Federal Railways (SBB) operate the densest mixed railway traffic in the world. SBB maintain and operate the biggest railway infrastructure in Switzerland. Today, there are more than 10,000 trains running each day, being routed over 13,000 switches and controlled by more than 32,000 signals. Each day 1.2 million passengers and almost half of Switzerland’s volume of transported goods are transported on this railway network. Due to the growing demand for mobility, SBB needs to increase the transportation capacity of the network by approximately 30% in the future.

The increase in transport capacity can be achieved through different measures, such as denser train schedules, investments in new infrastructure, and/or investments in new rolling stock. However, SBB currently lack suitable technologies and tools to quantitatively assess these different measures.

The SBB are therefore looking for novel approaches that can help revolutionize the transportation system of the future.

Prizes

Your problem solutions mean something to us – hence prizes with a total value of 30k CHF (approx. 30k USD) are reserved for those with the best submissions. You can excel in two categories: The best solution category and the community prize category. Within both those categories your submission is individually ranked taking into account your performance in Round 1 and Round 2. Make sure to check the participation rules before you start. Only submissions conforming to our rules have a chance of winning the prizes.

Best Solution Prize: Won by the participants with the best performing submission on our test set. Your rankings from both Round 1 and Round 2 are taken into account. Check the leader board on this site regularly for the latest information on your ranking.

The top three submissions in this category will be awarded the following cash prizes (in Swiss Francs):

CHF 7’500.- (~USD 7’500) for first prize

CHF 5’000.- (~USD 5’000) for second prize

CHF 2’500.- (~USD 2’500) for third prize

Community Contributions Prize: Awarded to the person/group who makes the biggest contribution to the community – done through generating new observations and sharing them with the community.

The top submission in this category will be awarded the following cash prize (in Swiss Francs): CHF 5’000.- (~USD 5’000)

In addition, we will hand-pick and award up to five (5) travel grants to the Applied Machine Learning Days 2019 in Lausanne, Switzerland. Participants with promising solutions may be invited to present their solutions at SBB in Bern, Switzerland.

Note: It is possible for a participant to win in both categories

Participate

Are you up for the challenge? More information about the Flatland Challenge can be found here.

Contribute

Want to help improve and build upon Flatland?

Head over to our gitlab repo to see how you can contribute shaping this environment.

Contact

For Challenge-related questions (technical and/or content questions):

Gitter Channel : https://gitter.im/AIcrowd-HQ/flatland-rl
Technical Issues : Please use the issue tracker in the public repository
Discussion Forum : https://discourse.aicrowd.com/

We strongly encourage you to use the public channels mentioned above for communications between the participants and the organizers. But in case you are looking for a direct communication channel, feel free to reach out to us at:

mohanty [at] aicrowd.com
erik.nygren [at] sbb.ch

For press inquiries, please contact SBB Media Relations at press@sbb.ch

submitted by /u/ML_Erik

[P] Building NSFW Image Detector

Written on August 4, 2019. Posted in Reddit MachineLearning.

I have started building an NSFW image filter. I currently have a dataset of 120k NSFW images.

The main issue I am having is with the anomaly detection part. The images are all so different that I can’t see how I can easily build a system to decide whether an image is NSFW or not.

Any advice or notes are greatly appreciated. Thanks.

submitted by /u/jweir136
