Category: Reddit MachineLearning

[D] Useful tools to help visualize matching data across multiple files or tables?

I’m in the process of trying to get a handle on some datasets. I know there are identical entries spread across several files, but I’d like to find a way to visualize those connections, either in a map or even just a table.

My immediate task just has three smallish CSVs, so I could easily write an R script to pull out the matches, but I’d prefer a more visual tool that can operate across larger bodies of data.
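As a rough illustration of the "just a table" option, here is a minimal Python sketch (standard library only) that builds a presence table showing which key values appear in which files. The file names, the `id` column, and the `presence_table` helper are all hypothetical stand-ins, not something from the actual datasets.

```python
import csv
import io
from collections import defaultdict


def presence_table(files, key):
    """Map each value of the key column to the set of file names containing it."""
    seen = defaultdict(set)
    for name, handle in files.items():
        for row in csv.DictReader(handle):
            seen[row[key]].add(name)
    return seen


# Hypothetical in-memory stand-ins for three small CSV files.
files = {
    "a.csv": io.StringIO("id,value\n1,x\n2,y\n3,z\n"),
    "b.csv": io.StringIO("id,value\n2,y\n3,q\n4,w\n"),
    "c.csv": io.StringIO("id,value\n3,z\n5,v\n"),
}
table = presence_table(files, key="id")
names = sorted(files)

# Print one row per key, with an X under every file that contains it.
print("id".ljust(4) + "  ".join(names))
for value in sorted(table):
    marks = "  ".join(
        ("X" if n in table[value] else ".").center(len(n)) for n in names
    )
    print(value.ljust(4) + marks)

# Keys present in at least two files are the cross-file matches.
shared = {k: sorted(v) for k, v in table.items() if len(v) > 1}
print(shared)
```

With real files, the `io.StringIO` objects would be replaced by opened file handles; the same dictionary could also feed a graph or heatmap tool for larger data.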

I remember seeing a Defcon presentation where a similar tool was described for matching metadata, so I’m going back through old videos to try and find that, but I’m hoping someone here might know some good suggestions.

Thanks!

submitted by /u/QuerulousPanda

[D] Ideas and advice on how to improve the accuracy score using Random Forest and Extra Trees classifiers

My project is the classification of 2D ultrasound images; the full data set is approximately 1000 images. For this analysis, 250 features were handcrafted by calculating different parameters of the whole images or of horizontal slices of the images. For feature selection, SelectKBest with chi2 is used to select the best 50 features. To calculate balanced accuracy, I am using sklearn.model_selection.cross_val_score with Random Forest and Extra Trees (1000 trees).

What confuses me is that when I split the data randomly with train_test_split at a 9:1 ratio and run cross_val_score on only the 90% portion, the highest accuracy score is 80% with Random Forest and 85% with Extra Trees. But when I don’t apply train_test_split and calculate the balanced accuracy score on the full data set, the highest score is no higher than 60%. I expected better results from including more data, but the opposite happened. I would appreciate any advice or ideas on how to improve the accuracy score.
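One thing that might be worth ruling out (an assumption, since the post does not show the code) is row ordering and selection leakage: train_test_split shuffles by default, while cross_val_score with an integer cv uses unshuffled stratified folds for classifiers, and feature selection done before cross-validation sees the held-out rows. The sketch below uses synthetic data as a stand-in for the real features, shuffles the folds explicitly, and keeps the selection step inside the pipeline so it is refitted on each training fold.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for ~1000 images x 250 handcrafted features (not real data).
X, y = make_classification(
    n_samples=1000, n_features=250, n_informative=20, random_state=0
)

model = make_pipeline(
    MinMaxScaler(),           # chi2 requires non-negative inputs
    SelectKBest(chi2, k=50),  # refitted on the training part of every fold
    # 100 trees keeps the sketch fast; the post uses 1000.
    ExtraTreesClassifier(n_estimators=100, random_state=0),
)

# Shuffled, stratified folds so any ordering in the rows cannot skew the folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="balanced_accuracy")
print(scores.mean(), scores.std())
```

If several images come from the same patient or scan session, a grouped splitter such as GroupKFold may be the more honest choice, but that depends on details the post does not give.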

submitted by /u/glitchdot2

[P] Quantum optical neural networks

Nanophotonic neural networks are an exciting emerging technology which promises low-energy, ultra-high-throughput machine learning systems implemented purely optically. Our lab has previously done work on these devices, and our new paper, which extends programmable photonics to the quantum domain, is now on arXiv!

In this paper, we describe a photonic architecture for a quantum programmable gate array (QPGA) which can be dynamically reprogrammed to perform any quantum computation. We show how to exactly prepare arbitrary quantum states and operators on the device, and we apply machine learning techniques to automatically implement highly compact approximations to important quantum circuits.

Below is an animation of a simulated QPGA being trained to implement a quantum Fourier transform on five qubits. Supplementary materials and the TensorFlow code for the quantum circuit optimization section of the paper can be found in the GitHub repository for the paper.

Paper: arxiv.org/abs/1910.10141

GitHub repo: github.com/fancompute/qpga

Simulated QPGA learning to implement a 5-qubit quantum Fourier transform

submitted by /u/bencbartlett

[R] Research Guide: Image Quality Assessment for Deep Learning

Image quality is relevant when building compression and image enhancement algorithms. Image Quality Assessment (IQA) is divided into two main areas: reference-based evaluation and no-reference evaluation.

In this guide, we’ll look at how deep learning has been used in image quality analysis.

https://heartbeat.fritz.ai/research-guide-image-quality-assessment-c4fdf247bf89

submitted by /u/mwitiderrick

[D] Feature Loss vs. GANs – what are the trade offs?

I’m doing a bit of reading on the speech enhancement problem, where you have an audio signal containing human speech plus some noise, and you want to extract just the human speech. It’s pretty analogous to image denoising or “super-resolution”, and a lot of the techniques from the image domain are being borrowed and re-applied to audio quite successfully (e.g. repurposing the U-Net architecture from image processing to spectrograms and then raw audio). It’s all pretty cool.

There’s some interesting work being done with loss functions in this space, and I’m looking for some clarification as to why you’d choose one approach over another. You want to compare a target image, or audio waveform, with a predicted sample, and you need to define a loss function which measures how “close” they are. The Related work – Loss functions (1.1.3) section of this paper gives a pretty good overview of the different approaches, which I’ll try to summarize here.

  • Mean squared error loss: A pretty standard regression loss as far as I know, but it’s limited to only considering one pixel at a time: “minimizing MSE encourages finding pixel-wise averages of plausible solutions which are typically overly-smooth and thus have poor perceptual quality”.
  • Feature loss: This is where you pre-train a network on a similar problem, such as image classification, and then you freeze the weights. For both the target and predicted sample, you run each through the classification network, then grab some internal activations from that network and call them “features”. You compute some distance between these feature vectors to get your loss. The key idea is that the classification network is able to capture important features that MSE loss cannot (more detail here).
  • GAN loss: A discriminator network trains in tandem with the generator network, where the job of the discriminator is to classify whether its input is “real” or “generated”. Like the feature loss network, it can detect features that MSE loss cannot, but it can also punish identifiable quirks of the generator network, whereas feature loss can potentially be “hacked” by the generator network.
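To make the three definitions above concrete, here is a minimal NumPy sketch. The "feature network" and "discriminator" are hypothetical fixed random linear layers standing in for a pre-trained classifier and a trained discriminator, so the numbers mean nothing; only the shape of each loss is the point.

```python
import numpy as np

rng = np.random.default_rng(0)


def mse_loss(target, predicted):
    """Mean of element-wise squared differences between the raw samples."""
    return float(np.mean((target - predicted) ** 2))


# Hypothetical frozen "feature network": one fixed random linear layer + ReLU.
# A real setup would take activations from a pre-trained classifier instead.
W_feat = rng.normal(size=(64, 16))


def features(x):
    return np.maximum(x @ W_feat, 0.0)


def feature_loss(target, predicted):
    """Distance between frozen-network activations rather than raw samples."""
    return float(np.mean((features(target) - features(predicted)) ** 2))


def generator_gan_loss(disc_logits_on_fake):
    """Non-saturating generator loss: mean of -log(sigmoid(logit))."""
    # -log(sigmoid(z)) == log(1 + exp(-z)), computed stably via logaddexp.
    return float(np.mean(np.logaddexp(0.0, -disc_logits_on_fake)))


target = rng.normal(size=(8, 64))
predicted = target + 0.1 * rng.normal(size=(8, 64))

# Hypothetical discriminator: a fixed random linear scorer producing logits.
w_disc = rng.normal(size=64)
logits = predicted @ w_disc

print(mse_loss(target, predicted))
print(feature_loss(target, predicted))
print(generator_gan_loss(logits))
```

The first two losses compare the prediction with a known target, while the GAN term only asks how "real" the discriminator thinks the prediction is, which is why it is often paired with one of the others.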

So my questions are:

  • Have I characterised these approaches well?
  • Why would you ever choose feature loss over using a discriminator network (i.e. a GAN)?
    • Discriminators can punish the generator for being predictably wrong (i.e. common artifacts)
    • Pre-trained feature loss networks may better represent image features, if they have been trained for longer, on larger data sets
    • Apparently GANs can have stability issues when training
  • The SRGAN paper suggests using both feature loss and a GAN for their loss function – is this the best known approach?
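
For reference, the combined objective in the SRGAN paper, as I read it, is a weighted sum of a content term (pixel-wise MSE or a VGG feature loss) and an adversarial term. The notation below is paraphrased from memory of that paper and should be checked against it:

```latex
l^{SR} = l^{SR}_{X} + 10^{-3}\, l^{SR}_{Gen},
\qquad
l^{SR}_{Gen} = \sum_{n=1}^{N} -\log D_{\theta_D}\!\big(G_{\theta_G}(I^{LR})\big)
```

Here $l^{SR}_{X}$ is the content loss, $G_{\theta_G}$ the generator, $D_{\theta_D}$ the discriminator, and $I^{LR}$ the low-resolution input. The small weight on the adversarial term means the content loss dominates the objective.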

submitted by /u/The_Amp_Walrus