Category: Reddit MachineLearning

[D] Symmetry-equivalent representations

I’m training a regression model that learns the mapping from integer-valued vectors to a single real-valued property. All cyclic permutations of a feature vector are equivalent, that is, they have the same y. I’m a bit lost trying to encode this.

One idea I had was to augment the dataset by generating all the cyclic permutations, but I don’t think this is a good way to go at all. I’ve stumbled on strategies to encode cyclic features such as months by mapping them to a periodic function, but in my case this wouldn’t work as the elements of my vector have a different meaning.
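One possible alternative to augmentation, offered here only as a sketch and not as something from the original post, is to map every vector to a canonical representative of its rotation class before training, so that all cyclic permutations collapse to the same input. The helper name `canonical_rotation` is hypothetical, and picking the lexicographically smallest rotation is an assumed convention.

```python
from typing import Sequence, Tuple


def canonical_rotation(vec: Sequence[int]) -> Tuple[int, ...]:
    """Return the lexicographically smallest cyclic rotation of vec.

    Every cyclic permutation of the same vector maps to the same tuple,
    so a model trained on the output sees one input per equivalence class.
    """
    items = tuple(vec)
    if not items:
        return items
    # Enumerate all len(items) rotations and keep the smallest one.
    rotations = (items[i:] + items[:i] for i in range(len(items)))
    return min(rotations)


# Three rotations of the same vector share one canonical form.
a = canonical_rotation([3, 1, 2])
b = canonical_rotation([1, 2, 3])
c = canonical_rotation([2, 3, 1])

# A non-cyclic permutation stays distinct.
d = canonical_rotation([2, 1, 3])
```

This brute-force version costs O(n^2) per vector, which is usually fine for short feature vectors. A caveat: a small change in one element can change which rotation is chosen, so the canonical form is not a smooth function of the input; models with built-in invariance (for example circular convolutions followed by global pooling) avoid that issue.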

submitted by /u/throwervek

[Discussion] Is Sagemaker just a glorified EC2 instance?

I’m a data scientist with a lot of model and math knowledge, and experience with mostly on-prem tools and some GCP. I’m trying to pick up more cloud skills. As I experiment more with Sagemaker, I can’t figure out how it is more than just an EC2 instance with the right libraries installed. Is there anything more to it? What am I missing?

submitted by /u/AlexSnakeKing

[D] Which SOTA authorship attribution / text classification model to use?

I’m currently doing research for my thesis project, and was wondering which models to experiment with. I have a large dataset of political speeches (around 180,000), each annotated with the speaker’s party (10 parties total), and would like a model that learns to predict the party from a speech.

My question is, which model is currently best for this type of task? I have some experience with Bi-LSTM models, and also with CNNs combined with LSTMs. However, I’m very interested in whether other models would perform better at this task, or whether you have any experience with the architecture of these types of models.

submitted by /u/mikkelmedm

Rapid large-scale fractional differencing to minimize memory loss while making a time series stationary. 6x-400x speed up over CPU implementation.

Happy to launch GFD: GPU-accelerated Fractional Differencing. The single-GPU RAPIDS cuDF implementation gives a substantial 6x-400x speed-up over a NumPy/Pandas CPU implementation.

Feel free to play with the code on Google Colab, run it on GCP/AWS or your local machine with the entirely self-contained notebook.

Summary

Typically we try to make a time series stationary by transforming it, and integer differencing is one of the common methods. However, integer differencing removes more memory than is needed to achieve stationarity. An alternative, fractional differencing, achieves stationarity while preserving as much memory as possible. Existing CPU-based implementations are inefficient for running fractional differencing on many large-scale time series, whereas our GPU-based implementation runs fractional differencing up to 400x faster on a single machine.
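To make the idea concrete, here is a minimal pure-Python sketch of fixed-window fractional differencing. It is a CPU illustration only, not the GFD GPU code from the repository; the function names and the fixed `window` truncation are assumptions made for this example. The weights follow the standard binomial-series recursion w_0 = 1, w_k = -w_{k-1} (d - k + 1) / k.

```python
from typing import List, Sequence


def frac_diff_weights(d: float, size: int) -> List[float]:
    """First `size` weights of the fractional differencing operator (1 - B)^d."""
    weights = [1.0]
    for k in range(1, size):
        # Binomial-series recursion: w_k = -w_{k-1} * (d - k + 1) / k
        weights.append(-weights[-1] * (d - k + 1) / k)
    return weights


def frac_diff(series: Sequence[float], d: float, window: int) -> List[float]:
    """Apply fixed-window fractional differencing of order d.

    Each output value is a weighted sum of the current observation and the
    previous `window - 1` observations, so the first `window - 1` points
    of the input produce no output.
    """
    w = frac_diff_weights(d, window)
    out = []
    for t in range(window - 1, len(series)):
        out.append(sum(w[k] * series[t - k] for k in range(window)))
    return out


# With d = 1 the weights reduce to ordinary first differencing: [1, -1, 0, ...].
integer_weights = frac_diff_weights(1.0, 4)

# With 0 < d < 1 the weights decay slowly, so older observations still contribute.
fractional_weights = frac_diff_weights(0.5, 4)

first_diff = frac_diff([1.0, 3.0, 6.0, 10.0], d=1.0, window=2)
```

With d = 1 every weight after the second is zero, which is why integer differencing discards all memory beyond one lag; a fractional d keeps small nonzero weights on older lags.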

Code

https://github.com/ritchieng/fractional_differencing_gpu

Presentation

https://www.researchgate.net/publication/335159299_GFD_GPU_Fractional_Differencing_for_Rapid_Large-scale_Stationarizing_of_Time_Series_Data_while_Minimizing_Memory_Loss

submitted by /u/ritchieng

[N] Trump falsely claims Google ‘manipulated’ millions of 2016 votes

https://www.cnn.com/2019/08/19/politics/trump-google-manipulated-votes-claim/index.html

The referenced article: https://aibrt.org/downloads/EPSTEIN_et_al_2017-SUMMARY-A_Method_for_Detecting_Bias_in_Search_Rankings-EMBARGOED_until_March_14_2017.pdf

Key point from the article referenced by CNN’s story: Was the bias the same for all search engines? No. The level of pro-Clinton bias we found on Google (0.19) was more than twice as high as the level of pro-Clinton bias we found on Yahoo (0.09).

Among other issues, one thing that CNN did not mention is the presumption that Google is wrong and Yahoo correct, even though there is no ground truth to compare against. Perhaps there were simply more pro-Clinton articles and news items appearing in those days. More generally, I would guess that Yahoo’s and Google’s engines are just different algorithms showing different things.

Before someone complains: yes, PageRank was considered “machine learning”, though not deep learning of course. It feels more like graph theory to me, though.

submitted by /u/errorsignal

[D] “Inverse Design” to create new optical chip components

I hope discussions of ML applications are OK in this sub. I recently came across an article about researchers in photonics, a field that doesn’t have many analytical equations for calculating performance by hand, who use some basic ML techniques to create high-performance components for photonic integrated circuits. They start with a black box, feed in the desired output performance, and then use basic electromagnetic boundary conditions and ML to work backward to the design that would be required to get there. They call this “inverse design”.

This paper goes into it a little more and shows an example of the result of the technique: https://arxiv.org/pdf/1504.00095.pdf

submitted by /u/gburdell