
Category: Reddit MachineLearning

[D] NIPS vs. NeurIPS: guest post by Steven Pinker

From Scott Aaronson’s Shtetl-Optimized blog, an open email from Steve Pinker:

I appreciate your frank comments. At the same time, I do not agree with them. Please allow me to explain.

If this were a matter of sexual harassment or other hostile behavior toward women, I would of course support strong measures to combat it. Any member of the Symposium who uttered demeaning comments toward or about women certainly deserves censure.

But that is not what is at issue here. It’s an utterly irrelevant matter: the three-decades-old acronym for the Neural Information Processing Symposium, the pleasingly pronounceable NIPS. To state what should be obvious: nip is not a sexual word. As Chair of the Usage Panel of the American Heritage Dictionary, I can support this claim.

(And as my mother wrote to me: “I don’t get it. I thought Nips was a brand of caramel candy.”) [Indeed, I enjoyed those candies as a kid. –SA] Even if people with an adolescent mindset think of nipples when hearing the sound “nips,” the society should not endorse the idea that the concept of nipples is sexist. Men have nipples too, and women’s nipples evolved as organs of nursing, not sexual gratification. Indeed, many feminists have argued that it’s sexist to conceptualize women’s bodies from the point of view of male sexuality.

If some people make insulting puns that demean women, the society should condemn them for the insults, not concede to their puerility by endorsing their appropriation of an innocent sound. (The Linguistics Society of America and Boston Debate League do not change their names to disavow jejune clichés about cunning linguists and master debaters.) To act as if anything with the remotest connection to sexuality must be censored to protect delicate female sensibilities is insulting to women and reminiscent of prissy Victorian taboos against uncovered piano legs or the phrase “with the naked eye.”

Any harm to the community of computer scientists has been done not by me but by the pressure group and the Symposium’s surrender. As a public figure who hears from a broad range of people outside the academic bubble, I can tell you that this episode has not played well. It’s seen as the latest sign that academia has lost its mind—that it has traded reasoned argument, conceptual rigor, proportionality, and common sense for prudish censoriousness, snowflake sensibility, and virtue signaling. I often hear from intelligent non-leftists, “Why should I be impressed by the scientific consensus on climate change? Everyone knows that academics just fall into line with the politically correct position.” To secure the credibility of the academy, we have to make reasoned distinctions, and stop turning our enterprise into a laughingstock.

To repeat: none of this deprecates the important effort to stamp out harassment and misogyny in science, which I’m well aware of and thoroughly support, but which has nothing to do with the acronym NIPS.

You are welcome to share this note with interested parties.

Best,

Steve

submitted by /u/milaworld

[R] Genetically generated regex. I have trouble understanding part of the paper.

Hello, I’m working on the task of automatically generating regular expressions. I’m basing my work on this paper:

https://esc.fnwi.uva.nl/thesis/centraal/files/f565297164.pdf

However, I have trouble understanding a part on page 19, specifically the bit about the ‘r’ node in the Enclosing Node section.

I’m not sure what ‘based on the number of capturing groups’ means. Is it exactly the number of capturing groups in the part of the regexp before ‘r’, or the number of matches of the expression in a string, or something else? And what is it used for?
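To make the distinction I’m asking about concrete, here is a quick Python sketch (the pattern is made up and has nothing to do with the paper) showing the two readings side by side:

import re

# Made-up pattern, only to separate the two readings of "number of capturing groups".
pattern = re.compile(r"(\d+)-(\w+)")   # the expression itself defines 2 capturing groups
text = "12-ab 34-cd 56-ef"

print(pattern.groups)              # 2 -> capturing groups defined in the expression
print(len(pattern.findall(text)))  # 3 -> matches of the expression in the string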

I would be very thankful for any suggestions. This part says:

https://preview.redd.it/lh97wet58r641.png?width=1559&format=png&auto=webp&s=256cc717d5213ec159f166d887399b4da70f062d

The paper is a really good read, by the way, and well written. However, I don’t have much experience with regexes, so I’m not sure what this specific part means.

submitted by /u/Slajni

[D] How to differentiate between your idea or implementation being wrong?

As the title says, I’m curious what the most efficient way is to figure out whether your idea is junk or your implementation messed up somewhere, especially when the implementation fails due to strange autograd quirks of the framework you’re using (which has bitten me a few times in the past). I guess the common-sense approaches are:

  • Test it on toy cases (takes a long time to design; a small gradient-check sketch follows this list)
  • Get advisor / someone else in the lab to take a look (often they agree with the general idea, but won’t have the inclination/time to study the implementation very deeply, understandably)
  • Git gud (pretty hard)
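One cheap toy-case check for the autograd-quirk failure mode is comparing the framework’s gradients against central finite differences on a tiny function. A minimal sketch, assuming TensorFlow (swap in your framework’s equivalent); the function here is arbitrary:

import numpy as np
import tensorflow as tf

def f(x):
    # arbitrary smooth toy function; replace with a tiny version of your model
    return tf.reduce_sum(tf.sin(x) * x ** 2)

x0 = np.random.randn(5)
x = tf.Variable(x0, dtype=tf.float64)

with tf.GradientTape() as tape:
    y = f(x)
autograd_grad = tape.gradient(y, x).numpy()

# central finite differences as a reference
eps = 1e-6
numeric_grad = np.zeros_like(x0)
for i in range(len(x0)):
    xp, xm = x0.copy(), x0.copy()
    xp[i] += eps
    xm[i] -= eps
    numeric_grad[i] = (f(tf.constant(xp)).numpy() - f(tf.constant(xm)).numpy()) / (2 * eps)

print("max abs difference:", np.max(np.abs(autograd_grad - numeric_grad)))

If the two disagree badly on a toy function, the problem is in the implementation (or the framework), not the idea.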

submitted by /u/tensorflower

[D] Evaluating “A Neural Algorithm of Artistic Style” by Gatys et al.

Paper: https://arxiv.org/abs/1508.06576

This is an exciting paper because it’s the first to introduce artistic style transfer using pre-trained neural networks.

What’s great is how the paper demonstrates extracting content (e.g., shapes and contours) from an image and extracting style from multiple layers (some layers capture fine-grained texture while others capture the overall style). Combining the two can yield realistic, professional-looking results.

The main issue is the time and compute required. The paper extracts content features and multi-layer style features with VGG19, and because every step needs feature extraction plus optimization of the style and content losses, it takes 500-1000 iterations just to produce a low-resolution image. It would be ideal if the algorithm could produce results in a few iterations.
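For concreteness, here is a rough sketch of the content/style losses being optimized, using Keras’s pretrained VGG19. The layer choices and loss weights below are illustrative assumptions, not the paper’s exact settings:

import tensorflow as tf
from tensorflow.keras.applications import vgg19

CONTENT_LAYER = "block4_conv2"                                   # illustrative choice
STYLE_LAYERS = ["block1_conv1", "block2_conv1", "block3_conv1"]  # illustrative choice

base = vgg19.VGG19(weights="imagenet", include_top=False)
base.trainable = False
feature_model = tf.keras.Model(
    base.input,
    [base.get_layer(n).output for n in [CONTENT_LAYER] + STYLE_LAYERS])

def gram_matrix(feats):
    # channel-by-channel correlations of the activations: the "style" statistic
    x = tf.reshape(feats, (-1, feats.shape[-1]))
    return tf.matmul(x, x, transpose_a=True) / tf.cast(tf.shape(x)[0], tf.float32)

def style_content_loss(generated, content_img, style_img, alpha=1.0, beta=1e4):
    g = feature_model(vgg19.preprocess_input(generated))
    c = feature_model(vgg19.preprocess_input(content_img))
    s = feature_model(vgg19.preprocess_input(style_img))
    content_loss = tf.reduce_mean(tf.square(g[0] - c[0]))
    style_loss = tf.add_n([
        tf.reduce_mean(tf.square(gram_matrix(gi) - gram_matrix(si)))
        for gi, si in zip(g[1:], s[1:])])
    return alpha * content_loss + beta * style_loss

The generated image itself is the variable optimized against this loss for hundreds of steps, each requiring a full VGG19 pass, which is exactly where the cost comes from.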

  1. Has anybody tried ResNet or another state-of-the-art network instead of VGG19?
  2. Does anyone know of a transformer network that produces similar results to this paper?
  3. Any recommendations on better style transfer papers?

submitted by /u/hotpot_ai

[D] Yann LeCun: “Some folks still seem confused about what deep learning is.” What does the community really think the definition is?

LeCun tweeted here: https://twitter.com/ylecun/status/1209497021398343680?s=20 about the real definition of DL, saying: “Some folks still seem confused about what deep learning is. Here is a definition: DL is constructing networks of parameterized functional modules & training them from examples using gradient-based optimization”
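As a toy reading of that definition, here is a minimal sketch (my own illustration, not anything from the tweet): a network of parameterized functional modules, trained from examples with gradient-based optimization.

import numpy as np
import tensorflow as tf

# synthetic examples (purely illustrative)
x = np.random.randn(256, 4).astype("float32")
y = (x.sum(axis=1, keepdims=True) > 0).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),    # parameterized module 1
    tf.keras.layers.Dense(1, activation="sigmoid"),  # parameterized module 2
])
model.compile(optimizer="sgd", loss="binary_crossentropy")  # gradient-based optimization
model.fit(x, y, epochs=5, verbose=0)                        # training from examples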

I think he is pointing to Gary Marcus’s debate with Bengio, and the thread turned into a controversial discussion. What do you think about LeCun’s statement of the DL definition?

submitted by /u/meldiwin

[D] Does the opaqueness of most dating app algorithms concern anyone else?

At the risk of sounding like I’m wearing a tinfoil hat, I’d like to vent regarding how messed up I think it is that most dating apps lack transparency when it comes to their match-making algorithms.

Before I jump in, let me just start by saying that a large percentage of dating takes place online these days. Therefore, anyone who wants to argue that we should all just meet in person can kindly frick off because you’re missing the point of this post.

My reasoning is as follows:

  1. How a dating algorithm functions will directly impact one’s chance of successfully finding a mate.
  2. Whether or not you can find a good mate will have a huge impact on your overall quality of life, mental health, financial success, etc.
  3. The ability to alter an algorithm to selectively favor or disfavor certain populations’ chances at successful mating by tweaking a few lines of code is a unique superpower never before unleashed upon the world.
  4. Setting aside any notions of bad actors purposely inhibiting your ability to get laid (which who knows… maybe that could happen), isn’t it at all concerning that this mega-powerful ability has close to zero public oversight?

And yes, I’ll have to admit that part of the reason I am making this post is that I honestly feel like I might have been shadowbanned on Tinder for reasons that are unclear to me. It’s just a hunch of course, but it seems bizarre how much my match rate has decreased over the past couple of years. I’m bothered that I have no insight into why this might be. Maybe I’m only allowed to date people in my economic circle (i.e. poor) with the rest of the undesirables. Maybe I haven’t posted on Reddit enough in the past. Dunno.

I’d love to hear others’ thoughts on this matter.

submitted by /u/QMred

[D] Should autoencoders really be symmetric?

I always find myself wanting to make the decoder side of an autoencoder as symmetric as possible with respect to the encoder side, because it feels like an “elegant” design decision. But I suspect that it’s not optimal, and I’m not finding any direct discussion of this topic via Google.

In most of mathematics, complex functions tend to have even more complex inverses. With respect to CNNs, convolutions are not strictly invertible, so it seems like the Conv2DTranspose operations could benefit from higher complexity and parameter count to approximate an inverse better. I’m curious if anyone has direct experience studying this, or if there are conventions for “optimizing” the decoder side of an autoencoder (or maybe it’s the encoder side that needs more parameters…?).

My first inclination is to just double some numbers on the decoder side to give it twice as many parameters. But maybe including extra layers is better, since it more significantly increases the complexity of functions it can approximate. Or maybe none of this is theoretically necessary/relevant…?

Here’s an almost perfectly symmetric reference network. Obviously I could experiment with it to come up with ideas, but I’m more interested in the general theory and whether there are any established ideas on the topic (and not just for CNNs, but for all types of autoencoders).

Encoder:

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         [(None, 48, 48, 3)]       0
conv2d_1 (Conv2D)            (None, 24, 24, 32)        2432
conv2d_2 (Conv2D)            (None, 12, 12, 64)        32832
conv2d_3 (Conv2D)            (None, 6, 6, 128)         73856
flatten_3 (Flatten)          (None, 4608)              0
dense_1 (Dense)              (None, 256)               1179904
dense_2 (Dense)              (None, 64)                16448
=================================================================
Total params: 1,305,472

Decoder:

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         [(None, 64)]              0
dense_3 (Dense)              (None, 256)               16640
dense_4 (Dense)              (None, 4608)              1184256
reshape_1 (Reshape)          (None, 6, 6, 128)         0
conv2d_transpose_3 (Conv2DTr (None, 12, 12, 64)        73792
conv2d_transpose_4 (Conv2DTr (None, 24, 24, 32)        32800
conv2d_transpose_5 (Conv2DTr (None, 48, 48, 3)         2403
=================================================================
Total params: 1,309,891

For reference, the above computation graph was produced with the following code fragment:

# Assumed imports: the original snippet references `keras` and an `L` alias for keras.layers
import tensorflow.keras as keras
from tensorflow.keras import layers as L

# Encoder
enc_input = L.Input(shape=(48, 48, 3))
enc0 = L.Conv2D(filters=32,  kernel_size=5, strides=2, padding='same', activation='relu')(enc_input)
enc1 = L.Conv2D(filters=64,  kernel_size=4, strides=2, padding='same', activation='relu')(enc0)
enc2 = L.Conv2D(filters=128, kernel_size=3, strides=2, padding='same', activation='relu')(enc1)
enc_flat = L.Flatten()(enc2)
enc_dense = L.Dense(256, activation='tanh')(enc_flat)
enc_out = L.Dense(64, activation='linear')(enc_dense)
encoder = keras.Model(inputs=enc_input, outputs=enc_out, name='Encoder')

# Decoder
dec_input = L.Input(shape=(64,))
dec_dense1 = L.Dense(256, activation='tanh')(dec_input)
dec_dense2 = L.Dense(6*6*128, activation='relu')(dec_dense1)
dec_reshape = L.Reshape((6, 6, 128))(dec_dense2)
dec2 = L.Conv2DTranspose(filters=64, kernel_size=3, strides=2, padding='same', activation='relu')(dec_reshape)
dec1 = L.Conv2DTranspose(filters=32, kernel_size=4, strides=2, padding='same', activation='relu')(dec2)
dec0 = L.Conv2DTranspose(filters=3,  kernel_size=5, strides=2, padding='same', activation='linear')(dec1)
decoder = keras.Model(inputs=dec_input, outputs=dec0, name='Decoder')

encoder.summary()
decoder.summary()
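For what it’s worth, here is a rough sketch of the “double some numbers on the decoder side” idea from above (purely illustrative, not an established convention): keep the same spatial path, give the transposed convolutions roughly twice the filters, and add one extra stride-1 layer, so the output shape is unchanged.

# Hypothetical asymmetric decoder: ~2x the filters plus one extra stride-1 layer
dec_input = L.Input(shape=(64,))
x = L.Dense(256, activation='tanh')(dec_input)
x = L.Dense(6 * 6 * 128, activation='relu')(x)
x = L.Reshape((6, 6, 128))(x)
x = L.Conv2DTranspose(filters=128, kernel_size=3, strides=2, padding='same', activation='relu')(x)
x = L.Conv2DTranspose(filters=64,  kernel_size=4, strides=2, padding='same', activation='relu')(x)
x = L.Conv2D(filters=64, kernel_size=3, strides=1, padding='same', activation='relu')(x)
dec_out = L.Conv2DTranspose(filters=3, kernel_size=5, strides=2, padding='same', activation='linear')(x)
big_decoder = keras.Model(inputs=dec_input, outputs=dec_out, name='BigDecoder')

Whether something like this actually reconstructs better than the symmetric version is exactly the empirical question of the post.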

submitted by /u/etotheipi_

[D] Decision Tree Splitting strategy

I have a dataset with four categorical features (cholesterol, systolic blood pressure, diastolic blood pressure, and smoking rate), and I use a decision tree classifier to estimate the probability of stroke. I am trying to verify my understanding of the splitting procedure in Python’s scikit-learn. Since it is a binary tree, there are three possible ways to split the first feature’s categories between two leaves: {0, 1} vs {2}, {0, 2} vs {1}, or {0} vs {1, 2}. What I know (please correct me here) is that the chosen split is the one with the highest information gain.

I have calculated the information gain for each of the three grouping scenarios:

{0 + 1 , 2} –> 0.17

{0 + 2 , 1} –> 0.18

{1 + 2 , 0} –> 0.004

However, sklearn’s decision tree chose the first scenario instead of the third (please check the picture).

Can anyone please help clarify the reason for selecting the first scenario? Is there a priority for splits that result in pure nodes, so that such a scenario is selected even though it has less information gain?

https://preview.redd.it/mkve4teopk641.jpg?width=1319&format=pjpg&auto=webp&s=fe487bedf67bc812d720ae2fe595fc41d9589dda
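If it helps anyone reproduce the behaviour, here is a minimal sketch (the data are synthetic and the feature name is made up, not my dataset) that fits a depth-1 tree with the entropy criterion and prints the split scikit-learn actually picks:

import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in: one categorical feature coded 0/1/2 and a noisy binary label
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 1))
y = ((X[:, 0] == 2) ^ (rng.random(200) < 0.1)).astype(int)

clf = DecisionTreeClassifier(criterion="entropy", max_depth=1, random_state=0)
clf.fit(X, y)
print(export_text(clf, feature_names=["cholesterol"]))  # shows the chosen threshold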

submitted by /u/elmsha