Category: Reddit MachineLearning

[D] Architectural question: multiple input tensors, how best to combine to single output tensor?

Sorry if this question has been asked before. I’m making a classifier that takes multiple tensors (representing images) as input and produces a single output (a probability distribution). Each input has a few stacks of residual blocks on top, and I’m wondering how best to combine the outputs of these branches. For now, I simply produce logits for each branch and take an element-wise sum over them, with a coefficient per branch because one of the input tensors is much more important than the others. Is there a better approach? I’ve heard concatenation is another option, but I’m not sure which would be better. Should I instead create a loss expression for each branch and sum those? Thanks for any clarity you can provide.
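To make the two options concrete, here is a minimal sketch in plain Python (no framework). All numbers, the helper names, and the two-branch, three-class setup are made up for illustration. It shows that a weighted sum of logits is a special case of concatenating the branch outputs and applying one linear layer, so concatenation can represent the sum but is not limited to it.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_logit_sum(branch_logits, coeffs):
    """Late fusion: element-wise sum of per-branch logits, scaled per branch."""
    n_classes = len(branch_logits[0])
    return [
        sum(c * logits[k] for c, logits in zip(coeffs, branch_logits))
        for k in range(n_classes)
    ]

def concat_then_linear(branch_outputs, weight, bias):
    """Concatenate branch outputs, then apply one linear layer (weight @ x + bias)."""
    x = [v for out in branch_outputs for v in out]
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

# Hypothetical values: two branches, three classes, branch 0 weighted more heavily.
branch_logits = [[2.0, 0.5, -1.0], [0.2, 1.5, 0.3]]
coeffs = [0.8, 0.2]
summed = weighted_logit_sum(branch_logits, coeffs)

# The same result from concatenation plus a linear layer whose weight is
# [0.8 * I | 0.2 * I]; a trained layer is free to learn any other weight.
n = 3
weight = [
    [coeffs[j] if i == k else 0.0 for j in range(2) for i in range(n)]
    for k in range(n)
]
fused = concat_then_linear(branch_logits, weight, [0.0] * n)
probs = softmax(fused)
```

In a real network the concatenation would more likely happen on branch features before the final layer rather than on logits, and the fixed coefficients would become learned weights.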

submitted by /u/lolololroflhax

[D] Predicting whether model made a mistake

In many cases, for example in policy networks, it would be useful to be able to assess whether user intervention is necessary (for example if there is no clear candidate intent/action for a given input). However, it is reasonable to assume that a model performing poorly is also bad at estimating whether it is performing poorly. Does there exist any research regarding this issue?
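One common baseline, sketched here under assumptions, is to defer to the user whenever the top softmax probability is low or the gap to the runner-up is small. The function name and both thresholds are hypothetical and would need tuning on held-out data. This does not resolve the concern above: softmax confidence is often poorly calibrated, so treat this as a starting point rather than an answer.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def should_defer(logits, min_confidence=0.6, min_margin=0.2):
    """Ask for user intervention when no action is a clear candidate.

    Both thresholds are illustrative defaults, not recommended values.
    """
    ranked = sorted(softmax(logits), reverse=True)
    top, runner_up = ranked[0], ranked[1]
    return top < min_confidence or (top - runner_up) < min_margin

clear_case = should_defer([5.0, 0.0, 0.0])      # one dominant action
ambiguous_case = should_defer([1.0, 0.9, 0.0])  # two near-tied actions
```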

submitted by /u/_diffee_

[P] Tsanley: auto-finding subtle tensor shape errors in your deep learning code

When writing deep learning programs, keeping track of tensor shapes and dealing with subtle tensor shape errors (implicit broadcasts!!) gets quite frustrating.

We’ve been working on a tool, tsanley (pronounced ‘stanley’), for finding subtle shape errors in your deep learning code quickly and cheaply. The key idea is to label tensor variables with their expected shapes (e.g., x : 'b,t,d' = ...) and let tsanley perform shape validity checks at runtime automatically. It works with both small and large tensor programs.

repository: https://github.com/ofnote/tsanley

Quick example:

```python
def foo(x):
    x: 'b,t,d'  # expected shape of x is (B, T, D)
    y: 'b,d' = x.mean(dim=0) * 2  # error!
    z: 'b,d' = x.mean(dim=1)  # ok
    return y, z
```

Function foo contains tensor variables labeled with their named shapes using a shorthand notation. It has a subtle shape error in the assignment to y: we expect the shape of y to be (B,D), however mean got rid of the first, and not the second, dimension. Your tensor library (pytorch / tensorflow / ..) won’t flag this as an error: instead, we will get a weird shape inconsistency error somewhere downstream.

tsanley finds such unexpected bugs quickly at runtime:

```
Update at line 37: actual shape of y = t,d
FAILED shape check at line 37
expected: (b:10, d:1024), actual: (100, 1024)
Update at line 38: actual shape of z = b,d
shape check succeeded at line 38
```

Writing these named shape annotations manually can also get tedious. tsanley can auto-annotate the tensor variables in your (or someone else’s) code, as long as the code is executable. This is especially useful when digging into or adapting existing code or a library for your project.

The tool builds upon the tsalib library, which introduced a shorthand notation for labeling tensor variables with their named shapes, irrespective of the backend tensor library used.
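To illustrate the idea of checking labeled shapes at runtime, here is a hand-rolled sketch that works on plain shape tuples. It is not tsanley’s or tsalib’s API; the helper names are hypothetical, and the size t = 100 is inferred from the actual shape (100, 1024) reported in the output above.

```python
def expected_shape(spec, sizes):
    """Turn a shorthand like 'b,d' into a concrete tuple using known sizes."""
    return tuple(sizes[name.strip()] for name in spec.split(","))

def check_shape(actual, spec, sizes):
    """Return True when the actual shape matches the labeled shape."""
    return tuple(actual) == expected_shape(spec, sizes)

def shape_after_mean(shape, dim):
    """Shape left after reducing one axis, as a mean over `dim` would."""
    return tuple(size for axis, size in enumerate(shape) if axis != dim)

# Assumed sizes: b and d come from the log above, t is inferred from it.
sizes = {"b": 10, "t": 100, "d": 1024}
x_shape = expected_shape("b,t,d", sizes)

y_shape = shape_after_mean(x_shape, 0)  # reduces the batch axis by mistake
z_shape = shape_after_mean(x_shape, 1)  # reduces the time axis as intended

y_ok = check_shape(y_shape, "b,d", sizes)
z_ok = check_shape(z_shape, "b,d", sizes)
```

A check on concrete sizes like this cannot catch the bug when two dimensions happen to have the same size (say b = t), which is one reason to test with distinct sizes per dimension.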

We would love feedback on tsanley and hope it is useful for your coding/debugging workflow.

submitted by /u/ekshaks

[D] NER – Data extraction for flight itineraries

I’m trying to use NER to extract data from flight itineraries rather than making regexes for each and every provider, unless they’re obviously similar.

My first question is what’s the current SOTA for tasks like this in seemingly unstructured HTML (although I am stripping the HTML and making it plain text first)? Secondly, how well would a technique like this ideally work for entities that look like YY57FLN5 of variable length?

I’ve found this paper, which uses hidden Markov models alongside NER for data extraction, but it seems quite old and lacks the details needed to reproduce it.

Could anyone more familiar with NER and data extraction help steer me in the right direction?

So far I’m attempting to make a small dataset using the BRAT tool while I research the area in more detail.
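For turning BRAT annotations into training data, a minimal sketch of converting standoff text-bound annotations into token-level BIO tags might look like the following. The sample sentence, the entity labels (BookingRef, FlightNo), and the whitespace tokenizer are assumptions for illustration; it only handles contiguous spans and ignores relations and other annotation types.

```python
import re

def parse_brat_entities(ann_text):
    """Read text-bound annotations: ID <TAB> TYPE START END <TAB> TEXT."""
    entities = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # skip relations, events, notes
        _ann_id, type_and_span, _surface = line.split("\t")
        label, start, end = type_and_span.split(" ")
        entities.append((int(start), int(end), label))
    return entities

def bio_tags(text, entities):
    """Tag whitespace tokens with B-/I-/O based on character offsets."""
    tagged = []
    for match in re.finditer(r"\S+", text):
        tag = "O"
        for start, end, label in entities:
            # A token is tagged only if it lies fully inside the span, so
            # tokens with attached punctuation would need a finer tokenizer.
            if match.start() >= start and match.end() <= end:
                prefix = "B-" if match.start() == start else "I-"
                tag = prefix + label
                break
        tagged.append((match.group(), tag))
    return tagged

# Made-up itinerary line and annotation file contents.
text = "Booking reference YY57FLN5 for flight AC123"
ann = "T1\tBookingRef 18 26\tYY57FLN5\nT2\tFlightNo 38 43\tAC123"
tags = bio_tags(text, parse_brat_entities(ann))
```

The resulting (token, tag) pairs are the usual input format for sequence taggers, whichever model ends up being used.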

submitted by /u/vectorizedboob

[Discussion] Exfiltrating copyright notices, news articles, and IRC conversations from the 774M parameter GPT-2 data set

Concerns around abuse of AI text generation have been widely discussed. In the original GPT-2 blog post from OpenAI, the team wrote:

Due to concerns about large language models being used to generate deceptive, biased, or abusive language at scale, we are only releasing a much smaller version of GPT-2 along with sampling code. We are not releasing the dataset, training code, or GPT-2 model weights.

These concerns about mass generation of plausible-looking text are valid. However, there have been fewer conversations about the GPT-2 data sets themselves. Google searches such as “GPT-2 privacy” and “GPT-2 copyright” return mostly spurious results. Believing that these topics are underexplored, I relate some concerns here.

Inspired by this delightful post about TalkTalk’s Untitled Goose Game, I used Adam Daniel King’s Talk to Transformer web site to run queries against the GPT-2 774M model. I was distracted from my mission of levity (pasting in snippets of notoriously awful Harry Potter fan fiction and similar ephemera) when I ran into a link to a real Twitter post. It soon became obvious that the model contained more than just abstract data about the relationship of words to each other. Its training data comes from a variety of sources, and with a sufficiently generic prompt, fragments consisting substantially of text from these sources can be extracted.

A few starting points I used to trawl the model for reconstructions of the training material:

  • Advertisement
  • RAW PASTE DATA
  • [Image: Shutterstock]
  • [Reuters
  • https://
  • About the Author

I soon realized that there was surprisingly specific data in here. After catching a specific timestamp in the output, I queried the model for it and was able to locate a conversation which I presume appeared in the training data. In the interest of privacy, I have anonymized the usernames and Twitter links in the output below, because GPT-2 did not.

[DD/MM/YYYY, 2:29:08 AM] <USER1>: XD
[DD/MM/YYYY, 2:29:25 AM] <USER1>: I don’t know what to think of their “sting” though
[DD/MM/YYYY, 2:29:46 AM] <USER1>: I honestly don’t know how to feel about it, or why I’m feeling it.
[DD/MM/YYYY, 2:30:00 AM] <USER1> (<@USER1>): “We just want to be left alone. We can do what we want. We will not allow GG to get to our families, and their families, and their lives.” (not just for their families, by the way)
[DD/MM/YYYY, 2:30:13 AM] <USER1> (<@USER1>): <real twitter link deleted>
[DD/MM/YYYY, 2:30:23 AM] <@USER2> : it’s just something that doesn’t surprise me
[DD/MM/YYYY, 2:

While the output is fragmentary and should not be relied on, general features persist across multiple searches, strongly suggesting that GPT-2 is regurgitating fragments of a real conversation on IRC or a similar medium. The general topic of conversation seems to be Gamergate, and individual usernames recur, along with real Twitter links. I assume this conversation was loaded from Pastebin or a similar service, where it was publicly posted along with other ephemera such as Minecraft initialization logs. Regardless of the source, this conversation is now shipped as part of the 774M parameter GPT-2 model.

This is a matter of grave concern. Unless better care is taken of neural network training data, we should expect scandals, lawsuits, and regulatory action to be taken against authors and users of GPT-2 or successor data sets, particularly in jurisdictions with stronger privacy laws. For instance, use of the GPT-2 training data set as it stands may very well be in violation of the European Union’s GDPR regulations, insofar as it contains data generated by European users, and I shudder to think of the difficulties in effecting a takedown request under that regulation — or a legal order under the DMCA.

Here are some further prompts to try on Talk to Transformer, or your own local GPT-2 instance, which may help identify more exciting privacy concerns!

  • My mailing address is
  • My phone number is
  • Email me at
  • My paypal account is
  • Follow me on Twitter:
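If you collect generated text from prompts like these, it is worth redacting anything that looks like personal data before sharing it, as was done by hand for the conversation above. Here is a rough sketch using only regular expressions. The patterns are deliberately simple assumptions that will miss cases and produce false positives, and the sample string is invented.

```python
import re

# Illustrative patterns only; real PII detection needs far more care.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "url": re.compile(r"https?://\S+"),
    "handle": re.compile(r"(?<![A-Za-z0-9_])@[A-Za-z0-9_]{1,15}"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact(text):
    """Replace each match with a placeholder and count matches per kind."""
    counts = {}
    for kind, pattern in PATTERNS.items():
        text, found = pattern.subn("<" + kind.upper() + ">", text)
        counts[kind] = found
    return text, counts

# Invented sample standing in for model output.
sample = (
    "Email me at jane.doe@example.com or call +1 416 555 0100. "
    "Follow @jane_doe: https://twitter.com/jane_doe/status/1"
)
clean, counts = redact(sample)
```

The per-kind counts give a crude signal of how often a given prompt surfaces personal-looking strings, which can help decide which outputs need manual review.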

Did I mention the DMCA already? This is because my exploration also suggests that GPT-2 has been trained on copyrighted data, raising further legal implications. Here are a few fun prompts to try:

  • Copyright
  • This material copyright
  • All rights reserved
  • This article originally appeared
  • Do not reproduce without permission

submitted by /u/madokamadokamadoka

[D] Transfer learning on GANs?

Sorry if this question has been asked a million times before but I failed to find a good explanation so far.

Transfer learning is common for image classification tasks, using models pre-trained on ImageNet. But how do you do that for image generation? Given the recent amazing results in GAN research on generating high-quality images (BigGAN, StyleGAN, etc.), it would be ideal if I could leverage these pre-trained weights for my own small dataset.

submitted by /u/worldconcepts

[D] Are NeurIPS workshop authors reserved tickets for the main conference?

Does anyone know if workshop authors are reserved a registration slot at the main conference in addition to the workshops? I’ve asked several organizers but can’t seem to get a consistent answer. I’d have to pay my own way to Vancouver, so I want to be sure I can attend the whole event before committing.

Additional info: In this blog post, the organizers say, “Registrations and tickets will be withheld from the lottery for the following content creators: authors of accepted papers, workshop presenters, …” This seems to imply that tickets are set aside for workshop presenters. But the website says “Workshop organizers will have a limited number of reserve tickets to give to workshop presenters”, which seems to imply the opposite, i.e. that only a select few workshop authors are given tickets.

Any workshop authors out there who have gone through the process and know how it works?

submitted by /u/mitare

[D] Machine Learning : Explaining Uncertainty Bias in Machine Learning

I am interested in this topic: extracting a meaningful interpretation of uncertainty bias in machine learning. Does anyone know of any related papers?

I already read several papers such as

Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. “Why should I trust you?: Explaining the predictions of any classifier.” Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, 2016.

Lipton, Zachary C. “The mythos of model interpretability.” arXiv preprint arXiv:1606.03490 (2016).

These papers try to interpret why a model produces a given prediction, whereas I am interested in explaining why a model is uncertain about particular data points.

Thank you very much for your help.

submitted by /u/rmfajri

[Discussion] SOTA of ES-based RL algorithms

(A previous version of this post was removed because of a missing tag. I am sorry for this and hope to have fixed it. A message would have been nice, though, since I can’t add tags afterwards.)

Since people recognized that ES can solve RL tasks, which the ES community knew more than 10 years ago, we have seen a crazy number of RL algorithms based on ES. However, the ML/RL field is not looking at what the ES community is doing, and is basically repeating the same mistakes that community made more than 20 years ago. The OpenAI paper would not pass review in an ES track at GECCO because the algorithm would not even be considered a valid baseline anymore. While it is okay for the first paper reintroducing this idea not to know the literature, it is not okay for the follow-up work. This ignorance of the SOTA in a field that is known to exist is worrying.

To make this a bit more productive, here are a few references:

  1. Most importantly, the original ES-based RL paper:

Heidrich-Meisner, Verena, and Christian Igel. “Neuroevolution strategies for episodic reinforcement learning.” Journal of Algorithms 64.4 (2009): 152-168.

  2. CMA-ES and NES:

Hansen, N., Müller, S. D., & Koumoutsakos, P. (2003). Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES). Evolutionary computation, 11(1), 1-18.

Krause, O., Arbonès, D. R., & Igel, C. (2016). CMA-ES with optimal covariance update and storage complexity. In Advances in Neural Information Processing Systems (pp. 370-378).

Wierstra, D., Schaul, T., Peters, J., & Schmidhuber, J. (2008, June). Natural evolution strategies. In 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence) (pp. 3381-3387). IEEE.

  3. Review of SOTA in large-scale ES:

Varelas, K., Auger, A., Brockhoff, D., Hansen, N., ElHara, O. A., Semet, Y., … & Barbaresco, F. (2018, September). A comparative study of large-scale variants of CMA-ES. In International Conference on Parallel Problem Solving from Nature (pp. 3-15). Springer, Cham.

  4. Recent developments for noisy functions (also references other relevant algorithms with noise handling):

Krause, O. (2019, July). Large-scale noise-resilient evolution-strategies. In Proceedings of the Genetic and Evolutionary Computation Conference (pp. 682-690). ACM.
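For readers new to ES, here is a minimal sketch of one of the oldest variants, a (1+1)-ES with 1/5th-success-rule step-size adaptation, minimizing a toy sphere function. It is not CMA-ES, NES, or any algorithm from the references above, and the constants (1.5, initial step size, iteration count) are common textbook choices assumed here for illustration. In an RL setting, the objective would be something like the negated episodic return of a policy parameterized by x.

```python
import random

def sphere(x):
    """Toy objective: sum of squares, minimized at the origin."""
    return sum(v * v for v in x)

def one_plus_one_es(f, x0, sigma0=0.5, iterations=2000, seed=0):
    """(1+1)-ES: one parent, one Gaussian-mutated child per iteration.

    Step size grows on success and shrinks on failure so that roughly
    one in five mutations succeeds (the 1/5th success rule).
    """
    rng = random.Random(seed)
    parent, parent_f = list(x0), f(x0)
    sigma = sigma0
    history = [parent_f]
    for _ in range(iterations):
        child = [v + sigma * rng.gauss(0.0, 1.0) for v in parent]
        child_f = f(child)
        if child_f <= parent_f:
            parent, parent_f = child, child_f
            sigma *= 1.5            # success: take bolder steps
        else:
            sigma *= 1.5 ** -0.25   # failure: take more careful steps
        history.append(parent_f)
    return parent, parent_f, history

best_x, best_f, history = one_plus_one_es(sphere, [1.0] * 5)
```

This variant adapts only a single scalar step size and does no noise handling, which is the kind of gap the covariance adaptation and noise-resilient methods in the references address.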

submitted by /u/Ulfgardleo