Category: Reddit MachineLearning

[D] Can ML be used to detect repeated patterns in text?

I am trying to analyze system logs. It’s a messy unstructured text. I would like to detect repeated patterns.

As an example:

Feb 24 06:48:03 circle vpopmail[12039]: vchkpw-pop3: **password fail [EMAIL PROTECTED]:**67.109.191.46
Feb 24 06:49:03 circle vpopmail[12043]: vchkpw-pop3: **password fail [EMAIL PROTECTED]:**67.109.191.46
Feb 24 06:50:03 circle vpopmail[12099]: vchkpw-pop3: **password fail [EMAIL PROTECTED]:**67.109.191.46
Feb 24 08:13:31 circle vpopmail[13042]: vchkpw-pop3: **password fail [EMAIL PROTECTED]:**70.104.21.208
Feb 24 08:13:32 circle vpopmail[13046]: vchkpw-pop3: **password fail [EMAIL PROTECTED]:**70.104.21.208

The pattern can be

pattern = “.password fail [EMAIL PROTECTED]:($ip)”

I didn’t know the pattern in advance; I only discovered it by eyeballing the text. You might suggest tokenizing the words and counting frequencies, but in my case it’s hard to tokenize and to decide on substring windows, and the patterns might vary.

Is there a name for such techniques? I couldn’t find any ML techniques applied to this problem.
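A common starting point, often called log parsing or log template mining, is to mask the fields that look variable (timestamps, process IDs, IP addresses, numbers) and count the templates that remain. Below is a minimal sketch of that idea rather than an ML model; the `MASKS` rules, the function names, and the extra `sshd` line are illustrative assumptions, not part of the original logs.

```python
import re
from collections import Counter

# Hypothetical masking rules: each regex replaces a variable-looking field
# with a placeholder. Order matters: specific rules run before generic ones.
MASKS = [
    (re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}"), "<TIME>"),
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\[\d+\]"), "[<PID>]"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_template(line):
    """Mask variable fields so lines with the same shape collapse together."""
    for pattern, placeholder in MASKS:
        line = pattern.sub(placeholder, line)
    return line.strip()


def mine_templates(lines):
    """Return (template, count) pairs, most frequent template first."""
    counts = Counter(to_template(line) for line in lines if line.strip())
    return counts.most_common()


logs = [
    "Feb 24 06:48:03 circle vpopmail[12039]: vchkpw-pop3: password fail [EMAIL PROTECTED]:67.109.191.46",
    "Feb 24 06:49:03 circle vpopmail[12043]: vchkpw-pop3: password fail [EMAIL PROTECTED]:67.109.191.46",
    "Feb 24 08:13:31 circle vpopmail[13042]: vchkpw-pop3: password fail [EMAIL PROTECTED]:70.104.21.208",
    # Made-up line, added only to show that a different message gets its own template.
    "Feb 24 08:15:00 circle sshd[13100]: session opened for user root",
]

for template, count in mine_templates(logs):
    print(count, template)
```

Hand-written masks only work when you can guess which fields vary. When you cannot, a possible next step is to cluster lines by token similarity and treat the positions that differ within a cluster as the variable parts.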

submitted by /u/__Julia

[D] A little help with a FAQ over at r/lml

Hey guys!

People at r/lml very often ask about resources for building the prerequisite knowledge in math and other areas needed for ML/DL.

It would be really appreciated if you could give your input too and help put this frequently asked question to rest.

https://www.reddit.com/r/learnmachinelearning/comments/cxrpjz/a_clear_roadmap_for_mldl/

Thanks in advance!!!

submitted by /u/EssentialCoder

[D] Yann LeCun: Deep Learning, Convolutional Neural Networks, and Self-Supervised Learning | Artificial Intelligence Podcast

Yann LeCun is one of the fathers of deep learning, the recent revolution in AI that has captivated the world with the possibility of what machines can learn from data. He is a professor at New York University, Vice President & Chief AI Scientist at Facebook, and a co-recipient of the Turing Award for his work on deep learning. He is probably best known as the founder of convolutional neural networks, in particular their early application to optical character recognition. This conversation is part of the Artificial Intelligence podcast:

Video: https://www.youtube.com/watch?v=SGSOCuByo24

Audio: https://lexfridman.com/yann-lecun

https://i.redd.it/dbhmztwabtj31.png

Outline:

0:00 – Introduction

1:11 – HAL 9000 and Space Odyssey 2001

7:49 – The surprising thing about deep learning

10:40 – What is learning?

18:04 – Knowledge representation

20:55 – Causal inference

24:43 – Neural networks and AI in the 1990s

34:03 – AGI and reducing ideas to practice

44:48 – Unsupervised learning

51:34 – Active learning

56:34 – Learning from very few examples

1:00:26 – Elon Musk: deep learning and autonomous driving

1:03:00 – Next milestone for human-level intelligence

1:08:53 – Her

1:14:26 – Question for an AGI system

submitted by /u/UltraMarathonMan

[D] Overlapping instance segmentation (MASK RCNN?)

I’m trying to train a network for instance segmentation. Unlike the examples found in natural images, most of the masks here overlap with each other. Here’s an example with the 2 masks that I expect. Notice the significant overlap.

https://imgur.com/a/QJQB7MM

I’m going about this with Matterport’s Mask RCNN implementation, and the results of my first attempt weren’t great. I’m just starting, though, and I’m probably not configuring the training correctly. Still, in case this is impossible or there’s a better way to do it, I thought I’d ask this community.
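One thing that may be worth checking before tuning the configuration is how the ground-truth masks are encoded. The sketch below, which assumes the training pipeline accepts one binary mask per instance, shows that a single label map cannot represent overlap while per-instance masks can. The helper names and the toy 6x6 rectangles are made up for illustration and are not taken from any Mask RCNN code.

```python
def box_mask(height, width, top, left, bottom, right):
    """Binary mask (nested lists) that is 1 inside the half-open rectangle."""
    return [
        [1 if top <= r < bottom and left <= c < right else 0 for c in range(width)]
        for r in range(height)
    ]


def area(mask):
    """Number of pixels set to 1 in a binary mask."""
    return sum(sum(row) for row in mask)


def iou(mask_a, mask_b):
    """Intersection over union of two binary masks of the same size."""
    inter = sum(a & b for ra, rb in zip(mask_a, mask_b) for a, b in zip(ra, rb))
    union = sum(a | b for ra, rb in zip(mask_a, mask_b) for a, b in zip(ra, rb))
    return inter / union if union else 0.0


def to_label_map(masks):
    """Collapse instance masks into one label map; later instances overwrite earlier ones."""
    height, width = len(masks[0]), len(masks[0][0])
    labels = [[0] * width for _ in range(height)]
    for index, mask in enumerate(masks, start=1):
        for r in range(height):
            for c in range(width):
                if mask[r][c]:
                    labels[r][c] = index
    return labels


# Two toy instances that share a 2x2 region.
masks = [box_mask(6, 6, 0, 0, 4, 4), box_mask(6, 6, 2, 2, 6, 6)]
labels = to_label_map(masks)

print("per-instance areas:", [area(m) for m in masks])
print("pixels left for instance 1 in the label map:", sum(row.count(1) for row in labels))
print("IoU between the two instances:", iou(masks[0], masks[1]))
```

If the annotations pass through a label map at any point, the shared pixels end up assigned to only one instance, so the network never sees the overlap it is expected to predict.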

submitted by /u/mackie__m

[D] I started writing a book on practical considerations of ML, keen for feedback on its direction

I was finding I was constantly having the same conversations with people about implementing ML in practice, so I tried to find a resource I could provide to people that might help. However, I found there wasn’t much on the practical side of implementing ML – so about a year ago I drafted out a table of contents and started writing. I ended up shelving it for a bit, and have just picked it up again now – but am torn between just blogging what I’ve already got, or trucking on to create a unified resource (i.e. the book).

I’m keen to hear what the reddit ML community thinks – whether I should continue (maybe it’s been superseded?), and, if I continue, whether there’s anything you’d like to see covered in the book.

My intention – should I continue and complete it – is to self-publish online through something like LeanPub. I have no burning desire to see the book in print or make money off it, I’d really just like to raise awareness of what we all need to think about when we create ML solutions in the real world.

This is me: https://twitter.com/drkatnz

Here’s how the table of contents looks (about 25% of the content is written already, and subsections aren’t shown; feedback so far has been to include a section on biases, which has been added):

  1. Introduction
    1.1 Terminology
    1.2 How do I get started using machine learning?

  2. Do you really need machine learning?
    2.1 Data availability
    2.2 Liability
    2.3 Capability
    2.4 Other solutions
    2.5 Pre-requisite checklist

  3. Team
    3.1 Skills
    3.2 Common team structures
    3.3 Forming a team and getting started

  4. Building your first machine learning solution

  5. Data collection
    5.1 Collecting the data
    5.2 Data set size – how much is enough?
    5.3 Labeled versus unlabeled data

  6. Pre-processing
    6.1 Automatically cleaning the data
    6.2 Dealing with missing values
    6.3 Applying domain knowledge
    6.4 Feature cleanup
    6.5 Dealing with the minority class

  7. Algorithm considerations
    7.1 Unsupervised versus supervised
    7.2 ’Good enough’ accuracy
    7.3 Storage
    7.4 Speed

  8. Measuring accuracy
    8.1 Metrics
    8.2 Minimum required accuracy
    8.3 Test set
    8.4 Investigating prediction errors
    8.5 A/B Testing

  9. Identifying and Mitigating Biases
    9.1 Biases from data
    9.2 Biases from trained models
    9.3 Inventor’s bias
    9.4 Biases caused by perception of machine learning

  10. Getting an algorithm to production
    10.1 Infrastructure
    10.2 Documentation
    10.3 User interface
    10.4 Abstaining classifiers
    10.5 Runtime environment

  11. Managing live algorithms
    11.1 Monitoring
    11.2 Effect on the real world
    11.3 Auditing results
    11.4 Updating models
    11.5 Technical debt

What do y’all think?

submitted by /u/katnz

[D] Fraudsters Used AI to Mimic CEO’s Voice in Unusual Cybercrime Case

Link: https://www.wsj.com/articles/fraudsters-use-ai-to-mimic-ceos-voice-in-unusual-cybercrime-case-11567157402

I think this is an important issue to discuss in our community. There are a number of community members that have publicly released their speech synthesis models and they have cloned other people’s voices without consent. I believe that sets a dangerous precedent.

It’s important that speech synthesis models are not posted online for everyone to use. Those open source models may easily be abused.

It’s important that we don’t clone celebrities’ voices without their explicit consent. A speech synthesis model of a celebrity can put that person at great risk of identity fraud.

What do you all think?

I am not without my own biases. I do work at a speech synthesis company. We have a concrete focus on the ethics of speech synthesis.

submitted by /u/Deepblue129

[R] Additional maths exam: worth it?

Hi everyone, I’m about to start an MS in CE (controls/robotics focus). I have the option to add an extra (not required) course called “Mathematics in Machine Learning”, 8 ECTS, in my second semester, taking it from 24 to 32 ECTS. The temptation comes from the idea that proving I have strong maths foundations in ML could lead to good ML research positions and salaries. However, I’ll also have to take compulsory courses like Convex Optimization and general ML, so I’d get a good basis anyway.

My question is: is the “extra” work worth it, or won’t companies care at all?

The course programme is the following:

  • Mathematical representations of data: spaces (including Hilbert spaces), metrics, distances, dissimilarities and kernels. Geometry of very high dimensional spaces and the curse of dimensionality.
  • Learning theory, PAC, Rademacher and VC dimension. Trade-off Bias vs Model Variance and Model Complexity.
  • Cross validation, bootstrap and applications.
  • Linear algebra-based methods: Principal Component Analysis, Linear Discriminant Analysis, Independent Component Analysis and Stochastic projections (Johnson-Lindenstrauss Transform).
  • Linear Models (regression, ANOVA, DOE).
  • Generalized linear models (categorical data, logistic and multinomial regression).
  • Model and feature selection, hyperparameter tuning (e.g. lasso, AIC, BIC, ridge).
  • Bayesian networks (basic concepts, exact and MCMC-based computations).

submitted by /u/Hybr1d97

[D] What is the reality of being a machine learning engineer?

I’m a physics engineer, but I don’t find the jobs very attractive, and I feel like escaping reality and responsibilities for a little while by going back to school.

Before finishing school, I remember telling people that I wanted to do ML, that the internship I did on computer vision was inspiring, that I wanted to do more projects on that, etc. Now I have a job and, while it is very serious and “important”, I’m left contemplating this avenue once more. I see at my current job how important and tedious data crunching is. I’m not sure how an ML project could easily be incorporated into a company that still relies on DOS systems, but I see how crucial statistical analysis is for finding the root cause of production problems.

For the reasons above, I’m increasingly tempted to hop into a 1-year professional master’s program on AI. However, I wonder what kind of jobs exist for data/ML engineers in medium/big corporations. I’m not looking to be a programmer because I’m not that young (28) and have a big physics background (I’m not competitive vs. someone who studied computer science, for example). Should I attempt this? I know asking strangers is not the wisest, but I find it helpful to hear about someone else’s experience.

submitted by /u/Knackmanic

[Discussion] Where to get started using NLP for building a Discord Bot

Hey there! I’m not sure if this is the correct place to ask this question, but it seemed like a reasonable place to start. If there’s a better place for me to post this please let me know.

I’ve had this idea for a while of building a Discord bot that is somewhat of a grammar nazi and will shame people for incorrect use of their, there, and they’re. I don’t know if this is even possible, but I was wondering if there’s an NLP framework that could take a user’s message as input and, if the grammar is incorrect regarding the use of “their”, “there”, or “they’re”, let me generate a message based on the result. Is this feasible?

Discord supports JavaScript and Python for bots. I’m a web developer by day and comfortable with JS. If this concept is feasible I’d love to use Python for it so I can start learning the language.
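As a rough baseline before reaching for a full NLP framework, a few hand-written patterns can already catch the most obvious mix-ups. The sketch below is only that: the `RULES` list and `check_message` function are hypothetical names, the rules are far from complete, and the Discord wiring is left out entirely.

```python
import re

# Hypothetical hand-written rules: (pattern for a likely misuse, suggested word).
# Both straight and curly apostrophes are accepted in "they're".
RULES = [
    (re.compile(r"\btheir\s+(?:is|are|was|were)\b", re.IGNORECASE), "there"),
    (re.compile(r"\b(?:there|they['’]re)\s+own\b", re.IGNORECASE), "their"),
    (re.compile(r"\b(?:their|there)\s+(?:going|coming|doing)\b", re.IGNORECASE), "they're"),
]


def check_message(text):
    """Return (matched text, suggested word) pairs for each rule that fires."""
    findings = []
    for pattern, suggestion in RULES:
        for match in pattern.finditer(text):
            findings.append((match.group(0), suggestion))
    return findings


for message in [
    "Their is a bug and there going home",
    "They're over there with their dog",
]:
    print(message, "->", check_message(message))
```

Rules like these miss most real mistakes and know nothing about context. A more robust version would probably look at the part of speech of the neighbouring words, which is where an NLP library would come in.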

Thanks!

submitted by /u/stat30fbliss