

Category: Reddit MachineLearning

[D] Kaggle/NFL Big Data Bowl – $75,000

https://www.kaggle.com/c/nfl-big-data-bowl-2020/overview

How many yards will an NFL player gain after receiving a handoff?

American football is a complex sport. From the 22 players on the field to specific characteristics that ebb and flow throughout the game, it can be challenging to quantify the value of specific plays and actions within a play. Fundamentally, the goal of football is for the offense to run (rush) or throw (pass) the ball to gain yards, moving towards, then across, the opposing team’s side of the field in order to score. And the goal of the defense is to prevent the offensive team from scoring.

In the National Football League (NFL), roughly a third of teams’ offensive yardage comes from run plays. Ball carriers are generally assigned the most credit for these plays, but their teammates (by way of blocking), coach (by way of play call), and the opposing defense also play a critical role. Traditional metrics such as ‘yards per carry’ or ‘total rushing yards’ can be flawed; in this competition, the NFL aims to provide better context into what contributes to a successful run play.

As an “armchair quarterback” watching the game, you may think you can predict the result of a play when a ball carrier takes the handoff – but what does the data say? In this competition, you will develop a model to predict how many yards a team will gain on given rushing plays as they happen. You’ll be provided game, play, and player-level data, including the position and speed of players as provided in the NFL’s Next Gen Stats data. And the best part – you can see how your model performs from your living room, as the leaderboard will be updated week after week on the current season’s game data as it plays out.
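The post does not say how predictions are scored. We believe the competition's Evaluation page asks for a cumulative probability P(Y ≤ n) for every yardage n from -99 to 99. It then scores those probabilities with the Continuous Ranked Probability Score, C = (1 / (199N)) Σ_m Σ_n (P(Y ≤ n) − H(n − Y_m))², where H is the Heaviside step function. Treat that as an assumption and check the official page. The sketch below implements that formula in plain Python. The helper names `crps` and `step_cdf` are hypothetical.

```python
def step_cdf(yards):
    """Hypothetical helper: a 'certain' prediction, i.e. P(Y <= n) jumps from 0 to 1 at n = yards."""
    return [1.0 if n >= yards else 0.0 for n in range(-99, 100)]  # 199 values, n = -99..99


def crps(cdf_preds, actual_yards):
    """Continuous Ranked Probability Score over yardages -99..99 (lower is better).

    cdf_preds: list of N lists, each holding 199 cumulative probabilities P(Y <= n).
    actual_yards: list of N integer outcomes Y_m.
    """
    total = 0.0
    for cdf, y in zip(cdf_preds, actual_yards):
        for idx, p in enumerate(cdf):
            n = idx - 99                       # map list index 0..198 to yardage -99..99
            heaviside = 1.0 if n >= y else 0.0  # H(n - Y_m)
            total += (p - heaviside) ** 2
    return total / (199 * len(cdf_preds))
```

A prediction that puts all its mass on the true yardage scores 0. Being off by two yards with a certain prediction adds two unit-squared errors, giving 2/199. Spreading probability over nearby yardages is rewarded when the outcome is uncertain.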

Deeper insight into rushing plays will help teams, media, and fans better understand the skill of players and the strategies of coaches. It will also assist the NFL and its teams in evaluating the ball carrier, his teammates, his coach, and the opposing defense, in order to make adjustments as necessary.

Additionally, the winning model will be provided to the NFL’s Next Gen Stats group to potentially share with teams. You could help the NFL Network generate models to use during games, or for pre-game/post-game breakdowns.

submitted by /u/mystikaldanger

[D] Data research: the new direction?

Recently I have been working on projects that deal with datasets directly. What I found is that there is very little research on the data itself, such as data annotation, although there is a lot of work on semi-supervised learning and, more recently, self-supervised learning.

In my opinion, the research community will gradually move from model-centered research to data-centered research. I wrote an article discussing this on Towards Data Science (https://towardsdatascience.com/data-the-predicament-and-opportunity-in-the-deep-learning-era-256f4b4fef). I’d really like to hear your opinions.

submitted by /u/HaiwenHuang

[D] How to do trend analysis on textual data

Hi all, I am working on a dataset of customer reviews, and we would like to analyze how customer feedback changes over time. Sentiment is easy: I can build a sentiment classifier, get a sentiment score for each review, and run conventional time-series analysis on the scores. But for analyses like topic modeling, are there methods for tracking how topics trend over time? Thanks for any advice.
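One simple approach mirrors the sentiment pipeline described in the question. It assumes each review already has a topic-proportion vector from a topic model fitted on the whole corpus (for example LDA). You then average those vectors per time period and treat each topic's average share as a time series. The sketch below is a minimal illustration of that aggregation step. The function name `topic_prevalence_by_month` and the input format are hypothetical.

```python
from collections import defaultdict
from datetime import date


def topic_prevalence_by_month(docs):
    """Average per-document topic proportions within each calendar month.

    docs: iterable of (date, topic_proportions) pairs, where topic_proportions
          is a list of K floats from a topic model fitted on the whole corpus.
    Returns {(year, month): [mean share of topic k for k in range(K)]}, ordered by month.
    """
    sums = {}
    counts = defaultdict(int)
    for d, theta in docs:
        key = (d.year, d.month)
        if key not in sums:
            sums[key] = [0.0] * len(theta)
        sums[key] = [s + t for s, t in zip(sums[key], theta)]
        counts[key] += 1
    return {k: [s / counts[k] for s in sums[k]] for k in sorted(sums)}


reviews = [
    (date(2019, 1, 5), [0.8, 0.2]),   # mostly topic 0
    (date(2019, 1, 20), [0.6, 0.4]),
    (date(2019, 2, 3), [0.1, 0.9]),   # topic 1 dominates in February
]
trend = topic_prevalence_by_month(reviews)
```

Each topic's column in `trend` can then go through the same time-series tools used for the sentiment scores. Fitting one model on all periods keeps topic k meaning the same thing across months.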

submitted by /u/InventorWu

[D] Marketplace for machine learning?

The idea is a marketplace where researchers publish trained models and developers like me buy those models (for example, .pb files in TensorFlow) and use them to solve our clients’ problems. I’ve been searching on Google for a few days, but I can’t find any such marketplace, only free and open-source models.

Commercializing pre-trained models would create new jobs in the machine learning field and speed up putting research results into practice.

For example, a researcher publishes a trained model for forecasting inventory demand, and a developer uses it to build software for eCommerce websites.

What do you think of the idea?

submitted by /u/ConVit

[R] Learn faster with smarter data labeling

Hey, here is some research we’ve done on active learning.

Dealing with a big unlabeled dataset can become very expensive very fast, so it makes sense to invest time in optimizing the labeling process. In the article below, we explore one such optimization: active learning. Active learning is a branch of machine learning that seeks to minimize the amount of data that needs labeling by strategically sampling observations that provide new insight into the problem. In particular, active-learning algorithms try to select diverse and informative data for annotation (rather than random observations) from a pool of unlabeled data.

Excited to share:

https://towardsdatascience.com/learn-faster-with-smarter-data-labeling-15d0272614c4
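The article's exact strategy is not described here. As a minimal sketch of the "informative" half of the idea above, the following code implements least-confidence uncertainty sampling, one common active-learning query strategy. It scores every unlabeled sample by 1 − (highest predicted class probability) and sends the k least confident ones to annotators. It does not address diversity, and the function name `least_confidence_query` is hypothetical.

```python
def least_confidence_query(pool_probs, k):
    """Pick the k unlabeled samples the current model is least confident about.

    pool_probs: dict mapping sample id -> list of predicted class probabilities.
    Returns up to k sample ids, most uncertain first.
    """
    # Uncertainty = 1 - probability of the predicted (top) class.
    uncertainty = {sid: 1.0 - max(p) for sid, p in pool_probs.items()}
    return sorted(uncertainty, key=lambda sid: uncertainty[sid], reverse=True)[:k]


pool = {
    "a": [0.90, 0.10],
    "b": [0.55, 0.45],  # near the decision boundary: most informative
    "c": [0.60, 0.40],
    "d": [0.99, 0.01],
}
to_label = least_confidence_query(pool, k=2)
```

In a full loop, you would label `to_label`, retrain, re-predict on the remaining pool, and repeat until the labeling budget runs out.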

submitted by /u/michael_htx

[N] Deep Graph Library new release (v0.4)

This new release brings support for heterogeneous graphs. A heterogeneous graph is a graph whose nodes and edges are typed, which is very common in knowledge graphs, recommender systems and many other scenarios. Using this new feature, DGL brings many new models with efficient implementations. Here are some examples:

  • Graph Convolutional Matrix Completion [Code in MXNet]

    Dataset          RMSE (DGL)  RMSE (Official)  Speed (DGL)    Speed (Official)  Speed Comparison
    MovieLens-100K   0.9077      0.910            0.0246s/epoch  0.1008s/epoch     5x
    MovieLens-1M     0.8377      0.832            0.0695s/epoch  1.538s/epoch      22x
    MovieLens-10M    0.7875      0.777            0.6480s/epoch  OOM

One highlight is that DGL can train the GCMC model on the MovieLens-10M dataset on one GPU in only an hour. The previous implementation resorted to loading mini-batches on the fly from the CPU, which could take up to 24 hours.

Another highlight is that, using the heterograph interface, the new code can train an R-GCN on the full AM RDF graph (>5M edges) on one GPU, while the original implementation can only run on CPU and consumes 32GB of memory. It takes 51.88s to train one epoch on CPU, while the new implementation takes only 0.1781s per epoch on a V100 GPU (291x faster!).
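For readers unfamiliar with R-GCN, here is what one layer computes, following the original R-GCN formulation (Schlichtkrull et al.). Each entity i combines a self-connection with messages from its neighbors, using a separate weight matrix W_r for every relation type r. Messages are normalized by c_{i,r} = |N_i^r|:

h_i' = ReLU(W_0 h_i + Σ_r Σ_{j ∈ N_i^r} (1 / c_{i,r}) W_r h_j)

The sketch below stores edges grouped by relation type, loosely in the spirit of a typed-edge graph, and applies one layer in plain Python. It is an illustration, not DGL's heterograph API. The function name `rgcn_layer` and the toy relations are hypothetical, and the sketch omits the basis decomposition the paper uses for graphs with many relations.

```python
from collections import defaultdict


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rgcn_layer(h, edges_by_rel, W_self, W_rel):
    """One R-GCN layer with ReLU.

    h: dict node -> feature vector (list of floats).
    edges_by_rel: dict relation -> list of (src, dst) edges.
    W_self: self-connection matrix W_0; W_rel: dict relation -> matrix W_r.
    """
    out = {i: matvec(W_self, h[i]) for i in h}  # W_0 h_i term
    for rel, edges in edges_by_rel.items():
        incoming = defaultdict(list)  # N_i^r: in-neighbors of i under relation r
        for src, dst in edges:
            incoming[dst].append(src)
        for dst, srcs in incoming.items():
            c = len(srcs)  # normalization constant c_{i,r}
            for src in srcs:
                msg = matvec(W_rel[rel], h[src])
                out[dst] = [o + m / c for o, m in zip(out[dst], msg)]
    return {i: [max(0.0, a) for a in v] for i, v in out.items()}


h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
edges = {"cites": [(0, 2), (1, 2)], "authored": [(2, 0)]}
identity = [[1.0, 0.0], [0.0, 1.0]]
W_rel = {"cites": identity, "authored": [[2.0, 0.0], [0.0, 2.0]]}
h_next = rgcn_layer(h, edges, identity, W_rel)
```

Keeping one edge list per relation is what lets each relation use its own weights. A typed-graph interface makes that grouping explicit rather than requiring a separate edge-type column.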

Apart from the heterogeneous graph support, a new package, DGL-KE, is released for training popular knowledge graph embedding models. Currently, DGL-KE supports TransE, DistMult and ComplEx and can train them very fast: it takes only 6.85 minutes to fully train a TransE model on the FB15K graph using one GPU. As a comparison, GraphVite takes 14 minutes using four GPUs. More models (RESCAL, RotatE, pRotatE, TransH, TransR, TransD, etc.) are under development and will be released in the future.
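To show what these models learn, here are the standard triple-scoring functions of TransE, DistMult and ComplEx as described in their original papers. They are written in plain Python for illustration and are not DGL-KE's API. The function names are hypothetical, and TransE is shown with the L2 distance (the paper also allows L1). A higher score means the triple (head, relation, tail) is judged more plausible.

```python
import math


def transe_score(h, r, t):
    # TransE: a true triple should satisfy h + r ≈ t, so score = -||h + r - t||_2
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def distmult_score(h, r, t):
    # DistMult: trilinear product sum_i h_i * r_i * t_i (real-valued embeddings)
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def complex_score(h, r, t):
    # ComplEx: Re(sum_i h_i * r_i * conj(t_i)) with complex-valued embeddings
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real


perfect = transe_score([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])  # h + r == t exactly
dm = distmult_score([1.0, 2.0], [3.0, 4.0], [5.0, 6.0])
cx = complex_score([1 + 1j], [1j], [1j])
```

DistMult's score is unchanged when head and tail are swapped, so it cannot distinguish a relation from its inverse. ComplEx's conjugate makes the score order-dependent, which lets it model asymmetric relations.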

All the models and training scripts are available and can be run off the shelf. Check out this exciting new release (https://github.com/dmlc/dgl/releases/edit/v0.4.0) if you are working on network embedding or problems that can be formulated as heterogeneous graphs!

submitted by /u/jermainewang