[P] Predict figure skating world championship ranking from season performances (FunkSVD + learning to rank)
I’m trying to predict the ranking of figure skaters in the annual world championship from their scores in earlier competition events of the season. The obvious method is to average each skater’s scores across past events and rank the skaters by those averages. However, since no two events are the same, the goal of my project is to separate the skater effect (the intrinsic ability of each skater) from the event effect (how an event influences a skater’s score).

I’ve previously posted on Reddit about my attempts to do so using simple linear models, which you can read about in part 1 of my project on Medium. These models output a latent score for each skater that we can use to rank them.

However, another approach to learning the latent scores of skaters is to factorize the event-skater matrix of raw scores in the season into a skater-specific matrix and an event-specific matrix that multiply together to approximate the raw scores. This is the same setup as matrix factorization in recommender systems, but with user=skater, item=event, and rating=raw score.
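
To make the analogy concrete, here is a minimal sketch of that setup. The scores and factor values are made up for illustration, and the names (`scores`, `skater_factors`, `event_factors`, `rmse_observed`) are hypothetical rather than taken from my repo. Missing entries (a skater who skipped an event) are stored as NaN and left out of the error.

```python
import numpy as np

# Hypothetical season: rows are skaters, columns are events.
# NaN means the skater did not compete in that event.
scores = np.array([
    [250.0, 260.0, np.nan],
    [np.nan, 240.0, 235.0],
    [210.0, np.nan, 205.0],
])

def rmse_observed(scores, skater_factors, event_factors):
    """RMSE between the observed scores and skater_factors @ event_factors.T."""
    predicted = skater_factors @ event_factors.T
    observed = ~np.isnan(scores)
    residuals = scores[observed] - predicted[observed]
    return np.sqrt(np.mean(residuals ** 2))

# One latent factor per skater and per event (hand-picked, not learned).
skater_factors = np.array([[16.0], [15.0], [13.0]])    # shape (n_skaters, 1)
event_factors = np.array([[16.0], [16.0], [15.5]])     # shape (n_events, 1)

print(rmse_observed(scores, skater_factors, event_factors))
```

The skater column plays the role of the user matrix and the event column the item matrix; learning both from the observed entries is the factorization problem.
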

As a result, I used a variant of the famous FunkSVD algorithm to learn the latent scores of the skaters. In part 2 of my project, I found just a single latent score for each skater and ranked the skaters by those scores. Next, in part 3, I learned multiple latent factors for each skater using the same FunkSVD method. Since I’m implementing it from scratch, I tried several implementations of the algorithm: a naive approach using for loops, one using numpy broadcasting, and one using matrix multiplication. I then benchmarked them in terms of both time and space.
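
For a rough idea of what the matrix-multiplication version can look like, here is a minimal sketch. It assumes full-batch gradient descent on the squared error with no regularization, which differs from Funk’s original per-rating updates; the function name, learning rate, epoch count, and toy data are all illustrative choices, not the exact code in my repo.

```python
import numpy as np

def funk_svd(scores, n_factors=1, lr=0.01, n_epochs=5000, seed=0):
    """Factorize a skater-by-event matrix, ignoring NaN (missing) entries.

    Assumption: full-batch gradient descent on squared error, no regularization.
    """
    rng = np.random.default_rng(seed)
    n_skaters, n_events = scores.shape
    skater_factors = rng.uniform(0.0, 1.0, size=(n_skaters, n_factors))
    event_factors = rng.uniform(0.0, 1.0, size=(n_events, n_factors))
    observed = ~np.isnan(scores)
    filled = np.where(observed, scores, 0.0)  # placeholder zeros, masked out below

    for _ in range(n_epochs):
        predicted = skater_factors @ event_factors.T
        # Residuals are zeroed wherever the skater did not compete.
        residuals = np.where(observed, filled - predicted, 0.0)
        # Compute both steps before updating either factor matrix.
        skater_step = lr * residuals @ event_factors
        event_step = lr * residuals.T @ skater_factors
        skater_factors += skater_step
        event_factors += event_step
    return skater_factors, event_factors

# Toy data built from known factors, with one missing score per skater.
true_skill = np.array([3.0, 2.0, 1.0])
true_event = np.array([2.0, 1.0, 1.5])
toy_scores = np.outer(true_skill, true_event)
toy_scores[0, 2] = toy_scores[1, 0] = toy_scores[2, 1] = np.nan

skater_factors, event_factors = funk_svd(toy_scores, n_factors=1)
# Rank skaters by their single latent score, highest first.
print(np.argsort(-skater_factors[:, 0]))
```

The for-loop and broadcasting versions compute the same two gradient steps, just one skater-event pair or one factor at a time.
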

However, one major problem with multiple factors is that it’s hard to know which factor to rank the skaters by. Thankfully, the ranking metric I use in the project (Kendall’s tau) is based on how many pairs of skaters are ordered correctly, which allows me to build a simple logistic regression model to combine these scores and rank the skaters. This can be done with the pairwise differences in each factor as predictors, and the pairwise ordering in the world championship ranking as the response. I later learned that this belongs to the pairwise learning-to-rank methods often encountered in information retrieval, and you can read my implementation of it in part 4.
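
Here is a minimal sketch of that pairwise idea, written with plain numpy instead of a library model. The latent factors and world ranking are invented, all function names are hypothetical, and the logistic regression is fit without an intercept by simple gradient ascent, which is an assumption for illustration rather than my exact setup.

```python
import numpy as np
from itertools import combinations

def pairwise_differences(factors, world_rank):
    """One row per skater pair: factor differences, and 1 if the first ranked higher."""
    X, y = [], []
    for i, j in combinations(range(len(world_rank)), 2):
        X.append(factors[i] - factors[j])
        y.append(1.0 if world_rank[i] < world_rank[j] else 0.0)  # rank 1 is best
    return np.array(X), np.array(y)

def fit_logistic(X, y, lr=0.1, n_epochs=2000):
    """Logistic regression without intercept, fit by gradient ascent."""
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w += lr * X.T @ (y - p) / len(y)
    return w

def kendall_tau(predicted_scores, world_rank):
    """(concordant - discordant) / total pairs; assumes no ties."""
    concordant = discordant = 0
    for i, j in combinations(range(len(world_rank)), 2):
        predicted_i_better = predicted_scores[i] > predicted_scores[j]
        actual_i_better = world_rank[i] < world_rank[j]
        if predicted_i_better == actual_i_better:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical latent factors (4 skaters x 2 factors) and world championship ranks.
factors = np.array([[1.0, 0.2], [2.0, 0.3], [0.5, 0.4], [1.5, 0.5]])
world_rank = np.array([3, 1, 4, 2])

X, y = pairwise_differences(factors, world_rank)
weights = fit_logistic(X, y)
combined_scores = factors @ weights  # one score per skater, higher is better
# Note: tau is measured on the same season used for fitting, so it is optimistic.
tau = kendall_tau(combined_scores, world_rank)
print(weights, tau)
```

Because the model has no intercept, the learned weights turn the factors into a single combined score per skater, and sorting by that score gives the predicted ranking.
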

However, the result at the end of that part was not very encouraging, likely due to the way I used FunkSVD to train the latent factors. Therefore, in part 5, I modified my FunkSVD implementation to address this problem by training the factors in sequence instead of all at once. I discovered afterward that Mr Funk also originally trained all of his factors in sequence, so I should have read his work more carefully at the start!
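
As a rough illustration of what training in sequence means, here is a minimal sketch under the same assumptions as before (full-batch gradient descent, no regularization, hypothetical names and toy data). Each factor is fit to whatever the previous factors left unexplained, then frozen before the next one is trained.

```python
import numpy as np

def train_one_factor(residuals, observed, lr, n_epochs, rng):
    """Fit a single skater/event factor pair to the current residuals."""
    skater = rng.uniform(0.0, 0.1, size=residuals.shape[0])
    event = rng.uniform(0.0, 0.1, size=residuals.shape[1])
    for _ in range(n_epochs):
        error = np.where(observed, residuals - np.outer(skater, event), 0.0)
        skater_step = lr * error @ event
        event_step = lr * error.T @ skater
        skater += skater_step
        event += event_step
    return skater, event

def funk_svd_sequential(scores, n_factors=2, lr=0.01, n_epochs=3000, seed=0):
    """Train factors one at a time, each on the residuals of the previous ones."""
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(scores)
    residuals = np.where(observed, scores, 0.0)
    skater_cols, event_cols = [], []
    for _ in range(n_factors):
        skater, event = train_one_factor(residuals, observed, lr, n_epochs, rng)
        skater_cols.append(skater)
        event_cols.append(event)
        # Freeze this factor and subtract its contribution before the next one.
        residuals = np.where(observed, residuals - np.outer(skater, event), 0.0)
    return np.column_stack(skater_cols), np.column_stack(event_cols)

# Made-up scores (4 skaters x 4 events), NaN where a skater did not compete.
toy_scores = np.array([
    [6.0, 3.5, np.nan, 5.0],
    [4.0, np.nan, 3.0, 3.5],
    [np.nan, 1.5, 2.0, 2.5],
    [5.0, 3.0, 4.0, np.nan],
])
skater_factors, event_factors = funk_svd_sequential(toy_scores, n_factors=2)
print(skater_factors)
```

With this scheme the first factor captures most of the score, and later factors only model what is left over.
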
You can see all the code I used for my project in the GitHub repo. I’m more than happy to receive any questions or feedback from you guys on my project!
submitted by /u/seismatica