[R] Few-shot learning of talking heads

Written by torontoai on May 21, 2019. Posted in Reddit MachineLearning.

Hello!

I’d like to tell you about our recent paper. We’ve tackled the problem of few-shot generation of talking heads: given a few images of a person (or even a single one), train a model that can synthesize new images of that person in a new pose (viewpoint and expression).

Our model was trained on a publicly available dataset of YouTube videos (VoxCeleb2, 224p) and avoided mode collapse, even though image quality in this dataset varies widely. Hence, we’re able to generalize well to new images with identities unseen during training (we can even run it on paintings and get reasonable results).

The key ingredients are adversarial meta-learning, adversarial fine-tuning, and adaptive instance normalization. For more details, please refer to the paper; a short description of our method and the results are in the video below.

ArXiv: https://arxiv.org/abs/1905.08233
Video: https://www.youtube.com/watch?v=p1b5aiTrGzY
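
For readers unfamiliar with adaptive instance normalization, one of the ingredients named above, here is a minimal sketch of the generic operation: each channel of a feature map is normalized over its spatial dimensions, then rescaled and shifted by per-channel parameters supplied from outside. This is an illustration in NumPy, not the paper's code; the function name, argument names, and tensor shapes are assumptions made for the example.

```python
import numpy as np

def adaptive_instance_norm(features, scale, shift, eps=1e-5):
    """Generic AdaIN sketch (hypothetical helper, not the paper's code).

    features: array of shape (N, C, H, W)
    scale, shift: arrays of shape (N, C), supplied externally
    """
    # Per-sample, per-channel statistics over the spatial dimensions.
    mean = features.mean(axis=(2, 3), keepdims=True)
    var = features.var(axis=(2, 3), keepdims=True)
    normalized = (features - mean) / np.sqrt(var + eps)
    # Broadcast the externally supplied scale and shift over H and W.
    return scale[:, :, None, None] * normalized + shift[:, :, None, None]

# Usage with random data (shapes chosen arbitrarily for illustration).
rng = np.random.default_rng(0)
features = rng.normal(size=(2, 3, 8, 8))
scale = rng.uniform(0.5, 2.0, size=(2, 3))
shift = rng.normal(size=(2, 3))
out = adaptive_instance_norm(features, scale, shift)
```

After the call, each output channel has a spatial mean close to its `shift` value and a spatial standard deviation close to its `scale` value, which is what lets an external signal control the statistics of the generated features.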

One- and few-shot results produced by our model

submitted by /u/ezakharov