[P] Implementation of Samsung’s “Few-Shot Adversarial Learning of Realistic Neural Talking Head Models”
I have been working on implementing the model from the paper: [Few-Shot Adversarial Learning of Realistic Neural Talking Head Models](https://arxiv.org/abs/1905.08233v1) (Zakharov et al.) for my own projects and research. It uses a very interesting model of GAN and triggered my interest.
The paper has been out now for a couple of months and some implementations already exist out there although the results they show are not quite at the level of what is seen in the paper. For my implementation I added further recommendations given by the paper’s author on various details that were unclear to me and other existing implementations upon only reading the paper. (added more depth to the network, adjusted adaIN parameters, …).
Due to a lack of compute resources at my disposition and due to the model being very heavy, I only trained it on 5 epochs on a test dataset (15 times less epochs than in the paper and with a dataset 34 times smaller) but the results look promising so far for the relatively small amount of training that went into it.
Here an example of fake faces it generated from facial landmarks and embedding vectors:
More examples with the original faces for which the landmarks were extracted from can be seen on my github repo.
I also did an functioning demo that uses the webcam and an vector to create live fake faces from your own. I have a link to a video of that on my repo and the code for the demo will soon be uploaded.
If anyone is interested in using or training the model further or improving the project feel free to take a look and contribute 🙂
I ended up writing the paper from scratch for learning purposes but I would like to thank u/MrCaracara for doing the first implementation I know of.