[P] I used a Variational Autoencoder to build feature-based face editing software
For my latest weekend project I used a Variational Autoencoder to build a feature-based face editor. The model is explained in my YouTube video:
You can inspect the code on GitHub:
The feature editing is based on modifying the latent distribution of the VAE. Once training is complete, the latent space is mapped by encoding the training data once more. A latent space vector for each feature is then determined from the labels of the training data. To edit an image, we add a combination of feature vectors to its latent distribution and then reconstruct it. The reconstruction is an altered version of the original image, reflecting the features we added to the latent representation.
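To make the idea concrete, here is a minimal sketch of that latent arithmetic. It is not the code from the repository. It assumes each feature vector is the mean latent code of the images that carry a label minus the mean of those that do not, which is one common way to derive such directions. The names `attribute_vectors` and `edit_latent` are made up for illustration, and plain NumPy arrays stand in for the encoder output.

```python
import numpy as np

def attribute_vectors(latents, labels):
    """Estimate one latent direction per attribute.

    latents: (N, D) array of latent codes, e.g. the encoder means.
    labels:  (N, A) array of binary attribute labels (0 or 1).
    Returns an (A, D) array. Row a is the mean latent of images that
    have attribute a minus the mean latent of images that lack it
    (an assumed recipe, not necessarily the one used in the project).
    """
    latents = np.asarray(latents, dtype=np.float64)
    labels = np.asarray(labels).astype(bool)
    vectors = np.zeros((labels.shape[1], latents.shape[1]))
    for a in range(labels.shape[1]):
        has = labels[:, a]
        # Leave the direction at zero if one of the two groups is empty.
        if has.any() and (~has).any():
            vectors[a] = latents[has].mean(axis=0) - latents[~has].mean(axis=0)
    return vectors

def edit_latent(z, vectors, strengths):
    """Shift latent code z (D,) by a weighted sum of attribute vectors.

    strengths: (A,) array, one weight per attribute. Positive values
    add the feature, negative values remove it.
    """
    z = np.asarray(z, dtype=np.float64)
    strengths = np.asarray(strengths, dtype=np.float64)
    return z + strengths @ vectors
```

The edited latent would then be passed through the decoder to get the modified face. The strengths are a natural fit for GUI sliders.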
The model is heavily inspired by the Beta-VAE used in this paper by Google DeepMind (https://pdfs.semanticscholar.org/a902/26c41b79f8b06007609f39f82757073641e2.pdf). I made some adjustments to incorporate more recent advancements in neural network architecture, like using a Leaky ReLU activation function. The dataset is CelebA, which consists of roughly 200,000 annotated images of celebrities. I aligned and cropped the images to a 64×64 resolution before training. The model is implemented in PyTorch, and PyGame is used for the GUI. Training on my single consumer-grade GPU took about 1.5 hours. The finished application, including the trained model, runs smoothly even without GPU support.
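For readers unfamiliar with the Beta-VAE objective, here is a rough sketch of the loss: a reconstruction term plus a KL term weighted by a factor beta. The squared-error reconstruction term, the value `beta=4.0` and the function name `beta_vae_loss` are illustrative assumptions, not settings taken from this project. It is written with NumPy to keep it dependency-light. The same expressions carry over to PyTorch tensors.

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    """Beta-VAE objective, averaged over the batch.

    x, x_recon: (N, ...) original and reconstructed images.
    mu, logvar: (N, D) mean and log-variance of the approximate
                posterior q(z|x), assumed to be a diagonal Gaussian.
    beta:       weight of the KL term (4.0 is an arbitrary example).
    Returns (total, reconstruction, kl).
    """
    n = x.shape[0]
    # Squared-error reconstruction term (an assumption; binary
    # cross-entropy is another common choice for images).
    recon = np.sum((x_recon - x) ** 2) / n
    # Closed-form KL divergence between N(mu, sigma^2) and N(0, I).
    kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar)) / n
    return recon + beta * kl, recon, kl
```

With `beta=1.0` this reduces to the standard VAE loss. Larger values put more pressure on the latent code to stay close to the prior.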
This project has been quite cool, and playing with the result has been good fun. I got a lot of hands-on experience with VAEs. Creating a YouTube video explaining the project led me to learn much more about video editing and presentation techniques. I'm testing the waters with presenting this project in video form, let's see if it pays off!