The Generative Neural Visual Artist (GeNeVA) task


The GeNeVA task involves a Teller giving a sequence of linguistic instructions to a Drawer for the ultimate goal of image generation.

The Teller is able to gauge progress through visual feedback of the generated image. This is a challenging task because the Drawer needs to learn how to map complex linguistic instructions to realistic objects on a canvas, maintaining not only object properties but relationships between objects (e.g., relative location). The Drawer also needs to modify the existing drawing in a manner consistent with previous images and instructions, so it needs to remember previous instructions. All of these involve understanding a complex relationship between objects in the scene and how those relationships are expressed in the image in a way that is consistent with all instructions given.
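To make the interaction loop described above concrete, here is a minimal toy sketch. It is not the GeNeVA model or its code. The `ToyDrawer` class, the `run_dialogue` helper, and the tuple-based instruction format are all hypothetical, and a symbolic canvas (object name to grid position) stands in for a generated image. It only shows the shape of the problem: each step depends on the previous canvas, relative instructions must be resolved against objects drawn earlier, and the Teller sees a snapshot after every step.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDrawer:
    """Hypothetical stand-in for a learned Drawer (assumption, not GeNeVA code)."""
    canvas: dict = field(default_factory=dict)   # object name -> (x, y) grid position
    history: list = field(default_factory=list)  # every instruction received so far

    def step(self, instruction):
        """Apply one instruction to the existing canvas and return a snapshot.

        Assumed instruction formats (a real Teller would use natural language):
          (name, x, y)                -> place `name` at an absolute position
          (name, "right_of", other)   -> place `name` one cell right of `other`
        """
        self.history.append(instruction)
        name, a, b = instruction
        if a == "right_of":
            # Relative placement only works because earlier objects are kept.
            ox, oy = self.canvas[b]
            self.canvas[name] = (ox + 1, oy)
        else:
            self.canvas[name] = (a, b)
        # The snapshot plays the role of the visual feedback shown to the Teller.
        return dict(self.canvas)


def run_dialogue(instructions):
    """Feed instructions one at a time, collecting the canvas after each step."""
    drawer = ToyDrawer()
    snapshots = [drawer.step(ins) for ins in instructions]
    return drawer, snapshots


drawer, snapshots = run_dialogue([
    ("tree", 2, 0),
    ("girl", "right_of", "tree"),
    ("ball", "right_of", "girl"),
])
for turn, snap in enumerate(snapshots, start=1):
    print(turn, snap)
```

A one-shot generator would receive all three instructions at once and produce only the final canvas. The iterative version exposes the intermediate states, which is what lets the Teller react to mistakes mid-dialogue.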


What does the community think about generating images conditioned on captions iteratively, rather than in a single pass? Most papers do not seem to do this iteratively, but a few recent papers have taken this approach, which seems like a good idea to me.

