
[R] OmniNet is all you need! ;)

Paper URL: https://arxiv.org/abs/1907.07804

Code: https://github.com/subho406/OmniNet

OmniNet is the first truly universal architecture for multi-modal multi-task learning. A single OmniNet model can encode inputs from almost any real-life domain (text, image, video) and is capable of asynchronous multi-task learning across a wide range of tasks.

The architecture consists of multiple sub-networks called neural peripherals, which encode domain-specific inputs as spatio-temporal representations, connected to a common central network called the Central Neural Processor (CNP). The CNP implements a Transformer-based universal spatio-temporal encoder and a multi-task decoder.

In the paper, a single instance of OmniNet is jointly trained to perform part-of-speech tagging, image captioning, visual question answering, and video activity recognition. Because the CNP's representation-learning architecture is shared across modalities, OmniNet can also make zero-shot predictions for tasks it was never trained on: for example, the same model can be used for video captioning and video question answering even though it was never trained on those tasks.
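To make the structure concrete, here is a minimal PyTorch sketch of that design: domain-specific peripheral encoders that map raw inputs to sequences of shared-width vectors, and a central Transformer encoder with per-task output heads. All names, shapes, and hyperparameters below are illustrative assumptions, not the paper's actual implementation (see the linked repo for that); in particular, the real CNP uses a Transformer-based multi-task decoder rather than the pooled linear heads used here, and positional/temporal encodings are omitted for brevity.

# Minimal sketch of the OmniNet idea: peripheral encoders feed a shared
# Transformer "Central Neural Processor" that serves several task heads.
# Module names, shapes, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

D_MODEL = 256  # shared representation width (assumed)

class TextPeripheral(nn.Module):
    """Encodes token ids into a sequence of d_model vectors."""
    def __init__(self, vocab_size=10000, d_model=D_MODEL):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)

    def forward(self, token_ids):              # (batch, seq)
        return self.embed(token_ids)           # (batch, seq, d_model)

class ImagePeripheral(nn.Module):
    """Encodes an image into a flattened grid of spatial vectors."""
    def __init__(self, d_model=D_MODEL):
        super().__init__()
        self.conv = nn.Conv2d(3, d_model, kernel_size=16, stride=16)

    def forward(self, images):                 # (batch, 3, H, W)
        feats = self.conv(images)              # (batch, d_model, H', W')
        return feats.flatten(2).transpose(1, 2)  # (batch, H'*W', d_model)

class CentralNeuralProcessor(nn.Module):
    """Shared Transformer encoder plus one output head per task."""
    def __init__(self, tasks, d_model=D_MODEL):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        # One small head per task; real OmniNet uses a Transformer decoder.
        self.heads = nn.ModuleDict(
            {name: nn.Linear(d_model, n_out) for name, n_out in tasks.items()}
        )

    def forward(self, modality_seqs, task):
        # Concatenate all modality sequences along time, encode jointly.
        x = torch.cat(modality_seqs, dim=1)    # (batch, total_len, d_model)
        h = self.encoder(x)
        pooled = h.mean(dim=1)                 # crude pooling for a class head
        return self.heads[task](pooled)

# Usage: one shared CNP serving two hypothetical classification tasks.
text_enc, img_enc = TextPeripheral(), ImagePeripheral()
cnp = CentralNeuralProcessor({"vqa": 1000, "activity": 400})
tokens = torch.randint(0, 10000, (2, 12))
images = torch.randn(2, 3, 64, 64)
logits = cnp([text_enc(tokens), img_enc(images)], task="vqa")
print(logits.shape)  # torch.Size([2, 1000])

Because the encoder weights are shared across all peripherals and tasks, representations learned for one task can, in principle, transfer to unseen modality-task combinations, which is the mechanism behind the zero-shot claims above.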

[Figure: OmniNet architecture]

submitted by /u/turing_1997