[D] Parallel multi-task learning vs. continual learning
Suppose we want to learn k tasks jointly, and the data for all tasks are available. We can either train a model with parallel multi-task learning (e.g., each batch is a mixture of samples from the k tasks) or present the tasks sequentially (e.g., switch to a different task every 5k time steps). The latter is kind of like continual learning, except that the set of tasks is fixed and no new ones are added. Which training paradigm yields better results? Are there any papers that give a theoretical analysis or make empirical comparisons?
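
To make the two setups concrete, here is a rough sketch of what I mean by each schedule. It only generates the task id for every sample in a batch and does not train anything. The function names, the uniform sampling over tasks, and the round-robin order for switching are my own assumptions for illustration; other mixing ratios or task orders are equally possible.

```python
import random
from typing import Iterator, List


def mixed_schedule(k: int, batch_size: int, steps: int, seed: int = 0) -> Iterator[List[int]]:
    """Parallel multi-task: each batch mixes samples from all k tasks.

    Assumption: task ids are drawn uniformly at random per sample.
    """
    rng = random.Random(seed)
    for _ in range(steps):
        yield [rng.randrange(k) for _ in range(batch_size)]


def sequential_schedule(
    k: int, batch_size: int, steps: int, switch_every: int = 5000
) -> Iterator[List[int]]:
    """Sequential: each batch contains a single task.

    Assumption: tasks are visited round-robin, switching every
    `switch_every` steps (5k in the example from the post).
    """
    for step in range(steps):
        task = (step // switch_every) % k
        yield [task] * batch_size
```

With `k=3` and `switch_every=2`, the sequential schedule gives two batches of task 0, two of task 1, two of task 2, and then wraps around to task 0, while the mixed schedule can contain all three tasks within a single batch.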