Evolving Losses for Video Representation Learning
Abstract
We present a new method for learning video representations from unlabeled data. We
formulate unsupervised representation learning as a multi-modal, multi-task
learning problem. We also introduce the idea of using an evolutionary algorithm
to find a better loss function for training such a multi-task, multi-modal
representation space; our method automatically searches over different combinations
of loss functions capturing multiple (self-supervised) tasks and modalities.