- Rajhans Samdani
- Gideon Mann
Abstract
In practice, machine learning systems deal with multiple datasets over time. When the feature spaces of these datasets overlap, it is possible to transfer information from one task to another. In typical transfer learning, all labeled data from a source task is saved for application to a new target task, raising concerns of privacy, memory, and scaling. To ameliorate such concerns, we present a semi-supervised algorithm for text categorization that transfers information across tasks without storing the data of the source task. In particular, our technique learns low-dimensional, sparse word-cluster-based features from the source-task data and a massive amount of additional unlabeled data. Our algorithm is efficient, highly parallelizable, and outperforms competitive baselines by up to 9% on several difficult benchmark text categorization tasks.
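To give a sense of the general idea (not the paper's exact algorithm), the sketch below shows how word-cluster-based features can transfer information without retaining source data: each vocabulary word is mapped to a cluster, which defines a sparse projection matrix with one nonzero per row. Only this small projection, not the source documents, needs to be kept. The cluster assignments here are hard-coded for illustration; in practice they would be learned from source-task and unlabeled text.

```python
import numpy as np

# Hypothetical illustration of transfer via a sparse word-to-cluster projection.
# Each word belongs to exactly one cluster, so the projection matrix P is
# maximally sparse (one nonzero per row), and document features shrink from
# |vocab| dimensions to k cluster dimensions.

vocab = ["ball", "goal", "team", "vote", "senate", "law"]
# Assumed cluster assignments (in practice, learned from source-task data
# plus a large unlabeled corpus, e.g. via co-occurrence clustering).
word_cluster = {"ball": 0, "goal": 0, "team": 0, "vote": 1, "senate": 1, "law": 1}

k = 2
P = np.zeros((len(vocab), k))      # sparse projection: one nonzero per row
for i, w in enumerate(vocab):
    P[i, word_cluster[w]] = 1.0

# A target-task document as a bag-of-words count vector over the shared vocab:
# "ball ball goal law"
doc = np.array([2.0, 1.0, 0.0, 0.0, 0.0, 1.0])

cluster_features = doc @ P          # low-dimensional transferred features
print(cluster_features)             # → [3. 1.]
```

A target-task classifier can then be trained on the k-dimensional cluster features, so knowledge from the source task flows through the projection alone.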