Google Research



Hinglish-TOP is the largest human-annotated code-switched semantic parsing dataset. It consists of 10K human-annotated utterances and 170K utterances generated with the CST5 augmentation technique introduced in our paper. Queries are derived from TOPv2, a multi-domain task-oriented semantic parsing dataset. Experiments suggest that with CST5, the same semantic parsing performance can be achieved with up to 20x less labeled data.