Natural Language Understanding Uncertainty Evaluation (NaLUE) is a relabelled and aggregated version of three large NLU corpora: CLINC150 (Larson et al., 2019), Banking77 (Casanueva et al., 2020) and HWU64 (Liu et al., 2021). It contains 50k+ utterances spanning 18 verticals, 77 domains, and ~260 intents. For this task, the model must map each utterance to a 3-token sequence (vertical, domain, intent).
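Because every target is a fixed-length (vertical, domain, intent) sequence, a label can be handled as a small structured record. The sketch below is a minimal illustration, assuming labels are plain strings. The names NaLUELabel, to_target_sequence, from_target_sequence and per_level_accuracy are hypothetical and not part of any NaLUE release, and scoring each level separately is only one possible metric, not necessarily the one used in the paper.

```python
from __future__ import annotations

from typing import NamedTuple, Sequence


class NaLUELabel(NamedTuple):
    """Hypothetical record for one NaLUE label."""
    vertical: str
    domain: str
    intent: str


def to_target_sequence(label: NaLUELabel) -> list[str]:
    # The model's target is the 3-token sequence (vertical, domain, intent).
    return [label.vertical, label.domain, label.intent]


def from_target_sequence(tokens: Sequence[str]) -> NaLUELabel:
    # Parse a decoded sequence back into a label; reject malformed outputs.
    if len(tokens) != 3:
        raise ValueError(f"expected 3 tokens, got {len(tokens)}")
    return NaLUELabel(*tokens)


def per_level_accuracy(
    preds: Sequence[NaLUELabel], golds: Sequence[NaLUELabel]
) -> dict[str, float]:
    # Hypothetical metric: accuracy computed separately at each level.
    if not golds or len(preds) != len(golds):
        raise ValueError("preds and golds must be non-empty and equally long")
    correct = {level: 0 for level in NaLUELabel._fields}
    for pred, gold in zip(preds, golds):
        for level in NaLUELabel._fields:
            correct[level] += getattr(pred, level) == getattr(gold, level)
    return {level: n / len(golds) for level, n in correct.items()}


# Illustrative labels only; these strings are made up, not real NaLUE classes.
gold = [NaLUELabel("v1", "d1", "i1"), NaLUELabel("v1", "d2", "i2")]
pred = [from_target_sequence(["v1", "d1", "i1"]), NaLUELabel("v1", "d2", "i3")]
print(per_level_accuracy(pred, gold))  # {'vertical': 1.0, 'domain': 1.0, 'intent': 0.5}
```

Keeping the label as a named record makes it easy to see that a prediction can be right at the vertical and domain levels while still missing the intent.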
For a comprehensive evaluation of an NLU model's out-of-domain and tail generalization performance, NaLUE offers a standard in-domain split (ind) and two out-of-scope (oos) splits: a near_oos split and a standard_oos split. The near_oos split contains queries whose vertical or domain partially overlaps with those of the in-domain queries, while the standard_oos split contains queries whose verticals and domains are completely disjoint from those of the in-domain queries. Finally, NaLUE provides a tail_intent split, which contains in-domain intents that are under-represented in the training data.
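The split definitions can be read as simple set-membership rules. The following is an illustrative sketch under stated assumptions, not the procedure used to build NaLUE: it assumes a query already known to be out of scope goes to near_oos if its vertical or its domain appears among the in-domain ones and to standard_oos otherwise, and it treats an intent as a tail intent when its training count is at most a hypothetical threshold max_count. The functions oos_split and tail_intents are hypothetical names.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def oos_split(
    vertical: str, domain: str, ind_verticals: set[str], ind_domains: set[str]
) -> str:
    # Partial overlap with the in-domain verticals or domains -> near_oos;
    # no overlap at all -> standard_oos (illustrative rule only).
    if vertical in ind_verticals or domain in ind_domains:
        return "near_oos"
    return "standard_oos"


def tail_intents(train_intents: Iterable[str], max_count: int) -> set[str]:
    # Intents with at most max_count training examples are treated as tail;
    # max_count is a hypothetical threshold, not a documented NaLUE value.
    counts = Counter(train_intents)
    return {intent for intent, n in counts.items() if n <= max_count}


# Made-up vertical/domain/intent names for illustration.
ind_verticals = {"v1", "v2"}
ind_domains = {"d1", "d2", "d3"}
print(oos_split("v1", "d9", ind_verticals, ind_domains))  # near_oos
print(oos_split("v7", "d9", ind_verticals, ind_domains))  # standard_oos
print(tail_intents(["i1"] * 50 + ["i2"] * 3, max_count=5))  # {'i2'}
```

In this sketch the in-domain vertical and domain sets would be collected from the ind training data; the exact overlap and frequency criteria used by NaLUE may differ.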
A complete evaluation of the performance of modern large pre-trained language models (e.g., T5) is available in the paper "Plex: Towards Reliability using Pretrained Large Model Extensions".