ICML Workshop on Human In the Loop Learning (2019)
We tested, in a production setting, the use of active learning to select text documents for human annotation, with the resulting annotations used to train a Thai segmentation machine learning model. In our study, we constructed two concurrent annotated samples: one by randomly sampling documents from a text corpus, and the other by scoring and ranking documents from the same corpus with a model. We observed that several of the assumptions underlying offline (simulated) evaluation largely failed in the live setting. We present these challenges and propose guidelines addressing each of them, which can be used to design live active-learning experiments and, more generally, to apply active learning in live settings.
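As an illustration only, the two concurrent samples described above might be drawn along the lines of the sketch below. The abstract does not say how documents were scored, so the scoring function here is an assumption: `mean_boundary_uncertainty` is a hypothetical measure that treats a segmenter's per-character boundary probabilities near 0.5 as uncertain. The document names and probabilities are made-up toy values, not data from the study.

```python
import random
from typing import Callable, Sequence


def random_sample(docs: Sequence[str], k: int, seed: int = 0) -> list[str]:
    """Baseline arm: draw k documents uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(docs), k)


def ranked_sample(docs: Sequence[str], k: int, score: Callable[[str], float]) -> list[str]:
    """Active-learning arm: score every document with the model and keep the top k."""
    return sorted(docs, key=score, reverse=True)[:k]


def mean_boundary_uncertainty(boundary_probs: Sequence[float]) -> float:
    """Hypothetical score: 1 - |2p - 1| is 1 at p = 0.5 and 0 at p = 0 or 1."""
    return sum(1 - abs(2 * p - 1) for p in boundary_probs) / len(boundary_probs)


# Toy per-character boundary probabilities from a hypothetical current model.
probs = {
    "doc_a": [0.9, 0.1, 0.95],  # model is confident
    "doc_b": [0.5, 0.45, 0.6],  # model is unsure almost everywhere
    "doc_c": [0.7, 0.2, 0.5],   # mixed
}

baseline = random_sample(list(probs), k=2)
active = ranked_sample(list(probs), k=2, score=lambda d: mean_boundary_uncertainty(probs[d]))
print(baseline, active)
```

Here the ranked arm selects `doc_b` and then `doc_c`, while the random arm ignores the model entirely. Running both arms over the same corpus is what makes the two annotated samples directly comparable.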
In this paper we address the usefulness of the notion of a paradigm in derivational morphology. We first define a notion of paradigmatic system that conservatively extends the notion used in inflection so that it applies to collections of structured families of derivationally related words. We then build on this definition in an empirical, quantitative study of derivational families of verbs in French, applying information-theoretic measures of predictability originally designed by Ackerman, Blevins and Malouf (2009) for inflection. We conclude that key quantitative properties are shared by inflectional and derivational paradigmatic systems, and hence that (partial) paradigms are an important ingredient in the study of derivation.
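One such predictability measure is the conditional entropy of one paradigm cell given another, H(target | source) = H(source, target) - H(source). The sketch below computes it under simplifying assumptions: each derivational family is a dict mapping hypothetical cell names ("A", "B") to abstract realization labels, and every family is weighted equally. The toy data are invented for illustration and are not French data from the paper.

```python
from collections import Counter
from math import log2


def entropy(counts: Counter) -> float:
    """Shannon entropy (in bits) of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def conditional_entropy(families: list[dict[str, str]], source: str, target: str) -> float:
    """H(target | source) = H(source, target) - H(source), one count per family."""
    joint = Counter((f[source], f[target]) for f in families)
    marginal = Counter(f[source] for f in families)
    return entropy(joint) - entropy(marginal)


# Hypothetical toy families: the realization in cell A only partly predicts cell B.
families = [
    {"A": "a1", "B": "b1"},
    {"A": "a1", "B": "b2"},
    {"A": "a2", "B": "b3"},
    {"A": "a2", "B": "b3"},
]

h_b_given_a = conditional_entropy(families, "A", "B")
h_a_given_b = conditional_entropy(families, "B", "A")
print(h_b_given_a, h_a_given_b)
```

On this toy set, knowing B fully determines A (0 bits of uncertainty remain), whereas knowing A leaves 0.5 bits of uncertainty about B. This asymmetry between directions of prediction is what such measures are designed to quantify.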