Cross-Lingual Consistency of Phonological Features: An Empirical Study
Abstract
The concept of a phoneme arose historically as a theoretical abstraction that applies language-internally. Using phonemes and phonological features in cross-linguistic settings raises an important question of conceptual validity: Are contrasts that are meaningful within a language also empirically robust across languages? This paper develops a method for assessing the cross-linguistic consistency of phonological features in phoneme inventories. The method involves training a separate binary neural classifier for each of several phonological contrasts on audio spans centered on particular segments within continuous speech. To assess cross-linguistic consistency, these classifiers are evaluated on held-out languages and their classification quality is reported. We apply this method to several common phonological contrasts, including vowel height, vowel frontness, and retroflex consonants, using multi-speaker corpora for ten languages from three language families (Indo-Aryan, Dravidian, and Malayo-Polynesian). We empirically evaluate and discuss the consistency of phonological contrasts derived from features found in phonological ontologies such as PanPhon and PHOIBLE.
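The held-out-language evaluation described above can be illustrated with a minimal sketch. It is not the paper's implementation: a one-dimensional threshold classifier stands in for the binary neural classifier, a single scalar stands in for the audio span, and the language names (`lang_a`, `lang_b`, `lang_c`), the values in `TOY`, and all function names are hypothetical and chosen only for illustration.

```python
from statistics import mean

# Toy samples: (language, scalar feature, binary label for one contrast).
# Assumption: these values are invented for illustration and are not data
# from any corpus.
TOY = [
    ("lang_a", 0.9, 1), ("lang_a", 0.8, 1), ("lang_a", 0.2, 0), ("lang_a", 0.1, 0),
    ("lang_b", 0.7, 1), ("lang_b", 0.95, 1), ("lang_b", 0.3, 0), ("lang_b", 0.15, 0),
    ("lang_c", 0.4, 1), ("lang_c", 0.85, 1), ("lang_c", 0.25, 0), ("lang_c", 0.05, 0),
]


def leave_one_language_out(samples):
    """Yield (held_out_language, train, test), holding out one language at a time."""
    languages = sorted({lang for lang, _, _ in samples})
    for held_out in languages:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test


def fit_threshold(train):
    """Stand-in classifier (assumption): midpoint between the two class means."""
    pos = [x for _, x, y in train if y == 1]
    neg = [x for _, x, y in train if y == 0]
    threshold = (mean(pos) + mean(neg)) / 2.0
    positive_is_high = mean(pos) > mean(neg)
    return threshold, positive_is_high


def predict(model, x):
    """Return 1 if x falls on the positive side of the threshold, else 0."""
    threshold, positive_is_high = model
    return int((x > threshold) == positive_is_high)


def evaluate(samples):
    """Train on all languages but one; report accuracy on the held-out language."""
    scores = {}
    for held_out, train, test in leave_one_language_out(samples):
        model = fit_threshold(train)
        correct = sum(predict(model, x) == y for _, x, y in test)
        scores[held_out] = correct / len(test)
    return scores


if __name__ == "__main__":
    for language, accuracy in evaluate(TOY).items():
        print(f"{language}: held-out accuracy = {accuracy:.2f}")
```

In this sketch, a contrast whose held-out accuracy stays close to its within-language accuracy would count as cross-linguistically consistent, while a drop for a particular language (as for the hypothetical `lang_c` here) would suggest that the contrast is realized differently in that language.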