Learning the Inter-frame Distance for Discriminative Template-based Keyword Detection
Abstract
This paper proposes a discriminative approach to template-based
keyword detection. We introduce a method to learn the distance
used to compare acoustic frames, a crucial element for template
matching approaches. The proposed algorithm estimates the distance
from data, with the objective of producing a detector that maximizes the
Area Under the Receiver Operating Characteristic Curve (AUC), i.e. the
standard evaluation measure for the keyword detection problem. Experiments
performed over a large corpus, SpeechDatII, suggest that our model
compares favorably with an HMM system: the proposed approach
reaches an averaged AUC of 93.8\%, compared to 87.9\% for the HMM.
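
As a rough illustration of the evaluation measure named in the abstract, the AUC of a detector can be read as the fraction of (positive, negative) utterance pairs in which the utterance containing the keyword receives the higher detection score. The sketch below computes that pairwise statistic directly. The function name `auc` and the toy scores are hypothetical and chosen only for illustration; this is not the paper's training procedure, only the quantity it aims to maximize.

```python
def auc(positive_scores, negative_scores):
    """Pairwise AUC estimate: share of (positive, negative) pairs ranked correctly.

    positive_scores: detector scores for utterances that contain the keyword.
    negative_scores: detector scores for utterances that do not.
    Tied pairs count as half a correct ranking.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")

    correct = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                correct += 1.0   # positive utterance ranked above the negative one
            elif p == n:
                correct += 0.5   # tie: count as half
    return correct / (len(positive_scores) * len(negative_scores))


# Hypothetical scores for one keyword (illustrative values only).
example_auc = auc([0.9, 0.4], [0.5, 0.1])
```

In this toy case three of the four pairs are ordered correctly, so `example_auc` is 0.75. An averaged AUC, as reported in the abstract, would then be the mean of such values over the evaluated keywords.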