Semi-supervised learning for information extraction from dialogue
Abstract
In this work we present a method for semi-supervised learning
from transcripts of dialogue between humans. We consider
the scenario in which a large number of transcripts are available,
and we would like to extract some semantic information from
them; however, only a small number of transcripts have been
labeled with this information. We present a method for leveraging
the unlabeled data to learn a better model than could be
learned from the labeled data alone. First, a recurrent neural
network (RNN) encoder-decoder is trained on the task of predicting
nearby turns on the full dialogue corpus; next, the RNN
encoder is reused as a feature representation for the supervised
learning problem. While previous work has explored the use of
pre-training for non-dialogue corpora, our method is specifically
geared toward the dialogue use case. We demonstrate an improvement
on a clinical documentation task, particularly in the
regime of small amounts of labeled data. We compare several
types of encoders, both in the context of a classification task and
in a human evaluation of their learned representations. We show
that our method significantly improves the classification task in
the case where only a small amount of labeled data is available.
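
As a rough illustration of the pre-training task described above, the sketch below builds (turn, nearby turn) training pairs from a single unlabeled dialogue. It is a minimal sketch under stated assumptions, not the implementation used in this work: the function name nearby_turn_pairs, the window parameter, and the toy dialogue are all hypothetical, and the definition of "nearby" as "at most window turns apart" is an assumption. An encoder-decoder would then be trained to generate the target turn from the source turn.

```python
from typing import List, Tuple


def nearby_turn_pairs(dialogue: List[str], window: int = 1) -> List[Tuple[str, str]]:
    """Build (source_turn, target_turn) pairs for self-supervised pre-training.

    Hypothetical helper: each turn is paired with every other turn that lies
    at most `window` positions away in the same dialogue. No labels are needed.
    """
    pairs = []
    for i, source in enumerate(dialogue):
        start = max(0, i - window)
        stop = min(len(dialogue), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a turn is never its own prediction target
                pairs.append((source, dialogue[j]))
    return pairs


# Toy dialogue, invented purely for illustration.
dialogue = ["How are you feeling?", "My knee hurts.", "Since when?"]
pairs = nearby_turn_pairs(dialogue, window=1)
for source, target in pairs:
    print(f"{source!r} -> {target!r}")
```

Because the pairs come from the transcripts alone, they can be generated for the full corpus; only the later supervised step, which reuses the trained encoder as a feature representation, requires the small labeled subset.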