RiSER: Learning Better Representations for Richly Structured Emails

Furkan Kocayusufoğlu; Ying Sheng; Nguyen Ha Vo; James B. Wendt; Qi Zhao; Sandeep Tata; Marc Najork

Proceedings of the 2019 World Wide Web Conference, pp. 886-895

Abstract

Recent studies show that an overwhelming majority of emails are machine-generated and sent by businesses to consumers. Many large email services are interested in extracting structured data from such emails to enable intelligent assistants that can answer questions such as "What is the address of my hotel in New York?" or "When does my flight leave?". A high-quality email classifier is a critical piece of such a system. In this paper, we argue that the rich formatting used in business-to-consumer emails contains valuable information that can be used to learn better representations. Most existing methods focus only on textual content and ignore the rich HTML structure of emails. We introduce RiSER (Richly Structured Email Representation), an approach for incorporating both the structure and content of emails. RiSER projects the email into a vector representation by jointly encoding the HTML structure and the words in the email. We then use this representation to train a classifier. To our knowledge, this is the first description of a neural technique for combining formatting information with content to learn improved representations for richly formatted emails. Experimenting with a large corpus of emails received by users of Gmail, we show that RiSER outperforms strong attention-based LSTM baselines. We expect that these benefits will extend to other corpora with richly formatted documents. We also present examples in which leveraging HTML structure leads to better predictions.
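To make the idea of jointly encoding structure and words more concrete, the following is a minimal preprocessing sketch, not the authors' implementation. It assumes that the HTML structure of a word can be summarized by the path of open tags above it; the abstract does not specify how RiSER represents structure. The names `PathTokenizer`, `tokens_with_paths`, and `VOID_TAGS` are hypothetical and introduced only for illustration. A downstream model could embed each path and each word and combine the two, but that step is omitted here.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not stay on the stack.
# (Illustrative subset, not the full list from the HTML specification.)
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class PathTokenizer(HTMLParser):
    """Collects (tag_path, word) pairs from an HTML string (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open tags, outermost first
        self.pairs = []  # (tag_path, word) in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag; ignore stray closers.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        # The path is the chain of open tags enclosing this text node.
        path = "/" + "/".join(self.stack)
        for word in data.split():
            self.pairs.append((path, word))


def tokens_with_paths(html):
    """Return a list of (tag_path, word) pairs for the given HTML string."""
    parser = PathTokenizer()
    parser.feed(html)
    parser.close()
    return parser.pairs
```

For example, calling `tokens_with_paths` on an email body containing `<table><tr><td>Flight</td></tr></table>` pairs the word `Flight` with the path `/table/tr/td`, so a classifier can distinguish a word inside a table cell from the same word in a plain paragraph.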
