TDDC: Timely Disclosure Documents Corpus
Abstract
In this paper, we describe the TDDC (Timely Disclosure Documents Corpus). TDDC was built by manually aligning sentences
from past Japanese and English timely disclosure documents in PDF format published by companies listed on the Tokyo Stock
Exchange. TDDC consists of approximately 1.4 million Japanese-English parallel sentences. TDDC was used as the official
dataset for the 6th Workshop on Asian Translation in order to encourage the development of machine translation.