Recently, self-supervised pre-training of text representations has been success-fully applied to low-resource Neural Machine Translation (NMT). However, it usually fails to achieve dramatic success on resource-rich NMT. In this paper, we propose a joint training approach, F2-XEnDec, to jointly self-supervised and supervised train NMT models. To this end, a new task called crossover encoder-decoder (XEnDec) is designed to entangle their representations. The key idea is to combine pseudo parallel sentences (also generated byXEnDec)) used in self-supervised training and parallel sentences in supervised training through a second crossover. Experiments on two resource-rich translation benchmarks, WMT’14English-German and English-French, demonstrate our approach achieve substantial improvements over the Transformer. We also show that our approach is capable of improving the model robustness against input perturbations, in particular for code-switched perturbations.