Traditional recommendation systems based on collaborative filtering (CF) work relatively well when the candidate videos are sufficiently popular. As the volume of user-created videos grows, however, recommending fresh videos becomes increasingly important, and pure CF-based systems may not perform well in such cold-start situations. In this paper, we model recommendation as a video content-based similarity learning problem and learn deep video embeddings that are trained to predict video relationships identified by a co-watch-based system, using only visual and audio content. The system does not depend on the availability of video metadata and can generalize to both popular and tail content, including new video uploads. We demonstrate the performance of the proposed method on large-scale datasets, both quantitatively and qualitatively.