The goal of video understanding is to develop algorithms that enable machines understand videos at the level of human experts. Researchers have tackled various domains including video classification, search, personalized recommendation, and more. However, there is a research gap in combining these domains in one unified learning framework. Towards that, we propose a deep network that embeds videos using their audio-visual content, onto a metric space which preserves video-to-video relationships. Then, we use the trained embedding network to tackle various domains including video classification and recommendation, showing significant improvements over state-of-the-art baselines. The proposed approach is highly scalable to deploy on large-scale video sharing platforms like YouTube.