Imitation learning is a popular approach for training effective visual navigation policies. However, collecting expert demonstrations on a legged robot is less practical because the robot is hard to control, walks slowly, and cannot operate continuously for long periods. In this work, we propose a zero-shot imitation learning framework that trains a visual navigation policy for a legged robot from third-person human demonstrations only, allowing more cost-effective data collection with better navigation capability. Imitation learning from third-person demonstrations, however, raises unique challenges. Because human demonstrations are captured from different camera perspectives, we design a feature disentanglement network~(FDN) that extracts perspective-agnostic state features. Because the demonstrations lack action labels, we reconstruct them either by building an inverse model of the robot's dynamics in the feature space and applying it to the demonstrations, or by developing an efficient GUI for labeling the human demonstrations. We then take a model-based imitation learning approach to train a visual navigation policy from the perspective-agnostic, action-labeled demonstrations. We show that our framework learns an effective visual navigation policy for a legged robot, Laikago, from expert demonstrations in both simulated and real-world environments. Our approach is zero-shot in that the robot never attempts the test navigation path in the testing environment before the testing phase. We also validate our framework with an ablation study and a comparison against baseline algorithms.
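
As a rough illustration of how an inverse model in the feature space can recover missing action labels, the sketch below fits a model on robot transitions with known actions and then applies it to a demonstration that has features only. Everything here is an assumption made for illustration: the feature vectors stand in for perspective-agnostic FDN outputs, the three discrete actions and their effects are synthetic, and a nearest-centroid rule on feature differences replaces whatever learned inverse model the framework actually uses.

```python
import numpy as np


def fit_inverse_model(z_t, z_next, actions, num_actions):
    """Fit a nearest-centroid inverse model (illustrative stand-in).

    Stores, for each action, the mean feature change it produced in the
    robot's own action-labeled transitions.
    """
    deltas = z_next - z_t
    return np.stack(
        [deltas[actions == a].mean(axis=0) for a in range(num_actions)]
    )


def label_demonstration(centroids, demo_features):
    """Assign to each demonstration step the action whose mean feature
    change is closest to the observed change between consecutive frames."""
    deltas = np.diff(demo_features, axis=0)  # shape (T-1, D)
    dists = np.linalg.norm(
        deltas[:, None, :] - centroids[None, :, :], axis=-1
    )
    return dists.argmin(axis=1)


# Synthetic robot data (hypothetical): 3 actions, 2-D feature space.
rng = np.random.default_rng(0)
effects = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
robot_actions = rng.integers(0, 3, size=300)
robot_z_t = rng.normal(size=(300, 2))
robot_z_next = (
    robot_z_t + effects[robot_actions] + 0.05 * rng.normal(size=(300, 2))
)

centroids = fit_inverse_model(robot_z_t, robot_z_next, robot_actions, 3)

# Synthetic "human demonstration": features only, no action labels.
true_actions = np.array([0, 0, 1, 2, 0])
demo_features = np.vstack(
    [np.zeros((1, 2)), np.cumsum(effects[true_actions], axis=0)]
)

labels = label_demonstration(centroids, demo_features)
print(labels)
```

The point of the sketch is the data flow: the inverse model is fit only on robot transitions, and it can be applied to human demonstrations only because both are expressed in a shared, perspective-agnostic feature space.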