I am an AI Resident researching models capable of learning useful representations from multimodal tasks. I am currently studying natural language grounding through the task of vision-and-language navigation, and exploring unsupervised multimodal machine learning through the naturally aligned signals inside videos. My research interests include unsupervised learning, multimodal learning, natural language grounding, and computer vision. Prior to Google, I finished my master's degree at ITA, Brazil, researching visual gesture recognition for sign language. I previously interned at Google, on the Traffic Estimation for Ads team, and at Microsoft, on the Azure Efficiency team, researching deep neural network architectures for building a more intelligent cloud under Marcus Fontoura. When I am away from my computer screen, where I am usually trying to find global minima of loss functions, I also very much enjoy hiking to nature's local maxima.