Chang Ye

Chang Ye is a software engineer at Google. He has experience in reinforcement learning and computer vision.
Authored Publications
Text-to-image generative models have demonstrated strong performance in generating realistic images. These generations are assumed to reflect a deep understanding of visual scenes. One interesting question is whether these models possess the zero-shot or few-shot generalization capabilities known from humans. For example, a human can see an example of a new object and a word associated with it, and then use that knowledge in a highly general way to recognize or imagine the novel object in a completely different setting or context. In this work, we test whether text-to-image models possess the same capability. Specifically, we test the hypothesis that text-to-image models may learn familiar objects better than novel objects. We use prompt tuning to learn novel concepts while keeping the text-to-image model fixed. We also prompt-tune the model to learn familiar concepts, and we evaluate generalization for novel objects compared to familiar objects by generating images in different contexts and environments. In addition, instead of initializing the embedding vectors from similar concepts, we use randomly initialized embedding vectors for both familiar and novel objects. Our human-survey evaluation results demonstrate that, in some settings, text-to-image models learn familiar objects better than novel objects.