
Muhammad Ferjad Naeem
Ferjad Naeem is a Research Scientist at Google working on open-world semantic understanding and reasoning. He is interested in building strong multimodal foundation models. Ferjad obtained his Ph.D. from ETH Zurich, where he worked on open-world computer vision with language guidance.
Authored Publications
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
Jan Eric Lenssen
Haiyang Wang
Liwei Wang
Fan Yue
Bernt Schiele
2025
Abstract
Transformers have become the predominant architecture in foundation models due to their excellent performance across various domains. However, the substantial cost of scaling these models remains a significant concern. This problem arises primarily from their dependence on fixed parameters within linear projections, especially when architectural modifications (e.g., channel dimensions) are introduced. Each scaling iteration typically requires retraining the entire model from the beginning, leading to suboptimal utilization of computational resources. To overcome this limitation, we introduce TokenFormer, a naturally scalable architecture that leverages the attention mechanism exclusively for computations among input tokens and interactions between input tokens and model parameters, thereby enhancing architectural flexibility. By treating model parameters as tokens, we replace all the linear projections in Transformer with our token-parameter attention layer, where input tokens act as queries and model parameters as keys and values. This innovative approach allows for progressive and efficient scaling without necessitating retraining from scratch. Our model scales from 124 million to 1.4 billion parameters by incrementally adding new key-value parameters, achieving performance comparable to models trained from scratch while greatly reducing training costs. Code and models will be publicly available.
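The abstract describes replacing each linear projection with attention in which input tokens act as queries and learnable parameter tokens act as keys and values, so the model can grow by appending parameter tokens. Below is a minimal PyTorch sketch of that idea; the class name, initialization, growth routine, and the use of a plain softmax are illustrative assumptions, not the paper's reference implementation.

```python
# Sketch of token-parameter attention as described in the abstract: input
# tokens are queries; model parameters are stored as key/value tokens.
# Names and initialization choices here are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TokenParameterAttention(nn.Module):
    """Replaces a fixed linear projection with attention over parameter tokens."""

    def __init__(self, dim: int, num_param_tokens: int):
        super().__init__()
        # Parameters as tokens: (num_param_tokens, dim) keys and values.
        # Scaling the layer means appending rows rather than resizing a matrix.
        self.param_keys = nn.Parameter(torch.randn(num_param_tokens, dim) * dim ** -0.5)
        self.param_values = nn.Parameter(torch.zeros(num_param_tokens, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) input tokens used as queries.
        scores = torch.einsum("bsd,nd->bsn", x, self.param_keys) / x.shape[-1] ** 0.5
        weights = F.softmax(scores, dim=-1)  # the paper may normalize differently
        # Weighted sum over parameter value tokens stands in for W @ x.
        return torch.einsum("bsn,nd->bsd", weights, self.param_values)

    @torch.no_grad()
    def grow(self, extra_tokens: int) -> None:
        # Progressive scaling: append new key/value parameter tokens while
        # keeping the already-trained ones, avoiding retraining from scratch.
        d = self.param_keys.shape[1]
        new_k = torch.randn(extra_tokens, d, device=self.param_keys.device) * d ** -0.5
        new_v = torch.zeros(extra_tokens, d, device=self.param_values.device)
        self.param_keys = nn.Parameter(torch.cat([self.param_keys, new_k]))
        self.param_values = nn.Parameter(torch.cat([self.param_values, new_v]))

# Usage: grow the layer from 256 to 512 parameter tokens between training stages.
layer = TokenParameterAttention(dim=64, num_param_tokens=256)
out = layer(torch.randn(2, 10, 64))   # (2, 10, 64)
layer.grow(extra_tokens=256)
out = layer(torch.randn(2, 10, 64))   # same interface, larger capacity
```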