Human-level visual performance has remained largely beyond the reach of engineered systems despite decades of research and significant advances in problem formulation, algorithms, and computing power. We posit that significant progress can be made by combining existing technologies from machine vision, insights from theoretical neuroscience, and large-scale distributed computing. Such claims have been made before, so it is reasonable to ask what new ideas we bring to the table that might make a difference this time. From a theoretical standpoint, our primary departure from current practice is that we exploit time to turn an otherwise intractable unsupervised problem into a locally semi-supervised, and plausibly tractable, learning problem. From a pragmatic perspective, our system architecture follows what is known of cortical neuroanatomy and provides a solid foundation for scalable hierarchical inference. This combination of features provides a framework for implementing a wide range of robust object-recognition capabilities.