While he has worked on a wide range of projects, some of his favorite past ones include: NP-completeness of the shortest path problem, a formula for efficiently reducing sparse polynomial systems to eigenvalue problems, a universal planar manipulator, collaborative filtering with privacy, and learning games on cell phones.
In the last few years he has been exploring roofline design for machine learning: establishing hardware limits for machines and networks, and designing algorithms that achieve them. BIDMach is a rooflined machine learning (and more recently deep learning) toolkit with a focus on sparse data and implementation on commodity hardware.
At Google he is working on improving inference and representations for deep learning. This work currently has three threads:
- Deploying sparse allreduce and anytime updates on TensorFlow. This leverages Kylix, a hierarchical, fault-tolerant allreduce protocol. Kylix can run synchronously, but also supports graded, out-of-sequence model updates for the lowest possible latency.
- MCMC updates for distributed model optimization. Most distributed deep learning systems try to tie the states of individual machines together, but a variety of evidence (most relevantly, Elastic Averaging SGD from Facebook) suggests that giving parameter particles some autonomy for local search performs better. This thread builds on a fast, simple Metropolis-Hastings test and on proposal distributions that combine exploration and exploitation.
- Sparse layer representations. For feature sets that follow power-law distributions, power-law-shaped layers can better match model degrees of freedom to the observations. This leads to better performance on regression tasks and is being evaluated on sequence-to-sequence models. An open question is whether power-law features have value deep inside the network; we are building them to find out.
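The sparse allreduce of the first thread can be illustrated with a toy, single-process sketch. Everything below is an illustrative assumption on my part: Kylix is a distributed, fault-tolerant protocol, whereas this sketch only shows the core data movement, i.e. merging sparse (index, value) updates up a k-ary tree and handing every worker the same summed result.

```python
def sparse_allreduce(updates, fanout=2):
    """Toy hierarchical sparse allreduce (in-memory stand-in for a
    distributed protocol such as Kylix; names and structure are
    illustrative assumptions).

    updates: one sparse gradient per worker, as a dict {index: value}.
    Returns a list with one copy of the summed update per worker.
    """
    # Reduce phase: merge groups of `fanout` sparse updates per tree level.
    level = [dict(u) for u in updates]
    while len(level) > 1:
        merged = []
        for i in range(0, len(level), fanout):
            acc = {}
            for u in level[i:i + fanout]:
                for k, v in u.items():
                    acc[k] = acc.get(k, 0.0) + v
            merged.append(acc)
        level = merged
    # Broadcast phase: every worker receives the root's summed update.
    root = level[0]
    return [dict(root) for _ in updates]
```

Only indices that some worker actually touched appear anywhere in the tree, which is what makes the sparse variant attractive for power-law data.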
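As a rough illustration of the Metropolis-Hastings test in the second thread, here is a generic MH step. The function names and the symmetric-proposal assumption are mine; the actual work pairs the test with proposal distributions that mix exploration and exploitation.

```python
import math
import random

def metropolis_hastings_step(theta, propose, energy, rng=random):
    """One Metropolis-Hastings step with a symmetric proposal.

    theta:   current parameter value
    propose: draws a candidate from the proposal distribution
    energy:  negative log-probability of the target (up to a constant)
    Returns (new_theta, accepted).
    """
    candidate = propose(theta)
    # Accept with probability min(1, exp(energy(old) - energy(new))).
    log_alpha = energy(theta) - energy(candidate)
    if math.log(rng.random()) < log_alpha:
        return candidate, True
    return theta, False
```

Accepted moves replace the local parameter particle; rejected proposals leave it in place. Either way the worker decides locally, which is what gives each particle autonomy for local search.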
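A minimal sketch of the layer-shaping idea in the third thread: if feature frequencies follow a power law, give the feature at frequency rank r a width proportional to (r + 1)^-alpha, so frequent features get more model degrees of freedom than rare ones. The formula, parameter names, and constants here are illustrative assumptions, not the actual design.

```python
def powerlaw_widths(n_features, max_width, alpha=0.5, min_width=1):
    """Hypothetical power-law layer shaping: width of the feature at
    frequency rank r decays as (r + 1) ** -alpha, clipped below at
    min_width so every feature keeps at least one parameter."""
    return [max(min_width, int(max_width * (r + 1) ** -alpha))
            for r in range(n_features)]

# A shaped layer uses far fewer parameters than a dense one of width
# max_width, while concentrating capacity on the frequent features.
widths = powerlaw_widths(n_features=10000, max_width=64)
shaped_params = sum(widths)
dense_params = 10000 * 64
```

The open question in the text, whether such shaping helps deep inside the network, is exactly about whether intermediate activations remain power-law distributed.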