Douglas Aberdeen
Doug worked for several years in the field of Reinforcement Learning before joining Google. At Google he works on Gmail, including spam detection and, most recently, Priority Inbox.
Authored Publications
The Learning Behind Gmail Priority Inbox
Ondrej Pacovsky
LCCC: NIPS 2010 Workshop on Learning on Cores, Clusters and Clouds
The War Against Spam: A report from the front line
Brad Taylor
Dan Fingal
NIPS 2007 Workshop on Machine Learning in Adversarial Environments for Computer Security
Fighting spam is a success story of real-world machine learning. Despite the occasional spam that does reach our inboxes, the overwhelming majority of spam — and there is a lot of it — is positively identified. At the same time, the rarity with which users feel the need to check their spam box for false positives demonstrates a high precision of classification. This paper is an overview of Google's approach to fighting email abuse with machine learning, and a discussion of some lessons learned.
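As a small illustration of the precision argument in the abstract, the sketch below computes classification precision from hypothetical spam-box counts; the numbers are invented for the example and are not Google's.

    # Hypothetical counts, invented for illustration (not from the paper).
    true_positives = 9_950    # spam correctly routed to the spam box
    false_positives = 50      # legitimate mail wrongly routed to the spam box

    # Precision: of all mail flagged as spam, the fraction that really was spam.
    precision = true_positives / (true_positives + false_positives)
    print(f"precision = {precision:.3f}")  # 0.995 -> users rarely find false positives in the spam box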
The Factored Policy-Gradient Planner
Olivier Buffet
Journal of Artificial Intelligence Research (JAIR), 173 (2008), pp. 722-747
We present an any-time concurrent probabilistic temporal planner (CPTP) that includes continuous and discrete uncertainties and metric functions. Rather than relying on dynamic programming our approach builds on methods from stochastic local policy search. That is, we optimise a parameterised policy using gradient ascent. The flexibility of this policy-gradient approach, combined with its low memory use, the use of function approximation methods and factorisation of the policy, allows us to tackle complex domains. This Factored Policy Gradient (FPG) planner can optimise steps to goal, the probability of success, or attempt a combination of both. We compare the FPG planner to other planners on CPTP domains, and on simpler but better studied non-concurrent non-temporal probabilistic planning (PP) domains. We present FPG-ipc, the PP version of the planner which has been successful in the probabilistic track of the fifth international planning competition.
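The abstract only sketches the optimisation loop, so here is a minimal, hypothetical illustration of the core idea: a policy factored into one small parameterised chooser per action, trained by REINFORCE-style gradient ascent on a sampled success signal. The toy domain, state features, and constants are invented for illustration; this is not the FPG planner itself.

    import numpy as np

    rng = np.random.default_rng(0)
    n_actions, n_features = 3, 4
    # One linear "chooser" per action: the factored policy's parameters.
    theta = np.zeros((n_actions, n_features))

    def features(state):
        # Hypothetical state features; a real planner would derive these from the planning state.
        return np.array([1.0, state, state ** 2, np.sin(state)])

    def rollout(theta, horizon=20):
        """Sample one trajectory; return (success reward, sum of log-prob gradients)."""
        state, grads = 0.0, np.zeros_like(theta)
        for _ in range(horizon):
            phi = features(state)
            probs = 1.0 / (1.0 + np.exp(-theta @ phi))   # each action started independently
            chosen = (rng.random(n_actions) < probs).astype(float)
            grads += np.outer(chosen - probs, phi)       # d log p(chosen) / d theta
            state += chosen.sum() - 1.0                  # toy "dynamics"
            if state >= 5.0:                             # toy "goal reached"
                return 1.0, grads
        return 0.0, grads

    # REINFORCE: stochastic gradient ascent on the probability of reaching the goal.
    alpha = 0.05
    for _ in range(2000):
        reward, grad = rollout(theta)
        theta += alpha * reward * grad

Optimising steps to goal instead of success probability, as the abstract mentions, would amount to replacing the binary reward with a negated step count.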
FF+FPG: Guiding a Policy-Gradient Planner
Olivier Buffet
Proceedings of the Seventeenth International Conference on Automated Planning and Scheduling (ICAPS'07), Providence, USA (2007)
Policy-Gradients for PSRs and POMDPs
Olivier Buffet
Owen Thomas
Proc. 11th Intl. Conf. on Artificial Intelligence and Statistics (AISTATS), Society for Artificial Intelligence and Statistics, San Juan, Puerto Rico (2007)
Concurrent Probabilistic Temporal Planning with Policy-Gradients
Olivier Buffet
Proceedings of the Seventeenth International Conference on Automated Planning and Scheduling (ICAPS'07), Providence, USA (2007)
Natural Actor-Critic for Road Traffic Optimisation
Silvia Richter
Jin Yu
Advances in Neural Information Processing Systems, The MIT Press, Cambridge, MA (2007)
The Factored Policy Gradient planner (IPC-06 Version)
Policy-Gradient Methods for Planning
Advances in Neural Information Processing Systems, The MIT Press, Cambridge, MA (2006)
Policy-Gradient for Robust Planning
O. Buffet
Proceedings of the ECAI'06 Workshop on Planning, Learning and Monitoring with Uncertainty and Dynamic Worlds (PLMUDW'06) (2006)
Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation
Nicol N. Schraudolph
Jin Yu
Advances in Neural Information Processing Systems, The MIT Press, Cambridge, MA (2006), pp. 1185-1192
Reinforcement learning by direct policy gradient estimation is attractive in theory but in practice leads to notoriously ill-behaved optimization problems. We improve its robustness and speed of convergence with stochastic meta-descent, a gain vector adaptation method that employs fast Hessian-vector products. In our experiments the resulting algorithms outperform previously employed online stochastic, offline conjugate, and natural policy gradient methods.
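As a rough illustration of the gain-vector adaptation the abstract refers to, here is a small sketch of stochastic meta-descent (SMD) applied to a toy quadratic loss where the Hessian-vector product is available in closed form; the paper instead drives the same rule with online policy-gradient estimates and fast Hessian-vector products. The objective, initial gains, and hyper-parameters below are assumptions made for the example, not values from the paper.

    import numpy as np

    A = np.diag([1.0, 4.0, 25.0])   # toy Hessian of f(w) = 0.5 * w^T A w (stand-in curvature)
    w = np.array([1.0, -1.0, 1.0])  # parameters
    p = np.full(3, 0.002)           # per-parameter gains (adaptive step sizes)
    v = np.zeros(3)                 # auxiliary vector tracking sensitivity of w to the gains
    mu, lam = 0.01, 0.9             # meta-learning rate and decay of the auxiliary vector

    for t in range(300):
        g = A @ w                   # gradient of the toy loss (exact here, stochastic in the paper)
        Hv = A @ v                  # Hessian-vector product
        p *= np.maximum(0.5, 1.0 - mu * g * v)   # multiplicative gain update, kept positive
        w -= p * g                  # descent step with per-parameter gains
        v = lam * v - p * (g + lam * Hv)         # SMD auxiliary recursion

    print(0.5 * w @ A @ w)          # final loss: far below the starting value of 15.0

Gains grow for coordinates where successive gradients keep agreeing and shrink where they oscillate, which is what makes the plain gradient estimator better behaved.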
Policy-Gradient for Robust Planning (French)
O. Buffet
Proceedings of the Francophone Conference on Machine Learning (CAp'06) (2006)
Robust Planning with (L)RTDP (French)
O. Buffet
Proceedings of the Francophone Conference on Machine Learning (CAp'05) (2005)
A Two-Teams Approach for Robust Probabilistic Temporal Planning
O. Buffet
Proceedings of the ECML'05 Workshop on Reinforcement Learning in Non-Stationary Environments (2005)
Robust Planning with (L)RTDP
O. Buffet
Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI'05) (2005)
Prottle: A Probabilistic Temporal Planner
Simulation Methods for Uncertain Decision-Theoretic Planning
O. Buffet
Proceedings of the IJCAI 2005 Workshop on Planning and Learning in A Priori Unknown or Dynamic Domains
Decision-Theoretic Military Operations Planning
Filtered Reinforcement Learning
Proceedings of the 15th European Conference on Machine Learning, Springer (2004), pp. 27-38
Policy-Gradient Algorithms for Partially Observable Markov Decision Processes
Ph.D. Thesis, The Australian National University (2003)
Scaling Internal-State Policy-Gradient Methods for POMDPs
Jonathan Baxter
Proceedings of the 19th International Conference on Machine Learning, Morgan Kaufmann, Sydney, Australia (2002)
Emmerald: A Fast Matrix-Matrix Multiply Using Intel SIMD Technology
Jonathan Baxter
Concurrency and Computation: Practice and Experience, 13 (2001), pp. 103-119
92c/MFlop/s, Ultra-Large-Scale Neural-Network Training on a PIII Cluster
General Matrix-Matrix Multiplication Using SIMD Features of the PIII
Jonathan Baxter
Euro-Par 2000: Parallel Processing, Springer-Verlag, Munich, Germany