Delesley Hutchins
Authored Publications
We introduce the Block-Recurrent Transformer, which applies a transformer layer in a recurrent fashion along a sequence, and has linear complexity with respect to sequence length. Our recurrent cell operates on blocks of tokens rather than single tokens, and leverages parallel computation within a block in order to make efficient use of accelerator hardware. The cell itself is strikingly simple. It is merely a transformer layer: it uses self-attention and cross-attention to efficiently compute a recurrent function over a large set of state vectors and tokens. Our design was inspired in part by LSTM cells, and it uses LSTM-style gates, but it scales the typical LSTM cell up by several orders of magnitude.
Our implementation of recurrence has the same cost in both computation time and parameter count as a conventional transformer layer, but offers dramatically improved perplexity in language modeling tasks over very long sequences. Our model outperforms a long-range Transformer-XL baseline by a wide margin, while running twice as fast. We demonstrate its effectiveness on PG-19 (books), arXiv papers, and GitHub source code.
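To make the block-wise recurrence described in this abstract concrete, here is a toy sketch in plain Python. It is not the authors' implementation: the names `attend`, `recurrent_cell`, and `run_blocks` are hypothetical, there are no learned projections, MLP layers, or causal masks, and the gate is a fixed scalar standing in for the learned LSTM-style gates the abstract mentions.

```python
import math


def attend(queries, keys, values):
    """Plain softmax dot-product attention over lists of equal-length vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))
        ])
    return out


def recurrent_cell(block, state, gate=0.5):
    """One recurrent step over a block of tokens (toy: no learned weights)."""
    # Tokens use self-attention within the block and cross-attention to the state.
    self_out = attend(block, block, block)
    cross_out = attend(block, state, state)
    token_out = [
        [t + a + b for t, a, b in zip(tok, s, c)]
        for tok, s, c in zip(block, self_out, cross_out)
    ]
    # State vectors use self-attention and cross-attention to the block's tokens.
    state_self = attend(state, state, state)
    state_cross = attend(state, block, block)
    candidate = [[a + b for a, b in zip(x, y)] for x, y in zip(state_self, state_cross)]
    # Assumption: a fixed scalar gate replaces the learned gates of a real model.
    new_state = [
        [gate * old + (1.0 - gate) * new for old, new in zip(s, c)]
        for s, c in zip(state, candidate)
    ]
    return token_out, new_state


def run_blocks(tokens, state, block_size):
    """Apply the cell block by block, carrying the state forward."""
    outputs = []
    for start in range(0, len(tokens), block_size):
        block_out, state = recurrent_cell(tokens[start:start + block_size], state)
        outputs.extend(block_out)
    return outputs, state
```

With block size W and S state vectors, each call to `recurrent_cell` computes on the order of (W + S)^2 attention scores, and `run_blocks` makes about N / W calls for N tokens, so the total work in this sketch grows linearly with N when W and S are fixed.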
Language models typically need to be trained or finetuned in order to acquire
new knowledge, which involves updating their weights. We instead envision
language models that can simply read and memorize new data at inference time,
thus acquiring new knowledge immediately. In this work, we extend language
models with the ability to memorize the internal representations of past inputs. We
demonstrate that an approximate kNN lookup into a non-differentiable memory of
recent (key, value) pairs improves language modeling across various benchmarks
and tasks, including generic webtext (C4), math papers (arXiv), books (PG-19),
code (GitHub), as well as formal theorems (Isabelle). We show that the performance
steadily improves when we increase the size of memory up to 262K tokens. On
benchmarks including code and mathematics, we find that the model is capable of
making use of newly defined functions and theorems during test time.
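The memory lookup described in this abstract can be illustrated with a small sketch. This is an assumption-laden toy, not the paper's code: `KNNMemory` and its methods are hypothetical names, the search is exact brute force rather than approximate kNN, similarity is a plain dot product, and combining retrieved values with a softmax-weighted average is one plausible choice rather than a confirmed detail.

```python
import math
from collections import deque


class KNNMemory:
    """Bounded store of recent (key, value) pairs with exact top-k lookup."""

    def __init__(self, max_size):
        # deque(maxlen=...) evicts the oldest pair once the memory is full.
        self.pairs = deque(maxlen=max_size)

    def add(self, key, value):
        self.pairs.append((key, value))

    def lookup(self, query, k):
        """Return up to k (score, value) pairs with the highest dot-product score."""
        scored = [
            (sum(q * x for q, x in zip(query, key)), value)
            for key, value in self.pairs
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]

    def read(self, query, k):
        """Softmax-weighted average of the retrieved values, or None if empty."""
        hits = self.lookup(query, k)
        if not hits:
            return None
        top = hits[0][0]  # highest score, subtracted for numerical stability
        weights = [math.exp(score - top) for score, _ in hits]
        total = sum(weights)
        dim = len(hits[0][1])
        return [
            sum(w * value[d] for w, (_, value) in zip(weights, hits)) / total
            for d in range(dim)
        ]
```

Because nothing here is trained, new pairs can be added at inference time and are retrievable immediately, which is the property the abstract emphasizes. A real system at 262K tokens would likely replace the linear scan in `lookup` with an approximate nearest-neighbor index.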
C/C++ Thread Safety Analysis
Aaron Ballman
Dean Sutherland
2014 IEEE 14th International Working Conference on Source Code Analysis and Manipulation, IEEE
Writing multithreaded programs is hard. Static analysis tools can help developers by allowing threading policies to be formally specified and mechanically checked. They essentially provide a static type system for threads, and can detect potential race conditions and deadlocks.
This paper describes Clang Thread Safety Analysis, a tool which uses annotations to declare and enforce thread safety policies in C and C++ programs. Clang is a production-quality C++ compiler which is available on most platforms, and the analysis can be enabled for any build with a simple warning flag: -Wthread-safety.
The analysis is deployed on a large scale at Google, where it has provided sufficient value in practice to drive widespread voluntary adoption. Contrary to popular belief, the need for annotations has not been a liability, and even confers some benefits with respect to software evolution and maintenance.