In this paper, we introduce a Focus Attention Mechanism to two popular Seq2Seq architectures: RoBERTaS2S and Pegasus. Both use a Transformer-based encoder-decoder architecture; at each decoding step, the decoder learns a single contextual representation for predicting the next token by attending to the input sequence and to the tokens predicted so far. Focus attention takes inspiration from human-written text and augments this contextual representation with a dynamic vocabulary bias, so that the decoder proactively generates tokens that are similar or topical to the input sequence. When evaluated on the BBC extreme summarization task, both RoBERTaS2S and Pegasus with focus attention generate summaries that are more faithful to their input documents than their counterparts without it. Models with focus attention can holistically learn the abstract-level properties embodied in the target texts, such as being mostly extractive, mostly abstractive, or text-editing only, without introducing any task-specific architectural priors. Finally, focus attention naturally supports Focus Sampling, a technique that samples topically relevant tokens to generate diverse yet topically consistent and faithful outputs.
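To make the idea of dynamic vocabulary biasing and Focus Sampling concrete, the following toy sketch may help. It is not the implementation used in this work. It assumes, purely for illustration, that the focus signal is a per-token score over the vocabulary that is added to the decoder logits, and that Focus Sampling restricts decoding to a subset of tokens drawn from the focus distribution. The function names (`biased_next_token_dist`, `focus_sample`), the five-word vocabulary, and all logit values are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats (-inf entries get probability 0)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def biased_next_token_dist(decoder_logits, focus_logits, allowed=None):
    """Add the focus bias to the decoder logits (additive bias is an assumption).

    If `allowed` is given, tokens outside it are masked out before the softmax.
    """
    biased = [d + f for d, f in zip(decoder_logits, focus_logits)]
    if allowed is not None:
        biased = [b if i in allowed else -math.inf for i, b in enumerate(biased)]
    return softmax(biased)


def focus_sample(focus_logits, k, rng):
    """Draw k distinct token ids, weighted by the focus distribution."""
    probs = softmax(focus_logits)
    remaining = list(range(len(probs)))
    chosen = set()
    for _ in range(min(k, len(remaining))):
        weights = [probs[i] for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.add(pick)
        remaining.remove(pick)
    return chosen


# Hypothetical toy vocabulary and scores, chosen only for illustration.
vocab = ["the", "cat", "sat", "stock", "market"]
decoder_logits = [2.0, 1.0, 1.0, 1.0, 0.5]   # what the decoder alone prefers
focus_logits = [0.0, 2.0, 1.0, -2.0, -2.0]   # how topical each token is to the input

plain = softmax(decoder_logits)
biased = biased_next_token_dist(decoder_logits, focus_logits)

rng = random.Random(0)
allowed = focus_sample(focus_logits, k=3, rng=rng)
restricted = biased_next_token_dist(decoder_logits, focus_logits, allowed)

for token, p_plain, p_biased in zip(vocab, plain, biased):
    print(f"{token:>7}: plain={p_plain:.3f} biased={p_biased:.3f}")
print("sampled vocabulary:", sorted(vocab[i] for i in allowed))
```

In this sketch, the bias shifts probability mass toward tokens that score as topical, while drawing a different token subset with `focus_sample` on each run is what would yield diverse yet topically constrained outputs.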