We propose a new method for learning image attention masks in a semi-supervised setting, based on the Information Bottleneck principle. Given a set of labeled images, the mask generation model minimizes the mutual information between the input image and the masked image while maximizing the mutual information between that masked image and the image label. In contrast with other approaches, our attention model produces a boolean rather than a continuous mask, thus entirely concealing the information in masked-out pixels. Using a set of synthetic datasets based on MNIST and CIFAR10, as well as the SVHN dataset, we demonstrate that our method can successfully attend to the features that define the image class. We also discuss potential drawbacks of our method and propose a mask randomization technique to alleviate one of them.
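The two competing goals described above can be summarized as an Information Bottleneck objective. The sketch below uses the standard Lagrangian form of that principle; the notation is assumed for illustration and is not taken from the method itself. Here $X$ is the input image, $Y$ its label, $M = m_\theta(X) \in \{0,1\}^{H \times W}$ the boolean mask produced by a mask generator with parameters $\theta$, $X_M = X \odot M$ the masked image, and $\beta > 0$ a hypothetical trade-off coefficient.

```latex
\min_{\theta} \; \mathcal{L}(\theta) = I(X; X_M) - \beta \, I(X_M; Y),
\qquad X_M = X \odot m_\theta(X), \quad m_\theta(X) \in \{0,1\}^{H \times W}
```

Because the mask is boolean, every masked-out pixel of $X_M$ is exactly zero, so $X_M$ carries no residual information about those pixels. A continuous mask would instead only attenuate them.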