Damien Kelly

Authored Publications
    Abstract: Could we compress images via standard codecs while avoiding visible artifacts? The answer is obvious: this is doable as long as the bit budget is generous enough. What if the allocated bit-rate for compression is insufficient? Then, unfortunately, artifacts are a fact of life. Many attempts have been made over the years to fight this phenomenon, with varying degrees of success. In this work we aim to break the unholy connection between bit-rate and image quality, and propose a way to circumvent compression artifacts by pre-editing the incoming image, modifying its content to fit the given bit budget. We design this editing operation as a learned convolutional neural network, and formulate an optimization problem for its training. Our loss takes into account the proximity between the original image and the edited one, a bit-budget penalty over the proposed image, and a no-reference image quality measure that forces the outcome to be visually pleasing. The proposed approach is demonstrated on the popular JPEG compression standard, showing savings in bits and/or improvements in visual quality, obtained with intricate editing effects.
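The three-term training loss described above can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the weights, the bits-per-pixel proxy for the bit-budget term, and the gradient-magnitude stand-in for the no-reference quality measure are all invented for this sketch.

```python
def mse(a, b):
    """Proximity term: mean squared error between original and edited image."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def bit_budget_penalty(edited, budget_bits, bits_per_pixel=2.0):
    """Toy bit-budget term: penalise estimated coded size above the budget.
    A real system would use the codec's actual (or differentiable) rate."""
    est_bits = bits_per_pixel * len(edited)
    return max(0.0, est_bits - budget_bits)

def no_reference_quality_penalty(edited):
    """Toy no-reference quality proxy: penalise abrupt sample-to-sample jumps
    (a crude stand-in for a learned perceptual quality measure)."""
    return sum(abs(edited[i + 1] - edited[i]) for i in range(len(edited) - 1))

def editing_loss(original, edited, budget_bits, lam=0.01, mu=0.1):
    """Combined loss: proximity + bit-budget penalty + quality penalty."""
    return (mse(original, edited)
            + lam * bit_budget_penalty(edited, budget_bits)
            + mu * no_reference_quality_penalty(edited))
```

With a generous budget and an unchanged image the loss is zero; editing the image away from the original, or tightening the budget, raises it.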
    Learning to Reduce Defocus Blur by Realistically Modeling Dual-Pixel Data
    Abdullah Abuolaim
    Michael S. Brown
    International Conference on Computer Vision (ICCV) (2021)
    Abstract: Recent work has shown impressive results on data-driven defocus deblurring using the two-image views available on modern dual-pixel (DP) sensors. One significant challenge in this line of research is access to DP data. Despite many cameras having DP sensors, only a limited number provide access to the low-level DP sensor images. In addition, capturing training data for defocus deblurring involves a time-consuming and tedious setup requiring the camera's aperture to be adjusted. Some cameras with DP sensors (e.g., smartphones) do not have adjustable apertures, further limiting the ability to produce the necessary training data. We address the data capture bottleneck by proposing a procedure to generate realistic DP data synthetically. Our synthesis approach mimics the optical image formation found on DP sensors and can be applied to virtual scenes rendered with standard computer software. Leveraging these realistic synthetic DP images, we introduce a recurrent convolutional network (RCN) architecture that improves deblurring results and is suitable for use with single-frame and multi-frame data (e.g., video) captured by DP sensors. Finally, we show that our synthetic DP data is useful for training DNN models targeting video deblurring applications where access to DP data remains challenging.
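A toy 1-D sketch of the dual-pixel image-formation idea that such synthesis builds on: the two photodiodes under each microlens integrate light from opposite halves of the aperture, so the two views are blurred by mirrored half-PSFs. The scene, PSF, and splitting scheme below are invented for illustration and are far simpler than the paper's optical model.

```python
def conv_same(signal, kernel):
    """'Same'-size convolution with replicate padding at the borders."""
    n, m = len(signal), len(kernel)
    h = m // 2
    padded = [signal[0]] * h + list(signal) + [signal[-1]] * h
    return [sum(kernel[j] * padded[i + j] for j in range(m)) for i in range(n)]

def half_kernels(psf):
    """Split a symmetric defocus PSF into mirrored left/right half-PSFs,
    sharing the centre tap, so that their mean equals the full normalised PSF."""
    n = len(psf)
    left = [v if i <= n // 2 else 0.0 for i, v in enumerate(psf)]
    right = [v if i >= n // 2 else 0.0 for i, v in enumerate(psf)]
    left[n // 2] *= 0.5   # centre tap is shared between the two halves
    right[n // 2] *= 0.5
    sl, sr = sum(left), sum(right)
    return [v / sl for v in left], [v / sr for v in right]

def dp_views(scene, psf):
    """Render the two dual-pixel views of a 1-D scene under a defocus PSF."""
    kl, kr = half_kernels(psf)
    return conv_same(scene, kl), conv_same(scene, kr)
```

By construction the average of the two views equals the conventionally blurred image, while each view individually carries the half-aperture shift that deblurring networks exploit.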
    Abstract: We present a highly efficient blind image restoration method that removes mild blur from natural images. Contrary to the mainstream, we focus on removing the slight blur that is often present, damaging image quality, and that is commonly generated by small out-of-focus, lens blur, or slight camera motion. The proposed algorithm first estimates the image blur and then compensates for it by combining multiple applications of the estimated blur in a principled way. In this sense, we present a novel procedure for designing the approximate inverse of a filter that makes use only of re-applications of the filter itself. To estimate image blur in natural images we introduce a simple yet robust algorithm based on empirical observations about the distribution of the gradient in sharp images. Our experiments show that, in the context of mild blur, the proposed method outperforms traditional and modern blind deconvolution methods and runs in a fraction of the time. We finally show that the method can be used to blindly correct blur before applying an off-the-shelf deep super-resolution model, leading to superior results compared to other highly complex and computationally demanding methods. The proposed method can estimate and remove mild blur from a 12 Mp image on a modern mobile phone in a fraction of a second.
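The "approximate inverse via re-applications of the filter" idea can be illustrated in 1-D. If K denotes the (normalised) estimated blur, a truncated Neumann series gives K⁻¹ ≈ I + (I − K) + (I − K)², i.e. x̂ = 3y − 3Ky + K²y, which uses nothing but re-applications of K. The kernel, signal, and series order below are invented for this sketch; the paper's actual estimator and combination differ.

```python
def conv_same(signal, kernel):
    """'Same'-size convolution with replicate padding at the borders."""
    n, m = len(signal), len(kernel)
    h = m // 2
    padded = [signal[0]] * h + list(signal) + [signal[-1]] * h
    return [sum(kernel[j] * padded[i + j] for j in range(m)) for i in range(n)]

def polynomial_deblur(y, kernel):
    """Approximate deconvolution of y by the blur `kernel` using only
    re-applications of the blur: x_hat = 3y - 3Ky + K^2 y."""
    ky = conv_same(y, kernel)      # K y
    kky = conv_same(ky, kernel)    # K^2 y
    return [3 * a - 3 * b + c for a, b, c in zip(y, ky, kky)]
```

For a mild low-pass blur this polynomial pushes the effective frequency response from λ to 1 − (1 − λ)³, i.e. much closer to 1, which is why it sharpens mild blur well but is not a general deconvolution.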
    Handheld Multi-Frame Super-Resolution
    Bartlomiej Wronski
    Manfred Ernst
    Marc Levoy
    ACM Transactions on Graphics (TOG), vol. 38 (2019), pp. 18
    Abstract: Compared to DSLR cameras, smartphone cameras have smaller sensors, which limits their spatial resolution; smaller apertures, which limits their light-gathering ability; and smaller pixels, which reduces their signal-to-noise ratio. The use of color filter arrays (CFAs) requires demosaicing, which further degrades resolution. In this paper, we supplant the use of traditional demosaicing in single-frame and burst photography pipelines with a multi-frame super-resolution algorithm that creates a complete RGB image directly from a burst of CFA raw images. We harness natural hand tremor, typical in handheld photography, to acquire a burst of raw frames with small offsets. These frames are then aligned and merged to form a single image with red, green, and blue values at every pixel site. This approach, which includes no explicit demosaicing step, serves to both increase image resolution and boost the signal-to-noise ratio. Our algorithm is robust to challenging scene conditions: local motion, occlusion, or scene changes. It runs at 100 milliseconds per 12-megapixel RAW input burst frame on mass-produced mobile phones. Specifically, the algorithm is the basis of the Super-Res Zoom feature, as well as the default merge method in Night Sight mode (whether zooming or not) on Google's flagship phone.
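The align-and-merge step can be illustrated with a toy 1-D burst: integer shifts between noisy frames are found by maximising correlation against a reference frame, and the aligned frames are averaged, boosting SNR by roughly the square root of the frame count. The paper operates on raw CFA bursts with sub-pixel alignment and robust merging; this sketch, with invented signals, is integer-shift averaging only.

```python
def best_shift(ref, frame, max_shift=4):
    """Integer alignment: the shift of `frame` that best correlates with `ref`."""
    def score(s):
        return sum(ref[i] * frame[i + s]
                   for i in range(len(ref)) if 0 <= i + s < len(frame))
    return max(range(-max_shift, max_shift + 1), key=score)

def align_and_merge(frames):
    """Align every frame to the first one, then average the aligned samples."""
    ref = frames[0]
    n = len(ref)
    acc, cnt = [0.0] * n, [0] * n
    for f in frames:
        s = best_shift(ref, f)
        for i in range(n):
            if 0 <= i + s < n:
                acc[i] += f[i + s]
                cnt[i] += 1
    return [a / c for a, c in zip(acc, cnt)]
```

Averaging N independent noise realisations divides the noise variance by about N, which is the "merge" half of the benefit; the "align" half is what lets hand-tremor offsets double as super-resolution samples.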
    Bitrate Classification of Twice-Encoded Audio using Objective Quality Features
    Colm Sloan
    Naomi Harte
    Anil Kokaram
    Andrew Hines
    8th International Conference on Quality of Multimedia Experience (QoMEX 2016)
    Abstract: Streaming services such as Google Play Music and SoundCloud handle terabytes of audio data every week. These services aim to encode audio with a balance between the quality of experience (QoE) [1] for the end user, the size of the encoded audio files, and the processing cost of the encoding. Users may upload files to a streaming service that have already been encoded, for example because the user wants to reduce file size to decrease upload time. The same audio encoded as a 3 MB uncompressed WAV, a 510 KB 256 kb/s AAC-LC, or a 250 KB 128 kb/s Opus file all seem similar in quality to expert listeners [2]. Streaming services encode audio to a number of bitrates and formats to provide the best experience for users of different devices. For example, mobile users may prefer to compromise quality to limit bandwidth consumption. Services do not encode to bitrates higher than that of the uploaded files, as there would be no increase in quality. Determining the lowest bitrate at which the files were previously encoded therefore allows the streaming service to forgo encoding them to higher bitrates, saving on processing and storage space.
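The saving described above is easy to make concrete: once the lowest effective bitrate of an upload has been classified, any encoding-ladder renditions above it can be skipped. The ladder values below are hypothetical.

```python
# Hypothetical encoding ladder (kb/s) for a streaming service.
LADDER_KBPS = [64, 96, 128, 192, 256, 320]

def renditions_to_encode(detected_source_kbps):
    """Keep only target bitrates at or below the detected source bitrate,
    since encoding above it cannot increase quality."""
    return [b for b in LADDER_KBPS if b <= detected_source_kbps]
```

For a file classified as having been encoded at 128 kb/s somewhere in its history, only the 64, 96, and 128 kb/s renditions need to be produced and stored.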
    ViSQOLAudio: An objective audio quality metric for low bitrate codecs
    Andrew Hines
    Eoin Gillen
    Anil Kokaram
    Naomi Harte
    The Journal of the Acoustical Society of America, vol. 137 (6) (2015), EL449-EL455
    Abstract: Streaming services seek to optimise their use of bandwidth across audio and visual channels to maximise the quality of experience for users. This letter evaluates whether objective quality metrics can predict the audio quality of music encoded at low bitrates by comparing objective predictions with results from listener tests. Three objective metrics were benchmarked: PEAQ, POLQA, and ViSQOLAudio. The results demonstrate that objective metrics designed for speech quality assessment have strong potential for quality assessment of low-bitrate audio codecs.
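Benchmarking an objective metric against listener tests of this kind typically reduces to a rank correlation between the metric's predictions and the subjective scores. A minimal Spearman correlation, assuming no tied values; the data in the test is invented for illustration.

```python
def ranks(values):
    """0-based ranks of `values` (assumes no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def spearman(a, b):
    """Spearman rank correlation via the classic 1 - 6*sum(d^2)/(n(n^2-1))."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

A correlation near +1 means the metric orders the codec conditions the same way the listeners did, which is the property that matters when choosing bitrates.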
    Temporal Synchronization of Multiple Audio Signals
    Sasi Inguva
    Andy Crawford
    Hugh Denman
    Anil Kokaram
    Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy (2014)
    Abstract: Given the proliferation of consumer media recording devices, events often give rise to a large number of recordings. These recordings are taken from different spatial positions and do not have reliable timestamp information. In this paper, we present two robust graph-based approaches for synchronizing multiple audio signals. The graphs are constructed atop the over-determined system resulting from pairwise signal comparison using cross-correlation of audio features. The first approach uses a Minimum Spanning Tree (MST) technique, while the second uses Belief Propagation (BP) to solve the system. Both approaches provide excellent solutions and robustness to pairwise outliers; however, the MST approach is much less complex than BP. In addition, an experimental comparison of audio-feature-based synchronization shows that spectral flatness outperforms the zero-crossing rate and signal energy.
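The spanning-tree approach described above can be sketched as follows: each recording is a node, each pairwise cross-correlation yields an offset estimate with a confidence, and per-recording offsets are propagated over a maximum-confidence spanning tree so that low-confidence (outlier) pairs are never used. The data structures and confidence values are invented for this sketch; the paper's features and weighting differ.

```python
def mst_offsets(n, pairwise):
    """Resolve per-signal time offsets from pairwise estimates.

    pairwise: dict {(i, j): (offset_of_j_minus_i, confidence)}.
    Returns offsets relative to node 0, grown Prim-style by always attaching
    the not-yet-placed node reachable via the highest-confidence edge.
    Assumes the pairwise comparison graph is connected.
    """
    offsets = {0: 0.0}
    while len(offsets) < n:
        best = None
        for (i, j), (off, conf) in pairwise.items():
            if i in offsets and j not in offsets:
                cand = (conf, j, offsets[i] + off)
            elif j in offsets and i not in offsets:
                cand = (conf, i, offsets[j] - off)
            else:
                continue
            if best is None or cand[0] > best[0]:
                best = cand
        _, node, off = best
        offsets[node] = off
    return offsets
```

Because the tree only ever uses the most confident edges, a wildly wrong pairwise estimate with low correlation confidence (an outlier) is simply never traversed.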
    Anil Kokaram
    Hugh Denman
    Andrew Crawford
    IEEE International Conference on Image Processing (ICIP) (2012)
    Abstract: The vast majority of previous work in noise reduction for visual media has assumed uncorrelated, white noise sources. In practice this is almost always violated by real media. Film grain noise is never white, and this paper highlights that the same applies to almost all consumer video content. We therefore present an algorithm for measuring the spatial and temporal spectral density of noise in archived video content, whether consumer digital camera or film originated. As an example of how this information can be used for video denoising, the spectral density is then used for spatio-temporal noise reduction in the Fourier frequency domain. Results show improved performance for noise reduction in an easily pipelined system.
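One simple way to measure a noise spectral density of the kind described above is from differences of consecutive frames of a static scene: the (unchanging) signal cancels, leaving twice the noise power at each frequency. This 1-D sketch with a naive DFT is only an illustration of that principle; the paper's estimator for real, moving video is considerably more involved.

```python
import cmath

def dft(x):
    """Naive DFT, adequate for short illustrative signals."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def noise_psd_estimate(frames):
    """Estimate the per-frequency noise power from differences of consecutive
    frames of a static scene. Each difference cancels the signal and carries
    twice the noise power, hence the division by 2."""
    n = len(frames[0])
    acc = [0.0] * n
    pairs = 0
    for a, b in zip(frames, frames[1:]):
        spec = dft([x - y for x, y in zip(a, b)])
        for j in range(n):
            acc[j] += abs(spec[j]) ** 2 / 2.0
        pairs += 1
    return [v / (pairs * n) for v in acc]
```

With the noise spectrum in hand, a Wiener-style attenuation per frequency band is the natural denoising step; the test below checks the two key properties of the estimator: injected noise power lands in the right frequency bin, and the static scene cancels completely.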
    Voxel-based Viterbi Active Speaker Tracking (V-VAST) with best view selection for video lecture post-production
    Anil C. Kokaram
    Frank Boland
    ICASSP (2011), pp. 2296-2299