Onur G. Guleryuz
Onur G. Guleryuz is a Software Engineer at Google, working on machine learning and computer vision problems with applications in augmented and virtual reality. Prior to Google, he worked at LG Electronics, Futurewei, NTT DoCoMo, and Seiko-Epson, all in Silicon Valley. Before coming to Silicon Valley in 2000, he served as an Assistant Professor at the NYU Tandon School of Engineering in New York.
His research interests include topics in machine learning, signal processing, computer vision, and information theory. He has served on numerous panels, conference committees, and media-related industry standardization bodies. He has authored an extensive number of refereed papers, holds granted US patents, and has made leading-edge contributions to products ranging from mobile phones to displays and printers. He has been an active member of the IEEE Signal Processing Society, having served as chair of the IEEE Signal Processing Society Image, Video, and Multidimensional Signal Processing Technical Committee (IVMSP TC) and as a member of the IEEE Signal Processing Society Multimedia Signal Processing Technical Committee (MMSP TC).
He received B.S. degrees in electrical engineering and physics from Bogazici University, Istanbul, Turkey, in 1991, the M.S. degree in engineering and applied science from Yale University, New Haven, CT, in 1992, and the Ph.D. degree in electrical engineering from the University of Illinois at Urbana-Champaign (UIUC) in 1997.
He received the National Science Foundation CAREER Award, the IEEE Signal Processing Society Best Paper Award, the IEEE International Conference on Image Processing Best Paper Award, the Seiko-Epson Corporation President's Award for Research and Development, and the DoCoMo Communications Laboratories President's Award for Research. He is a Fellow of the IEEE.
Further information including patent and publication details can be found here.
Authored Publications
Sandwiched Compression: Repurposing Standard Codecs with Neural Network Wrappers
Phil A. Chou
Hugues Hoppe
Danhang Tang
Jonathan Taylor
Philip Davidson
arXiv:2402.05887 (2024)
We propose sandwiching standard image and video codecs between pre- and post-processing neural networks. The networks are jointly trained through a differentiable codec proxy to minimize a given rate-distortion loss. This sandwich architecture not only improves the standard codec's performance on its intended content, but can also effectively adapt the codec to other types of image/video content and to other distortion measures. Essentially, the sandwich learns to transmit “neural code images” that optimize overall rate-distortion performance even when the overall problem is well outside the scope of the codec's design. Through a variety of examples, we apply the sandwich architecture to sources with different numbers of channels, higher resolution, higher dynamic range, and perceptual distortion measures. The results demonstrate substantial improvements (up to 9 dB gains) across these adaptations. We derive VQ equivalents for the sandwich, establish optimality properties, and design differentiable codec proxies approximating current standard codecs. We further analyze model complexity and visual quality under perceptual metrics, as well as sandwich configurations that offer interesting potential in image/video compression and streaming.
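As a hedged illustration of the training setup described in the abstract, the sketch below wires toy pre- and post-processors around a stand-in codec proxy and minimizes a rate-distortion loss. The straight-through rounding proxy and the magnitude-based rate surrogate are assumptions for illustration, not the paper's learned proxies.

```python
# Minimal sketch of sandwich training with a differentiable codec proxy.
# Assumes PyTorch; the proxy below is a toy straight-through rounding
# stand-in, not the learned standard-codec proxy used in the paper.
import torch
import torch.nn as nn

pre = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 3, 3, padding=1))   # outputs "neural code images"
post = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                     nn.Conv2d(16, 3, 3, padding=1))  # reconstructs the source

def codec_proxy(codes, step=1.0 / 255.0):
    """Differentiable stand-in for encode/decode: straight-through quantization."""
    q = torch.round(codes / step) * step
    return codes + (q - codes).detach()  # quantized forward, identity gradient

def rate_proxy(codes):
    """Crude rate surrogate: penalize code magnitude (stands in for entropy)."""
    return codes.abs().mean()

opt = torch.optim.Adam(list(pre.parameters()) + list(post.parameters()), lr=1e-4)
lam = 0.01  # rate-distortion trade-off

for _ in range(10):                      # toy loop; real training iterates over a dataset
    x = torch.rand(4, 3, 64, 64)         # stand-in batch of images
    codes = pre(x)
    x_hat = post(codec_proxy(codes))
    loss = nn.functional.mse_loss(x_hat, x) + lam * rate_proxy(codes)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

At deployment time the proxy would be replaced by the actual standard codec, with the trained pre- and post-processors kept fixed.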
Sandwiched Video Compression: An Efficient Learned Video Compression Approach
Danhang "Danny" Tang
Jonathan Taylor
Phil Chou
IEEE International Conference on Image Processing (2023)
We propose sandwiched video compression – a video compression framework that wraps neural networks around a standard video codec. The framework consists of a neural pre-processor, a neural post-processor, and a standard video codec between them, trained jointly to optimize a rate-distortion loss function. Training such a framework end-to-end requires a differentiable proxy for the standard video codec, which is significantly more challenging than designing image codec proxies due to temporal processing such as motion prediction, inter/intra mode decisions, and in-loop filtering. In this work, we propose a computationally efficient way of approximating a video codec and demonstrate that the neural codes generated by the neural pre-processor can be compressed to a better rate-distortion point than the original frames in the input video. More precisely, sandwiched HEVC YUV 4:4:4 in low-resolution mode and sandwiched HEVC YUV 4:0:0 show around 6.5 dB and 8 dB improvements over the standard HEVC in the same mode and format, respectively. Moreover, when optimized for and tested with a perceptual similarity metric, Learned Perceptual Image Patch Similarity (LPIPS), we observe 30% to 40% improvement over the standard HEVC YUV 4:4:4, depending on the rate.
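To give a feel for why a video proxy is harder than an image proxy, the toy sketch below approximates inter-frame coding with a zero-motion predictor. Real motion prediction, mode decisions, and in-loop filtering, which the paper's proxy must capture, are omitted here; the names and quantization step are illustrative assumptions.

```python
# Toy sketch of the temporal structure a video-codec proxy must model,
# assuming PyTorch. A zero-motion predictor stands in for real motion
# estimation; the paper's proxy is far more faithful to HEVC-style coding.
import torch

def straight_through_round(x, step):
    q = torch.round(x / step) * step
    return x + (q - x).detach()

def video_codec_proxy(frames, step=4.0 / 255.0):
    """frames: (T, C, H, W). Intra-codes the first frame, then codes
    residuals against the previously 'decoded' frame (zero-motion inter)."""
    decoded = [straight_through_round(frames[0], step)]
    for t in range(1, frames.shape[0]):
        residual = frames[t] - decoded[-1]          # inter-prediction residual
        decoded.append(decoded[-1] + straight_through_round(residual, step))
    return torch.stack(decoded)

clip = torch.rand(8, 3, 64, 64)        # stand-in clip
recon = video_codec_proxy(clip)        # differentiable w.r.t. the input clip
print(recon.shape)                     # torch.Size([8, 3, 64, 64])
```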
Sandwiched Image Compression: Increasing the resolution and dynamic range of standard codecs
Phil Chou
Hugues Hoppe
Danhang "Danny" Tang
Philip Davidson
2022 Picture Coding Symposium (PCS), IEEE (to appear)
Given a standard image codec, we compress images that may have higher resolution and/or higher bit depth than allowed in the codec's specifications, by sandwiching the standard codec between a neural pre-processor (before the standard encoder) and a neural post-processor (after the standard decoder). Using a differentiable proxy for the standard codec, we design the neural pre- and post-processors to transport the high-resolution (super-resolution, SR) or high-bit-depth (high dynamic range, HDR) images as lower-resolution and lower-bit-depth images. The neural processors accomplish this with spatially coded modulation, which acts as a watermark to preserve important image detail during compression. Experiments show that, compared to conventional methods of transmitting high resolution or high bit depth through lower-resolution or lower-bit-depth codecs, our sandwich architecture gains ~9 dB for SR images and ~3 dB for HDR images at the same rate over large test sets. We also observe significant gains in visual quality.
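For intuition, the sketch below implements the kind of fixed, conventional baselines the sandwich is compared against: truncating a 10-bit image to 8 bits, and carrying a double-resolution image through downsampling and upsampling. These are illustrative assumptions, not the paper's method; the learned pre-/post-processors replace such fixed mappings with spatially modulated neural codes.

```python
# Conventional baselines for carrying HDR/SR content through a narrower
# channel, using NumPy. The sandwich architecture learns to beat these.
import numpy as np

def hdr_through_8bit(img10):            # img10: uint16 values in [0, 1023]
    coded = (img10 >> 2).astype(np.uint8)     # drop 2 LSBs to fit 8 bits
    return coded.astype(np.uint16) << 2       # decoder re-expands; detail lost

def sr_through_lowres(img):             # img: float array (H, W), H and W even
    h, w = img.shape
    low = img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))  # 2x downsample
    return np.kron(low, np.ones((2, 2)))      # nearest-neighbor upsample back

x10 = np.random.randint(0, 1024, (64, 64), dtype=np.uint16)
err = np.abs(hdr_through_8bit(x10).astype(int) - x10.astype(int))
print(err.max())                        # at most 3 codes of 10-bit error
```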
Sandwiched Image Compression: Wrapping Neural Networks Around a Standard Codec
Phil Chou
Hugues Hoppe
Danhang "Danny" Tang
Philip Davidson
2021 IEEE International Conference on Image Processing (ICIP), IEEE, Anchorage, Alaska, pp. 3757-3761
We sandwich a standard image codec between two neural networks: a preprocessor that outputs neural codes, and a postprocessor that reconstructs the image. The neural codes are compressed as ordinary images by the standard codec. Using differentiable proxies for both rate and distortion, we develop a rate-distortion optimization framework that trains the networks to generate neural codes that are efficiently compressible as images. This architecture not only improves rate-distortion performance for ordinary RGB images, but also enables efficient compression of alternative image types (such as normal maps from computer graphics) using standard image codecs. Results demonstrate the effectiveness and flexibility of neural processing in mapping a variety of input data modalities to the rigid structure of standard codecs. A surprising result is that the rate-distortion-optimized neural processing seamlessly learns to transport color images using a single-channel (grayscale) codec.
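The sketch below illustrates one plausible form of a differentiable rate proxy in the spirit of this abstract: an 8x8 block DCT of the neural-code image followed by a smooth surrogate for coefficient bit cost. The specific transform and the log-magnitude surrogate are assumptions, not the paper's exact proxy.

```python
# A hypothetical differentiable rate proxy, assuming PyTorch: 2-D DCT over
# 8x8 blocks, then a smooth log-magnitude surrogate for coefficient bits.
import math
import torch

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix (rows index frequency, columns index samples)."""
    k = torch.arange(n).float()
    m = torch.cos(math.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0] *= 1 / math.sqrt(2)
    return m * math.sqrt(2 / n)

def rate_proxy(codes, step=0.02):
    """codes: (B, C, H, W) with H, W multiples of 8. Returns a differentiable
    bit-cost surrogate per image: sum of log2(1 + |DCT coeff| / step)."""
    B, C, H, W = codes.shape
    D = dct_matrix().to(codes.dtype)
    blocks = codes.reshape(B, C, H // 8, 8, W // 8, 8).permute(0, 1, 2, 4, 3, 5)
    coeffs = D @ blocks @ D.T                  # 2-D DCT applied per 8x8 block
    return torch.log2(1 + coeffs.abs() / step).sum(dim=(1, 2, 3, 4, 5))

x = torch.rand(2, 1, 64, 64, requires_grad=True)   # single-channel neural codes
print(rate_proxy(x).shape)                         # torch.Size([2])
```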
Deep Implicit Volume Compression
Danhang "Danny" Tang
Phil Chou
Christian Haene
Mingsong Dou
Jonathan Taylor
Shahram Izadi
Sofien Bouaziz
Cem Keskin
CVPR (2020)
We describe a novel approach for compressing truncated signed distance fields (TSDFs) stored in voxel grids, together with their corresponding textures. To compress the TSDF, our method relies on a block-based neural architecture trained end-to-end, achieving state-of-the-art compression rates. To prevent topological errors, we losslessly compress the signs of the TSDF, which, as a side effect, also bounds the maximum reconstruction error by the voxel size. To compress the affiliated texture, we designed a fast block-based charting and Morton packing technique that generates a coherent image which can be efficiently compressed using existing image-based compression algorithms. We demonstrate the performance of our algorithms on a large set of 4D performance sequences captured using multi-camera RGBD setups.
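As a minimal illustration of the Morton packing mentioned in the abstract, the sketch below interleaves the bits of 2D block coordinates so that spatially nearby blocks remain nearby in the packed order; the paper's charting and packing pipeline is considerably richer.

```python
# Morton (Z-order) encoding/decoding of 2-D coordinates in plain Python.
def morton_encode_2d(x, y, bits=16):
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return code

def morton_decode_2d(code, bits=16):
    x = y = 0
    for i in range(bits):
        x |= ((code >> (2 * i)) & 1) << i
        y |= ((code >> (2 * i + 1)) & 1) << i
    return x, y

assert morton_decode_2d(morton_encode_2d(5, 9)) == (5, 9)
print([morton_encode_2d(x, y) for y in range(2) for x in range(2)])  # [0, 1, 2, 3]
```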
Depth from motion for smartphone AR
Julien Valentin
Neal Wadhwa
Max Dzitsiuk
Michael John Schoenberg
Vivek Verma
Ambrus Csaszar
Ivan Dryanovski
Joao Afonso
Jose Pascoal
Konstantine Nicholas John Tsotsos
Mira Angela Leung
Mirko Schmidt
Sameh Khamis
Vladimir Tankovich
Shahram Izadi
Christoph Rhemann
ACM Transactions on Graphics (2018)
Augmented reality (AR) for smartphones has matured from a technology for early adopters, available only on select high-end phones, to one that is truly available to the general public. One of the key breakthroughs has been in low-compute methods for six-degree-of-freedom (6DoF) tracking on phones using only the existing hardware (camera and inertial sensors). 6DoF tracking is the cornerstone of smartphone AR, allowing virtual content to be precisely locked on top of the real world. However, to really give users the impression of believable AR, one requires mobile depth. Without depth, even simple effects such as a virtual object being correctly occluded by the real world are impossible. However, requiring a mobile depth sensor would severely restrict access to such features. In this article, we provide a novel pipeline for mobile depth that supports a wide array of mobile phones and uses only the existing monocular color sensor. Through several technical contributions, we provide the ability to compute low-latency dense depth maps using only a single CPU core on a wide range of (medium- to high-end) mobile phones. We demonstrate the capabilities of our approach on high-level AR applications including real-time navigation and shopping.
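For readers unfamiliar with the underlying geometry, the sketch below triangulates a 3D point from two views related by a known camera motion, such as that provided by 6DoF tracking. It is a textbook DLT triangulation under assumed intrinsics, not the paper's low-latency pipeline, whose contributions lie in dense matching and fast filtering on a single CPU core.

```python
# Linear (DLT) triangulation of one point from two views, assuming NumPy.
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """P1, P2: 3x4 projection matrices; uv1, uv2: pixel coords of one point."""
    A = np.stack([uv1[0] * P1[2] - P1[0],
                  uv1[1] * P1[2] - P1[1],
                  uv2[0] * P2[2] - P2[0],
                  uv2[1] * P2[2] - P2[1]])
    _, _, vt = np.linalg.svd(A)         # null vector of A is the solution
    X = vt[-1]
    return X[:3] / X[3]                 # homogeneous -> Euclidean 3-D point

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])  # assumed intrinsics
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])            # reference view
P2 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0], [0]])])  # 10 cm baseline
X_true = np.array([0.2, -0.1, 2.0])
uv = lambda P, X: (P @ np.append(X, 1))[:2] / (P @ np.append(X, 1))[2]
print(triangulate(P1, P2, uv(P1, X_true), uv(P2, X_true)))   # ~ [0.2, -0.1, 2.0]
```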
Fast Lifting for 3D Hand Pose Estimation in AR/VR Applications
Christine Kaeser-Chen
IEEE International Conference on Image Processing (2018)
We introduce a simple model for the human hand skeleton that is geared toward estimating 3D hand poses from 2D keypoints. The estimation problem arises in AR/VR scenarios where low-cost cameras are used to generate 2D views through which rich interactions with the world are desired. Starting with a noisy set of 2D hand keypoints (camera-plane coordinates of detected joints of the hand), the proposed algorithm generates 3D keypoints that are (i) compliant with human hand skeleton constraints and (ii) perspective-project down to the given 2D keypoints. Our work considers the 2D-to-3D lifting problem algebraically, identifies the parts of the hand that can be lifted accurately, points out the parts that may lead to ambiguities, and proposes remedies for ambiguous cases. Most importantly, we show that the fingertip localization errors are a good proxy for the errors at other finger joints. This observation leads to a look-up-table-based formulation that instantaneously determines finger poses without solving constrained trigonometric problems. The result is a fast algorithm that runs faster than real time on a single core. When hand bone lengths are unknown, our technique estimates them, allowing smooth AR/VR sessions in which the user's hand is automatically estimated at the start and the rest of the session continues seamlessly. Our work provides accurate 3D results that are competitive with the state of the art without requiring any 3D training data.
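The lifting step can be made concrete with a short sketch: under a pinhole camera with hypothetical intrinsics, a child joint must lie on the camera ray through its 2D keypoint, at bone-length distance from its parent's 3D position. This yields a quadratic in the depth along the ray with up to two roots, which is precisely the ambiguity the abstract discusses. The function and parameter names below are illustrative, not from the paper.

```python
# Algebraic 2-D -> 3-D lifting of one hand joint, assuming NumPy and a
# pinhole camera. Solves ||d*ray - parent||^2 = bone_len^2 for depth d.
import numpy as np

def lift_child_joint(parent3d, keypoint2d, bone_len, fx=500.0, fy=500.0,
                     cx=320.0, cy=240.0):
    """Return candidate 3-D positions of a child joint on the ray through
    its 2-D keypoint, at distance bone_len from parent3d."""
    ray = np.array([(keypoint2d[0] - cx) / fx, (keypoint2d[1] - cy) / fy, 1.0])
    ray /= np.linalg.norm(ray)
    b = ray @ parent3d                            # ray-parent correlation
    disc = b * b - (parent3d @ parent3d - bone_len**2)
    if disc < 0:                                  # no consistent depth (noise)
        return []
    return [d * ray for d in (b - np.sqrt(disc), b + np.sqrt(disc)) if d > 0]

parent = np.array([0.02, -0.01, 0.4])             # stand-in parent joint (meters)
candidates = lift_child_joint(parent, (350.0, 230.0), bone_len=0.04)
print(candidates)                                 # zero, one, or two candidates
```

The two-root case is where skeleton constraints and the look-up-table formulation described in the abstract come into play to select the physically plausible pose.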