Onur G. Guleryuz

Onur G. Guleryuz is a Software Engineer at Google working on machine learning and computer vision problems with applications in augmented and virtual reality. Prior to Google, he worked at LG Electronics, Futurewei, NTT DoCoMo, and Seiko-Epson, all in Silicon Valley. Before coming to Silicon Valley in 2000, he served as an Assistant Professor at the NYU Tandon School of Engineering in New York. His research interests include topics in machine learning, signal processing, computer vision, and information theory.

He has served on numerous panels, conference committees, and media-related industry standardization bodies. He has authored an extensive number of refereed papers, holds granted US patents, and has made leading-edge contributions to products ranging from mobile phones to displays and printers. He has been an active member of the IEEE Signal Processing Society, having served as chair of its Image, Video, and Multidimensional Signal Processing Technical Committee (IVMSP TC) and as a member of its Multimedia Signal Processing Technical Committee (MMSP TC).

He received B.S. degrees in electrical engineering and physics from Bogazici University, Istanbul, Turkey, in 1991, the M.S. degree in engineering and applied science from Yale University, New Haven, CT, in 1992, and the Ph.D. degree in electrical engineering from the University of Illinois at Urbana-Champaign (UIUC) in 1997. He has received the National Science Foundation CAREER Award, the IEEE Signal Processing Society Best Paper Award, the IEEE International Conference on Image Processing Best Paper Award, the Seiko-Epson Corporation President's Award for Research and Development, and the DoCoMo Communications Laboratories President's Award for Research. He is a Fellow of the IEEE. Further information, including patent and publication details, can be found here.
Authored Publications
    Sandwiched Compression: Repurposing Standard Codecs with Neural Network Wrappers
    Phil A. Chou
    Berivan Isik
    Hugues Hoppe
    Danhang Tang
    Jonathan Taylor
    Philip Davidson
    arXiv:2402.05887 (2024)
    Abstract: We propose sandwiching standard image and video codecs between pre- and post-processing neural networks. The networks are jointly trained through a differentiable codec proxy to minimize a given rate-distortion loss. This sandwich architecture not only improves the standard codec's performance on its intended content, but also effectively adapts the codec to other types of image/video content and to other distortion measures. Essentially, the sandwich learns to transmit "neural code images" that optimize overall rate-distortion performance even when the overall problem is well outside the scope of the codec's design. Through a variety of examples, we apply the sandwich architecture to sources with different numbers of channels, higher resolution, higher dynamic range, and perceptual distortion measures. The results demonstrate substantial improvements (up to 9 dB gains) under such adaptations. We derive VQ equivalents for the sandwich, establish optimality properties, and design differentiable codec proxies approximating current standard codecs. We further analyze model complexity, visual quality under perceptual metrics, and sandwich configurations that offer interesting potential in image/video compression and streaming.
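The abstract describes a concrete training structure: a neural pre-processor emits a code image, a differentiable stand-in for the standard codec distorts it roughly the way compression would, a neural post-processor reconstructs the source, and all parameters are optimized against a rate-distortion loss. The PyTorch sketch below illustrates only that structure; the network shapes, the additive-noise "quantization", and the L1 rate surrogate are placeholder assumptions, not the proxies used in the paper.

```python
import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    """Placeholder stand-in for the pre-/post-processor networks."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, out_ch, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

def toy_codec_proxy(code):
    """Crude differentiable proxy for a standard codec: uniform noise mimics
    quantization and mean absolute value serves as a rate surrogate. Real
    proxies model the transform/quantization/entropy-coding pipeline far
    more faithfully."""
    decoded = code + (torch.rand_like(code) - 0.5) * 0.02
    rate = decoded.abs().mean()
    return decoded, rate

pre, post = SmallConvNet(3, 3), SmallConvNet(3, 3)
opt = torch.optim.Adam(list(pre.parameters()) + list(post.parameters()), lr=1e-4)
lam = 0.01                                    # rate-distortion trade-off

x = torch.rand(8, 3, 64, 64)                  # placeholder training batch
code = pre(x)                                 # "neural code image"
decoded, rate = toy_codec_proxy(code)         # differentiable "standard codec"
xhat = post(decoded)                          # reconstruction
loss = torch.mean((x - xhat) ** 2) + lam * rate
opt.zero_grad(); loss.backward(); opt.step()  # one joint training step
```

At deployment, the proxy is swapped out for the actual standard encoder and decoder, with the trained pre- and post-processors wrapped around them.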
    Sandwiched Image Compression: Increasing the resolution and dynamic range of standard codecs
    Phil Chou
    Hugues Hoppe
    Danhang "Danny" Tang
    Philip Davidson
    2022 Picture Coding Symposium (PCS), IEEE (to appear)
    Abstract: Given a standard image codec, we compress images that may have higher resolution and/or higher bit depth than allowed in the codec's specifications, by sandwiching the standard codec between a neural pre-processor (before the standard encoder) and a neural post-processor (after the standard decoder). Using a differentiable proxy for the standard codec, we design the neural pre- and post-processors to transport the high-resolution (super-resolution, SR) or high-bit-depth (high dynamic range, HDR) images as lower-resolution and lower-bit-depth images. The neural processors accomplish this with spatially coded modulation, which acts as a watermark to preserve the important image detail during compression. Experiments show that, compared to conventional methods of transmitting high resolution or high bit depth through lower-resolution or lower-bit-depth codecs, our sandwich architecture gains ~9 dB for SR images and ~3 dB for HDR images at the same rate over large test sets. We also observe significant gains in visual quality.
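The core constraint here is the interface: a 10-bit, possibly high-resolution source has to travel through a codec that only carries lower-bit-depth (and possibly lower-resolution) samples. The snippet below is a minimal sketch with assumed resolutions and bit depths that shows only that bottleneck using a naive rescale-and-round mapping; it corresponds to the conventional baseline that the learned spatially coded modulation improves upon, not to the paper's networks.

```python
import numpy as np

def naive_pre(hdr10):
    """Placeholder pre-processor: fold a 10-bit frame into 8-bit codes by
    rescaling. The paper's networks instead learn spatially coded modulation."""
    return np.clip(np.round(hdr10 / 4.0), 0, 255).astype(np.uint8)

def standard_8bit_codec(code):
    """Stand-in for the standard codec path: it can only carry 8-bit samples
    (and in reality also introduces compression artifacts)."""
    return code

def naive_post(code8):
    """Placeholder post-processor: expand 8-bit codes back to a 10-bit estimate."""
    return code8.astype(np.uint16) * 4

hdr = np.random.randint(0, 1024, size=(720, 1280), dtype=np.uint16)  # 10-bit frame
rec = naive_post(standard_8bit_codec(naive_pre(hdr)))
err = np.abs(rec.astype(int) - hdr.astype(int)).max()
print("worst-case error in 10-bit codes:", err)   # a few codes lost to rounding
```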
    Sandwiched Image Compression: Wrapping Neural Networks Around a Standard Codec
    Phil Chou
    Hugues Hoppe
    Danhang "Danny" Tang
    Philip Davidson
    2021 IEEE International Conference on Image Processing (ICIP), IEEE, Anchorage, Alaska, pp. 3757-3761
    Abstract: We sandwich a standard image codec between two neural networks: a preprocessor that outputs neural codes, and a postprocessor that reconstructs the image. The neural codes are compressed as ordinary images by the standard codec. Using differentiable proxies for both rate and distortion, we develop a rate-distortion optimization framework that trains the networks to generate neural codes that are efficiently compressible as images. This architecture not only improves rate-distortion performance for ordinary RGB images, but also enables efficient compression of alternative image types (such as normal maps of computer graphics) using standard image codecs. Results demonstrate the effectiveness and flexibility of neural processing in mapping a variety of input data modalities to the rigid structure of standard codecs. A surprising result is that the rate-distortion-optimized neural processing seamlessly learns to transport color images using a single-channel (grayscale) codec.
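One way to picture the single-channel result mentioned at the end of the abstract is as a pure shape constraint: whatever the pre-processor produces must fit into one grayscale plane, because that is all the codec carries in this configuration, and the post-processor must recover three color channels from it. The sketch below, with assumed toy architectures, only demonstrates that interface; it omits the codec proxy and the rate-distortion training that make the learned mapping work.

```python
import torch
import torch.nn as nn

# Assumed toy architectures; the point is only the channel bottleneck.
pre = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 1, 3, padding=1))   # RGB -> 1-channel neural code
post = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                     nn.Conv2d(32, 3, 3, padding=1))  # decoded code -> RGB

rgb = torch.rand(1, 3, 128, 128)
code = pre(rgb)                   # this single plane is what the grayscale codec sees
assert code.shape[1] == 1
reconstruction = post(code)       # color must be recovered from spatial patterns
print(reconstruction.shape)       # torch.Size([1, 3, 128, 128])
```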
    Deep Implicit Volume Compression
    Danhang "Danny" Tang
    Phil Chou
    Christian Haene
    Mingsong Dou
    Jonathan Taylor
    Shahram Izadi
    Sofien Bouaziz
    Cem Keskin
    CVPR (2020)
    Abstract: We describe a novel approach for compressing truncated signed distance fields (TSDF) stored in voxel grids, together with their corresponding textures. To compress the TSDF, our method relies on a block-based neural architecture trained end-to-end, achieving state-of-the-art compression rates. To prevent topological errors, we losslessly compress the signs of the TSDF, which as a side effect also bounds the maximum reconstruction error by the voxel size. To compress the affiliated texture, we design a fast block-based charting and Morton packing technique that generates a coherent image, which can be efficiently compressed using existing image-based compression algorithms. We demonstrate the performance of our algorithms on a large set of 4D performance sequences captured using multi-camera RGBD setups.
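Morton (Z-order) packing, mentioned for the texture charts, reduces to interleaving the bits of 2D block coordinates so that nearby blocks land near each other in the packed image. The function below is a generic illustration of that indexing, not the paper's charting pipeline; the 16-bit coordinate width and the 4x4 grid are arbitrary assumptions.

```python
def morton2(x: int, y: int) -> int:
    """Interleave the bits of (x, y) into a single Z-order (Morton) index.
    Nearby cells get nearby indices, which keeps packed charts spatially
    coherent for a 2D image codec."""
    code = 0
    for i in range(16):                      # 16 bits per coordinate (assumed)
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code

# Order the blocks of a 4x4 grid along the Z-curve.
blocks = [(bx, by) for by in range(4) for bx in range(4)]
blocks.sort(key=lambda b: morton2(*b))
print(blocks[:6])   # [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (3, 0)]
```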
    Depth from motion for smartphone AR
    Julien Valentin
    Neal Wadhwa
    Max Dzitsiuk
    Michael John Schoenberg
    Vivek Verma
    Ambrus Csaszar
    Ivan Dryanovski
    Joao Afonso
    Jose Pascoal
    Konstantine Nicholas John Tsotsos
    Mira Angela Leung
    Mirko Schmidt
    Sameh Khamis
    Vladimir Tankovich
    Shahram Izadi
    Christoph Rhemann
    ACM Transactions on Graphics (2018)
    Abstract: Augmented reality (AR) for smartphones has matured from a technology for early adopters, available only on select high-end phones, to one that is truly available to the general public. One of the key breakthroughs has been in low-compute methods for six-degree-of-freedom (6DoF) tracking on phones using only the existing hardware (camera and inertial sensors). 6DoF tracking is the cornerstone of smartphone AR, allowing virtual content to be precisely locked on top of the real world. However, to really give users the impression of believable AR, one also requires mobile depth. Without depth, even simple effects, such as a virtual object being correctly occluded by the real world, are impossible. However, requiring a mobile depth sensor would severely restrict access to such features. In this article, we provide a novel pipeline for mobile depth that supports a wide array of mobile phones and uses only the existing monocular color sensor. Through several technical contributions, we provide the ability to compute low-latency dense depth maps using only a single CPU core of a wide range of (medium-high) mobile phones. We demonstrate the capabilities of our approach on high-level AR applications including real-time navigation and shopping.
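The paper's pipeline involves many engineering contributions beyond this, but the geometric principle behind depth from motion is that a 6DoF-tracked phone turns consecutive monocular frames into a stereo pair: a pixel matched across two posed views can be triangulated. The sketch below shows only that principle with standard linear (DLT) triangulation; the intrinsics, the pose, and the matched point are made-up values, and nothing here reflects the paper's actual matching or filtering.

```python
import numpy as np

def triangulate(p1, p2, P1, P2):
    """Linear (DLT) triangulation of one point seen in two views.
    p1, p2: pixel coordinates (u, v); P1, P2: 3x4 projection matrices."""
    A = np.stack([
        p1[0] * P1[2] - P1[0],
        p1[1] * P1[2] - P1[1],
        p2[0] * P2[2] - P2[0],
        p2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]                       # homogeneous -> Euclidean

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])   # assumed intrinsics
R, t = np.eye(3), np.array([0.1, 0.0, 0.0])                    # pose from 6DoF tracking
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, -R @ t.reshape(3, 1)])

X_true = np.array([0.2, -0.1, 2.0])                            # a point 2 m away
p1 = P1 @ np.append(X_true, 1); p1 = p1[:2] / p1[2]            # its pixel in frame 1
p2 = P2 @ np.append(X_true, 1); p2 = p2[:2] / p2[2]            # its pixel in frame 2
print(triangulate(p1, p2, P1, P2))                             # ~[0.2, -0.1, 2.0]
```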
    Fast Lifting for 3D Hand Pose Estimation in AR/VR Applications
    Christine Kaeser-Chen
    IEEE International Conference on Image Processing, 2018 (2018)
    Abstract: We introduce a simple model for the human hand skeleton that is geared toward estimating 3D hand poses from 2D keypoints. The estimation problem arises in AR/VR scenarios where low-cost cameras are used to generate 2D views through which rich interactions with the world are desired. Starting with a noisy set of 2D hand keypoints (camera-plane coordinates of detected joints of the hand), the proposed algorithm generates 3D keypoints that are (i) compliant with human hand skeleton constraints and (ii) perspective-project down to the given 2D keypoints. Our work considers the 2D-to-3D lifting problem algebraically, identifies the parts of the hand that can be lifted accurately, points out the parts that may lead to ambiguities, and proposes remedies for ambiguous cases. Most importantly, we show that the fingertip localization errors are a good proxy for the errors at other finger joints. This observation leads to a look-up-table-based formulation that instantaneously determines finger poses without solving constrained trigonometric problems. The result is a fast algorithm that runs faster than real time on a single core. When hand bone lengths are unknown, our technique estimates them, allowing smooth AR/VR sessions in which the user's hand is automatically estimated at the beginning and the rest of the session continues seamlessly. Our work provides accurate 3D results that are competitive with the state of the art without requiring any 3D training data.
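The algebraic lifting the abstract refers to can be made concrete for a single bone: if a parent joint has already been placed in 3D, the child joint must lie on the ray back-projected through its 2D keypoint at a depth that respects the bone length, which gives a quadratic with at most two roots; the two roots are exactly the kind of ambiguity the paper discusses. The helper below is a minimal sketch under assumed normalized camera coordinates and made-up numbers, not the paper's look-up-table formulation.

```python
import numpy as np

def lift_child(parent_xyz, child_uv, bone_len):
    """Lift a child joint to 3D from its 2D keypoint (normalized camera
    coordinates u = x/z, v = y/z), given the parent's 3D position and the
    bone length. The child lies at z * (u, v, 1) with
    ||z * (u, v, 1) - parent|| = bone_len, a quadratic in z."""
    d = np.array([child_uv[0], child_uv[1], 1.0])
    a = d @ d
    b = -2.0 * (d @ parent_xyz)
    c = parent_xyz @ parent_xyz - bone_len ** 2
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return []                                   # keypoint noise: no exact solution
    roots = [(-b + s * np.sqrt(disc)) / (2.0 * a) for s in (1.0, -1.0)]
    return [z * d for z in roots if z > 0]          # up to two candidates (ambiguity)

parent = np.array([0.02, -0.01, 0.40])   # assumed: a knuckle about 40 cm from the camera
child_uv = (0.06, -0.02)                 # assumed detected 2D keypoint of the next joint
print(lift_child(parent, child_uv, bone_len=0.035))
```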