Berivan Isik

Berivan Isik is a research scientist at Google, focusing on developing efficient and trustworthy large models. She earned her PhD from Stanford University in 2024. See her personal page for more up-to-date information.
Authored Publications
    Sandwiched Compression: Repurposing Standard Codecs with Neural Network Wrappers
    Phil A. Chou
    Hugues Hoppe
    Danhang Tang
    Jonathan Taylor
    Philip Davidson
    arXiv:2402.05887 (2024)
    We propose sandwiching standard image and video codecs between pre- and post-processing neural networks. The networks are jointly trained through a differentiable codec proxy to minimize a given rate-distortion loss. This sandwich architecture not only improves the standard codec's performance on its intended content, it can also effectively adapt the codec to other types of image/video content and to other distortion measures. Essentially, the sandwich learns to transmit "neural code images" that optimize overall rate-distortion performance even when the overall problem is well outside the scope of the codec's design. Through a variety of examples, we apply the sandwich architecture to sources with different numbers of channels, higher resolution, higher dynamic range, and perceptual distortion measures. The results demonstrate substantial improvements (up to 9 dB gains) across these adaptations. We derive VQ equivalents for the sandwich, establish optimality properties, and design differentiable codec proxies approximating current standard codecs. We further analyze model complexity, visual quality under perceptual metrics, and sandwich configurations that offer interesting potential in image/video compression and streaming.
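To make the sandwich idea concrete, here is a minimal numpy sketch, not code from the paper: all names, sizes, and constants are invented. Scalar stand-ins for the pre- and post-processor are trained jointly through an additive-noise codec proxy (a common differentiable substitute for quantization) by finite-difference descent on a rate-distortion Lagrangian.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=256)                    # toy source signal
noise = rng.uniform(-0.05, 0.05, size=256)  # fixed quantization-noise draw

def codec_proxy(code):
    # Differentiable stand-in for the standard codec: quantization error
    # is modeled as additive uniform noise (a common proxy technique).
    return code + noise

def rd_loss(params, lam=0.01):
    w_pre, w_post = params
    code = w_pre * x                      # "neural code" from the pre-processor
    decoded = w_post * codec_proxy(code)  # post-processor reconstruction
    distortion = np.mean((x - decoded) ** 2)
    rate = np.mean(np.abs(code))          # crude rate surrogate
    return distortion + lam * rate

# Jointly train both wrappers *through* the proxy with finite-difference
# gradient descent on the rate-distortion Lagrangian.
params = np.array([0.5, 0.5])
for _ in range(2000):
    grad = np.array([
        (rd_loss(params + e) - rd_loss(params - e)) / 2e-4
        for e in (np.array([1e-4, 0.0]), np.array([0.0, 1e-4]))
    ])
    params = params - 0.1 * grad
```

In a real sandwich the scalar gains would be convolutional networks and the proxy would approximate a full standard codec, but the joint training loop has the same shape.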
    Sketching for Distributed Deep Learning: A Sharper Analysis
    Mayank Shrivastava
    Qiaobo Li
    Sanmi Koyejo
    Arindam Banerjee
    Conference on Neural Information Processing Systems (NeurIPS) (2024)
    The high communication cost between the server and the clients is a significant bottleneck in scaling distributed learning for overparametrized deep models. One popular approach to reducing this communication overhead is randomized sketching. However, existing theoretical analyses of sketching-based distributed learning (sketch-DL) either incur a prohibitive dependence on the ambient dimension or require additional restrictive assumptions such as heavy-hitters. Despite these pessimistic analyses, empirical evidence suggests that sketch-DL is competitive with its uncompressed counterpart, motivating a sharper analysis. In this work, we introduce a sharper, ambient-dimension-independent convergence analysis for sketch-DL using the second-order geometry specified by the loss Hessian. Our results imply ambient-dimension-independent communication complexity for sketch-DL. We present empirical results, both on the loss Hessian and on the overall accuracy of sketch-DL, supporting our theoretical results. Taken together, our results provide theoretical justification for the observed empirical success of sketch-DL.
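As an illustration of the sketching primitive involved (not of the paper's analysis), here is a count-sketch gradient compressor in numpy; the dimensions `d` and `k` and all names are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 1000, 200          # ambient dimension vs. sketch size (hypothetical)

# Count-sketch operator, shared between clients and server.
h = rng.integers(0, k, size=d)        # hash bucket for each coordinate
s = rng.choice([-1.0, 1.0], size=d)   # random sign for each coordinate

def sketch(g):
    # Client side: compress a d-dimensional gradient to k numbers.
    out = np.zeros(k)
    np.add.at(out, h, s * g)          # signed scatter-add into buckets
    return out

def desketch(sk):
    # Server side: unbiased (over h, s) estimate of the original gradient.
    return s * sk[h]

g = rng.normal(size=d)
g_hat = desketch(sketch(g))           # noisy but correlated with g
```

Each client transmits only `k` numbers instead of `d`; hash collisions add zero-mean noise to the recovered gradient, which is exactly the error the paper's Hessian-based analysis bounds independently of `d`.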
    We study $L_2$ mean estimation under central differential privacy and communication constraints, and address two key challenges: first, existing mean estimation schemes that simultaneously handle both constraints are usually optimized for $L_\infty$ geometry and rely on random rotation or Kashin's representation to adapt to $L_2$ geometry, resulting in suboptimal leading constants in their mean squared errors (MSEs); second, schemes achieving order-optimal communication-privacy trade-offs do not extend seamlessly to streaming differential privacy (DP) settings (e.g., tree aggregation or matrix factorization), rendering them incompatible with DP-FTRL-type optimizers. In this work, we tackle these issues by introducing a novel privacy accounting method for the sparsified Gaussian mechanism that incorporates the randomness inherent in sparsification into the DP noise. Unlike previous approaches, our accounting algorithm operates directly in $L_2$ geometry, yielding MSEs that converge quickly to those of the uncompressed Gaussian mechanism. Additionally, we extend the sparsification scheme to the matrix factorization framework under streaming DP and provide a precise accountant tailored to DP-FTRL-type optimizers. Empirically, our method demonstrates at least a 100x improvement in compression for DP-SGD across various FL tasks.
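The sparsified Gaussian mechanism at the core of the scheme can be illustrated with a toy numpy sketch. This shows only the mechanics (unbiased random sparsification plus Gaussian noise, averaged at the server), not the paper's privacy accounting, and every constant (`p`, `sigma`, sizes) is invented:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 2000, 20            # clients and dimension (hypothetical)
p, sigma = 0.25, 0.3       # keep-probability and noise scale (hypothetical)

def privatize(x):
    # Sparsified Gaussian mechanism (sketch): keep each coordinate with
    # probability p, rescale by 1/p for unbiasedness, add Gaussian noise.
    # Each client sends only the ~p*d surviving coordinates.
    mask = rng.random(d) < p
    return mask * (x / p + rng.normal(0.0, sigma, size=d))

true_mean = rng.uniform(-1, 1, size=d)
clients = true_mean + 0.1 * rng.normal(size=(n, d))   # local vectors
estimate = np.mean([privatize(x) for x in clients], axis=0)
```

The paper's contribution is to account for the sparsification randomness itself as part of the DP noise, directly in $L_2$ geometry; the toy above treats the two sources of randomness separately.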
    Sandwiched Video Compression: An Efficient Learned Video Compression Approach
    Danhang "Danny" Tang
    Jonathan Taylor
    Phil Chou
    IEEE International Conference on Image Processing (2023)
    We propose sandwiched video compression: a video compression framework that wraps neural networks around a standard video codec. The framework consists of a neural pre-processor, a neural post-processor, and a standard video codec between them, trained jointly to optimize a rate-distortion loss function. Training such a framework end-to-end requires a differentiable proxy for the standard video codec, which is significantly more challenging than designing image codec proxies due to temporal processing such as motion prediction, inter/intra mode decisions, and in-loop filtering. In this work, we propose a computationally efficient way of approximating a video codec and demonstrate that the neural codes generated by the neural pre-processor can be compressed to a better rate-distortion point than the original frames in the input video. More precisely, sandwiched HEVC YUV 4:4:4 in low-resolution mode and sandwiched HEVC YUV 4:0:0 show around 6.5 dB and 8 dB improvements over the standard HEVC in the same mode and format, respectively. Moreover, when optimized for and tested with a perceptual similarity metric, Learned Perceptual Image Patch Similarity (LPIPS), we observe 30% to 40% improvement over the standard HEVC YUV 4:4:4, depending on the rate.
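A toy illustration of why a video-codec proxy differs from an image-codec proxy: the proxy must model inter-frame prediction, so quantization noise enters only through residuals against the in-loop reconstruction, which then feeds the prediction of the next frame. The numpy sketch below is hypothetical (shapes, step size, and the additive-noise model are all invented), not the paper's proxy:

```python
import numpy as np

rng = np.random.default_rng(3)
frames = rng.normal(size=(8, 64))      # toy video: 8 frames of 64 samples

def video_codec_proxy(frames, step=0.1):
    # Hypothetical differentiable proxy for a video codec: each frame is
    # predicted from the previous *reconstruction* (standing in for motion
    # prediction), and only the residual incurs quantization noise.
    recon = []
    prev = np.zeros_like(frames[0])
    for f in frames:
        residual = (f - prev) + rng.uniform(-step / 2, step / 2, size=f.shape)
        prev = prev + residual         # in-loop reconstruction feeds next frame
        recon.append(prev)
    return np.stack(recon)

recon = video_codec_proxy(frames)
```

Because reconstructions are reused as predictors, gradients of a loss on later frames flow back through earlier frames, which is the temporal coupling that makes video proxies harder to design than image proxies.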
    We propose the first learned compression framework, LVAC, for volumetric functions represented by implicit networks, a.k.a. coordinate-based networks (CBNs). To evaluate LVAC and compare it with prior (traditional) methods, we focus on compressing point cloud attributes, since there are no compression baselines for CBN-based representations of other signals; LVAC serves as the first such baseline. More concretely, we consider the attributes of a point cloud as samples of a vector-valued volumetric function at discrete positions. To compress the attributes given the positions, we compress the parameters of the volumetric function. We represent the volumetric function by shifts of a CBN, or implicit neural network. Inputs to the network include both spatial coordinates and a latent vector per shift. To compress the latent vectors, we train the overall pipeline end-to-end, rate-distortion optimizing the latent vectors by back-propagation through a rate-distortion Lagrangian loss in an auto-decoder configuration. The result outperforms the current standard, RAHT, by 2--4 dB.
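A minimal sketch of the ingredients described above, with every shape and parameter invented and a tiny random MLP standing in for the actual network: attributes are produced by a coordinate-based network from point positions plus a shared latent vector, and the latent is scored by a rate-distortion Lagrangian as in an auto-decoder setup (in LVAC the latent would then be optimized by back-propagating through this loss).

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy coordinate-based network (CBN): attribute = MLP(position, latent).
w1 = rng.normal(scale=0.5, size=(3 + 4, 16))   # input: xyz + 4-dim latent
w2 = rng.normal(scale=0.5, size=(16, 3))       # output: e.g. an RGB attribute

def cbn(positions, latent):
    # Concatenate each point's coordinates with the (shared) latent vector.
    inp = np.concatenate(
        [positions, np.broadcast_to(latent, (len(positions), 4))], axis=1)
    return np.tanh(inp @ w1) @ w2

positions = rng.uniform(size=(100, 3))   # point cloud positions (given)
latent = rng.normal(size=4)              # latent vector to be compressed

def rd_lagrangian(latent, attrs, lam=0.01):
    # Auto-decoder objective: attribute distortion + lambda * rate, with
    # the latent's magnitude as a crude rate surrogate.
    distortion = np.mean((cbn(positions, latent) - attrs) ** 2)
    return distortion + lam * np.sum(np.abs(latent))
```

In the full framework there is one latent per shift of the network rather than a single shared vector, and the rate term comes from an entropy model rather than an L1 surrogate.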