Jyrki Alakuijala
Dr. Jyrki Alakuijala is an active member of the open source software community and a data compression researcher. He works at Google as a Technical Lead/Manager, and his recently published work covers the JPEG XL, Jpegli, Zopfli, Butteraugli, Guetzli, Pik, Brunsli, Gipfeli, WebP lossless, and Brotli compression formats and algorithms, as well as two hashing algorithms, CityHash and HighwayHash. Before joining Google he developed software for neurosurgery and radiation therapy treatment planning.
Authored Publications
CDPU: Co-designing Compression and Decompression Processing Units for Hyperscale Systems
Ani Udipi
JunSun Choi
Joonho Whangbo
Jerry Zhao
Edwin Lim
Vrishab Madduri
Yakun Sophia Shao
Borivoje Nikolic
Krste Asanovic
Proceedings of the 50th Annual International Symposium on Computer Architecture, Association for Computing Machinery, New York, NY, USA (2023)
Abstract
General-purpose lossless data compression and decompression ("(de)compression") are used widely in hyperscale systems and are key "datacenter taxes". However, designing optimal hardware compression and decompression processing units ("CDPUs") is challenging due to the variety of algorithms deployed, input data characteristics, and evolving costs of CPU cycles, network bandwidth, and memory/storage capacities.
To navigate this vast design space, we present the first large-scale data-driven analysis of (de)compression usage at a major cloud provider by profiling Google's datacenter fleet. We find that (de)compression consumes 2.9% of fleet CPU cycles and 10-50% of cycles in key services. Demand is also artificially limited; 95% of bytes compressed in the fleet use less capable algorithms to reduce compute, motivating a CDPU that changes cost vs. size tradeoffs.
Prior work has improved the microarchitectural state of the art for CDPUs supporting various algorithms in fixed contexts. However, we find that higher-level design parameters such as CDPU placement, hash table sizing, and history window sizes have an equally significant impact on the viability of CDPU integration, yet are not well studied. Thus, we present the first end-to-end design and evaluation framework for CDPUs, including: (1) an open-source RTL-based CDPU generator that supports many run-time and compile-time parameters; (2) integration into an open-source RISC-V SoC for rapid performance and silicon-area evaluation across CDPU placements and parameters; and (3) an open-source (de)compression benchmark, HyperCompressBench, that is representative of (de)compression usage in Google's fleet.
Using our framework, we perform an extensive design space exploration running HyperCompressBench. Our exploration spans a 46× range in CDPU speedup, 3× range in silicon area (for a single pipeline), and evaluates a variety of CDPU integration techniques to optimize CDPU designs for hyperscale contexts. Our final hyperscale-optimized CDPU instances are up to 10× to 16× faster than a single Xeon core, while consuming a small fraction (as little as 2.4% to 4.7%) of the area.
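For a feel of what these numbers imply at the service level, the short calculation below applies Amdahl's law to the figures quoted in the abstract (2.9% of fleet cycles, 10-50% of cycles in key services, and a roughly 16x CDPU speedup); the specific service fractions iterated over are illustrative rather than measured values.

```python
# Rough Amdahl's-law estimate of service-level speedup from offloading
# (de)compression to a CDPU. The 16x CDPU speedup and the 10-50% range of
# service cycles spent on (de)compression come from the abstract above; the
# specific fractions below are only illustrative.

def service_speedup(compression_fraction: float, cdpu_speedup: float) -> float:
    """Overall speedup when only the (de)compression fraction is accelerated."""
    return 1.0 / ((1.0 - compression_fraction) + compression_fraction / cdpu_speedup)

for frac in (0.10, 0.30, 0.50):   # share of a service's cycles spent on (de)compression
    s = service_speedup(frac, cdpu_speedup=16.0)
    print(f"{frac:.0%} of cycles in (de)compression -> {s:.2f}x service speedup")
```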
Abstract
Spiking neural networks with temporal coding schemes process information based on the relative timing of neuronal spikes. In supervised learning tasks, temporal coding allows learning through backpropagation with exact derivatives, and achieves accuracies on par with conventional artificial neural networks. Here we introduce spiking autoencoders with temporal coding and pulses, trained using backpropagation to store and reconstruct images with high fidelity from compact representations. We explore the effect of different spike time target latencies, data noise levels and embedding sizes, as well as classification performance from the embeddings. The spiking autoencoder performs similarly to conventional autoencoders and exceeds their reconstruction performance on inverted-brightness images. We find that inhibition is essential in the functioning of the spiking autoencoders, particularly when the input needs to be memorised for a longer time before the expected output spike times. To reconstruct images with a high target latency, the network learns to accumulate negative evidence and to use the pulses as excitatory triggers for producing the output spikes at the required times. Our results highlight the potential of spiking autoencoders as building blocks for more complex biologically-inspired architectures.
Benchmarking JPEG XL lossy/lossless image compression
Evgenii Kliuchnikov
Evgeniy Upenik
Jon Sneyers
Luca Versari
Touradj Ebrahimi
Optics, Photonics and Digital Technologies for Imaging Applications VI, SPIE (2020)
Abstract
JPEG XL is a practical, royalty-free codec for scalable web distribution and efficient compression of high-quality photographs. It also includes previews, progression, animation, transparency, wide gamut, and high bit depth.
Experiments performed during standardization have shown the feasibility of economical storage without perceptible quality loss, lossless recompression of existing JPEGs, and fast software encoders and decoders. We disclose the results of subjective and objective evaluations.
Users expect faithful reproductions of ever-larger images. JPEG XL is faster to share and more economical to store: 60% savings vs. JPEG at equivalent visual quality. We quantify this impact using a subjective evaluation versus existing codecs including HEVC-HM-YUV444 and JPEG.
New image codecs have to co-exist with the previous generation for several years. JPEG XL is unique in providing value both for existing JPEGs and for new users. It includes coding tools to reduce the transmission and storage costs of JPEG by 22% while allowing byte-for-byte exact reconstruction of the original JPEG. Avoiding transcoding and additional artifacts helps to preserve our digital heritage.
Applications require fast and low-power decoding. JPEG XL was designed to benefit from multicore and SIMD, and actually decodes faster than JPEG. We report the resulting speeds on ARM and x86 CPUs. To enable reproduction of these results, we open sourced the JPEG XL software in 2019.
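A minimal sketch of the byte-for-byte JPEG reconstruction claim, assuming the cjxl and djxl command-line tools from the open-sourced libjxl implementation are installed and that cjxl's default behaviour for JPEG input is lossless recompression (file names are placeholders, and defaults may vary between versions):

```python
# Sketch of lossless JPEG recompression and byte-exact reconstruction, assuming
# the cjxl/djxl tools from the libjxl reference implementation are on PATH and
# that cjxl defaults to lossless JPEG transcoding for JPEG input.
import subprocess
from pathlib import Path

original = Path("photo.jpg")                 # any existing JPEG file (placeholder)
jxl = Path("photo.jxl")
reconstructed = Path("photo_reconstructed.jpg")

subprocess.run(["cjxl", str(original), str(jxl)], check=True)
subprocess.run(["djxl", str(jxl), str(reconstructed)], check=True)

saving = 1.0 - jxl.stat().st_size / original.stat().st_size
print(f"JXL recompression saved {saving:.1%} of the JPEG's size")
print("byte-for-byte identical:", original.read_bytes() == reconstructed.read_bytes())
```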
JPEG XL next-generation image compression architecture and coding tools
Ruud van Asseldonk
Moritz Firsching
Thomas Fischbacher
Sebastian Gomez
Evgenii Kliuchnikov
Robert Obryk
Krzysztof Potempa
Alexander Rhatushnyak
Jon Sneyers
Zoltan Szabadka
Luca Versari
SPIE Applications of Digital Image Processing, SPIE (2019)
Abstract
An update on the JPEG XL standardization effort: JPEG XL is a practical approach focused on scalable web distribution and efficient compression of high-quality images. It will provide various benefits compared to existing image formats: significantly smaller size at equivalent subjective quality; fast, parallelizable decoding and encoding configurations; features such as progressive, lossless, animation, and reversible transcoding of existing JPEG; support for high-quality applications including wide gamut, higher resolution/bit depth/dynamic range, and visually lossless coding. Additionally, a royalty-free baseline is an important goal. The JPEG XL architecture is traditional block-transform coding with upgrades to each component. We describe these components and analyze decoded image quality.
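As a point of reference for "traditional block-transform coding", the sketch below runs a plain orthonormal 8x8 DCT with a toy uniform quantizer; JPEG XL's actual tools (variable-size DCTs, adaptive quantization, and the rest of the upgraded components) are not reproduced here.

```python
# Minimal illustration of block-transform coding: an orthonormal 8x8 DCT-II on a
# single block, with a toy uniform quantizer. This is only the classical baseline
# that JPEG XL upgrades; it is not JPEG XL's actual transform pipeline.
import numpy as np

N = 8
k = np.arange(N)
# Orthonormal DCT-II basis matrix: rows are frequencies u, columns are samples x.
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * N))
C[0, :] /= np.sqrt(2.0)

def dct2(block):    # forward 2-D DCT of an 8x8 block
    return C @ block @ C.T

def idct2(coeffs):  # inverse 2-D DCT
    return C.T @ coeffs @ C

block = np.random.default_rng(0).uniform(0, 255, (N, N))
coeffs = dct2(block - 128.0)                  # level shift as in classic JPEG
quantized = np.round(coeffs / 16.0) * 16.0    # toy uniform quantizer
reconstructed = idct2(quantized) + 128.0
print("max reconstruction error:", np.abs(block - reconstructed).max())
```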
Temporal coding in spiking neural networks with alpha synaptic function
Krzysztof Potempa
Luca Versari
Thomas Fischbacher
arXiv:1907.13223 (2019)
Abstract
The timing of individual neuronal spikes is essential for biological brains to make fast responses to sensory stimuli. However, conventional artificial neural networks lack the intrinsic dimension of temporal coding present in biological networks. We propose a spiking neural network model that encodes information in the relative timing of individual neuron spikes. An image can be encoded in this manner by an input layer where each neuron spikes at a time proportional to the brightness of an individual pixel. In classification tasks, the output of the network is indicated by the first neuron to spike in the output layer. By encoding information in time in this manner, we are able to train the network to perform supervised learning with backpropagation, using exact derivatives of the postsynaptic spike times with respect to presynaptic spike times. The network operates using a biologically-plausible alpha synaptic transfer function. Additionally, we use trainable synchronisation pulses that provide bias, add more flexibility during the training process and allow the exploitation of the decay part of the alpha function. We show that such spiking networks can be trained successfully on noisy temporal Boolean logic problems. Moreover, they perform better than comparable spiking models on the MNIST benchmark when encoded in time. During training, we find that the network spontaneously discovers two operating regimes: a slow regime, where a decision is taken after all hidden neurons have spiked and the accuracy is very high, and a fast regime, where a decision is taken very fast but the accuracy is lower. These results demonstrate the computational power of spiking networks with biological characteristics that encode information in the timing of individual neurons. By studying temporal coding in spiking networks, we aim to create building blocks towards energy-efficient, state-based and more complex biologically-inspired neural architectures.
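The toy sketch below illustrates the basic temporal-coding mechanism: inputs are encoded as spike times, an alpha-shaped synaptic kernel accumulates their weighted contributions, and the output is the first threshold crossing. It uses one common alpha-kernel parameterization and a brute-force time grid rather than the paper's exact model, exact-derivative training, or synchronisation pulses.

```python
# Toy temporal coding with an alpha synaptic kernel: brighter pixels spike
# earlier, and the neuron's "output value" is its first threshold crossing.
# One common alpha parameterization and a brute-force time grid are used here;
# this is not the paper's exact formulation or its training procedure.
import numpy as np

def alpha_kernel(t, tau=1.0):
    """Alpha-shaped response, zero for t < 0, peaking at value 1 when t = tau."""
    return np.where(t > 0, (t / tau) * np.exp(1.0 - t / tau), 0.0)

def first_spike_time(input_times, weights, threshold=1.0, tau=1.0):
    """First time the summed, weighted synaptic responses cross the threshold."""
    t_grid = np.linspace(0.0, 10.0 * tau, 5000)
    potential = sum(w * alpha_kernel(t_grid - t_i, tau)
                    for w, t_i in zip(weights, input_times))
    crossings = np.nonzero(potential >= threshold)[0]
    return t_grid[crossings[0]] if crossings.size else None

# Encode intensities in [0, 1] as spike times in [0, 2]: brighter means earlier.
intensities = np.array([0.9, 0.2, 0.6])
spike_times = 2.0 * (1.0 - intensities)
print("output spike at t =", first_spike_time(spike_times, weights=[0.7, 0.5, 0.6]))
```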
Brotli: A General-Purpose Data Compressor
Andrea Farruggia
Paolo Ferragina
Evgenii Kliuchnikov
Robert Obryk
Zoltan Szabadka
ACM Transactions on Information Systems (2019)
Abstract
Brotli is an open source general-purpose data compressor introduced by Google in late 2013 and now adopted in most known browsers and Web servers. It is publicly available on GitHub and its data format was submitted as RFC 7932 in July 2016. Brotli is based on the Lempel-Ziv compression scheme and planned as a generic replacement of Gzip and ZLib. The main goal in its design was to compress data on the Internet, which meant optimizing the resources used at decoding time, while achieving maximal compression density.
This article is intended to provide the first thorough, systematic description of the Brotli format as well as a detailed computational and experimental analysis of the main algorithmic blocks underlying the current encoder implementation, together with a comparison against compressors of different families constituting the state-of-the-art either in practice or in theory. This treatment will allow us to raise a set of new algorithmic and software engineering problems that deserve further attention from the scientific community.
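A quick way to reproduce the Gzip/ZLib comparison in miniature, assuming the Python bindings shipped with the open-source Brotli repository (pip install brotli); the corpus below is a stand-in, so ratios on real web content will differ.

```python
# Small comparison of Brotli against zlib on a repetitive text corpus, assuming
# the Python bindings from the open-source Brotli repository are installed.
import zlib
import brotli

data = (b"Brotli is based on the Lempel-Ziv compression scheme and planned as "
        b"a generic replacement of Gzip and ZLib. ") * 200

brotli_bytes = brotli.compress(data, quality=11)   # maximum compression density
zlib_bytes = zlib.compress(data, 9)

print("original:", len(data))
print("zlib -9 :", len(zlib_bytes))
print("brotli  :", len(brotli_bytes))
assert brotli.decompress(brotli_bytes) == data     # round-trip check
```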
Abstract
Algorithms that rely on a pseudorandom number generator often lose their performance guarantees when adversaries can predict the behavior of the generator. To protect non-cryptographic applications against such attacks, we propose 'strong' pseudorandom generators characterized by two properties: computationally indistinguishable from random and backtracking-resistant. Some existing cryptographically secure generators also meet these criteria, but they are too slow to be accepted for general-purpose use. We introduce a new open-sourced generator called 'Randen' and show that it is 'strong' in addition to outperforming Mersenne Twister, PCG, ChaCha8, ISAAC and Philox in real-world benchmarks. This is made possible by hardware acceleration. Randen is an instantiation of Reverie, a recently published robust sponge-like random generator, with a new permutation built from an improved generalized Feistel structure with 16 branches. We provide new bounds on active s-boxes for up to 24 rounds of this construction, made possible by a memory-efficient search algorithm. Replacing existing generators with Randen can protect randomized algorithms such as reservoir sampling from attack. The permutation may also be useful for wide-block ciphers and hashing functions.
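The sketch below only illustrates the generic shape of a 16-branch generalized Feistel network; the branch width, the hash-based round function, and the cyclic rotation are all simplifications, and none of Randen's actual AES-based rounds, improved branch permutation, or active-s-box bounds are reproduced.

```python
# Toy Type-2 generalized Feistel network with 16 branches, illustrating only the
# generic structure mentioned above. Randen's real permutation builds its round
# function from AES rounds and uses an improved branch permutation; the hash-based
# F-function and cyclic rotation here are simplifications.
import hashlib

BRANCHES = 16        # 16 branches, as in the construction described above
BRANCH_BYTES = 16    # toy branch width (hypothetical)

def round_function(branch: bytes, round_key: bytes) -> bytes:
    """Toy F-function: a truncated hash of branch || key (not AES-based)."""
    return hashlib.blake2b(branch + round_key, digest_size=BRANCH_BYTES).digest()

def feistel_round(branches: list[bytes], round_key: bytes) -> list[bytes]:
    out = list(branches)
    for i in range(0, BRANCHES, 2):        # each even branch keys its neighbour
        f = round_function(out[i], round_key)
        out[i + 1] = bytes(a ^ b for a, b in zip(out[i + 1], f))
    return out[1:] + out[:1]               # simple cyclic branch rotation

state = [bytes([i]) * BRANCH_BYTES for i in range(BRANCHES)]
for r in range(24):                        # the abstract analyses up to 24 rounds
    state = feistel_round(state, round_key=r.to_bytes(4, "little"))
print("first branch after 24 rounds:", state[0].hex())
```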
Guetzli: Perceptually Guided JPEG Encoder
Robert Obryk
Ostap Stoliarchuk
Zoltan Szabadka
arXiv (2017)
Abstract
Guetzli is a new JPEG encoder that aims to produce visually indistinguishable images at a lower bit-rate than other common JPEG encoders. It optimizes both the JPEG global quantization tables and the DCT coefficient values in each JPEG block using a closed-loop optimizer. Guetzli uses Butteraugli, our perceptual distance metric, as the source of feedback in its optimization process. We reach a 29-45% reduction in data size for a given perceptual distance, according to Butteraugli, in comparison to other compressors we tried. Guetzli's computation is currently extremely slow, which limits its applicability to compressing static content and serving as a proof-of-concept that we can achieve significant reductions in size by combining advanced psychovisual models with lossy compression techniques.
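The closed-loop idea can be sketched as follows, using Pillow's ordinary JPEG encoder and a placeholder distance function; the real Guetzli optimizes the global quantization tables and per-block DCT coefficients directly and takes its feedback from Butteraugli rather than the crude stand-in used here.

```python
# Conceptual sketch of a perceptually guided closed loop: find the smallest JPEG
# whose distance from the original stays under a target. Uses Pillow's standard
# JPEG encoder and a placeholder metric; this is not Guetzli's implementation.
import io
from PIL import Image

def perceptual_distance(a: Image.Image, b: Image.Image) -> float:
    """Placeholder for Butteraugli; here just mean absolute grayscale error."""
    pa, pb = a.convert("L").tobytes(), b.convert("L").tobytes()
    return sum(abs(x - y) for x, y in zip(pa, pb)) / len(pa)

def smallest_acceptable_jpeg(original: Image.Image, max_distance: float) -> bytes:
    best = None
    for quality in range(95, 20, -5):              # coarse closed-loop search
        buf = io.BytesIO()
        original.save(buf, format="JPEG", quality=quality)
        candidate = Image.open(io.BytesIO(buf.getvalue()))
        if perceptual_distance(original, candidate) > max_distance:
            break                                  # quality dropped too far; stop
        best = buf.getvalue()                      # smallest file so far that passes
    return best

image = Image.open("input.png").convert("RGB")     # placeholder input image
jpeg_bytes = smallest_acceptable_jpeg(image, max_distance=2.0)
if jpeg_bytes is not None:
    print("chosen JPEG size:", len(jpeg_bytes), "bytes")
```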
Abstract
HighwayHash is a new pseudo-random function based on AVX2 multiply and permute instructions for thorough and fast hashing. It is 5.2 times as fast as SipHash for 1 KiB inputs. An open-source implementation is available under a permissive license. We discuss design choices and provide statistical analysis, speed measurements and preliminary cryptanalysis. Assuming it withstands further analysis, strengthened variants may also substantially accelerate file checksums and stream ciphers.