Qifei Wang
Authored Publications
BlazeStyleGAN: A Real-Time On-Device StyleGAN
Fei Deng
Lu Wang
Chuo-Ling Chang
Tingbo Hou
(2023)
StyleGAN models have been widely adopted for generating and editing face images, yet few works have investigated running StyleGAN models on mobile devices. In this work, we introduce BlazeStyleGAN, which is, to the best of our knowledge, the first StyleGAN model that can run in real time on smartphones. We design an efficient synthesis network with an auxiliary head at each level of the generator to convert features to RGB, and keep only the last head at inference. We also improve the distillation strategy with a multi-scale perceptual loss computed on the auxiliary heads, and with an adversarial loss for the student generator and discriminator. With these optimizations, BlazeStyleGAN achieves real-time performance on high-end mobile GPUs. Experimental results demonstrate that BlazeStyleGAN generates high-quality face images and even mitigates some artifacts from the teacher model.
Multi-path Neural Networks for On-device Multi-domain Visual Classification
Andrew Howard
Gabriel M. Bender
Grace Chu
Jeff Gilbert
Joshua Greaves
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (2021), pp. 3019-3028
Learning multiple domains/tasks with a single model is important for improving data efficiency and lowering inference cost for numerous vision tasks, especially on resource-constrained mobile devices. However, hand-crafting a multi-domain/task model can be both tedious and challenging. This paper proposes a novel approach to automatically learn a multi-path network for multi-domain visual classification on mobile devices. The proposed multi-path network is learned from neural architecture search by applying one reinforcement learning controller for each domain to select the best path in the super-network created from a MobileNetV3-like search space. An adaptive balanced domain prioritization algorithm is proposed to balance optimizing the joint model on multiple domains simultaneously. The determined multi-path model selectively shares parameters across domains in shared nodes while keeping domain-specific parameters within non-shared nodes in individual domain paths. This approach effectively reduces the total number of parameters and FLOPS, encouraging positive knowledge transfer while mitigating negative interference across domains. Extensive evaluations on the Visual Decathlon dataset demonstrate that the proposed multi-path model achieves state-of-the-art performance in terms of accuracy, model size, and FLOPS against other approaches using MobileNetV3-like architectures. Furthermore, the proposed method improves average accuracy over learning single-domain models individually, and reduces the total number of parameters and FLOPS by 78% and 32% respectively, compared to the approach that simply bundles single-domain models for multi-domain learning.
We study the effect of normalization on single domain generalization, the goal of which is to learn a model that performs well on many unseen domains using training data from only a single domain. We propose a new type of normalization, LSLR, whose adaptive form generalizes other normalizations. The key idea is to learn both the standardization and rescaling statistics for normalization with neural networks. This new normalization is more adaptive and helps the model generalize better in single domain generalization when trained with a robust objective. Combined with adversarial domain augmentation methods, we can optimize the robust objective approximately. We show that our method consistently outperforms the baselines and achieves state-of-the-art results on three standard benchmarks for single domain generalization.
MUSIQ: Multi-scale Image Quality Transformer
Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021)
Image quality assessment (IQA) is an important research topic for understanding and improving visual experience. The current state-of-the-art IQA methods are based on convolutional neural networks (CNNs). The performance of CNN-based models is often compromised by the fixed shape constraint in batch training. To satisfy this constraint, the input images are usually resized and cropped to a fixed shape, which degrades image quality. To address this, we design a multi-scale image quality Transformer (MUSIQ) to process native-resolution images with varying sizes and aspect ratios. With a multi-scale image representation, our proposed method can capture image quality at different granularities. Furthermore, a novel hash-based 2D spatial embedding and a scale embedding are proposed to support the positional embedding in the multi-scale representation. Experimental results verify that our method can achieve state-of-the-art performance on multiple large-scale IQA datasets such as PaQ-2-PiQ, SPAQ, and KonIQ-10k.
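As a rough illustration of the hash-based 2D spatial embedding idea, the sketch below maps a patch grid of arbitrary size onto a fixed G x G table of embedding vectors, so that images with different resolutions and aspect ratios index the same table. The function name `hashed_spatial_index`, the proportional integer-scaling rule, and the random table are assumptions made for illustration; they are not taken from the abstract and should not be read as the authors' implementation.

```python
import numpy as np

def hashed_spatial_index(num_rows, num_cols, grid_size):
    """Map each patch (i, j) of a num_rows x num_cols grid to a slot in a
    fixed grid_size x grid_size table (assumed proportional scaling rule)."""
    rows = (np.arange(num_rows) * grid_size) // num_rows
    cols = (np.arange(num_cols) * grid_size) // num_cols
    # Flatten the 2D slot (row, col) into a single table index.
    return rows[:, None] * grid_size + cols[None, :]

# Hypothetical embedding table: one vector per slot (learned in a real model).
grid_size, dim = 10, 4
table = np.random.default_rng(0).normal(size=(grid_size * grid_size, dim))

# Two inputs with different patch-grid shapes share the same table.
idx_small = hashed_spatial_index(3, 5, grid_size)
idx_large = hashed_spatial_index(20, 14, grid_size)
emb_small = table[idx_small]   # shape (3, 5, 4)
emb_large = table[idx_large]   # shape (20, 14, 4)
```

Under this assumed rule, a coarse grid spreads its patches across the table while a fine grid maps several neighbouring patches to the same slot, which is what lets a fixed-size table serve inputs of any shape.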
Learnable Cost Volume Using the Cayley Representation
Taihong Xiao
Jinwei Yuan
Xin-Yu Zhang
Kehan Xu
The European Conference on Computer Vision (ECCV) (2020)
The cost volume is an essential component of recent deep models for optical flow estimation and is usually constructed by calculating the inner product between two feature vectors. However, the standard inner product in the commonly used cost volume may limit the representation capacity of flow models because it neglects the correlation among different channel dimensions and weighs each dimension equally. To address this issue, we propose a learnable cost volume (LCV) using an elliptical inner product, which generalizes the standard inner product by a positive definite kernel matrix. To guarantee its positive definiteness, we perform spectral decomposition on the kernel matrix and re-parameterize it via the Cayley representation. The proposed LCV is a lightweight module and can be easily plugged into existing models to replace the vanilla cost volume. Experimental results show that the LCV module not only improves the accuracy of state-of-the-art models on standard benchmarks, but also improves their robustness against illumination changes, noise, and adversarial perturbations of the input signals.
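A minimal sketch of the construction described above: an orthogonal matrix is obtained from unconstrained parameters through the Cayley transform, combined with positive eigenvalues to form a positive definite kernel, and used in an elliptical inner product between two sets of feature vectors. The function names, the use of `exp` to keep eigenvalues positive, and the flat `(N, C)` feature layout are assumptions for illustration, not details confirmed by the abstract.

```python
import numpy as np

def cayley_orthogonal(a):
    """Build an orthogonal matrix from an arbitrary square matrix `a`
    using the Cayley transform of its skew-symmetric part."""
    s = a - a.T                          # skew-symmetric: s.T == -s
    i = np.eye(a.shape[0])
    # (I + S) is invertible for any real skew-symmetric S.
    return np.linalg.solve(i + s, i - s)  # (I + S)^{-1} (I - S)

def kernel_matrix(a, raw_eig):
    """Positive definite kernel W = P^T diag(lam) P with lam > 0."""
    p = cayley_orthogonal(a)
    lam = np.exp(raw_eig)                # assumed positivity mapping
    return p.T @ np.diag(lam) @ p

def elliptical_cost(f1, f2, a, raw_eig):
    """Pairwise elliptical inner products u^T W v.
    f1: (N, C) features, f2: (M, C) features -> (N, M) costs."""
    return f1 @ kernel_matrix(a, raw_eig) @ f2.T

rng = np.random.default_rng(0)
channels = 6
f1 = rng.normal(size=(5, channels))
f2 = rng.normal(size=(7, channels))

# Zero parameters give W = I, i.e. the standard inner-product cost.
vanilla = elliptical_cost(f1, f2, np.zeros((channels, channels)), np.zeros(channels))

# Arbitrary (here random) parameters still give a valid positive definite kernel.
a = rng.normal(size=(channels, channels))
raw_eig = rng.normal(size=channels)
learned = elliptical_cost(f1, f2, a, raw_eig)
```

Because every choice of `a` and `raw_eig` yields a symmetric positive definite kernel, the parameters can be optimized without constraints, and initializing them at zero starts from the vanilla cost volume.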