Yeqing Li
Authored Publications
High Resolution Medical Image Analysis with Spatial Partitioning
Le Hou
Niki J. Parmar
Noam Shazeer
Xiaodan Song
Youlong Cheng
High Resolution Medical Image Analysis with Spatial Partitioning (2019)
Abstract
Medical images, such as 3D computerized tomography (CT) scans, have a typical resolution of 512×512×512 voxels, three orders of magnitude more pixel data than ImageNet images. It is impossible to train CNN models directly on such high-resolution images, because the feature maps of a single image do not fit in the memory of a single GPU/TPU. Existing image analysis approaches alleviate this problem by dividing input images (e.g. taking 2D slices of 3D scans) or down-sampling them, which leads to complicated implementations and sub-optimal performance due to information loss. In this paper, we implement spatial partitioning, which internally distributes the input and output of convolution operations across GPUs/TPUs. Our implementation is based on the Mesh-TensorFlow framework and is transparent to end users. To the best of our knowledge, this is the first work to train networks end-to-end on 512×512×512 resolution CT scans without significant computational overhead.
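The paper's implementation relies on Mesh-TensorFlow to shard convolutions across accelerators. As a rough, framework-free illustration of the idea (not the authors' code), the NumPy sketch below splits a volume along one spatial axis, gives each shard a halo of slices from its neighbour, convolves the shards independently, and concatenates the results, which matches convolving the full volume. All function names and sizes here are illustrative.

import numpy as np

def conv3d_valid(x, k):
    # Naive "valid" 3D convolution (cross-correlation), fine for small kernels.
    kd, kh, kw = k.shape
    od, oh, ow = x.shape[0] - kd + 1, x.shape[1] - kh + 1, x.shape[2] - kw + 1
    out = np.zeros((od, oh, ow))
    for i in range(kd):
        for j in range(kh):
            for l in range(kw):
                out += k[i, j, l] * x[i:i + od, j:j + oh, l:l + ow]
    return out

def spatially_partitioned_conv3d(x, k, num_shards=4):
    # Split the depth axis across "devices"; each shard needs a halo of
    # (kernel_depth - 1) extra slices from its neighbour so that its local
    # valid convolution reproduces the corresponding slab of the full output.
    halo = k.shape[0] - 1
    bounds = np.linspace(0, x.shape[0], num_shards + 1, dtype=int)
    outputs = []
    for s in range(num_shards):
        lo, hi = bounds[s], bounds[s + 1]
        shard = x[lo:min(hi + halo, x.shape[0])]
        outputs.append(conv3d_valid(shard, k))
    return np.concatenate(outputs, axis=0)

rng = np.random.default_rng(0)
volume = rng.standard_normal((32, 16, 16))   # toy volume; a real CT scan is 512^3
kernel = rng.standard_normal((3, 3, 3))
assert np.allclose(conv3d_valid(volume, kernel),
                   spatially_partitioned_conv3d(volume, kernel))

In the actual system the same halo exchange happens between GPUs/TPUs via Mesh-TensorFlow, rather than between array slices inside a single process as in this toy version.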
AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions
Carl Martin Vondrick
Jitendra Malik
CVPR (2018)
Abstract
This paper introduces a video dataset of spatio-temporally localized Atomic Visual Actions (AVA). The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently. The key characteristics of our dataset are: (1) the definition of atomic visual actions, rather than composite actions; (2) precise spatio-temporal annotations with possibly multiple annotations for each person; (3) exhaustive annotation of these atomic actions over 15-minute video clips; (4) people temporally linked across consecutive segments; and (5) using movies to gather a varied set of action representations. This departs from existing datasets for spatio-temporal action recognition, which typically provide sparse annotations for composite actions in short video clips. We will release the dataset publicly.
AVA, with its realistic scene and action complexity, exposes the intrinsic difficulty of action recognition. To benchmark this, we present a novel approach for action localization that builds upon the current state-of-the-art methods and demonstrates better performance on JHMDB and UCF101-24 categories. While this approach sets a new state of the art on existing datasets, its overall results on AVA are low, at 15.6% mAP, underscoring the need for new approaches to video understanding.
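Since each annotation ties an action to a person box at a given timestamp, and one person may carry several labels at once, a natural way to consume the data is to group all action ids per (video, timestamp, person). The sketch below assumes a simple CSV row layout of video id, timestamp, normalized box coordinates, action id and person id; the exact schema of the released files should be taken from the dataset documentation, and every value below is made up for illustration.

import csv, io
from collections import defaultdict

# Hypothetical rows: video_id, timestamp_sec, x1, y1, x2, y2, action_id, person_id.
SAMPLE = """\
clip_0001,902,0.077,0.151,0.283,0.811,12,0
clip_0001,902,0.077,0.151,0.283,0.811,17,0
clip_0001,903,0.332,0.194,0.581,0.923,12,1
"""

def labels_per_person(csv_text):
    # Collect every action id attached to a (video, timestamp, person) key,
    # so one person box ends up with a *set* of atomic-action labels.
    labels = defaultdict(set)
    for video_id, ts, x1, y1, x2, y2, action_id, person_id in csv.reader(io.StringIO(csv_text)):
        labels[(video_id, int(ts), int(person_id))].add(int(action_id))
    return dict(labels)

print(labels_per_person(SAMPLE))
# {('clip_0001', 902, 0): {12, 17}, ('clip_0001', 903, 1): {12}}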
Guided Attention for Large Scale Scene Text Verification
Dafang He
Alex Gorban
Derrall Heath
Julian Ibarz
Qian Yu
Daniel Kifer
C. Lee Giles
arXiv (2018)
Attention-based Extraction of Structured Information from Street View Imagery
Zbigniew Wojna
Alex Gorban
Dar-Shyang Lee
Qian Yu
Julian Ibarz
ICDAR (2017), pp. 8
Abstract
We present a neural network model, based on CNNs, RNNs and attention mechanisms, which achieves 84.04% accuracy on the challenging French Street Name Signs (FSNS) dataset, significantly outperforming the previous state of the art (Smith’16), which achieved 72.46%. Furthermore, our new method is much simpler and more general than the previous approach. To demonstrate the generality of our model, we also apply it to two datasets, derived from Google Street View, in which the goal is to extract business names from store fronts, and extract structured date/time information from parking signs. Finally, we study the speed/accuracy tradeoff that results from cutting pretrained inception CNNs at different depths and using them as feature extractors for the attention mechanism. The resulting model is not only accurate but efficient, allowing it to be used at scale on a variety of challenging real-world text extraction problems.
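The core loop of such a model is a decoder that repeatedly attends over the CNN feature grid to pick out the next character. The NumPy sketch below shows one generic additive-attention read over a flattened feature map; it is a simplified illustration with assumed toy shapes and randomly initialized weights, not the paper's exact attention formulation.

import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attention_step(features, state, Wf, Ws, v):
    # features: (N, C) flattened CNN feature grid; state: (S,) decoder state.
    # Additive attention: score each grid cell, softmax into weights, and
    # return the weighted feature ("context") used to predict the next character.
    scores = np.tanh(features @ Wf + state @ Ws) @ v   # (N,)
    alpha = softmax(scores)                            # where to look
    context = alpha @ features                         # (C,) attended feature
    return alpha, context

rng = np.random.default_rng(0)
N, C, S, A = 8 * 8, 64, 128, 96      # assumed sizes: 8x8 grid, 64 channels
features = rng.standard_normal((N, C))
state = rng.standard_normal(S)
Wf, Ws = rng.standard_normal((C, A)) * 0.1, rng.standard_normal((S, A)) * 0.1
v = rng.standard_normal(A) * 0.1
alpha, context = attention_step(features, state, Wf, Ws, v)
print(alpha.shape, context.shape, round(alpha.sum(), 6))   # (64,) (64,) 1.0

Cutting a pretrained Inception CNN at a different depth, as studied in the abstract, mainly changes the shape of features (a coarser or finer grid with a different channel count); the attention read itself stays the same.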