Face Tracking and Recognition with Visual Constraints in Real-World Videos
Abstract
We address the problem of tracking and recognizing
faces in real-world, noisy videos. We track faces using
a tracker that adaptively builds a target model reflecting
changes in appearance, typical of a video setting. However,
adaptive appearance trackers often suffer from drift, a gradual
adaptation of the tracker to non-targets. To alleviate this
problem, our tracker introduces visual constraints using a
combination of generative and discriminative models in a
particle filtering framework. The generative term conforms
the particles to the space of generic face poses while the discriminative
one ensures rejection of poorly aligned targets.
This leads to a tracker that significantly improves robustness
against abrupt appearance changes and occlusions,
critical for the subsequent recognition phase. Identity of the
tracked subject is established by fusing pose-discriminant
and person-discriminant features over the duration of a
video sequence. This leads to a robust video-based face recognizer
with state-of-the-art recognition performance. We
test the quality of tracking and face recognition on real-world
noisy videos from YouTube as well as the standard
Honda/UCSD database. Our approach produces successful
face tracking results on over 80% of all videos without
video- or person-specific parameter tuning. The good tracking
performance induces similarly high recognition rates:
100% on Honda/UCSD and over 70% on the YouTube set
containing 35 celebrities in 1500 sequences.
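
As an illustration of how visual constraints can enter a particle filter, the following is a minimal sketch, not the authors' implementation. It assumes a one-dimensional particle state standing in for face location and pose, and it assumes the adaptive, generative, and discriminative terms are supplied as plain scoring functions. The names `particle_filter_step`, `gaussian_score`, and all parameters are hypothetical and do not come from the paper.

```python
import math
import random


def gaussian_score(x, mean, sigma):
    """Unnormalised Gaussian score of x around mean (illustrative helper)."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)


def particle_filter_step(particles, adaptive_term, generative_term,
                         discriminative_term, motion_sigma, rng):
    """One predict / weight / resample step over 1-D particle states.

    Assumption: each term maps a particle state to a non-negative score,
    and the combined weight is their product.
    """
    # Predict: diffuse each particle with a random-walk motion model.
    predicted = [p + rng.gauss(0.0, motion_sigma) for p in particles]

    # Weight: the adaptive appearance score is constrained by a generative
    # score (does the state look like a generic face pose?) and a
    # discriminative score (is the candidate well aligned?).
    weights = [
        adaptive_term(p) * generative_term(p) * discriminative_term(p)
        for p in predicted
    ]

    total = sum(weights)
    if total <= 0.0:
        # Every particle was rejected: fall back to uniform weights.
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]

    # Estimate: weighted mean of the predicted states.
    estimate = sum(w * p for w, p in zip(weights, predicted))

    # Resample: draw a new particle set in proportion to the weights.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate
```

In this sketch a discriminative term that returns zero for a poorly aligned candidate removes that particle from the resampled set, which is one simple way to picture how such a constraint could limit drift.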