Shumin Zhai

Shumin Zhai is a Human-Computer Interaction research scientist at Google, where he leads and directs research, design, and development of input methods and haptics systems on Google's and its partners' flagship products. His research has contributed foundational models and understanding of human-computer interaction, as well as practical user interface inventions and products based on his scientific and technical insights. He originated and led the SHARK/ShapeWriter project at IBM Research, and a start-up company, that pioneered the touchscreen word-gesture keyboard paradigm. Through them he filed the first patents on this paradigm, published the first generation of scientific papers, released the first word-gesture keyboard in 2004, and released a top-ranked (6th) iPhone app, ShapeWriter WritingPad, in 2008. His publications have won the ACM UIST Lasting Impact Award and an IEEE Computer Society Best Paper Award, among others. He served as the 4th Editor-in-Chief of ACM Transactions on Computer-Human Interaction and frequently contributes to other academic boards and program committees. He received his Ph.D. from the University of Toronto in 1995. In 2006, he was selected as a member of ACM's inaugural class of Distinguished Scientists. In 2010, he was named a Member of the CHI Academy and a Fellow of the ACM.

His external web page is at www.shuminzhai.com.

Authored Publications
    TapNet: The Design, Training, Implementation, and Applications of a Multi-Task Learning CNN for Off-Screen Mobile Input
    Michael Xuelin Huang
    Nazneen Nazneen
    Alex Chao
    ACM CHI Conference on Human Factors in Computing Systems, ACM (2021)
    Off-screen interaction offers great potential for one-handed and eyes-free mobile interaction. While a few existing studies have explored using built-in mobile phone sensors to sense off-screen signals, none has met practical requirements. This paper discusses the design, training, implementation, and applications of TapNet, a multi-task network that detects tapping on the smartphone using the built-in accelerometer and gyroscope. With sensor location as auxiliary information, TapNet can jointly learn from data across devices and simultaneously recognize multiple tap properties, including tap direction and tap location. We developed four datasets comprising over 180K training samples and 38K testing samples from 87 participants in total. Experimental evaluation demonstrated the effectiveness of the TapNet design and its significant improvement over the state of the art. Along with the datasets, codebase, and extensive experiments, TapNet establishes a new technical foundation for off-screen mobile input.
    Active Edge: Designing Squeeze Gestures for the Google Pixel 2
    Claire Lee
    Melissa Barnhart
    Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, ACM, New York, NY, 274:1-274:13
    Active Edge is a feature of Google Pixel 2 smartphone devices that creates a force-sensitive interaction surface along their sides, allowing users to perform gestures by holding and squeezing their device. Supported by strain gauge elements adhered to the inner sidewalls of the device chassis, these gestures can be more natural and ergonomic than their on-screen (touch) counterparts. Developing these interactions requires integrating several components: (1) an insight into and understanding of the user experiences that benefit from squeeze gestures; (2) hardware with the sensitivity and reliability to sense a user's squeeze in any operating environment; (3) a gesture design that discriminates intentional squeezes from innocuous handling; and (4) an interaction design that promotes a discoverable and satisfying user experience. This paper describes the design and evaluation of Active Edge in these areas as part of the product's development and engineering.
    i’sFree: Eyes-Free Gesture Typing via a Touch-Enabled Remote Control
    Suwen Zhu
    Xiaojun Bi
    Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, ACM, New York, NY, USA, 448:1-448:12 (to appear)
    Entering text without having to pay attention to the keyboard is compelling but challenging due to the lack of visual guidance. We propose i’sFree to enable eyes-free gesture typing on a distant display from a touch-enabled remote control. i’sFree displays neither the keyboard nor the gesture trace. Instead, it decodes gestures drawn on the remote control into text according to an invisible and shifting Qwerty layout. i’sFree decodes gestures in a manner similar to a general gesture typing decoder, but learns from the instantaneous and historical input gestures to dynamically adjust the keyboard location. We designed it based on an understanding of how users perform eyes-free gesture typing. Our evaluation shows that eyes-free gesture typing is feasible: reducing visual guidance on the distant display hardly affects typing speed. Results also show that the i’sFree gesture decoding algorithm is effective, enabling an input speed of 23 WPM, 46% faster than the baseline eyes-free condition built on a general gesture decoder. Finally, i’sFree is easy to learn: participants reached 22 WPM in the first ten minutes, even though 40% of them were first-time gesture typing users.
    Modeling Gesture-Typing Movements
    Human-Computer Interaction, 33 (2018), pp. 234-280
    Word-gesture keyboards allow users to enter text using continuous input strokes (also known as gesture typing or shape writing). We developed a production model of gesture typing input based on a human motor control theory of optimal control (specifically, modeling human drawing movements as a minimization of jerk, the third derivative of position). Existing models treat gestural input as a series of concatenated aiming movements and predict a user's time performance. In contrast, this descriptive theory of human motor control predicts the shapes and trajectories that users will draw. The theory is supported by an analysis of user-produced gestures that found qualitative and quantitative agreement between the shapes users drew and the minimum jerk theory of motor control. Furthermore, by using a small number of statistical via-points whose distributions reflect the sensorimotor noise and speed–accuracy trade-off in gesture typing, we developed a model of gesture production that can predict realistic gesture trajectories for arbitrary text input tasks. The model accurately reflects features in the figural shapes and dynamics observed from users and can be used to improve the design and evaluation of gestural input systems.
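As a rough illustration of the minimum-jerk idea in the abstract above, the following sketch samples the classic point-to-point minimum-jerk path. For a movement that starts and ends at rest, that path is p0 + (p1 - p0)(10s^3 - 15s^4 + 6s^5) over normalized time s. The sketch then chains such segments through key centers. This is a simplification and not the paper's model: it stops at every key instead of using statistical via-points. The function names and the key coordinates in the usage are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def min_jerk_segment(p0: Point, p1: Point, n: int) -> List[Point]:
    """Sample n + 1 points on the minimum-jerk path from p0 to p1.

    Assumes zero velocity and zero acceleration at both endpoints, for which
    the minimum-jerk solution blends p0 and p1 with 10s^3 - 15s^4 + 6s^5,
    where s is normalized time in [0, 1].
    """
    points: List[Point] = []
    for i in range(n + 1):
        s = i / n
        blend = 10 * s**3 - 15 * s**4 + 6 * s**5
        points.append((p0[0] + (p1[0] - p0[0]) * blend,
                       p0[1] + (p1[1] - p0[1]) * blend))
    return points


def chained_gesture(keys: List[Point], n_per_segment: int = 20) -> List[Point]:
    """Join minimum-jerk segments through successive (hypothetical) key centers.

    Simplification: the path comes to rest at every key, unlike the
    via-point model described in the abstract.
    """
    path: List[Point] = [keys[0]]
    for a, b in zip(keys, keys[1:]):
        # Drop each segment's first point; it duplicates the previous endpoint.
        path.extend(min_jerk_segment(a, b, n_per_segment)[1:])
    return path
```

For example, `chained_gesture([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)], 4)` traces an L-shaped stroke. Each leg moves slowly near its endpoints and fastest at its midpoint.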
    M3 Gesture Menu: Design and Experimental Analyses of Marking Menus for Touchscreen Mobile Interaction
    Kun Li
    Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, ACM, New York, NY, USA, 249:1-249:14
    Despite their learning advantages in theory, marking menus have faced adoption challenges in practice, even on today's touchscreen-based mobile devices. We address these challenges by designing, implementing, and evaluating multiple versions of M3 Gesture Menu (M3), a reimagination of marking menus targeted at mobile interfaces. M3 is defined on a grid rather than in a radial space, relies on gestural shapes rather than directional marks, and has constant and stationary space use. Our first controlled experiment on expert performance showed M3 was faster and less error-prone by a factor of two than traditional marking menus. A second experiment on learning demonstrated for the first time that users could successfully transition to recall-based execution of a dozen commands after three ten-minute practice sessions with both M3 and Multi-Stroke Marking Menu. Together, M3, with its demonstrated resolution, learning, and space use benefits, contributes to the design and understanding of menu selection in the mobile-first era of end-user computing.
    A Cost–Benefit Study of Text Entry Suggestion Interaction
    Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, ACM, New York, NY, pp. 83-88
    Mobile keyboards often present error corrections and word completions (suggestions) as candidates for anticipated user input. However, these suggestions are not cognitively free: they require users to attend, evaluate, and act upon them. To understand this trade-off between suggestion savings and interaction costs, we conducted a text transcription experiment that controlled interface assertiveness: the tendency for an interface to present itself. Suggestions were either always present (extraverted), never present (introverted), or gated by a probability threshold (ambiverted). Results showed that although increasing the assertiveness of suggestions reduced the number of keyboard actions to enter text and was subjectively preferred, the costs of attending to and using the suggestions impaired average time performance.
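The three assertiveness conditions in the abstract above can be sketched as a simple filter over ranked candidates. This is a hypothetical illustration, not the study's implementation. The function name, the 0.5 threshold, and the limit of three visible suggestions are assumptions; the abstract does not state the threshold actually used.

```python
from typing import List, Tuple

Candidate = Tuple[str, float]  # (suggested word, estimated probability)


def suggestions_to_show(candidates: List[Candidate], mode: str,
                        threshold: float = 0.5, max_shown: int = 3) -> List[str]:
    """Return the suggestions a keyboard would display under a given mode.

    extraverted: always show the top candidates.
    introverted: never show suggestions.
    ambiverted:  show only candidates whose probability meets the
                 (hypothetical) threshold.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    if mode == "extraverted":
        kept = ranked
    elif mode == "introverted":
        kept = []
    elif mode == "ambiverted":
        kept = [c for c in ranked if c[1] >= threshold]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [word for word, _ in kept[:max_shown]]
```

Raising the threshold moves the ambiverted keyboard toward the introverted condition, and lowering it moves the keyboard toward the extraverted one.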
    Long-Short Term Memory Neural Network for Keyboard Gesture Recognition
    Thomas Breuel
    Johan Schalkwyk
    International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2015)
    Effects of Language Modeling and its Personalization on Touchscreen Typing Performance
    Andrew Fowler
    Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2015), ACM, New York, NY, USA, pp. 649-658
    Modern smartphones correct typing errors and learn user-specific words (such as proper names). Both techniques are useful, yet little has been published about their technical specifics and concrete benefits. One reason is that typing accuracy is difficult to measure empirically on a large scale. We describe a closed-loop, smart touch keyboard (STK) evaluation system that we have implemented to solve this problem. It includes a principled typing simulator for generating human-like noisy touch input, a simple-yet-effective decoder for reconstructing typed words from such spatial data, a large web-scale background language model (LM), and a method for incorporating LM personalization. Using the Enron email corpus as a personalization test set, we show for the first time at this scale that a combined spatial/language model reduces word error rate from a pre-model baseline of 38.4% down to 5.7%, and that LM personalization can improve this further to 4.6%.
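The combined spatial/language model mentioned in the abstract above can be illustrated with a generic noisy-channel decoder. It picks the word w that maximizes log P(touches | w) + log P(w). This is a textbook sketch, not the paper's STK decoder. It assumes the following, all hypothetical:

- one touch per letter;
- an independent isotropic 2-D Gaussian around each key center with a shared sigma;
- a toy lexicon of prior probabilities and toy key coordinates.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def spatial_log_likelihood(touches: List[Point], word: str,
                           key_centers: Dict[str, Point], sigma: float) -> float:
    """log P(touches | word) under independent isotropic 2-D Gaussians."""
    if len(touches) != len(word):
        return -math.inf  # this simple model assumes exactly one touch per letter
    total = 0.0
    for (tx, ty), ch in zip(touches, word):
        if ch not in key_centers:
            return -math.inf
        kx, ky = key_centers[ch]
        d2 = (tx - kx) ** 2 + (ty - ky) ** 2
        # log of the density exp(-d2 / (2 sigma^2)) / (2 pi sigma^2)
        total += -d2 / (2 * sigma ** 2) - math.log(2 * math.pi * sigma ** 2)
    return total


def decode(touches: List[Point], lexicon: Dict[str, float],
           key_centers: Dict[str, Point], sigma: float = 0.5) -> str:
    """Return the word maximizing spatial log-likelihood plus log prior."""
    best_word, best_score = "", -math.inf
    for word, prior in lexicon.items():
        score = (spatial_log_likelihood(touches, word, key_centers, sigma)
                 + math.log(prior))
        if score > best_score:
            best_word, best_score = word, score
    return best_word
```

With toy keys `{"i": (0.0, 0.0), "o": (1.0, 0.0), "n": (0.0, 1.0)}` and touches `[(0.55, 0.0), (0.0, 1.0)]`, the spatial evidence alone favors "on". A language model prior that strongly favors "in" flips the decision, which shows how the two models combine.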
    Optimizing Touchscreen Keyboards for Gesture Typing
    Brian Smith
    Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2015), ACM, New York, NY, USA, pp. 3365-3374
    Despite its growing popularity, gesture typing suffers from a major problem not present in touch typing: gesture ambiguity on the Qwerty keyboard. By applying rigorous mathematical optimization methods, this paper systematically investigates the optimization space related to the accuracy, speed, and Qwerty similarity of a gesture typing keyboard. Our investigation shows that optimizing the layout for gesture clarity (a metric measuring how unique word gestures are on a keyboard) drastically improves the accuracy of gesture typing. Moreover, if we also accommodate gesture speed, or both gesture speed and Qwerty similarity, we can still reduce error rates by 52% and 37% over Qwerty, respectively. In addition to investigating the optimization space, this work contributes a set of optimized layouts such as GK-D and GK-T that can immediately benefit mobile device users.
    Both Complete and Correct? Multi-Objective Optimization of Touchscreen Keyboard
    Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2014), ACM, New York, NY, USA, pp. 2297-2306