Michael Terry
Authored Publications
"We Need Structured Output": Towards User-centered Constraints on Large Language Model Output
Michael Xieyang Liu
Frederick Liu
Alex Fiannaca
Terry Koo
Extended Abstracts of the ACM CHI Conference on Human Factors in Computing Systems (CHI EA '24), ACM (2024), pp. 9 (to appear)
Large language models can produce creative and diverse responses. However, to integrate them into current developer workflows, it is essential to constrain their outputs to follow specific formats or standards. In this work, we surveyed 51 experienced industry professionals to understand the range of scenarios and motivations driving the need for output constraints from a user-centered perspective. We identified 134 concrete use cases for constraints at two levels: low-level, which ensures that the output adheres to a structured format and an appropriate length, and high-level, which requires the output to follow semantic and stylistic guidelines without hallucination. Critically, applying output constraints could not only streamline the currently repetitive process of developing, testing, and integrating LLM prompts for developers, but also enhance the user experience of LLM-powered features and applications. We conclude with a discussion on user preferences and needs towards articulating intended constraints for LLMs, alongside an initial design for a constraint prototyping tool.
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models
Michael Xieyang Liu
Krystal Kallarackal
Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA '24), ACM (2024)
Automatic side-by-side evaluation has emerged as a promising approach to evaluating the quality of responses from large language models (LLMs). However, analyzing the results from this evaluation approach raises scalability and interpretability challenges. In this paper, we present LLM Comparator, a novel visual analytics tool for interactively analyzing results from automatic side-by-side evaluation. The tool supports interactive workflows for users to understand when and why a model performs better or worse than a baseline model, and how the responses from two models are qualitatively different. We iteratively designed and developed the tool by closely working with researchers and engineers at Google. This paper details the user challenges we identified, the design and development of the tool, and an observational study with participants who regularly evaluate their models.
Programming without a Programming Language: Challenges and Opportunities for Designing Developer Tools for Prompt Programming
Alex Fiannaca
Chinmay Kulkarni
Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems (CHI EA ’23), ACM, Hamburg, Germany (2023) (to appear)
Existing tools for prompt programming provide little support to prompt programmers. Consequently, as prompts become more complex, they can be hard to read, understand, and edit. In this work, we draw on modern integrated development environments for traditional programming to improve the editor experience of prompt programming. We describe methods for understanding the semantically meaningful structure of natural language prompts in the absence of rigid formal grammars, and demonstrate a range of editor features that can leverage this information to assist prompt programmers. Finally, we relate initial feedback from design probe explorations with a set of domain experts and provide insights to help guide the development of future prompt editors.
PromptInfuser: Bringing User Interface Mock-ups to Life with Large Language Model Prompts
Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, Association for Computing Machinery (to appear)
Large Language Models have enabled novices without machine learning (ML) experience to quickly prototype ML functionalities with prompt programming. This paper investigates incorporating prompt-based prototyping into designing functional user interface (UI) mock-ups. To understand how infusing LLM prompts into UI mock-ups might affect the prototyping process, we conduct an exploratory study with five designers, and find that this capability might significantly speed up creating functional prototypes, inform designers earlier on how their designs will integrate ML, and enable user studies with functional prototypes earlier. From these findings, we built PromptInfuser, a Figma plugin for authoring LLM-infused mock-ups. PromptInfuser introduces two novel LLM-interactions: input-output, which makes content interactive and dynamic, and frame-change, which directs users to different frames depending on their natural language input. From initial observations, we find that PromptInfuser has the potential to transform the design process by tightly integrating UI and AI prototyping in a single interface.
“The less I type, the better”: How AI Language Models can Enhance or Impede Communication for AAC Users
Stephanie Valencia
Richard Cave
Krystal Kallarackal
Katie Seaver
ACM Conference on Human Factors in Computing Systems (ACM CHI) 2023, ACM (2023) (to appear)
Users of augmentative and alternative communication (AAC) devices sometimes find it difficult to communicate in real time with others due to the time it takes to compose messages. AI technologies such as large language models (LLMs) provide an opportunity to support AAC users by improving the quality and variety of text suggestions. However, these technologies may fundamentally change how users interact with AAC devices as users transition from typing their own phrases to prompting and selecting AI-generated phrases. We conducted a study in which 12 AAC users tested live suggestions from a language model across three usage scenarios: extending short replies, answering biographical questions, and requesting assistance. Our study participants believed that AI-generated phrases could save time as well as physical and cognitive effort when communicating, but felt it was important that these phrases reflect their own communication style and preferences. This work identifies opportunities and challenges for future AI-enhanced AAC devices.
Designing Responsible AI: Adaptations of UX Practice to Meet Responsible AI Challenges
Qiaosi Wang
Michael Adam Madaio
Shivani Kapania
Lauren Wilcox
ACM Conference on Human Factors in Computing Systems (ACM CHI) 2023, ACM (2023)
The shift towards Responsible AI (RAI) in the tech industry necessitates new practices and adaptations to roles. To understand practices at the intersection of user experience (UX) and RAI, we conducted an interview study with industrial UX practitioners and RAI subject matter experts, both of whom are actively involved in addressing RAI concerns, both early in and throughout the development of new AI-based prototypes, demos, and products. Many of the specific practices and their associated challenges have yet to be surfaced, and distilling them offers a critical view into how practitioners' roles are adapting to meet present-day RAI challenges. We present and discuss three emerging practices in which RAI is being enacted and reified in UX work. We conclude by arguing that the emerging practices, goals, and types of expertise that surfaced in our study point to an evolution in praxis that suggests important areas for further research in HCI.
The Prompt Artists
Stefania Druga
Alex Fiannaca
Pedro Vergani
Chinmay Kulkarni
Creativity and Cognition 2023 (2023)
In this paper, we present the results of a study examining the art practices, artwork, and motivations of prolific users of the latest generation of text-to-image models. Through interviews, observations, and a survey, we present a sampling of the artistic styles, and describe the developed community of practice. We find that: 1) the text prompt and resulting image collectively can be considered the art piece (prompts as art), and 2) prompt templates (prompts with “slots” for others to fill in with their own words) are developed to create generative art pieces. We also find that this community’s premium on unique outputs leads to artists seeking specialized vocabulary to produce distinctive art pieces (e.g., by going to architectural blogs), while others look for “glitches” in the model that can turn into artistic styles in their own right. From these findings, we outline specific implications for design.
The Design Space of Generative Models
Jess Scon Holbrook
Chinmay Kulkarni
NeurIPS 2022 Human-Centered AI Workshop (2022) (to appear)
Card et al.’s classic paper "The Design Space of Input Devices" established the value of design spaces as a tool for HCI analysis and invention. We posit that developing design spaces for emerging pre-trained, general AI models is necessary for supporting their integration into human-centered systems and practices. We explore what it means to develop an AI model design space by proposing two design spaces relating to pre-trained AI models: the first considers how HCI can impact pre-trained models (i.e., interfaces for models) and the second considers how pre-trained models can impact HCI (i.e., models as an HCI prototyping material).
Discovering the Syntax and Strategies of Natural Language Programming with Generative Language Models
Aaron Michael Donsbach
Edwin Toh
Ellen Jiang
CHI (2022)
In this paper, we present a natural language code synthesis tool, GenLine, backed by a large generative language model and a set of task-specific prompts. To understand the user experience of natural language code synthesis with these types of models, we conducted a user study in which participants applied GenLine to two programming tasks. Our results indicate that while natural language code synthesis can sometimes provide a magical experience, participants still faced challenges. In particular, participants felt that they needed to learn the model’s "syntax," despite their input being natural language. Participants also faced challenges in debugging model input, and demonstrated a wide range of variability in the scope and specificity of their requests. From these findings, we discuss design implications for future natural language code synthesis tools built using generative language models.
Prompt-based Prototyping with Large Language Models
Ellen Jiang
Edwin Toh
Aaron Michael Donsbach
ACM CHI case study track (2022)
Prototyping is notoriously difficult to do with machine learning (ML), but recent advances in large language models may lower the barriers to people prototyping with ML, through the use of natural language prompts. This case study reports on the real-world experiences of industry professionals (e.g. designers, program managers, front-end developers) prototyping new ML-powered feature ideas via prompt-based prototyping. Through interviews with eleven practitioners during a three-week sprint and a workshop, we find that prompt-based prototyping reduced barriers of access by substantially broadening who can prototype with ML, sped up the prototyping process, and grounded communication between collaborators. Yet, it also introduced new challenges, such as the need to reverse-engineer prompt designs, source example data, and debug and evaluate prompt effectiveness. Taken together, this case study provides important implications that lay the groundwork toward a new future of prototyping with ML.