Nitesh Goyal

Authored Publications
    With the rise of open data in the last two decades, more datasets are online and more people are using them for projects and research. But how do people find datasets? We present the first user study of Google Dataset Search, a dataset-discovery tool that uses a web crawl and open ecosystem to find datasets. Google Dataset Search contains a superset of the datasets in other dataset-discovery tools, a total of 45 million datasets from 13,000 sources. We found that the tool addresses a previously identified need: a search engine for datasets across the entire web, including datasets in other tools. However, the tool introduced new challenges due to its open approach: building a mental model of the tool, making sense of heterogeneous datasets, and learning how to search for datasets. We discuss recommendations for dataset-discovery tools and open research questions.
    Online harassment is a major societal challenge that impacts multiple communities. Some members of the community, like female journalists and activists, bear significantly higher impacts since their professions require easy accessibility and transparency about their identity, and involve highlighting stories of injustice. Through a multi-phased qualitative research study involving a focus group and interviews with 27 female journalists and activists, we mapped the journey of a target who goes through harassment. We introduce the PMCR framework as a way to focus on needs for Prevention, Monitoring, Crisis and Recovery. We focused on Crisis and Recovery, and designed a tool to satisfy a target’s needs related to documenting evidence of harassment during the crisis and creating reports that could be shared with support networks for recovery. Finally, we discuss users’ feedback on this tool, highlighting targets’ needs as they face the burden, and offer recommendations to future designers and scholars on how to develop tools that can help targets manage their harassment.
    Machine learning models are commonly used to detect toxicity in online conversations. These models are trained on datasets annotated by human raters. We explore how raters' self-described identities impact how they annotate toxicity in online comments. We first define the concept of specialized rater pools: rater pools formed based on raters' self-described identities, rather than at random. We formed three such rater pools for this study: specialized rater pools of raters from the U.S. who identify as African American, LGBTQ, and those who identify as neither. Each of these rater pools annotated the same set of comments, which contains many references to these identity groups. We found that rater identity is a statistically significant factor in how raters will annotate toxicity for identity-related annotations. Using preliminary content analysis, we examined the comments with the most disagreement between rater pools and found nuanced differences in the toxicity annotations. Next, we trained models on the annotations from each of the different rater pools, and compared the scores of these models on comments from several test sets. Finally, we discuss how using raters that self-identify with the subjects of comments can create more inclusive machine learning models, and provide more nuanced ratings than those by random raters.
    Capturing Covertly Toxic Speech via Crowdsourcing
    Alyssa Whitlock Lees
    Daniel Borkan
    Ian Kivlichan
    Jorge M Nario
    HCI, https://sites.google.com/corp/view/hciandnlp/home (2021) (to appear)
    We study the task of extracting covert or veiled toxicity labels from user comments. Prior research has highlighted the difficulty in creating language models that recognize nuanced toxicity such as microaggressions. Our investigations further underscore the difficulty in parsing such labels reliably from raters via crowdsourcing. We introduce an initial dataset, COVERTTOXICITY, which aims to identify such comments from a refined rater template, with rater associated categories. Finally, we fine-tune a comment-domain BERT model to classify covertly offensive comments and compare against existing baselines.
    Designing for Mobile Experience Beyond the Native Ad Click: Exploring Landing Page Presentation Style & Media Usage
    Marc Bron
    Mounia Lalmas
    Andrew Haines
    Henriette Cramer
    Journal of the Association for Information Science and Technology(2018)
    Many free mobile applications are supported by advertising. Ads can greatly affect user perceptions and behavior. In mobile apps, ads often follow a “native” format: they are designed to conform in both format and style to the actual content and context of the application. Clicking on the ad leads users to a second destination, outside of the hosting app, where the unified experience provided by native ads within the app is not necessarily reflected by the landing page the user arrives at. Little is known about whether and how this type of mobile ad impacts user experience. In this paper, we use both quantitative and qualitative methods to study the impact of two design decisions for the landing page of a native ad on the user experience: (i) native ad style (following the style of the application) versus a non-native ad style; and (ii) pages with multimedia versus static pages. We found considerable variability in terms of user experience with mobile ad landing pages when varying presentation style and multimedia usage, especially interaction between presence of video and ad style (native or non-native). We also discuss insights and recommendations for improving the user experience with mobile native ads.
    Intelligent Interruption Management using Electro Dermal Activity based Physiological Sensor for Collaborative Sensemaking
    Susan R. Fussell
    Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., 1(2017), 52:1-52:21
    Sensemaking tasks are difficult to accomplish with limited time and attentional resources because analysts are faced with a constant stream of new information. While this information is often important, the timing of the interruptions may detract from an analyst's work. In an ideal world, there would be no interruptions. But that is not the case in real world sensemaking tasks. So, in this study, we explore the value of timing interruptions based on an analyst's state of arousal as detected by electrodermal activity (EDA) derived from galvanic skin response. In a laboratory study, we compared performance when interruptions were timed to occur during increasing arousal, decreasing arousal, at random intervals or not at all. Analysts performed significantly better when interruptions occurred during periods of increasing arousal than when they were random. Further, analysts rated the process component of team experience significantly higher when interruptions occurred during periods of increasing arousal than when they were random. Self-reported workload was not impacted by interruption timing. We discuss how system designs could leverage inexpensive off-the-shelf wrist sensors to improve interruption timing.
    Effects of Sensemaking Translucence on Distributed Collaborative Analysis
    Susan R. Fussell
    Effects of Sensemaking Translucence on Distributed Collaborative Analysis(2016)
    Collaborative sensemaking requires that analysts share their information and insights with each other, but this process of sharing runs the risk of prematurely focusing the investigation on specific suspects. To address this tension, we propose and test an interface for collaborative crime analysis that aims to make analysts more aware of their sensemaking processes. We compare our sensemaking translucence interface to a standard interface without special sensemaking features in a controlled laboratory study. We found that the sensemaking translucence interface significantly improved clue finding and crime solving performance, but that analysts rated the interface lower on subjective measures than the standard interface. We conclude that designing for distributed sensemaking requires balancing task performance vs. user experience and real-time information sharing vs. data accuracy.
    RAMPARTS: Supporting Sensemaking with Spatially-Aware Mobile Interactions
    Paweł Wozniak
    Przemysław Kucharski
    Lars Lischke
    Sven Mayer
    Morten Fjeld
    Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems
    Synchronous colocated collaborative sensemaking requires that analysts share their information and insights with each other. The challenge is to know when is the right time to share what information without disrupting the present state of analysis. This is crucial in ad-hoc sensemaking sessions with mobile devices because small screen space limits information display. To address these tensions, we propose and evaluate RAMPARTS, a spatially aware sensemaking system for collaborative crime analysis that aims to support faster information sharing, clue-finding, and analysis. We compare RAMPARTS to an interactive tabletop and a paper-based method in a controlled laboratory study. We found that RAMPARTS significantly decreased task completion time compared to paper, without affecting cognitive load or task completion time adversely compared to an interactive tabletop. We conclude that designing for ad-hoc colocated sensemaking on mobile devices could benefit from spatial awareness. In particular, spatial awareness could be used to identify relevant information, support diverse alignment styles for visual comparison, and enable alternative rhythms of sensemaking.
    Designing for Collaborative Sensemaking: Leveraging Human Cognition For Complex Tasks
    Susan R. Fussell
    ArXiv(2015)
    Designing for Collaborative Sensemaking: Leveraging Human Cognition For Experts & non-Experts
    AAAI HCOMP(2015)
    Effects of implicit sharing in collaborative analysis
    Gilly Leshed
    Dan Cosley
    Susan R. Fussell
    Effects of Implicit Sharing in Collaborative Analysis, ACM(2014)
    When crime analysts collaborate to solve crime cases, they need to share insights in order to connect the clues, identify a pattern, and attribute the crime to the right culprit. We designed a collaborative analysis tool to explore the value of implicitly sharing insights and notes, without requiring analysts to explicitly push information or request it from each other. In an experiment, pairs of remote individuals played the role of crime analysts solving a set of serial killer crimes with both partners having some, but not all, relevant clues. When implicit sharing of notes was available, participants remembered more clues related to detecting the serial killer, and they perceived the tool as more useful compared to when implicit sharing was not available.
    Effects of visualization and note-taking on sensemaking and analysis
    Gilly Leshed
    Dan Cosley
    Effects of Visualization and Note-taking on Sensemaking and Analysis(2013)
    Many sophisticated tools have been developed to help analysts detect patterns in large datasets, but the value of these tools' individual features is rarely tested. In an experiment in which participants played detectives solving homicides, we tested the utility of a visualization of data links and a notepad for collecting and organizing annotations. The visualization significantly improved participants' ability to solve the crime whereas the notepad did not. Having both features available provided no benefit over having just the visualization. The results raise questions about the potential constraints on the usefulness of intelligence analysis tools.
    Leveraging partner's insights for distributed collaborative sensemaking
    Gilly Leshed
    Dan Cosley
    Leveraging Partner's Insights for Distributed Collaborative Sensemaking(2013)
    SAVANT is a web-based tool that enables information and knowledge sharing between remote partners through explicit and implicit communication to help them collaboratively analyze and make sense of distributed data. SAVANT's implicit sharing provides an opportunity to leverage partners' insights and reduce cognitive tunneling, and explicit sharing facilitates discussion. Both techniques assist collaborative sensemaking processes.
    Massively distributed authorship of academic papers
    Bill Tomlinson
    Joel Ross
    Paul Andre
    Eric Baumer
    Donald Patterson
    Joseph Corneli
    Martin Mahaux
    Syavash Nobarany
    Marco Lazzari
    Birgit Penzenstadler
    Andrew Torrance
    David Callele
    Gary Olson
    Marcus Stünder
    Fabio Romancini Palamedi
    Albert Ali Salah
    Eric Morrill
    Xavier Franch
    Florian Floyd Mueller
    Joseph 'Jofish' Kaye
    Rebecca W Black
    Marisa L Cohn
    Patrick C Shih
    Johanna Brewer
    Pirjo Näkki
    Jeff Huang
    Nilufar Baghaei
    Craig Saper
    Massively distributed authorship of academic papers, ACM(2012)
    Wiki-like or crowdsourcing models of collaboration can provide a number of benefits to academic work. These techniques may engage expertise from different disciplines, and potentially increase productivity. This paper presents a model of massively distributed collaborative authorship of academic papers. This model, developed by a collective of thirty authors, identifies key tools and techniques that would be necessary or useful to the writing process. The process of collaboratively writing this paper was used to discover, negotiate, and document issues in massively authored scholarship. Our work provides the first extensive discussion of the experiential aspects of large-scale collaborative research.
    Cultural differences across governmental website design
    William Miner
    Nikhil Nawathe
    Cultural Differences Across Governmental Website Design, ACM(2012)
    In this paper, we study the relevance of Hall and Hofstede's works to web design beyond traditional domain areas like e-commerce and advertising. Existing theories explain how design may be affected by cultural differences, and we explore how those differences can be seen in government website design across Brazil, Russia, India, China, and the US. We describe our findings confirming that differences exist, more so between China and the US than the rest, and point out where cultural theories fail to explain the results, in particular for Brazil, Russia and India. Finally, we focus more closely on the differences between China and the US.
    SPRING: speech and pronunciation improvement through games, for Hispanic children
    Anuj Tewari
    Matthew K. Chan
    Tina Yau
    John Canny
    Ulrik Schroeder
    SPRING: Speech and Pronunciation Improvement Through Games, for Hispanic Children(2010)
    Lack of proper English pronunciation is a major problem for immigrant populations in developed countries like the U.S. This poses various problems, including a barrier to entry into mainstream society. This paper presents a research study that explores the use of speech technologies merged with activity-based and arcade-based games to provide pronunciation feedback for Hispanic children within the U.S. A 3-month long study with an immigrant population in California was used to investigate and analyze the effectiveness of computer aided pronunciation feedback through games. In addition to quantitative findings that point to statistically significant gains in pronunciation quality, the paper also explores qualitative findings, interaction patterns and challenges faced by the researchers in dealing with this community. It also describes the issues involved in dealing with pronunciation as a competency.