
Fernando Diaz

Fernando Diaz is a research scientist at Google. Fernando's research focuses on the design of information access systems, including search engines, music recommendation services, and crisis response platforms. He is particularly interested in understanding and addressing the societal implications of artificial intelligence. Previously, Fernando was the assistant managing director of Microsoft Research Montréal, where he also led FATE Montréal, and a director of research at Spotify, where he helped establish its research organization for recommendation, search, and personalization. Fernando's work has received special recognition and awards at SIGIR, CIKM, CSCW, WSDM, ISCRAM, and ECIR. He is the recipient of the 2017 British Computer Society Karen Spärck Jones Award and holds a CIFAR AI Chair. Fernando has co-organized several NIST TREC tracks, as well as WSDM (2013), the Strategic Workshop on Information Retrieval (2018), FAT* (2019), SIGIR (2021), and the CIFAR Workshop on Artificial Intelligence and the Curation of Culture (2019). He received his BS in Computer Science from the University of Michigan, Ann Arbor, and his MS and PhD from the University of Massachusetts Amherst.
Authored Publications
    As new forms of data capture emerge to power new AI applications, questions abound about the ethical implications of these data collection practices. In this paper, we present clinicians' perspectives on the prospective benefits and harms of voice data collection during health consultations. Such data collection is being proposed as a means to power models to assist clinicians with medical data entry, administrative tasks, and consultation analysis. Yet, clinicians' attitudes and concerns are largely absent from the AI narratives surrounding these use cases, and from the academic literature investigating them. Our qualitative interview study used the concept of an informed consent process as a type of design fiction, to support elicitation of clinicians' perspectives on voice data collection and use associated with a fictional, near-term AI assistant. Through reflexive thematic analysis of in-depth sessions with physicians, we distilled eight classes of potential risks that clinicians are concerned about, including workflow disruptions, self-censorship, and errors that could impact patient eligibility for services. We conclude with an in-depth discussion of these prospective risks, reflect on the use of the speculative processes that illuminated them, and reconsider evaluation criteria for AI-assisted clinical documentation technologies in light of our findings.
    Joint Multisided Exposure Fairness for Recommendation
    Bhaskar Mitra
    Chen Ma
    Haolun Wu
    Xue Liu
    Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (2022)
    Prior research on exposure fairness in the context of recommender systems has focused mostly on disparities in the exposure of individual items, or groups of items, to individual users of the system. The problem of how individual or groups of items may be systemically under- or over-exposed to groups of users, or even all users, has received relatively less attention. However, such systemic disparities in information exposure can result in observable social harms, such as withholding economic opportunities from historically marginalized groups (allocative harm) or amplifying gendered and racialized stereotypes (representational harm). Previously, Diaz et al. (2020) developed the expected exposure metric, which incorporates user browsing models previously developed for information retrieval, to study fairness of content exposure to individual users. We extend their proposed framework to formalize a family of exposure fairness metrics that model the problem jointly from the perspective of both consumers and producers. Specifically, we consider group attributes for both types of stakeholders to identify and mitigate fairness concerns that go beyond individual users and items toward more systemic biases in recommendation. Furthermore, we study and discuss the relationships between the different exposure fairness dimensions proposed in this paper, and demonstrate how stochastic ranking policies can be optimized toward these fairness goals.
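As a rough illustration of the exposure framework described in the abstract, the sketch below computes expected item exposure under a geometric position-based browsing model and aggregates it by producer-side group. This is a minimal sketch under stated assumptions, not the paper's actual formulation; the decay parameter `gamma` and all function names are illustrative.

```python
import numpy as np

def position_exposure(n_positions, gamma=0.8):
    # Position-based browsing model: exposure decays geometrically
    # with rank (an RBP-style assumption, not the paper's exact model).
    return gamma ** np.arange(n_positions)

def expected_item_exposure(rankings, n_items, gamma=0.8):
    # rankings: a list of rankings (each a list of item ids), e.g. samples
    # drawn from a stochastic ranking policy for one user or population.
    exposure = np.zeros(n_items)
    for ranking in rankings:
        weights = position_exposure(len(ranking), gamma)
        for pos, item in enumerate(ranking):
            exposure[item] += weights[pos]
    return exposure / len(rankings)

def group_exposure(exposure, item_groups, n_groups):
    # Aggregate expected exposure over producer-side group labels,
    # which is the kind of quantity group fairness metrics compare.
    totals = np.zeros(n_groups)
    for item, group in enumerate(item_groups):
        totals[group] += exposure[item]
    return totals
```

Comparing `group_exposure` across consumer subpopulations (repeating the computation per user group) would expose the systemic joint disparities the paper targets.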
    On Natural Language User Profiles for Transparent and Scrutable Recommendation
    Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22) (2022)
    Natural interaction with recommendation and personalized search systems has received tremendous attention in recent years. We focus on the challenge of supporting people's understanding and control of these systems and explore a fundamentally new way of thinking about the representation of knowledge in recommendation and personalization systems. Specifically, we argue that it may be both desirable and possible to develop algorithms that use natural language representations of users' preferences. We make the case that this could provide significantly greater transparency, as well as affordances for practical, actionable interrogation of, and control over, recommendations. Moreover, we argue that such an approach, if successfully applied, may enable a major step towards systems that rely less on noisy implicit observations while increasing the portability of knowledge of one's interests.
    Offline Retrieval Evaluation Without Evaluation Metrics
    Andres Ferraro
    Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (2022)
    Offline evaluation of information retrieval and recommendation has traditionally focused on distilling the quality of a ranking into a scalar metric such as average precision or normalized discounted cumulative gain. We can use this metric to compare multiple systems' performance on the same query or user. Although evaluation metrics provide a convenient summary of system performance, they can also obscure subtle behavior in the original ranking and can carry assumptions about user behavior and utility not supported across retrieval scenarios. We propose recall-paired preference (RPP), a metric-free evaluation method based on directly comparing ranked lists. RPP simulates multiple user subpopulations per query and compares systems across these pseudo-populations. Our results across multiple search and recommendation tasks demonstrate that RPP substantially improves discriminative power while being robust to missing data and correlating well with existing metrics.
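One way to read the abstract's description of RPP is that each recall level (each k-th relevant item) stands in for a user subpopulation, and two systems are compared by which reaches that recall level at a shallower rank. The sketch below follows that reading; it is an illustrative simplification under stated assumptions, not the paper's exact definition, and all names are hypothetical.

```python
def ranks_of_relevant(ranking, relevant):
    # 1-based rank positions at which relevant items appear, in order.
    return [i + 1 for i, doc in enumerate(ranking) if doc in relevant]

def recall_paired_preference(run_a, run_b, relevant):
    # For each recall level k (the k-th relevant item), prefer the system
    # that reaches that recall level at a shallower rank. Returns a score
    # in [-1, 1]: positive favors run_a, negative favors run_b.
    ranks_a = ranks_of_relevant(run_a, relevant)
    ranks_b = ranks_of_relevant(run_b, relevant)
    # Rank assigned when a recall level is never reached within the run.
    unreached = max(len(run_a), len(run_b)) + 1
    prefs = []
    for k in range(len(relevant)):
        rank_a = ranks_a[k] if k < len(ranks_a) else unreached
        rank_b = ranks_b[k] if k < len(ranks_b) else unreached
        prefs.append((rank_a < rank_b) - (rank_a > rank_b))
    return sum(prefs) / len(prefs)
```

Because it compares ranked lists directly per recall level, no single scalar metric (and its embedded user model) is needed, which is the "metric-free" property the abstract emphasizes.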
    Measuring Commonality in Recommendation of Cultural Content: Recommender Systems to Enhance Cultural Citizenship
    Andres Ferraro
    Georgina Born
    Gustavo Ferreira
    Proceedings of the 16th ACM Conference on Recommender Systems (2022)
    Recommender systems have become the dominant means of curating cultural content, significantly influencing how individuals experience culture. While the majority of academic and industrial research on recommender systems optimizes for personalized user experience, this paradigm does not capture the ways that recommender systems impact culture as an aggregate concept. And, although existing novelty, diversity, and fairness studies of recommender systems relate to the broader social role of cultural content, they do not adequately center culture as a core concept. In this work, we introduce commonality, a new measure of recommender systems that reflects the degree to which recommendations familiarize a given user population with specified categories of cultural content. Our proposed commonality metric responds to a set of arguments developed through an interdisciplinary dialogue between researchers in computer science and the social sciences and humanities. Taking movie recommendation as a case study, we empirically compare the performance of more than twenty recommendation algorithms using commonality and existing utility, diversity, novelty, and fairness metrics. Our results demonstrate that commonality captures a property of system behavior complementary to existing metrics. These results suggest the need for alternative, non-personalized interventions in recommender systems. In this way, commonality contributes to a growing body of scholarship developing 'public good' rationales for digital media and machine learning systems.
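To make the notion of "familiarizing a population with specified categories" concrete, the sketch below computes, for each category, the fraction of users whose top-k recommendations contain at least one item from it, averaged over categories. This is a deterministic, simplified illustration of the idea; the paper's metric is defined over ranking policies, and the function name and cutoff parameter here are assumptions.

```python
def commonality(recommendations, item_category, categories, k=10):
    # recommendations: dict mapping user id -> ranked list of item ids.
    # item_category:   dict mapping item id -> category label.
    # For each specified category, measure the fraction of users whose
    # top-k list contains at least one item from that category, then
    # average across the specified categories.
    per_category = []
    for category in categories:
        exposed = sum(
            any(item_category.get(item) == category for item in recs[:k])
            for recs in recommendations.values()
        )
        per_category.append(exposed / len(recommendations))
    return sum(per_category) / len(per_category)
```

A fully personalized system can score low on such a measure even while scoring well on utility, which is the complementarity the abstract reports.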
    Retrieval Enhanced Machine Learning
    Hamed Zamani
    Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (Perspectives Track) (2022)
    Information access systems have supported people during tasks across a variety of domains. In this perspective paper, we advocate for broadening the scope of information access research to include machines. We believe that machine learning can be substantially advanced by developing a research program around retrieval as a core algorithmic method. This paper describes how core principles of indexing, representation, retrieval, and relevance can extend supervised learning algorithms. It proposes a generic retrieval-enhanced machine learning (REML) framework and describes the challenges and opportunities introduced by implementing REML. We also discuss different optimization approaches for training REML models and review a number of case studies that are simplified, special-case implementations of the proposed framework. The research agenda introduced in this paper will smooth the path towards developing machine learning models with better scalability, sustainability, effectiveness, and interpretability.
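The core idea of retrieval enhancement, a prediction model that consults an index at inference time rather than relying on its parameters alone, can be sketched with a toy nearest-neighbor instantiation. This is one simplified special case of the kind the abstract mentions, not the REML framework itself; the class and parameter names are illustrative.

```python
import numpy as np

class RetrievalEnhancedPredictor:
    # Toy REML-style model: predictions consult a retrieval index built
    # over stored (representation, target) pairs instead of relying on
    # learned parameters alone.
    def __init__(self, keys, values):
        self.keys = np.asarray(keys, dtype=float)     # indexed representations
        self.values = np.asarray(values, dtype=float)  # associated targets

    def predict(self, query, k=3):
        # Retrieve the k nearest stored representations by Euclidean
        # distance and aggregate their targets (a k-NN instantiation
        # of retrieval enhancement).
        dists = np.linalg.norm(self.keys - np.asarray(query, dtype=float), axis=1)
        nearest = np.argsort(dists)[:k]
        return self.values[nearest].mean()
```

In a full REML system, the representation, the index, and the aggregation would all be learned and optimized jointly with the downstream task.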
    Mixed Methods Development of Evaluation Metrics
    Brian St Thomas
    Christine Hosey
    Praveen Ravichandran
    (2021)
    Designers of online search and recommendation services often need to develop metrics to assess system performance. This tutorial focuses on mixed methods approaches to developing user-focused evaluation metrics. This starts with choosing how data is logged, or how to interpret currently logged data, with a discussion of how qualitative insights and design decisions can restrict or enable certain types of logging. When we create a metric from that logged data, there are underlying assumptions about how users interact with the system and evaluate those interactions. We cover what these assumptions look like for some traditional system evaluation metrics and highlight quantitative and qualitative methods that investigate and adapt these assumptions to be more explicit and expressive of genuine user behavior. We discuss the role that mixed methods teams can play at each stage of metric development, starting with data collection, designing both online and offline metrics, and supervising metric selection for decision making. We describe case studies and examples of these methods applied in the context of evaluating personalized search and recommendation systems. Finally, we close with practical advice for applied quantitative researchers who may be in the early stages of planning collaborations with qualitative researchers for mixed methods metrics development.
    Art Sheets for Art Datasets
    Ramya Malur Srinivasan
    Jordan Jennifer Famularo
    Beth Coleman
    NeurIPS Dataset & Benchmark track (2021)
    As machine learning (ML) techniques are being employed to authenticate artworks and estimate their market value, computational tasks have expanded across a variety of creative domains and datasets drawn from the arts. With recent progress in generative modeling, ML techniques are also used for simulating artistic styles and for producing new content in various media such as music, visual arts, poetry, etc. While this progress has opened up new creative avenues, it has also paved the way for adverse downstream effects such as cultural appropriation (e.g., cultural misrepresentation, offense, and undervaluing) and amplification of gender and racial stereotypes, to name a few. Many such concerning issues stem from the training data in ways that diligent evaluation can uncover, prevent, and mitigate. In this paper, we provide a checklist of questions customized for use with art datasets, building on the questionnaire for datasets provided in Datasheets, by guiding assessment of developer motivation together with dataset provenance, composition, collection, pre-processing, cleaning, labeling, use (including data generation/synthesis), distribution, and maintenance. Case studies exemplify the value of our questionnaire. We hope our work aids ML scientists and developers by providing a framework for responsible design, development, and use of art datasets.
    On Evaluating Session-Based Recommendation with Implicit Feedback
    Perspectives on the Evaluation of Recommender Systems Workshop (PERSPECTIVES 2021)
    Session-based recommendation systems are used in environments where system recommendation actions are interleaved with user choice reactions. Domains include radio-style song recommendation, session-aware related items in a shopping context, and next-video recommendation. In many situations, interactions logged from a production policy can be used to train and evaluate such session-based recommendation systems. This paper presents several concerns with interpreting logged interactions as reflecting user preferences and provides possible mitigations to those concerns.