Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Sort By
1 - 15 of 11142 publications
Phoenix: Rowhammer Attacks on DDR5 with Self-Correcting Synchronization
Preview
Michele Marazzi
Kaveh Razavi
Salman Qazi
Diego Meyer
Patrick Jattke
IEEE Security & Privacy (S&P) (2026)
VISTA: A Test-Time Self-Improving Video Generation Agent
Hootan Nakhost
Xuan Long Do
The IEEE/CVF Conference on Computer Vision and Pattern Recognition (to appear) (2026)
Preview abstract
Despite rapid advances in text-to-video (T2V) synthesis, generated video quality remains critically dependent on precise user prompts. Existing test-time optimization methods, successful in other domains, struggle with the multi-faceted nature of video. To address this, we introduce VISTA, a novel multi-agent system that autonomously refines prompts to improve video generation. VISTA operates in an iterative loop, first decomposing a user's idea into a structured temporal plan. After generation, the best video is identified through a robust pairwise tournament. This winning video is then critiqued by a trio of specialized agents focusing on visual, audio, and contextual fidelity. Finally, a reasoning agent synthesizes this feedback to introspectively rewrite and enhance the prompt for the next generation cycle. To rigorously evaluate our proposed approach, we introduce MovieGen-Bench, a new benchmark of diverse single- and multi-scene video generation tasks. Experiments show that while prior methods yield inconsistent gains, VISTA consistently improves video quality, achieving up to 60% pairwise win rate against state-of-the-art baselines. Human evaluators concur, preferring VISTA's outputs in 68% of comparisons.
View details
Preview abstract
Generative AI is reshaping software development, yet its psychological impact remains under-researched. During May and August 2025 we conducted reflexive thematic analysis of interviews with 12 senior engineers (≥5 years experience) recruited from Western technology hubs to explore shifts in professional identity. We identify a central transition from "coder to conductor," where AI acts as a cognitive partner. Key findings include: (1) a re-architecting of focus from implementation to strategy; (2) a shift in productivity metrics from output to impact; and (3) a dual-impact on agency, where AI empowers autonomy but threatens competence through de-skilling anxieties. These findings suggest that as implementation becomes commoditised, organisational training and career progression must prioritise architectural mastery and metacognitive oversight to ensure sustained developer motivation and system integrity.
View details
The Ontic-Epistemic Distinction: Implications for Robust Machine Intelligence
Shreya Ishita
Master's Thesis (2026) (to appear)
Preview abstract
The current pursuit of robust Machine Intelligence is largely predicated on a substrate independent, functionalist view of cognition, where sufficiently large syntactic processing is expected to eventually yield semantic understanding. This paper explores the ontological distinctions between these computational frameworks and biological cognition, specifically regarding the emergence of robustness. By analyzing phenomena such as the "reversal curse" and performance on novel reasoning benchmarks (e.g., ARC-AGI), I examine whether current limitations are transient artifacts of scale or indicative of a distinct architectural category.
Synthesizing Stevan Harnad’s "Symbol Grounding Problem" with Evan Thompson’s framework of Intrinsic Normativity in autopoietic systems, I argue that true generality requires "Sense-Making", a process distinct from "Information Processing", whereby an agent’s internal states are causally coupled with its environment via survival or system wide stakes. Without this intrinsic normativity, machines may remain epistemic instruments rather than ontic agents. By defining this "Ontic Gap," this paper offers a theoretical lens for evaluating AI safety and governance, moving beyond behavioral simulation to address the structural conditions of understanding.
View details
A probabilistic framework for learning non‐intrusive corrections to long‐time climate simulations from short‐time training data
Benedikt Barthel
Rob Carver
Fei Sha
Themistoklis Sapsis
Journal of Advances in Modeling Earth Systems (2026)
Preview abstract
Despite advances in high performance computing, accurate numerical simulations of global atmospheric dynamics remain a challenge. The resolution required to fully resolve the vast range scales as well as the strong coupling with—often not fully-understood—physics renders such simulations computationally infeasible over time horizons relevant for long-term climate risk assessment. While data-driven parameterizations have shown some promise of alleviating these obstacles, the scarcity of high-quality training data and their lack of long-term stability typically hinders their ability to capture the risk of rare extreme events. In this work we present a general strategy for training variational (probabilistic) neural network models to non-intrusively correct under-resolved long-time simulations of turbulent climate systems. The approach is based on the paradigm introduced by Barthel Sorensen et al. (2024, https://doi.org/10.1029/2023ms004122) which involves training a post-processing correction operator on under-resolved simulations nudged toward a high-fidelity reference. Our variational framework enables us to learn the dynamics of the underlying system from very little training data and thus drastically improve the extrapolation capabilities of the previous deterministic state-of-the art—even when the statistics of that training data are far from converged. We investigate and compare three recently introduced variational network architectures and illustrate the benefits of our approach on an anisotropic quasi-geostrophic flow. For this prototype model our approach is able to not only accurately capture global statistics, but also the anistropic regional variation and the statistics of multiple extreme event metrics—demonstrating significant improvement over previously introduced deterministic architectures.
View details
Preview abstract
High-volume enterprise service organizations face a persistent challenge in transitioning from reactive support models to proactive, preventative ones. This paper introduces the Agentic Trend-to-Knowledge (ATK) methodology, a novel, autonomous framework designed to address this gap. The ATK methodology employs an AI agent that operates in a recurring, closed loop. It first uses a two-stage process for the autonomous thematic analysis of recent support cases to identify the most significant recurring issue. It then leverages Retrieval-Augmented Generation (RAG) to source relevant institutional knowledge. A key innovation is the agent's adaptive, bimodal response: if relevant knowledge is found, it drafts a proactive communication for human review; if a knowledge gap is detected, it autonomously creates a content creation task for the appropriate team. This transforms the agent from an automation tool into a proactive process owner that creates a virtuous cycle of continuous improvement for both case deflection and knowledge base quality. By automating the entire workflow from insight to action, the ATK framework provides a concrete methodology for shifting from a "human-in-the-loop" to a more strategic "human-on-the-loop" operational paradigm.
View details
Type-Aware Ranking of Urban Similarity from Aerial Imagery
Idan Kligvasser
Yotam Intrator
Yuval Desheh
Aviad Barzilai
Niv Efron
Ehud Rivlin
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops (2026), pp. 821-829
Preview abstract
Estimating and ranking cross-city similarity from aerial imagery is a fundamental challenge in remote sensing and geospatial representation learning. Urban environments differ widely in road layout, marking conventions, and infrastructure design, yet standard visual representations often struggle to disentangle these meaningful structural variations from superficial appearances. In this work, we propose a type-aware contrastive learning framework that measures urban similarity by explicitly modeling distinct infrastructure elements. Leveraging open-vocabulary retrieval, we construct a globally diverse dataset of road-related features, such as intersections, crosswalks, and bus lanes, and train a type-conditioned Vision Transformer that fuses visual features with CLIP-derived semantic embeddings. Crucially, we introduce an adaptive per-type contrastive loss that dynamically emphasizes infrastructure categories with high discriminative power while down-weighting less informative types. To quantify city-level similarity, we aggregate per-type cosine similarities via a lightweight classifier to generate a global city-to-city similarity matrix. Experiments demonstrate that this type-aware approach significantly improves clustering quality and successfully generalizes to unseen cities, establishing a scalable, interpretable foundation for comparative urban analysis.
View details
Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning
Preview abstract
Advanced reasoning typically requires Chain-of-Thought prompting, which is accurate but incurs prohibitive latency and substantial test-time inference costs. The standard alternative, fine-tuning smaller models, often sacrifices interpretability while introducing significant resource and operational overhead. To address these limitations, we introduce Prompt-Level Distillation (PLD). We extract explicit reasoning patterns from a Teacher model and organize them into a structured list of expressive instructions for the Student model's System Prompt. Evaluated on the StereoSet and Contract-NLI datasets using Gemma-3 4B, PLD improved Macro F1 scores from 57\% to 90.0\% and 67\% to 83\% respectively, enabling this compact model to match frontier performance with negligible latency overhead. These expressive instructions render the decision-making process transparent, allowing for full human verification of logic, making this approach ideal for regulated industries such as law, finance, and content moderation, as well as high-volume use cases and edge devices.
View details
Raman Spectroscopy Pre-Trained Encoder: A Self-Supervised Learning Approach For Data-Efficient Domain-Independent Spectroscopy Analysis
Abhiraam Eranti
Yogesh Tewari
Dr. RAFAEL PALACIOS
Amar Gupta
IEEE Access (2026)
Preview abstract
Deep-learning methods have boosted the analytical power of Raman spectroscopy, yet they still require large, task-specific, labeled datasets and often fail to transfer across application domains. The study explores pre-trained encoders as a solution. Pre-trained encoders have significantly impacted Natural Language Processing and Computer Vision with their ability to learn transferable representations that can be applied to a variety of datasets, significantly reducing the amount of time and data required to create capable models. The following work puts forward a new approach that applies these benefits to Raman Spectroscopy. The proposed approach, RSPTE (Raman Spectroscopy Pre-Trained Encoder), is designed to learn generalizable spectral representations without labels. RSPTE employs a novel domain adaptation strategy using unsupervised Barlow Twins decorrelation objectives to learn fundamental spectral patterns from multi-domain Raman Spectroscopy datasets containing samples from medicine, biology, and mineralogy. Transferability is demonstrated through evaluation on several models created by fine-tuning RSPTE for different application domains: Medicine (detection of Melanoma and COVID), Biology (Pathogen Identification), and Agriculture. As an example, using only 20% of the dataset, models trained with RSPTE achieve accuracies ranging 50%–86% (depending on the dataset used) while without RSPTE the range is 9%–57%. Using the full dataset, accuracies with RSPTE range 81%–97%, and without pretraining 51%–97%. Current methods and state-of-the-art models in Raman Spectroscopy are compared to RSPTE for context, and RSPTE exhibits competitive results, especially with less data as well. These results provide evidence that the proposed RSPTE model can effectively learn and transfer generalizable spectral features across different domains, achieving accurate results with less data in less time (both data collection time and training time).
View details
Preview abstract
The advent of 3D Gaussian Splatting has revolutionized graphics rendering by offering high visual quality and fast rendering speed. However, training large-scale scenes at high quality remains challenging due to the substantial memory demands required to store Gaussians and optimizer states. To address these limitations, we propose GS-Offload, fast and memory-efficient training system for 3D Gaussian Splatting. GS-Offload stores Gaussians and optimizer states in host memory and selectively transfer only the necessary data to GPU memory on demand, significantly reducing GPU memory usage. With carefully designed software pipelining and CPU-side optimizer acceleration, GS-Offload achieves training speed near that of GPU-only setups, while significantly lowering GPU memory demands.
View details
Preview abstract
Responsive user interfaces enable dynamically adjusting user interfaces based on device-specific aspects such as screen size, aspect ratio, display resolution, etc. However, traditional responsive design fails to account for different types of constraints of a user and task criticality of the task being performed via the UI. Misalignment between the UI design, user context and task criticality can lead to user error. This disclosure describes techniques, implemented with user permission, for dynamically modifying the layout, information density, and/or interactive physics of a user interface based on a dual-factor analysis of user cognitive state and task criticality. The user's cognitive state can be inferred from behavioral telematics. Task criticality can be inferred from semantic analysis. The information density and other parameters of a user interface are automatically adjusted based on such analyses. Such adjustments include applying or relaxing restrictions on interactivity and adjusting visual prominence of various UI elements to adjust the information density of the user interface. The adjustments can also include adjusting friction as appropriate, hiding certain aspects of the user interface, or other types of adjustments.
View details
Agentic AI Infrastructure in Practice: Learn These Key Hurdles to Deploy Production AI Agents Efficiently
https://swisscognitive.ch/ (2026)
Preview abstract
The emergence of Agentic AI—autonomous systems capable of reasoning, decision-making, and multi-step execution—represents a paradigm shift in enterprise technology. Moving beyond simple generative tasks, these agents offer the potential to solve long-standing industry pain points, with over 90% of enterprises planning integration within the next three years. However, the transition from successful proof-of-concept (PoC) to a resilient, production-grade system presents significant hurdles.
This article categorizes these challenges into three primary domains:
Technical and Engineering Hurdles: Issues such as "entangled workflows" that complicate debugging, the struggle to maintain output quality and mitigate hallucinations, and the unpredictability caused by shifting underlying models or data sources.
People, Process, and Ecosystem Hurdles: The high operational costs and unclear ROI of large models, the necessity of a new "Agent Ops" skillset, the complexity of integrating agents with disparate enterprise systems, and a rapidly evolving regulatory landscape.
The Pace of Change and Security risks: The technical debt incurred by shifting software frameworks and the expanded attack surface created by autonomous agents.
The article concludes that successful deployment requires a shift from informal "vibe-testing" to rigorous engineering discipline. By adopting code-first frameworks, establishing robust evaluation metrics (KPIs), and prioritizing functional deployment over theoretical optimization, organizations can effectively manage the lifecycle of Agentic AI and realize its transformative business value.
View details
Preview abstract
This disclosure describes systems and methods for a multi-agent framework that can automate and scale cognitive work. The framework can, for example, use a cognitive assembly line of specialized computational agents to perform tasks such as research and drafting. A beneficial component could be an adversarial review panel (ARP), which is a multi-agent review system where distinct agent personas critique a generated draft from varied perspectives. The structured feedback from the ARP can be used to automatically iterate on and refine the work product. This approach can improve the intellectual rigor of generated content and reduce the time required for production, which may allow human operators to focus on activities such as strategic oversight and final validation.
View details
ALF: Advertiser Large Foundation Model for Multi-Modal Advertiser Understanding
Sunny Rajagopalan
Alireza Golestaneh
Shubhra Chandra
Min Zhou
Jonathan Vronsky
Songbai Yan
2026
Preview abstract
We present ALF (Advertiser Large Foundation model), a multi-modal transformer architecture for understanding advertiser behavior and intent across text, image, video and structured data modalities. Through contrastive learning and multi-task optimization, ALF creates unified advertiser representations that capture both content and behavioral patterns. Our model achieves state-of-the-art performance on critical tasks including fraud detection, policy violation identification, and advertiser similarity matching. In production deployment, ALF reduces false positives by 90\% while maintaining 99.8\% precision on abuse detection tasks. The architecture's effectiveness stems from its novel combination of multi-modal transformations, intersample attention mechanism, spectrally normalized projections, and calibrated probabilistic outputs.
View details
Preview abstract
This article delves into how Google Site Reliability Engineers (SREs) leverage Gemini 3 and the Gemini CLI to aggressively reduce Mean Time to Mitigation (MTTM) during real-world outages. By focusing on the SRE motto of "Eliminate Toil," the article walks through a simulated incident, demonstrating how an agentic CLI acts as a human-in-the-loop copilot across the entire incident lifecycle: from initial paging and investigation, through safe, tool-driven mitigation and root cause analysis, to automated postmortem generation and action item filing. This direct integration of Gemini's reasoning capabilities with operational data and internal tools creates a virtuous cycle where past incident learnings continuously inform and improve future solutions.
View details