Long T. Le

Long T. Le is a Staff Research Engineer in Google Cloud AI Research whose mission is to bring advanced AI to the world. He currently focuses on new LLM solutions such as distillation, RAG, and agents. Before that, he worked on a new deep learning method for tabular data, COVID-19 forecasting, and recommendation AI. Before joining Google, he was a machine learning engineer at Capital One in NYC, where he developed models for loan optimization and first-party fraud detection. He earned his Ph.D. in computer science from Rutgers University and, before that, a bachelor's degree in computing from the National University of Singapore.
Authored Publications
    Grounded generation aims to equip language models (LMs) with the ability to produce more credible and accountable responses by accurately citing verifiable sources. However, existing methods, which feed LMs either raw or preprocessed materials, remain prone to errors. To address this, we introduce CaLM, a novel verification framework. CaLM leverages the insight that a robust grounded response should be consistent with information derived solely from its cited sources. Our framework empowers smaller LMs, which rely less on parametric memory and excel at processing relevant information given a query, to validate the output of larger LMs. A larger LM's response is verified when it closely aligns with the smaller LM's output, which relies exclusively on the cited documents. Responses showing discrepancies are iteratively refined through a feedback loop. Experiments on three open-domain question-answering datasets demonstrate significant average absolute gains of 1.5% to 7% without requiring any model fine-tuning.
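    The verify-then-refine loop described in this abstract can be pictured roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function name `verify_and_refine`, the three callables (`large_lm`, `small_lm`, `agree`), the feedback string, and the `max_rounds` cap are all hypothetical stand-ins introduced here.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces (assumptions, not from the paper):
#   large_lm(query, feedback) -> (answer, cited_documents)
#   small_lm(query, cited_documents) -> answer based only on those documents
#   agree(answer_a, answer_b) -> True if the two answers are consistent
LargeLM = Callable[[str, Optional[str]], Tuple[str, List[str]]]
SmallLM = Callable[[str, List[str]], str]
Agree = Callable[[str, str], bool]


def verify_and_refine(
    query: str,
    large_lm: LargeLM,
    small_lm: SmallLM,
    agree: Agree,
    max_rounds: int = 3,
) -> Tuple[str, bool]:
    """Return (final_answer, verified) after at most max_rounds attempts."""
    feedback: Optional[str] = None
    answer = ""
    for _ in range(max_rounds):
        # The larger model answers and names the sources it cites.
        answer, cited = large_lm(query, feedback)
        # The smaller model answers using only the cited sources.
        reference = small_lm(query, cited)
        if agree(answer, reference):
            return answer, True
        # Disagreement: pass the discrepancy back for another attempt.
        feedback = f"The cited sources alone support: {reference}"
    return answer, False
```

    How agreement is measured and how feedback is phrased are left open here; the abstract does not specify them.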
    Found in the middle: Calibrating Positional Attention Bias Improves Long Context Utilization
    Cheng-Yu Hsieh
    Yung-Sung Chuang
    Chun-Liang Li
    Abhishek Kumar
    James Glass
    Alexander Ratner
    Ranjay Krishna
    Large language models (LLMs), even when specifically trained to process long input contexts, struggle to capture relevant information located in the middle of their input. This phenomenon is known as the lost-in-the-middle problem. In this work, we make three contributions. First, we set out to understand the factors that cause this phenomenon. In doing so, we establish a connection between lost-in-the-middle and LLMs' intrinsic attention bias: LLMs exhibit a U-shaped attention bias where the tokens at the beginning and at the end of the input receive higher attention, regardless of their relevance. Second, we mitigate this positional bias through a calibration mechanism, found-in-the-middle, that allows the model to attend to contexts faithfully according to their relevance, even when they are in the middle. Third, we show that found-in-the-middle not only achieves better performance in locating relevant information within a long context, but also leads to improved retrieval-augmented generation (RAG) performance across various tasks, outperforming existing methods by up to 15 percentage points. These findings open up future directions in understanding LLM attention bias and its potential consequences.
    Instruction tuning has emerged as key to aligning large language models (LLMs) with specific task instructions, thereby mitigating the discrepancy between the next-token prediction objective and users' actual goals. To reduce the labor and time cost of human data collection and annotation, researchers have started to explore the use of LLMs to generate instruction-aligned synthetic data. Recent work focuses on generating diverse instructions and applying LLMs to increase instruction complexity, often neglecting downstream use cases. It remains unclear how to tailor high-quality data to elicit better instruction-following abilities in different target instruction distributions and LLMs. To this end, we introduce CodecLM, a general framework for adaptively generating high-quality synthetic data for LLM alignment with different downstream instruction distributions and LLMs. Drawing on the Encode-Decode principles, we use LLMs as codecs to guide the data generation process. We first encode seed instructions into metadata, which are concise keywords generated on-the-fly to capture the target instruction distribution, and then decode metadata to create tailored instructions. We also introduce Self-Rubrics and Contrastive Filtering during decoding to tailor data-efficient samples. Extensive experiments on four open-domain instruction-following benchmarks validate the effectiveness of CodecLM over the current state of the art.
    A prospective evaluation of AI-augmented epidemiology to forecast COVID-19 in the USA and Japan
    Joel Shor
    Arkady Epshteyn
    Ashwin Sura Ravi
    Beth Luan
    Chun-Liang Li
    Daisuke Yoneoka
    Dario Sava
    Hiroaki Miyata
    Hiroki Kayama
    Isaac Jones
    Joe Mckenna
    Johan Euphrosine
    Kris Popendorf
    Nate Yoder
    Shashank Singh
    Shuhei Nomura
    Thomas Tsai
    npj Digital Medicine (2021)
    The COVID-19 pandemic has highlighted the global need for reliable models of disease spread. We evaluate an AI-improved forecasting approach that provides daily predictions of the expected number of confirmed COVID-19 deaths, cases and hospitalizations during the following 28 days. We present an international, prospective evaluation of model performance across all states and counties in the USA and prefectures in Japan. National mean absolute percentage error (MAPE) for predicting COVID-19 associated deaths before and after prospective deployment remained consistently <3% (US) and <10% (Japan). Average statewide (US) and prefecture-wide (Japan) MAPE was 6% and 20% respectively (14% when looking at prefectures with more than 10 deaths). We show our model performs well even during periods of considerable change in population behavior, and that it is robust to demographic differences across different geographic locations. We further demonstrate that the model provides meaningful explanatory insights, finding that it responds appropriately to local and national policy interventions. Our model enables counterfactual simulations, which indicate that continuing NPIs alongside vaccinations is essential for more rapid recovery from the pandemic and that delaying the application of interventions has a detrimental effect, and which allow exploration of the consequences of different vaccination strategies. The COVID-19 pandemic remains a global emergency. In the face of substantial challenges ahead, the approach presented here has the potential to inform critical decisions.
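    For readers unfamiliar with the metric quoted above, mean absolute percentage error can be computed along the following lines. This is a minimal sketch of the textbook definition; the function name `mape` and the choice to skip zero-valued actuals are assumptions made here, and the paper's exact evaluation protocol may differ.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, returned in percent.

    Assumption: points whose actual value is zero are skipped, since the
    percentage error is undefined there.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("no non-zero actual values to evaluate")
    # Average of |actual - predicted| / |actual|, scaled to percent.
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


# Made-up numbers: errors of 10%, 5% and 0% average to roughly 5%.
print(mape([100, 200, 400], [110, 190, 400]))
```

    Because each error is divided by the actual value, regions with small counts can produce large percentages, which is consistent with the abstract reporting a separate figure for prefectures with more than 10 deaths.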
    We propose a novel model that integrates machine learning into compartmental disease modeling to predict the progression of COVID-19. Our model incorporates explainable encoding of information-bearing covariates to improve performance. The motivation to maintain explainability is two-fold: the behavior of the resulting model will be credible to epidemiologists, and it will instill confidence in the intended end users, namely policy makers and healthcare institutions. The proposed model can be applied at different geographic resolutions, and we demonstrate it for states and counties in the United States. We show that the forecasting accuracy of our model is significantly better than the alternatives, and that its explanatory insights are qualitatively meaningful.