Long T. Le
Long T. Le is a Senior Research Software Engineer in Google Cloud AI Research with the mission to bring advanced AI to the world. He works on a new deep learning method for tabular data, COVID-19 forecasting, and recommendation AI. Before joining Google, he was a machine learning engineer at Capital One in NYC, where he developed models for loan optimization and first-party fraud detection. He earned his Ph.D. in computer science from Rutgers University. Before that, he earned a bachelor's degree in computing from the National University of Singapore.
Authored Publications
CodecLM: Aligning Language Models with Tailored Synthetic Data
Chun-Liang Li
Jin Miao
NAACL 2024
Instruction tuning has emerged as the key to aligning large language models (LLMs) with specific task instructions, thereby mitigating the discrepancy between the next-token prediction objective and users' actual goals. To reduce the labor and time cost of collecting or annotating data by humans, researchers have started to explore the use of LLMs to generate instruction-aligned synthetic data. Recent works focus on generating diverse instructions and applying LLMs to increase instruction complexity, often neglecting downstream use cases. It remains unclear how to tailor high-quality data to elicit better instruction-following abilities in different target instruction distributions and LLMs. To this end, we introduce CodecLM, a general framework for adaptively generating high-quality synthetic data for LLM alignment with different downstream instruction distributions and LLMs. Drawing on the Encode-Decode principles, we use LLMs as codecs to guide the data generation process. We first encode seed instructions into metadata, which are concise keywords generated on-the-fly to capture the target instruction distribution, and then decode metadata to create tailored instructions. We also introduce Self-Rubrics and Contrastive Filtering during decoding to tailor data-efficient samples. Extensive experiments on four open-domain instruction-following benchmarks validate the effectiveness of CodecLM over the current state of the art.
A prospective evaluation of AI-augmented epidemiology to forecast COVID-19 in the USA and Japan
Arkady Epshteyn
Ashwin Sura Ravi
Beth Luan
Chun-Liang Li
Daisuke Yoneoka
Dario Sava
Hiroaki Miyata
Hiroki Kayama
Isaac Jones
Joe Mckenna
Johan Euphrosine
Kris Popendorf
Nate Yoder
Shashank Singh
Shuhei Nomura
Thomas Tsai
npj Digital Medicine (2021)
The COVID-19 pandemic has highlighted the global need for reliable models of disease spread. We evaluate an AI-improved forecasting approach that provides daily predictions of the expected number of confirmed COVID-19 deaths, cases and hospitalizations during the following 28 days. We present an international, prospective evaluation of model performance across all states and counties in the USA and prefectures in Japan. National mean absolute percentage error (MAPE) for predicting COVID-19 associated deaths before and after prospective deployment remained consistently <3% (US) and <10% (Japan). Average statewide (US) and prefecture-wide (Japan) MAPE was 6% and 20% respectively (14% when looking at prefectures with more than 10 deaths). We show our model performs well even during periods of considerable change in population behavior, and that it is robust to demographic differences across different geographic locations. We further demonstrate the model provides meaningful explanatory insights, finding that the model appropriately responds to local and national policy interventions. Our model enables counterfactual simulations, which indicate continuing NPIs alongside vaccinations is essential for more rapidly recovering from the pandemic, delaying the application of interventions has a detrimental effect, and allow exploration of the consequences of different vaccination strategies. The COVID-19 pandemic remains a global emergency. In the face of substantial challenges ahead, the approach presented here has the potential to inform critical decisions.
Interpretable Sequence Learning for Covid-19 Forecasting
Chun-Liang Li
Arkady Epshteyn
Shashank Singh
Martin Nikoltchev
Yash Kumar Sonthalia
NeurIPS (2020)
We propose a novel model that integrates machine learning into compartmental disease modeling to predict the progression of Covid-19. Our model incorporates explainable encoding of information-bearing covariates to improve performance. The motivation to maintain explainability is two-fold: the behavior of the resulting model will be credible with epidemiologists, and will instill confidence in the intended end-users - policy makers and healthcare institutions. The proposed model can be applied at different geographic resolutions, and we demonstrate it for United States' states and counties. We show that the forecasting accuracy of our model is significantly better than the alternatives, and the explanatory insights from it are qualitatively meaningful.