Abstract
We introduce TimesX, a new context-enriched time series forecasting benchmark. TimesX contains a wide selection of high-quality real-world time series paired with diverse textual contexts produced by an automated generation pipeline. This helps address three main issues of existing benchmarks: (1) poor generalization caused by low data volume and reliance on synthetic data, (2) restricted forms of context, and (3) an inability to mitigate data leakage. We conduct a thorough empirical study of current multimodal solutions on TimesX. Our results suggest that most multimodal solutions that work well on existing benchmarks may fail on TimesX. In contrast, simple ensemble methods that leverage the rich textual context can outperform both strong unimodal baselines and other multimodal baselines.
** Below this is what was submitted to ITP. **
We create a real-world multimodal time-series forecasting benchmark that encompasses diverse domains and regions. Each time series is annotated with several kinds of context, such as metadata, date and holiday information, and dynamic events related to the time series. This makes it significantly more advanced than other available benchmarks, which rely either on static metadata alone or on synthetic examples. The benchmark forms a test bed for multimodal forecasting. We also present baseline results showing that ensembles of publicly available LLMs and time-series foundation models can achieve non-trivial performance on this benchmark.