Recent advances in machine learning and artificial intelligence, particularly foundation models such as BERT, GPT-3, T5, and ResNet, have demonstrated remarkable capabilities and driven revolutionary changes in the way we make inferences from complex data. These models represent a fundamental shift in how data are approached and offer exciting new research directions and opportunities for multimodal learning and data fusion.
Given the potential of foundation models to transform the field of multimodal learning, there is a need to bring together experts and researchers to discuss the latest developments in this area, exchange ideas, and identify the key research questions and challenges that need to be addressed. By hosting this workshop, we aim to create a forum for researchers to share their insights and expertise on multimodal data fusion and learning with foundation models, and to explore new research directions and applications in this rapidly evolving field. We expect contributions from interdisciplinary researchers who study and model interactions between modalities, including (but not limited to) language, graphs, time-series, vision, tabular data, and sensors. The workshop will emphasize interdisciplinary work and aims to seed cross-team collaborations around new tasks, datasets, and models.
Nikhil Madaan, Krishna Kesari, Manisha Verma, Shaunak Mishra and Tor Steiner
Daniel Shin, Gao Pei, Priyadarshini Kumari and Tarek Besold
Jincheng Li, Chunyu Xie, Xiaoyu Wu, Bin Wang and Dawei Leng
Tobias Pettersson, Maria Riveiro and Tuwe Löfström
Youxiang Zhu, Nana Lin, Xiaohui Liang, John Batsis, Robert Roth and Brian MacWhinney
Yarong Feng, Zongyi Liu, Yuan Ling, Shunyan Luo, Shujing Dong, Shuyi Wang and Bruce Ferry
Ziniu Hu
August 7th, 2023, 1:00 PM – 5:00 PM (Pacific Time), Long Beach, CA, USA.
Introduction by organizers.
Xin (Luna) Dong, Principal Scientist, Meta
Youxiang Zhu, Nana Lin, Xiaohui Liang, John Batsis, Robert Roth and Brian MacWhinney
Daniel Shin, Gao Pei, Priyadarshini Kumari and Tarek Besold
Tobias Pettersson, Maria Riveiro and Tuwe Löfström
Nikhil Madaan, Krishna Kesari, Manisha Verma, Shaunak Mishra and Tor Steiner
Jincheng Li, Chunyu Xie, Xiaoyu Wu, Bin Wang and Dawei Leng
Aidong Zhang, Thomas M. Linville Endowed Professor of Computer Science, University of Virginia
Yarong Feng, Zongyi Liu, Yuan Ling, Shunyan Luo, Shujing Dong, Shuyi Wang and Bruce Ferry
Ziniu Hu
Concluding remarks by organizers.
This workshop will provide a platform to discuss the latest advances and trends in theory, methodologies, and applications in the field of multimodal learning. This year's theme is the use of foundation models. Foundation models such as BERT, T5, LLaMA, and GPT-4, trained on massive data collections, have revolutionized the field of natural language processing (NLP). Using such models to solve a wide range of NLP tasks represents a fundamental paradigm shift, especially given their ability to integrate knowledge from other domains such as computer vision (DALL-E, CLIP), retrieval, knowledge graphs, and more. Moreover, foundation models have brought fundamental changes to the multimodal problem setting, especially when integrating text or images with graphs, time-series, and other forms of structured data. As such, the workshop focuses on utilizing these foundation models while integrating multiple modalities. Although the workshop may also include discussions and papers on general multimodal learning problems, greater emphasis will be given to work that utilizes recently developed foundation models. Our goal is to explore and showcase innovative ways in which multimodal learning and data fusion can be employed, with particular emphasis on how to leverage the capabilities of foundation models for these purposes. The workshop topics include, but are not limited to:
The workshop seeks to bring together researchers from the machine learning and data mining communities and provide a unique opportunity for interdisciplinary researchers to explore data interactions with foundation models across modalities such as text, images, graphs, tabular data, and time-series. It will feature invited talks, accepted paper presentations, and a panel discussion to encourage knowledge sharing and foster cross-team collaboration within the research and industry communities in Natural Language Processing (NLP), Information Retrieval, Data Mining, Machine Learning, and related fields.