Speaker: Haotian Liu (UW-Madison)
Title: Steerable Visual Intelligence
Time: Mar 8, 2024, 12:30 PM - 1:30 PM CT
Abstract: Understanding and reasoning about the visual world based on human instructions has long been a challenging problem. The previous paradigm, which trained supervised models on many sub-tasks and unified them into a large system, was not streamlined and offered limited steerability. In this talk, I will introduce two of my recent works, REACT and the LLaVA series, which approach this problem by enhancing customizability through retrieval and improving steerability with natural language instructions. We demonstrate that REACT and the LLaVA series offer a promising path toward building customizable large multimodal models that follow human intent at an affordable cost. Finally, I will present several future directions I am eager to explore in building next-generation steerable visual intelligence systems.
Bio: Haotian Liu is a final-year PhD student at the University of Wisconsin-Madison, advised by Prof. Yong Jae Lee. His research focuses on computer vision and vision-language multimodal learning. His recent work has centered on building customizable and steerable large models that follow human intent, including instruction-following multimodal models, controllable image generation, and customizable foundation models. He co-organized the 1st and 2nd Workshop on Computer Vision in the Wild at ECCV 2022 and CVPR 2023.
Location: Engineering Research Building (1500 Engineering Drive) Room 514
Oct 1, 2024