Category: Seminars and Conferences
Status: Current
May 21, 2:30 PM

Advancing Instance-Level Perception: End-to-End Sequence Modeling for Tracking and Efficient

Online Seminar

Online seminar "Advancing Instance-Level Perception: End-to-End Sequence Modeling for Tracking and Efficient", given by Dr. Mattia Segù.

When: May 21, 2:30 PM
Where: online at https://tinyurl.com/yc296vdp

Speaker: Mattia Segù - ETH Zurich
Organizer: Prof. Tatiana Tommasi, DAUIN

The event is part of the online seminar series "Ellis Turin Talk", held in collaboration with the Artificial Intelligence Hub of Politecnico di Torino and the Vandal research group of the Department of Control and Computer Engineering (Dipartimento di Automatica e Informatica).

Abstract: Instance-level perception - the ability to localize, segment, and classify individual objects over time - is fundamental to systems that interact with the physical world. Recent advances in model architectures and data quality have enabled unified models capable of detecting and segmenting objects across both closed-set categories and free-form referring expressions. However, existing approaches struggle to scale to end-to-end instance tracking and to adapt efficiently to edge deployment, posing key challenges for real-world applications. In this talk, I will present two recent works - SambaMOTR and MOBIUS - that push the boundaries of instance-level perception by addressing multi-object tracking and efficient segmentation. SambaMOTR enables end-to-end multi-object tracking by leveraging Samba, a set-of-sequences model that captures long-range dependencies, tracklet interactions, and temporal occlusions, improving robustness in dynamic environments with complex motion. MOBIUS makes vision-language instance segmentation scalable through a bottleneck encoder for efficient scale and modality fusion, and a language-guided calibration loss for adaptive decoder pruning, reducing inference time by up to 75% while maintaining state-of-the-art performance across both high-end and mobile devices. Through the lens of these two works, I will explore how sequence modeling and efficient multi-modal perception can be leveraged to develop scalable, real-time object perception models, enabling robust tracking and segmentation in complex environments.