Today, we're joined by Amir Bar, a PhD candidate at Tel Aviv University and UC Berkeley, to discuss his research on vision-based learning, including his recent paper, “EgoPet: Egomotion and Interaction Data from an Animal’s Perspective” - arxiv.org/pdf/2404.09991. Amir shares his research projects on self-supervised object detection and analogy reasoning for general computer vision tasks. We also discuss the current limitations of caption-based datasets in model training, the ‘learning problem’ in robotics, and the gap between the capabilities of animals and AI systems. Amir introduces ‘EgoPet,’ a dataset and set of benchmark tasks for incorporating egomotion and interaction data from an animal’s perspective into machine learning models for robotic planning and proprioception. We explore the dataset collection process, comparisons with existing datasets and benchmark tasks, findings on the performance of models trained on EgoPet, and the potential of directly training robot policies that mimic animal behavior.
🎧 / 🎥 Listen or watch the full episode on our page: twimlai.com/go/692.
🗣️ CONNECT WITH US!
===============================
Subscribe to the TWIML AI Podcast: twimlai.com/podcast/twimlai/
Follow us on Twitter: @twimlai
Follow us on LinkedIn: @twimlai
Join our Slack Community: twimlai.com/community/
Subscribe to our newsletter: twimlai.com/newsletter/
Want to get in touch? Send us a message: twimlai.com/contact/
📖 CHAPTERS
===============================
00:00 - Introduction
02:36 - Research interests
09:42 - Research projects
20:02 - EgoPet
27:31 - EgoPet dataset
29:25 - Visual Interaction Prediction (VIP) vs object recognition
31:09 - Findings on models trained on the EgoPet dataset
32:29 - Benchmark tasks (VIP, VPP, LP)
37:50 - Future directions
🔗 LINKS & RESOURCES
===============================
EgoPet: Egomotion and Interaction Data from an Animal’s Perspective - arxiv.org/pdf/2404.09991
DETReg: Unsupervised Pretraining with Region Priors for Object Detection - arxiv.org/abs/2106.04550
Visual Prompting via Image Inpainting - arxiv.org/abs/2209.00647
Sequential Modeling Enables Scalable Learning for Large Vision Models - arxiv.org/abs/2312.00785
📸 Camera: amzn.to/3TQ3zsg
🎙️Microphone: amzn.to/3t5zXeV
🚦Lights: amzn.to/3TQlX49
🎛️ Audio Interface: amzn.to/3TVFAIq
🎚️ Stream Deck: amzn.to/3zzm7F5
Aug 7, 2024