Hi Li, this is very interesting work. I have a couple of questions, if you don't mind answering: 1. How do you synchronize the tactile and visual information? 2. Can this system make predictions for tasks it was not trained on?
Hi Fahad, thank you for your interest in our work! 1. We record timestamps for both the tactile and visual recordings. The timestamps are then used to synchronize the frames collected from the different data sources. 2. The test set contains motion trajectories with different initial configurations and action sequences, but they still come from the same task that the model was trained on. We didn't test the model's generalization to unseen tasks. We would expect some degree of generalization if the model is trained on a diverse set of tasks, but more experiments are needed before we can make concrete statements.
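For readers wondering what timestamp-based synchronization can look like in practice, here is a minimal sketch. It is not the code used in this work: the reply only says that timestamps are recorded and used for alignment, so the nearest-timestamp matching rule, the function name `match_nearest`, and the `max_gap` tolerance are all assumptions made for illustration. It also assumes both streams share one clock, that the tactile timestamps are sorted in ascending order, and that there are at least two of them.

```python
import numpy as np


def match_nearest(visual_ts, tactile_ts, max_gap=0.02):
    """For each visual timestamp, return the index of the nearest tactile
    timestamp, or -1 when the nearest one is more than max_gap seconds away.

    Hypothetical helper for illustration; tactile_ts must be sorted
    ascending and contain at least two entries.
    """
    visual_ts = np.asarray(visual_ts, dtype=float)
    tactile_ts = np.asarray(tactile_ts, dtype=float)

    # Index of the first tactile timestamp >= each visual timestamp,
    # clipped so that both `right` and `right - 1` are valid indices.
    right = np.searchsorted(tactile_ts, visual_ts)
    right = np.clip(right, 1, len(tactile_ts) - 1)
    left = right - 1

    # Pick whichever neighbour is closer in time (ties go to the left one).
    pick_right = (tactile_ts[right] - visual_ts) < (visual_ts - tactile_ts[left])
    idx = np.where(pick_right, right, left)

    # Reject matches whose time gap exceeds the tolerance.
    gap = np.abs(tactile_ts[idx] - visual_ts)
    return np.where(gap <= max_gap, idx, -1)


# Example with made-up timestamps (seconds): camera at ~30 Hz, tactile at 100 Hz.
visual = [0.0, 0.033, 0.066, 1.0]
tactile = [0.0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07]
matches = match_nearest(visual, tactile)
```

Visual frames that get `-1` have no tactile frame close enough in time and would typically be dropped. Interpolating the tactile signal at the visual timestamps is another option when the tactile sensor runs at a much higher rate.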