Abstract
Visual perception is fundamental to robotic manipulation for recognizing objects, goals, and contextual details. Third-person cameras provide global views but can miss contact-rich interactions and require calibration. Wrist-mounted egocentric cameras reduce these limitations but introduce occlusion, motion blur, and partial observability, which complicate visuomotor learning. Furthermore, existing perception modules that rely solely on pixels or fuse imagery with proprioception as flat vectors do not explicitly model structured scene representations in dynamic egocentric views. To address these challenges, a multi-slot attention fusion encoder for egocentric manipulation is introduced. Learnable slot queries extract localized visual features from image tokens, and Feature-wise Linear Modulation (FiLM) conditions each slot on the robot’s joint states, producing a structured slot-based latent representation that adapts to viewpoint and configuration changes without requiring object labels or external camera priors. This representation is used as input to a Soft Actor–Critic (SAC) agent, which achieves a higher mean cumulative return than pixel-only CNN/DrQ and state-only baselines on a ManiSkill3 egocentric manipulation task. Probing experiments and real-camera evaluation further show that the learned representation remains stable under egocentric viewpoint shifts and partial occlusions, indicating robustness in practical manipulation settings.
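The encoder described in the abstract (slot queries attending over image tokens, followed by FiLM conditioning on joint states) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. All sizes, the random weights, the use of raw tokens as keys and values, the softmax over tokens, and the sharing of one FiLM scale and shift across all slots are assumptions made for illustration.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def slot_cross_attention(slot_queries, tokens):
    """Each slot query attends over all image tokens (scaled dot-product).

    Keys and values are the raw tokens here (assumption); a trained
    encoder would normally apply learned projections first.
    """
    dim = len(tokens[0])
    slots, weights = [], []
    for q in slot_queries:
        scores = [sum(a * b for a, b in zip(q, t)) / math.sqrt(dim) for t in tokens]
        attn = softmax(scores)
        # Weighted sum of tokens gives this slot's localized feature.
        slot = [sum(a * t[d] for a, t in zip(attn, tokens)) for d in range(dim)]
        slots.append(slot)
        weights.append(attn)
    return slots, weights

def film(slots, joint_state, W_gamma, b_gamma, W_beta, b_beta):
    """FiLM: scale and shift each slot with parameters predicted from joint state."""
    gamma = [g + b for g, b in zip(matvec(W_gamma, joint_state), b_gamma)]
    beta = [g + b for g, b in zip(matvec(W_beta, joint_state), b_beta)]
    return [[g * s + b for g, s, b in zip(gamma, slot, beta)] for slot in slots]

def random_matrix(rows, cols, rng, scale=0.1):
    """Gaussian random matrix standing in for learned parameters."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

# Hypothetical sizes: 4 slots, 16 image tokens, 8-dim features, 7 joints.
rng = random.Random(0)
NUM_SLOTS, NUM_TOKENS, DIM, NUM_JOINTS = 4, 16, 8, 7
tokens = random_matrix(NUM_TOKENS, DIM, rng, scale=1.0)
slot_queries = random_matrix(NUM_SLOTS, DIM, rng, scale=1.0)
joint_state = [rng.uniform(-1.0, 1.0) for _ in range(NUM_JOINTS)]

W_gamma = random_matrix(DIM, NUM_JOINTS, rng)
b_gamma = [1.0] * DIM  # bias of 1 keeps the initial scaling near identity
W_beta = random_matrix(DIM, NUM_JOINTS, rng)
b_beta = [0.0] * DIM

slots, attn = slot_cross_attention(slot_queries, tokens)
conditioned = film(slots, joint_state, W_gamma, b_gamma, W_beta, b_beta)
latent = [v for slot in conditioned for v in slot]  # flat vector for a policy
```

In this sketch `latent` stands in for the slot-based representation that would be passed to the SAC agent. In a trained system the slot queries and FiLM weights would be learned end to end rather than sampled at random.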
| Original language | English |
|---|---|
| Article number | 1365 |
| Journal | Electronics (Switzerland) |
| Volume | 15 |
| Issue number | 7 |
| DOIs | |
| State | Published - Apr 2026 |
Keywords
- egocentric perception
- reinforcement learning
- robot manipulation
- visual representation learning
Title
Multi-Slot Attention with State Guidance for Egocentric Robotic Manipulation