We propose Wonderful Team, a zero-shot, single-model, multi-agent system for solving visual robotics tasks. Taking inspiration from recent advances in the multi-agent LLM literature, our system employs specialized agents that collaboratively manage different aspects of a task, from high-level planning to low-level execution, within a single integrated system. In particular, each agent is responsible for a separate component of task execution: planning, object identification and localization, action proposal, memory, and self-correction.
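To make the division of labor concrete, here is a minimal Python sketch of how one shared model could be called under several roles. It is an illustration only, not the implementation from the paper. The names `Agent`, `Team`, and `stub_model`, the role labels, the semicolon-separated plan format, and the retry limit are all assumptions made for this example. The stub stands in for a real vision-language model so that the sketch runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical interface: one shared model, called with a role name and a message.
Model = Callable[[str, str], str]


@dataclass
class Agent:
    """A role-specific view of the single shared model (assumed design)."""
    role: str
    model: Model

    def ask(self, message: str) -> str:
        return self.model(self.role, message)


@dataclass
class Team:
    model: Model
    memory: List[str] = field(default_factory=list)  # shared log of past attempts
    max_retries: int = 2  # assumed limit on self-correction attempts per step

    def run(self, task: str) -> List[str]:
        planner = Agent("planner", self.model)
        locator = Agent("locator", self.model)
        actor = Agent("actor", self.model)
        checker = Agent("checker", self.model)

        accepted: List[str] = []
        # High-level planning: the plan is assumed to be ';'-separated steps.
        for step in planner.ask(task).split(";"):
            step = step.strip()
            if not step:
                continue
            for _ in range(self.max_retries + 1):
                history = " | ".join(self.memory)
                where = locator.ask(step)  # object identification and localization
                action = actor.ask(f"{step} @ {where} (history: {history})")
                verdict = checker.ask(action)  # self-correction signal
                self.memory.append(f"{action} -> {verdict}")
                if verdict == "ok":
                    accepted.append(action)
                    break
        return accepted


def stub_model(role: str, message: str) -> str:
    """Placeholder for a real model; returns canned answers per role."""
    if role == "planner":
        return "pick apple; place apple in red area"
    if role == "locator":
        return "(0.4, 0.2)"
    if role == "actor":
        return "do[" + message.split(" (history")[0] + "]"
    return "ok"


if __name__ == "__main__":
    team = Team(stub_model)
    for action in team.run("Place each fruit in the area that matches its color."):
        print(action)
```

Because every role goes through the same `model` callable, replacing the stub with a real model call would not change the control flow. A rejected action is logged in `memory` before the retry, so later attempts can see it.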
Results Overview
Note that the results presented here are based on a selection of VIMABench tasks. For more information, please refer to the paper.
Execution Examples
[Real] Spatial Planning
[Real] Fruit Placement
“Place each fruit in the area that matches its color, if such an area exists.”
[Real] Price Ranking
“Based on the price tags and any discounts on the fruits, rank them from the most expensive to the cheapest and place them in the corresponding bowl.”
[Real] Superhero Companions
“Fruits and snacks of similar color make perfect companions. Distribute the unmatched items from the top left corner to the superheroes to help each of them have companion pairs.”
[Sim] Same Texture
“Put all objects with the same texture as {Object} into it”
[Sim] Same Shape
“Put all objects with the same profile as {Object} into it”
[Sim] Pick & Restore
“Put {Object_1} into {Object_2} then {Object_3}. Finally restore it into its original container”