Encoding maps [WIP] #804

@s-desh

Goal: Encode maps in a form that agents can use to understand space.

v0
Requirements:

  • The agent should be able to act on queries such as "go to the largest room" or "go to the first room after the corridor".

PRs:
#815

  • Adds evals for point placement and map comprehension.
  • Encodes the costmap as an RGB image and overlays the robot pose.
  • Uses this encoding in a skill that pulls maps and places points for navigation based on the query. (Incomplete)
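
A minimal sketch of the encoding step described above, not the code from #815. It assumes a ROS-style occupancy grid (-1 for unknown, 0 to 100 for cost) and that `origin` is the world position of cell (0, 0). The colour scheme, threshold, and function names are all hypothetical.

```python
import math
import numpy as np

# Hypothetical colour scheme; the actual PR may use different values.
FREE = (255, 255, 255)
OCCUPIED = (0, 0, 0)
UNKNOWN = (128, 128, 128)
ROBOT = (255, 0, 0)


def costmap_to_rgb(grid, occupied_threshold=50):
    """Convert a 2-D grid (-1 unknown, 0..100 cost) to an HxWx3 uint8 image."""
    grid = np.asarray(grid)
    image = np.empty(grid.shape + (3,), dtype=np.uint8)
    image[...] = FREE
    image[grid >= occupied_threshold] = OCCUPIED
    image[grid < 0] = UNKNOWN
    return image


def world_to_pixel(x, y, origin, resolution):
    """Map world coordinates (metres) to (row, col).

    Assumes origin is the world position of cell (0, 0) and that rows
    follow the y axis without flipping.
    """
    col = math.floor((x - origin[0]) / resolution)
    row = math.floor((y - origin[1]) / resolution)
    return row, col


def overlay_robot(image, row, col, radius=1):
    """Return a copy with a square marker on the robot cell, clipped to bounds."""
    out = image.copy()
    height, width = out.shape[:2]
    r0, r1 = max(row - radius, 0), min(row + radius + 1, height)
    c0, c1 = max(col - radius, 0), min(col + radius + 1, width)
    out[r0:r1, c0:c1] = ROBOT
    return out
```

Only the robot position is drawn here; showing heading would need an extra marker (for example an arrow), which this sketch leaves out.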

TODOs:
Evals:

  1. Add larger, more realistic maps for evals. For evals only, test the images directly instead of round-tripping image -> costmap -> image.
  2. Support testing multiple models.
  3. Separate the encoding into OccupancyGridImage.
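
One possible shape for a point-placement eval that runs over several models, sketched under assumptions: each model is a callable taking an image path and a query and returning a `(row, col)` pixel, and each case carries a target bounding box. `EvalCase`, `point_in_box`, and `evaluate` are hypothetical names, not part of the existing evals.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Point = Tuple[int, int]          # (row, col)
Box = Tuple[int, int, int, int]  # row_min, col_min, row_max, col_max (inclusive)


@dataclass
class EvalCase:
    image_path: str  # map image tested directly, with no costmap round trip
    query: str       # e.g. "go to the largest room"
    target: Box      # region that counts as a correct placement


def point_in_box(point: Point, box: Box) -> bool:
    """True if the point lies inside the box, edges included."""
    row, col = point
    r0, c0, r1, c1 = box
    return r0 <= row <= r1 and c0 <= col <= c1


def evaluate(
    models: Dict[str, Callable[[str, str], Point]],
    cases: List[EvalCase],
) -> Dict[str, float]:
    """Return the fraction of cases each model places inside the target box."""
    scores = {}
    for name, predict in models.items():
        hits = sum(
            point_in_box(predict(case.image_path, case.query), case.target)
            for case in cases
        )
        scores[name] = hits / len(cases) if cases else 0.0
    return scores
```

A box is the simplest target; irregular rooms would likely need a mask instead.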

Skill:

  1. Use Agents to send the map instead of VLModel.
  2. Complete the interpret map skill after comparing agent and VLModel performance.
  3. Test with agents.

v0.1

  • Overlay object bboxes?

v0.2

  • Multi-floor maps? (side-by-side images)
  • History? (could use the time dimension in video models)
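
If the side-by-side idea is pursued, one way to build the combined image is sketched below. It assumes each floor is already an HxWx3 array of the same dtype; the gap width, fill value, and the name `side_by_side` are hypothetical.

```python
import numpy as np


def side_by_side(floors, gap=4, fill=128):
    """Place floor images left to right, padded to a common height.

    Shorter floors are padded at the bottom with `fill`, and a `gap`-pixel
    strip of `fill` separates neighbouring floors.
    """
    height = max(floor.shape[0] for floor in floors)
    tiles = []
    for index, floor in enumerate(floors):
        pad_rows = height - floor.shape[0]
        tiles.append(
            np.pad(floor, ((0, pad_rows), (0, 0), (0, 0)), constant_values=fill)
        )
        if index < len(floors) - 1:
            tiles.append(np.full((height, gap, 3), fill, dtype=floor.dtype))
    return np.hstack(tiles)
```

Any point a model places on the combined image would then need to be mapped back to a floor index and per-floor pixel using the column offsets.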
