Encoding maps [WIP]

Goal: Encode maps in a form that agents can use to understand and reason about space.

**v0**
Requirements:
- The agent should be able to act on queries like "go to the largest room", "go to the first room after the corridor", etc.

PRs:
#815 
- Adds evals for point placement and map comprehension.
- Encodes the costmap as an RGB image and overlays the robot pose.
- Uses this encoding in a skill that pulls maps and places points for navigation based on the query. (Incomplete)
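
For illustration, the costmap encoding described above could look roughly like the sketch below. This is not the code from #815: the function names (`costmap_to_rgb`, `overlay_pose`), the color choices, and the cost conventions (-1 unknown, 0 free, 100 occupied, as in ROS-style occupancy grids) are all assumptions.

```python
import numpy as np


def costmap_to_rgb(costmap: np.ndarray) -> np.ndarray:
    """Map an (H, W) integer costmap to an (H, W, 3) uint8 RGB image.

    Assumed convention: -1 = unknown, 0 = free, 100 = occupied.
    """
    cost = costmap.astype(np.int32)  # avoid overflow on int8 grids
    rgb = np.full(cost.shape + (3,), 128, dtype=np.uint8)  # unknown -> grey
    known = cost >= 0
    # Free space is white, fully occupied is black, costs in between are grey.
    shade = (255 - np.clip(cost, 0, 100) * 255 // 100).astype(np.uint8)
    rgb[known] = shade[known][:, None]
    return rgb


def overlay_pose(rgb, row, col, yaw, radius=2, heading_len=5):
    """Draw the robot as a red disc with a blue heading line; returns a copy.

    `yaw` is in radians, measured from the +column axis; image rows grow
    downward, so positive yaw points toward the top of the image.
    """
    out = rgb.copy()
    h, w = out.shape[:2]
    rr, cc = np.ogrid[:h, :w]
    out[(rr - row) ** 2 + (cc - col) ** 2 <= radius ** 2] = (255, 0, 0)
    for step in range(radius + 1, radius + 1 + heading_len):
        r = int(round(row - step * np.sin(yaw)))
        c = int(round(col + step * np.cos(yaw)))
        if 0 <= r < h and 0 <= c < w:
            out[r, c] = (0, 0, 255)
    return out
```

Drawing the heading as a separate color is one way to let a vision model read orientation as well as position from a single image.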

TODOs:
Evals:
1. Add larger, more realistic maps for eval. For evals only, test on images directly instead of going image -> costmap -> image.
2. Support testing multiple models.
3. Separate the encoding out into `OccupancyGridImage`.
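
As a rough sketch of how a point-placement eval could run against several models, the snippet below scores each model by whether its chosen point lands inside a ground-truth region mask. The `Model` callable interface, the `(row, col)` point format, and the case tuple layout are hypothetical and not taken from the existing evals.

```python
from typing import Callable, Dict, List, Tuple

import numpy as np

Point = Tuple[int, int]  # (row, col) in image pixels
# Hypothetical interface: a model takes an RGB map image and a text query
# and returns the pixel it would navigate to.
Model = Callable[[np.ndarray, str], Point]
Case = Tuple[np.ndarray, str, np.ndarray]  # (image, query, boolean target mask)


def point_in_region(point: Point, region_mask: np.ndarray) -> bool:
    """True if the point is inside the image and inside the target region."""
    r, c = point
    h, w = region_mask.shape
    return 0 <= r < h and 0 <= c < w and bool(region_mask[r, c])


def evaluate(models: Dict[str, Model], cases: List[Case]) -> Dict[str, float]:
    """Return the fraction of cases each model places a point correctly."""
    scores = {}
    for name, model in models.items():
        hits = sum(
            point_in_region(model(image, query), mask)
            for image, query, mask in cases
        )
        scores[name] = hits / len(cases)
    return scores
```

Because cases carry images rather than costmaps, this shape also fits the "directly test images" item above.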

Skill:
1. Use Agents to send the map instead of `VLModel`.
2. Complete the interpret-map skill after comparing Agent and `VLModel` performance.
3. Test with agents.


**v0.1**

- Overlay object bboxes?

**v0.2**
- Multi-floor? (side-by-side images)
- History? (could use the time dimension in video models)
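
One possible reading of the side-by-side multi-floor idea, sketched under assumptions: each floor is already an RGB image, shorter floors are padded at the bottom, and a colored strip separates floors. The function name `stack_floors` and the magenta separator are hypothetical choices.

```python
import numpy as np


def stack_floors(floors, gap=4, gap_color=(255, 0, 255)):
    """Place per-floor RGB images left to right in a single image.

    Floors may differ in size; shorter ones are padded at the bottom with
    `gap_color`, and a `gap`-pixel strip of the same color separates floors.
    """
    height = max(f.shape[0] for f in floors)
    tiles = []
    for i, floor in enumerate(floors):
        pad_rows = height - floor.shape[0]
        pad = np.full((pad_rows, floor.shape[1], 3), gap_color, dtype=np.uint8)
        tiles.append(np.vstack([floor, pad]))
        if i < len(floors) - 1:
            tiles.append(np.full((height, gap, 3), gap_color, dtype=np.uint8))
    return np.hstack(tiles)
```

A separator color that never appears in the map encoding keeps floor boundaries unambiguous; any point the model places would then need to be mapped back to a floor index and per-floor pixel coordinates.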




Encoding maps [WIP] #804