This repository contains the official implementation of LookPlanGraph, a method for embodied instruction following that leverages a scene graph composed of static assets and object priors. It also includes the GraSIF (Graph Scenes for Instruction Following) benchmark.
LookPlanGraph enables robots to plan and execute complex instructions in dynamic environments where object positions may change. It uses a Memory Graph to track the scene state and a Scene Graph Simulator (SGS) to validate actions. A Graph Augmentation Module utilizes a Vision Language Model (VLM) to dynamically update the graph based on the agent's observations.
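The interplay of these components can be illustrated with a minimal sketch. Note that the class and method names below are purely illustrative, not the repository's actual API:

```python
# Hypothetical sketch of a memory-graph update loop; names are illustrative
# and do NOT correspond to the repository's actual classes.
from dataclasses import dataclass, field


@dataclass
class MemoryGraph:
    """Tracks the believed scene state: objects mapped to their believed locations."""
    edges: dict = field(default_factory=dict)

    def update(self, observation: dict) -> None:
        # Merge newly observed object positions into the graph. In the real
        # system, a VLM extracts these observations from the agent's camera.
        self.edges.update(observation)


graph = MemoryGraph()
graph.update({"mug": "kitchen_table"})  # object prior from the static scene graph
graph.update({"mug": "sink"})           # agent observes that the mug has moved
assert graph.edges["mug"] == "sink"
```

The point of the sketch is the dynamic aspect: priors seed the graph, and later observations overwrite them, so plans are always validated against the most recent belief.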
```
Code/
├── LookPlanGraph/        # Core implementation of LookPlanGraph
├── baselines/            # Baseline method implementations
├── benchmarks/           # Benchmarks
│   ├── grasif/           # GraSIF benchmark
│   └── dynamic_env/      # Data for dynamic environments
├── utils/                # Utility scripts
├── results/              # Directory for storing experiment results
│   ├── grasif/
│   ├── ablation/
│   ├── show_config.yaml
│   └── calculate_metrics.py
├── config_grasif.yaml    # Main configuration file
├── grasif_test.py        # Main entry point for running experiments
└── requirements.txt      # Python dependencies
```
1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd LookPlanGraph
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Set up environment variables. You need an API key for the LLM provider (OpenRouter is used by default):

   ```bash
   export OPEN_ROUTER_KEY='your_key_here'
   ```

   Alternatively, you can set the key directly in `config_grasif.yaml`.
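For scripts that need the key at runtime, it can be read back from the environment; a minimal sketch (the fallback behavior is an assumption, not the repository's documented logic):

```python
# Sketch: read the OpenRouter API key set in the step above.
import os

key = os.environ.get("OPEN_ROUTER_KEY", "")
if not key:
    # Assumed fallback: the repository can also take the key from
    # config_grasif.yaml, as noted above.
    print("OPEN_ROUTER_KEY is not set; falling back to config_grasif.yaml")
```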
The experiment settings are defined in `config_grasif.yaml`. You can modify this file to select the method, dataset, and LLM.
```yaml
LLM:
  model_name: meta-llama/llama-3.3-70b-instruct  # Model to use
  # ...
methods:
  names: [LookPlanGraph]  # Options: LookPlanGraph, ReAct, SayPlan, SayPlanLite, LLMasP, LLM+P
  mode: null              # Ablation modes: 'no_memory', 'no_corrections', etc.
dataset:
  subdatasets: [SayPlanOffice]  # Options: SayPlanOffice, VirtualHome, Behaviour1k
```

To run the evaluation on the GraSIF benchmark:

```bash
python grasif_test.py
```

This script will:
- Load the configuration from `config_grasif.yaml`.
- Initialize the selected dataset(s).
- Run the specified method(s) on the tasks.
- Save the results (success rates, plans, logs) to the `results/` directory.
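For scripted experiments it can be handy to read the same configuration programmatically. A minimal sketch using PyYAML; the inline YAML mirrors the excerpt above, and reading the actual file is shown in a comment:

```python
# Sketch: inspect the experiment settings with PyYAML (pip install pyyaml).
import yaml

CONFIG_TEXT = """
methods:
  names: [LookPlanGraph]
dataset:
  subdatasets: [SayPlanOffice]
"""

# In the repository you would instead do:
#   with open("config_grasif.yaml") as f:
#       config = yaml.safe_load(f)
config = yaml.safe_load(CONFIG_TEXT)
print(config["methods"]["names"])  # → ['LookPlanGraph']
```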
- ReAct, SayPlan, SayPlanLite: can be run directly by adding them to the `methods.names` list in `config_grasif.yaml`.
- LLM+P / LLMasP: these methods require a separate PDDL solver, which runs as a server inside a Docker container:
  1. Navigate to `Code/baselines/llmpp/`.
  2. Build and run the Docker container.
  3. Set the `llmpp_url` from Docker in `config_grasif.yaml` (default: `http://localhost:8091`).
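As a rough illustration of how a client might talk to such a solver server, here is a sketch; the endpoint path and payload field names are assumptions, not the repository's documented API:

```python
# Hypothetical client sketch for the PDDL-solver server. The endpoint path
# and JSON field names below are assumptions, NOT the repository's actual API.
import json

LLMPP_URL = "http://localhost:8091"  # matches the default llmpp_url in config_grasif.yaml


def build_request(domain_pddl: str, problem_pddl: str) -> str:
    """Package a planning request as JSON (field names are illustrative)."""
    return json.dumps({"domain": domain_pddl, "problem": problem_pddl})


payload = build_request("(define (domain demo) ...)", "(define (problem p1) ...)")
# A real call might then look like (endpoint path is a guess):
#   requests.post(LLMPP_URL + "/solve", data=payload)
print(payload)
```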
We evaluated LookPlanGraph against several baselines on the GraSIF dataset. The table below summarizes the performance in terms of Success Rate (SR), Average Plan Precision (APP), and Tokens Per Action (TPA).
| Method | SayPlan Office (SR↑ / APP↑ / TPA↓) | BEHAVIOR-1K (SR↑ / APP↑ / TPA↓) | RobotHow (SR↑ / APP↑ / TPA↓) |
|---|---|---|---|
| LLM-as-P | 0.47 / 0.59 / 1409 | 0.39 / 0.53 / 178 | 0.44 / 0.51 / 3417 |
| LLM+P | 0.07 / 0.21 / 1945 | 0.33 / 0.37 / 160 | 0.30 / 0.38 / 5396 |
| SayPlan | 0.46 / 0.59 / 3697 | 0.36 / 0.43 / 1888 | 0.86 / 0.87 / 5576 |
| SayPlan Lite | 0.53 / 0.68 / 1368 | 0.61 / 0.76 / 524 | 0.84 / 0.89 / 4641 |
| ReAct | 0.38 / 0.64 / 2503 | 0.47 / 0.61 / 1713 | 0.89 / 0.91 / 1322 |
| LookPlanGraph | 0.62 / 0.73 / 1989 | 0.60 / 0.77 / 1472 | 0.87 / 0.89 / 2653 |
The implementations of LLM-as-Planner and LLM+P were adapted from the original repository: https://github.com/Cranial-XIX/llm-pddl.
We utilize Fast Downward (https://github.com/aibasel/downward) as the underlying planner.
If you use this code or dataset in your research, please cite our paper:
```bibtex
@inproceedings{onishchenko2025lookplangraph,
  title={LookPlanGraph: Embodied Instruction Following Method with VLM Graph Augmentation},
  author={Onishchenko, Anatoly O. and Kovalev, Alexey K. and Panov, Aleksandr I.},
  year={2025}
}
```