# LookPlanGraph: Embodied Instruction Following Method with VLM Graph Augmentation

This repository contains the official implementation of **LookPlanGraph**, a method for embodied instruction following that leverages a scene graph composed of static assets and object priors. It also includes the **GraSIF** (Graph Scenes for Instruction Following) benchmark.

## Overview

LookPlanGraph enables robots to plan and execute complex instructions in dynamic environments where object positions may change. It uses a **Memory Graph** to track the scene state and a **Scene Graph Simulator (SGS)** to validate actions. A **Graph Augmentation Module** uses a Vision Language Model (VLM) to dynamically update the graph based on the agent's observations.

## Repository Structure

```
Code/
├── LookPlanGraph/        # Core implementation of LookPlanGraph
├── baselines/            # Baseline method implementations
├── benchmarks/           # Benchmarks
│   ├── grasif/           # GraSIF benchmark
│   └── dynamic_env/      # Data for dynamic environments
├── utils/                # Utility scripts
├── results/              # Directory for storing experiment results
│   ├── grasif/
│   ├── ablation/
│   ├── show_config.yaml
│   └── calculate_metrics.py
├── config_grasif.yaml    # Main configuration file
├── grasif_test.py        # Main entry point for running experiments
└── requirements.txt      # Python dependencies
```

## Installation

1. **Clone the repository:**

   ```bash
   git clone
   cd LookPlanGraph
   ```

2. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

3. **Set up environment variables:**

   You need an API key for the LLM provider (OpenRouter is used by default).

   ```bash
   export OPEN_ROUTER_KEY='your_key_here'
   ```

   Alternatively, you can set the key directly in `config_grasif.yaml`.

## Usage

### Configuration

The experiment settings are defined in `config_grasif.yaml`. You can modify this file to select the method, dataset, and LLM.

```yaml
LLM:
  model_name: meta-llama/llama-3.3-70b-instruct  # Model to use
  # ...

methods:
  names: [LookPlanGraph]  # Options: LookPlanGraph, ReAct, SayPlan, SayPlanLite, LLMasP, LLM+P
  mode: null              # Ablation modes: 'no_memory', 'no_corrections', etc.

dataset:
  subdatasets: [SayPlanOffice]  # Options: SayPlanOffice, VirtualHome, Behaviour1k
```

### Running Experiments

To run the evaluation on the GraSIF benchmark:

```bash
python grasif_test.py
```

This script will:

1. Load the configuration from `config_grasif.yaml`.
2. Initialize the selected dataset(s).
3. Run the specified method(s) on the tasks.
4. Save the results (success rates, plans, logs) to the `results/` directory.

### Running Baselines

* **ReAct, SayPlan, SayPlanLite:** can be run directly by adding them to the `methods.names` list in `config_grasif.yaml`.
* **LLM+P / LLMasP:** these methods require a separate PDDL solver, which is implemented as a server inside a Docker container:
  1. Navigate to `Code/baselines/llmpp/`.
  2. Build and run the Docker container.
  3. Set the container's `llmpp_url` in `config_grasif.yaml` (default: `http://localhost:8091`).

## Results

We evaluated LookPlanGraph against several baselines on the GraSIF dataset. The table below summarizes performance in terms of Success Rate (SR), Average Plan Precision (APP), and Tokens Per Action (TPA).

| Method | SayPlan Office<br>(SR↑ / APP↑ / TPA↓) | BEHAVIOR-1K<br>(SR↑ / APP↑ / TPA↓) | RobotHow<br>(SR↑ / APP↑ / TPA↓) |
| :--- | :---: | :---: | :---: |
| **LLM-as-P** | 0.47 / 0.59 / 1409 | 0.39 / 0.53 / **178** | 0.44 / 0.51 / 3417 |
| **LLM+P** | 0.07 / 0.21 / 1945 | 0.33 / 0.37 / 160 | 0.30 / 0.38 / 5396 |
| **SayPlan** | 0.46 / 0.59 / 3697 | 0.36 / 0.43 / 1888 | 0.86 / 0.87 / 5576 |
| **SayPlan Lite** | 0.53 / 0.68 / **1368** | **0.61** / 0.76 / 524 | 0.84 / 0.89 / 4641 |
| **ReAct** | 0.38 / 0.64 / 2503 | 0.47 / 0.61 / 1713 | **0.89** / **0.91** / **1322** |
| **LookPlanGraph** | **0.62** / **0.73** / 1989 | 0.60 / **0.77** / 1472 | 0.87 / 0.89 / 2653 |

## Acknowledgements

The implementations of LLM-as-P and LLM+P were adapted from the original repository: https://github.com/Cranial-XIX/llm-pddl. We use Fast Downward (https://github.com/aibasel/downward) as the underlying planner.

## Citation

If you use this code or dataset in your research, please cite our paper:

```bibtex
@inproceedings{onishchenko2025lookplangraph,
  title={LookPlanGraph: Embodied Instruction Following Method with VLM Graph Augmentation},
  author={Onishchenko, Anatoly O. and Kovalev, Alexey K. and Panov, Aleksandr I.},
  year={2025}
}
```
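For orientation, the four steps that `grasif_test.py` performs (load config, initialize datasets, run methods, save results) can be sketched as a minimal Python loop. Everything below (`run_method`, the inline config dict, the `summary.json` filename) is hypothetical and only stands in for the real script, which reads `config_grasif.yaml`:

```python
# Illustrative sketch of the grasif_test.py experiment loop.
# All names here are hypothetical, not the repository's actual API.
import json
from pathlib import Path

# Stand-in for loading config_grasif.yaml.
config = {
    "methods": {"names": ["LookPlanGraph"]},
    "dataset": {"subdatasets": ["SayPlanOffice"]},
}

def run_method(method: str, subdataset: str) -> dict:
    # Placeholder for planning/executing the tasks of one subdataset.
    return {"method": method, "dataset": subdataset, "success_rate": 0.0}

# Run every selected method on every selected subdataset.
results = [
    run_method(m, d)
    for m in config["methods"]["names"]
    for d in config["dataset"]["subdatasets"]
]

# Save results under results/, mirroring step 4 above.
out_dir = Path("results") / "grasif"
out_dir.mkdir(parents=True, exist_ok=True)
(out_dir / "summary.json").write_text(json.dumps(results, indent=2))
```

The real script additionally logs per-task plans and computes the SR/APP/TPA metrics reported above.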