LookPlanGraph: Embodied Instruction Following Method with VLM Graph Augmentation

This repository contains the official implementation of LookPlanGraph, a method for embodied instruction following that leverages a scene graph composed of static assets and object priors. It also includes the GraSIF (Graph Scenes for Instruction Following) benchmark.

Overview

LookPlanGraph enables robots to plan and execute complex instructions in dynamic environments where object positions may change. It uses a Memory Graph to track the scene state and a Scene Graph Simulator (SGS) to validate actions. A Graph Augmentation Module utilizes a Vision Language Model (VLM) to dynamically update the graph based on the agent's observations.
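
The sketch below illustrates how these components could interact in a single planning step. It is a minimal, self-contained Python illustration only; the class and function names (MemoryGraph, augment_with_vlm, simulate) are hypothetical and do not correspond to the repository's actual API.

# Conceptual sketch of the LookPlanGraph loop described above.
# All names are hypothetical illustrations, not the repository's API.

class MemoryGraph:
    """Tracks the believed scene state: static assets plus object priors."""
    def __init__(self, nodes):
        self.nodes = dict(nodes)          # node id -> attributes

    def update(self, observed_objects):
        # Merge objects confirmed by the VLM into the graph.
        self.nodes.update(observed_objects)


def augment_with_vlm(observation):
    """Stand-in for the Graph Augmentation Module: a VLM would extract
    objects from the agent's current observation here."""
    return {"cup_1": {"on": "kitchen_table"}}   # dummy detection


def simulate(action, graph):
    """Stand-in for the Scene Graph Simulator: checks whether an action's
    target exists in the current Memory Graph before execution."""
    return action["target"] in graph.nodes


if __name__ == "__main__":
    graph = MemoryGraph({"kitchen_table": {"type": "static_asset"}})
    graph.update(augment_with_vlm(observation="rgb_frame"))   # observe, then augment
    action = {"name": "pick", "target": "cup_1"}
    if simulate(action, graph):                                # validate before executing
        print("action is valid against the Memory Graph:", action["name"])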

Repository Structure

Code/
├── LookPlanGraph/      # Core implementation of the LookPlanGraph
├── baselines/          # Implementations of baseline methods
├── benchmarks/         # Benchmarks
│   ├── grasif/         # GraSIF benchmark
│   └── dynamic_env/    # Data for dynamic environments
├── utils/              # Utility scripts
├── results/            # Directory for storing experiment results
│   ├── grasif/
│   ├── ablation/
│   ├── show_config.yaml
│   └── calculate_metrics.py
├── config_grasif.yaml  # Main configuration file
├── grasif_test.py      # Main entry point for running experiments
└── requirements.txt    # Python dependencies

Installation

  1. Clone the repository:

    git clone <repository-url>
    cd LookPlanGraph
  2. Install dependencies:

    pip install -r requirements.txt
  3. Set up Environment Variables: You need an API key for the LLM provider (OpenRouter is used by default).

    export OPEN_ROUTER_KEY='your_key_here'

    Alternatively, you can set the key directly in config_grasif.yaml.

Usage

Configuration

The experiment settings are defined in config_grasif.yaml. You can modify this file to select the method, dataset, and LLM.

LLM:
  model_name: meta-llama/llama-3.3-70b-instruct # Model to use
  # ...

methods: 
  names: [LookPlanGraph] # Options: LookPlanGraph, ReAct, SayPlan, SayPlanLite, LLMasP, LLM+P
  mode: null             # Ablation modes: 'no_memory', 'no_corrections', etc.

dataset:
  subdatasets: [SayPlanOffice] # Options: SayPlanOffice, VirtualHome, Behaviour1k

Running Experiments

To run the evaluation on the GraSIF benchmark:

python grasif_test.py

This script will:

  1. Load the configuration from config_grasif.yaml.
  2. Initialize the selected dataset(s).
  3. Run the specified method(s) on the tasks.
  4. Save the results (success rates, plans, logs) to the results/ directory.
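
For orientation, a minimal Python sketch of this flow under the configuration layout shown above is given below; the helper names are assumptions, and the actual grasif_test.py may be structured differently.

# Rough sketch of the experiment flow outlined above (steps 1-4).
# Function names below are assumptions, not the repository's API.
import yaml

def run_experiments(config_path="config_grasif.yaml"):
    with open(config_path) as f:
        config = yaml.safe_load(f)                           # step 1: load configuration

    for dataset_name in config["dataset"]["subdatasets"]:    # step 2: initialize datasets
        for method_name in config["methods"]["names"]:       # step 3: run methods on tasks
            print(f"Running {method_name} on {dataset_name}")
            # ... plan each task, execute, collect success/plan/token logs ...
            # step 4: results would be written under results/

if __name__ == "__main__":
    run_experiments()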

Running Baselines

  • ReAct, SayPlan, SayPlanLite: Can be run directly by adding them to the methods.names list in config_grasif.yaml.
  • LLM+P / LLMasP: These methods require a separate PDDL solver, which is implemented as a server inside a Docker container.
    1. Navigate to Code/baselines/llmpp/.
    2. Build and run the Docker container (example commands below).
    3. Set llmpp_url in config_grasif.yaml to the address of the running server (default: http://localhost:8091).
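
    For example (the image name llmpp-solver and the exposed port are assumptions; adapt them to the Dockerfile in that directory):

       cd Code/baselines/llmpp
       docker build -t llmpp-solver .
       docker run -d -p 8091:8091 llmpp-solver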

Results

We evaluated LookPlanGraph against several baselines on the GraSIF dataset. The table below summarizes the performance in terms of Success Rate (SR), Average Plan Precision (APP), and Tokens Per Action (TPA).

Method          SayPlan Office           BEHAVIOR-1K              RobotHow
                (SR↑ / APP↑ / TPA↓)      (SR↑ / APP↑ / TPA↓)      (SR↑ / APP↑ / TPA↓)
LLM-as-P        0.47 / 0.59 / 1409       0.39 / 0.53 / 178        0.44 / 0.51 / 3417
LLM+P           0.07 / 0.21 / 1945       0.33 / 0.37 / 160        0.30 / 0.38 / 5396
SayPlan         0.46 / 0.59 / 3697       0.36 / 0.43 / 1888       0.86 / 0.87 / 5576
SayPlan Lite    0.53 / 0.68 / 1368       0.61 / 0.76 / 524        0.84 / 0.89 / 4641
ReAct           0.38 / 0.64 / 2503       0.47 / 0.61 / 1713       0.89 / 0.91 / 1322
LookPlanGraph   0.62 / 0.73 / 1989       0.60 / 0.77 / 1472       0.87 / 0.89 / 2653
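
As a rough illustration of how such per-dataset numbers can be aggregated from per-task logs, the sketch below computes SR and TPA; the log field names are assumptions, and results/calculate_metrics.py is the authoritative implementation in this repository.

# Illustrative aggregation of SR and TPA from per-task result records.
# Field names ("success", "tokens", "actions") are assumptions; see
# results/calculate_metrics.py for the repository's actual metric code.

def success_rate(records):
    """SR: fraction of tasks whose execution reached the goal."""
    return sum(r["success"] for r in records) / len(records)

def tokens_per_action(records):
    """TPA: LLM tokens consumed per executed action, averaged over tasks
    (one plausible reading of the metric; the paper's exact definition may differ)."""
    return sum(r["tokens"] / max(r["actions"], 1) for r in records) / len(records)

if __name__ == "__main__":
    demo = [{"success": True, "tokens": 1800, "actions": 6},
            {"success": False, "tokens": 2400, "actions": 8}]
    print(f"SR = {success_rate(demo):.2f}, TPA = {tokens_per_action(demo):.0f}")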

Acknowledgements

The implementations of LLM-as-P (LLM as Planner) and LLM+P were adapted from the original repository: https://github.com/Cranial-XIX/llm-pddl.

We utilize Fast Downward (https://github.com/aibasel/downward) as the underlying planner.

Citation

If you use this code or dataset in your research, please cite our paper:

@inproceedings{onishchenko2025lookplangraph,
  title={LookPlanGraph: Embodied Instruction Following Method with VLM Graph Augmentation},
  author={Onishchenko, Anatoly O. and Kovalev, Alexey K. and Panov, Aleksandr I.},
  year={2025}
}
