The ability to conduct and learn from interaction and experience is a central challenge in robotics, offering a scalable alternative to labor-intensive human demonstrations. However, realizing such "play" requires (1) a policy robust to diverse, potentially out-of-distribution environment states, and (2) a procedure that continuously produces useful robot experience. To address these challenges, we introduce Tether, a method for autonomous functional play involving structured, task-directed interactions. First, we design a novel open-loop policy that warps actions from a small set of source demonstrations (<=10) by anchoring them to semantic keypoint correspondences in the target scene. We show that this design is extremely data-efficient and robust even under significant spatial and semantic variations. Second, we deploy this policy for autonomous functional play in the real world via a continuous cycle of task selection, execution, evaluation, and improvement, guided by the visual understanding capabilities of vision-language models. This procedure generates diverse, high-quality datasets with minimal human intervention. In a household-like multi-object setup, our method is the first to perform many hours of autonomous multi-task play in the real world starting from only a handful of demonstrations. This produces a stream of data that consistently improves the performance of closed-loop imitation policies over time, ultimately yielding over 1000 expert-level trajectories and training policies competitive with those learned from human-collected demonstrations.
The following instructions will install three conda environments: one main environment for the Tether code, and two environments for running GeoAware-SC and MASt3R. We have tested on Ubuntu 20.04.
- Create the Tether conda environment:

  ```bash
  conda create -n tether python=3.10
  conda activate tether
  pip install -r requirements.txt
  ```
- Set up the GeoAware-SC conda environment following the instructions here, replacing `<path_to_GeoAware-SC>` with the location where you cloned the GeoAware-SC repository.
- Set up the MASt3R conda environment following the instructions here.
- Install our Eva Franka infra, or prepare your own (more details below).
- Set your Gemini API key in `conf/config.yaml` under the `api_key_smart` and `api_key_fast` fields.
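For reference, the relevant fragment of `conf/config.yaml` might look like the following. Only the two field names come from this README; the surrounding structure of the file is an assumption.

```yaml
# conf/config.yaml (sketch; field names from this README, values are placeholders)
api_key_smart: "YOUR_GEMINI_API_KEY"   # key used for the smart (planning/evaluation) model
api_key_fast: "YOUR_GEMINI_API_KEY"    # key used for the fast model
```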
- Collect the initial set of demonstrations for your target tasks and place them under `data_real/demos`.
We expect the following structure for demonstrations:
```
{demo_dir}/
├── trajectory.npz        # "state" holds the robot's Cartesian end-effector poses
├── calibration.json      # camera calibration, keyed by "{camera_id}_left";
│                         # each entry has "extrinsics" (Euler angles) and "intrinsics"
└── recordings/
    ├── frames/                # one folder of frames per camera in cfg.setting.cameras
    │   ├── {camera_name}/
    │   │   ├── 00000.jpg
    │   │   ├── 00001.jpg
    │   │   └── ...
    │   └── {camera_name2}/
    │       └── ...
    └── {camera_name}.mp4      # one video per camera in cfg.setting.cameras
```
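Before launching runs, it can help to sanity-check that each demo folder matches this layout. The function below is a hypothetical validator, not part of the Tether codebase; the file and folder names come from the tree above.

```python
# Hypothetical helper: report which expected files/folders are missing
# from a single demo directory. Not part of Tether's actual code.
from pathlib import Path


def check_demo_dir(demo_dir: str, cameras: list[str]) -> list[str]:
    """Return the expected paths that are missing from one demo directory."""
    root = Path(demo_dir)
    expected = [root / "trajectory.npz", root / "calibration.json"]
    for cam in cameras:
        expected.append(root / "recordings" / f"{cam}.mp4")
        expected.append(root / "recordings" / "frames" / cam)
    return [str(p) for p in expected if not p.exists()]
```

Running this over every subdirectory of `data_real/demos` before starting a run surfaces missing videos or frame folders early, instead of mid-rollout.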
- Edit the `demo_names` list in the `conf/setting/real.yaml` configuration to match your demonstration set. Each entry has the format `<name of the subdirectory in the demo folder>:<desired natural-language instruction for Gemini action planning and success evaluation>`.
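A `demo_names` entry following that `<subdirectory>:<instruction>` format might look like this; the task names and instructions below are invented examples, only the format comes from this README.

```yaml
# conf/setting/real.yaml (sketch; entry values are hypothetical examples)
demo_names:
  - open_drawer:open the top drawer
  - place_cup:place the cup on the shelf
```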
- In `conf/setting/real.yaml`, set the camera parameters to the ZED camera serial numbers in your setup. You can find your cameras' serial numbers using these instructions.
- Adjust the `oob_bounds` parameters to the desired workspace in your scene. If the robot exceeds these bounds during the execution of a trajectory, it will stop.
- If using Eva, update the IP address in `robot_utils.py` to the Eva machine's IP. Otherwise, implement `collect_scene_image()` and `send_trajectory()` in `robot_utils.py` following your robot infra.
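If you are wiring up your own robot infra, the two hooks might look roughly like the skeleton below. The function names come from this README, but the signatures and types are assumptions, not Tether's actual API.

```python
# Sketch of the two hooks to implement for a custom robot stack.
# Signatures and types are assumptions; adapt them to robot_utils.py as needed.
from typing import Any, Sequence


def collect_scene_image() -> Any:
    """Return the current scene image from your camera setup."""
    raise NotImplementedError("query your camera driver here")


def send_trajectory(trajectory: Sequence) -> None:
    """Send a sequence of Cartesian end-effector poses to your robot controller."""
    raise NotImplementedError("forward the poses to your robot infra here")
```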
- In the respective conda environments from the previous section, start the GeoAware and MASt3R servers by running `serve_geoaware.py` and `serve_mast3r.py`. Wait until both servers print `Serving ...` before proceeding.
- Start the Eva server and runner here, or prepare your own robot infra.
- To generate a single rollout, run `python runner.py mode=single action=<name of action from setting config>`. This selects a random demo for the specified action and warps it to the current scene. The rollout data is saved under `data_real/runs/<run_name>/rollouts_single`.
- To run the autonomous play procedure, run `python runner.py mode=cycle`. This first preprocesses the demos into the action library, then runs a cycle of action selection with the VLM, execution of the selected action using Tether, and success evaluation with the VLM. The rollout data is saved under `data_real/runs/<run_name>/rollouts_cycle`.
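Once rollouts accumulate, a small helper can enumerate them for inspection. The `data_real/runs/<run_name>/rollouts_*` layout comes from this README; the helper itself is a hypothetical convenience, since the per-rollout file contents are not specified here.

```python
# Hypothetical helper: list saved rollout entries for a run, oldest first by name.
# Only the directory layout is taken from this README.
from pathlib import Path


def list_rollouts(run_name: str, mode: str = "cycle") -> list[Path]:
    """Return rollout entries under data_real/runs/<run_name>/rollouts_<mode>."""
    root = Path("data_real") / "runs" / run_name / f"rollouts_{mode}"
    return sorted(root.iterdir()) if root.exists() else []
```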
We thank the following open-source projects:
- We compute correspondences using GeoAware-SC and MASt3R.
- Our deployment infrastructure, Eva, builds on DROID's software setup.
This codebase is released under the MIT License.
```bibtex
@misc{liang2026tether,
  title={Tether: Autonomous Functional Play with Correspondence-Driven Trajectory Warping},
  author={William Liang and Sam Wang and Hung-Ju Wang and Osbert Bastani and Yecheng Jason Ma and Dinesh Jayaraman},
  year={2026},
  eprint={2603.03278},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2603.03278},
}
```