Code and dataset for the paper "GUM-DiT: A Foundation Model for Generating Urban Morphological Layouts"
Layout generation has emerged as a key research frontier in computer vision and computational design, particularly for urban layout planning. However, existing research is limited by insufficient dataset resources, and previous approaches have not focused on capturing the variety of urban layout styles. To address these challenges, we introduce a novel approach that embeds urban style information as learnable morphology tokens, enabling our framework to capture both the structural elements and stylistic nuances of different city layouts. To train our model, we curate a large-scale dataset of urban layouts paired with stylistic descriptions. We propose a comprehensive multiscale data-processing workflow and a Diffusion Transformer (DiT) framework that leverages diffusion models with a transformer architecture to generate diverse urban layout patterns. Extensive qualitative and quantitative evaluations demonstrate that our method outperforms baseline approaches in regenerating distinct and varied urban layouts.
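To illustrate the morphology-token idea described above, here is a minimal numpy sketch (not the paper's actual code; the style names, dimensions, and function are invented for illustration): each urban style maps to a learnable embedding vector that is prepended to the layout token sequence before it enters the transformer.

```python
import numpy as np

# Hypothetical illustration of learnable morphology tokens: each urban style
# is mapped to an embedding vector that is prepended to the patch-token
# sequence fed into the Diffusion Transformer.
rng = np.random.default_rng(0)

STYLES = ["grid", "radial", "organic"]  # illustrative style vocabulary
EMBED_DIM = 8                           # toy token dimension

# Learnable embedding table: one morphology token per style.
morphology_tokens = rng.normal(size=(len(STYLES), EMBED_DIM))

def condition_sequence(layout_tokens, style):
    """Prepend the style's morphology token to the layout token sequence."""
    idx = STYLES.index(style)
    style_token = morphology_tokens[idx][None, :]  # shape (1, EMBED_DIM)
    return np.concatenate([style_token, layout_tokens], axis=0)

layout_tokens = rng.normal(size=(16, EMBED_DIM))  # 16 patch tokens
seq = condition_sequence(layout_tokens, "radial")
print(seq.shape)  # (17, 8): one extra conditioning token
```

In training, the embedding table would be updated by gradient descent along with the transformer weights, so each style vector comes to encode that style's structural regularities.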
Python 3.8.19
torch 2.4.1+cu121
opencv-python 4.10.0.8
First, download and set up the repo:

git clone https://github.com/xiaohangDong/GUMDiT.git

We provide a requirements.txt file that can be used to create the environment.
Our dataset contains rich annotations, including urban styles, urban names, urban labels, urban road layouts, urban building layouts, and descriptions of each region.
The preprocessing of our dataset is implemented in create_dtaset.py.
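As a rough sketch of one step in a multiscale preprocessing workflow (illustrative only; the actual pipeline lives in create_dtaset.py), a binary layout mask can be downsampled by 2x2 max pooling so that thin roads are not lost at coarser scales:

```python
import numpy as np

# Hypothetical multiscale step (not the repo's actual code): build a
# coarse-to-fine pyramid of a binary layout mask via 2x2 max pooling.
def max_pool_2x2(mask):
    """Downsample a (H, W) binary mask by taking the max over 2x2 blocks."""
    h, w = mask.shape
    assert h % 2 == 0 and w % 2 == 0, "toy sketch assumes even dimensions"
    return mask.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def multiscale_pyramid(mask, levels=3):
    """Return [full-res mask, half-res mask, quarter-res mask, ...]."""
    pyramid = [mask]
    for _ in range(levels - 1):
        pyramid.append(max_pool_2x2(pyramid[-1]))
    return pyramid

# Toy 8x8 layout: a single one-pixel-wide "road" column.
mask = np.zeros((8, 8), dtype=np.uint8)
mask[:, 3] = 1
pyramid = multiscale_pyramid(mask, levels=3)
print([m.shape for m in pyramid])  # [(8, 8), (4, 4), (2, 2)]
```

Max pooling (rather than averaging) is used here so a one-pixel road still registers as occupied after downsampling.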
GUM-DiT utilizes urban names to generate urban layouts. We selected CLIP as the feature extraction model and fine-tuned the official model.
To train:

python label2layout_train.py --model DiT-osm/s --image-size 1024

To sample:

python label2layout_sample.py

If you have any questions, please feel free to contact us.
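For background on the diffusion side of a DiT-style model, the standard forward (noising) process can be sketched in numpy (this is the generic DDPM closed form, not GUM-DiT's actual training code; schedule values are the common defaults):

```python
import numpy as np

# Generic DDPM forward-noising sketch (illustrative, not the repo's code).
rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # standard linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal retention

def q_sample(x0, t, noise):
    """Closed-form forward process: x_t = sqrt(a_bar_t)*x0 + sqrt(1 - a_bar_t)*eps."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

x0 = rng.normal(size=(4, 4))   # toy "layout" image
noise = rng.normal(size=(4, 4))
x_t = q_sample(x0, t=500, noise=noise)
print(x_t.shape)  # (4, 4)
```

During training, the transformer receives x_t, the timestep t, and the conditioning tokens, and is optimized to predict the added noise; sampling then reverses the process from pure noise to a layout.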

