M3DLayout: A Multi-Source Dataset of 3D Indoor Layouts and Structured Descriptions for 3D Generation

¹Tsinghua University ²Beihang University ³Migu Beijing Research Institute
*Equal contribution
Corresponding author

Teaser


Abstract

In text-driven 3D scene generation, object layout serves as a crucial intermediate representation that bridges high-level language instructions with detailed geometric output. It not only provides a structural blueprint for ensuring physical plausibility but also supports semantic controllability and interactive editing.

However, the learning capabilities of current 3D indoor layout generation models are constrained by the limited scale, diversity, and annotation quality of existing datasets. To address this, we introduce M3DLayout, a large-scale, multi-source dataset for 3D indoor layout generation. M3DLayout comprises 15,080 layouts and over 258k object instances, integrating three distinct sources: real-world scans, professional CAD designs, and procedurally generated scenes. Each layout is paired with detailed structured text describing global scene summaries, relational placements of large furniture, and fine-grained arrangements of smaller items. This diverse and richly annotated resource enables models to learn complex spatial and semantic patterns across a wide variety of indoor environments.
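To make the pairing of layouts and three-level descriptions concrete, here is a minimal sketch of how a single record could be held in memory. It is illustrative only: every class and field name, the oriented-box parameterization (center, size, yaw), and the `is_small` flag are assumptions, not the dataset's actual file format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class ObjectInstance:
    """One object in a layout, assumed to be stored as an oriented 3D bounding box."""
    category: str           # semantic label, e.g. "bed" or "book"
    center: Vec3            # box center (x, y, z); units assumed to be meters
    size: Vec3              # box extents (width, height, depth)
    yaw: float              # rotation about the vertical axis, in radians
    is_small: bool = False  # hypothetical flag separating small items from large furniture


@dataclass
class StructuredDescription:
    """The three description levels named above (field names are hypothetical)."""
    global_summary: str
    large_furniture_relations: List[str] = field(default_factory=list)
    small_object_arrangements: List[str] = field(default_factory=list)


@dataclass
class LayoutRecord:
    layout_id: str
    source: str  # "3D-FRONT", "Matterport3D", or "Infinigen"
    room_type: str
    objects: List[ObjectInstance]
    description: StructuredDescription

    def full_text(self) -> str:
        """Join the description levels, coarse to fine, into one text prompt."""
        d = self.description
        parts = [d.global_summary, *d.large_furniture_relations, *d.small_object_arrangements]
        return " ".join(p for p in parts if p)

    def count_small(self) -> int:
        """Count objects flagged as small items."""
        return sum(1 for o in self.objects if o.is_small)


# Usage with made-up values (hypothetical ID and geometry).
record = LayoutRecord(
    layout_id="example_0001",
    source="Infinigen",
    room_type="bedroom",
    objects=[
        ObjectInstance("bed", (1.0, 0.3, 2.0), (1.6, 0.6, 2.1), 0.0),
        ObjectInstance("book", (2.5, 0.8, 0.4), (0.2, 0.05, 0.15), 0.0, is_small=True),
    ],
    description=StructuredDescription(
        global_summary="A compact bedroom.",
        large_furniture_relations=["The bed is against the wall."],
        small_object_arrangements=["A book lies on the nightstand."],
    ),
)
print(record.full_text())
print(record.count_small())
```

Keeping the three levels as separate fields lets a training pipeline choose how much detail to condition on, for example only the global summary or the full coarse-to-fine text.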

To assess the potential of M3DLayout, we establish a benchmark using a text-conditioned diffusion model. Experimental results demonstrate that our dataset provides a solid foundation for training layout generation models. Its multi-source composition enhances diversity, notably through the Inf3DLayout subset, which provides rich small-object information and enables the generation of more complex and detailed scenes. We hope that M3DLayout will serve as a valuable resource for advancing research in text-driven 3D scene synthesis.

Dataset Generator


This is our pipeline for constructing the M3DLayout dataset. The framework integrates multi-source data: professional designs from the 3D-FRONT dataset, real-world scans from Matterport3D, and procedurally generated scenes from Infinigen. Construction involves three steps: generating, partitioning, and filtering layouts to create the Inf3DLayout subset; applying template-based rules to produce formatted text; and rendering global and local views that vision-language models (VLMs) use to produce structured descriptions. The result is a large-scale, richly annotated dataset of paired text and 3D layouts.
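The partitioning, filtering, and template-based text steps above can be sketched roughly as follows. This is a hypothetical illustration, not the authors' code: the per-object dictionary format, the `room` label used for partitioning, the object-count thresholds in `keep_layout`, and the wording of the text template are all assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List

# Hypothetical object format: {"room": str, "category": str,
#   "position": (x, y, z), "size": (w, h, d), "yaw": radians}
Obj = Dict


def partition_by_room(scene_objects: List[Obj]) -> Dict[str, List[Obj]]:
    """Split a whole generated house into per-room layouts using an assumed room label."""
    rooms: Dict[str, List[Obj]] = defaultdict(list)
    for obj in scene_objects:
        rooms[obj["room"]].append(obj)
    return dict(rooms)


def keep_layout(objects: List[Obj], min_objects: int = 3, max_objects: int = 200) -> bool:
    """Illustrative filter: drop rooms that are nearly empty or implausibly crowded."""
    return min_objects <= len(objects) <= max_objects


def format_object(obj: Obj) -> str:
    """Template rule: render one object's box parameters as a line of text."""
    x, y, z = obj["position"]
    w, h, d = obj["size"]
    return (f"{obj['category']}: position ({x:.2f}, {y:.2f}, {z:.2f}), "
            f"size ({w:.2f}, {h:.2f}, {d:.2f}), rotation {math.degrees(obj['yaw']):.0f} deg")


def format_layout(room_type: str, objects: List[Obj]) -> str:
    """Template rule: a header line followed by one line per object, sorted by category."""
    header = f"Room type: {room_type}. Number of objects: {len(objects)}."
    body = [format_object(o) for o in sorted(objects, key=lambda o: o["category"])]
    return "\n".join([header, *body])


# Usage on a tiny made-up house with two rooms.
house = [
    {"room": "bedroom_0", "category": "bed", "position": (1.0, 0.3, 2.0),
     "size": (1.6, 0.6, 2.1), "yaw": math.pi / 2},
    {"room": "bedroom_0", "category": "nightstand", "position": (0.2, 0.25, 1.0),
     "size": (0.4, 0.5, 0.4), "yaw": 0.0},
    {"room": "bedroom_0", "category": "lamp", "position": (0.2, 0.7, 1.0),
     "size": (0.2, 0.4, 0.2), "yaw": 0.0},
    {"room": "hallway_0", "category": "rug", "position": (4.0, 0.01, 1.0),
     "size": (2.0, 0.02, 0.8), "yaw": 0.0},
]
rooms = partition_by_room(house)
kept = {name: objs for name, objs in rooms.items() if keep_layout(objs)}
print(format_layout("bedroom", kept["bedroom_0"]))
```

Here the hallway is discarded because it holds a single object; in practice the filtering criteria would be whatever the dataset authors chose, which this sketch does not reproduce.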

Scene Dataset Visualization

Scene 1: Living room
Scene 2: Bedroom
Scene 3: Living room
Scene 4: Bathroom
Scene 5: Dining room
Scene 6: Kitchen


These are examples from our scene dataset, which draws on the professional design dataset 3D-FRONT, real-world scans from Matterport3D, and procedurally generated scenes from Infinigen. The first two rows are from Infinigen, and the last row is from 3D-FRONT and Matterport3D.

Generated Layout Visualization

Each example pairs an input prompt with the generated layout (GIF) and the scene retrieved from that layout, shown in oblique and top-down views.

Bedroom: “The room is a bedroom with a neatly arranged sleeping area. The bed is located near the wall corner, with matching furniture on either side. A wardrobe is positioned against the wall.”

Dining room: “In the dining room, a rectangular table sits in the center surrounded by six chairs. Several sideboards are placed against each wall for extra storage.”

Living room: “The room is a living room featuring a central seating arrangement. A large sofa faces a coffee table with chairs on either side and a TV stand against the wall. Bookshelf with books and decorations line another wall.”

We use our dataset to train a diffusion-based layout generation model. The examples above show generated layouts together with the scenes retrieved from them, illustrating the object diversity, representational capacity, and semantic understanding that the model acquires from M3DLayout.
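One common way to turn a generated layout into a renderable scene is to retrieve, for each generated box, the asset of the same category whose dimensions fit best. The sketch below assumes that approach; it is not confirmed as the retrieval method used here, and the asset library, asset IDs, and size-only Euclidean matching are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Size = Tuple[float, float, float]
# Hypothetical asset library: category -> list of (asset_id, bounding-box size)
AssetLibrary = Dict[str, List[Tuple[str, Size]]]


def retrieve_asset(category: str, size: Size, library: AssetLibrary) -> Optional[str]:
    """Return the same-category asset whose box size is closest in Euclidean distance."""
    candidates = library.get(category, [])
    if not candidates:
        return None  # no asset of this category; the caller decides how to handle it
    best_id, _ = min(candidates, key=lambda item: math.dist(item[1], size))
    return best_id


def retrieve_scene(boxes: List[Tuple[str, Size]], library: AssetLibrary) -> List[Optional[str]]:
    """Retrieve one asset per generated box, preserving the layout order."""
    return [retrieve_asset(cat, size, library) for cat, size in boxes]


library: AssetLibrary = {
    "bed": [("bed_small", (1.4, 0.5, 2.0)), ("bed_king", (2.0, 0.6, 2.1))],
    "wardrobe": [("wardrobe_a", (1.2, 2.0, 0.6))],
}
generated = [("bed", (1.9, 0.55, 2.1)), ("wardrobe", (1.0, 1.9, 0.5)), ("sofa", (2.0, 0.8, 0.9))]
print(retrieve_scene(generated, library))  # ['bed_king', 'wardrobe_a', None]
```

Matching on box size alone ignores style and the text prompt; a fuller system might also rank candidates by appearance or text-image similarity, but that is beyond this sketch.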

BibTeX

@misc{zhang2025m3dlayoutmultisourcedataset3d,
      title={M3DLayout: A Multi-Source Dataset of 3D Indoor Layouts and Structured Descriptions for 3D Generation}, 
      author={Yiheng Zhang and Zhuojiang Cai and Mingdao Wang and Meitong Guo and Tianxiao Li and Li Lin and Yuwang Wang},
      year={2025},
      eprint={2509.23728},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.23728}, 
}