SceneGraphNet: Neural Message Passing for 3D Indoor Scene Augmentation

ICCV 2019

Abstract

In this paper we propose a neural message passing approach to augment an input 3D indoor scene with new objects matching their surroundings. Given an input, potentially incomplete, 3D scene and a query location, our method predicts a probability distribution over object types that fit well in that location. Our distribution is predicted through passing learned messages in a dense graph whose nodes represent objects in the input scene and whose edges represent spatial and structural relationships. By weighting messages through an attention mechanism, our method learns to focus on the most relevant surrounding scene context to predict new scene objects. In our experiments on the SUNCG dataset, our method significantly outperforms state-of-the-art approaches at correctly predicting objects missing from a scene. We also demonstrate other applications of our method, including context-based 3D object recognition and iterative scene generation.
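To make the idea in the abstract concrete, the following is a minimal sketch of one round of attention-weighted message passing toward a query location, followed by a softmax over object types. It is an illustration only and is not taken from the released code: the function name `predict_object_type`, the dot-product attention, the `tanh` update, and the random weights `W_msg` and `W_out` are all assumptions chosen for brevity.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, v) for row in W]


def predict_object_type(query, nodes, W_msg, W_out):
    """Hypothetical one-step message passing toward a query node.

    query : feature vector for the query location
    nodes : feature vectors for objects already in the scene
    W_msg : (dim x dim) message weights (assumed, normally learned)
    W_out : (num_types x dim) classifier weights (assumed, normally learned)
    """
    # Attention: score every scene object against the query (dot product
    # is an assumption here), then normalize the scores to sum to 1.
    attn = softmax([dot(query, h) for h in nodes])
    # Each scene object sends a transformed copy of its features.
    messages = [matvec(W_msg, h) for h in nodes]
    # Aggregate the messages, weighted by attention.
    dim = len(W_msg)
    agg = [sum(a * m[i] for a, m in zip(attn, messages)) for i in range(dim)]
    # Update the query state and map it to a distribution over object types.
    state = [math.tanh(q + g) for q, g in zip(query, agg)]
    return attn, softmax(matvec(W_out, state))


random.seed(0)
dim, num_types, num_objects = 4, 3, 5


def rand_vec(n):
    return [random.uniform(-1.0, 1.0) for _ in range(n)]


nodes = [rand_vec(dim) for _ in range(num_objects)]
query = rand_vec(dim)
W_msg = [rand_vec(dim) for _ in range(dim)]
W_out = [rand_vec(dim) for _ in range(num_types)]

attn, probs = predict_object_type(query, nodes, W_msg, W_out)
print("attention over scene objects:", attn)
print("distribution over object types:", probs)
```

In this sketch the attention weights show which existing objects contribute most to the prediction, which mirrors the role attention plays in focusing on relevant scene context.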

Applications

Application I. Object recommendation for query locations (comparison with state-of-the-art methods)

Application II. Iterative scene synthesis

Application III. Context-based object recognition. Left: Object recognition using a multi-view CNN without considering the scene context. Right: Improved recognition by fusing the multi-view CNN and SceneGraphNet predictions based on scene context.
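The caption above does not say how the two predictions are fused. As a hedged illustration, one simple option is a normalized element-wise product of the two class distributions; the function `fuse_predictions`, the labels, and the probability values below are hypothetical and are not results from the paper.

```python
def fuse_predictions(p_shape, p_context):
    """Fuse two class distributions with a normalized product.

    p_shape   : class probabilities from a shape-only classifier
    p_context : class probabilities from a scene-context model
    This fusion rule is an assumption made for illustration.
    """
    if len(p_shape) != len(p_context):
        raise ValueError("distributions must cover the same classes")
    joint = [a * b for a, b in zip(p_shape, p_context)]
    total = sum(joint)
    if total == 0:
        # The two models share no support; fall back to the shape-only scores.
        return list(p_shape)
    return [j / total for j in joint]


labels = ["chair", "stool", "table"]   # made-up classes
p_shape = [0.45, 0.40, 0.15]           # made-up shape-only scores
p_context = [0.20, 0.70, 0.10]         # made-up context scores

fused = fuse_predictions(p_shape, p_context)
best = labels[max(range(len(fused)), key=fused.__getitem__)]
print(best)
```

With these made-up numbers, the shape-only classifier slightly prefers "chair", while the fused distribution prefers "stool" because the context model strongly favors it.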




Paper

SceneGraphNet.pdf, 2.4MB

Citation

Yang Zhou, Zachary While, Evangelos Kalogerakis, "SceneGraphNet: Neural Message Passing for 3D Indoor Scene Augmentation", ICCV 2019



Poster

Coming soon!

Source Code & Data

GitHub code: https://github.com/yzhou359/3DIndoor-SceneGraphNet

SUNCG Dataset: http://suncg.cs.princeton.edu (currently unavailable)

Acknowledgements

This project was carried out partly in collaboration with the Wayfair Next Research team (now the Wayfair computer vision team and Applied Tech team). We'd like to thank Rebecca Perry and Tim Zhang for their expert advice and encouragement throughout this project.

This research is funded by NSF (CHS-161733). Our experiments were performed on the UMass GPU cluster obtained under the Collaborative Fund managed by the Massachusetts Technology Collaborative.