Modelling indoor scenes from a single photograph

A research paper on semantic modelling of indoor scenes from a single photograph has been invited for publication in a special issue of the journal Computer Animation and Virtual Worlds (CAVW), published by John Wiley. The paper will first be presented at the 31st Conference on Computer Animation and Social Agents (CASA2018) in Beijing in May 2018, the world's oldest international conference on computer animation and social agents. The paper highlights Bournemouth University's ongoing research directly related to the generation of digital models and assets for the VISTA AR project.

The work demonstrates an automatic approach, assisted by artificial intelligence, to semantic modelling of indoor scenes from a single photograph, rather than relying on RGB-D cameras. We guide indoor scene modelling with feature maps extracted by Fully Convolutional Networks (FCNs): three parallel FCNs generate object instance masks, a depth map and an edge map of the room layout. This allows the computer to understand and interpret the scene content effectively for semantic reconstruction. Based on these high-level features, support relationships between indoor objects can be efficiently inferred in a data-driven manner.
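To illustrate the structure described above, the following is a minimal PyTorch sketch of three parallel fully convolutional networks, one per predicted map (object masks, depth, and room-layout edges). The layer sizes, network depth, class count (`NUM_OBJECT_CLASSES`) and input resolution are illustrative assumptions only, not the architecture reported in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFCN(nn.Module):
    """Tiny encoder-decoder FCN; the networks in the paper would be far deeper."""
    def __init__(self, out_channels):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Conv2d(64, out_channels, 1)  # 1x1 conv to task outputs

    def forward(self, x):
        h, w = x.shape[-2:]
        out = self.decoder(self.encoder(x))
        # Upsample back to the input resolution so each output is a per-pixel map.
        return F.interpolate(out, size=(h, w), mode="bilinear", align_corners=False)

# Three parallel branches, one per feature map used to guide scene modelling.
NUM_OBJECT_CLASSES = 40                    # assumed indoor-object label count
mask_net  = SimpleFCN(NUM_OBJECT_CLASSES)  # per-pixel object mask scores
depth_net = SimpleFCN(1)                   # single-channel depth map
edge_net  = SimpleFCN(1)                   # room-layout edge probability map

photo = torch.randn(1, 3, 240, 320)        # a single RGB photograph (dummy input)
masks, depth, edges = mask_net(photo), depth_net(photo), edge_net(photo)
print(masks.shape, depth.shape, edges.shape)
```

In this sketch each branch is an independent network, matching the paper's description of three parallel FCNs; the three output maps would then be combined downstream to infer support relationships between objects.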

This research offers an intelligent way to assist indoor digital content creation, allowing indoor scenes to be synthesised and prototyped quickly.