Generating bounding box supervision for semantic segmentation with deep learning

Published in IAPR Workshop on Artificial Neural Networks in Pattern Recognition, Springer, 2018

Recommended citation: Simone Bonechi, Paolo Andreini, Monica Bianchini, and Franco Scarselli. Generating bounding box supervision for semantic segmentation with deep learning. In IAPR Workshop on Artificial Neural Networks in Pattern Recognition, pages 190–200, Springer, 2018.

Abstract

Most of the leading Convolutional Neural Network (CNN) models for semantic segmentation rely on a large number of pixel-level annotations. Such human labeling requires considerable effort, which complicates the creation of large-scale datasets. In this paper, we propose a deep learning approach that uses bounding box annotations to train a semantic segmentation network. Bounding box supervision, even though less accurate, is a valuable alternative that effectively reduces dataset collection costs. The proposed method is based on a two-stage training procedure: first, a deep neural network is trained to distinguish the relevant object from the background inside a given bounding box; then, the output of this network is used to provide weak supervision for a multi-class segmentation CNN. The performance of our approach has been assessed on the Pascal VOC 2012 segmentation dataset, obtaining competitive results compared to a fully supervised setting.
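
The hand-off between the two stages can be illustrated with a minimal sketch. It is not the implementation from the paper: the function names, the 0.5 threshold, and the rule of painting larger boxes first so that smaller ones win on overlap are all assumptions made for illustration. The callable `fg_prob_fn` is a hypothetical stand-in for the first-stage network, returning a foreground probability map for the crop inside a box; the resulting label map is the kind of weak target that a multi-class segmentation network could then be trained on.

```python
import numpy as np


def build_weak_labels(image_shape, boxes, fg_prob_fn, threshold=0.5):
    """Compose a pixel-level pseudo-label map from bounding box annotations.

    image_shape: (height, width) of the image.
    boxes: list of (x_min, y_min, x_max, y_max, class_id), max coordinates exclusive.
    fg_prob_fn: hypothetical stand-in for the first-stage network; given a box
        (x_min, y_min, x_max, y_max) it returns an array of shape
        (y_max - y_min, x_max - x_min) with foreground probabilities.
    threshold: assumed cut-off for calling a pixel foreground.
    """
    labels = np.zeros(image_shape, dtype=np.int64)  # 0 is used as background

    # Assumption: paint larger boxes first so smaller boxes overwrite them.
    ordered = sorted(
        boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]), reverse=True
    )
    for x0, y0, x1, y1, class_id in ordered:
        prob = fg_prob_fn((x0, y0, x1, y1))
        mask = prob >= threshold
        region = labels[y0:y1, x0:x1]  # a view, so writes land in `labels`
        region[mask] = class_id
    return labels


def center_blob(box):
    """Toy replacement for the first-stage network: marks the box centre."""
    x0, y0, x1, y1 = box
    h, w = y1 - y0, x1 - x0
    prob = np.zeros((h, w))
    prob[h // 4 : h - h // 4, w // 4 : w - w // 4] = 1.0
    return prob


# One box of class 3 on a 6x6 image.
weak = build_weak_labels((6, 6), [(1, 1, 5, 5, 3)], center_blob)
```

Pixels inside a box that the first stage rejects stay labelled as background here; treating them as "ignore" during second-stage training would be an equally plausible choice.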

You can find the full paper here