Abstract:
Instance segmentation, also known as simultaneous detection and segmentation (SDS), requires pixel-level instance masks during training, which makes mask annotation a labor-intensive part of data preparation. In this paper, we instead use only a one-point label for each object instance, which is easy to annotate. Our training consists of box verification, which is based on appearance and on voting among neighboring boxes, and segment verification, which is based on the context information of proposal masks. This structure preserves pixel-wise instance information and helps prevent error accumulation, compared with naively training a single segmentation model iteratively. We conduct weakly- and semi-supervised experiments to demonstrate that this design is effective. Our approach surpasses state-of-the-art methods.