Exploration of Deep Learning-based Multimodal Fusion for Semantic Road Scene Segmentation

Deep neural networks have been frequently used for semantic scene understanding in recent years. Effective and robust segmentation of outdoor scenes is a prerequisite for the safe navigation of autonomous vehicles. In this paper, our aim is to find the best way to exploit different imaging modalities for road scene segmentation, as opposed to using a single RGB modality. We explore deep learning-based early and late fusion patterns for semantic segmentation, and propose a new multi-level feature fusion network. Given a pair of aligned multimodal images, the network can achieve faster convergence and incorporate more contextual information. In particular, we introduce a first-of-its-kind dataset containing aligned raw RGB images and polarimetric images, together with manually labeled ground truth. The use of polarization cameras is a sensory augmentation that can significantly enhance image understanding, particularly for the detection of highly reflective areas such as glass and water. Experimental results suggest that our proposed multimodal fusion network outperforms unimodal networks and two typical fusion architectures.
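To make the distinction between the early and late fusion patterns mentioned in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' network: it replaces the deep segmentation models with toy per-pixel linear classifiers, and the function names, the three-channel polarimetric input, and the number of classes are all assumptions chosen for illustration.

```python
import random

def linear_scores(pixel, weights):
    """Per-class scores for one pixel: one dot product per class row."""
    return [sum(w * x for w, x in zip(row, pixel)) for row in weights]

def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)

def early_fusion(rgb, pol, weights):
    """Early fusion: concatenate the channels of both modalities per pixel,
    then apply a single classifier to the stacked feature vector."""
    return [[argmax(linear_scores(p_rgb + p_pol, weights))
             for p_rgb, p_pol in zip(row_rgb, row_pol)]
            for row_rgb, row_pol in zip(rgb, pol)]

def late_fusion(rgb, pol, w_rgb, w_pol):
    """Late fusion: classify each modality separately, then average the
    per-class scores before taking the final label."""
    labels = []
    for row_rgb, row_pol in zip(rgb, pol):
        out_row = []
        for p_rgb, p_pol in zip(row_rgb, row_pol):
            s_rgb = linear_scores(p_rgb, w_rgb)
            s_pol = linear_scores(p_pol, w_pol)
            fused = [(a + b) / 2 for a, b in zip(s_rgb, s_pol)]
            out_row.append(argmax(fused))
        labels.append(out_row)
    return labels

# Hypothetical sizes: a 2x3 image, 3 RGB channels, 3 polarimetric channels,
# 4 semantic classes. Weights are random, so the labels are meaningless.
random.seed(0)
H, W, C_RGB, C_POL, N_CLASSES = 2, 3, 3, 3, 4

def rand_image(c):
    return [[[random.random() for _ in range(c)] for _ in range(W)] for _ in range(H)]

def rand_weights(c):
    return [[random.uniform(-1, 1) for _ in range(c)] for _ in range(N_CLASSES)]

rgb_img, pol_img = rand_image(C_RGB), rand_image(C_POL)
early_labels = early_fusion(rgb_img, pol_img, rand_weights(C_RGB + C_POL))
late_labels = late_fusion(rgb_img, pol_img, rand_weights(C_RGB), rand_weights(C_POL))
```

Both functions return one class index per pixel. The structural difference is where the modalities meet: at the input in early fusion, at the score level in late fusion. A multi-level scheme such as the one the abstract proposes would combine features at intermediate stages instead, which this toy example does not attempt to model.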

Keywords

Semantic Segmentation, Multimodal Fusion, Deep Learning, Road Scenes

Domains

Computer Vision and Pattern Recognition [cs.CV]

Main file

2019_yifei_visapp.pdf (2.94 MB)

Origin: Files produced by the author(s)

Contributor: Désiré Sidibé

https://u-bourgogne.hal.science/hal-02060222

Submitted on: Thursday, March 7, 2019, 11:43:59

Last modified on: Tuesday, January 23, 2024, 03:42:03

Long-term archiving on: Saturday, June 8, 2019, 14:27:12

Dates and versions

hal-02060222, version 1 (2019-03-07)

Identifiers

HAL Id: hal-02060222, version 1
DOI: 10.5220/0007360403360343

Cite

Yifei Zhang, Olivier Morel, Marc Blanchon, Ralph Seulin, Mojdeh Rastgoo, et al. Exploration of Deep Learning-based Multimodal Fusion for Semantic Road Scene Segmentation. VISAPP 2019, 14th International Conference on Computer Vision Theory and Applications, Feb 2019, Prague, Czech Republic. ⟨10.5220/0007360403360343⟩. ⟨hal-02060222⟩

Collections

UNIV-BOURGOGNE CNRS IMVIA VIBOT ANR

684 views

753 downloads