Semi-supervised triplet loss based learning of ambient audio embeddings

Nicolas Turpault; Romain Serizel; Emmanuel Vincent

Conference paper. Year: 2019

Nicolas Turpault

Role: Author
PersonId: 1042968

Speech Modeling for Facilitating Oral-Based Communication

Romain Serizel

Role: Author
PersonId: 10320
IdHAL: romain-serizel
IdRef: 223797391

Speech Modeling for Facilitating Oral-Based Communication

Emmanuel Vincent

Role: Author
PersonId: 1256
IdHAL: emmanuelv
ORCID: 0000-0002-0183-7289
IdRef: 089360176

Speech Modeling for Facilitating Oral-Based Communication

Abstract

Deep neural networks are particularly useful for learning relevant representations from data. Recent studies have demonstrated the potential of unsupervised representation learning for ambient sound analysis using various flavors of the triplet loss, and have compared this approach to supervised learning. However, in real situations, it is common to have a small labeled dataset and a large unlabeled one. In this paper, we combine unsupervised and supervised triplet loss based learning into a semi-supervised representation learning approach. We propose two flavors of this approach. In both, the positive samples for triplets whose anchors are unlabeled are obtained either by applying a transformation to the anchor or by selecting the nearest sample in the training set. We compare our approach to supervised and unsupervised representation learning, and we study different ratios between the amounts of labeled and unlabeled data. We evaluate all the above approaches on an audio tagging task using the DCASE 2018 Task 4 dataset, and we show the impact of this ratio on the tagging performance.
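The abstract describes how triplets are formed when the anchor is labeled and how they are formed when it is not. The Python sketch below illustrates that construction. It is not the authors' implementation and rests on several assumptions that do not come from the paper:

- Embeddings are plain lists of floats produced by a placeholder `identity_embed` function, which stands in for the trained network.
- Each labeled clip carries a single class label. DCASE 2018 Task 4 clips can carry several weak labels.
- The hypothetical `add_noise` stands in for whatever label-preserving transformation is actually used.
- The nearest sample is searched in embedding space.
- Negatives for unlabeled anchors are drawn at random.

The strategy names `"transform"` and `"nearest"` are illustrative. The loss is the common hinge form max(0, d(a, p) - d(a, n) + margin) on squared Euclidean distances.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
Triplet = Tuple[Vector, Vector, Vector]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """Hinge triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def add_noise(x: Vector, rng: random.Random, scale: float = 0.01) -> Vector:
    """Hypothetical label-preserving transformation: small Gaussian noise."""
    return [v + rng.gauss(0.0, scale) for v in x]


def identity_embed(x: Vector) -> Vector:
    """Placeholder for the embedding network (assumption: identity map)."""
    return list(x)


def build_triplet(i: int, feats: List[Vector], labels: List[Optional[str]],
                  embed: Callable[[Vector], Vector], strategy: str,
                  rng: random.Random) -> Triplet:
    """Return (anchor, positive, negative) embeddings for anchor index i.

    labels[i] is None for unlabeled clips; `strategy` ("transform" or
    "nearest") only matters when the anchor is unlabeled.
    """
    anchor = embed(feats[i])
    others = [j for j in range(len(feats)) if j != i]
    if labels[i] is not None:
        # Supervised triplet: positive shares the anchor's label,
        # negative is a labeled clip with a different label.
        pos_j = rng.choice([j for j in others if labels[j] == labels[i]])
        neg_j = rng.choice([j for j in others
                            if labels[j] is not None and labels[j] != labels[i]])
        return anchor, embed(feats[pos_j]), embed(feats[neg_j])
    if strategy == "transform":
        # Unlabeled anchor, flavor 1: positive is a transformed copy of the anchor.
        positive = embed(add_noise(feats[i], rng))
        neg_j = rng.choice(others)
    elif strategy == "nearest":
        # Unlabeled anchor, flavor 2: positive is the closest other clip
        # in embedding space.
        pos_j = min(others, key=lambda j: sq_dist(anchor, embed(feats[j])))
        positive = embed(feats[pos_j])
        neg_j = rng.choice([j for j in others if j != pos_j])
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # Negative for an unlabeled anchor: a random other clip (assumption).
    return anchor, positive, embed(feats[neg_j])


# Toy demo: two labeled classes plus one unlabeled clip (index 4).
feats = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.02, 0.01]]
labels = ["speech", "speech", "dog", "dog", None]
rng = random.Random(0)
for strategy in ("transform", "nearest"):
    batch = [build_triplet(i, feats, labels, identity_embed, strategy, rng)
             for i in range(len(feats))]
    loss = sum(triplet_loss(*t) for t in batch) / len(batch)
    print(strategy, round(loss, 4))
```

With the transform strategy, the positive is always derived from the anchor itself. With the nearest-sample strategy, the positive is a different clip. That exposes the embedding to more variability, but it may occasionally pick a clip from another class.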

Keywords

prototypical network, audio embedding, triplet loss, weak labels, audio tagging

Domains

Sound [cs.SD], Machine Learning [cs.LG], Artificial Intelligence [cs.AI], Signal and Image Processing [eess.SP]

Main file

ssl_triplet.pdf (302.87 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-02025824

Submitted on: Friday, 22 February 2019, 11:26:01

Last modified on: Thursday, 1 February 2024, 10:04:24

Long-term archiving on: Thursday, 23 May 2019, 14:41:26

Dates and versions

hal-02025824, version 1 (22-02-2019)

Identifiers

HAL Id: hal-02025824, version 1

Cite

Nicolas Turpault, Romain Serizel, Emmanuel Vincent. Semi-supervised triplet loss based learning of ambient audio embeddings. ICASSP 2019, May 2019, Brighton, United Kingdom. ⟨hal-02025824⟩

Collections

UNIV-RENNES1 CNRS INRIA IRISA GRID5000 UNIV-LORRAINE INRIA2 LORIA LORIA-NLPKD UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES SILECS ANR UR1-MATH-NUM
