Conference Papers Year : 2022

Differentiable Time-Frequency Scattering On GPU


Abstract

Joint time-frequency scattering (JTFS) is a convolutional operator in the time-frequency domain which extracts spectrotemporal modulations at various rates and scales. It offers an idealized model of spectrotemporal receptive fields (STRF) in the primary auditory cortex, and thus may serve as a biologically plausible surrogate for human perceptual judgments at the scale of isolated audio events. Yet, prior implementations of JTFS and STRF have remained outside of the standard toolkit of perceptual similarity measures and evaluation methods for audio generation. We trace this issue to three limitations: differentiability, speed, and flexibility. In this paper, we present an implementation of time-frequency scattering in Python. Unlike prior implementations, ours accommodates NumPy, PyTorch, and TensorFlow as backends and is thus portable to both CPU and GPU. We demonstrate the usefulness of JTFS via three applications: unsupervised manifold learning of spectrotemporal modulations, supervised classification of musical instruments, and texture resynthesis of bioacoustic sounds.
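The differentiability emphasized in the abstract can be illustrated with a toy stand-in rather than the paper's operator: the minimal PyTorch sketch below applies a fixed time-frequency transform (an STFT log-magnitude followed by a small 2-D convolution, loosely mimicking a spectrotemporal modulation filter), uses it as a loss, and backpropagates to the waveform. All names and parameters are illustrative; this is not the JTFS implementation described in the paper.

```python
import torch
import torch.nn.functional as F

# Toy stand-in for a differentiable time-frequency operator:
# STFT log-magnitude followed by a fixed 2-D convolution.
# Illustrates only the gradient flow that a differentiable
# implementation enables, not the paper's JTFS.
def toy_tf_operator(x, n_fft=512, hop=128):
    window = torch.hann_window(n_fft, device=x.device)
    spec = torch.stft(x, n_fft, hop_length=hop, window=window,
                      return_complex=True)
    logmag = torch.log1p(spec.abs())               # (freq, time)
    kernel = torch.ones(1, 1, 3, 3, device=x.device) / 9.0
    return F.conv2d(logmag[None, None], kernel)    # smoothed coefficients

target = torch.randn(4096)                 # reference waveform (toy data)
x = torch.randn(4096, requires_grad=True)  # waveform under optimization

loss = F.mse_loss(toy_tf_operator(x), toy_tf_operator(target))
loss.backward()                            # gradients reach the waveform
print(x.grad.shape)                        # torch.Size([4096])
```

In the same spirit, a differentiable JTFS can serve as a perceptual loss for audio generation or resynthesis, with gradients flowing through the scattering coefficients back to the signal or to model parameters, on CPU or GPU depending on the backend.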
Main file: DAFx20in22_paper_25.pdf (2.23 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03863423, version 1 (23-11-2022)

Identifiers

  • HAL Id: hal-03863423, version 1

Cite

John Muradeli, Cyrus Vahidi, Changhong Wang, Han Han, Vincent Lostanlen, et al. Differentiable Time-Frequency Scattering On GPU. Digital Audio Effects Conference (DAFx), Sep 2022, Vienna, Austria. ⟨hal-03863423⟩