Y. Belinkov and J. Glass, Analyzing hidden representations in end-to-end automatic speech recognition systems, Advances in Neural Information Processing Systems, pp.2438-2448, 2017.

Y. Belinkov, L. Màrquez, H. Sajjad, N. Durrani, F. Dalvi et al., Evaluating layers of representation in neural machine translation on part-of-speech and semantic tagging tasks, Proceedings of the Eighth International Joint Conference on Natural Language Processing, pp.1-10, 2017.

F. Chollet, Keras. https, 2015.

W. Dai, C. Dai, S. Qu, J. Li, and S. Das, Very deep convolutional neural networks for raw waveforms, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.421-425, 2017.
DOI : 10.1109/ICASSP.2017.7952190

URL : http://arxiv.org/pdf/1610.00087

Z. Elloumi, L. Besacier, O. Galibert, J. Kahn, and B. Lecouteux, ASR performance prediction on unseen broadcast programs using convolutional neural networks, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018.
URL : https://hal.archives-ouvertes.fr/hal-01709779

S. Galliano, E. Geoffrois, D. Mostefa, K. Choukri, J. Bonastre et al., The ESTER Phase II evaluation campaign for the rich transcription of French broadcast news, Interspeech, pp.1149-1152, 2005.

G. Gravier, G. Adda, N. Paulson, M. Carré, A. Giraudel et al., The ETAPE corpus for the evaluation of speech-based TV content processing in the French language, LREC - Eighth International Conference on Language Resources and Evaluation, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00712591

J. Kahn, O. Galibert, L. Quintard, M. Carré, A. Giraudel et al., A presentation of the REPERE challenge, 2012 10th International Workshop on Content-Based Multimedia Indexing (CBMI), pp.1-6, 2012.
DOI : 10.1109/CBMI.2012.6269851

Y. Kim, Convolutional neural networks for sentence classification. arXiv preprint, 2014.
DOI : 10.3115/v1/d14-1181

URL : https://doi.org/10.3115/v1/d14-1181

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization. CoRR, abs/1412, 2014.

L. van der Maaten and G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research, vol.9, pp.2579-2605, 2008.

A. Mohamed, G. Hinton, and G. Penn, Understanding how deep belief networks perform acoustic modelling, Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, pp.4273-4276, 2012.

D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek et al., The Kaldi speech recognition toolkit, IEEE 2011 Workshop on Automatic Speech Recognition and Understanding, EPFL-CONF-192584, 2011.

X. Shi, I. Padhi, and K. Knight, Does String-Based Neural MT Learn Source Syntax?, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp.1526-1534, 2016.
DOI : 10.18653/v1/D16-1159

URL : https://doi.org/10.18653/v1/d16-1159

S. Wang, Y. Qian, and K. Yu, What does the speaker embedding encode?, Interspeech, pp.1497-1501, 2017.

Z. Wu and S. King, Investigating gated recurrent neural networks for speech synthesis. CoRR, abs/1601, 2016.
DOI : 10.1109/icassp.2016.7472657