
Impact of textual data augmentation on linguistic pattern extraction to improve the idiomaticity of extractive summaries

Abstract : The present work aims to develop a text summarisation system for financial texts, with a focus on the fluency of the target language. Linguistic analysis shows that writing summaries should take into account not only terminological and collocational extraction, but also a range of linguistic material, referred to here as the "support lexicon", which plays an important role in the cognitive organisation of the field. On this basis, this paper highlights the relevance of pre-training the CamemBERT model on a French financial dataset to extend its domain-specific vocabulary, and of fine-tuning it on extractive summarisation. We then evaluate the impact of textual data augmentation, which improves the performance of our extractive text summarisation model by between 6% and 11%.
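The abstract does not say which augmentation technique was used, so the following is only a minimal sketch under an assumption: it uses EDA-style synonym replacement, which is not necessarily the authors' method. Extractive summarisation is treated as sentence labelling (1 = sentence belongs in the summary), and each augmented copy of a document keeps the original sentence order and labels. The synonym table `SYNONYMS` and the helpers `replace_synonyms` and `augment_document` are hypothetical names introduced for illustration. Tokenisation is plain whitespace splitting, which is a simplification.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy synonym table (assumption); a real system would draw on a
# domain lexicon or thesaurus for French financial vocabulary.
SYNONYMS: Dict[str, List[str]] = {
    "hausse": ["progression", "augmentation"],
    "baisse": ["recul", "diminution"],
    "bénéfice": ["profit"],
}

# (sentence, 1 if the sentence belongs in the extractive summary else 0)
LabelledSentence = Tuple[str, int]


def replace_synonyms(sentence: str, rng: random.Random, p: float = 0.5) -> str:
    """Replace each whitespace-separated word found in SYNONYMS with probability p."""
    out = []
    for word in sentence.split():
        candidates = SYNONYMS.get(word.lower())
        if candidates and rng.random() < p:
            out.append(rng.choice(candidates))
        else:
            out.append(word)
    return " ".join(out)


def augment_document(
    doc: List[LabelledSentence], n_copies: int, seed: int = 0
) -> List[List[LabelledSentence]]:
    """Return the original document followed by n_copies paraphrased variants.

    Sentence order and extractive labels are preserved, so every variant is an
    extra training example for the same sentence-selection task.
    """
    rng = random.Random(seed)  # seeded for reproducible augmentation
    variants = [list(doc)]
    for _ in range(n_copies):
        variants.append([(replace_synonyms(s, rng), label) for s, label in doc])
    return variants
```

For example, `augment_document([("Le bénéfice est en hausse", 1), ("La réunion a eu lieu mardi", 0)], n_copies=2)` returns three versions of the document with labels `[1, 0]` in each; only words listed in the table can change. Keeping labels fixed is what makes this kind of augmentation cheap for extractive models: no new summaries have to be written.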
Document type :
Book sections

https://hal.archives-ouvertes.fr/hal-03271380
Contributor : Laurent Gautier
Submitted on : Friday, June 25, 2021 - 4:33:54 PM
Last modification on : Thursday, July 1, 2021 - 3:31:33 AM
Long-term archiving on : Sunday, September 26, 2021 - 10:26:35 PM

File

Dawak_VFinale_Laifa.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-03271380, version 1


Citation

Abdelghani Laifa, Laurent Gautier, Christophe Cruz. Impact of textual data augmentation on linguistic pattern extraction to improve the idiomaticity of extractive summaries. In: Matteo Golfarelli, Robert Wrembel (eds.), Lecture Notes in Computer Science. Springer, in press. ⟨hal-03271380⟩

