Ontology-based approach for unsupervised and adaptive focused crawling

Abstract: Information from the web is a key resource exploited in the domain of competitive intelligence. These sources represent large volumes of information to process every day. As the amount of available information grows rapidly, this processing becomes overwhelming for experts. To address this challenge, this paper presents a novel approach to process such sources and extract only the most valuable pieces of information. The approach is based on an unsupervised and adaptive ontology-learning process. The resulting ontology is used to enhance the performance of a focused crawler. The combination of Big Data and Semantic Web technologies makes it possible to classify information precisely according to domain knowledge while maintaining optimal performance. The approach and its implementation are described, and the feasibility and performance of the approach are presented.
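The abstract does not say how the ontology steers the crawler, so the following is only a generic illustration of ontology-guided focused crawling, not the authors' method. It assumes a hypothetical, hand-written ontology (concept mapped to lexical labels) where the paper learns one without supervision, scores a page by the fraction of concepts whose labels appear in its text, and runs a best-first crawl over an in-memory link graph (standing in for HTTP fetching) that follows outlinks only from pages above a relevance threshold. The names `ONTOLOGY`, `relevance`, `focused_crawl`, and the example pages are all illustrative assumptions.

```python
import heapq
from typing import Dict, List, Set

# Hypothetical domain ontology: concept -> lexical labels.
# In the paper the ontology is learned automatically; here it is hard-coded.
ONTOLOGY: Dict[str, Set[str]] = {
    "Competitor": {"competitor", "rival", "market share"},
    "Product": {"product", "launch", "release"},
}


def relevance(text: str, ontology: Dict[str, Set[str]]) -> float:
    """Fraction of ontology concepts with at least one label found in the text."""
    lowered = text.lower()
    matched = sum(
        1 for labels in ontology.values() if any(label in lowered for label in labels)
    )
    return matched / len(ontology) if ontology else 0.0


def focused_crawl(
    seeds: List[str],
    pages: Dict[str, str],
    links: Dict[str, List[str]],
    ontology: Dict[str, Set[str]],
    threshold: float = 0.5,
    max_pages: int = 100,
) -> List[str]:
    """Best-first crawl: always expand the most relevant frontier page, keep pages
    at or above the threshold, and follow outlinks only from kept pages.
    Page text is looked up in `pages`, which simulates fetching."""
    # heapq is a min-heap, so scores are negated to pop the best page first.
    frontier = [(-relevance(pages.get(u, ""), ontology), u) for u in seeds]
    heapq.heapify(frontier)
    visited: Set[str] = set()
    kept: List[str] = []
    while frontier and len(visited) < max_pages:
        neg_score, url = heapq.heappop(frontier)
        if url in visited:
            continue
        visited.add(url)
        if -neg_score >= threshold:
            kept.append(url)
            for nxt in links.get(url, []):
                if nxt not in visited:
                    score = relevance(pages.get(nxt, ""), ontology)
                    heapq.heappush(frontier, (-score, nxt))
    return kept


# Toy link graph: "b" is off-topic, so its outlinks are never followed.
EXAMPLE_PAGES = {
    "a": "Our rival announced a product launch",
    "b": "Weather today",
    "c": "Competitor market share report",
    "d": "New product release",
}
EXAMPLE_LINKS = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}

print(focused_crawl(["a"], EXAMPLE_PAGES, EXAMPLE_LINKS, ONTOLOGY))
# prints ['a', 'c', 'd']
```

Scoring each child page directly is a simplification: a real focused crawler usually has to estimate a link's priority before fetching it (for example from anchor text or the parent page's score), and would replace substring matching with proper tokenization and semantic classification against the ontology.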
Document type:
Conference paper
MOD International Conference on Management of Data, May 2017, Chicago, United States. ACM Press, SBD '17 Proceedings of The International Workshop on Semantic Big Data, 2, 2017, MOD International Conference on Management of Data. 〈http://dl.acm.org/citation.cfm?doid=3066911.3066912〉. 〈10.1145/3066911.3066912〉

https://hal-univ-bourgogne.archives-ouvertes.fr/hal-01564206
Contributor: Le2i - Université de Bourgogne
Submitted on: Tuesday, 18 July 2017, 15:55:07
Last modified on: Tuesday, 6 February 2018, 15:56:21

Citation

Hassan Thomas, Christophe Cruz, Aurélie Bertaux. Ontology-based approach for unsupervised and adaptive focused crawling. MOD International Conference on Management of Data, May 2017, Chicago, United States. ACM Press, SBD '17 Proceedings of The International Workshop on Semantic Big Data, 2, 2017, MOD International Conference on Management of Data. 〈http://dl.acm.org/citation.cfm?doid=3066911.3066912〉. 〈10.1145/3066911.3066912〉. 〈hal-01564206〉
