
BLAST2: An Efficient Technique for Loose Schema Information Extraction from Heterogeneous Big Data Sources

Article
Publication date:
2020
Abstract:
We present BLAST2, a novel technique to efficiently extract loose schema information, i.e., metadata that can serve as a surrogate of the schema alignment task within the Entity Resolution (ER) process (identifying records that refer to the same real-world entity) when integrating multiple, heterogeneous, and voluminous data sources. The loose schema information is exploited to reduce the overall complexity of ER, whose naïve solution would imply O(n^2) comparisons, where n is the number of entity representations involved in the process, and can be extracted from both structured and unstructured data sources. BLAST2 is completely unsupervised yet achieves almost the same precision and recall as supervised state-of-the-art schema alignment techniques when employed for Entity Resolution tasks, as shown in our experimental evaluation performed on two real-world data sets (composed of 7 and 10 data sources, respectively).
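To make the quadratic cost mentioned in the abstract concrete, the following minimal sketch counts the comparisons a naive ER pass performs, n(n-1)/2 unordered pairs, and contrasts it with comparing records only inside disjoint groups. The function names and the grouping itself are hypothetical illustrations; this is not the BLAST2 algorithm, only the arithmetic behind why restricting comparisons helps.

```python
from itertools import combinations


def naive_pair_count(n: int) -> int:
    """Unordered pairs compared by a naive ER pass over n records: n(n-1)/2."""
    return n * (n - 1) // 2


def grouped_pair_count(group_sizes: list[int]) -> int:
    """Pairs compared when records are compared only within their own group.

    The groups are a hypothetical stand-in for any technique that restricts
    which records get compared; they are assumed to be disjoint.
    """
    return sum(naive_pair_count(size) for size in group_sizes)


# Sanity check: the closed form matches an explicit enumeration of pairs.
records = ["r1", "r2", "r3", "r4", "r5", "r6"]
assert len(list(combinations(records, 2))) == naive_pair_count(len(records))

print(naive_pair_count(6))         # all-pairs comparisons for 6 records
print(grouped_pair_count([3, 3]))  # same 6 records split into two groups of 3
```

Under these assumptions, six records need 15 naive comparisons but only 6 when split into two groups of three, and the gap widens quadratically as n grows.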
CRIS type:
1.1 Journal article
Keywords:
Information systems; Entity resolution; Data integration; Big Data
Author list:
Beneventano, Domenico; Bergamaschi, Sonia; Gagliardelli, Luca; Simonini, Giovanni
University authors:
GAGLIARDELLI LUCA
Link to full record:
https://iris.uniecampus.it/handle/11389/69803
Published in:
ACM JOURNAL OF DATA AND INFORMATION QUALITY
Journal
General data

URL

https://dl.acm.org/journal/jdiq