Details
Original language | English |
---|---|
Title of host publication | SIGMOD '23 |
Subtitle of host publication | Companion of the 2023 International Conference on Management of Data |
Publisher | Association for Computing Machinery (ACM) |
Pages | 119-122 |
Number of pages | 4 |
ISBN (electronic) | 9781450395076 |
Publication status | Published - 5 Jun 2023 |
Event | 2023 ACM/SIGMOD International Conference on Management of Data, SIGMOD 2023 - Seattle, United States. Duration: 18 Jun 2023 → 23 Jun 2023 |
Publication series
Name | Proceedings of the ACM SIGMOD International Conference on Management of Data |
---|---|
ISSN (Print) | 0730-8078 |
Abstract
One of the common use cases for data discovery is to enrich a given table with additional columns from related tables inside a data lake. We have recently introduced MATE and COCOA, two systems for joinability discovery and correlation calculation, respectively. By leveraging two novel index structures, a hash-based Super Key Index, and an Order Index, our system is capable of efficiently identifying tables that join on multiple columns and contain relevant features. We show how the data exploration and enrichment process benefits from our index structures by demonstrating MaCo, a unified system on top of open web and large table corpora.
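To make the idea of a hash-based index for multi-column joins more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each row of a candidate table is summarized by one fixed-width bit signature, formed by OR-ing together a hash of every cell value, so that rows which cannot contain all values of a composite key are discarded before any exact comparison. The signature width, the hash function, and all function names (`value_bits`, `row_signature`, `may_contain`, `joinable_rows`) are hypothetical choices made for illustration.

```python
import hashlib

BITS = 64  # signature width; an arbitrary choice for this sketch


def value_bits(value: str) -> int:
    """Map one cell value to a signature with a single set bit (toy hash)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return 1 << (int.from_bytes(digest[:8], "big") % BITS)


def row_signature(row) -> int:
    """OR together the bit patterns of every cell in the row."""
    sig = 0
    for cell in row:
        sig |= value_bits(str(cell))
    return sig


def may_contain(sig: int, key_values) -> bool:
    """Cheap filter: True if all bits required by the key are set in sig.

    False positives are possible (hash collisions); false negatives are not.
    """
    needed = row_signature(key_values)
    return sig & needed == needed


def joinable_rows(query_keys, candidate_rows):
    """Return candidate rows that contain every value of at least one query key."""
    # Build the per-row signatures once; a real index would persist them.
    index = [(row_signature(r), r) for r in candidate_rows]
    matches = []
    for sig, row in index:
        cells = set(map(str, row))
        for key in query_keys:
            # The signature check prunes most rows; the set check verifies exactly.
            if may_contain(sig, key) and set(map(str, key)) <= cells:
                matches.append(row)
                break
    return matches


if __name__ == "__main__":
    rows = [("Berlin", "Germany", 3.6), ("Paris", "France", 2.1), ("Berlin", "USA", 0.01)]
    keys = [("Berlin", "Germany"), ("Rome", "Italy")]
    print(joinable_rows(keys, rows))
```

The sketch ignores column positions and treats a row as a bag of values. The exact verification step after the signature check is what keeps the result correct even when two values hash to the same bit.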
Keywords
- data discovery for ML, data integration, index structures
ASJC Scopus subject areas
- Computer Science(all)
- Software
- Information Systems
Cite this
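Demonstrating MATE and COCOA for Data Discovery. / Becktepe, Jannis; Esmailoghli, Mahdi; Koch, Maximilian; Abedjan, Ziawasch. SIGMOD '23: Companion of the 2023 International Conference on Management of Data. Association for Computing Machinery (ACM), 2023. p. 119-122 (Proceedings of the ACM SIGMOD International Conference on Management of Data).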
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Demonstrating MATE and COCOA for Data Discovery
AU - Becktepe, Jannis
AU - Esmailoghli, Mahdi
AU - Koch, Maximilian
AU - Abedjan, Ziawasch
N1 - Funding Information: This project has been supported by the German Research Foundation (DFG) under grant agreement 387872445.
PY - 2023/6/5
Y1 - 2023/6/5
AB - One of the common use cases for data discovery is to enrich a given table with additional columns from related tables inside a data lake. We have recently introduced MATE and COCOA, two systems for joinability discovery and correlation calculation, respectively. By leveraging two novel index structures, a hash-based Super Key Index, and an Order Index, our system is capable of efficiently identifying tables that join on multiple columns and contain relevant features. We show how the data exploration and enrichment process benefits from our index structures by demonstrating MaCo, a unified system on top of open web and large table corpora.
KW - data discovery for ML
KW - data integration
KW - index structures
UR - http://www.scopus.com/inward/record.url?scp=85162848351&partnerID=8YFLogxK
U2 - 10.1145/3555041.3589716
DO - 10.1145/3555041.3589716
M3 - Conference contribution
AN - SCOPUS:85162848351
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 119
EP - 122
BT - SIGMOD '23
PB - Association for Computing Machinery (ACM)
T2 - 2023 ACM/SIGMOD International Conference on Management of Data, SIGMOD 2023
Y2 - 18 June 2023 through 23 June 2023
ER -