Details
Original language | English |
---|---|
Title of host publication | 2019 IEEE 32nd International Symposium on Computer-Based Medical Systems, CBMS 2019 |
Subtitle of host publication | Proceedings |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 563-566 |
Number of pages | 4 |
ISBN (electronic) | 978-1-7281-2286-1 |
ISBN (print) | 978-1-7281-2287-8 |
Publication status | Published - Jun 2019 |
Event | 32nd IEEE International Symposium on Computer-Based Medical Systems, CBMS 2019 - Cordoba, Spain. Duration: 5 Jun 2019 → 7 Jun 2019 |
Publication series
Name | Proceedings - IEEE Symposium on Computer-Based Medical Systems |
---|---|
Volume | 2019-June |
ISSN (Print) | 1063-7125 |
ISSN (electronic) | 2372-9198 |
Abstract
FAIR principles and Open Data initiatives have motivated the publication of large volumes of data. In the biomedical domain in particular, the size of the data has increased exponentially in the last decade, and with advances in the technologies used to collect and generate data, a faster growth rate is expected in the coming years. The available data collections are characterized by the dominant dimensions of big data, i.e., they are not only large in volume, but can also be heterogeneous and present quality issues. These data complexity problems affect the typical tasks of data management, particularly the task of integrating big biomedical data sources. We tackle the problem of big data integration and present a knowledge-driven framework able to extract and integrate data collected from structured and unstructured data sources. The proposed framework resorts to Natural Language Processing techniques to extract knowledge from unstructured data and short text. Furthermore, ontologies and controlled vocabularies, e.g., UMLS, are used to annotate the extracted entities and relations with terms from the ontology or controlled vocabulary. The annotated data is integrated into a knowledge graph. A unified schema describes the meaning of the integrated data as well as its main properties and relations. As a proof of concept, we show the results of applying the proposed framework to integrate clinical records of lung cancer patients with data extracted from open data sources such as DrugBank and PubMed. The resulting knowledge graph enables the discovery of interactions between drugs in the treatments prescribed to lung cancer patients.
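The final step described in the abstract, finding drug interactions within a patient's prescribed treatment, can be illustrated with a minimal sketch. It is not the authors' implementation: the triples, the identifiers (`patient:1`, `drug:A`), the predicate names (`prescribed`, `interactsWith`), and the function names are all hypothetical placeholders, and the knowledge graph is reduced to a plain set of subject-predicate-object triples.

```python
from itertools import combinations

# Hypothetical toy knowledge graph as (subject, predicate, object) triples.
# None of these facts come from the paper; they only illustrate the idea.
TRIPLES = {
    ("patient:1", "prescribed", "drug:A"),
    ("patient:1", "prescribed", "drug:B"),
    ("patient:2", "prescribed", "drug:A"),
    ("patient:2", "prescribed", "drug:C"),
    ("drug:A", "interactsWith", "drug:B"),
}


def prescriptions(triples):
    """Group prescribed drugs by patient."""
    by_patient = {}
    for subject, predicate, obj in triples:
        if predicate == "prescribed":
            by_patient.setdefault(subject, set()).add(obj)
    return by_patient


def interactions_in_treatments(triples):
    """Return, per patient, the pairs of prescribed drugs known to interact."""
    # Store interactions as unordered pairs so direction does not matter.
    interacting = {
        frozenset((subject, obj))
        for subject, predicate, obj in triples
        if predicate == "interactsWith"
    }
    found = {}
    for patient, drugs in prescriptions(triples).items():
        pairs = [
            pair
            for pair in combinations(sorted(drugs), 2)
            if frozenset(pair) in interacting
        ]
        if pairs:
            found[patient] = pairs
    return found


if __name__ == "__main__":
    print(interactions_in_treatments(TRIPLES))
```

In this toy graph only the first patient is prescribed both drugs of an interacting pair, so only that patient is reported. A production system would more likely store the graph in an RDF store and express the same join as a query.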
Keywords
- Big Data, Biomedical Data, Knowledge Graph, Natural Language Processing, Semantic Data Integration
ASJC Scopus subject areas
- Medicine (all)
- Radiology, Nuclear Medicine and Imaging
- Computer Science (all)
- Computer Science Applications
Cite this
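Semantic data integration techniques for transforming big biomedical data into actionable knowledge. / Vidal, Maria Esther; Jozashoori, Samaneh. 2019 IEEE 32nd International Symposium on Computer-Based Medical Systems, CBMS 2019: Proceedings. Institute of Electrical and Electronics Engineers Inc., 2019. p. 563-566 8787394 (Proceedings - IEEE Symposium on Computer-Based Medical Systems; Vol. 2019-June).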
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Semantic data integration techniques for transforming big biomedical data into actionable knowledge
AU - Vidal, Maria Esther
AU - Jozashoori, Samaneh
N1 - Funding information: This work has been supported by the European Union’s Horizon 2020 Research and Innovation Program for the project iASiS with grant agreement No 727658.
PY - 2019/6
Y1 - 2019/6
N2 - FAIR principles and Open Data initiatives have motivated the publication of large volumes of data. In the biomedical domain in particular, the size of the data has increased exponentially in the last decade, and with advances in the technologies used to collect and generate data, a faster growth rate is expected in the coming years. The available data collections are characterized by the dominant dimensions of big data, i.e., they are not only large in volume, but can also be heterogeneous and present quality issues. These data complexity problems affect the typical tasks of data management, particularly the task of integrating big biomedical data sources. We tackle the problem of big data integration and present a knowledge-driven framework able to extract and integrate data collected from structured and unstructured data sources. The proposed framework resorts to Natural Language Processing techniques to extract knowledge from unstructured data and short text. Furthermore, ontologies and controlled vocabularies, e.g., UMLS, are used to annotate the extracted entities and relations with terms from the ontology or controlled vocabulary. The annotated data is integrated into a knowledge graph. A unified schema describes the meaning of the integrated data as well as its main properties and relations. As a proof of concept, we show the results of applying the proposed framework to integrate clinical records of lung cancer patients with data extracted from open data sources such as DrugBank and PubMed. The resulting knowledge graph enables the discovery of interactions between drugs in the treatments prescribed to lung cancer patients.
KW - Big Data
KW - Biomedical Data
KW - Knowledge Graph
KW - Natural Language Processing
KW - Semantic Data Integration
UR - http://www.scopus.com/inward/record.url?scp=85070971867&partnerID=8YFLogxK
U2 - 10.1109/CBMS.2019.00116
DO - 10.1109/CBMS.2019.00116
M3 - Conference contribution
AN - SCOPUS:85070971867
SN - 978-1-7281-2287-8
T3 - Proceedings - IEEE Symposium on Computer-Based Medical Systems
SP - 563
EP - 566
BT - 2019 IEEE 32nd International Symposium on Computer-Based Medical Systems, CBMS 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 32nd IEEE International Symposium on Computer-Based Medical Systems, CBMS 2019
Y2 - 5 June 2019 through 7 June 2019
ER -