
Improving Scholarly Knowledge Representation: Evaluating BERT-Based Models for Scientific Relation Classification

Publication: Contribution to book/report/anthology/conference proceedings › Conference paper › Research › Peer-reviewed

Authors

  • Ming Jiang
  • Jennifer D’Souza
  • Sören Auer
  • J. Stephen Downie

Organizational units

External organizations

  • University of Illinois Urbana-Champaign (UIUC)
  • Technische Informationsbibliothek (TIB) Leibniz-Informationszentrum Technik und Naturwissenschaften und Universitätsbibliothek

Details

Original language: English
Title of host publication: Digital Libraries at Times of Massive Societal Transition
Subtitle: 22nd International Conference on Asia-Pacific Digital Libraries, ICADL 2020, Proceedings
Editors: Emi Ishita, Natalie Lee Pang, Lihong Zhou
Place of publication: Cham
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 3-19
Number of pages: 17
ISBN (electronic): 978-3-030-64452-9
ISBN (print): 9783030644512
Publication status: Published - 2020
Event: 22nd International Conference on Asia-Pacific Digital Libraries, ICADL 2020 - Kyoto, Japan
Duration: 30 Nov 2020 – 1 Dec 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12504 LNCS
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Abstract

With the rapid growth of research publications, a vast amount of scholarly knowledge needs to be organized in digital libraries. To address this challenge, techniques relying on knowledge-graph structures are being advocated. Within such graph-based pipelines, inferring relation types between related scientific concepts is a crucial step. Recently, advanced techniques relying on language models pre-trained on large corpora have been widely explored for automatic relation classification. Despite the remarkable contributions that have been made, many of these methods were evaluated under different scenarios, which limits their comparability. To address this shortcoming, we present a thorough empirical evaluation of eight BERT-based classification models, focusing on two key factors: 1) BERT model variants, and 2) classification strategies. Experiments on three corpora show that a domain-specific pre-training corpus helps the BERT-based classification model identify the type of scientific relations. Although the strategy of predicting a single relation at a time generally achieves higher classification accuracy than the strategy of identifying multiple relation types simultaneously, the latter strategy performs more consistently on corpora with either a large or small number of annotations. Our study aims to offer recommendations to the stakeholders of digital libraries for selecting the appropriate technique to build knowledge-graph-based systems for enhanced scholarly information organization.
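The two classification strategies contrasted in the abstract can be illustrated, independent of any particular BERT variant, as the difference between a softmax head (exactly one relation per concept pair) and a sigmoid multi-label head (zero or more relations at once). The sketch below uses a hypothetical relation label set and made-up logits; it is not the authors' implementation, only a minimal illustration of the two decision rules applied to a model's final-layer scores.

```python
import math

# Hypothetical relation types for illustration only
RELATIONS = ["USED-FOR", "PART-OF", "COMPARE", "EVALUATE-FOR"]

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_single(logits):
    """Single-relation strategy: softmax over all types, return the argmax."""
    probs = softmax(logits)
    return RELATIONS[probs.index(max(probs))]

def predict_multi(logits, threshold=0.5):
    """Multi-label strategy: independent sigmoid per type,
    keep every relation whose probability clears the threshold."""
    sigmoids = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    return [rel for rel, p in zip(RELATIONS, sigmoids) if p >= threshold]

# Example final-layer scores for one pair of scientific concepts
logits = [2.1, -0.3, 1.4, -1.7]
print(predict_single(logits))  # exactly one relation type
print(predict_multi(logits))   # possibly several relation types
```

The single-relation rule forces a decision even when several types are plausible, while the multi-label rule can return several relations for the same concept pair, which matches the trade-off the abstract reports between peak accuracy and consistency.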

ASJC Scopus subject areas

Cite

Improving Scholarly Knowledge Representation: Evaluating BERT-Based Models for Scientific Relation Classification. / Jiang, Ming; D’Souza, Jennifer; Auer, Sören et al.
Digital Libraries at Times of Massive Societal Transition: 22nd International Conference on Asia-Pacific Digital Libraries, ICADL 2020, Proceedings. ed. / Emi Ishita; Natalie Lee Pang; Lihong Zhou. Cham: Springer Science and Business Media Deutschland GmbH, 2020. pp. 3-19 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12504 LNCS).


Jiang, M, D’Souza, J, Auer, S & Downie, JS 2020, Improving Scholarly Knowledge Representation: Evaluating BERT-Based Models for Scientific Relation Classification. in E Ishita, NL Pang & L Zhou (eds), Digital Libraries at Times of Massive Societal Transition: 22nd International Conference on Asia-Pacific Digital Libraries, ICADL 2020, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12504 LNCS, Springer Science and Business Media Deutschland GmbH, Cham, pp. 3-19, 22nd International Conference on Asia-Pacific Digital Libraries, ICADL 2020, Kyoto, Japan, 30 Nov. 2020. https://doi.org/10.1007/978-3-030-64452-9_1
Jiang, M., D’Souza, J., Auer, S., & Downie, J. S. (2020). Improving Scholarly Knowledge Representation: Evaluating BERT-Based Models for Scientific Relation Classification. In E. Ishita, N. L. Pang, & L. Zhou (Eds.), Digital Libraries at Times of Massive Societal Transition: 22nd International Conference on Asia-Pacific Digital Libraries, ICADL 2020, Proceedings (pp. 3-19). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12504 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-64452-9_1
Jiang M, D’Souza J, Auer S, Downie JS. Improving Scholarly Knowledge Representation: Evaluating BERT-Based Models for Scientific Relation Classification. In: Ishita E, Pang NL, Zhou L, editors. Digital Libraries at Times of Massive Societal Transition: 22nd International Conference on Asia-Pacific Digital Libraries, ICADL 2020, Proceedings. Cham: Springer Science and Business Media Deutschland GmbH. 2020. p. 3-19. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). Epub 2020 Nov 26. doi: 10.1007/978-3-030-64452-9_1
Jiang, Ming; D’Souza, Jennifer; Auer, Sören et al. / Improving Scholarly Knowledge Representation: Evaluating BERT-Based Models for Scientific Relation Classification. Digital Libraries at Times of Massive Societal Transition: 22nd International Conference on Asia-Pacific Digital Libraries, ICADL 2020, Proceedings. ed. / Emi Ishita; Natalie Lee Pang; Lihong Zhou. Cham: Springer Science and Business Media Deutschland GmbH, 2020. pp. 3-19 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
BibTeX
@inproceedings{6c085a08c0e34d97a5bad7c12b281716,
title = "Improving Scholarly Knowledge Representation: Evaluating BERT-Based Models for Scientific Relation Classification",
abstract = "With the rapid growth of research publications, there is a vast amount of scholarly knowledge that needs to be organized in digital libraries. To deal with this challenge, techniques relying on knowledge-graph structures are being advocated. Within such graph-based pipelines, inferring relation types between related scientific concepts is a crucial step. Recently, advanced techniques relying on language models pre-trained on large corpora have been popularly explored for automatic relation classification. Despite the remarkable contributions that have been made, many of these methods were evaluated under different scenarios, which limits their comparability. To address this shortcoming, we present a thorough empirical evaluation of eight Bert-based classification models by focusing on two key factors: 1) Bert model variants, and 2) classification strategies. Experiments on three corpora show that domain-specific pre-training corpus benefits the Bert-based classification model to identify the type of scientific relations. Although the strategy of predicting a single relation each time achieves a higher classification accuracy than the strategy of identifying multiple relation types simultaneously in general, the latter strategy demonstrates a more consistent performance in the corpus with either a large or small number of annotations. Our study aims to offer recommendations to the stakeholders of digital libraries for selecting the appropriate technique to build knowledge-graph-based systems for enhanced scholarly information organization.",
keywords = "Digital library, Information extraction, Knowledge graphs, Neural machine learning, Scholarly text mining, Semantic relation classification",
author = "Ming Jiang and Jennifer D{\textquoteright}Souza and S{\"o}ren Auer and Downie, {J. Stephen}",
year = "2020",
doi = "10.1007/978-3-030-64452-9_1",
language = "English",
isbn = "9783030644512",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Science and Business Media Deutschland GmbH",
pages = "3--19",
editor = "Emi Ishita and Pang, {Natalie Lee} and Lihong Zhou",
booktitle = "Digital Libraries at Times of Massive Societal Transition",
address = "Germany",
note = "22nd International Conference on Asia-Pacific Digital Libraries, ICADL 2020 ; Conference date: 30-11-2020 Through 01-12-2020",

}

RIS

TY - GEN

T1 - Improving Scholarly Knowledge Representation

T2 - 22nd International Conference on Asia-Pacific Digital Libraries, ICADL 2020

AU - Jiang, Ming

AU - D’Souza, Jennifer

AU - Auer, Sören

AU - Downie, J. Stephen

PY - 2020

Y1 - 2020

N2 - With the rapid growth of research publications, there is a vast amount of scholarly knowledge that needs to be organized in digital libraries. To deal with this challenge, techniques relying on knowledge-graph structures are being advocated. Within such graph-based pipelines, inferring relation types between related scientific concepts is a crucial step. Recently, advanced techniques relying on language models pre-trained on large corpora have been popularly explored for automatic relation classification. Despite the remarkable contributions that have been made, many of these methods were evaluated under different scenarios, which limits their comparability. To address this shortcoming, we present a thorough empirical evaluation of eight Bert-based classification models by focusing on two key factors: 1) Bert model variants, and 2) classification strategies. Experiments on three corpora show that domain-specific pre-training corpus benefits the Bert-based classification model to identify the type of scientific relations. Although the strategy of predicting a single relation each time achieves a higher classification accuracy than the strategy of identifying multiple relation types simultaneously in general, the latter strategy demonstrates a more consistent performance in the corpus with either a large or small number of annotations. Our study aims to offer recommendations to the stakeholders of digital libraries for selecting the appropriate technique to build knowledge-graph-based systems for enhanced scholarly information organization.

AB - With the rapid growth of research publications, there is a vast amount of scholarly knowledge that needs to be organized in digital libraries. To deal with this challenge, techniques relying on knowledge-graph structures are being advocated. Within such graph-based pipelines, inferring relation types between related scientific concepts is a crucial step. Recently, advanced techniques relying on language models pre-trained on large corpora have been popularly explored for automatic relation classification. Despite the remarkable contributions that have been made, many of these methods were evaluated under different scenarios, which limits their comparability. To address this shortcoming, we present a thorough empirical evaluation of eight Bert-based classification models by focusing on two key factors: 1) Bert model variants, and 2) classification strategies. Experiments on three corpora show that domain-specific pre-training corpus benefits the Bert-based classification model to identify the type of scientific relations. Although the strategy of predicting a single relation each time achieves a higher classification accuracy than the strategy of identifying multiple relation types simultaneously in general, the latter strategy demonstrates a more consistent performance in the corpus with either a large or small number of annotations. Our study aims to offer recommendations to the stakeholders of digital libraries for selecting the appropriate technique to build knowledge-graph-based systems for enhanced scholarly information organization.

KW - Digital library

KW - Information extraction

KW - Knowledge graphs

KW - Neural machine learning

KW - Scholarly text mining

KW - Semantic relation classification

UR - http://www.scopus.com/inward/record.url?scp=85097538751&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-64452-9_1

DO - 10.1007/978-3-030-64452-9_1

M3 - Conference contribution

AN - SCOPUS:85097538751

SN - 9783030644512

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 3

EP - 19

BT - Digital Libraries at Times of Massive Societal Transition

A2 - Ishita, Emi

A2 - Pang, Natalie Lee

A2 - Zhou, Lihong

PB - Springer Science and Business Media Deutschland GmbH

CY - Cham

Y2 - 30 November 2020 through 1 December 2020

ER -
