Pattern-Based Acquisition of Scientific Entities from Scholarly Article Titles

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

  • Jennifer D’Souza
  • Sören Auer

Research Organisations

External Research Organisations

  • German National Library of Science and Technology (TIB)

Details

Original language: English
Title of host publication: Towards Open and Trustworthy Digital Societies
Subtitle of host publication: 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021, Proceedings
Editors: Hao-Ren Ke, Chei Sian Lee, Kazunari Sugiyama
Place of Publication: Cham
Publisher: Springer Nature Switzerland AG
Pages: 401-410
Number of pages: 10
ISBN (electronic): 978-3-030-91669-5
ISBN (print): 9783030916688
Publication status: Published - 2021
Event: 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021 - Virtual, Online
Duration: 1 Dec 2021 to 3 Dec 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13133
ISSN (Print): 0302-9743
ISSN (electronic): 1611-3349

Abstract

We describe a rule-based approach for the automatic acquisition of salient scientific entities from Computational Linguistics (CL) scholarly article titles. Two observations motivated the approach: (i) noting salient aspects of an article’s contribution in its title; and (ii) pattern regularities capturing the salient terms that could be expressed in a set of rules. Only those lexico-syntactic patterns were selected that were easily recognizable, occurred frequently, and positionally indicated a scientific entity type. The rules were developed on a collection of 50,237 CL titles covering all articles in the ACL Anthology. In total, 19,799 research problems, 18,111 solutions, 20,033 resources, 1,059 languages, 6,878 tools, and 21,687 methods were extracted at an average precision of 75%.
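
As a rough illustration of what a positional lexico-syntactic rule over a title might look like, the following is a minimal Python sketch. It is not the authors' rule set: the two connector patterns ("for" and "using"), the entity-type names, and the function `extract_entities` are all assumptions made for this example, and the example titles are invented.

```python
import re
from typing import Dict, List

# Hypothetical rules for illustration only; the paper's actual rules are not
# reproduced here. Each rule pairs a connector pattern with the entity types
# assumed for the phrase before and after the connector.
RULES = [
    (re.compile(r"^(?P<left>.+?)\s+for\s+(?P<right>.+)$", re.IGNORECASE),
     ("solution", "research_problem")),
    (re.compile(r"^(?P<left>.+?)\s+using\s+(?P<right>.+)$", re.IGNORECASE),
     ("research_problem", "method")),
]


def extract_entities(title: str) -> Dict[str, List[str]]:
    """Apply the first matching rule and label both sides of its connector."""
    entities: Dict[str, List[str]] = {}
    for pattern, (left_type, right_type) in RULES:
        match = pattern.match(title.strip())
        if match:
            entities.setdefault(left_type, []).append(match.group("left"))
            entities.setdefault(right_type, []).append(match.group("right"))
            break  # stop at the first rule that fires
    return entities


if __name__ == "__main__":
    for example in (
        "Conditional Random Fields for Named Entity Recognition",
        "Word Sense Disambiguation using Graph Clustering",
    ):
        print(example, "->", extract_entities(example))
```

In this sketch the position of a phrase relative to the connector decides its assumed type, which mirrors the abstract's point that the selected patterns positionally indicate an entity type. A title that matches no rule yields an empty result.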

Keywords

    Natural language processing, Rule-based system, Scholarly knowledge graphs, Semantic publishing, Terminology extraction

Cite this

Pattern-Based Acquisition of Scientific Entities from Scholarly Article Titles. / D’Souza, Jennifer; Auer, Sören.
Towards Open and Trustworthy Digital Societies: 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021, Proceedings. ed. / Hao-Ren Ke; Chei Sian Lee; Kazunari Sugiyama. Cham: Springer Nature Switzerland AG, 2021. p. 401-410 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13133).

D’Souza, J & Auer, S 2021, Pattern-Based Acquisition of Scientific Entities from Scholarly Article Titles. in H-R Ke, CS Lee & K Sugiyama (eds), Towards Open and Trustworthy Digital Societies: 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13133, Springer Nature Switzerland AG, Cham, pp. 401-410, 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021, Virtual, Online, 1 Dec 2021. https://doi.org/10.1007/978-3-030-91669-5_31
D’Souza, J., & Auer, S. (2021). Pattern-Based Acquisition of Scientific Entities from Scholarly Article Titles. In H.-R. Ke, C. S. Lee, & K. Sugiyama (Eds.), Towards Open and Trustworthy Digital Societies: 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021, Proceedings (pp. 401-410). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13133). Springer Nature Switzerland AG. https://doi.org/10.1007/978-3-030-91669-5_31
D’Souza J, Auer S. Pattern-Based Acquisition of Scientific Entities from Scholarly Article Titles. In Ke HR, Lee CS, Sugiyama K, editors, Towards Open and Trustworthy Digital Societies: 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021, Proceedings. Cham: Springer Nature Switzerland AG. 2021. p. 401-410. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). Epub 2021 Nov 30. doi: 10.1007/978-3-030-91669-5_31
D’Souza, Jennifer ; Auer, Sören. / Pattern-Based Acquisition of Scientific Entities from Scholarly Article Titles. Towards Open and Trustworthy Digital Societies: 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021, Proceedings. editor / Hao-Ren Ke ; Chei Sian Lee ; Kazunari Sugiyama. Cham : Springer Nature Switzerland AG, 2021. pp. 401-410 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{6bdbedd9112c424590bfbf576862da82,
title = "Pattern-Based Acquisition of Scientific Entities from Scholarly Article Titles",
abstract = "We describe a rule-based approach for the automatic acquisition of salient scientific entities from Computational Linguistics (CL) scholarly article titles. Two observations motivated the approach: (i) noting salient aspects of an article{\textquoteright}s contribution in its title; and (ii) pattern regularities capturing the salient terms that could be expressed in a set of rules. Only those lexico-syntactic patterns were selected that were easily recognizable, occurred frequently, and positionally indicated a scientific entity type. The rules were developed on a collection of 50,237 CL titles covering all articles in the ACL Anthology. In total, 19,799 research problems, 18,111 solutions, 20,033 resources, 1,059 languages, 6,878 tools, and 21,687 methods were extracted at an average precision of 75\%.",
keywords = "Natural language processing, Rule-based system, Scholarly knowledge graphs, Semantic publishing, Terminology extraction",
author = "Jennifer D{\textquoteright}Souza and S{\"o}ren Auer",
note = "Funding Information: Supported by TIB Leibniz Information Centre for Science and Technology, the EU H2020 ERC project ScienceGRaph (GA ID: 819536).; 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021 ; Conference date: 01-12-2021 Through 03-12-2021",
year = "2021",
doi = "10.1007/978-3-030-91669-5_31",
language = "English",
isbn = "9783030916688",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Nature Switzerland AG",
pages = "401--410",
editor = "Hao-Ren Ke and Lee, {Chei Sian} and Kazunari Sugiyama",
booktitle = "Towards Open and Trustworthy Digital Societies",
address = "Cham",
volume = "13133",
}

TY  - GEN
T1  - Pattern-Based Acquisition of Scientific Entities from Scholarly Article Titles
AU  - D’Souza, Jennifer
AU  - Auer, Sören
N1  - Funding Information: Supported by TIB Leibniz Information Centre for Science and Technology, the EU H2020 ERC project ScienceGRaph (GA ID: 819536).
PY  - 2021
Y1  - 2021
N2  - We describe a rule-based approach for the automatic acquisition of salient scientific entities from Computational Linguistics (CL) scholarly article titles. Two observations motivated the approach: (i) noting salient aspects of an article’s contribution in its title; and (ii) pattern regularities capturing the salient terms that could be expressed in a set of rules. Only those lexico-syntactic patterns were selected that were easily recognizable, occurred frequently, and positionally indicated a scientific entity type. The rules were developed on a collection of 50,237 CL titles covering all articles in the ACL Anthology. In total, 19,799 research problems, 18,111 solutions, 20,033 resources, 1,059 languages, 6,878 tools, and 21,687 methods were extracted at an average precision of 75%.
AB  - We describe a rule-based approach for the automatic acquisition of salient scientific entities from Computational Linguistics (CL) scholarly article titles. Two observations motivated the approach: (i) noting salient aspects of an article’s contribution in its title; and (ii) pattern regularities capturing the salient terms that could be expressed in a set of rules. Only those lexico-syntactic patterns were selected that were easily recognizable, occurred frequently, and positionally indicated a scientific entity type. The rules were developed on a collection of 50,237 CL titles covering all articles in the ACL Anthology. In total, 19,799 research problems, 18,111 solutions, 20,033 resources, 1,059 languages, 6,878 tools, and 21,687 methods were extracted at an average precision of 75%.
KW  - Natural language processing
KW  - Rule-based system
KW  - Scholarly knowledge graphs
KW  - Semantic publishing
KW  - Terminology extraction
UR  - http://www.scopus.com/inward/record.url?scp=85121912565&partnerID=8YFLogxK
U2  - 10.1007/978-3-030-91669-5_31
DO  - 10.1007/978-3-030-91669-5_31
M3  - Conference contribution
AN  - SCOPUS:85121912565
SN  - 9783030916688
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 401
EP  - 410
BT  - Towards Open and Trustworthy Digital Societies
A2  - Ke, Hao-Ren
A2  - Lee, Chei Sian
A2  - Sugiyama, Kazunari
PB  - Springer Nature Switzerland AG
CY  - Cham
T2  - 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021
Y2  - 1 December 2021 through 3 December 2021
ER  -
