Multimodal analytics for real-world news using measures of cross-modal entity consistency

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

  • Eric Müller-Budack
  • Jonas Theiner
  • Sebastian Diering
  • Maximilian Idahl
  • Ralph Ewerth

External Research Organisations

  • German National Library of Science and Technology (TIB)

Details

Original language: English
Title of host publication: ICMR 2020
Subtitle of host publication: Proceedings of the 2020 International Conference on Multimedia Retrieval
Place of publication: New York
Pages: 16-25
Number of pages: 10
ISBN (electronic): 9781450370875
Publication status: Published - 8 Jun 2020
Event: 10th ACM International Conference on Multimedia Retrieval, ICMR 2020 - Dublin, Ireland
Duration: 8 Jun 2020 - 11 Jun 2020

Abstract

The World Wide Web has become a popular source for gathering information and news. Multimodal information, e.g., enriching text with photos, is typically used to convey the news more effectively or to attract attention. The photos can be decorative, depict additional details, or even contain misleading information. Quantifying the cross-modal consistency of entity representations can assist human assessors in evaluating the overall multimodal message. In some cases such measures might give hints to detect fake news, which is an increasingly important topic in today's society. In this paper, we present a multimodal approach to quantify the entity coherence between image and text in real-world news. Named entity linking is applied to extract persons, locations, and events from news texts. Several measures are suggested to calculate the cross-modal similarity of these entities with the news photo, using state-of-the-art computer vision approaches. In contrast to previous work, our system automatically gathers example data from the Web and is applicable to real-world news. The feasibility is demonstrated on two novel datasets that cover different languages, topics, and domains.
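
To illustrate the kind of measure described in the abstract (entities linked in the text, example images gathered from the Web, and visual similarity against the news photo), the following is a minimal sketch, not the authors' implementation. It assumes embeddings have already been computed by some vision model; all function and variable names (e.g., entity_photo_similarity, reference_faces) are hypothetical.

```python
# Illustrative sketch of a cross-modal entity consistency score.
# For a person entity, reference_faces could hold face embeddings from
# Web-crawled example images, and photo_faces the embeddings of faces
# detected in the news photo. The score is the best reference-to-photo match.
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def entity_photo_similarity(reference_faces: list[np.ndarray],
                            photo_faces: list[np.ndarray]) -> float:
    """Best match between any reference image of the entity and any face in the photo."""
    if not reference_faces or not photo_faces:
        return 0.0
    return max(cosine_similarity(r, p)
               for r in reference_faces
               for p in photo_faces)


def document_consistency(entity_scores: dict[str, float]) -> float:
    """Aggregate per-entity scores into a single cross-modal consistency value."""
    if not entity_scores:
        return 0.0
    return sum(entity_scores.values()) / len(entity_scores)
```

Analogous scores could be computed for locations and events with embeddings from scene or event classifiers; the aggregation step here is a simple average and only one of several conceivable choices.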

Keywords

    Cross-modal consistency, Cross-modal entity verification, Deep learning, Image repurposing detection, Multimodal retrieval

Cite this

Multimodal analytics for real-world news using measures of cross-modal entity consistency. / Müller-Budack, Eric; Theiner, Jonas; Diering, Sebastian et al.
ICMR 2020: Proceedings of the 2020 International Conference on Multimedia Retrieval. New York, 2020. p. 16-25.

Müller-Budack, E, Theiner, J, Diering, S, Idahl, M & Ewerth, R 2020, Multimodal analytics for real-world news using measures of cross-modal entity consistency. in ICMR 2020: Proceedings of the 2020 International Conference on Multimedia Retrieval. New York, pp. 16-25, 10th ACM International Conference on Multimedia Retrieval, ICMR 2020, Dublin, Ireland, 8 Jun 2020. https://doi.org/10.1145/3372278.3390670
Müller-Budack, E., Theiner, J., Diering, S., Idahl, M., & Ewerth, R. (2020). Multimodal analytics for real-world news using measures of cross-modal entity consistency. In ICMR 2020: Proceedings of the 2020 International Conference on Multimedia Retrieval (pp. 16-25). https://doi.org/10.1145/3372278.3390670
Müller-Budack E, Theiner J, Diering S, Idahl M, Ewerth R. Multimodal analytics for real-world news using measures of cross-modal entity consistency. In ICMR 2020: Proceedings of the 2020 International Conference on Multimedia Retrieval. New York. 2020. p. 16-25 doi: 10.1145/3372278.3390670
Müller-Budack, Eric ; Theiner, Jonas ; Diering, Sebastian et al. / Multimodal analytics for real-world news using measures of cross-modal entity consistency. ICMR 2020: Proceedings of the 2020 International Conference on Multimedia Retrieval. New York, 2020. pp. 16-25
BibTeX
@inproceedings{459f31b0e9914e84a4bf1f175e2b900f,
title = "Multimodal analytics for real-world news using measures of cross-modal entity consistency",
abstract = "The World Wide Web has become a popular source for gathering information and news. Multimodal information, e.g., enriching text with photos, is typically used to convey the news more effectively or to attract attention. The photos can be decorative, depict additional details, or even contain misleading information. Quantifying the cross-modal consistency of entity representations can assist human assessors in evaluating the overall multimodal message. In some cases such measures might give hints to detect fake news, which is an increasingly important topic in today's society. In this paper, we present a multimodal approach to quantify the entity coherence between image and text in real-world news. Named entity linking is applied to extract persons, locations, and events from news texts. Several measures are suggested to calculate the cross-modal similarity of these entities with the news photo, using state-of-the-art computer vision approaches. In contrast to previous work, our system automatically gathers example data from the Web and is applicable to real-world news. The feasibility is demonstrated on two novel datasets that cover different languages, topics, and domains.",
keywords = "Cross-modal consistency, Cross-modal entity verification, Deep learning, Image repurposing detection, Multimodal retrieval",
author = "Eric M{\"u}ller-Budack and Jonas Theiner and Sebastian Diering and Maximilian Idahl and Ralph Ewerth",
note = "Funding information: This project has received funding from the European Union{\textquoteright}s Horizon 2020 research and innovation programme under the Marie Sk?odowska-Curie grant agreement no 812997, and the the German Research Foundation (DFG: Deutsche Forschungsgemeinschaft, project number: 388420599). We are very grateful to Avishek Anand (L3S Research Center, Leibniz Universit{\"a}t Hannover) for his valuable comments that improved the quality of the paper. This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement no 812997, and the the German Research Foundation (DFG: Deutsche Forschungsgemeinschaft, project number: 388420599). We are very grateful to Avishek Anand (L3S Research Center, Leibniz Universitt Hannover) for his valuable comments that improved the quality of the paper. ; 10th ACM International Conference on Multimedia Retrieval, ICMR 2020 ; Conference date: 08-06-2020 Through 11-06-2020",
year = "2020",
month = jun,
day = "8",
doi = "10.1145/3372278.3390670",
language = "English",
pages = "16--25",
booktitle = "ICMR 2020",

}

RIS

TY - GEN

T1 - Multimodal analytics for real-world news using measures of cross-modal entity consistency

AU - Müller-Budack, Eric

AU - Theiner, Jonas

AU - Diering, Sebastian

AU - Idahl, Maximilian

AU - Ewerth, Ralph

N1 - Funding information: This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement no 812997, and the German Research Foundation (DFG: Deutsche Forschungsgemeinschaft, project number: 388420599). We are very grateful to Avishek Anand (L3S Research Center, Leibniz Universität Hannover) for his valuable comments that improved the quality of the paper.

PY - 2020/6/8

Y1 - 2020/6/8

N2 - The World Wide Web has become a popular source for gathering information and news. Multimodal information, e.g., enriching text with photos, is typically used to convey the news more effectively or to attract attention. The photos can be decorative, depict additional details, or even contain misleading information. Quantifying the cross-modal consistency of entity representations can assist human assessors in evaluating the overall multimodal message. In some cases such measures might give hints to detect fake news, which is an increasingly important topic in today's society. In this paper, we present a multimodal approach to quantify the entity coherence between image and text in real-world news. Named entity linking is applied to extract persons, locations, and events from news texts. Several measures are suggested to calculate the cross-modal similarity of these entities with the news photo, using state-of-the-art computer vision approaches. In contrast to previous work, our system automatically gathers example data from the Web and is applicable to real-world news. The feasibility is demonstrated on two novel datasets that cover different languages, topics, and domains.

AB - The World Wide Web has become a popular source for gathering information and news. Multimodal information, e.g., enriching text with photos, is typically used to convey the news more effectively or to attract attention. The photos can be decorative, depict additional details, or even contain misleading information. Quantifying the cross-modal consistency of entity representations can assist human assessors in evaluating the overall multimodal message. In some cases such measures might give hints to detect fake news, which is an increasingly important topic in today's society. In this paper, we present a multimodal approach to quantify the entity coherence between image and text in real-world news. Named entity linking is applied to extract persons, locations, and events from news texts. Several measures are suggested to calculate the cross-modal similarity of these entities with the news photo, using state-of-the-art computer vision approaches. In contrast to previous work, our system automatically gathers example data from the Web and is applicable to real-world news. The feasibility is demonstrated on two novel datasets that cover different languages, topics, and domains.

KW - Cross-modal consistency

KW - Cross-modal entity verification

KW - Deep learning

KW - Image repurposing detection

KW - Multimodal retrieval

UR - http://www.scopus.com/inward/record.url?scp=85086904454&partnerID=8YFLogxK

U2 - 10.1145/3372278.3390670

DO - 10.1145/3372278.3390670

M3 - Conference contribution

AN - SCOPUS:85086904454

SP - 16

EP - 25

BT - ICMR 2020

CY - New York

T2 - 10th ACM International Conference on Multimedia Retrieval, ICMR 2020

Y2 - 8 June 2020 through 11 June 2020

ER -