Multimodal analytics for real-world news using measures of cross-modal entity consistency

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

  • Eric Müller-Budack
  • Jonas Theiner
  • Sebastian Diering
  • Maximilian Idahl
  • Ralph Ewerth

External Research Organisations

  • German National Library of Science and Technology (TIB)

Details

Original language: English
Title of host publication: ICMR 2020
Subtitle of host publication: Proceedings of the 2020 International Conference on Multimedia Retrieval
Place of publication: New York
Pages: 16-25
Number of pages: 10
ISBN (electronic): 9781450370875
Publication status: Published - 8 Jun 2020
Event: 10th ACM International Conference on Multimedia Retrieval, ICMR 2020 - Dublin, Ireland
Duration: 8 Jun 2020 - 11 Jun 2020

Abstract

The World Wide Web has become a popular source for gathering information and news. Multimodal information, e.g., enriching text with photos, is typically used to convey the news more effectively or to attract attention. The photos can be decorative, depict additional details, or even contain misleading information. Quantifying the cross-modal consistency of entity representations can assist human assessors in evaluating the overall multimodal message. In some cases such measures might give hints to detect fake news, which is an increasingly important topic in today's society. In this paper, we present a multimodal approach to quantify the entity coherence between image and text in real-world news. Named entity linking is applied to extract persons, locations, and events from news texts. Several measures are suggested to calculate the cross-modal similarity of these entities with the news photo, using state-of-the-art computer vision approaches. In contrast to previous work, our system automatically gathers example data from the Web and is applicable to real-world news. The feasibility is demonstrated on two novel datasets that cover different languages, topics, and domains.
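
To illustrate the kind of measure described in the abstract (entities linked in the text, example images gathered from the Web, and visual similarity against the news photo), the following is a minimal sketch, not the authors' implementation. It assumes embeddings have already been computed by some vision model; all function and variable names (e.g., entity_photo_similarity, reference_faces) are hypothetical.

```python
# Illustrative sketch of a cross-modal entity consistency score.
# For a person entity, reference_faces could hold face embeddings from
# Web-crawled example images, and photo_faces the embeddings of faces
# detected in the news photo. The score is the best reference-to-photo match.
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def entity_photo_similarity(reference_faces: list[np.ndarray],
                            photo_faces: list[np.ndarray]) -> float:
    """Best match between any reference image of the entity and any face in the photo."""
    if not reference_faces or not photo_faces:
        return 0.0
    return max(cosine_similarity(r, p)
               for r in reference_faces
               for p in photo_faces)


def document_consistency(entity_scores: dict[str, float]) -> float:
    """Aggregate per-entity scores into a single cross-modal consistency value."""
    if not entity_scores:
        return 0.0
    return sum(entity_scores.values()) / len(entity_scores)
```

Analogous scores could be computed for locations and events with embeddings from scene or event classifiers; the aggregation step here is a simple average and only one of several conceivable choices.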

Keywords

    Cross-modal consistency, Cross-modal entity verification, Deep learning, Image repurposing detection, Multimodal retrieval

Cite this

Multimodal analytics for real-world news using measures of cross-modal entity consistency. / Müller-Budack, Eric; Theiner, Jonas; Diering, Sebastian et al.
ICMR 2020: Proceedings of the 2020 International Conference on Multimedia Retrieval. New York, 2020. p. 16-25.

Müller-Budack, E, Theiner, J, Diering, S, Idahl, M & Ewerth, R 2020, Multimodal analytics for real-world news using measures of cross-modal entity consistency. in ICMR 2020: Proceedings of the 2020 International Conference on Multimedia Retrieval. New York, pp. 16-25, 10th ACM International Conference on Multimedia Retrieval, ICMR 2020, Dublin, Ireland, 8 Jun 2020. https://doi.org/10.1145/3372278.3390670
Müller-Budack, E., Theiner, J., Diering, S., Idahl, M., & Ewerth, R. (2020). Multimodal analytics for real-world news using measures of cross-modal entity consistency. In ICMR 2020: Proceedings of the 2020 International Conference on Multimedia Retrieval (pp. 16-25). https://doi.org/10.1145/3372278.3390670
Müller-Budack E, Theiner J, Diering S, Idahl M, Ewerth R. Multimodal analytics for real-world news using measures of cross-modal entity consistency. In ICMR 2020: Proceedings of the 2020 International Conference on Multimedia Retrieval. New York. 2020. p. 16-25 doi: 10.1145/3372278.3390670
Müller-Budack, Eric ; Theiner, Jonas ; Diering, Sebastian et al. / Multimodal analytics for real-world news using measures of cross-modal entity consistency. ICMR 2020: Proceedings of the 2020 International Conference on Multimedia Retrieval. New York, 2020. pp. 16-25
BibTeX
@inproceedings{459f31b0e9914e84a4bf1f175e2b900f,
title = "Multimodal analytics for real-world news using measures of cross-modal entity consistency",
abstract = "The World Wide Web has become a popular source for gathering information and news. Multimodal information, e.g., enriching text with photos, is typically used to convey the news more effectively or to attract attention. The photos can be decorative, depict additional details, or even contain misleading information. Quantifying the cross-modal consistency of entity representations can assist human assessors in evaluating the overall multimodal message. In some cases such measures might give hints to detect fake news, which is an increasingly important topic in today's society. In this paper, we present a multimodal approach to quantify the entity coherence between image and text in real-world news. Named entity linking is applied to extract persons, locations, and events from news texts. Several measures are suggested to calculate the cross-modal similarity of these entities with the news photo, using state-of-the-art computer vision approaches. In contrast to previous work, our system automatically gathers example data from the Web and is applicable to real-world news. The feasibility is demonstrated on two novel datasets that cover different languages, topics, and domains.",
keywords = "Cross-modal consistency, Cross-modal entity verification, Deep learning, Image repurposing detection, Multimodal retrieval",
author = "Eric M{\"u}ller-Budack and Jonas Theiner and Sebastian Diering and Maximilian Idahl and Ralph Ewerth",
note = "Funding information: This project has received funding from the European Union{\textquoteright}s Horizon 2020 research and innovation programme under the Marie Sk?odowska-Curie grant agreement no 812997, and the the German Research Foundation (DFG: Deutsche Forschungsgemeinschaft, project number: 388420599). We are very grateful to Avishek Anand (L3S Research Center, Leibniz Universit{\"a}t Hannover) for his valuable comments that improved the quality of the paper. This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement no 812997, and the the German Research Foundation (DFG: Deutsche Forschungsgemeinschaft, project number: 388420599). We are very grateful to Avishek Anand (L3S Research Center, Leibniz Universitt Hannover) for his valuable comments that improved the quality of the paper. ; 10th ACM International Conference on Multimedia Retrieval, ICMR 2020 ; Conference date: 08-06-2020 Through 11-06-2020",
year = "2020",
month = jun,
day = "8",
doi = "10.1145/3372278.3390670",
language = "English",
pages = "16--25",
booktitle = "ICMR 2020",

}

RIS

TY - GEN

T1 - Multimodal analytics for real-world news using measures of cross-modal entity consistency

AU - Müller-Budack, Eric

AU - Theiner, Jonas

AU - Diering, Sebastian

AU - Idahl, Maximilian

AU - Ewerth, Ralph

N1 - Funding information: This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement no 812997, and the German Research Foundation (DFG: Deutsche Forschungsgemeinschaft, project number: 388420599). We are very grateful to Avishek Anand (L3S Research Center, Leibniz Universität Hannover) for his valuable comments that improved the quality of the paper.

PY - 2020/6/8

Y1 - 2020/6/8

N2 - The World Wide Web has become a popular source for gathering information and news. Multimodal information, e.g., enriching text with photos, is typically used to convey the news more effectively or to attract attention. The photos can be decorative, depict additional details, or even contain misleading information. Quantifying the cross-modal consistency of entity representations can assist human assessors in evaluating the overall multimodal message. In some cases such measures might give hints to detect fake news, which is an increasingly important topic in today's society. In this paper, we present a multimodal approach to quantify the entity coherence between image and text in real-world news. Named entity linking is applied to extract persons, locations, and events from news texts. Several measures are suggested to calculate the cross-modal similarity of these entities with the news photo, using state-of-the-art computer vision approaches. In contrast to previous work, our system automatically gathers example data from the Web and is applicable to real-world news. The feasibility is demonstrated on two novel datasets that cover different languages, topics, and domains.

AB - The World Wide Web has become a popular source for gathering information and news. Multimodal information, e.g., enriching text with photos, is typically used to convey the news more effectively or to attract attention. The photos can be decorative, depict additional details, or even contain misleading information. Quantifying the cross-modal consistency of entity representations can assist human assessors in evaluating the overall multimodal message. In some cases such measures might give hints to detect fake news, which is an increasingly important topic in today's society. In this paper, we present a multimodal approach to quantify the entity coherence between image and text in real-world news. Named entity linking is applied to extract persons, locations, and events from news texts. Several measures are suggested to calculate the cross-modal similarity of these entities with the news photo, using state-of-the-art computer vision approaches. In contrast to previous work, our system automatically gathers example data from the Web and is applicable to real-world news. The feasibility is demonstrated on two novel datasets that cover different languages, topics, and domains.

KW - Cross-modal consistency

KW - Cross-modal entity verification

KW - Deep learning

KW - Image repurposing detection

KW - Multimodal retrieval

UR - http://www.scopus.com/inward/record.url?scp=85086904454&partnerID=8YFLogxK

U2 - 10.1145/3372278.3390670

DO - 10.1145/3372278.3390670

M3 - Conference contribution

AN - SCOPUS:85086904454

SP - 16

EP - 25

BT - ICMR 2020

CY - New York

T2 - 10th ACM International Conference on Multimedia Retrieval, ICMR 2020

Y2 - 8 June 2020 through 11 June 2020

ER -