Identification of Speaker Roles and Situation Types in News Videos

Gullal S. Cheema; Judi Arafat; Chiao I. Tseng; John A. Bateman; Ralph Ewerth; Eric Müller-Budack

doi:10.1145/3652583.3658101

Details

Original language	English
Title of host publication	ICMR '24
Subtitle of host publication	Proceedings of the 2024 International Conference on Multimedia Retrieval
Pages	506-514
Number of pages	9
ISBN (electronic)	9798400706028
Publication status	Published - 7 Jun 2024
Event	2024 International Conference on Multimedia Retrieval, ICMR 2024 - Phuket, Thailand
Duration	10 Jun 2024 → 14 Jun 2024

Abstract

The proliferation of news sources on the web amplifies the problem of disinformation and misinformation, impacting public perception and societal stability. These issues necessitate the identification of bias in news broadcasts, whereby the analysis and understanding of speaker roles and news contexts are essential prerequisites. Although there is prior research on multimodal speaker role recognition (mostly) in the news domain, modern feature representations have not been explored yet, and no comprehensive public dataset is available. In this paper, we propose novel approaches to classify speaker roles (e.g., “anchor,” “reporter,” “expert”) and categorise scenes into news situations (e.g., “report,” “interview”) in news videos, to enhance the understanding of news content. To bridge the gap of missing datasets, we present a novel annotated dataset for various speaker roles and news situations from diverse (national) media outlets. Furthermore, we suggest a rich set of features and employ aggregation and post-processing techniques. In our experiments, we compare classifiers like Random Forest and XGBoost for identifying speaker roles and news situations in video segments. Our approach outperforms recent state-of-the-art methods, including an end-to-end multimodal deep network and unimodal transformer-based models. Through detailed feature combination analysis, generalisation and explainability insights, we underscore our models’ capabilities and set new directions for future research.
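The abstract mentions aggregation and post-processing but does not say which techniques the authors use. The following is a minimal illustrative sketch based on two assumptions of our own, not on the paper. First, per-frame feature vectors of a video segment are pooled into one segment vector by concatenating their per-dimension mean and max. Second, per-segment predictions are smoothed with a centred majority vote over neighbouring segments. The function names `aggregate_segment` and `smooth_labels` are hypothetical.

```python
from collections import Counter
from statistics import mean


def aggregate_segment(frame_features):
    """Pool the per-frame feature vectors of one video segment into one vector.

    Assumption (not from the paper): per-dimension mean pooling and max
    pooling, concatenated as [means..., maxes...].
    """
    if not frame_features:
        raise ValueError("segment contains no frames")
    dims = range(len(frame_features[0]))
    means = [mean(frame[d] for frame in frame_features) for d in dims]
    maxes = [max(frame[d] for frame in frame_features) for d in dims]
    return means + maxes


def smooth_labels(labels, window=3):
    """Post-process per-segment predictions with a centred majority vote.

    Assumption (not from the paper): an isolated label flip between
    neighbouring segments is treated as noise. Ties keep the segment's
    own predicted label.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i, own in enumerate(labels):
        # Labels of this segment and its neighbours, clipped at the ends.
        counts = Counter(labels[max(0, i - half): i + half + 1])
        best = max(counts.values())
        if counts[own] == best:
            smoothed.append(own)  # own label is (one of) the most frequent
        else:
            smoothed.append(counts.most_common(1)[0][0])
    return smoothed


# Usage with made-up values (not data from the paper's dataset):
segment_vector = aggregate_segment([[0.2, 1.0], [0.4, 0.0], [0.6, 0.5]])
predicted = ["anchor", "anchor", "reporter", "anchor", "anchor"]
print(segment_vector)
print(smooth_labels(predicted))
```

In a full pipeline, pooled segment vectors of this kind would be the input to a classifier such as the Random Forest or XGBoost models named in the abstract. That step is left out here so that the sketch runs with the standard library alone.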

Keywords

news situations, news videos, speaker roles, video classification

ASJC Scopus subject areas

Computer Science(all) › Computer Graphics and Computer-Aided Design
Computer Science(all) › Human-Computer Interaction
Computer Science(all) › Software

Cite this

Identification of Speaker Roles and Situation Types in News Videos. / Cheema, Gullal S.; Arafat, Judi; Tseng, Chiao I. et al.
ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval. 2024. p. 506-514.

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Cheema, GS, Arafat, J, Tseng, CI, Bateman, JA, Ewerth, R & Müller-Budack, E 2024, 'Identification of Speaker Roles and Situation Types in News Videos', in ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval, pp. 506-514, 2024 International Conference on Multimedia Retrieval, ICMR 2024, Phuket, Thailand, 10 Jun 2024. https://doi.org/10.1145/3652583.3658101

Cheema, G. S., Arafat, J., Tseng, C. I., Bateman, J. A., Ewerth, R., & Müller-Budack, E. (2024). Identification of Speaker Roles and Situation Types in News Videos. In ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval (pp. 506-514). https://doi.org/10.1145/3652583.3658101

Cheema GS, Arafat J, Tseng CI, Bateman JA, Ewerth R, Müller-Budack E. Identification of Speaker Roles and Situation Types in News Videos. In ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval. 2024. p. 506-514. doi: 10.1145/3652583.3658101

Cheema, Gullal S.; Arafat, Judi; Tseng, Chiao I. et al. / Identification of Speaker Roles and Situation Types in News Videos. ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval. 2024. pp. 506-514

BibTeX

@inproceedings{771cdd7b89354b07b839c08ba7edc63b,

title = "Identification of Speaker Roles and Situation Types in News Videos",

abstract = "The proliferation of news sources on the web amplifies the problem of disinformation and misinformation, impacting public perception and societal stability. These issues necessitate the identification of bias in news broadcasts, whereby the analysis and understanding of speaker roles and news contexts are essential prerequisites. Although there is prior research on multimodal speaker role recognition (mostly) in the news domain, modern feature representations have not been explored yet, and no comprehensive public dataset is available. In this paper, we propose novel approaches to classify speaker roles (e.g., ``anchor,'' ``reporter,'' ``expert'') and categorise scenes into news situations (e.g., ``report,'' ``interview'') in news videos, to enhance the understanding of news content. To bridge the gap of missing datasets, we present a novel annotated dataset for various speaker roles and news situations from diverse (national) media outlets. Furthermore, we suggest a rich set of features and employ aggregation and post-processing techniques. In our experiments, we compare classifiers like Random Forest and XGBoost for identifying speaker roles and news situations in video segments. Our approach outperforms recent state-of-the-art methods, including an end-to-end multimodal deep network and unimodal transformer-based models. Through detailed feature combination analysis, generalisation and explainability insights, we underscore our models{\textquoteright} capabilities and set new directions for future research.",

keywords = "news situations, news videos, speaker roles, video classification",

author = "Cheema, {Gullal S.} and Judi Arafat and Tseng, {Chiao I.} and Bateman, {John A.} and Ralph Ewerth and Eric M{\"u}ller-Budack",

note = "Publisher Copyright: {\textcopyright} 2024 Copyright held by the owner/author(s).; 2024 International Conference on Multimedia Retrieval, ICMR 2024; Conference date: 10-06-2024 through 14-06-2024",

year = "2024",

month = jun,

day = "7",

doi = "10.1145/3652583.3658101",

language = "English",

pages = "506--514",

booktitle = "ICMR '24",

}

RIS

TY  - GEN
T1  - Identification of Speaker Roles and Situation Types in News Videos
AU  - Cheema, Gullal S.
AU  - Arafat, Judi
AU  - Tseng, Chiao I.
AU  - Bateman, John A.
AU  - Ewerth, Ralph
AU  - Müller-Budack, Eric
PY  - 2024/6/7
Y1  - 2024/6/7
N2  - The proliferation of news sources on the web amplifies the problem of disinformation and misinformation, impacting public perception and societal stability. These issues necessitate the identification of bias in news broadcasts, whereby the analysis and understanding of speaker roles and news contexts are essential prerequisites. Although there is prior research on multimodal speaker role recognition (mostly) in the news domain, modern feature representations have not been explored yet, and no comprehensive public dataset is available. In this paper, we propose novel approaches to classify speaker roles (e.g., “anchor,” “reporter,” “expert”) and categorise scenes into news situations (e.g., “report,” “interview”) in news videos, to enhance the understanding of news content. To bridge the gap of missing datasets, we present a novel annotated dataset for various speaker roles and news situations from diverse (national) media outlets. Furthermore, we suggest a rich set of features and employ aggregation and post-processing techniques. In our experiments, we compare classifiers like Random Forest and XGBoost for identifying speaker roles and news situations in video segments. Our approach outperforms recent state-of-the-art methods, including an end-to-end multimodal deep network and unimodal transformer-based models. Through detailed feature combination analysis, generalisation and explainability insights, we underscore our models’ capabilities and set new directions for future research.
AB  - The proliferation of news sources on the web amplifies the problem of disinformation and misinformation, impacting public perception and societal stability. These issues necessitate the identification of bias in news broadcasts, whereby the analysis and understanding of speaker roles and news contexts are essential prerequisites. Although there is prior research on multimodal speaker role recognition (mostly) in the news domain, modern feature representations have not been explored yet, and no comprehensive public dataset is available. In this paper, we propose novel approaches to classify speaker roles (e.g., “anchor,” “reporter,” “expert”) and categorise scenes into news situations (e.g., “report,” “interview”) in news videos, to enhance the understanding of news content. To bridge the gap of missing datasets, we present a novel annotated dataset for various speaker roles and news situations from diverse (national) media outlets. Furthermore, we suggest a rich set of features and employ aggregation and post-processing techniques. In our experiments, we compare classifiers like Random Forest and XGBoost for identifying speaker roles and news situations in video segments. Our approach outperforms recent state-of-the-art methods, including an end-to-end multimodal deep network and unimodal transformer-based models. Through detailed feature combination analysis, generalisation and explainability insights, we underscore our models’ capabilities and set new directions for future research.
KW  - news situations
KW  - news videos
KW  - speaker roles
KW  - video classification
UR  - http://www.scopus.com/inward/record.url?scp=85199157357&partnerID=8YFLogxK
U2  - 10.1145/3652583.3658101
DO  - 10.1145/3652583.3658101
M3  - Conference contribution
AN  - SCOPUS:85199157357
SP  - 506
EP  - 514
BT  - ICMR '24
T2  - 2024 International Conference on Multimedia Retrieval, ICMR 2024
Y2  - 10 June 2024 through 14 June 2024
ER  -
