Details
Original language | English |
---|---|
Pages (from-to) | 1-13 |
Number of pages | 13 |
Journal | IEEE Transactions on Computational Social Systems |
Early online date | 13 May 2024 |
Publication status | E-pub ahead of print - 13 May 2024 |
Abstract
Social media platforms such as Twitter are crucial resources for obtaining situational information during disease outbreaks. Given the sheer volume of user-generated content, tools are needed that can automatically classify input texts into information types, such as symptoms, transmission, and prevention measures, and generate concise situational updates. Beyond high classification accuracy, interpretability is an important requirement when designing machine learning models for tasks in the medical domain. In this article, we provide annotated epidemic-related datasets labeled with information types and with rationales, i.e., short phrases from the original tweets that support the assigned labels. Next, we introduce a trustworthy approach for the automatic classification of tweets posted during epidemics. Our classification model extracts short explanations (rationales) for its output decisions on unseen data. Moreover, we propose a simple graph-based ranking method to generate short summaries of tweets. Experiments on two epidemic-related datasets show that: 1) our classification model achieves an average Macro-F1 of 82% and better interpretability than the baselines, with a 20% improvement in Token-F1; 2) the extracted rationales capture essential disease-related information in the tweets; and 3) our graph-based method with rationales is simple yet efficient for generating concise situational updates.
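The two technical ideas in the abstract, scoring rationales with Token-F1 and ranking tweets on a graph to build a summary, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes one common formulation of Token-F1: per-tweet F1 between predicted and gold rationale token positions, averaged over tweets. The paper's exact definition may differ. For the ranking step, it substitutes a TextRank-style PageRank over a graph whose edge weights are the Jaccard overlap of the tweets' rationale word sets. All function names (`token_f1`, `mean_token_f1`, `rank_tweets`, `summarize`) are hypothetical.

```python
from typing import List, Set


def token_f1(pred: Set[int], gold: Set[int]) -> float:
    """F1 between predicted and gold rationale token positions for one tweet.

    Assumption: two empty rationales count as a perfect match.
    """
    if not pred and not gold:
        return 1.0
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def mean_token_f1(preds: List[Set[int]], golds: List[Set[int]]) -> float:
    """Average the per-tweet Token-F1 over a dataset (assumed aggregation)."""
    scores = [token_f1(p, g) for p, g in zip(preds, golds)]
    return sum(scores) / len(scores)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two rationale word sets, used here as an edge weight."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_tweets(rationales: List[Set[str]], damping: float = 0.85,
                iterations: int = 50) -> List[float]:
    """PageRank-style power iteration on a rationale-similarity graph."""
    n = len(rationales)
    weights = [[jaccard(rationales[i], rationales[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to edge weight;
            # tweets with no similar neighbours only receive the teleport term.
            incoming = sum(weights[j][i] / out_sums[j] * scores[j]
                           for j in range(n) if out_sums[j] > 0)
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def summarize(tweets: List[str], rationales: List[Set[str]], k: int = 2) -> List[str]:
    """Return the k highest-ranked tweets, kept in their original order."""
    scores = rank_tweets(rationales)
    top = sorted(range(len(tweets)), key=lambda i: scores[i], reverse=True)[:k]
    return [tweets[i] for i in sorted(top)]


if __name__ == "__main__":
    tweets = [
        "Fever and dry cough are common symptoms",
        "Symptoms include fever, cough and fatigue",
        "Wash hands often to prevent spread",
        "Fever and cough reported by many patients",
    ]
    rationales = [
        {"fever", "dry", "cough"},
        {"fever", "cough", "fatigue"},
        {"wash", "hands"},
        {"fever", "cough"},
    ]
    print(token_f1({1, 2, 3}, {2, 3, 4}))
    print(summarize(tweets, rationales, k=2))
```

Ranking on rationales rather than full tweets means that two tweets are linked only when they share the short phrases the classifier marked as evidence. Tweets repeating the same disease-related information therefore reinforce each other, while off-topic wording does not.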
Keywords
- Blogs, Classification, Computational modeling, Data mining, Diseases, epidemic, Feature extraction, health crisis, microblogs, Social networking (online), Transformers, trustworthy systems
ASJC Scopus subject areas
- Mathematics(all)
- Modelling and Simulation
- Social Sciences(all)
- Social Sciences (miscellaneous)
- Computer Science(all)
- Human-Computer Interaction
In: IEEE Transactions on Computational Social Systems, 13.05.2024, p. 1-13.
Research output: Contribution to journal › Article › Research › peer review
TY - JOUR
T1 - A Trustworthy Approach to Classify and Analyze Epidemic-Related Information From Microblogs
AU - Nguyen, Thi Huyen
AU - Fisichella, Marco
AU - Rudra, Koustav
N1 - Publisher Copyright: IEEE
PY - 2024/5/13
Y1 - 2024/5/13
KW - Blogs
KW - Classification
KW - Computational modeling
KW - Data mining
KW - Diseases
KW - epidemic
KW - Feature extraction
KW - health crisis
KW - microblogs
KW - Social networking (online)
KW - Transformers
KW - trustworthy systems
UR - http://www.scopus.com/inward/record.url?scp=85193299864&partnerID=8YFLogxK
U2 - 10.1109/TCSS.2024.3391395
DO - 10.1109/TCSS.2024.3391395
M3 - Article
AN - SCOPUS:85193299864
SP - 1
EP - 13
JO - IEEE Transactions on Computational Social Systems
JF - IEEE Transactions on Computational Social Systems
ER -