Details
Original language | English |
---|---|
Pages (from-to) | 1-15 |
Number of pages | 15 |
Journal | International Journal of Data Science and Analytics |
Volume | 9 |
Issue number | 1 |
Early online date | 22 Feb 2019 |
Publication status | Published - Feb 2020 |
Abstract
Stream classification algorithms traditionally treat arriving instances as independent. However, in many applications, the arriving examples may depend on the “entity” that generated them, e.g. in product reviews or in the interactions of users with an application server. In this study, we investigate the potential of this dependency by partitioning the original stream of instances/“observations” into entity-centric substreams and by incorporating entity-specific information into the learning model. We propose a k-nearest-neighbour-inspired stream classification approach, in which the label of an arriving observation is predicted by exploiting knowledge on the observations belonging to this entity and to entities similar to it. For the computation of entity similarity, we consider knowledge about the observations and knowledge about the entity, potentially from a domain/feature space different from that in which predictions are made. To distinguish between cases where this knowledge transfer is beneficial for stream classification and cases where the knowledge on the entities does not contribute to classifying the observations, we also propose a heuristic approach based on random sampling of substreams using k Random Entities (kRE). Our learning scenario is not fully supervised: after acquiring labels for the initial m observations of each entity, we assume that no additional labels arrive and attempt to predict the labels of near-future and far-future observations from that initial seed. We report on our findings from three datasets.
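The entity-centric idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of a mean feature vector over the first m observations as the entity profile, the Euclidean distance, and the majority vote are all assumptions made for illustration. The `use_random` switch mimics the spirit of the kRE baseline by drawing k entities at random instead of picking the most similar ones.

```python
from collections import Counter, defaultdict
import math
import random

def partition_by_entity(stream):
    """Split a stream of (entity_id, features, label) into per-entity substreams."""
    substreams = defaultdict(list)
    for entity_id, features, label in stream:
        substreams[entity_id].append((features, label))
    return substreams

def seed_profile(substream, m):
    """Summarise the first m (labelled) observations of one entity.

    Assumption: the entity is described by the mean of its seed feature
    vectors and the counts of its seed labels.
    """
    seed = substream[:m]
    dim = len(seed[0][0])
    mean = [sum(f[i] for f, _ in seed) / len(seed) for i in range(dim)]
    labels = Counter(label for _, label in seed)
    return mean, labels

def predict(target, profiles, k, use_random=False, rng=None):
    """Majority vote over the seed labels of the target entity and k others.

    use_random=False: the k entities closest to the target (Euclidean distance).
    use_random=True:  k entities drawn at random (a kRE-style baseline).
    """
    others = [e for e in profiles if e != target]
    if use_random:
        rng = rng or random.Random(0)
        neighbours = rng.sample(others, min(k, len(others)))
    else:
        target_mean = profiles[target][0]
        others.sort(key=lambda e: math.dist(target_mean, profiles[e][0]))
        neighbours = others[:k]
    votes = Counter(profiles[target][1])
    for e in neighbours:
        votes.update(profiles[e][1])
    return votes.most_common(1)[0][0]

# Hypothetical toy stream: entity "a" has an ambiguous seed and resembles "b".
stream = [
    ("a", (0.0, 0.0), "pos"), ("b", (0.1, 0.0), "pos"), ("c", (5.0, 5.0), "neg"),
    ("a", (0.2, 0.0), "neg"), ("b", (0.1, 0.2), "pos"), ("c", (5.0, 4.0), "neg"),
    ("a", (0.1, 0.1), None),  # later, unlabelled observation of entity "a"
]
profiles = {e: seed_profile(s, m=2) for e, s in partition_by_entity(stream).items()}
print(predict("a", profiles, k=1))  # pos
```

In this toy example the seed of entity "a" is split between both labels, so the labels borrowed from its most similar entity decide the vote. Comparing the result with `use_random=True` gives a rough sense of whether similarity-based neighbour selection adds anything over random selection.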
Keywords
- Entity similarity, kNN, Stream classification
ASJC Scopus subject areas
- Computer Science(all)
  - Computer Science Applications
  - Computational Theory and Mathematics
  - Information Systems
- Mathematics(all)
  - Modelling and Simulation
  - Applied Mathematics
Cite this
In: International Journal of Data Science and Analytics, Vol. 9, No. 1, 02.2020, p. 1-15.
Research output: Contribution to journal › Article › Research › peer review
TY - JOUR
T1 - Entity-level stream classification
T2 - exploiting entity similarity to label the future observations referring to an entity
AU - Unnikrishnan, Vishnu
AU - Beyer, Christian
AU - Matuszyk, Pawel
AU - Niemann, Uli
AU - Pryss, Rüdiger
AU - Schlee, Winfried
AU - Ntoutsi, Eirini
AU - Spiliopoulou, Myra
N1 - Funding information: Work of Authors 1 and 2 was partially supported by the German Research Foundation (DFG) within the DFG-project OSCAR Opinion Stream Classification with Ensembles and Active Learners. The last two authors are the project’s principal investigators.
PY - 2020/2
Y1 - 2020/2
N2 - Stream classification algorithms traditionally treat arriving instances as independent. However, in many applications, the arriving examples may depend on the “entity” that generated them, e.g. in product reviews or in the interactions of users with an application server. In this study, we investigate the potential of this dependency by partitioning the original stream of instances/“observations” into entity-centric substreams and by incorporating entity-specific information into the learning model. We propose a k-nearest-neighbour-inspired stream classification approach, in which the label of an arriving observation is predicted by exploiting knowledge on the observations belonging to this entity and to entities similar to it. For the computation of entity similarity, we consider knowledge about the observations and knowledge about the entity, potentially from a domain/feature space different from that in which predictions are made. To distinguish between cases where this knowledge transfer is beneficial for stream classification and cases where the knowledge on the entities does not contribute to classifying the observations, we also propose a heuristic approach based on random sampling of substreams using k Random Entities (kRE). Our learning scenario is not fully supervised: after acquiring labels for the initial m observations of each entity, we assume that no additional labels arrive and attempt to predict the labels of near-future and far-future observations from that initial seed. We report on our findings from three datasets.
KW - Entity similarity
KW - kNN
KW - Stream classification
UR - http://www.scopus.com/inward/record.url?scp=85078408482&partnerID=8YFLogxK
U2 - 10.1007/s41060-019-00177-1
DO - 10.1007/s41060-019-00177-1
M3 - Article
AN - SCOPUS:85078408482
VL - 9
SP - 1
EP - 15
JO - International Journal of Data Science and Analytics
JF - International Journal of Data Science and Analytics
SN - 2364-415X
IS - 1
ER -