Entity-level stream classification: exploiting entity similarity to label the future observations referring to an entity

Vishnu Unnikrishnan; Christian Beyer; Pawel Matuszyk; Uli Niemann; Rüdiger Pryss; Winfried Schlee; Eirini Ntoutsi; Myra Spiliopoulou

doi:10.1007/s41060-019-00177-1

Details

Original language	English
Pages (from-to)	1-15
Number of pages	15
Journal	International Journal of Data Science and Analytics
Volume	9
Issue number	1
Early online date	22 Feb 2019
Publication status	Published - Feb 2020

Abstract

Stream classification algorithms traditionally treat arriving instances as independent. However, in many applications, the arriving examples may depend on the “entity” that generated them, e.g. in product reviews or in the interactions of users with an application server. In this study, we investigate the potential of this dependency by partitioning the original stream of instances/“observations” into entity-centric substreams and by incorporating entity-specific information into the learning model. We propose a k-nearest-neighbour-inspired stream classification approach, in which the label of an arriving observation is predicted by exploiting knowledge on the observations belonging to this entity and to entities similar to it. For the computation of entity similarity, we consider knowledge about the observations and knowledge about the entity, potentially from a domain/feature space different from that in which predictions are made. To distinguish between cases where this knowledge transfer is beneficial for stream classification and cases where the knowledge on the entities does not contribute to classifying the observations, we also propose a heuristic approach based on random sampling of substreams using k Random Entities (kRE). Our learning scenario is not fully supervised: after acquiring labels for the initial m observations of each entity, we assume that no additional labels arrive and attempt to predict the labels of near-future and far-future observations from that initial seed. We report on our findings from three datasets.
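The entity-centric idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `EntityKNN`, the Euclidean distance over static entity feature vectors, and the plain majority vote over seed labels are all assumptions made for illustration, and the sketch ignores the features of the arriving observation itself. The `use_random` flag mimics the k Random Entities (kRE) baseline by sampling entities instead of ranking them by similarity.

```python
import math
import random
from collections import Counter, defaultdict


class EntityKNN:
    """Toy entity-level classifier (hypothetical; not the paper's algorithm)."""

    def __init__(self, k, m, seed=0):
        self.k = k                            # number of other entities to consult
        self.m = m                            # labelled observations kept per entity
        self.features = {}                    # entity id -> feature vector (assumed given)
        self.seed_labels = defaultdict(list)  # entity id -> first m labels seen
        self.rng = random.Random(seed)

    def add_entity(self, entity_id, feature_vector):
        self.features[entity_id] = tuple(feature_vector)

    def observe(self, entity_id, label):
        # Only the initial m labels of each entity are kept; later ones are ignored.
        if len(self.seed_labels[entity_id]) < self.m:
            self.seed_labels[entity_id].append(label)

    def _candidates(self, entity_id):
        # Other entities that have at least one seed label.
        return [e for e in self.features
                if e != entity_id and self.seed_labels[e]]

    def _neighbours(self, entity_id):
        # Assumption: similarity is Euclidean distance between entity vectors.
        target = self.features[entity_id]
        ranked = sorted(self._candidates(entity_id),
                        key=lambda e: math.dist(target, self.features[e]))
        return ranked[: self.k]

    def _random_entities(self, entity_id):
        # kRE-style baseline: k entities drawn at random, ignoring similarity.
        others = self._candidates(entity_id)
        return self.rng.sample(others, min(self.k, len(others)))

    def predict(self, entity_id, use_random=False):
        pick = self._random_entities if use_random else self._neighbours
        votes = Counter(self.seed_labels[entity_id])
        for other in pick(entity_id):
            votes.update(self.seed_labels[other])
        # Majority vote; None when no seed label is available at all.
        return votes.most_common(1)[0][0] if votes else None
```

Comparing `predict(e)` with `predict(e, use_random=True)` over many entities gives a rough indication of whether entity similarity contributes anything beyond pooling labels from arbitrary entities, which is the role the abstract assigns to kRE.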

Keywords

Entity similarity, kNN, Stream classification

ASJC Scopus subject areas

Computer Science(all) › Computer Science Applications
Computer Science(all) › Computational Theory and Mathematics
Computer Science(all) › Information Systems
Mathematics(all) › Modelling and Simulation
Mathematics(all) › Applied Mathematics

Cite this

Entity-level stream classification: exploiting entity similarity to label the future observations referring to an entity. / Unnikrishnan, Vishnu; Beyer, Christian; Matuszyk, Pawel et al.
In: International Journal of Data Science and Analytics, Vol. 9, No. 1, 02.2020, p. 1-15.

Research output: Contribution to journal › Article › Research › peer review

Unnikrishnan, V, Beyer, C, Matuszyk, P, Niemann, U, Pryss, R, Schlee, W, Ntoutsi, E & Spiliopoulou, M 2020, 'Entity-level stream classification: exploiting entity similarity to label the future observations referring to an entity', International Journal of Data Science and Analytics, vol. 9, no. 1, pp. 1-15. https://doi.org/10.1007/s41060-019-00177-1

Unnikrishnan, V., Beyer, C., Matuszyk, P., Niemann, U., Pryss, R., Schlee, W., Ntoutsi, E., & Spiliopoulou, M. (2020). Entity-level stream classification: exploiting entity similarity to label the future observations referring to an entity. International Journal of Data Science and Analytics, 9(1), 1-15. https://doi.org/10.1007/s41060-019-00177-1

Unnikrishnan V, Beyer C, Matuszyk P, Niemann U, Pryss R, Schlee W et al. Entity-level stream classification: exploiting entity similarity to label the future observations referring to an entity. International Journal of Data Science and Analytics. 2020 Feb;9(1):1-15. Epub 2019 Feb 22. doi: 10.1007/s41060-019-00177-1

Unnikrishnan, Vishnu ; Beyer, Christian ; Matuszyk, Pawel et al. / Entity-level stream classification : exploiting entity similarity to label the future observations referring to an entity. In: International Journal of Data Science and Analytics. 2020 ; Vol. 9, No. 1. pp. 1-15.

@article{0afbd8e82e7b4a71887cf2b273482a08,

title = "Entity-level stream classification: exploiting entity similarity to label the future observations referring to an entity",

abstract = "Stream classification algorithms traditionally treat arriving instances as independent. However, in many applications, the arriving examples may depend on the “entity” that generated them, e.g. in product reviews or in the interactions of users with an application server. In this study, we investigate the potential of this dependency by partitioning the original stream of instances/“observations” into entity-centric substreams and by incorporating entity-specific information into the learning model. We propose a k-nearest-neighbour-inspired stream classification approach, in which the label of an arriving observation is predicted by exploiting knowledge on the observations belonging to this entity and to entities similar to it. For the computation of entity similarity, we consider knowledge about the observations and knowledge about the entity, potentially from a domain/feature space different from that in which predictions are made. To distinguish between cases where this knowledge transfer is beneficial for stream classification and cases where the knowledge on the entities does not contribute to classifying the observations, we also propose a heuristic approach based on random sampling of substreams using k Random Entities (kRE). Our learning scenario is not fully supervised: after acquiring labels for the initial m observations of each entity, we assume that no additional labels arrive and attempt to predict the labels of near-future and far-future observations from that initial seed. We report on our findings from three datasets.",

keywords = "Entity similarity, kNN, Stream classification",

author = "Vishnu Unnikrishnan and Christian Beyer and Pawel Matuszyk and Uli Niemann and R{\"u}diger Pryss and Winfried Schlee and Eirini Ntoutsi and Myra Spiliopoulou",

note = "Funding information: Work of Authors 1 and 2 was partially supported by the German Research Foundation (DFG) within the DFG-project OSCAR Opinion Stream Classification with Ensembles and Active Learners. The last two authors are the project{\textquoteright}s principal investigators.",

year = "2020",

month = feb,

doi = "10.1007/s41060-019-00177-1",

language = "English",

volume = "9",

pages = "1--15",

number = "1",

journal = "International Journal of Data Science and Analytics",

issn = "2364-415X",

}

TY - JOUR

T1 - Entity-level stream classification

T2 - exploiting entity similarity to label the future observations referring to an entity

AU - Unnikrishnan, Vishnu

AU - Beyer, Christian

AU - Matuszyk, Pawel

AU - Niemann, Uli

AU - Pryss, Rüdiger

AU - Schlee, Winfried

AU - Ntoutsi, Eirini

AU - Spiliopoulou, Myra

N1 - Funding information: Work of Authors 1 and 2 was partially supported by the German Research Foundation (DFG) within the DFG-project OSCAR Opinion Stream Classification with Ensembles and Active Learners. The last two authors are the project’s principal investigators.

PY - 2020/2

Y1 - 2020/2

AB - Stream classification algorithms traditionally treat arriving instances as independent. However, in many applications, the arriving examples may depend on the “entity” that generated them, e.g. in product reviews or in the interactions of users with an application server. In this study, we investigate the potential of this dependency by partitioning the original stream of instances/“observations” into entity-centric substreams and by incorporating entity-specific information into the learning model. We propose a k-nearest-neighbour-inspired stream classification approach, in which the label of an arriving observation is predicted by exploiting knowledge on the observations belonging to this entity and to entities similar to it. For the computation of entity similarity, we consider knowledge about the observations and knowledge about the entity, potentially from a domain/feature space different from that in which predictions are made. To distinguish between cases where this knowledge transfer is beneficial for stream classification and cases where the knowledge on the entities does not contribute to classifying the observations, we also propose a heuristic approach based on random sampling of substreams using k Random Entities (kRE). Our learning scenario is not fully supervised: after acquiring labels for the initial m observations of each entity, we assume that no additional labels arrive and attempt to predict the labels of near-future and far-future observations from that initial seed. We report on our findings from three datasets.

KW - Entity similarity

KW - kNN

KW - Stream classification

UR - http://www.scopus.com/inward/record.url?scp=85078408482&partnerID=8YFLogxK

U2 - 10.1007/s41060-019-00177-1

DO - 10.1007/s41060-019-00177-1

M3 - Article

AN - SCOPUS:85078408482

VL - 9

SP - 1

EP - 15

JO - International Journal of Data Science and Analytics

JF - International Journal of Data Science and Analytics

SN - 2364-415X

IS - 1

ER -
