Entity-level stream classification: exploiting entity similarity to label the future observations referring to an entity

Research output: Contribution to journal › Article › Research › peer review

Authors

  • Vishnu Unnikrishnan
  • Christian Beyer
  • Pawel Matuszyk
  • Uli Niemann
  • Rüdiger Pryss
  • Winfried Schlee
  • Eirini Ntoutsi
  • Myra Spiliopoulou

External Research Organisations

  • Otto-von-Guericke University Magdeburg
  • Ulm University
  • University of Regensburg

Details

Original language: English
Pages (from-to): 1-15
Number of pages: 15
Journal: International Journal of Data Science and Analytics
Volume: 9
Issue number: 1
Early online date: 22 Feb 2019
Publication status: Published - Feb 2020

Abstract

Stream classification algorithms traditionally treat arriving instances as independent. However, in many applications, the arriving examples may depend on the “entity” that generated them, e.g. in product reviews or in the interactions of users with an application server. In this study, we investigate the potential of this dependency by partitioning the original stream of instances/“observations” into entity-centric substreams and by incorporating entity-specific information into the learning model. We propose a k-nearest-neighbour-inspired stream classification approach, in which the label of an arriving observation is predicted by exploiting knowledge on the observations belonging to this entity and to entities similar to it. For the computation of entity similarity, we consider knowledge about the observations and knowledge about the entity, potentially from a domain/feature space different from that in which predictions are made. To distinguish between cases where this knowledge transfer is beneficial for stream classification and cases where the knowledge on the entities does not contribute to classifying the observations, we also propose a heuristic approach based on random sampling of substreams using k Random Entities (kRE). Our learning scenario is not fully supervised: after acquiring labels for the initial m observations of each entity, we assume that no additional labels arrive and attempt to predict the labels of near-future and far-future observations from that initial seed. We report on our findings from three datasets.
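The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is one reading of the abstract and not the authors' implementation. The following are all assumptions made for illustration:

- the function names (`entity_substreams`, `seed_sets`, `predict_knn_entities`, `predict_k_random_entities`);
- the Euclidean distance between entity "profiles";
- the majority vote over the j nearest seed observations;
- the parameters k and j.

The sketch works in four steps:

1. It splits the stream into entity-centric substreams.
2. It keeps only the first m labelled observations of each entity.
3. It labels a future observation from the pooled seeds of the entity and its k most similar entities.
4. As a point of comparison, a variant draws k random entities instead of the k most similar ones.

```python
import random
from collections import Counter, defaultdict
from math import dist  # Python 3.8+

# Hypothetical helper names: an illustrative sketch, not the authors' code.

def entity_substreams(stream):
    """Split a stream of (entity_id, features, label) tuples into
    per-entity substreams, preserving arrival order."""
    subs = defaultdict(list)
    for entity_id, features, label in stream:
        subs[entity_id].append((features, label))
    return dict(subs)


def seed_sets(substreams, m):
    """Keep only the first m observations of each entity as the labelled
    seed; no further labels are assumed to arrive."""
    return {e: obs[:m] for e, obs in substreams.items()}


def _vote(x, pool, j):
    """Majority label among the j seed observations closest to x."""
    nearest = sorted(pool, key=lambda fl: dist(x, fl[0]))[:j]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def predict_knn_entities(x, entity, profiles, seeds, k, j=3):
    """Label observation x of `entity` from the seeds of the entity and of
    its k most similar entities. Assumption: similarity is the Euclidean
    distance between entity profiles, which may come from a different
    feature space than the observations themselves."""
    others = [e for e in seeds if e != entity]
    neighbours = sorted(others, key=lambda e: dist(profiles[entity], profiles[e]))[:k]
    pool = list(seeds[entity])
    for n in neighbours:
        pool.extend(seeds[n])
    return _vote(x, pool, j)


def predict_k_random_entities(x, entity, seeds, k, rng, j=3):
    """Random-entity baseline: same vote, but the k other entities are
    sampled at random (k must not exceed the number of other entities)."""
    others = [e for e in seeds if e != entity]
    pool = list(seeds[entity])
    for n in rng.sample(others, k):
        pool.extend(seeds[n])
    return _vote(x, pool, j)


if __name__ == "__main__":
    # Toy data: entity b is similar to a at the entity level, c is not.
    stream = [("a", (0.0,), 0), ("b", (5.0,), 1), ("c", (5.1,), 0), ("a", (5.0,), None)]
    profiles = {"a": (0.0, 0.0), "b": (0.1, 0.0), "c": (9.0, 9.0)}
    seeds = seed_sets(entity_substreams(stream), m=1)
    x = (5.0,)  # a future, unlabelled observation of entity a
    print(predict_knn_entities(x, "a", profiles, seeds, k=1, j=1))  # → 1
    print(predict_k_random_entities(x, "a", seeds, k=1, rng=random.Random(0), j=1))
```

Under these assumptions, you can compare the accuracy of the two predictors on held-out future observations. This is one plausible way to check whether entity similarity carries useful signal: if pooling seeds from similar entities does no better than pooling seeds from random ones, the transfer is not helping.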

Keywords

    Entity similarity, kNN, Stream classification

Cite this

Entity-level stream classification: exploiting entity similarity to label the future observations referring to an entity. / Unnikrishnan, Vishnu; Beyer, Christian; Matuszyk, Pawel et al.
In: International Journal of Data Science and Analytics, Vol. 9, No. 1, 02.2020, p. 1-15.

Unnikrishnan, V, Beyer, C, Matuszyk, P, Niemann, U, Pryss, R, Schlee, W, Ntoutsi, E & Spiliopoulou, M 2020, 'Entity-level stream classification: exploiting entity similarity to label the future observations referring to an entity', International Journal of Data Science and Analytics, vol. 9, no. 1, pp. 1-15. https://doi.org/10.1007/s41060-019-00177-1
Unnikrishnan, V., Beyer, C., Matuszyk, P., Niemann, U., Pryss, R., Schlee, W., Ntoutsi, E., & Spiliopoulou, M. (2020). Entity-level stream classification: exploiting entity similarity to label the future observations referring to an entity. International Journal of Data Science and Analytics, 9(1), 1-15. https://doi.org/10.1007/s41060-019-00177-1
Unnikrishnan V, Beyer C, Matuszyk P, Niemann U, Pryss R, Schlee W et al. Entity-level stream classification: exploiting entity similarity to label the future observations referring to an entity. International Journal of Data Science and Analytics. 2020 Feb;9(1):1-15. Epub 2019 Feb 22. doi: 10.1007/s41060-019-00177-1
Unnikrishnan, Vishnu ; Beyer, Christian ; Matuszyk, Pawel et al. / Entity-level stream classification : exploiting entity similarity to label the future observations referring to an entity. In: International Journal of Data Science and Analytics. 2020 ; Vol. 9, No. 1. pp. 1-15.
@article{0afbd8e82e7b4a71887cf2b273482a08,
title = "Entity-level stream classification: exploiting entity similarity to label the future observations referring to an entity",
keywords = "Entity similarity, kNN, Stream classification",
author = "Vishnu Unnikrishnan and Christian Beyer and Pawel Matuszyk and Uli Niemann and R{\"u}diger Pryss and Winfried Schlee and Eirini Ntoutsi and Myra Spiliopoulou",
note = "Funding information: Work of Authors 1 and 2 was partially supported by the German Research Foundation (DFG) within the DFG-project OSCAR Opinion Stream Classification with Ensembles and Active Learners. The last two authors are the project{\textquoteright}s principal investigators.",
year = "2020",
month = feb,
doi = "10.1007/s41060-019-00177-1",
language = "English",
volume = "9",
pages = "1--15",
number = "1",

}

TY  - JOUR
T1  - Entity-level stream classification
T2  - exploiting entity similarity to label the future observations referring to an entity
AU  - Unnikrishnan, Vishnu
AU  - Beyer, Christian
AU  - Matuszyk, Pawel
AU  - Niemann, Uli
AU  - Pryss, Rüdiger
AU  - Schlee, Winfried
AU  - Ntoutsi, Eirini
AU  - Spiliopoulou, Myra
N1  - Funding information: Work of Authors 1 and 2 was partially supported by the German Research Foundation (DFG) within the DFG-project OSCAR Opinion Stream Classification with Ensembles and Active Learners. The last two authors are the project’s principal investigators.
PY  - 2020/2
Y1  - 2020/2
KW  - Entity similarity
KW  - kNN
KW  - Stream classification
UR  - http://www.scopus.com/inward/record.url?scp=85078408482&partnerID=8YFLogxK
U2  - 10.1007/s41060-019-00177-1
DO  - 10.1007/s41060-019-00177-1
M3  - Article
AN  - SCOPUS:85078408482
VL  - 9
SP  - 1
EP  - 15
JO  - International Journal of Data Science and Analytics
JF  - International Journal of Data Science and Analytics
SN  - 2364-415X
IS  - 1
ER  -