Estimating PageRank deviations in crawled graphs

Helge Holzmann; Avishek Anand; Megha Khosla

doi:10.1007/s41109-019-0201-9

Details

Originalsprache	Englisch
Aufsatznummer	86
Fachzeitschrift	Applied Network Science
Jahrgang	4
Ausgabenummer	1
Frühes Online-Datum	22 Okt. 2019
Publikationsstatus	Elektronisch veröffentlicht (E-Pub) - 22 Okt. 2019

Abstract

Most real-world graphs collected from the Web like Web graphs and social network graphs are partially discovered or crawled. This leads to inaccurate estimates of graph properties based on link analysis such as PageRank. In this paper we focus on studying such deviations in ordering/ranking imposed by PageRank over crawled graphs. We first show that deviations in rankings induced by PageRank are indeed possible. We measure how much a ranking, induced by PageRank, on an input graph could deviate from the original unseen graph. More importantly, we are interested in conceiving a measure that approximates the rank correlation among them without any knowledge of the original graph. To this extent we formulate the HAK measure that is based on computing the impact redistribution of PageRank according to the local graph structure. We further propose an algorithm that identifies connected subgraphs over the input graph for which the relative ordering is preserved. Finally, we perform extensive experiments on both real-world Web and social network graphs with more than 100M vertices and 10B edges as well as synthetic graphs to showcase the utility of HAK and our High-fidelity Component Selection approach.

ASJC Scopus Sachgebiete

Allgemein
Informatik (insg.)
Computernetzwerke und -kommunikation
Mathematik (insg.)
Computational Mathematics

Zitieren

Estimating PageRank deviations in crawled graphs. / Holzmann, Helge; Anand, Avishek; Khosla, Megha.
in: Applied Network Science, Jahrgang 4, Nr. 1, 86, 22.10.2019.

Publikation: Beitrag in Fachzeitschrift › Artikel › Forschung › Peer-Review

Holzmann, H, Anand, A & Khosla, M 2019, 'Estimating PageRank deviations in crawled graphs', Applied Network Science, Jg. 4, Nr. 1, 86. https://doi.org/10.1007/s41109-019-0201-9, https://doi.org/10.15488/8793

Holzmann, H., Anand, A., & Khosla, M. (2019). Estimating PageRank deviations in crawled graphs. Applied Network Science, 4(1), Artikel 86. Vorabveröffentlichung online. https://doi.org/10.1007/s41109-019-0201-9, https://doi.org/10.15488/8793

Holzmann H, Anand A, Khosla M. Estimating PageRank deviations in crawled graphs. Applied Network Science. 2019 Okt 22;4(1):86. Epub 2019 Okt 22. doi: 10.1007/s41109-019-0201-9, 10.15488/8793

Holzmann, Helge ; Anand, Avishek ; Khosla, Megha. / Estimating PageRank deviations in crawled graphs. in: Applied Network Science. 2019 ; Jahrgang 4, Nr. 1.

Download

@article{f3ae6e301b2d4b67b569aa9c163b1089,

title = "Estimating PageRank deviations in crawled graphs",

abstract = "Most real-world graphs collected from the Web like Web graphs and social network graphs are partially discovered or crawled. This leads to inaccurate estimates of graph properties based on link analysis such as PageRank. In this paper we focus on studying such deviations in ordering/ranking imposed by PageRank over crawled graphs. We first show that deviations in rankings induced by PageRank are indeed possible. We measure how much a ranking, induced by PageRank, on an input graph could deviate from the original unseen graph. More importantly, we are interested in conceiving a measure that approximates the rank correlation among them without any knowledge of the original graph. To this extent we formulate the HAK measure that is based on computing the impact redistribution of PageRank according to the local graph structure. We further propose an algorithm that identifies connected subgraphs over the input graph for which the relative ordering is preserved. Finally, we perform extensive experiments on both real-world Web and social network graphs with more than 100M vertices and 10B edges as well as synthetic graphs to showcase the utility of HAK and our High-fidelity Component Selection approach.",

keywords = "Crawls, PageRank, Ranking deviations",

author = "Helge Holzmann and Avishek Anand and Megha Khosla",

note = "Funding information: We thank the anonymous reviewers for their valuable comments which helped in improving the manuscript. The publication of this article was funded by the Open Access Fund of the Leibniz Universit{\"a}t Hannover. This work is partially funded by SoBigData (European Union{\textquoteright}s Horizon 2020 research and innovation programme under grant agreement No. 654024). Acknowledgements",

year = "2019",

month = oct,

day = "22",

doi = "10.1007/s41109-019-0201-9",

language = "English",

volume = "4",

number = "1",

}

Download

TY - JOUR

T1 - Estimating PageRank deviations in crawled graphs

AU - Holzmann, Helge

AU - Anand, Avishek

AU - Khosla, Megha

N1 - Funding information: We thank the anonymous reviewers for their valuable comments which helped in improving the manuscript. The publication of this article was funded by the Open Access Fund of the Leibniz Universität Hannover. This work is partially funded by SoBigData (European Union’s Horizon 2020 research and innovation programme under grant agreement No. 654024). Acknowledgements

PY - 2019/10/22

Y1 - 2019/10/22

N2 - Most real-world graphs collected from the Web like Web graphs and social network graphs are partially discovered or crawled. This leads to inaccurate estimates of graph properties based on link analysis such as PageRank. In this paper we focus on studying such deviations in ordering/ranking imposed by PageRank over crawled graphs. We first show that deviations in rankings induced by PageRank are indeed possible. We measure how much a ranking, induced by PageRank, on an input graph could deviate from the original unseen graph. More importantly, we are interested in conceiving a measure that approximates the rank correlation among them without any knowledge of the original graph. To this extent we formulate the HAK measure that is based on computing the impact redistribution of PageRank according to the local graph structure. We further propose an algorithm that identifies connected subgraphs over the input graph for which the relative ordering is preserved. Finally, we perform extensive experiments on both real-world Web and social network graphs with more than 100M vertices and 10B edges as well as synthetic graphs to showcase the utility of HAK and our High-fidelity Component Selection approach.

AB - Most real-world graphs collected from the Web like Web graphs and social network graphs are partially discovered or crawled. This leads to inaccurate estimates of graph properties based on link analysis such as PageRank. In this paper we focus on studying such deviations in ordering/ranking imposed by PageRank over crawled graphs. We first show that deviations in rankings induced by PageRank are indeed possible. We measure how much a ranking, induced by PageRank, on an input graph could deviate from the original unseen graph. More importantly, we are interested in conceiving a measure that approximates the rank correlation among them without any knowledge of the original graph. To this extent we formulate the HAK measure that is based on computing the impact redistribution of PageRank according to the local graph structure. We further propose an algorithm that identifies connected subgraphs over the input graph for which the relative ordering is preserved. Finally, we perform extensive experiments on both real-world Web and social network graphs with more than 100M vertices and 10B edges as well as synthetic graphs to showcase the utility of HAK and our High-fidelity Component Selection approach.

KW - Crawls

KW - PageRank

KW - Ranking deviations

UR - http://www.scopus.com/inward/record.url?scp=85074136590&partnerID=8YFLogxK

U2 - 10.1007/s41109-019-0201-9

DO - 10.1007/s41109-019-0201-9

M3 - Article

AN - SCOPUS:85074136590

VL - 4

JO - Applied Network Science

JF - Applied Network Science

IS - 1

M1 - 86

ER -

Research@Leibniz University

Estimating PageRank deviations in crawled graphs

Autorschaft

Organisationseinheiten

Externe Organisationen

Details

Abstract

ASJC Scopus Sachgebiete

Zitieren