Load Balancing in Compute Clusters With Delayed Feedback

Anam Tahir; Bastian Alt; Amr Rizk; Heinz Koeppl

doi:10.1109/TC.2022.3215907

Details

Original language	English
Pages (from-to)	1610-1622
Number of pages	13
Journal	IEEE Transactions on Computers
Volume	72
Issue number	6
Early online date	19 Oct 2022
Publication status	Published - 1 Jun 2023
Externally published	Yes

Abstract

Load balancing is a fundamental problem underlying the dimensioning and operation of many computing and communication systems, such as job routing in data center clusters, multipath communication, Big Data and queueing systems. In essence, a decision-making agent maps each arriving job to one of several possibly heterogeneous servers while pursuing an optimization goal such as balanced load, low average delay or low loss rate. One main difficulty in finding optimal load balancing policies is that the agent only partially observes the impact of its decisions, e.g., through the delayed acknowledgements of served jobs. In this paper, we provide a partially observable (PO) model that captures load balancing decisions in parallel buffered systems under the limited information provided by delayed acknowledgements. We present a simulation model for this PO system to find a load balancing policy in real time using a scalable Monte Carlo tree search algorithm. We show numerically that the resulting policy outperforms other limited-information load balancing strategies, such as variants of Join-the-Most-Observations, and performs comparably to full-information strategies such as Join-the-Shortest-Queue, Join-the-Shortest-Queue(d) and Shortest-Expected-Delay. Finally, we show that our approach can optimize real-time parallel processing using network data provided by Kaggle.
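The abstract names the full-information baselines without defining them. As a rough illustration only, the sketch below uses the commonly cited textbook definitions of Join-the-Shortest-Queue, Join-the-Shortest-Queue(d) and Shortest-Expected-Delay. It assumes the dispatcher sees exact queue lengths, which is precisely the information the paper's partially observable setting lacks. The function names, argument names and the `(queue length + 1) / service rate` delay estimate are assumptions made for this example and are not taken from the paper; the paper's own Monte Carlo tree search policy is not shown.

```python
import random
from typing import Sequence


def jsq(queue_lengths: Sequence[int]) -> int:
    """Join-the-Shortest-Queue: index of the server with the fewest jobs.

    Ties go to the lowest index.
    """
    return min(range(len(queue_lengths)), key=lambda i: queue_lengths[i])


def jsq_d(queue_lengths: Sequence[int], d: int, rng=random) -> int:
    """JSQ(d): sample d distinct servers uniformly at random and join
    the shortest queue among the sampled ones."""
    candidates = rng.sample(range(len(queue_lengths)), d)
    return min(candidates, key=lambda i: queue_lengths[i])


def sed(queue_lengths: Sequence[int], service_rates: Sequence[float]) -> int:
    """Shortest-Expected-Delay: pick the server minimising the assumed
    delay estimate (queue length + 1) / service rate, which accounts
    for heterogeneous server speeds."""
    return min(
        range(len(queue_lengths)),
        key=lambda i: (queue_lengths[i] + 1) / service_rates[i],
    )


if __name__ == "__main__":
    lengths = [3, 1, 2]        # jobs currently at each server (hypothetical)
    rates = [1.0, 0.25, 1.0]   # jobs served per unit time (hypothetical)
    print("JSQ   ->", jsq(lengths))
    print("JSQ(2)->", jsq_d(lengths, 2))
    print("SED   ->", sed(lengths, rates))
```

With these hypothetical numbers, JSQ picks the slow server 1 because its queue is shortest, while SED prefers server 2 because it weighs queue length against service rate. This is why heterogeneous servers call for rate-aware policies.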

Keywords

Parallel systems, load balancing, partial observability

ASJC Scopus subject areas

Computer Science (all): Software
Mathematics (all): Theoretical Computer Science
Computer Science (all): Hardware and Architecture
Computer Science (all): Computational Theory and Mathematics

Cite this

Load Balancing in Compute Clusters With Delayed Feedback. / Tahir, Anam; Alt, Bastian; Rizk, Amr et al.
In: IEEE Transactions on Computers, Vol. 72, No. 6, 01.06.2023, p. 1610-1622.

Research output: Contribution to journal › Article › Research › peer review

Tahir, A, Alt, B, Rizk, A & Koeppl, H 2023, 'Load Balancing in Compute Clusters With Delayed Feedback', IEEE Transactions on Computers, vol. 72, no. 6, pp. 1610-1622. https://doi.org/10.1109/TC.2022.3215907

Tahir, A., Alt, B., Rizk, A., & Koeppl, H. (2023). Load Balancing in Compute Clusters With Delayed Feedback. IEEE Transactions on Computers, 72(6), 1610-1622. https://doi.org/10.1109/TC.2022.3215907

Tahir A, Alt B, Rizk A, Koeppl H. Load Balancing in Compute Clusters With Delayed Feedback. IEEE Transactions on Computers. 2023 Jun 1;72(6):1610-1622. Epub 2022 Oct 19. doi: 10.1109/TC.2022.3215907

Tahir, Anam ; Alt, Bastian ; Rizk, Amr et al. / Load Balancing in Compute Clusters With Delayed Feedback. In: IEEE Transactions on Computers. 2023 ; Vol. 72, No. 6. pp. 1610-1622.

BibTeX

@article{f016c1bc5d6e45b3aebc4f422f8c879d,

title = "Load Balancing in Compute Clusters With Delayed Feedback",

abstract = "Load balancing arises as a fundamental problem, underlying the dimensioning and operation of many computing and communication systems, such as job routing in data center clusters, multipath communication, Big Data and queueing systems. In essence, the decision-making agent maps each arriving job to one of the possibly heterogeneous servers while aiming at an optimization goal such as load balancing, low average delay or low loss rate. One main difficulty in finding optimal load balancing policies here is that the agent only partially observes the impact of its decisions, e.g., through the delayed acknowledgements of the served jobs. In this paper, we provide a partially observable (PO) model that captures the load balancing decisions in parallel buffered systems under limited information of delayed acknowledgements. We present a simulation model for this PO system to find a load balancing policy in real-time using a scalable Monte Carlo tree search algorithm. We numerically show that the resulting policy outperforms other limited information load balancing strategies such as variants of Join-the-Most-Observations and has comparable performance to full information strategies like: Join-the-Shortest-Queue, Join-the-Shortest-Queue(d) and Shortest-Expected-Delay. Finally, we show that our approach can optimise the real-time parallel processing by using network data provided by Kaggle.",

keywords = "Parallel systems, load balancing, partial observability",

author = "Anam Tahir and Bastian Alt and Amr Rizk and Heinz Koeppl",

note = "Publisher Copyright: {\textcopyright} 1968-2012 IEEE.",

year = "2023",

month = jun,

day = "1",

doi = "10.1109/TC.2022.3215907",

language = "English",

volume = "72",

pages = "1610--1622",

journal = "IEEE transactions on computers",

issn = "0018-9340",

publisher = "IEEE Computer Society",

number = "6",

}

RIS

TY - JOUR

T1 - Load Balancing in Compute Clusters With Delayed Feedback

AU - Tahir, Anam

AU - Alt, Bastian

AU - Rizk, Amr

AU - Koeppl, Heinz

PY - 2023/6/1

Y1 - 2023/6/1

N2 - Load balancing arises as a fundamental problem, underlying the dimensioning and operation of many computing and communication systems, such as job routing in data center clusters, multipath communication, Big Data and queueing systems. In essence, the decision-making agent maps each arriving job to one of the possibly heterogeneous servers while aiming at an optimization goal such as load balancing, low average delay or low loss rate. One main difficulty in finding optimal load balancing policies here is that the agent only partially observes the impact of its decisions, e.g., through the delayed acknowledgements of the served jobs. In this paper, we provide a partially observable (PO) model that captures the load balancing decisions in parallel buffered systems under limited information of delayed acknowledgements. We present a simulation model for this PO system to find a load balancing policy in real-time using a scalable Monte Carlo tree search algorithm. We numerically show that the resulting policy outperforms other limited information load balancing strategies such as variants of Join-the-Most-Observations and has comparable performance to full information strategies like: Join-the-Shortest-Queue, Join-the-Shortest-Queue(d) and Shortest-Expected-Delay. Finally, we show that our approach can optimise the real-time parallel processing by using network data provided by Kaggle.

AB - Load balancing arises as a fundamental problem, underlying the dimensioning and operation of many computing and communication systems, such as job routing in data center clusters, multipath communication, Big Data and queueing systems. In essence, the decision-making agent maps each arriving job to one of the possibly heterogeneous servers while aiming at an optimization goal such as load balancing, low average delay or low loss rate. One main difficulty in finding optimal load balancing policies here is that the agent only partially observes the impact of its decisions, e.g., through the delayed acknowledgements of the served jobs. In this paper, we provide a partially observable (PO) model that captures the load balancing decisions in parallel buffered systems under limited information of delayed acknowledgements. We present a simulation model for this PO system to find a load balancing policy in real-time using a scalable Monte Carlo tree search algorithm. We numerically show that the resulting policy outperforms other limited information load balancing strategies such as variants of Join-the-Most-Observations and has comparable performance to full information strategies like: Join-the-Shortest-Queue, Join-the-Shortest-Queue(d) and Shortest-Expected-Delay. Finally, we show that our approach can optimise the real-time parallel processing by using network data provided by Kaggle.

KW - Parallel systems

KW - load balancing

KW - partial observability

UR - http://www.scopus.com/inward/record.url?scp=85140789519&partnerID=8YFLogxK

U2 - 10.1109/TC.2022.3215907

DO - 10.1109/TC.2022.3215907

M3 - Article

VL - 72

SP - 1610

EP - 1622

JO - IEEE transactions on computers

JF - IEEE transactions on computers

SN - 0018-9340

IS - 6

ER -
