Predicting Machine Learning Pipeline Runtimes in the Context of Automated Machine Learning

Felix Mohr; Marcel Wever; Alexander Tornede; Eyke Hullermeier

doi:10.1109/tpami.2021.3056950

Details

Original language	English
Article number	9347828
Pages (from-to)	3055-3066
Number of pages	12
Journal	IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume	43
Issue number	9
Publication status	Published - 1 Sept 2021
Externally published	Yes

Abstract

Automated machine learning (AutoML) seeks to automatically find so-called machine learning pipelines that maximize the prediction performance when being used to train a model on a given dataset. One of the main and still open challenges in AutoML is the effective use of computational resources: an AutoML process involves the evaluation of many candidate pipelines, which are costly but often ineffective because they are canceled due to a timeout. In this paper, we present an approach to predict the runtime of two-step machine learning pipelines with up to one pre-processor, which can be used to anticipate whether or not a pipeline will time out. Separate runtime models are trained offline for each algorithm that may be used in a pipeline, and an overall prediction is derived from these models. We empirically show that the approach increases successful evaluations made by an AutoML tool while preserving or even improving on the previously best solutions.
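The idea of combining per-algorithm runtime models into one pipeline-level prediction can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the overall prediction is derived, so the additive composition below is an assumption, as are all names (`PipelineRuntimePredictor`, `output_columns`, `select_half`, `toy_learner`) and the toy cost formulas. The sketch assumes each runtime model maps a dataset shape (rows, columns) to predicted seconds and that a pre-processor may change the number of columns seen by the learner.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical type: a runtime model maps (rows, columns) to predicted seconds.
RuntimeModel = Callable[[int, int], float]


@dataclass
class PipelineRuntimePredictor:
    """Illustrative composition of per-algorithm runtime models (assumed design)."""

    preprocessor_models: Dict[str, RuntimeModel]
    learner_models: Dict[str, RuntimeModel]
    # Assumed helper: how each pre-processor changes the column count.
    output_columns: Dict[str, Callable[[int], int]]

    def predict(self, rows: int, cols: int, learner: str,
                preprocessor: Optional[str] = None) -> float:
        """Predicted runtime of a pipeline with at most one pre-processor."""
        total = 0.0
        if preprocessor is not None:
            # Cost of the pre-processing step on the original data.
            total += self.preprocessor_models[preprocessor](rows, cols)
            # The learner sees the transformed data, so update its shape.
            cols = self.output_columns[preprocessor](cols)
        total += self.learner_models[learner](rows, cols)
        return total

    def will_time_out(self, rows: int, cols: int, learner: str,
                      preprocessor: Optional[str] = None,
                      timeout_s: float = 60.0) -> bool:
        """True if the predicted runtime exceeds the evaluation timeout."""
        return self.predict(rows, cols, learner, preprocessor) > timeout_s


# Toy models with made-up cost formulas, standing in for models trained offline.
predictor = PipelineRuntimePredictor(
    preprocessor_models={"select_half": lambda r, c: r * c / 10000},
    learner_models={"toy_learner": lambda r, c: r * c / 1000},
    output_columns={"select_half": lambda c: c // 2},
)

with_pre = predictor.predict(1000, 100, "toy_learner", "select_half")
without_pre = predictor.predict(1000, 100, "toy_learner")
print(with_pre, without_pre)
```

An AutoML tool could call `will_time_out` before evaluating a candidate and skip pipelines whose predicted runtime exceeds the budget. In this toy setup the pipeline with the pre-processor is predicted to be cheaper because the learner then sees fewer columns.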

Keywords

Automated machine learning, hierarchical runtime prediction, runtime prediction for classifiers and pipelines

ASJC Scopus subject areas

Computer Science (all): Software
Computer Science (all): Computer Vision and Pattern Recognition
Computer Science (all): Computational Theory and Mathematics
Computer Science (all): Artificial Intelligence
Mathematics (all): Applied Mathematics

Cite this

Predicting Machine Learning Pipeline Runtimes in the Context of Automated Machine Learning. / Mohr, Felix; Wever, Marcel; Tornede, Alexander et al.
In: IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 43, No. 9, 9347828, 01.09.2021, p. 3055-3066.

Research output: Contribution to journal › Article › Research › peer review

Mohr, F, Wever, M, Tornede, A & Hullermeier, E 2021, 'Predicting Machine Learning Pipeline Runtimes in the Context of Automated Machine Learning', IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 9, 9347828, pp. 3055-3066. https://doi.org/10.1109/tpami.2021.3056950

Mohr, F., Wever, M., Tornede, A., & Hullermeier, E. (2021). Predicting Machine Learning Pipeline Runtimes in the Context of Automated Machine Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(9), 3055-3066. Article 9347828. https://doi.org/10.1109/tpami.2021.3056950

Mohr F, Wever M, Tornede A, Hullermeier E. Predicting Machine Learning Pipeline Runtimes in the Context of Automated Machine Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2021 Sept 1;43(9):3055-3066. 9347828. doi: 10.1109/tpami.2021.3056950

Mohr, Felix ; Wever, Marcel ; Tornede, Alexander et al. / Predicting Machine Learning Pipeline Runtimes in the Context of Automated Machine Learning. In: IEEE Transactions on Pattern Analysis and Machine Intelligence. 2021 ; Vol. 43, No. 9. pp. 3055-3066.

BibTeX

@article{49316f6e9f9d449f846952458027bee8,
title = "Predicting Machine Learning Pipeline Runtimes in the Context of Automated Machine Learning",
abstract = "Automated machine learning (AutoML) seeks to automatically find so-called machine learning pipelines that maximize the prediction performance when being used to train a model on a given dataset. One of the main and still open challenges in AutoML is the effective use of computational resources: an AutoML process involves the evaluation of many candidate pipelines, which are costly but often ineffective because they are canceled due to a timeout. In this paper, we present an approach to predict the runtime of two-step machine learning pipelines with up to one pre-processor, which can be used to anticipate whether or not a pipeline will time out. Separate runtime models are trained offline for each algorithm that may be used in a pipeline, and an overall prediction is derived from these models. We empirically show that the approach increases successful evaluations made by an AutoML tool while preserving or even improving on the previously best solutions.",
keywords = "Automated machine learning, hierarchical runtime prediction, runtime prediction for classifiers and pipelines",
author = "Felix Mohr and Marcel Wever and Alexander Tornede and Eyke Hullermeier",
note = "Publisher Copyright: {\textcopyright} 1979-2012 IEEE.",
year = "2021",
month = sep,
day = "1",
doi = "10.1109/tpami.2021.3056950",
language = "English",
volume = "43",
pages = "3055--3066",
journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",
issn = "0162-8828",
publisher = "IEEE Computer Society",
number = "9",
}

RIS

TY - JOUR
T1 - Predicting Machine Learning Pipeline Runtimes in the Context of Automated Machine Learning
AU - Mohr, Felix
AU - Wever, Marcel
AU - Tornede, Alexander
AU - Hullermeier, Eyke
PY - 2021/9/1
Y1 - 2021/9/1
N2 - Automated machine learning (AutoML) seeks to automatically find so-called machine learning pipelines that maximize the prediction performance when being used to train a model on a given dataset. One of the main and still open challenges in AutoML is the effective use of computational resources: an AutoML process involves the evaluation of many candidate pipelines, which are costly but often ineffective because they are canceled due to a timeout. In this paper, we present an approach to predict the runtime of two-step machine learning pipelines with up to one pre-processor, which can be used to anticipate whether or not a pipeline will time out. Separate runtime models are trained offline for each algorithm that may be used in a pipeline, and an overall prediction is derived from these models. We empirically show that the approach increases successful evaluations made by an AutoML tool while preserving or even improving on the previously best solutions.
AB - Automated machine learning (AutoML) seeks to automatically find so-called machine learning pipelines that maximize the prediction performance when being used to train a model on a given dataset. One of the main and still open challenges in AutoML is the effective use of computational resources: an AutoML process involves the evaluation of many candidate pipelines, which are costly but often ineffective because they are canceled due to a timeout. In this paper, we present an approach to predict the runtime of two-step machine learning pipelines with up to one pre-processor, which can be used to anticipate whether or not a pipeline will time out. Separate runtime models are trained offline for each algorithm that may be used in a pipeline, and an overall prediction is derived from these models. We empirically show that the approach increases successful evaluations made by an AutoML tool while preserving or even improving on the previously best solutions.
KW - Automated machine learning
KW - hierarchical runtime prediction
KW - runtime prediction for classifiers and pipelines
UR - http://www.scopus.com/inward/record.url?scp=85100855731&partnerID=8YFLogxK
U2 - 10.1109/tpami.2021.3056950
DO - 10.1109/tpami.2021.3056950
M3 - Article
C2 - 33539291
VL - 43
SP - 3055
EP - 3066
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
SN - 0162-8828
IS - 9
M1 - 9347828
ER -

Research@Leibniz University


By the same author(s)

Position: Why We Must Rethink Empirical Research in Machine Learning

A Survey of Methods for Automated Algorithm Configuration (Extended Abstract)

Annotation uncertainty in the context of grammatical change

Hyperparameter optimization of two-branch neural networks in multi-target prediction

Best Arm Identification with Retroactively Increased Sampling Budget for More Resource-Efficient HPO
