Program-structure-guided reduction of the execution time of fault-injection campaigns on the ISA layer

Oskar Pusz

doi:10.15488/17924

Details

Original language	English
Qualification	Doctor of Engineering (Doktor der Ingenieurwissenschaften)
Awarding institution	Leibniz Universität Hannover
Supervised by	Daniel Lohmann, supervisor
Date of award	19 Aug 2024
Place of publication	Hannover
Publication status	Published - 27 Aug 2024

Abstract

Due to ever-smaller transistor sizes and operating voltages, hardware is becoming increasingly susceptible to transient hardware faults. In the domain of safety-critical systems, fault-injection campaigns at the instruction-set-architecture level have become a widespread approach for assessing the functional safety of a system with respect to this kind of fault. Full fault-injection campaigns are an approach for systematically assessing the reliability of a system on the one hand and the effectiveness of implemented software-based hardening techniques on hardware specified in advance on the other. However, owing to the sheer number of injections, a naively executed fault-injection campaign quickly leads to practically infeasible runtimes, especially when a comprehensive and complete functional-safety analysis of the system under test is the goal. To counteract this, established acceleration methods are commonly used that either reduce the number of necessary fault injections or speed up individual injections, which ultimately reduces the overall runtime of a campaign.

Despite the effectiveness of established methods, however, campaigns may still run too long, the results may not be precise enough, or the focus may be restricted to certain aspects of the system under test. This dissertation presents three new approaches that aim to overcome these challenges. To this end, they use program structures extracted from the given software. These structures are tailored to the running program and independent of the system behavior to be evaluated.

The first contribution extracts data flows and instruction semantics in order to exploit the propagation and masking effects of instructions via a data-flow graph. Compared to the reference method, my data-flow-sensitive acceleration method reduces the number of injections necessary for a complete functional-safety analysis by up to 18.4 percent while remaining precise.

The second contribution uses extracted dynamic jump addresses as representatives of the control flow and partitions the program execution into temporal segments. These segments are called fault-space regions and act as self-contained entities, each with its own data flow that potentially flows from one region to the next. Injecting into the data flows that leave a region and extrapolating their results to the other data flows, which do not leave a region, leads to a system-wide reduction in injections of up to 77.5 percent with an approximation error of only 2 percent and a strong locality of the results.

The last contribution concentrates on accelerating individual injections that do not lead to the termination of a system and are therefore ended after a fixed deadline (timeout). For this purpose, I present an analysis of timeouts in this context and initial approaches to predicting them at runtime. The final part of this contribution is the timeout detector ACTOR. This detector uses autocorrelation to recognize patterns in the jumps taken by the program and thereby determine approximately whether the program is in a loop. Through timeout predictions for individual injections, ACTOR achieves end-to-end campaign-runtime speedups of up to 27.6 percent. The absolute error in the predictions is consistently below 0.5 percent.

The methods developed in this thesis extend the overall portfolio of potential acceleration methods in the fault-injection research community. These generically designed methods, which were implemented and evaluated at the instruction-set-architecture level, can conceptually also be applied to other system levels. They are versatile and can be seamlessly combined both with each other and with established acceleration methods.
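
The abstract's description of ACTOR (using autocorrelation over the program's taken jumps to approximate whether execution is stuck in a loop) can be illustrated with a small sketch. This is not the dissertation's implementation: the function names `match_ratio` and `detect_loop_period`, the parameters `window`, `max_lag`, and `threshold`, and their default values are all hypothetical, and the sketch uses a simple match-ratio form of autocorrelation that suits categorical values such as jump-target addresses.

```python
from typing import Optional, Sequence


def match_ratio(trace: Sequence[int], lag: int) -> float:
    """Fraction of positions at which the trace equals itself shifted by `lag`.

    This is a match-based stand-in for autocorrelation on categorical data
    (an assumption of this sketch, not taken from the thesis).
    """
    n = len(trace) - lag
    if lag <= 0 or n <= 0:
        raise ValueError("lag must be positive and shorter than the trace")
    matches = sum(1 for i in range(n) if trace[i] == trace[i + lag])
    return matches / n


def detect_loop_period(
    trace: Sequence[int],
    window: int = 64,        # hypothetical: number of most recent jumps inspected
    max_lag: int = 16,       # hypothetical: longest loop period searched for
    threshold: float = 0.95  # hypothetical: match ratio treated as "periodic"
) -> Optional[int]:
    """Return the smallest lag at which the recent jump trace repeats, else None."""
    recent = list(trace[-window:])
    for lag in range(1, min(max_lag, len(recent) // 2) + 1):
        if match_ratio(recent, lag) >= threshold:
            return lag
    return None


# A trace that cycles through three jump targets versus one that keeps progressing.
looping = [0x100, 0x120, 0x140] * 30
progressing = list(range(0x100, 0x100 + 90))
```

Under these assumptions, `detect_loop_period(looping)` reports a period of 3, while `detect_loop_period(progressing)` returns `None`. A campaign runner could use such a signal to end an injection run early instead of waiting for the full timeout, at the cost of occasional mispredictions.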

Cite

Program-structure-guided reduction of the execution time of fault-injection campaigns on the ISA layer. / Pusz, Oskar.
Hannover, 2024. 207 pp.

Publication: Thesis › Doctoral thesis

Pusz, O 2024, 'Program-structure-guided reduction of the execution time of fault-injection campaigns on the ISA layer', Doktor der Ingenieurwissenschaften, Gottfried Wilhelm Leibniz Universität Hannover, Hannover. https://doi.org/10.15488/17924

Pusz, O. (2024). Program-structure-guided reduction of the execution time of fault-injection campaigns on the ISA layer. [Dissertation, Gottfried Wilhelm Leibniz Universität Hannover]. https://doi.org/10.15488/17924

Pusz O. Program-structure-guided reduction of the execution time of fault-injection campaigns on the ISA layer. Hannover, 2024. 207 pp. doi: 10.15488/17924

Pusz, Oskar. / Program-structure-guided reduction of the execution time of fault-injection campaigns on the ISA layer. Hannover, 2024. 207 pp.

Download

@phdthesis{b195d784d94540b09ecd1508707ec9b5,

title = "Program-structure-guided reduction of the execution time of fault-injection campaigns on the ISA layer",

abstract = "Due to shrinking transistor structure sizes and operating voltages, hardware becomes more susceptible to transient hardware faults. In the domain of safety-critical systems, fault-injection campaigns on the instruction-set-architecture layer have become a widespread approach to assess the resilience of a system concerning this kind of fault. Full fault-injection campaigns are an approach to systematically assess the reliability of a system and the effectiveness of implemented software-based hardening techniques on fixed hardware. A straightforward fault-injection campaign may result in practically unrealizable runtimes, especially when aiming for a comprehensive and complete reliability analysis of the system under test. Established acceleration methods commonly either reduce the number of necessary fault injections or speed up individual injections, ultimately decreasing the overall runtime of the whole campaign. However, despite the effectiveness of these established methods, the runtimes may still be infeasible, the campaign results may lack precision, or the focus might be limited to specific aspects of the system under test. This dissertation introduces three new approaches to handling these challenges. The approaches use program structures extracted from the executed software, which are tailored to the running program and independent of the system behaviors under evaluation. The first approach extracts the data flow and instruction semantics to utilize instructions{\textquoteright} propagation and masking effects through a data-flow graph. Compared to the ground-truth method, my data-flow-sensitive acceleration method reduces the number of necessary injections for a comprehensive reliability analysis by up to 18.4 percent while remaining precise. The second approach utilizes extracted dynamic jump addresses to represent the control flow, partitioning the program{\textquoteright}s execution into temporal segments. These fault-space regions operate as distinct entities, each with its own data flow potentially flowing from one region to the next. Injecting the traversing data flows and extrapolating their results to the non-traversing data flows leads to an injection reduction of up to 77.5 percent system-wide, accompanied by an approximation error of only 2 percent and a strong locality of the results. The last contribution focuses on accelerating individual injections that do not lead to the termination of the system and thus reach a fixed timeout threshold. This work presents an analysis of timeouts in this context and initial approaches to predict such timeouts during runtime. The final part of this contribution is the timeout detector ACTOR. This detector uses autocorrelation to detect whether patterns exist in the program{\textquoteright}s taken jumps, thereby approximating whether the program is in a loop. ACTOR can achieve end-to-end campaign accelerations of up to 27.6 percent through timeout predictions in individual injections, while the absolute prediction error is always less than 0.5 percent. The methods developed in this work expand the overall portfolio of potential acceleration methods in the fault-injection community. These generically designed methods, implemented and evaluated on the instruction-set-architecture layer, can also be conceptually applied to other system layers. They are versatile and can be seamlessly combined with each other and with established acceleration methods.",

author = "Oskar Pusz",

year = "2024",

month = aug,

day = "27",

doi = "10.15488/17924",

language = "English",

school = "Leibniz University Hannover",

}

Download

TY - BOOK

T1 - Program-structure-guided reduction of the execution time of fault-injection campaigns on the ISA layer

AU - Pusz, Oskar

PY - 2024/8/27

Y1 - 2024/8/27

N2 - Due to shrinking transistor structure sizes and operating voltages, hardware becomes more susceptible to transient hardware faults. In the domain of safety-critical systems, fault-injection campaigns on the instruction-set-architecture layer have become a widespread approach to assess the resilience of a system concerning this kind of fault. Full fault-injection campaigns are an approach to systematically assess the reliability of a system and the effectiveness of implemented software-based hardening techniques on fixed hardware. A straightforward fault-injection campaign may result in practically unrealizable runtimes, especially when aiming for a comprehensive and complete reliability analysis of the system under test. Established acceleration methods commonly either reduce the number of necessary fault injections or speed up individual injections, ultimately decreasing the overall runtime of the whole campaign. However, despite the effectiveness of these established methods, the runtimes may still be infeasible, the campaign results may lack precision, or the focus might be limited to specific aspects of the system under test. This dissertation introduces three new approaches to handling these challenges. The approaches use program structures extracted from the executed software, which are tailored to the running program and independent of the system behaviors under evaluation. The first approach extracts the data flow and instruction semantics to utilize instructions’ propagation and masking effects through a data-flow graph. Compared to the ground-truth method, my data-flow-sensitive acceleration method reduces the number of necessary injections for a comprehensive reliability analysis by up to 18.4 percent while remaining precise. The second approach utilizes extracted dynamic jump addresses to represent the control flow, partitioning the program’s execution into temporal segments. These fault-space regions operate as distinct entities, each with its own data flow potentially flowing from one region to the next. Injecting the traversing data flows and extrapolating their results to the non-traversing data flows leads to an injection reduction of up to 77.5 percent system-wide, accompanied by an approximation error of only 2 percent and a strong locality of the results. The last contribution focuses on accelerating individual injections that do not lead to the termination of the system and thus reach a fixed timeout threshold. This work presents an analysis of timeouts in this context and initial approaches to predict such timeouts during runtime. The final part of this contribution is the timeout detector ACTOR. This detector uses autocorrelation to detect whether patterns exist in the program’s taken jumps, thereby approximating whether the program is in a loop. ACTOR can achieve end-to-end campaign accelerations of up to 27.6 percent through timeout predictions in individual injections, while the absolute prediction error is always less than 0.5 percent. The methods developed in this work expand the overall portfolio of potential acceleration methods in the fault-injection community. These generically designed methods, implemented and evaluated on the instruction-set-architecture layer, can also be conceptually applied to other system layers. They are versatile and can be seamlessly combined with each other and with established acceleration methods.

U2 - 10.15488/17924

DO - 10.15488/17924

M3 - Doctoral thesis

CY - Hannover

ER -
