Details
Original language | English |
---|---|
Title of host publication | 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR) |
Subtitle of host publication | MSR |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 334-346 |
Number of pages | 13 |
ISBN (electronic) | 9798350311846 |
ISBN (print) | 979-8-3503-1185-3 |
Publication status | Published - 2023 |
Event | 20th IEEE/ACM International Conference on Mining Software Repositories, MSR 2023 - Melbourne, Australia; Duration: 15 May 2023 → 16 May 2023 |
Abstract
Smart contracts in blockchains have been increasingly used for high-value business applications. It is essential to check smart contracts' reliability before and after deployment. Although various program analysis and deep learning techniques have been proposed to detect vulnerabilities in either Ethereum smart contract source code or bytecode, their detection accuracy and scalability are still limited. This paper presents a novel framework named MANDO-HGT for detecting smart contract vulnerabilities. Given Ethereum smart contracts in either source code or bytecode form, whether vulnerable or clean, MANDO-HGT custom-builds heterogeneous contract graphs (HCGs) to represent control-flow and/or function-call information of the code. It then adapts heterogeneous graph transformers (HGTs) with customized meta relations for graph nodes and edges to learn their embeddings and train classifiers that detect various vulnerability types in the nodes and graphs of the contracts more accurately. We have collected more than 55K Ethereum smart contracts from various data sources and verified the labels for 423 buggy and 2,742 clean contracts to evaluate MANDO-HGT. Our empirical results show that MANDO-HGT can significantly improve on the detection accuracy of other state-of-the-art vulnerability detection techniques, whether based on machine learning or conventional analysis. The accuracy improvements in terms of F1-score range from 0.7% to more than 76% at either the coarse-grained contract level or the fine-grained line level for various vulnerability types in either source code or bytecode. Our method is general and can be retrained easily for different vulnerability types without the need for manually defined vulnerability patterns.
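To make the idea of a heterogeneous graph with meta relations concrete, here is a minimal Python sketch. It is not the authors' implementation: the class, the node types (`FUNCTION`, `IF`, `EXPRESSION`, ...) and the edge types (`NEXT`, `TRUE`, `CALLS`, ...) are hypothetical names chosen for illustration. The only notion taken from the heterogeneous graph transformer literature is that a meta relation is the triple (source node type, edge type, target node type). The learning step (embeddings and classifiers) is not shown.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class HeterogeneousGraph:
    """Toy typed graph: every node and every edge carries a type label."""

    node_types: Dict[str, str] = field(default_factory=dict)  # node id -> node type
    edges: List[Tuple[str, str, str]] = field(default_factory=list)  # (source, edge type, target)

    def add_node(self, node_id: str, node_type: str) -> None:
        self.node_types[node_id] = node_type

    def add_edge(self, source: str, edge_type: str, target: str) -> None:
        # Refuse edges whose endpoints have no known type.
        if source not in self.node_types or target not in self.node_types:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append((source, edge_type, target))

    def meta_relations(self) -> Counter:
        """Count edges per (source node type, edge type, target node type) triple."""
        return Counter(
            (self.node_types[s], e, self.node_types[t]) for s, e, t in self.edges
        )


def build_example() -> HeterogeneousGraph:
    """Build a hypothetical graph mixing control-flow and function-call edges."""
    g = HeterogeneousGraph()
    g.add_node("withdraw", "FUNCTION")
    g.add_node("fallback", "FUNCTION")
    g.add_node("n1", "ENTRY_POINT")
    g.add_node("n2", "IF")
    g.add_node("n3", "EXPRESSION")
    g.add_node("n4", "END_IF")
    # Control-flow edges inside the hypothetical `withdraw` function.
    g.add_edge("withdraw", "CONTAINS", "n1")
    g.add_edge("n1", "NEXT", "n2")
    g.add_edge("n2", "TRUE", "n3")
    g.add_edge("n2", "FALSE", "n4")
    g.add_edge("n3", "NEXT", "n4")
    # Function-call edge from a statement to another function.
    g.add_edge("n3", "CALLS", "fallback")
    return g


if __name__ == "__main__":
    for relation, count in sorted(build_example().meta_relations().items()):
        print(relation, count)
```

In a graph-transformer setting, each distinct triple returned by `meta_relations()` would typically get its own learned parameters, which is why the set of node and edge types is a design decision in its own right.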
Keywords
- bytecode, graph transformer, heterogeneous graph learning, smart contracts, source code, vulnerability detection
ASJC Scopus subject areas
- Computer Science(all)
- Software
- Engineering(all)
- Safety, Risk, Reliability and Quality
Cite this
2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR): MSR. Institute of Electrical and Electronics Engineers Inc., 2023. p. 334-346.
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - MANDO-HGT
T2 - 20th IEEE/ACM International Conference on Mining Software Repositories, MSR 2023
AU - Nguyen, Hoang H.
AU - Nguyen, Nhat Minh
AU - Xie, Chunyao
AU - Ahmadi, Zahra
AU - Kudendo, Daniel
AU - Doan, Thanh Nam
AU - Jiang, Lingxiao
N1 - Funding Information: Acknowledgments. This work was supported by the European Union’s Horizon 2020 research and innovation program under grant agreement No. 833635 (project ROXANNE: Real-time network, text, and speaker analytics for combating organized crime, 2019-2022) and by the Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 1 grant and the Lee Kong Chian Fellowship. Any opinions, findings and conclusions, or recommendations expressed in this material are those of the authors and do not reflect the views of any of the grantors. We also thank all the anonymous reviewers for their insightful feedback on our paper.
PY - 2023
Y1 - 2023
KW - bytecode
KW - graph transformer
KW - heterogeneous graph learning
KW - smart contracts
KW - source code
KW - vulnerability detection
UR - http://www.scopus.com/inward/record.url?scp=85166351291&partnerID=8YFLogxK
U2 - 10.1109/MSR59073.2023.00052
DO - 10.1109/MSR59073.2023.00052
M3 - Conference contribution
AN - SCOPUS:85166351291
SN - 979-8-3503-1185-3
SP - 334
EP - 346
BT - 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR)
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 15 May 2023 through 16 May 2023
ER -