NEVILLE GRECH, University of Malta, Malta and Dedaub Ltd
SIFIS LAGOUVARDOS, University of Athens, Greece and Dedaub Ltd
ILIAS TSATIRIS, University of Athens, Greece and Dedaub Ltd
YANNIS SMARAGDAKIS, University of Athens, Greece and Dedaub Ltd
Smart contracts on the Ethereum blockchain greatly benefit from cutting-edge analysis techniques and pose significant challenges. A primary challenge is the extremely low-level representation of deployed contracts. We present Elipmoc, a decompiler for the next generation of smart contract analyses. Elipmoc is an evolution of Gigahorse, the top research decompiler, dramatically improving over it and over other state-of-the-art tools, by employing several high-precision techniques and making them scalable. Among these techniques are a new kind of context sensitivity (termed “transactional sensitivity”) that provides a more effective static abstraction of distinct dynamic executions; a path-sensitive (yet scalable, through path merging) algorithm for inference of function arguments and returns; and a fully context sensitive private function reconstruction process. As a result, smart contract security analyses and reverse-engineering tools built on top of Elipmoc achieve high scalability, precision and completeness. Elipmoc improves over all notable past decompilers, including its predecessor, Gigahorse, and the state-of-the-art industrial tool, Panoramix, integrated into the primary Ethereum blockchain explorer, Etherscan. Elipmoc produces decompiled contracts with fully resolved operands at a rate of 99.5% (compared to 62.8% for Gigahorse), and achieves much higher completeness in code decompilation than Panoramix—e.g., up to 67% more coverage of external call statements—while being over 5x faster. Elipmoc has been the enabler for recent (independent) discoveries of several exploitable vulnerabilities on popular protocols, over funds in the many millions of dollars.
Additional Key Words and Phrases: Program Analysis, Smart Contracts, Decompilation, Datalog, Security, Ethereum, Blockchain
1 INTRODUCTION
Decentralized financial applications, built using smart contracts running on programmable blockchains (most notably Ethereum), are starting to rival traditional financial systems. Therefore coding or logical errors in smart contracts can have large financial implications. This makes smart contracts primary targets for automated analysis and verification tasks.
More specifically, the analysis of smart contracts as-deployed, i.e., by taking their binary form as input, is attractive for several reasons. First, it offers significant generality: many deployed smart contracts have no publicly released source code. Security analysts routinely need to inspect such contracts (e.g., bots) to determine their functionality. Second, analyzing contracts at the bytecode level offers a uniform platform for analysis, not distinguishing between source languages and source language versions. This is a major factor, given that (the dominant Ethereum language)
Authors’ addresses: Neville Grech, University of Malta, Malta and Dedaub Ltd, me@nevillegrech.com; Sifis Lagouvardos, University of Athens, Greece and Dedaub Ltd, sifis.lag@di.uoa.gr; Ilias Tsatiris, University of Athens, Greece and Dedaub Ltd, i.tsatiris@di.uoa.gr; Yannis Smaragdakis, University of Athens, Greece and Dedaub Ltd, smaragd@di.uoa.gr. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). © 2022 Copyright held by the owner/author(s). 2475-1421/2022/4-ART77 https://doi.org/10.1145/3527321 Proc. ACM Program. Lang., Vol. 6, No. OOPSLA1, Article 77. Publication date: April 2022. 77:2 Neville Grech, Sifis Lagouvardos, Ilias Tsatiris, and Yannis Smaragdakis Solidity does not maintain source compatibility across versions. Source-level tools very quickly become obsolete.
Bytecode decompilation offers by far the most general substrate for automated smart contract analyses. Decompilation of the Ethereum Virtual Machine (EVM) bytecode is an open problem for the security community. Existing approaches lack in completeness, precision, or scalability. For in-stance, the most-used decompiler in practice, etherscan.io’s Panoramix [Kolinko and Palkeo 2020], decompiles complex contracts only partially, with much of the low-level code often not reflected in the decompiled version. Past academic research tools, such as the Gigahorse decompiler [Grech et al. 2019a], sacrifice precision and occasionally even scalabily. Lower precision results in low-quality decompilation, for instance, operations with unknown operands, or jumps with many infeasible targets.
To illustrate such shortcomings, our experiments show that these leading past decompilers manage to decompile under 20% (Panoramix: 18.2%, Gigahorse: 19.8%) of Ethereum contracts of large size (over 20KB). Technically, the low-level nature of the EVM poses a challenge for any decompilation effort. The EVM uses a single stack for calls and local operations, without any guarantees about its well-formedness. The only control flow operators are unstructured jumps. All calls, returns, loops, and conditionals are implemented as jumps to whichever address is currently at the top of the stack. Compilers producing EVM bytecode often perform aggressive optimization, since instructions have a monetary cost to execute. A major obfuscating factor, for instance, is the aggressive merging of identical instruction sequences belonging to different internal, a.k.a. private, functions.
Control and data flow are, as a result, very hard to discern without sophisticated modeling techniques. In this paper, we introduce Elipmoc (“compile”, backwards), a decompiler for Ethereum smart con-tracts. Elipmoc advances the state of the art in EVM bytecode decompilation, offering significantly increased precision and completeness. Elipmoc is the substrate of a successful analysis framework that has flagged numerous exploitable vulnerabilities on contracts with or without source (but with millions of dollars in locked value). All analyses operate over the three-address code exported by Elipmoc. We have made several vulnerability disclosures, some of which resulted in major rescue efforts [Dedaub 2021b,c,d,f; Michales, Jonah 2021; Primitive Finance 2021].
In addition, the Elipmoc team has been commissioned to perform three separate studies for the Ethereum Foundation, for the assessment of the impact of Ethereum Improvement Proposals EIP-1884, EIP-3074, and a future EIP that proposes a rearchitecting of storage gas cost metering. The Elipmoc decompiler allows answering questions such as “what will be the impact of change-X on the entire set of currently deployed contracts?” and is becoming a crucial tool for the evolution of the Ethereum blockchain and many of its applications.
In technical terms,
Elipmoc makes the following contributions:
● Transactional Context Sensitivity. Elipmoc proposes an effective form of context sensitivity for smart contract execution. Context sensitivity offers a way to condense actual executions into short static signatures. Devising good signatures is crucial for both precision and scalability throughout the decompiler. Elipmoc’s transactional context sensitivity is based on insights about EVM low-level control flow (esp. continuation-based optimizations) and overall execution.
● Scalable, context & path-sensitive function reconstruction. Since compilers tend to reuse EVM bytecode sequences for the benefit of multiple functions, a direct inference of private functions is not straightforward. Additionally, high-level control flow is compiled away to the point of near-irreversibility, using continuation-passing-style patterns. We address such issues with a static analysis maintaining the current contents of a continuation stack, and a path-sensitive analysis to determine with high precision the maximum number of stack elements consumed, i.e.,