YANNIS SMARAGDAKIS, University of Athens, Greece
NEVILLE GRECH, University of Malta, Malta
SIFIS LAGOUVARDOS, University of Athens, Greece
KONSTANTINOS TRIANTAFYLLOU, University of Athens, Greece
ILIAS TSATIRIS, University of Athens, Greece
We present a static analysis approach that combines concrete values and symbolic expressions. This symbolic value-flow (“symvalic”) analysis models program behavior with high precision, e.g., full path sensitivity. To achieve deep modeling of program semantics, the analysis relies on a symbiotic relationship between a traditional static analysis fixpoint computation and a symbolic solver: the solver does not merely receive a complex “path condition” to solve, but is instead invoked repeatedly (often tens or hundreds of thousands of times), in close cooperation with the flow computation of the analysis.
The result of the symvalic analysis architecture is a static modeling of program behavior that is much more complete than symbolic execution, much more precise than conventional static analysis, and domain-agnostic: no special-purpose definition of anti-patterns is necessary in order to compute violations of safety conditions with high precision. We apply the analysis to the domain of Ethereum smart contracts. This domain represents a fundamental challenge for program analysis approaches: despite numerous publications, research work has not been effective at uncovering vulnerabilities of high real-world value. In systematic comparison of symvalic analysis with past tools, we find significantly increased completeness (shown as 83-96% statement coverage and more true error reports) combined with much higher precision, as measured by rate of true positive reports. In terms of real-world impact, since the beginning of 2021, the analysis has resulted in the discovery and disclosure of several critical vulnerabilities, over funds in the many millions of dollars. Six separate bug bounties totaling over $350K have been awarded for these disclosures. CCS Concepts:
• Theory of computation → Program analysis;
• Software and its engineering → General programming languages;
• Security and privacy → Software and application security.
Additional Key Words and Phrases: Program Analysis, Smart Contracts, Security, Ethereum, Blockchain ACM Reference Format: Yannis Smaragdakis, Neville Grech, Sifis Lagouvardos, Konstantinos Triantafyllou, and Ilias Tsatiris. 2021.
Symbolic Value-Flow Static Analysis: Deep, Precise, Complete Modeling of Ethereum Smart Contracts. Proc. ACM Program. Lang. 5, OOPSLA, Article 163 (October 2021), 30 pages. https://doi.org/10.1145/3485540 Authors’ addresses: Yannis Smaragdakis, University of Athens, Greece, smaragd@di.uoa.gr; Neville Grech, University of Malta, Malta, me@nevillegrech.com; Sifis Lagouvardos, University of Athens, Greece, sifis.lag@di.uoa.gr; Konstantinos Triantafyllou, University of Athens, Greece, kotriant@di.uoa.gr; Ilias Tsatiris, University of Athens, Greece, i.tsatiris@di. uoa.gr. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
1 INTRODUCTION
Static analysis is distinguished among program analysis techniques (e.g., testing [Czech et al. 2016; Meyer 2008], model checking [Clarke et al. 1986; Jhala and Majumdar 2009], or symbolic execution [Baldoni et al. 2018; King 1976]) by its emphasis on completeness, i.e., its attempt to model all (or as many as possible) program behaviors. To achieve this goal under a realistic time budget, static analysis often has to sacrifice some precision: the analysis may consider combinations of values that may never appear in a real execution. Maintaining high precision, while achieving completeness and scalability, remains a formidable challenge. In this work, we present an analysis that attempts to meet this challenge, under realistic domain assumptions. The analysis maintains high precision (closely analogous to that of model checking or full program execution) while achieving very high completeness, as measured in terms of coverage of program behaviors.
For conciseness purposes, we give to the analysis architecture the name symvalic analysis, for “symbolic+value-flow static analysis”. A symvalic analysis computes for each program variable a set of possible concrete values as well as symbolic expressions. The analysis is path-sensitive: values propagate to a variable at a program point only if they satisfy (modulo natural analysis over-approximation) the conditions under which the program point would be reachable. In order to satisfy conditions, the analysis needs to solve equalities and inequalities over both concrete and symbolic values. To do this, the analysis appeals to a symbolic solver and simplifier tightly integrated with the analysis core. At the same time, most of the analysis coverage of the program is done inexpensively, based on concrete values—e.g., the analysis tries selected small integers, large integers, and constants from the program text as values of variables at every external entry point.
The above description is perhaps reminiscent of dynamic-symbolic (a.k.a. concolic) execution [Cadar et al. 2008; Godefroid 2007; Godefroid et al. 2005; Sen and Agha 2006; Sen et al. 2005; Tillmann and de Halleux 2008; Tillmann and Schulte 2006]. Some of the insights are indeed similar. The use of concrete values whenever possible (and, conversely, the appeal to symbolic solving selectively) is responsible for the symvalic analysis scalability, much as it has been for dynamic-symbolic execution. However, symvalic analysis also has significant differences from dynamic-symbolic execution. Most notably, it is a static analysis and not full program execution. To model a program statement, symvalic analysis does not need to create a full context of values for every other program variable and (heap) storage location. Indeed, symvalic analysis treats concrete values highly efficiently, manipulating them using set-at-a- ime reasoning, instead of individually. This is the value-flow aspect of symvalic analysis: sets of values are propagated using mass operations (table joins) using a standard fixpoint (Datalog) analysis engine. We apply the idea of symvalic analysis to Ethereum smart contracts [Wood 2014]: a domain where high-precision, high-completeness analyses are in demand. Smart contracts are programs that handle monetary assets whose value occasionally rises to the many millions of dollars.
Tens of security researchers pore over the code of these contracts daily. Discovering an important bug is a source of both fame and reward. Conventional analysis techniques are rarely effective for high-value contracts: Perez and Livshits [2021] recently find that only 0.27% of the funds reported vulnerable by some of the most prominent research tools have truly been subsequently exploited. Simply put, program analysis can help with vulnerabilities in contracts that have not received much scrutiny, but rarely helps with high-value vulnerabilities that have not been found through other means. Partly addressing this need, symvalic analysis has already enjoyed significant success in the analysis of smart contracts, with several high-value vulnerabilities discovered. In brief, our work claims the following contributions: