Symbolic Value-flow Static Analysis of Ethereum Smart Contracts

We present a static analysis approach that combines concrete values and symbolic expressions. This symbolic value-flow (“symvalic”) analysis models program behavior with high precision, e.g., full path sensitivity. To achieve deep modeling of program semantics, the analysis relies on a symbiotic relationship between a traditional static analysis fixpoint computation and a symbolic solver: the solver does not merely receive a complex “path condition” to solve, but is instead invoked repeatedly (often tens or hundreds of thousands of times), in close cooperation with the flow computation of the analysis.

The result of the symvalic analysis architecture is a static modeling of program behavior that is much more complete than symbolic execution, much more precise than conventional static analysis, and domain-agnostic: no special-purpose definition of anti-patterns is necessary in order to compute violations of safety conditions with high precision.

We apply the analysis to the domain of Ethereum smart contracts. This domain represents a fundamental challenge for program analysis approaches: despite numerous publications, research work has not been effective at uncovering vulnerabilities of high real-world value.

In systematic comparison of symvalic analysis with past tools, we find significantly increased completeness (shown as 83-96% statement coverage and more true error reports) combined with much higher precision, as measured by rate of true positive reports. In terms of real-world impact, since the beginning of 2021, the analysis has resulted in the discovery and disclosure of several critical vulnerabilities, over funds in the many millions of dollars. Six separate bug bounties totaling over $350K have been awarded for these disclosures.

Read more

Verkle Tree Gas Metering Impact

Dedaub was commissioned by the Ethereum Foundation to investigate the impact of Vitalik Buterin’s Verkle tree gas metering proposal on existing smart contracts.

The impact is not negligible, with a 26% gas cost increase on average. However, this will be gauged against the impact to the security and scalability of the consensus of the network, and Verkle trees could be a game-changer in this regard.

INTRODUCTION

Dedaub was commissioned by the Ethereum Foundation to investigate the impact of Vitalik Buterin’s Verkle tree gas metering proposal on existing smart contracts. In order to appraise the impact of the proposed change, we performed extensive simulations of the proposed changes over past transactions; wrote a static analysis and applied it over most contracts deployed to the mainnet; examined bytecode, source, decompiled code, and low level traces of past transactions.

Vitalik Buterin’s proposal introduces new gas changes to state access operations that closely reflect the worst case witness size that needs to be downloaded by a stateless client that needs to validate a block. As described in the original proposal, a Verkle tree is a cryptographically secure data structure with a high branching factor. Unlike Merkle trees, trace witnesses in Verkle trees do not need to include the siblings of each node in a path to the root node, which means that trees with a high branching factor are more efficient. Verkle trees can still retain the security properties by making use of a polynomial commitment scheme. By leveraging these properties, gas costs can therefore more easily reflect the upper bound of the cost of transmitting these witnesses across stateless clients, assuming these are appropriately designed.

In addition to gas changes that the proposal introduces, the Verkle tree proposal may break protocols that are too tied to the existing implementation, such as protocols that use Keydonix Oracles like Rune, but this consideration is outside the scope of this report.

In the report we briefly summarize the proposal with respect to gas costs. We then delve into the impact to existing contracts using two techniques: path-sensitive static program analysis, and dynamic analysis by modifying an Erigon node with the new gas semantics and analyzing the data per internal transaction. Finally we state our recommendations for lessening the impact of this proposal.

BACKGROUND

The goal of the proposal [1] aims to make Ethereum _stateless clients _sustainable in terms of data transmission requirements.

Stateless clients should be able to verify the correctness of any individual block without any extra information except for the block’s header and a small file, called witness, that contains the portion of the state accessed by the block along with proofs of correctness. Witnesses can be produced by any state-holding node in the network. The benefits of stateless verification include allowing clients to run in low-disk-space environments, enabling semi-light-client setups where clients trust blocks by default but stand ready to verify any specific block in the case of an alarm, and secure sharding setups where clients jump between shards frequently without the need to keep up with the state of all shards.

Large witness sizes are the key problem for enabling stateless clients for Ethereum and the adoption of Verkle trees can reduce the witness sizes needed. More specifically, a witness accessing an account in the hexary Patricia tree is, in the average case, close to 3 kB, and in the worst case it may be three times larger. Assuming a worst case of 6000 accesses per block (15m gas / 2500 gas per access), this corresponds to a witness size of ~18 MB, which is too large to safely broadcast through a p2p network within a 12-second slot. Verkle trees reduce witness sizes to ~200 bytes per account in the average case, allowing stateless client witnesses to be acceptably small.

Finally, contract code needs to be included in a witness. A contract’s code can contain up to 24000 bytes, and so a 2600 gas CALL can add ~24200 bytes to the witness size. This implies a worst-case witness size of over 100 MB. The proposal suggests breaking up contract code into chunks that can be proven separately; this can be done simultaneously with a move to a Verkle tree. Since a contract’s code can contain up to 24000 bytes, and so a 2600 gas CALL can add ~24200 bytes to the witness size. This implies a worst-case witness size of over 100 MB. The solution is to move away from storing code as a single monolithic hash, and instead break it up into chunks that can be proven separately and adding gas rules1 that accounts for these costs2.

The small witness sizes described above are possible because Verkle trees rely on an efficient cryptographic commitment scheme, called Polynomial Commitments3 4. Polynomial Commitments allow for logarithmic-sized (in respect to the tree’s height) proof-of-inclusion of any number of leaves in a subtree – and this is independent to the branching factor of the tree. In comparison, traditional Merkle tree’s proofs depend on the branching factor of the tree in a linear fashion, since all siblings of a node-to-be-proved must be included in the proof.

SUMMARY OF GAS CHANGES

The current proposal suggests new gas costs which have no counterpart in the previous versions of Ethereum. These is a gas cost for every chunk of 31 bytes of bytecode which are accessed and some additional access events that apply to the entire transaction. An improvement in gas cost can however be also achieved by lowering the cost for SLOAD/SSTORE for when fewer subtrees of the tree structure underpinning the state are accessed. The following is a summary of the gas cost changes between EIP-2929, which introduces the notion of “access lists” and the current proposal.

SLOAD, but location previously accessed within the same tx using SLOAD or SSTORE

VerkleWARM_STORAGE_READ_COST (100)
EIP-2929WARM_STORAGE_READ_COST (100)

SLOAD, but location previously unread within the tx using SLOAD or SSTORE

Verkle200 if previously visited subtree 2100 if subtree has not been visited (storage location is up to 64 / 256 elements away from the closest visited)
EIP-2929COLD_SLOAD_COST = 2100

CALL an address for an account that has not been previously accessed within the tx

VerkleAccess list costs (typically 1900 + 200 + 200 + more if value bearing)
EIP-2929COLD_ACCOUNT_ACCESS_COST = 2600

EIP-2929 mentions that precompiles are not charged COLD_ACCOUNT_ACCESS_COST but the Verkle specification omits this case at the time of writing.


Read more

Yield Skimming: Forcing Bad Swaps on Yield Farming

By the Dedaub team

Yield Skimming

Last week we received bug bounties for disclosing smart contract vulnerabilities to Vesper Finance and BT Finance, via immunefi.com. Thank you, all!

(Notice for clients of these services: None of the vulnerabilities drain the original user funds. An attack would have financial impact, but not an overwhelming one. The maximum proceeds in the past few months would have been around $150K, and, once performed, the attack would be likely to alert the service to the vulnerability, making the attack non-repeatable. The vulnerabilities have since been mitigated and, to our knowledge, no funds are currently threatened.)

Both vulnerabilities follow the same pattern and many other services could potentially be susceptible to such attacks (though all others that we checked are not, by design or by circumstance — it will soon be clear what this means). It is, therefore, a good idea to document the pattern and draw some attention to it, as well as to its underlying financials.

Yield Skimming | The Attack

A common pattern in yield farming services is to have strategies that, upon a harvest, swap tokens on an exchange, typically Uniswap. A simplified excerpt from actual deployed code looks like this:

function harvest() public {
  withdrawTokenA(); 
  uint256 reward = TokenA.balanceOf(address(this));
  unirouter.swapExactTokensForTokens(reward, 0, pathTokenAB, this, now.add(1800));
  depositTokenB();
}

Example harvest function, with swapping.

Similar code is deployed in hundreds (if not thousands) of contracts. Typical uses of the pattern are a little more complex, with the harvest and the swap happening in different functions. But the essence remains unchanged. Similar code may also be found at the point where the service rebalances its holdings, rather than at the harvest point. We discuss harvest next, as it is rather more common.

[Short detour: you see that now.add(1800)for the “deadline” parameter of the swap? The add(1800) has no effect whatsoever. Inside a contract, the swap will always happen at time now, or not at all. The deadline parameter is only meaningful if you can give it a constant number.]

Back to our main pattern, the problem with the above code is that the harvest can be initiated by absolutely anyone! “What’s the harm?” — you may ask — “Whoever calls it pays gas, only to have the contract collect its rightful yield.”

The problem, however, is that the attacker can call harvest after fooling the Uniswap pool into giving bad prices for the yield. In this way, the victim contract distorts the pool even more, and the attacker can restore it for a profit: effectively the attacker can steal almost all of the yield, if its value is high enough.

In more detail, the attack goes like this:

a) the attacker distorts the Uniswap pool (the AssetA-to-AssetB pool) by selling a lot of the asset A that the strategy will try to swap. This makes the asset very cheap.

b) the attacker calls harvest. The pool does a swap at very bad prices for the asset.

c) the attacker swaps back the Asset B they got in the first step (plus a tiny bit more for an optimal attack) and gets the original asset A at amounts up to the original swapped (of step (a)) plus what the victim contract put in.

Yield Skimming

Yield Skimming | Example

For illustration, consider some concrete, and only slightly simplified, numbers. (If you are familiar with Uniswap and the above was all you needed to understand the attack, you can skip ahead to the parametric analysis.)

Say the harvest is in token A and the victim wants to swap that to token B. The Uniswap pool initially has
1000 A tokens and 500 B tokens. The “fair” price of an A denominated in Bs is 500/1000 = 0.5. The product k of the amounts of tokens is 500,000: this is a key quantity in Uniswap — the system achieves automatic pricing by keeping this product constant while swaps take place.

In step (a) the attacker swaps 1000 A tokens into Bs. This will give back to the attacker 250 B tokens, since the Uniswap pool now has
2000 A tokens and 250 B tokens (in order to keep the product k constant). The price of an A denominated in Bs has now temporarily dropped to a quarter of its earlier value: 0.125, as far as Uniswap is concerned.

In step (b) the victim’s harvest function tries to swap, say, 100 A tokens into Bs. However, the price the victim will get is now nowhere near a fair price. Instead, the Uniswap pool goes to
2100 A tokens and 238 B tokens, giving back to the victim just 12 B tokens from the swap.

In step (c) the attacker swaps back the 250 B tokens they got in step (a), or, even better, adds another 12 to reap maximum benefit from the pool skew. The pool is restored to balance at the initial
1000 A tokens and 500 B tokens. The attacker gets back 1100 A tokens for a price of 1000 A tokens and 12 B tokens. The attacker effectively got the 100 As that the victim swapped at 1/4th of the fair price.

Yield Skimming | Parametric Analysis

The simplistic example doesn’t capture an important element. The attacker is paying Uniswap fees for every swap they perform, at steps (a) and (c). Uniswap currently charges 0.3% of the swapped amount in fees for a direct swap. The net result is that the attack makes financial sense only when the amounts swapped by the victim are large. How large, you may ask? If the initial amount of token A in the pool is a and the victim will swap a quantity d of A tokens, when can an attacker make a profit, and what values x of A tokens does the attacker need to swap in step (a)? If you crunch the numbers, the cost-benefit analysis comes down to a cubic inequality. Instead of boring you with algebra, let’s ask Wolfram Alpha.

The result that Alpha calculates is that the attack is profitable as long as the number d of A tokens that the victim will swap is more than 0.3% of the number a of A tokens that the pool had initially. In the worst case, is significant (e.g., 10% of a, as in our example) and the attacker’s maximum profit is very close to the entire swapped amount.

Another consideration is gas prices, which we currently don’t account for. For swaps in the thousands of dollars, gas prices will be a secondary cost, anyway.

Yield Skimming | Mitigation

In practice, yield farming services protect against such attacks in one of the following ways:

  • They limit the callers of harvest or rebalance. This also needs care. Some services limit the direct callers of harvest but the trusted callers include contracts that have themselves public functions that call harvest.
  • They have bots that call harvest regularly, so that the swapped amounts never grow too much. Keep3r seems to be doing this consciously. This is fine but costly, since the service incurs gas costs even for harvests that don’t produce much yield.
  • They check the slippage suffered in the swap to ensure that the swap itself is not too large relative to the liquidity of the pool. We mention this to emphasize that it is not valid protection! Note the numbers in our above example. The problem with the victim’s swap in step (b) is not high slippage: the victim gets back 12 B tokens (11.9 to be exact) whereas with zero slippage they would have gotten back 12.5. This difference, of about 5%, may certainly pass a slippage check. The problem is not the 5% slippage but the 4x lower-than-fair price of the asset, to begin with!

There are other factors that can change the economics of this swap. For instance, the attacker could be already significantly vested in the Uniswap pool, thus making the swap fee effectively smaller for them. Also, Uniswap v3 was announced right at the time of this writing, and promises 0.05% fees for some price ranges (i.e., one-sixth of the current fees). This may make similar future attacks a lot more economical even for small swaps.

Conclusion

The pattern we found in different prominent DeFi services offers opportunities for interesting financial manipulation. It is an excellent representative of the joint code analysis (e.g., swap functionality reachable by untrusted callers) and financial analysis that are both essential in the modern Ethereum/DeFi security landscape.

Killing a Bad (Arbitrage) Bot … To Save Its Owner

Following the previous white-hat hacks (12), on contracts flagged by our analysis tools, today we’ll talk about another interesting contract. It’s hackable for about $80K, or rather its users are: the contract is just an enabler, having approvals from users and acting on their commands. However, a vulnerability in the enabler allows stealing all the users’ funds. (Of course, we have mitigated the vulnerability before posting the article.)

The vulnerable contract is a sophisticated arbitrage bot, with no source on Etherscan. Being an arbitrage bot, it’s not surprising that we were unable to identify either the contract owner/deployer or its users.

One may question whether we should have expended effort just to save an arbitrageur. However our mission is to secure the smart contract ecosystem — via our free contract-library service, research, consulting, and audits. Furthermore, arbitrage bots do have a legitimate function in the Ethereum space: the robustness of automated market makers (e.g., Uniswap) depends on the existence of bots. By having bots define a super-efficient trading market, price manipulators have no expected benefit from biasing a price: the bots will eat their profits. (Security guaranteed by the presence of relentless competition is an enormously cool element of the Ethereum ecosystem, in our book.)

Also, thankfully, this hack is a great learning opportunity. It showcases at least three interesting elements:

  • Lack of source code, or general security-by-obscurity, won’t save you for long in this space.
  • There is a rather surprising anti-pattern/bad smell in Solidity programming: the use of this.function(...) instead of just function(...).
  • It’s a lucky coincidence when an attack allows destroying the means of attack itself! In fact, it is the most benign mitigation possible, especially when trying to save someone who is trying to stay anonymous.

Following a Bad Smell

The enabler contract has no source code available. It is not even decompiled perfectly, with several low-level elements (e.g., use of memory) failing to be converted to high-level operations. Just as an example of the complexity, here is the key function for the attack and a crucial helper function (don’t pay too close attention yet — we’ll point you at specific lines later):

function 0xf080362c(uint256 varg0, uint256 varg1) public nonPayable { 
    require(msg.data.length - 4 >= 64);
    require(varg1 <= 0xffffffffffffffff);
    v0, v1 = 0x163d(4 + varg1, msg.data.length);
    assert(v0 + 0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff < v0);
    v2 = 0x2225(v1, v1 + (v0 + 0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff << 5));
    v3 = v4 = 0x16b6(96 + v2, v2 + 128);
    v5 = v6 = 0;
    while (v5 < v0) {
        if (varg0 % 100 >= 10) {
            assert(v5 < v0);
            v7 = 0x2225(v1, v1 + (v5 << 5));
            v8 = 0x16b6(64 + v7, v7 + 96);
            MEM[MEM[64]] = 0xdd62ed3e00000000000000000000000000000000000000000000000000000000;
            v9 = 0x1cbe(4 + MEM[64], v8, this);
            require((address(v3)).code.size);
            v10 = address(v3).staticcall(MEM[(MEM[64]) len (v9 - MEM[64])], MEM[(MEM[64]) len 32]).gas(msg.gas);
            if (v10) {
                MEM[64] = MEM[64] + (RETURNDATASIZE() + 31 & ~0x1f);
                v11 = 0x1a23(MEM[64], MEM[64] + RETURNDATASIZE());
                if (v11 < 0x8000000000000000000000000000000000000000000000000000000000000000) {
                    0x1150(0, v8, address(v3));
                    0x1150(0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff, v8, address(v3));
                }
            } else {
                RETURNDATACOPY(0, 0, RETURNDATASIZE());
                revert(0, RETURNDATASIZE());
            }
        }
        assert(v5 < v0);
        v12 = 0x2225(v1, v1 + (v5 << 5));
        v13 = 0x16b6(v12, v12 + 32);
        assert(v5 < v0);
        v14 = 0x2225(v1, v1 + (v5 << 5));
        v15 = 0x1a07(32 + v14, v14 + 64);
        assert(v5 < v0);
        v16 = 0x2225(v1, v1 + (v5 << 5));
        v17 = 0x16b6(64 + v16, v16 + 96);
        assert(v5 < v0);
        v18 = 0x2225(v1, v1 + (v5 << 5));
        v19 = 0x16b6(96 + v18, v18 + 128);
        assert(v5 < v0);
        v20 = 0x2225(v1, v1 + (v5 << 5));
        v21, v22 = 0x21c2(v20, v20 + 128);
        MEM[36 + MEM[64]] = address(v17);
        MEM[36 + MEM[64] + 32] = address(v3);
        MEM[36 + MEM[64] + 64] = v23;
        MEM[36 + MEM[64] + 96] = address(v19);
        MEM[36 + MEM[64] + 128] = 160;
        v24 = 0x1bec(v22, v21, 36 + MEM[64] + 160);
        MEM[MEM[64]] = v24 - MEM[64] + 0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffe0;
        MEM[64] = v24;
        MEM[MEM[64] + 32] = v15 & 0xffffffff00000000000000000000000000000000000000000000000000000000 | 0xffffffffffffffffffffffffffffffffffffffffffffffffffffffff & MEM[MEM[64] + 32];
        v25 = 0x1c7e(MEM[64], MEM[64]);
        v26 = address(v13).delegatecall(MEM[(MEM[64]) len (v25 - MEM[64])], MEM[(MEM[64]) len 0]).gas(msg.gas);
        if (RETURNDATASIZE() == 0) {
            v27 = v28 = 96;
        } else {
            v27 = v29 = MEM[64];
            MEM[v29] = RETURNDATASIZE();
            RETURNDATACOPY(v29 + 32, 0, RETURNDATASIZE());
        }
        require(v26, 'delegatecall fail');
        v23 = v30 = 0x1a23(32 + v27, 32 + v27 + MEM[v27]);
        assert(v5 < v0);
        v31 = 0x2225(v1, v1 + (v5 << 5));
        v3 = v32 = 0x16b6(96 + v31, v31 + 128);
        v5 += 1;
    }
    v33 = 0x20bd(MEM[64], v23);
    return MEM[(MEM[64]) len (v33 - MEM[64])];
}

function 0x16b6(uint256 varg0, uint256 varg1) private { 
    require(varg1 - varg0 >= 32);
    v0 = msg.data[varg0];
    0x235c(v0);  // no-op
    return v0;
}

Key function decompiled. Unintelligible, right?

Faced with this kind of low-level complexity, one might be tempted to give up. However, there are many red flags. What we have in our hands is a publicly called function that performs absolutely no checks on who calls it. No msg.sender check, no checks to storage locations to establish the current state it’s called under, none of the common ways one would protect a sensitive piece of code.

And this code is not just sensitive, it is darn sensitive. It does a delegatecall (line 55) on an address that it gets from externally-supplied data (line 76)! Maybe this is worth a few hours of reverse engineering?

Vulnerable code in contracts is not rare, but most of these contracts are not used with real money. A query of token approvals and balances shows that this one is! There is a victim account that has approved the vulnerable enabler contract for all its USDT, all its WETH, and all its USDC.

Victim token approvals, including to the enabler (0x15cb5c845b…).

And how much exactly is the victim’s USDT, USDC, and WETH? Around $77K at the time of the snapshot below.

Victim’s balances.

Reverse Engineering

The above balances and suspicious code prompted us to do some manual reverse engineering. While also checking past transactions, the functionality of the vulnerable code was fairly easy to discern. At the end of our reverse-engineering session, here’s the massaged code that matters for the attack:

pragma experimental ABIEncoderV2;

contract VulnerableArbitrageBot is Ownable {

    struct Trade {
        address executorProxy;
        address fromToken;
        address toToken;
        ...
    }
    
    function performArbitrage(address initialToken, uint256 amt, ..., Trade[] trades memory) onlyOwner external {
        ...
        IERC20(initialToken).transferFrom(address(this), amt);
        ...
        this.performArbitrageInternal(..., trades); // notice the use of 'this'
    }
    
    function performArbitrageInternal(..., Trade[] trades memory) external {
        Trade memory trade = trades[i];
        for (uint i = 0; i < trades.length; i++) {
            // ...
            IERC20(trade.fromToken).approve(...);
            // ...
            trades[i].executorProxy.delegatecall(
              abi.encodeWithSignature("trade(address,address...)", trade.fromToken, trade.toToken, ...)
            );
        }
    }
}

interface TradeExecutor {
   function trade(...) external returns (uint) {
}

contract UniswapExecutor is TradeExecutor {
    function trade(address fromToken, address toToken, ... ) returns (uint) {
        // perform trade
        ...
    }
}

This function, 0xf080362c, or performArbitrageInternal as we chose to name it (since the hash has no publicly known reversal), is merely doing a series of trades, as instructed by its caller. Examining past transactions shows that the code is exploiting arbitrage opportunities.

Our enabler is an arbitrage bot and the victim account is the beneficiary of the arbitrage!

Since we did not fully reverse engineer the code, we cannot be sure what is the fatal flaw in the design. Did the programmers consider that the obscurity of bytecode-only deployment was enough protection? Did they make function 0xf080362c/performArbitrageInternal accidentally public? Is the attack prevented when this function is only called by others inside the contract?

We cannot be entirely sure, but we speculate that the function was accidentally made public. Reviewing the transactions that call 0xf080362c reveals that it is never called externally, only as an internal transaction from the contract to itself.

The function being unintentionally public is an excellent demonstration of a Solidity anti-pattern.

Whenever you see the code pattern <strong>this.function(...)</strong> in Solidity, you should double-check the code.

In most object-oriented languages, prepending this to a self-call is a good pattern. It just says that the programmer wants to be unambiguous as to the receiver object of the function call. In Solidity, however, a call of the form this.function() is an external call to one’s own functionality! The call starts an entirely new sub-transaction, suffers a significant gas penalty, etc. There are some legitimate reasons for this.function() calls, but nearly none when the function is defined locally and when it has side-effects.

Even worse, writing this.function() instead of just function() means that the function has to be public! It is not possible to call an internal function by writing this.function(), although just function() is fine.

This encourages making public something that probably was never intended to be.

The Operation

Armed with our reverse-engineered code, we could now put together attack parameters that would reach the delegatecall statement with our own callee. Once you reach a delegatecall, it’s game over! The callee gains full control of the contract’s identity and storage. It can do absolutely anything, including transferring the victim’s funds to an account of our choice.

But, of course, we don’t want to do that! We want to save the victim. And what’s the best way? Well, just destroy the means of attack, of course!

So, our actual attack does not involve the victim at all. We merely call selfdestruct on the enabler contract: the bot. The bot had no funds of its own, so nothing is lost by destroying it. To prevent re-deployment of an unfixed bot, we left a note on the Etherscan entry for the bot contract.

To really prevent deployment of a vulnerable bot, of course, one should commission the services of Dedaub. 🙂

“look Ma’, No Source!” Hacking a Defi Service With No Source Code Available

By the Dedaub team

This story describes a cool hack, for over $300K (even nearly $600K, if done at the right time). It is a white-hat hack. We performed it off-chain, demonstrated to Dinngo, the authors of the vulnerable service, and they reproduced it and applied it to rescue the funds of exposed accounts, securing them.

The hack is among the most instructive we have encountered, which is why we wanted to document it clearly. There’s something in it for everyone: it showcases the danger of token approvals, interesting financial manipulation, the use of different DeFi services (Aave, Compound, Uniswap) as part of the attack, and much more.

Furthermore, this is a rare, if not the first, case of hacking a fairly complex smart contract without any source code available. (At the time of implementing and confirming the attack, we had no idea who was the owner of the vulnerable contract, so we were going by available bytecode only.)

Let’s start from the high level, and we’ll get more and more technical, both in the finances and in the coding.

The End-User’s View

The hack affects two parties: the victim account (a wallet, not a contract) which holds the funds, and the enabler contract, which contains the vulnerable code. The vulnerability in the enabler allowed us to drain the victim’s funds, because the victim had approved the enabler for all of its cUSDC (about $580K). In fact, there were several victims, but in the rest we only discuss the one we targeted, having an exposure 100x higher than the next closest.

If you are a DeFi end-user and want to get just one useful thing out of this article, this is it: be very careful with token approvals from your accounts. You are giving the approved spender contract the ability to do anything with your tokens. A vulnerability in the contract can drain your account. As something actionable, check out the new (in beta) Etherscan token approval feature (here demonstrated on our victim account).

Here’s what the victim’s account approvals looked like at the time of the hack:

See the highlighted approval: a contract with no source code.

Notice something strange? We highlighted one of the approvals. Of 110 token approvals, 109 were done to contracts with source code, which anyone can inspect. And one approval is to 0x936de89…: our enabler. Our enabler is also a public service: DeFlast.finance, created by Dinngo.

But the lack of source code for the contract should give you pause. See how it sticks out in the list above!

To be clear, this is not how we found out about the victim and the enabler. Instead, we are regularly running automated analyses on the entire blockchain that warn us about contracts worth inspecting closely. But the above is a likely way in which a black-hat hacker would identify that something is fishy about our victim and that the attack vector involves the enabler: some funds have been trusted to code that will likely be checked by very few people.

So, if you have accounts that interact with DeFi protocols or other token services, do yourself a favor and inspect your approvals. Your hacker may not be white-hat.

Attack: High-Level View

The vulnerable contract (our enabler), decompiled by contract-library, has a bit of complexity. We will analyze it a little later, but, even if reverse engineering is not your cup of tea, the high-level description is interesting.

The contract’s executeOperation (called after an Aave flash loan, normally) takes as parameter a client account, two Compound cTokens, the flash loan balance, and some amounts. It then does the following:

  • mints new cToken up to the specified amount
  • liquidates (“redeems”) the client’s original cTokens (e.g., cUSDC) and transfers the underlying tokens to itself, the enabler
  • swaps the tokens from the previous step on Uniswap v1 into the token of the loan
  • repays the flash loan.

In the attack, the client is the victim account. But the code does not let anyone directly get the victim’s funds, it only forces a swap of the victim’s tokens from one kind of cToken into another.

So, how can this be exploited?

If you think about it in real-life terms, you already know the answer. You have someone forced to buy goods of your choice. How can you drain their funds?

By selling them worthless goods for a high price, of course!

Therefore, in order to attack, we did the following:

  • create our own ERC20 token
  • create a fake cToken (dummy methods, just returning the expected return codes) for this ERC20 token
  • create a Uniswap v1 exchange and liquidity pool for our ERC20 token, so that it can be traded
  • call the function, supply our parameters. The victim’s tokens (USDC) were transferred into our liquidity pool (after being converted to ETH), the victim got worthless tokens in exchange
  • exit the liquidity pool, get ETH.

A cute element of the attack is that we don’t even need a sizeable liquidity pool to begin with — we can exploit Uniswap’s constant-product price calculation. That is, we don’t just make the victim buy worthless tokens, we make them buy 99.99+% of the worthless tokens’ supply, in order to drive the price up so much that the victim needs to spend all their assets! The exact percentage was carefully calculated based on the victim’s cUSDC balance.

If you think this is complex, consider this: we had never created either a cToken or a Uniswap v1 liquidity pool in code before, yet it took us only half a day to implement the basic attack. The steps are certainly well within reach of a sophisticated hacker.

The reality got complicated by a few nasty details, such as outstanding loans, extra swaps to counter slippage, etc. But the heart of the attack is well-captured in this summary.

Attack: Technical View

The first (but not foremost) complication in this attack is that the enabler contract (DeFlast’s) has no source code available. However, contract-library.com offers a reasonably good decompilation of it. Starting from the public executeOperation function (typically the callback of an Aave flash loan) we can understand a lot of the code. Here are two key functions of the decompiled code, before any effort to manually improve:

function executeOperation(address _reserve, uint256 _amount, uint256 _fee, bytes _params) public nonPayable { 
    require(msg.data.length - 4 >= 128);
    require(_params <= 0x100000000);
    require(4 + _params + 32 <= 4 + (msg.data.length - 4));
    require(!((_params.length > 0x100000000) | (36 + _params + _params.length > 4 + (msg.data.length - 4))));
    v0 = new bytes[](_params.length);
    CALLDATACOPY(v0.data, 36 + _params, _params.length);
    MEM[v0.data + _params.length] = 0;
    v1 = 0x148f(_reserve, this);
    require(_amount <= v1, 'Invalid balance for the contract');
    require(v0.length >= 128);
    v2 = 0xdad(MEM[v0.data]);
    v3 = 0xdad(MEM[v0.data + 32]);
    0x13f6(MEM[v0.data + 96], _amount, MEM[v0.data + 32]);
    0xe5b(MEM[v0.data + 96], MEM[v0.data + 64], MEM[v0.data]);
    v4 = 0x10f2(this, v2);
    v5 = _SafeAdd(_fee, _amount);
    v6 = 0x11b5(v5, v4, v3, v2);
    v7 = 0x10f2(this, _reserve);
    v8 = _SafeAdd(_fee, _amount);
    require(v7 >= v8, 'Token balance not enough for repaying flashloan.');
    v9 = _SafeAdd(_fee, _amount);
    0x15b1(v9, _reserve);
}
...
function 0xe5b(uint256 varg0, uint256 varg1, uint256 varg2) private { 
    v0 = address(varg0);
    MEM[v1.data] = varg1;
    v2 = address(varg2);
    require(v2.code.size);
    v3, v4 = v2.transferFrom(v0, this).gas(msg.gas);
    require(v3); // checks call status, propagates error data on error
    require(RETURNDATASIZE() >= 32);
    require(1 == v4, 'Failed to transfer cToken from user when redeeming');
    v5 = address(varg2);
    v6 = v1.data;
    require(v5.code.size);
    v7, v8 = v5.approve(v5, varg1).gas(msg.gas);
    require(v7); // checks call status, propagates error data on error
    require(RETURNDATASIZE() >= 32);
    require(1 == v8, 'Failed to approve cToken to Token Contract when redeeming');
    v9 = address(varg2);
    require(v9.code.size);
    v10, v11 = v9.redeem(varg1).gas(msg.gas);
    require(v10); // checks call status, propagates error data on error
    require(RETURNDATASIZE() >= 32);
    require(!v11, 'Failed to redeem underlying token.');
    v12 = 0xdad(varg2);
    v13 = 0x10f2(this, v12);
    v14 = address(varg2);
    emit 0xaface4c9957b8058dd049dc2a148905af00a14f8ef10dc658a81d03f527ab906(v14, v13);
    return ;
}

After an afternoon of manual polishing, here’s the result of our reverse engineering for the same two functions:

// _reserve is the underlying token of ctoken1, or they both pretend it is
// ctoken0 has to be a true CToken: CUSDC
// numTokens is the amount of the victim's CTokens we want to/can get
function executeOperation(address _reserve, uint256 _amount, uint256 _fee, bytes _params) public nonPayable { 
    require(_params.length <= 256);
    require(_amount <= getBalance(_reserve, this), 'Invalid balance for the contract');
          // need to have a balance with token _reserve
    ctoken0 = _params[0]; // certain ctoken
    ctoken1 = _params[1];
    numTokens = _params[2];
    owner = _params[3];
    token0 = getUnderlyingForCToken(ctoken0);
    token1 = getUnderlyingForCToken(ctoken1);
    mintCTokenForOwner(owner, _amount, ctoken1);  // mint amount of ctoken and transfer to owner
    redeemCTokenReceiveUnderlying(owner, numTokens, ctoken0);
       // get owner's ctoken, redeem it, get underlying token in "this" contract
    v4 = getBalance(this, token0);
    amountPlusFee = _SafeAdd(_fee, _amount);
    v6 = swapTokens(amountPlusFee, v4, token1, token0);
       // swaps (on Uniswap v1) the tokens this contract got, to have enough to repay the loan
    v7 = getBalance(this, _reserve);
    v8 = _SafeAdd(_fee, _amount);
    require(v7 >= v8, 'Token balance not enough for repaying flashloan.');
    v9 = _SafeAdd(_fee, _amount);
    repayFlashLoan(v9, _reserve);
}

function redeemCTokenReceiveUnderlying(uint256 owner, uint256 numTokens, uint256 ctoken) private { 
    ok, v4 = ctoken.transferFrom(owner, this, numTokens).gas(msg.gas);
    require(1 == v4, 'Failed to transfer cToken from user when redeeming');
    v5 = ctoken;
    ok, v8 = ctoken.approve(v5, numTokens).gas(msg.gas);
    require(1 == v8, 'Failed to approve cToken to Token Contract when redeeming');
    ok, v11 = ctoken.redeem(numTokens).gas(msg.gas);
    require(!v11, 'Failed to redeem underlying token.');
    v12 = getUnderlyingForCToken(ctoken);
    v13 = getBalance(this, v12);
    emit 0xaface4c9957b8058dd049dc2a148905af00a14f8ef10dc658a81d03f527ab906(ctoken, v13);
    return ;
}

Keep in mind that, at the time of doing this, we had no idea what high-level service uses this contract — we had not linked it to DeFlast, nor even knew what DeFlast was. But the contract’s intent is not too hard to discern from the code: a user’s cTokens are swapped for different cTokens (specified in the signature) with the help of a flash loan. First, the flash loan funds allow minting the new cToken. Then, the old cTokens are redeemed. The proceeds of the redemption are swapped on Uniswap v1 to get enough underlying “old tokens” to repay the loan.

However, there is no safeguard to ensure that this code is indeed called after a flash loan. But even that alone would not have been safe: one could get a minuscule flash loan and call the contract with the desired parameters. More importantly, the code does not check that the flash loan “reserve” token is the same as the “underlying” of the new cToken, nor that what the user gets back is real cTokens (and not merely something pretending to be a cToken).

So, we have a forced swap in our hands. All we need to do is make sure the code doesn’t crash from underneath us. We can create our own worthless token, wrap it in a cToken, and we can build our own market for trading them. In fact, our cToken can be entirely fake: it just needs to return the right underlying token (our worthless token) and provide the expected return values: return 0 for mint and redeem, true for transfer and approve, etc.

pragma solidity ^0.7.0;

contract CMyToken {
    address private _underlying;
    constructor (address underlying) public {
        _underlying = underlying;
    }
    function underlying() public view returns (address) {
        return _underlying;
    }
    // funny how you think this matters
    function exchangeRateCurrent() public pure returns (uint256) {
        return 10 ** 18;
    }
    function mint(uint ) public pure returns (uint256) {
        return 0;  // means no error
    }
    function transfer(address, uint) public pure returns (bool) {
        return true; // whatever you say, boss
    }
    function transferFrom(address, address, uint256) public pure returns (bool) {
        return true; // at your command
    }
    function approve(address, uint256) public pure returns (bool) {
        return true;
    }   
    function redeem(uint) public pure returns (uint) {
        return 0;
    }
}

We then created an exchange for our token on Uniswap (v1, since that’s what the vulnerable code uses) and added a little bit of liquidity to it — about 0.001ETH against a tiny amount of our worthless token.

The beauty of Uniswap’s model is that it is so amazingly general, yet robust. It allows anyone to create an exchange and provide liquidity. Prices are determined entirely on-chain. However, the reliability of Uniswap prices depends on others jumping in and correcting exchange rate anomalies. Yet in our forced swap, there are no “others”! The market never gets a chance to adjust the price and restore our worthless token to its … worthlessness. (Even if a bot had been tempted to trade with us, we installed a trap in our worthless token, not allowing it to be traded outside the attack transaction.)

By instructing the enabler contract to trade the victim’s cTokens for our cTokens we can perform a successful attack. As mentioned earlier, we deliberately caused enormous slippage: our pool initially had just 0.001ETH against 0.0000001 of our worthless token. Still, we instructed the enabler to swap for over 99.9996% of the worthless token’s supply — the exact number being computed so that it would exhaust the victim’s funds.

A further complication is that the victim was using their cUSDC as collateral for Compound loans. The loan view of the account looked like this:

Victim had $580K in vulnerable assets, securing loans of about $280K.

The total value of outstanding loans at the moment of the attack was around $280K, with collateral at $580K. A direct attack cannot get the $300K difference but only about two-thirds of that, since the Compound Comptroller would not allow transferring out money that would violate the loan collateralization limits. But this is easy to address: we just take $280K in flash loans, repay the victim’s loans, drain the $580K and pay off the flash loans.

A final complication is that the Uniswap v1 pools are too shallow nowadays. The USDC pool has around $650K liquidity at the time of this writing. Since the vulnerable code forces a swap of the proceeds on Uniswap v1, we suffer tremendous slippage. A Uniswap v1 swap between USDC and our worthless token is really two swaps with ETH in the middle: first USDC to ETH, then ETH to our token. The first of these swaps, for $580K out of the $650K available, nets a lot less ETH than it should.

However, this is easily countered: once we exit our own liquidity pool, before the end of the transaction, we perform an inverse swap of ETH for USDC and exploit all the slippage we just caused. In the end, we are left with the right amount of the victim’s USDC.

Actual Rescue Operation

The above is the attack we performed locally last week (last of Jan. 2021), confirming the vulnerability. We then made an effort to locate the owner of the victim account, but a couple of messages (speculative, based on past activity) yielded nothing.

Only at that point did we search for the owners of the enabler contract and got a link to DeFlast.finance! This was a relief. Not only did we now have a contact that could authorize a white-hat attack, but the contact was a high-quality team —also behind other projects that we had recently inspected thoroughly.

We contacted Hsuan-Ting Chu, the CEO of Dinngo, since he was the most obvious point of contact for escalating the report of a critical vulnerability. Within a few hours we were in a meeting with Hsuan-Ting and Dinngo engineers where we presented the attack.

The Dinngo team took over the rescue operation, following the blueprint of our attack, and moved the victim’s positions to another wallet. Other victims were similarly moved in the past 48hours. The operation was done very smoothly and professionally, especially considering the complexity of the attack (check out the transaction for the main victim)!

Concluding

This was a cool hack. It started from a bad smell: code that didn’t seem to be checking that it’s used only in its intended scenarios. Despite not having source code, we followed a hunch and spent some time reverse engineering. The vulnerability then required financial manipulation. Creating an exchange. Exploiting slippage. Getting flash loans. Paying off Compound loans. Countering slippage.

All in a day’s work…

Ethereum Pawn Stars: “$5.7m in Hard Assets? Best I Can Do is $2.3m”

defi saver

Saving DeFi Saver with Static Contract Analysis

By the Dedaub team

A little after midnight on Jan.5, we contacted the DeFi Saver team with news of a critical vulnerability we discovered in one of their deployed smart contracts and that we had just managed to (offline-)exploit. They responded immediately and we got on a channel with several DeFi Saver people within 5 minutes. Less than 20 hours later, client funds have been migrated to safety via a white-hack exploit.

There were some interesting elements in this vulnerability.

  • It affected major clients of the service. We initially demonstrated by exploiting one client for $1.2M. Another client had $2.2M exploitable and several more had smaller positions. There were over 200 clients that had deposited money in the vulnerable service within the past two months so the overall exploit potential was possibly even higher at different times.
  • The vulnerability was originally flagged by a sophisticated static analysis, not by human inspection. This is rare. Automated analyses typically yield low-value warnings in monetary terms. We have submitted (back in Nov.) a technical paper on the analysis techniques.
  • Beyond the static analysis, the vulnerability requires significant combination of dynamic information and careful orchestration. To exploit, one needs to find clients that have still-outstanding approvals (granted to the vulnerable contract) and an active balance for the same ERC-20 token. Then one needs to retrieve the loans that the victim holds on Compound (on different currencies) and pay them off (via a flash loan or otherwise). At that point, all the victim’s funds in the ERC-20 token are available for transfer to the attacker.
    For instance, the prototype victim had $2M in assets that could be acquired by paying off a $735K loan. The even larger victim had $3.7M in assets and a $1.5M outstanding loan.
  • Salvaging the users’ funds was highly elegant, by using precisely the flash loan and proxy authorization functionality of DeFi Saver.

Next we give some more technical detail on the above. For the service-level picture, there is a writeup by the DeFi Saver team.

Static Analysis | The Vulnerability

The vulnerable code could be found in two different DeFi Saver contracts. You can see the vulnerable function from one of the contracts in the snippet below:

defi saver
Vulnerable code, one instance

This is helper functionality — a small, deeply-buried cog in a much larger machine. The comments reveal the intent. This is a function that gets called upon receiving an Aave flash loan, repays a Compound loan on behalf of a user, lets a caller-defined proxy execute arbitrary code, and then repays the flash loan with the money received from the proxy. However, all of this is irrelevant. “Ignore comments, debug only code” as the saying goes for the security-sensitive. And this code allows a lot more than the comments say.

Static Analysis | Automated Analysis and Finding the Vulnerability

Our main job is developing program analysis technology (including contract-library.com and the decompiler behind it). In the past half year we have started deploying a new analysis architecture that combines static analysis and symbolic execution. (We call it “symbolic value-flow analysis” and we will soon have full technical papers about it.) We found the DeFi Saver vulnerability while testing a new client for this analysis: a precise detector of “unrestricted transferFrom proxy” functionality.

Basically, when our analysis looked at the above code, it only saw it like this:

defi saver
Analysis view of the vulnerable functionality. We can control all parameters of the transferFrom but the last

All the red-highlighted elements are completely caller-controllable. There are few to no restrictions on what _reservecBorrowTokenuserproxy, etc. can be. Basically, our analysis did not see this piece of code as an “Aave callback after a flash-loan operation” but as a general-purpose lever for doing transferFrom calls on behalf of any contract unfortunate enough to have authorized the vulnerable contract.

Small tangent: You may say, this doesn’t look like it needs a very sophisticated analysis. It is pretty clear that the caller can set all these variables and they end up in sensitive positions in the transferFrom call. Indeed, even a naive static analysis would flag this instance. What made our symbolic-value flow analysis useful was not that it captured this instance but that it avoided warning about others that were not vulnerable. The analysis gave us just 27 warnings about such vulnerabilities out of the 40 thousand most-recently deployed contracts! This is an incredibly precise analysis and most of these warnings were correct (although typically no tokens were at risk).

Back to the vulnerability: Finding a transferFrom statically does not imply an exploitable vulnerability. (If it did, we would have tens more vulnerabilities in our hands — the analysis issued 27 reports, as we mentioned, and most were correct.) Indeed, to perform the transferFrom there are three more dynamic requirements, based on the current state of the contracts. First, the vulnerable contract needs to have a current allowance to transfer the tokens of a victim. Second, the victim needs to have tokens. As it turns out, users of the DeFi Saver service were in exactly that state relative to the vulnerable contract. Our prototype victim shows both a balance and an allowance for the vulnerable contract:

defi saver

The victim has (at the moment of the snapshot) some $2M in underlying assets (in the cWBTC coin). So, since we can do an uncontrolled transferFrom we can get all of that, right? Well, not quite. The transferFrom on a Compound CToken goes through the Compound Comptroller service, which checks the outstanding loans over the underlying assets. If the transferFrom would make the account liquidity negative, it is not allowed. Our prototype victim indeed has outstanding Compound loans — this is in fact the reason they are in this state of balances and allowances.

defi saver
Etherscan Loans view of one of the vulnerable clients. $735K of oustanding loans, $2M in collateral.

The victim has $2M in assets and $735K in outstanding loans. So, could we just ask for less money and do the transferFrom? Actually, no. If you check the vulnerable code from before, the last parameter, cTokenBalance, of the transferFrom is not caller-controllable! It is instead the full balance of the victim.

This brings us to the third dynamic requirement for exploiting the vulnerability. In order to call this transferFrom and get the victim’s assets, we first need to pay off their loans!

This exploit is precisely what we demonstrated to the DeFi Saver team upon disclosing the vulnerability.

The Salvage Operation

Our prototype exploit ran on a private fork of the blockchain. For the real salvaging operation, we collaborated with the DeFi Saver team. Once we discussed the plan, they took the lead in the implementation.

The salvage operation was a thing of beauty, if we may say so. The DeFi Saver team performed it very professionally, with simpler code than our original exploit. The very same vulnerable functionality (the “cog”) was used after a flash loan in order to empty the victims’ accounts and transfer the vulnerable funds to new accounts that were then assigned to the original owner.
[Relevant transactions for the two victims with the largest holdings here and here.]

defi saver

Part of the elegance of the solution was that, in the end, the owners of the victim contracts held exactly the same positions as before, only now in two contracts instead of one. They had as much in underlying assets as before, and exactly as much in outstanding loans as before.

Wrapping Up

This was a very interesting vulnerability to us, although the root cause was simple (insufficient protection against hostile callers). It has many of the elements that we think are going to be central in future vulnerability detection work:

  • Combinations of static and dynamic analysis to find the vulnerable instance. Human eyes cannot be inspecting all code in great depth, even when the stakes are so high. A mundane piece of functionality can be security-critical. Static analysis is essential. Yet it’s not enough. The results will have to be cross-referenced with the current dynamic state to see if the contract is actually used in a vulnerable manner.
  • Future vulnerabilities may often follow the pattern of using existing pieces of code in unexpected ways. The more this happens, the more exploit generation will need to take current state into account. In this case, to exploit a contract, the attacker needs to pay off the contract’s loans. In the DeFi space, understanding of such state constraints will be crucial for future security work.

PS. If we might have saved you funds and/or you want to show support for our security efforts, we’ll be happy to receive donations at 0xACcE1553C83185a293e8B4865307aF8309af9407 .

Ethainter: A Smart Contract Security Analyzer for Composite Vulnerabilities

Smart contracts on permissionless blockchains are exposedto inherent security risks due to interactions with untrusted entities. Static analyzers are essential for identifying security risks and avoiding millions of dollars worth of damage.
We introduce Ethainter, a security analyzer checking information flow with data sanitization in smart contracts. Ethainter identifies composite attacks that involve an escalation of tainted information, through multiple transactions, leading to severe violations. The analysis scales to the entire blockchain, consisting of hundreds of thousands of unique smart contracts, deployed over millions of accounts. Ethainter is more precise than previous approaches, as we confirm by automatic exploit generation (e.g., destroying over 800 contracts on the Ropsten network) and by manual inspection, showing a very high precision of 82.5% valid warnings for end-to-end vulnerabilities. Ethainter’s balance of precision and completeness offers significant advantages over other tools such as Securify, Securify2, and teEther.

Read more

Precise Static Modeling of Ethereum ‘memory’

Static analysis of smart contracts as-deployed on the Ethereum blockchain has received much recent attention. However, high-precision analyses currently face significant challenges when dealing with the Ethereum VM (EVM) execution model. A major such challenge is the modeling of low-level, transient “memory” (as opposed to persistent, on-blockchain “storage”) that smart contracts employ. We offer an analysis that models EVM memory, recovering high-level concepts (e.g., arrays, buffers, call arguments) via deep modeling of the flow of values. Our analysis opens the door to Ethereum static analyses with drastically increased precision. One such analysis detects the extraction of ERC20 tokens by unauthorized users. For another practical vulnerability (redundant calls, possibly used as an attack vector), our memory modeling yields analysis precision of 89%, compared to 16% for a state-of-the-art tool without precise memory modeling. Additionally, precise memory modeling enables the static computation of a contract’s gas cost. This gas-cost analysis has recently been instrumental in the evaluation of the impact of the EIP-1884 repricing (in terms of gas costs) of EVM operations, leading to a reward and significant publicity from the Ethereum Foundation.

Extract from “Precise Static Modeling of Ethereum Memory” by SIFIS LAGOUVARDOS, University of Athens, Greece NEVILLE GRECH, University of Athens, Greece
ILIAS TSATIRIS, University of Athens, Greece, and YANNIS SMARAGDAKIS, University of Athens, Greece – Read more

Ethereum static analyses | INTRODUCTION

The Ethereum blockchain has enabled the management of digital assets via unsupervised au- tonomous agents called smart contracts. Smart contracts are among the computer programs with the most dire needs of high correctness assurance, due to their managing of high-value assets, as well as their public and immutable nature. Therefore, static analysis for Ethereum smart contracts has captured the attention of the research community in recent years [Albert et al. 2018; Feist et al. 2019; Grech et al. 2018; Mueller 2018; Tsankov et al. 2018]. The first generation of Ethereum

Most static analysis tools for Ethereum operate at the binary level of contracts, as-deployed on the blockchain. This ensures that the analysis operates on all contracts in existence, regardless of whether source code is available. (Source code is not always present but not negligible either: it is available for under 25% of deployed contracts, yet for more than half of the high-value contracts.) Furthermore, operating on low-level binaries offers source-language and language- version independence, completeness in the presence of inline assembly (which is common), and, perhaps most importantly, uniform treatment of complex language features: as analysis authors often argue, innocuous-looking source-level expressions can incur looping behavior or implicit overflow [Grech et al. 2018].

Operating at the binary level necessitates contract decompilation [eth [n. d.]; Brent et al. 2018; Grech et al. 2019; Kolinko 2018] in order to recover high-level information from the very-low-level Ethereum VM (EVM) bytecode format. This decompilation effort is non-trivial: the EVM is a stack machine with no structured information (no types or functions). Control-flow (i.e., calls, returns, conditionals) is implemented as forward jumps over run-time values obtained from the stack, hence even producing a control-flow graph requires a deep static analysis [Grech et al. 2018]. Despite these challenges, sophisticated decompilers mostly succeed in recovering high-level control flow and have been the basis of the most successful Ethereum static analyses.

Despite the relative success of Ethereum decompilers, some low-level aspects of the deployed contract remain unaddressed. The most major such aspect is the precise modeling of transient EVM memory. “Memory” in the context of the EVM (and of this paper) refers to a data store that is transient and transaction-private, used by the runtime primarily to store values whose size is statically unknown—e.g., arrays, strings, or encoded buffers. (Memory is to be contrasted with storage: the on-blockchain persistent value store of an Ethereum smart contract.)

Memory is crucial in the execution of smart contracts. It is used for many cryptographic hash operations (SHA3 is a native instruction, operating over arbitrary-length buffers), for the arguments of external calls (i.e., calls to other contracts), for logging operations, and for returns from a public function. Crucially, any complex mapping data structure (i.e., the most common Ethereum data structures) stores and retrieves information (from storage) after hashing keys in memory buffers.

Modeling EVM memory is unlike other memory management settings. There is no speed premium for locality, yet the compiler attempts to keep memory tightly packed because the contract’s execution cost (in terms of Ethereum gas consumed for memory-related operations) depends on the highest initialized memory address. For the vast majority of smart contracts, written in the Solidity language, the compiler uses a simple strategy: every allocated buffer is written after the end of the last one, updating a “free-memory” pointer. This memory management policy is a low-level aspect, introduced entirely during the compilation process. At the source-code level, memory is implicit: the values that end up stored in memory merely have dynamic-length array or string types.
The goal of a precise modeling of EVM memory is to recover such source-level information (expressed in terms of arrays or strings) from the low-level address manipulation code that the compiler produce”

Ethereum static analyses

Rising Gas Prices Are Threatening Our Security (No, It’s Not the Saudi Attack)

Mr. Out of gas exception

EIP 1884 is set to be implemented into the upcoming Ethereum ‘Istanbul’ hard fork. It:

  • increases the cost of opcode SLOAD from 200 to 800 gas
  • increases the cost of BALANCE and EXTCODEHASH from 400 to 700 gas
  • adds a new opcode SELFBALANCE with cost 5.

Due to a fixed gas limit (2300) imposed by the .send(..) and .transfer(..) Solidity functions, fallback functions that use these opcodes may now start to fail due to an out-of-gas exception.

Analysis by Contract-library.com team

Contract-library.com, an automated security service, performs sophisticated static analysis on all deployed smart contracts (over 20 million of them). As static analysis is a technique that takes into account all (or almost all) possible program executions, it is expected to return the most comprehensive list of smart contracts affected by security vulnerabilities.

On Friday August 16th Martin Holst Swende of the Ethereum foundation asked a question on the ETHSecurity channel on telegram about how to go about finding smart contracts whose fallback function may fail due to EIP-1884. Since contract-library.com already had gas consumption analysis built into its core static analyses, we reached out on the same day with a list of contracts (continuously updated) that may be affected.

Over the subsequent days, also with the input of Martin Holst Swende, the gas cost analysis computation was updated and improved, over several iterations. The analysis currently reveals over 800 contracts that are highly likely to fail if called with 2300 gas (whereas they would succeed prior to EIP-1884). A subsequent, more general, analysis was also developed. This would be the most comprehensive list of possibly affected smart contracts for this particular issue, but also contains many false positives. This more general “may” analysis reveals that 7000 currently deployed smart contracts may fail under some execution paths with 2300 gas.

In addition, since our analysis is fully automated, we have also performed experiments to see whether these issues can be simply avoided by repricing the LOG0, LOG1 ... opcodes. Note that these opcodes tend to occur quite often in fallback functions. By halving the Glog and Glogtopic gas costs (refer to the yellow paper), the number of flagged contracts is reduced by approximately half!

Although repricing opcodes can always break contracts, the EVM should be able to evolve too. Clearly, a decent number of contracts will be broken due to this change, so care must be taken to lessen the impact on the overall ecosystem. In this case, we recommend repricing the LOGx opcodes, which seem to be mispriced anyway. This way, there will be fewer contracts affected.

A more interesting, but perhaps equally serious side-effect of EIP-1884 and EIP-2200 combined is that it lowers the cost of performing an unbounded mass iteration attack, which is currently quite high. This attack is described in MadMax. In summary, this is an attack carried out by an unauthorized user, to increase the size of an array or data structure, which is iterated upon by any other user, rendering the functionality inaccessible by increasing gas cost beyond the block gas limit. The combined effect of EIP-1884 and EIP-2200 make this kind of attack around 7 times cheaper on average, rendering it much more feasible. This attack requires 2 SSTOREs per array element that is added by the attacker. This array is then iterated upon by the victim, requiring an additional SLOAD. For a list of contracts that may be susceptible to unbounded iteration, we have you covered. The list contains approximately 15k contracts.

Which contracts will be affected? What about the one I’m currently developing?

If your contract does not have fallbacks which may fail with 2300 gas) or is not susceptible to unbounded iteration, then you’re most probably fine. If it is, you may still be ok, but further investigation is necessary. If you would like to see whether the contract you are developing may be affected, deploy it to one of the Ethereum testnets and check your results at contract-library.com.

Below are sample contracts with a non-zero Ether balance that are affected by the repricing of SLOAD operations, so that their fallback is no longer runnable under the send/transfer gas allowance of 2300.

KyberNetwork

function() public payable {
        require(reserveType[msg.sender] != ReserveType.NONE);
        EtherReceival(msg.sender, msg.value);
    }

NEXXO crowdsale :

modifier onlyICO() {
        require(now >= icoStartDate && now < icoEndDate, "CrowdSale is not running");
        _;
    }
    function () public payable onlyICO{
        require(!stopped, "CrowdSale is stopping");
    }

For NEXXO, it checks three slots, icoStartDateicoEndDate and stopped, totalling 2400 with new gas rules.

Crowd Machine Compute Token crowdsale:

modifier onlyIfRunning
  {
    require(running);
    _;
  }
  function () public onlyIfRunning payable {
    require(isApproved(msg.sender));
    LogEthReceived(msg.sender, msg.value);
  }

Important reminder: The crowdsales above do not inherently break, it just means that callers need to add some more gas than 2300 to partake in the ICO contracts.

CappedVault

  • Fallback function:
function () public payable {
        require(total() + msg.value <= limit);
    }

Unknown Harvester with 5 ETH

require((msg.value >= stor___function_selector__));
  emit 0xafd096c64445a293507447c2ecb78f03b4f5459ec28b8e9bfe113c35b75d624a(address(msg.sender), msg.value, 0x447);
  exit();

No source code available. Note that this contract would work if LOGx gas cost is reduced.

Aragon’s DepositableDelegateProxy

function isDepositable() public view returns (bool) {
        return DEPOSITABLE_POSITION.getStorageBool();
    }
    event ProxyDeposit(address sender, uint256 value);
    function () external payable {
        // send / transfer
        if (gasleft() < FWD_GAS_LIMIT) {
            require(msg.value > 0 && msg.data.length == 0);
            require(isDepositable());
            emit ProxyDeposit(msg.sender, msg.value);
        } else { // all calls except for send or transfer
            address target = implementation();
            delegatedFwd(target, msg.data);
        }
    }
}

Note that this contract would work if LOGx gas cost is reduced. According to the contract-library analysis, the fallback function may fail due to anything between 2308 and 2438 gas. Issue at Aragon

How does the static analysis on contract-library.com work?

Static program analysis is a technique that considers all of a program’s behaviors without having to execute the program. Static analysis is generally thought to be expensive, but over the years we have developed techniques to counter this. Firstly, we developed new techniques in the area of “declarative program analysis”, which simplifies analysis implementations. Secondly, we have applied our analyses at scale, which makes them worth the effort. Contract-library’s internal analysis framework decompiles all smart contracts on the main Ethereum network and most popular testnets to an IR representation, amenable to analysis. The decompilation framework is described in a 2019 research paper. Following this analysis, many “client analyses”, are applied. These analyses all benefit from a rich suite of analysis primitives, such as gas cost analysis (similar to worst-case execution analysis), memory contents analysis, etc. These are instantiated and customized in each client analysis. Finally, we encode all our analyses, decompilers, etc. in a declarative language, and automatically synthesize a fast C++ implementation using Soufflé.

For illustration, the FALLBACK_WILL_FAIL static analysis is encoded in the following simplified datalog spec, deployed on contract-library.com:

% Restrict the edges that form the possible paths to those in fallback functions
FallbackFunctionBlockEdge(from, to) :-
   GlobalBlockEdge(from, to), 
   InFunction(from, f), FallbackFunction(f),
   InFunction(to, g), FallbackFunction(g).
% Analyze the fallback function paths with the
% conventional gas semantics, taking shortest paths
GasCostAnalysis = new CostAnalysis(
  Block_Gas, FallbackFunctionBlockEdge, 2300, min
).
% Analyze the fallback function paths with the
% updated gas semantics, taking shortest paths
EIP1884GasCostAnalysis = new CostAnalysis(
  EIP1884Block_Gas, FallbackFunctionBlockEdge, 2300, min
).
FallbackWillFailAnyway(n - 2300) :-
   GasCostAnalysis(*, n), n > 2300.
% fallback will fail with n - m additional gas
EIP1884FallbackWillFail(n - m) :-
   EIP1884GasCostAnalysis(block, n), n > 2300,
   GasCostAnalysis(block, m),
   !FallbackWillFailAnyway(*).

The analysis performs a gas cost computation over all possible paths in the fallback functions, using the gas cost semantics of both PRE and POST EIP-1884. In cases where there is a path that can complete in the former semantics but not the latter, we flag the smart contract.