What if I told you that over one-third of recently-deployed Ethereum smart contracts consist mostly of unusable junk?
We recently identified a bug in the solidity compiler, resulting in the inclusion of dead code in the deployed bytecode of contracts.
What we did not know (and did not expect!) was how pervasive the bug was, affecting (almost certainly) many tens of thousands of contracts and the majority of their deployed bytecode.
We discovered the bug when evaluating Gigahorse, our open source binary-lifter underpinning the decompiler and analyzer at https://library.dedaub.com/, οn recently deployed contracts.
We originally reported this bug back in November. Following the Solidity team’s acknowledgement of the issue, we performed experiments at scale to investigate the impact.
We had no idea…
Issue description
The bug manifests itself when library methods are only called by a contract’s constructor. The EVM bytecode produced by these methods should only be part of the contract’s creation bytecode (initcode). However, the low-level code implementing these “dead” library methods makes it to the contract’s runtime bytecode, increasing its size for no benefit.
Simple real world example
Looking at an EIP1967 proxy contract deployed at Ethereum address 0x5bbb007b32828f4550cd2a3d16c8dbe3c555ef90. The decompilation output of Dedaub’s contract library shows what the on chain code really does:
// Decompiled by library.dedaub.com
// 2023.01.16 02:06 UTC
// Data structures and variables inferred from the use of storage instructions
address _fallback; // STORAGE[0x360894a13ba1a3210667c828492db98dca3e2076cc3735a920a3ca505d382bbc] bytes 0 to 19
function () public payable {
v0 = _fallback.delegatecall(MEM[0 len msg.data.length], MEM[0 len 0]).gas(msg.gas);
require(v0); // checks call status, propagates error data on error
return MEM[0 len (RETURNDATASIZE())];
}
function implementation() public nonPayable {
return _fallback;
}
// Note: The function selector is not present in the original solidity code.
// However, we display it for the sake of completeness.
function __function_selector__(bytes4 function_selector) public payable {
MEM[64] = 128;
if (msg.data.length < 4) {
if (!msg.data.length) {
();
}
} else if (0x5c60da1b == function_selector >> 224) {
implementation();
}
v0 = _fallback.delegatecall(MEM[0 len msg.data.length], MEM[0 len 0]).gas(msg.gas);
require(v0); // checks call status, propagates error data on error
return MEM[0 len (RETURNDATASIZE())];
}
Our lifter reports that 30 of the 46 low-level code blocks can never be executed when transacting with the contract. By closely inspecting the results we realized that the call to Address.functionDelegateCall() at line 260 (of the source code) results in the code of library methods declared in lines 170, 180, 195. 36, 231 being part of the deployed bytecode although they are never used in the logic of the runtime code.
We can confirm the issue by hand-tweaking the source code, declaring the library methods inside the compiled contract instead of a library. As a result the deployed bytecode size drops from 808 to 298 bytes.
Impact on deployed contracts
We assembled a total of 10,000 unique bytecodes of contracts compiled using solc 0.8.17 deployed on the Ethereum mainnet. Analyzing them using gigahorse we can tell that at least 35% of the above have some dead code and 33% of them have dead code taking up the majority of their runtime bytecode! These results are dominated by manifold NFT proxies but other proxy contracts also face the same issue.
The most extreme example we found was a contract using the diamond proxy pattern. Its deployed bytecode is 4908 bytes, but removing the dead libraries results in a bytecode of just 302 bytes!
On large contracts, thankfully, the impact of this issue is usually negligible. But most deployed contracts by number are small contracts, and they are heavily burdened by dead code.
And this was a surprise, to say the least.