CVE-2017-12842: Trusted Merkle Tree Depth for Safe Tx Inclusion Proofs Without a Soft Fork
Recently a weakness in the merkle tree algorithm of Bitcoin, called Leaf-Node
weakness was discussed during a responsible discolsure. It’s been known for a
while that the Merkle tree algorithm fails to distinguish between inner nodes
64 byte transaction, as inner nodes of the merkle tree are constructed
by concatenation of two
32 byte SHA256'd hashes.
The resulting variable is hashed to get the next node in the tree, a
transaction maybe carefully constructed to make a node think it’s actually an
inner node and that it hasn’t reached a transaction yet, that would make it
possible for the attacker to make a “fake” transaction and have it linked
as a leaf node to the fake inner node.
This exploit can be used to attack SPV nodes, these are nodes that verify the existence of a payment without downloading the entire block through the Merkle tree. An attacker can provide a Merkle tree that has a fake transaction as a proof of payment to a transaction he didn’t actually pay.
Several solutions were proposed to invalidate this exploit, one solution,
proposed by Sergio Demian Lerner from RSK, is to just refuse any
transaction and invalidate blocks that have them.
Another, as pointed by Peter Todd, is to include the depth of the Merkle tree in the block headers, as clients will be able to check if they have arrived to a leaf node by comparing it to the depth.
It occured to me that if the depth of the merkle tree is known, this vulnerability can be trivially avoided by simply comparing the length of the merkle path to that known depth. For pruned nodes, if the depth is saved prior to pruning the block contents itself, this would allow for completely safe verification of tx inclusion proofs, without a soft-fork; storing this data in the block header database would be a simple thing to do.
The first solution requires a soft fork while the second solution, suggested by Peter Todd, does not. He continued on with some details on the Brute Force Cost assuming a valid transaction:
Consider the following 64 byte transaction: tx = CTransaction([CTxIn(COutPoint(b'\xaa'*32,0xbbbbbbbb),nSequence=0xcccccccc)],[CTxOut(2**44-1,CScript([b'\xdd\xdd\xdd']))],nLockTime=2**31-1) If we serialize it, the last 32 bytes are: aaaaaaaaaa bbbbbbbb 00 cccccccc 01 ffffffffff0f0000 04 03dddddd ffffff7f ↳prevhash↲ ↳ n ↲ ↳ seq ↲ ↳ nValue ↲ ↳ pubk ↲ ↳lockt ↲ ↳ sig_len ↳num_txouts ↳scriptPubKey_len Of those fields, we have free choice of the following bits: prevhash: 40 - prev tx fully brute-forcable, as tx can be created to match prev_n: 16 - can create a tx with up to about 2^16 outputs seq: 32 - fully brute-forcable in nVersion=1 txs nValue: 44 - assuming attacker has access to 175,921 BTC, worth ~1.3 billion right now pubk: 32 - fully brute-forcable if willing to lose BTC spent; all scriptPubKeys are valid nLockTime: 31 - valid time-based nLockTime Total: 195 bits free choice → 61 bits need to be brute-forced
Sergio Demian highlighted that it’s team already reported this scurity breach as CVE-2017-12842, and urged the list participants to work on a clean design for this important issue.
He suggested to use in the version bits 4 bits indicating the tree depth (-1), as a soft-fork:
00=1 depth, 0F = 16 depth (maximum 64K transactions). Very clean.
For a more in depth explanation of the Leaf-Node weakness we recommend you the following article published by BITSLOG: Leaf-Node weakness in Bitcoin Merkle Tree Design
Support us and the authors of this article by donating to the following address:37Hqz14hiqreJuWWuf3pyxL9LhovLmhvQb