CVE-2017-12842: Trusted Merkle Tree Depth for Safe Tx Inclusion Proofs Without a Soft Fork

Recently a weakness in the merkle tree algorithm of Bitcoin, called Leaf-Node weakness was discussed during a responsible discolsure. It’s been known for a while that the Merkle tree algorithm fails to distinguish between inner nodes and a 64 byte transaction, as inner nodes of the merkle tree are constructed by concatenation of two 32 byte SHA256'd hashes.

The resulting variable is hashed to get the next node in the tree, a 64 byte transaction maybe carefully constructed to make a node think it’s actually an inner node and that it hasn’t reached a transaction yet, that would make it possible for the attacker to make a “fake” transaction and have it linked as a leaf node to the fake inner node.

This exploit can be used to attack SPV nodes, these are nodes that verify the existence of a payment without downloading the entire block through the Merkle tree. An attacker can provide a Merkle tree that has a fake transaction as a proof of payment to a transaction he didn’t actually pay.

Several solutions were proposed to invalidate this exploit, one solution, proposed by Sergio Demian Lerner from RSK, is to just refuse any 64 byte transaction and invalidate blocks that have them.

Another, as pointed by Peter Todd, is to include the depth of the Merkle tree in the block headers, as clients will be able to check if they have arrived to a leaf node by comparing it to the depth.

It occured to me that if the depth of the merkle tree is known, this vulnerability can be trivially avoided by simply comparing the length of the merkle path to that known depth. For pruned nodes, if the depth is saved prior to pruning the block contents itself, this would allow for completely safe verification of tx inclusion proofs, without a soft-fork; storing this data in the block header database would be a simple thing to do.

The first solution requires a soft fork while the second solution, suggested by Peter Todd, does not. He continued on with some details on the Brute Force Cost assuming a valid transaction:

Consider the following 64 byte transaction:

    tx = 
CTransaction([CTxIn(COutPoint(b'\xaa'*32,0xbbbbbbbb),nSequence=0xcccccccc)],[CTxOut(2**44-1,CScript([b'\xdd\xdd\xdd']))],nLockTime=2**31-1)

If we serialize it, the last 32 bytes are:

    aaaaaaaaaa bbbbbbbb 00 cccccccc 01 ffffffffff0f0000 04 03dddddd ffffff7f
    ↳prevhash↲ ↳ n    ↲    ↳ seq  ↲    ↳ nValue       ↲    ↳ pubk ↲ ↳lockt ↲
                        ↳ sig_len   ↳num_txouts         ↳scriptPubKey_len


Of those fields, we have free choice of the following bits:

prevhash:  40 - prev tx fully brute-forcable, as tx can be created to match
prev_n:    16 - can create a tx with up to about 2^16 outputs
seq:       32 - fully brute-forcable in nVersion=1 txs
nValue:    44 - assuming attacker has access to 175,921 BTC, worth ~1.3 billion 
right now
pubk:      32 - fully brute-forcable if willing to lose BTC spent; all 
scriptPubKeys are valid
nLockTime: 31 - valid time-based nLockTime
Total: 195 bits free choice → 61 bits need to be brute-forced

Sergio Demian highlighted that it’s team already reported this scurity breach as CVE-2017-12842, and urged the list participants to work on a clean design for this important issue.

He suggested to use in the version bits 4 bits indicating the tree depth (-1), as a soft-fork:

00=1 depth,
0F = 16 depth (maximum 64K transactions). Very clean.

For a more in depth explanation of the Leaf-Node weakness we recommend you the following article published by BITSLOG: Leaf-Node weakness in Bitcoin Merkle Tree Design

Support us and the authors of this article by donating to the following address:

37Hqz14hiqreJuWWuf3pyxL9LhovLmhvQb

Comments powered by Talkyard.