Make up a blockchain: serialize the data structure to see how digital currency transmits data

Students who have experience in blockchain development know that to develop smart contract applications, you first need to synchronize the Ethereum main network through geth, which means you need to download a lot of data from other nodes. In addition, when using blockchain technology, such as payment and receiving digital currency, the "wallet" application needs to send a series of data to each other. When we need to send and receive data through the network, we need to serialize the data.

We have learned a lot about data structures, such as finite groups, elliptic curves, public keys, private keys, etc. relevant data needs to be transmitted through the network during application, so relevant data structures need to be serialized. For a point on an elliptic curve, a data format called uncompressed sec (standard for effective cryptography) is usually adopted during serialization. The steps are as follows:
1. Add 0x04 as the flag at the beginning
2. The x coordinate value of the placement point, which has 32 bytes
3. The y coordinate value of the placement point, which has 32 bytes.

The corresponding serialization code is as follows:

    def sec(self):
        '''
        sec Compression writes 04 at the beginning,Then followed by 32 bytes x Coordinate value,Followed by 32 bytes y Coordinate value
        '''
        return b'\x04' + self.x.num.to_bytes(32, 'big') + self.y.num.to_bytes(32, 'big')

Since there is non compression, there is the corresponding compression form SEC. Since the elliptic curve is symmetrical about the x axis, given an x coordinate, it corresponds to at most two Y coordinates, which are opposite to each other. However, since the points we act on the elliptic curve are all elements in a finite group, for a group containing P elements (note that P is a prime number), if y is the element in the group, then "- Y" is p-y, if y is an even number, then p-y is an odd number, on the contrary, if y is an odd number, then p-y is an even number. Using this feature, we can compress the content corresponding to the Y coordinate.

Therefore, given a point (x,y) on the elliptic curve, the steps of generating the compressed form SEC are as follows:
1. If y is an odd number, it starts with 0x03; if it is an even number, it starts with 0x02
2. Add 32 byte x coordinate value
So the corresponding implementation code is:

 def sec(self, compressed = True):
        if compressed:
            if self.y.num % 2 == 0:
                return b'\x02' + self.x.num.to_bytes(32, 'big')
            else:
                return b'\x03' + self.x.num.to_bytes(32, 'big')
        '''
        sec Compression writes 04 at the beginning,Then followed by 32 bytes x Coordinate value,Followed by 32 bytes y Coordinate value
        '''
        return b'\x04' + self.x.num.to_bytes(32, 'big') + self.y.num.to_bytes(32, 'big')

Compared with the uncompressed form, the serialization of the compressed form saves 32 bytes. Since y is saved, the receiver needs to restore it after receiving the data. Remember that the format of the elliptic curve is y ^ 2 = x^3 + a * x + b. We know the value of X, which means we know the value of Y square. Now we need to calculate the value of Y.

Suppose W and v are the elements of a finite group, and w^2 = v, where there are p group elements in total. Now we know v, we need to calculate W. If P% 4 = 3, then we have a good algorithm to calculate w quickly. Since P% 4 = 3, there is (p+1)% 4 = 0. That is, (p+1) can divide 4, that is, (p+1) / 4 is an integer. According to Fermat's small theorem w ^ (p-1)% P = 1, so there is w ^ 2 = w ^ 2 * 1 = w ^ 2 * (W ^ (p-1)) = w ^ (p+1). Since P is an odd number, so (p+1) can divide by 2, that is, (p+1)/2 is an integer, so there is w = w ^ ((p+1)/2).

We mentioned above that (p+1)/4 is an integer, so there is w = w ^ ((P + 1) / 2) = w ^ (2 * (P + 1) / 2) ^ ((p+1)/4) = V ^ ((p+1)/4). So if w ^ 2 = v and P% 4 = 3, then w = v ^ ((p+1)/4). For the elliptic curve used by bitcoin, the p value can just meet P% 4 = 3. In this way, we will convert the square operation to the exponential operation. Therefore, the implementation code of the square operation of finite group elements is:

P = 2 ** 256 - 2 ** 32 - 977
class BitcoinFieldElement(FieldElemet):  #S256Field
    def __init__(self, num, prime = None):
        super().__init__(num, P)
    def __repr__(self):
        return "{:x}".format(self.num).zfill(64)  # Fill in 64 numbers

    def sqrt(self):
        return self ** ((P+1) // 4)

Let's see how to parse SEC format data in compressed form. The implementation method is as follows:

    def parse(self, sec_bin): #Parsing sec compressed data
        if sec_bin[0] == 4: #Uncompressed sec format
            x = int.from_bytes(sec_bin[1:33], 'big')
            y = int.from_bytes(sec_bin[33:65], 'big')
            return BitcoinEllipticPoint(x = x, y = y)

        #In the compressed sec format, first obtain x, then calculate the square of y, and finally use the square algorithm to obtain the value of y
        is_even = sec_bin[0] == 2
        x = BitcoinEllipticPoint(int.from_bytes(sec_bin[1:], 'big'))
        #y ^ 2 = x ^3 + 7
        alpha = x ** 3 + BitcoinEllipticPoint(B)
        beta = alpha.sqrt()
        '''
        In the real field,y Will correspond to one positive and one negative value, which is the same in a finite field, if y and(p-y)Positive and negative to each other.
        That is, in a finite group, if y Satisfy the elliptic equation, then p-y It also satisfies the elliptic equation. Since bitcoin corresponds to the number of elements in a finite group P Is prime,
        So if y It's even, so p-y It's an odd number, if y It's odd, so p-y It's an even number. So it's compressing SEC In the format, if the start flag bit is 0 x02,
        So if you calculate y If it's an odd number, use it P-y
        '''
        if beta.num % 2 == 0: #Y is even and P-y is odd
            even_beta = beta
            odd_beta = BitcoinFieldElement(P - beta.num)
        else: #Y is odd and P-y is even
            even_beta = BitcoinFieldElement(P - beta.num)
            odd_beta = beta

        if is_even:
            return BitcoinEllipticPoint(x, even_beta)
        else:
            return BitcoinEllipticPoint(x, odd_beta)

Next, we give several private keys e, obtain the corresponding public key through e * G, and then look at the SEC compressed format data corresponding to the public key. The code is as follows:

'''
Given the following private key, find the compression of its public key sec number:
5001, 2019 ^ 5, 0xdeadbeef54321
'''
priv = PrivateKey(5001)
print(priv.point.sec(True))

priv = PrivateKey(2019 ** 5)
print(priv.point.sec(True))

priv = PrivateKey(0xdeadbeef54321)
print(priv.point.sec(True))

After running the above code, the results are as follows:

0357a4f368868a8a6d572991e484e664810ff14c05c0fa023275251151fe0e53d1
02933ec2d2b111b92737ec12f1c5d20f3233a0ad21cd8b36d0bca7a0cfa5cb8701
0296be5b1292f6c856b3c5654e886fc13511462059089cdf9c479623bfcbe77690

In addition, the structure that needs to be serialized is the signature. It has two values to deal with, namely, s and r. these two values are not logically related, so they cannot be compressed as above. In the blockchain, the format used to serialize signatures is called DER (Distinguished Encoding signatures). DER format is as follows:
1, starting with 0x30 bytes
2. Add the sum of the lengths of s and r, usually (0x44 and 0x45).
3. Add 0x02 as separator
4. Add the length of R (one byte) to convert r into the form of large end byte. If the byte at the beginning of it > = 0x80, add a 0x00 first, and then add the content of R,
5. Add the length of S (1 byte) and convert s to big end format. If its first byte > = 0x80, add a 0x00 first, followed by the content of S.
Since R is a 256 bit value, it can be up to 32 bytes. If its first byte > = 0x80, we need to add 0x00 in front of it, so r can be up to 33 bytes. S can also be inferred. Let's look at the code implementation:

class Signature:
    def __init__(self, r, s):
        self.r = r
        self.s = s

    def __repr__(self):
        return f"Signature({hex(self.r)}, {hex(self.s)})"

    def der(self):
        #Now convert r to big end format
        rbin = self.r.to_bytes(32, byteorder='big')
        #Remove the beginning 0x00 content
        rbin = rbin.lstrip(b'\0x00')
        if rbin[0] & 0x80: # If the header byte > = 0x80, add 0x00 before the header
            rbin = '\0x00' + rbin
        #Start with 0x2, then the length of r, and finally the content of r
        result = bytes([2, len(rbin)]) + rbin

        sbin = self.s.to_bytes(32, byteorder='big')
        sbin.lstrip(b'\0x00')
        if sbin[0] & 0x80:
            sbin = b'0x00' + sbin
        result += bytes([2, len(sbin)]) + sbin
        '''
        With 0 x30 First, followed by r and s The total length of, and then r and s Coding of
        '''
        print(f"der total bytes:{len(result)}")
        return bytes([0x30, len(result)]) + result

Let's run the above code:

sig = Signature(0x37206a0610995c58074999cb9767b87af4c4978db68c06e8e6e81d282047a7c6,
                0x8ca63759c1157ebeaec0d03cecca119fc9a75bf8e6d0fa65c841c8e2738cdaec)
print(f"signature der format:{sig.der().hex()}")

After running the code, the results are as follows:

signature der format:3048022037206a0610995c58074999cb9767b87af4c4978db68c06e8e6e81d282047a7c60224307830308ca63759c1157ebeaec0d03cecca119fc9a75bf8e6d0fa65c841c8e2738cdaec

At present, all codes are in binary format. One problem with this format is that it is not conducive to people's reading and understanding. Therefore, bitcoin later used base58 to encode binary data again. The reason why base64 is not used is that the latter limited characters are easy to be confused, such as number 0 and letter O, lowercase l and uppercase I. therefore, using base58 can avoid these problems. Let's take a look at the implementation of base58 coding:

'''
base58 The encoded characters are not lowercase l And capitalized I,And capital letters O
'''
BASE58_ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'
def encode_base58(s):
    '''
    First count the number of zeros in the data to be encoded
    '''
    count = 0
    for c in s:
        if c == 0:
            count += 1
        else:
            break

    num = int.from_bytes(s, 'big')
    prefix = '1' * count
    result = ''
    while num > 0:
        num, mod = divmod(num, 58)
        result = BASE58_ALPHABET[mod] + result

    return prefix + result

Let's try the above code:

'''
test base58 code
'''
encode = encode_base58(bytes.fromhex('7c076ff316692a3d7eb3c3bb0f8b1488cf72e1afcd929e29307032997a838a3d'))
print(f"base58 encode:{encode}")

The code content obtained after the above code is run is:

9MA8fRQrT4u8Zj8ZRd6MAiiyaxb2Y1CMpvVkHQu5hVM6

Base58 has many defects and is rarely used in the market, but some of its specific features can be used in blockchain or bitcoin. In Ethereum or bitcoin applications, digital currency needs to have a corresponding receiving address during transfer, and base58 is used for the encoding of this address. Let's see the specific process:
1. If the address comes from the main network, it starts with 0x00; if it comes from the test network, it starts with 0x6f.
2. Get the SEC code (compressed or uncompressed), hash it with sha256, and then hash the result with ripemd160 again. This process is called hash160 operation.
3. Connect the results of the first and second steps back and forth
4. Hash the result of step 3 and take the first 4 bytes
5. Combine the results obtained in steps 3 and 4 to encode base58
The result obtained in step 5 is also called checksum. Suppose we have the result of step 3, and then see how to implement steps 4 and 5. The code is as follows:

'''
Steps 4 and 5 of address coding
'''
def encode_base58_checksum(b):
    return encode_base58(b + hashlib.sha256(b).digest()[:4])

The hash160 operation mentioned above can be implemented by directly calling the hashlib interface:

def hash160(s):
    return hashlib.new('ripemd160', hashlib.sha256(s).digest()).digest()
'''
In bitcoin applications,Hash 256 is executed twice in a row to enhance security
'''
def hash256(s):
    return hashlib.sha256(hashlib.sha256(s).digest()).digest()

We apply these steps exactly to the points of the elliptic curve:

class BitcoinEllipticPoint(EllipticPoint):
    ....
    def hash160(self, compressed=True):
        return hash160(self.sec(compressed))

    def address(self, compressed=True, testnet=False):
        h160 = self.hash160(compressed)
        if testnet:
            prefix = b'\x6f'
        else:
            prefix = b'\x00'
        return encode_base58_checksum(prefix + h160)

Let's run the logic implemented above to see:

'''
Address coding of test elliptic curve points
'''
priv = PrivateKey(5002)
print(f"address for uncompressed SEC on testnet:{priv.point.address(compressed = False, testnet = True)}")

After running the code, the results are as follows:

address for uncompressed SEC on testnet:mmTPbXQFxboEtNRkwfh6K51jvdtHLxGeMA

Another data structure that needs to be serialized is the private key. This thing is very sensitive. Once your private key is lost, all your monetary assets will be stolen by others. Therefore, we usually don't transmit the private key on the network, but we need to do so in rare cases. Therefore, we also need to serialize the private key. Its corresponding format is WIF(Wallet Import Format). Its serialization steps are as follows:
1. If it is the private key of the main network, it starts with 0x80; if it is the test network, it starts with 0xef
2. Convert the private key into a large byte array for encoding
3. If the public key uses compressed SEC format, add 0x01 at the end
4. Combine the results of steps 1, 2 and 3
5. Carry out sha256 (i.e. two consecutive rounds of 256 hashing) operation in the fourth step, and take the first four bytes of the result
6. Combine steps 4 and 5 and encode with base58
Let's look at the corresponding code implementation:

class PrivateKey:
    ....
        def wif(self, compressed = True, testnet = False):
        #First convert the secret key to big endian byte order
        secret_bytes = self.secret.to_bytes(32, 'big')
        if testnet: #Add prefix according to the main network or test network
            prefix = b'\xef'
        else:
            prefix = b'\x80'

        if compressed: # Add suffix according to compression form
            suffix = b'\x01'
        else:
            suffix = b'\01'

        return encode_base58_checksum(prefix + secret_bytes + suffix)

Let's experiment with the serialization function of the secret key:

''
Verify the serialization function of the secret key
'''
priv = PrivateKey(5003)
print(f"secret key wif:{priv.wif(True, True)}")

After the above code is run, the result is:

secret key wif:cMahea7zqjxrtgAbB7LSGbcQUr1uX1ojuat9jZodMN8rFTv2sfUK

The code of bitcoin application is very chaotic. Nakamoto confuses the big endian byte order and the small endian byte order, which is the privilege of the founder. Just like Lu Xun's misspelling becomes a "fake word", so we also need to understand the implementation of small endian coding:

'''
We use a paragraph of text to 256 hash to form a secret key, and then set the address format of the secret key
'''
secret_text = "this is my secret key"
secret = little_endian_to_int(hash256(secret_text.encode('utf-8')))
priv = PrivateKey(secret)
print(f"secret key address:{priv.point.address(testnet=True)}")

After the above code is run, the result is:

secret key address:mtg8uRgVQHjPiX15mmt7FVZeqSUbkUn3h3

We have a lot of content in this section. Fortunately, the logic is not complicated, mainly cumbersome. Code download path: https://github.com/wycl16514/blockchain-serialization.git

Keywords: Blockchain data structure Digital Currency

Added by joebloggs1987 on Sun, 20 Feb 2022 08:05:13 +0200