IEEE754 Standard Floating Point Storage Format

Basic storage format (from high to low): Sign + Exponent + Fraction

Sign: the sign bit

Exponent: the biased exponent

Fraction: the significand (mantissa) bits

Analysis of 32-bit Floating Point Storage Format

Sign: 1 bit (bit 31)

Exponent: 8 bits (bits 30 to 23)

Fraction: 23 bits (bits 22 to 0)
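
As a minimal sketch of the layout above, the three fields can be pulled out of a 32-bit pattern with shifts and masks. The function name split_float32 is only an illustration and is not taken from the linked C++ code.

```python
def split_float32(word):
    """Split a 32-bit pattern into its (sign, exponent, fraction) bit fields."""
    sign = (word >> 31) & 0x1        # bit 31
    exponent = (word >> 23) & 0xFF   # bits 30 to 23
    fraction = word & 0x7FFFFF       # bits 22 to 0
    return sign, exponent, fraction


sign, exponent, fraction = split_float32(0x41480000)
print(sign, hex(exponent), hex(fraction))  # 0 0x82 0x480000
```

The three values printed for 0x41480000 appear to correspond to the UFP line in the 32-bit sample run further down (0,82,480000).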

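The true value of a 32-bit normalized floating-point number (Python syntax), where Exponent is neither 0 nor 255 and Fraction is the 23 fraction bits read as a binary fraction (0.xxx...):

(-1) ** Sign * 2 ** (Exponent - 127) * (1 + Fraction)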
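
The formula can be tried out directly. The sketch below assumes the fraction is passed in as the raw 23-bit integer field and scales it to a binary fraction first; the helper name float32_value is hypothetical. It rejects exponent fields 0 and 255, which encode zero, subnormals, infinities and NaN rather than normalized numbers.

```python
def float32_value(sign, exponent, fraction_bits):
    """Evaluate the normalized-number formula for a 32-bit float."""
    if exponent in (0, 255):
        raise ValueError("exponent 0 and 255 are not normalized numbers")
    fraction = fraction_bits / 2 ** 23  # read the 23 bits as 0.xxx in binary
    return (-1) ** sign * 2.0 ** (exponent - 127) * (1 + fraction)


print(float32_value(0, 130, 0x480000))  # 12.5
```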

Examples are as follows:

a = 12.5

1. Determining the sign bit

Since a is greater than 0, Sign is 0, expressed in binary as: 0

2. Determining the exponent

a expressed in binary: 1100.1

Normalizing to 1.1001 requires moving the binary point 3 places to the left, so the Exponent is 130 (127 + 3), expressed in binary: 10000010
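
One way to check the shift count is math.frexp from the Python standard library, which returns a mantissa in [0.5, 1) and a power of two. The sketch below is only valid for non-zero values in the normalized range, and the name biased_exponent32 is made up for this example.

```python
import math


def biased_exponent32(x):
    """Return the biased 8-bit exponent field for a normalized value x."""
    mantissa, exp = math.frexp(abs(x))  # abs(x) == mantissa * 2**exp, 0.5 <= mantissa < 1
    shift = exp - 1                     # places the binary point moves to reach 1.xxx
    return shift + 127                  # add the 32-bit bias


e = biased_exponent32(12.5)
print(e, format(e, "08b"))  # 130 10000010
```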

3. Determining the fraction

The significand drops the implied leading 1, so the integer part 1100 contributes the bits 100.

The decimal fractional part (0.5) is converted to binary by repeatedly multiplying by 2 and taking the integer part each time: 0.5 * 2 = 1.0, so the fractional part contributes the bit 1.
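
The multiply-by-2 procedure is short enough to write out. This is a rough sketch that truncates after max_bits digits instead of rounding, so it only matches the stored bits exactly for values such as 0.5 that terminate in binary; fraction_to_bits is a hypothetical helper name.

```python
def fraction_to_bits(frac, max_bits=23):
    """Convert a decimal fraction in [0, 1) to a string of binary digits."""
    bits = []
    while frac > 0 and len(bits) < max_bits:
        frac *= 2
        bit = int(frac)        # the integer part is the next binary digit
        bits.append(str(bit))
        frac -= bit            # keep only the fractional part for the next round
    return "".join(bits)


print(fraction_to_bits(0.5))     # 1
print(fraction_to_bits(0.5625))  # 1001
```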

Padding the remaining fraction bits with 0, the binary representation of a is: 01000001010010000000000000000000

That is: 0100 0001 0100 1000 0000 0000 0000 0000

Expressed in hexadecimal system: 0x41480000
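
The hand-derived result can be cross-checked with Python's struct module, which packs a value as an IEEE 754 single when given the "f" format. Big-endian (">") is chosen here only so the bytes read from high to low, as in the text.

```python
import struct

raw = struct.pack(">f", 12.5)       # 4 bytes, most significant byte first
word = int.from_bytes(raw, "big")

print(format(word, "032b"))  # 01000001010010000000000000000000
print(hex(word))             # 0x41480000
```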

4. Recovering the true value

Sign = 0b0 = 0

Exponent = 0b10000010 = 130

Fraction = 0.1001 (binary) = 2 ** (-1) + 2 ** (-4) = 0.5625

True value:

(-1) **0 * 2 **(130-127) * (1 + 0.5625) = 12.5

Code for parsing the binary storage of a 32-bit floating-point number (C++):

https://github.com/mike-zhang/cppExamples/blob/master/dataTypeOpt/IEEE754Relate/floatTest1.cpp

Sample run:

[root@localhost floatTest1]# ./floatToBin1
sizeof(float) : 4
sizeof(int) : 4
a = 12.500000
showFloat : 0x 41 48 00 00
UFP : 0,82,480000
b : 0x41480000
showIEEE754 a = 12.500000
showIEEE754 varTmp = 0x00c00000
showIEEE754 c = 0x00400000
showIEEE754 i = 19 , a1 = 1.000000 , showIEEE754 c = 00480000 , showIEEE754 b = 0x41000000
showIEEE754 i = 18 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 17 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 16 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 15 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 14 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 13 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 12 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 11 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 10 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 9 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 8 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 7 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 6 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 5 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 4 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 3 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 2 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 1 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 : 0x41480000
[root@localhost floatTest1]#

Analysis of 64-bit Floating Point Storage Format

Sign: 1 bit (bit 63)

Exponent: 11 bits (bits 62 to 52)

Fraction: 52 bits (bits 51 to 0)

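The true value of a 64-bit normalized floating-point number (Python syntax), where Exponent is neither 0 nor 2047 and Fraction is the 52 fraction bits read as a binary fraction (0.xxx...):

(-1) ** Sign * 2 ** (Exponent - 1023) * (1 + Fraction)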
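
For the 64-bit case, field extraction and the formula can be combined in one small sketch. The name decode_float64 is an assumption made for this example, and the function only handles normalized numbers.

```python
def decode_float64(word):
    """Return (sign, exponent, fraction, value) for a normalized 64-bit pattern."""
    sign = (word >> 63) & 0x1              # bit 63
    exponent = (word >> 52) & 0x7FF        # bits 62 to 52
    fraction_bits = word & ((1 << 52) - 1) # bits 51 to 0
    if exponent in (0, 0x7FF):
        raise ValueError("exponent 0 and 2047 are not normalized numbers")
    fraction = fraction_bits / 2 ** 52     # read the 52 bits as 0.xxx in binary
    value = (-1) ** sign * 2.0 ** (exponent - 1023) * (1 + fraction)
    return sign, exponent, fraction, value


print(decode_float64(0x4029000000000000))  # (0, 1026, 0.5625, 12.5)
```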

Examples are as follows:

a = 12.5

1. Determining the sign bit

Since a is greater than 0, Sign is 0, expressed in binary as: 0

2. Determining the exponent

a expressed in binary: 1100.1

Normalizing to 1.1001 requires moving the binary point 3 places to the left, so the Exponent is 1026 (1023 + 3), expressed in binary: 10000000010

3. Determining the fraction

The significand drops the implied leading 1, so the integer part 1100 contributes the bits 100.

The decimal fractional part (0.5) is converted to binary by repeatedly multiplying by 2 and taking the integer part each time: 0.5 * 2 = 1.0, so the fractional part contributes the bit 1.

Padding the remaining fraction bits with 0, the binary representation of a is:

0100000000101001000000000000000000000000000000000000000000000000

That is: 0100 0000 0010 1001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

Expressed in hexadecimal system: 0x4029000000000000
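
The 64-bit result can be checked with struct again, this time with the "d" format for an IEEE 754 double. The sketch also prints the bits in groups of four and unpacks the bytes to confirm the round trip.

```python
import struct

raw64 = struct.pack(">d", 12.5)          # 8 bytes, most significant byte first
word64 = int.from_bytes(raw64, "big")
bits64 = format(word64, "064b")

# Group the 64 bits into nibbles so they line up with the hex digits.
grouped = " ".join(bits64[i:i + 4] for i in range(0, 64, 4))
print(grouped)
print(hex(word64))                        # 0x4029000000000000
print(struct.unpack(">d", raw64)[0])      # 12.5
```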

4. Recovering the true value

Sign = 0b0 = 0
Exponent = 0b10000000010 = 1026
Fraction = 0.1001 (binary) = 2 ** (-1) + 2 ** (-4) = 0.5625

True value:

(-1) **0 * 2 **(1026-1023) * (1 + 0.5625) = 12.5

Code for parsing the binary storage of a 64-bit floating-point number (C++):

https://github.com/mike-zhang/cppExamples/blob/master/dataTypeOpt/IEEE754Relate/doubleTest1.cpp

Sample run:

[root@localhost t1]# ./doubleToBin1
sizeof(double) : 8
sizeof(long) : 8
a = 12.500000
showDouble : 0x 40 29 00 00 00 00 00 00
UFP : 0,402,0
b : 0x0
showIEEE754 a = 12.500000
showIEEE754 logLen = 3
showIEEE754 c = 4620693217682128896(0x4020000000000000)
showIEEE754 b = 0x4020000000000000
showIEEE754 varTmp = 0x8000000000000
showIEEE754 c = 0x8000000000000
showIEEE754 i = 48 , a1 = 1.000000 , showIEEE754 c = 9000000000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 47 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 46 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 45 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 44 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 43 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 42 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 41 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 40 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 39 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 38 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 37 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 36 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 35 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 34 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 33 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 32 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 31 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 30 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 29 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 28 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 27 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 26 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 25 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 24 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 23 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 22 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 21 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 20 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 19 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 18 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 17 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 16 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 15 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 14 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 13 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 12 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 11 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 10 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 9 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 8 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 7 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 6 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 5 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 4 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 3 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 2 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 1 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 : 0x4029000000000000
[root@localhost t1]#

Okay, that's all. I hope it will help you.

GitHub address of this article:

Https://github.com/mike-zhang/mikeBlog Essays/blob/master/2018/20180117_IEEE754 standard floating point storage format.rst

Additions and corrections are welcome.

Keywords: C++ github Python

Added by superdude on Sun, 19 May 2019 05:11:24 +0300
