# IEEE754 Standard Floating Point Storage Format

Basic storage format (from the most significant bit to the least significant bit): Sign + Exponent + Fraction

Sign: sign bit

Exponent: biased exponent

Fraction: fraction (the stored bits of the significand, also called the mantissa)

## Analysis of 32-bit Floating Point Storage Format

Sign: 1 bit (bit 31)

Exponent: 8 bits (bits 30 to 23)

Fraction: 23 bits (bits 22 to 0)
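The field layout can be checked in Python. A minimal sketch, assuming Python 3 and the standard `struct` module: the value is packed as a big-endian single-precision float (`'>f'`), reinterpreted as an unsigned 32-bit integer (`'>I'`), and each field is isolated with a shift and a mask. The function name `float32_fields` is made up for this example.

```python
import struct

def float32_fields(x: float):
    """Split x (rounded to single precision) into its (sign, exponent, fraction) fields."""
    # Reinterpret the 4 bytes of a big-endian single as an unsigned 32-bit integer
    (bits,) = struct.unpack('>I', struct.pack('>f', x))
    sign = bits >> 31                 # bit 31
    exponent = (bits >> 23) & 0xFF    # bits 30..23 (8 bits)
    fraction = bits & 0x7FFFFF        # bits 22..0 (23 bits)
    return sign, exponent, fraction

print(float32_fields(12.5))  # (0, 130, 4718592)
```

For 12.5 this gives sign 0, exponent 130 (0x82) and fraction 4718592 (0x480000), which appear to be the three values on the `UFP : 0,82,480000` line of the program output further down.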

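The value of a normalized 32-bit floating-point number (exponent field neither 0 nor 255), written in Python syntax, where Fraction is the 23 fraction bits read as a binary fraction, i.e. the stored bits divided by 2 ** 23: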

`(-1) ** Sign * 2 ** (Exponent - 127) * (1 + Fraction)`
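A sketch of the formula as a Python function, assuming the three fields have already been extracted as integers (the name `float32_value` is hypothetical). It divides the stored fraction bits by 2 ** 23 to get the Fraction term and rejects exponent fields 0 and 255, which IEEE 754 reserves for zero/subnormal numbers and for infinity/NaN.

```python
def float32_value(sign: int, exponent: int, fraction: int) -> float:
    """Evaluate (-1)**Sign * 2**(Exponent - 127) * (1 + Fraction) for a normalized single."""
    # Exponent fields 0 and 255 encode zero/subnormals and infinity/NaN, not this formula
    if not 0 < exponent < 255:
        raise ValueError("not a normalized single-precision value")
    # Fraction is the 23 stored bits read as the binary fraction 0.b22 b21 ... b0
    frac = fraction / 2 ** 23
    return (-1) ** sign * 2 ** (exponent - 127) * (1 + frac)

print(float32_value(0, 130, 0x480000))  # 12.5
```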

Example:

a = 12.5

1. Determine the sign bit

Since a is greater than 0, Sign is 0, which in binary is: 0

2. Determine the exponent

In binary, a is 1100.1 (12 = 1100, 0.5 = 0.1).

Normalizing to 1.1001 × 2^3 moves the binary point 3 places to the left, so the Exponent is 130 (127 + 3), which in binary is 10000010.
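The exponent step can be reproduced with `math.frexp`, which splits a float into `m * 2 ** e` with `0.5 <= m < 1`; moving to the `1.xxx` form used here subtracts one from `e`. A minimal sketch assuming a normalized, non-zero input; `float32_exponent` is a hypothetical helper and it does not check that the result fits the 8-bit field.

```python
import math

def float32_exponent(x: float) -> int:
    """Biased exponent field of a normalized, non-zero x (single precision, bias 127)."""
    # math.frexp returns (m, e) with abs(x) == m * 2**e and 0.5 <= m < 1
    _, e = math.frexp(abs(x))
    # Rewriting as 1.xxx * 2**(e - 1) gives the unbiased exponent e - 1
    return (e - 1) + 127

field = float32_exponent(12.5)
print(field, format(field, '08b'))  # 130 10000010
```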

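3. Determine the fraction

The leading 1 of the normalized significand 1.1001 is implicit and is not stored, so after removing it the integer part 1100 contributes the fraction bits 100.

The fractional part 0.5 is converted to binary by repeatedly multiplying it by 2 and taking the integer part of each product as the next bit: 0.5 * 2 = 1.0 gives the bit 1 and leaves a remainder of 0, so the conversion stops. The fraction bits are therefore 1001.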
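The manual procedure for the fraction (drop the implicit 1 from the integer part, then repeatedly double the fractional part) as a small Python sketch. It assumes `x >= 1` and truncates instead of rounding when the bits do not fit, so it will not always match what the hardware stores; `fraction_bits` is a hypothetical name.

```python
def fraction_bits(x: float, width: int = 23) -> str:
    """Fraction field of x >= 1, built by hand (truncating, not rounding)."""
    if x < 1:
        raise ValueError("this sketch only handles x >= 1")
    integer = int(x)
    frac = x - integer
    # Integer part in binary, without the '0b' prefix and without the implicit leading 1
    bits = bin(integer)[3:]
    # Multiply-by-2 method: doubling moves the next binary digit into the integer part
    while frac and len(bits) < width:
        frac *= 2
        digit = int(frac)
        bits += str(digit)
        frac -= digit
    # Cut to the field width and pad the remaining bits with zeros
    return bits[:width].ljust(width, '0')

print(fraction_bits(12.5))  # '1001' followed by 19 zeros
```

Passing `width=52` gives the 64-bit fraction field of the same value.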

Padding the remaining fraction bits with zeros, the 32-bit pattern of a is: 01000001010010000000000000000000

That is: 0100 0001 0100 1000 0000 0000 0000 0000 (0x41480000)
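Concatenating the three fields and cross-checking the result with `struct` gives a quick sanity test of the bit pattern (a sketch; the variable names are illustrative):

```python
import struct

# Fields of a from steps 1-3, as bit strings
sign, exponent, fraction = '0', '10000010', '1001'.ljust(23, '0')
pattern = sign + exponent + fraction   # 1 + 8 + 23 = 32 bits
word = int(pattern, 2)
print(f'{word:08x}')                   # 41480000

# Cross-check against the bytes struct produces for 12.5 as a big-endian single
assert struct.pack('>f', 12.5) == word.to_bytes(4, 'big')
```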

4. Recover the value

```python
Sign = int('0', 2)                  # sign bit: 0
Exponent = int('10000010', 2)       # exponent field: 130
Fraction = 2 ** (-1) + 2 ** (-4)    # fraction bits .1001: 0.5625
```

Value:

`(-1) ** 0 * 2 ** (130 - 127) * (1 + 0.5625) = 12.5`
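Going the other way, Python can decode the stored bytes directly. A sketch assuming the four bytes are given most significant first; the `'>'` prefix in the `struct` format fixes the byte order so the result does not depend on the host machine:

```python
import struct

raw = bytes.fromhex('41480000')   # the 4 bytes of a, most significant byte first
(a,) = struct.unpack('>f', raw)   # '>f': big-endian IEEE 754 single precision
print(a)                          # 12.5
```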

C++ code that parses the binary storage of a 32-bit float:

https://github.com/mike-zhang/cppExamples/blob/master/dataTypeOpt/IEEE754Relate/floatTest1.cpp

Output:

```
[root@localhost floatTest1]# ./floatToBin1
sizeof(float) : 4
sizeof(int) : 4
a = 12.500000
showFloat : 0x 41 48 00 00
UFP : 0,82,480000
b : 0x41480000
showIEEE754 a = 12.500000
showIEEE754 varTmp = 0x00c00000
showIEEE754 c = 0x00400000
showIEEE754 i = 19 , a1 = 1.000000 , showIEEE754 c = 00480000 , showIEEE754 b = 0x41000000
showIEEE754 i = 18 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 17 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 16 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 15 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 14 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 13 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 12 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 11 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 10 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 9 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 8 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 7 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 6 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 5 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 4 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 3 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 2 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 i = 1 , a1 = 0.000000 , showIEEE754 b = 0x41000000
showIEEE754 : 0x41480000
[root@localhost floatTest1]#
```

## Analysis of 64-bit Floating Point Storage Format

Sign: 1 bit (bit 63)

Exponent: 11 bits (bits 62 to 52)

Fraction: 52 bits (bits 51 to 0)
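The 64-bit layout can be split the same way as the 32-bit one; only the `struct` formats (`'>d'`, `'>Q'`), the shifts and the masks change. A minimal sketch using the standard `struct` module; `float64_fields` is a hypothetical name.

```python
import struct

def float64_fields(x: float):
    """Split a double into its (sign, exponent, fraction) bit fields."""
    # Reinterpret the 8 bytes of a big-endian double as an unsigned 64-bit integer
    (bits,) = struct.unpack('>Q', struct.pack('>d', x))
    sign = bits >> 63                     # bit 63
    exponent = (bits >> 52) & 0x7FF       # bits 62..52 (11 bits)
    fraction = bits & ((1 << 52) - 1)     # bits 51..0 (52 bits)
    return sign, exponent, fraction

s, e, f = float64_fields(12.5)
print(s, e, hex(f))  # 0 1026 0x9000000000000
```

The fraction 0x9000000000000 appears to correspond to the `c = 9000000000000` value printed by the C++ program further down.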

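The value of a normalized 64-bit floating-point number (exponent field neither 0 nor 2047), written in Python syntax, where Fraction is the 52 fraction bits read as a binary fraction, i.e. the stored bits divided by 2 ** 52: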

`(-1) ** Sign * 2 ** (Exponent - 1023) * (1 + Fraction)`
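Both formats follow the same rule, with the bias equal to 2 ** (k - 1) - 1 for a k-bit exponent field (127 for k = 8, 1023 for k = 11). A sketch of a decoder parameterized by the field widths, assuming a normalized input; `decode` and its parameters are hypothetical:

```python
def decode(bits: int, exp_width: int, frac_width: int) -> float:
    """Value of a normalized IEEE 754 binary bit pattern with the given field widths."""
    bias = 2 ** (exp_width - 1) - 1                  # 127 for 8 bits, 1023 for 11 bits
    sign = bits >> (exp_width + frac_width)
    exponent = (bits >> frac_width) & ((1 << exp_width) - 1)
    fraction = bits & ((1 << frac_width) - 1)
    # All-zero and all-one exponent fields encode zero/subnormals and inf/NaN instead
    if exponent == 0 or exponent == (1 << exp_width) - 1:
        raise ValueError("not a normalized number")
    return (-1) ** sign * 2.0 ** (exponent - bias) * (1 + fraction / 2 ** frac_width)

print(decode(0x41480000, 8, 23))           # 12.5 (single)
print(decode(0x4029000000000000, 11, 52))  # 12.5 (double)
```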

Example:

a = 12.5

1. Determine the sign bit

Since a is greater than 0, Sign is 0, which in binary is: 0

2. Determine the exponent

In binary, a is 1100.1 (12 = 1100, 0.5 = 0.1).

Normalizing to 1.1001 × 2^3 moves the binary point 3 places to the left, so the Exponent is 1026 (1023 + 3), which in binary (11 bits) is 10000000010.

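3. Determine the fraction

The leading 1 of the normalized significand 1.1001 is implicit and is not stored, so after removing it the integer part 1100 contributes the fraction bits 100.

The fractional part 0.5 is converted to binary by repeatedly multiplying it by 2 and taking the integer part of each product as the next bit: 0.5 * 2 = 1.0 gives the bit 1 and leaves a remainder of 0, so the conversion stops. The fraction bits are therefore 1001.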
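Python's built-in `float.hex()` shows this decomposition directly: for a normal double it prints `0x1.<13 hex digits>p<exponent>`, where the hex digits are the 52 stored fraction bits and the `p` part is the unbiased exponent. A sketch that pulls those pieces out (`hex_parts` is a hypothetical helper and only handles normal numbers):

```python
def hex_parts(x: float):
    """Fraction hex digits and unbiased exponent of a normal double, read from float.hex()."""
    text = x.hex()                          # '0x1.9000000000000p+3' for 12.5
    mantissa, exponent = text.split('p')
    fraction_hex = mantissa.split('.')[1]   # 13 hex digits = 52 fraction bits
    return fraction_hex, int(exponent)

frac_hex, e = hex_parts(12.5)
print(frac_hex, e)                                        # 9000000000000 3
print(format(int(frac_hex, 16), '052b')[:8], e + 1023)    # 10010000 1026
```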

Padding the remaining fraction bits with zeros, the 64-bit pattern of a is:

0100000000101001000000000000000000000000000000000000000000000000

That is: 0100 0000 0010 1001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 (0x4029000000000000)
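The 64-bit pattern can also be assembled with integer shifts instead of string concatenation; a sketch, cross-checked against `struct`:

```python
import struct

# Fields of a from steps 1-3, as integers
sign, exponent, fraction = 0, 0b10000000010, 0b1001 << 48
# Shift each field to its bit position (63, 62..52, 51..0) and combine
word = (sign << 63) | (exponent << 52) | fraction
print(f'{word:016x}')  # 4029000000000000

# Cross-check against the bytes struct produces for 12.5 as a big-endian double
assert struct.pack('>d', 12.5) == word.to_bytes(8, 'big')
```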

4. Recover the value

```python
Sign = int('0', 2)                  # sign bit: 0
Exponent = int('10000000010', 2)    # exponent field: 1026
Fraction = 2 ** (-1) + 2 ** (-4)    # fraction bits .1001: 0.5625
```

Value:

`(-1) ** 0 * 2 ** (1026 - 1023) * (1 + 0.5625) = 12.5`
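The value recovery can be done without any floating-point rounding by using `fractions.Fraction` from the standard library; a sketch that evaluates the formula exactly and compares it with the value decoded from the stored bytes:

```python
import struct
from fractions import Fraction

# Fields of a from steps 1-3
sign, exponent, fraction = 0, 1026, 0b1001 << 48

# Same formula, but with exact rational arithmetic instead of floats
exact = (-1) ** sign * Fraction(2) ** (exponent - 1023) * (1 + Fraction(fraction, 2 ** 52))
print(exact)  # 25/2

# Decode the stored bytes (most significant first) and compare with the exact value
(a,) = struct.unpack('>d', bytes.fromhex('4029000000000000'))
assert a == exact
```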

C++ code that parses the binary storage of a 64-bit double:

https://github.com/mike-zhang/cppExamples/blob/master/dataTypeOpt/IEEE754Relate/doubleTest1.cpp

Output:

```
[root@localhost t1]# ./doubleToBin1
sizeof(double) : 8
sizeof(long) : 8
a = 12.500000
showDouble : 0x 40 29 00 00 00 00 00 00
UFP : 0,402,0
b : 0x0
showIEEE754 a = 12.500000
showIEEE754 logLen = 3
showIEEE754 c = 4620693217682128896(0x4020000000000000)
showIEEE754 b = 0x4020000000000000
showIEEE754 varTmp = 0x8000000000000
showIEEE754 c = 0x8000000000000
showIEEE754 i = 48 , a1 = 1.000000 , showIEEE754 c = 9000000000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 47 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 46 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 45 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 44 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 43 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 42 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 41 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 40 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 39 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 38 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 37 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 36 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 35 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 34 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 33 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 32 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 31 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 30 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 29 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 28 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 27 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 26 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 25 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 24 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 23 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 22 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 21 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 20 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 19 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 18 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 17 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 16 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 15 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 14 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 13 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 12 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 11 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 10 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 9 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 8 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 7 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 6 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 5 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 4 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 3 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 2 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 i = 1 , a1 = 0.000000 , showIEEE754 b = 0x4020000000000000
showIEEE754 : 0x4029000000000000
[root@localhost t1]#
```