Floating-point operations are supported by today's mainstream desktop and mobile processors. The representation of floating-point numbers in a computer follows the IEEE 754 standard. This standard defines which floating-point values are normalized, which are denormalized, which represent positive and negative infinity, and which are not-a-number (NaN) values, the topic of this article.
The IEEE 754 standard distinguishes two kinds of NaN, and processor implementations support both. The first is the signaling NaN (SNaN), a NaN that raises a floating-point exception signal when it is used in an operation. The second is the quiet NaN (QNaN), a NaN that does not raise a floating-point exception signal.
Section 4.2.2, "Floating-Point Data Types", in Volume 1 of the Intel Software Developer's Manual introduces the representation of floating-point numbers on x86 processors, and Table 4-3 in that section lists the encodings of floating-point numbers and NaNs.
The official ARMv8 architecture manual covers the same ground starting at section A1.4.2, which introduces the representations from half-precision floating point through fixed-point numbers. Section A1.4.3 describes the single-precision format in detail, including the encoding of NaNs.
According to these documents, both architectures encode SNaN and QNaN in the same way. The sign bit is ignored, and the exponent field is all 1s. In the mantissa (fraction) field, if the most significant bit is 0 and the mantissa is not all 0s, the value is an SNaN; if the most significant bit of the mantissa is 1, the value is a QNaN.
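The encoding rule above can be sketched for single precision as a small bit test. This is only an illustration of the rule as described here; the names `NanKind` and `classifyFloatBits` are made up for this sketch and do not come from either manual.

```cpp
#include <cstdint>

// Illustrative classification of an IEEE 754 single-precision bit pattern.
enum class NanKind { NotNaN, Signaling, Quiet };

constexpr NanKind classifyFloatBits(std::uint32_t bits) {
    constexpr std::uint32_t kExponentMask = 0x7F80'0000u; // bits 30..23
    constexpr std::uint32_t kMantissaMask = 0x007F'FFFFu; // bits 22..0
    constexpr std::uint32_t kQuietBit     = 0x0040'0000u; // bit 22, top mantissa bit

    const std::uint32_t mantissa = bits & kMantissaMask;

    // The sign bit (bit 31) is never examined.
    // A NaN needs an all-ones exponent and a non-zero mantissa;
    // an all-ones exponent with a zero mantissa is an infinity.
    if ((bits & kExponentMask) != kExponentMask || mantissa == 0) {
        return NanKind::NotNaN;
    }
    return (mantissa & kQuietBit) != 0 ? NanKind::Quiet : NanKind::Signaling;
}
```

For instance, `classifyFloatBits(0x7FA00000u)` yields `NanKind::Signaling`, while `classifyFloatBits(0x7FC00001u)` yields `NanKind::Quiet` because bit 22 is set.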
Section 4.8.3 of Volume 1 of the Intel Software Developer's Manual describes the encodings of real numbers and NaNs. Sections 4.8.3.4 to 4.8.3.7 describe in detail how SNaN and QNaN behave and how they can be used on x86 processors.
The ARMv8 architecture manual describes the representation of floating-point numbers in the ARMv8 architecture, along with the related terminology, starting at section A1.5.2. Within it, subsection A1.5.5 describes NaN handling and how operations on QNaN and SNaN are carried out.
To sum up, on both x86 and ARM processors, if one operand of a floating-point operation is a NaN and the other is not, the destination operand takes the value of the NaN. If both source operands are NaNs, x86 and ARM processors behave differently:
- For ARM processors: if one source operand is an SNaN and the other is a QNaN, the SNaN is selected as the result. If both source operands are SNaNs, the first source operand is selected as the result.
- For x86 processors: regardless of which of the two operands is a QNaN and which is an SNaN, the first source operand is taken as the result. The example later in this article illustrates this point.
In the process above, if the value selected for the destination operand is an SNaN, both processors convert it into a QNaN before writing the result. The two architectures also convert an SNaN to a QNaN in the same way: the most significant bit of the mantissa is set to 1 and all other bits are left unchanged.
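The selection and quieting rules summarized above can be modeled in software. The following is a minimal sketch of those rules only, for single-precision bit patterns; the function names are hypothetical, and the model assumes at least one operand is a NaN. It is not a substitute for running the instructions on real hardware.

```cpp
#include <cstdint>

constexpr std::uint32_t kQuietBit = 0x0040'0000u; // bit 22, top mantissa bit

constexpr bool isNaNBits(std::uint32_t b) {
    return (b & 0x7F80'0000u) == 0x7F80'0000u && (b & 0x007F'FFFFu) != 0;
}

constexpr bool isSNaNBits(std::uint32_t b) {
    return isNaNBits(b) && (b & kQuietBit) == 0;
}

// SNaN -> QNaN conversion: set the top mantissa bit, keep everything else.
constexpr std::uint32_t quieted(std::uint32_t b) { return b | kQuietBit; }

// x86 rule as described in the text: the first source operand that is a NaN
// wins, whether it is quiet or signaling. Precondition: a or b is a NaN.
constexpr std::uint32_t nanResultX86(std::uint32_t a, std::uint32_t b) {
    return quieted(isNaNBits(a) ? a : b);
}

// ARM rule as described in the text: an SNaN takes priority over a QNaN;
// among NaNs of the same kind the first operand wins.
// Precondition: a or b is a NaN.
constexpr std::uint32_t nanResultArm(std::uint32_t a, std::uint32_t b) {
    if (isSNaNBits(a)) return quieted(a);
    if (isSNaNBits(b)) return quieted(b);
    return isNaNBits(a) ? a : b; // only QNaNs remain, nothing to quiet
}
```

With a QNaN `0x7FC00001` as the first operand and an SNaN `0x7FA00002` as the second, this model gives `0x7FC00001` under the x86 rule and `0x7FE00002` under the ARM rule, which is the one case where the two rules disagree.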
As mentioned earlier, an SNaN sends a signal to the processor. So how does the processor control this behavior?
- For x86 processors: see section 10.2.3, "MXCSR Control and Status Register", in Volume 1 of the Intel Software Developer's Manual. If bit 7 (IM, the invalid-operation mask) of the MXCSR register is 0, an exception is raised when an operation encounters an SNaN. Otherwise no exception is raised, but bit 0 (IE) of MXCSR, the floating-point invalid-operation flag, is still set to 1.
- For ARMv8 processors: see section C5.2.7, "FPCR, Floating-point Control Register", and section C5.2.8, "FPSR, Floating-point Status Register", of the ARMv8 architecture manual. If bit 8 (IOE) of FPCR is set to 1, a software exception is raised when an operation encounters an SNaN, but bit 0 (IOC) of FPSR, the cumulative floating-point invalid-operation flag, is not updated. The flag can be set manually in the software exception handler. If the IOE bit of FPCR is 0 (the default), no exception is raised and the processor automatically sets the IOC bit of FPSR to 1.
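The bit positions named in the two items above can be written down as masks. This sketch only decodes register values that have already been read into an integer; the constant and function names are invented for illustration, and the bit positions are the ones quoted in the text.

```cpp
#include <cstdint>

// x86 MXCSR bits, as quoted above.
constexpr std::uint32_t kMxcsrIE = 1u << 0; // invalid-operation flag
constexpr std::uint32_t kMxcsrIM = 1u << 7; // invalid-operation mask

// ARMv8 FPCR / FPSR bits, as quoted above.
constexpr std::uint32_t kFpcrIOE = 1u << 8; // invalid-operation trap enable
constexpr std::uint32_t kFpsrIOC = 1u << 0; // invalid-operation cumulative flag

// x86: the exception is delivered only when the mask bit is clear.
constexpr bool x86InvalidTraps(std::uint32_t mxcsr) { return (mxcsr & kMxcsrIM) == 0; }
constexpr bool x86InvalidFlagged(std::uint32_t mxcsr) { return (mxcsr & kMxcsrIE) != 0; }

// ARMv8: the exception is delivered only when the enable bit is set.
constexpr bool armInvalidTraps(std::uint32_t fpcr) { return (fpcr & kFpcrIOE) != 0; }
constexpr bool armInvalidFlagged(std::uint32_t fpsr) { return (fpsr & kFpsrIOC) != 0; }
```

Note the opposite polarity: MXCSR uses a mask bit (1 suppresses the exception), whereas FPCR uses an enable bit (1 requests it). With the usual MXCSR power-on value of `0x1F80`, `x86InvalidTraps` returns `false`. On x86 the live register value can be obtained with the `_mm_getcsr()` intrinsic from `<xmmintrin.h>` and passed to these helpers.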
Code example
Let's use a piece of code to test how an x86_64 processor handles NaN under Windows. The author's environment is Windows 11, the development tool is Visual Studio 2022, and the code uses the C++20 standard. If you use Visual Studio 2019, simply select the C++20 standard there as well.
Here is the code of main.cpp:
#include <cstdio>
#include <cstdlib>
#include <algorithm>
#include <utility>
#include <limits>

extern "C" void NanOpTest(unsigned dst[4], unsigned srcOp1[4], unsigned srcOp2[4], unsigned* pMXCSR);

int main(void)
{
    printf("Has signaling NaN? %s\n", std::numeric_limits<float>::has_signaling_NaN ? "YES" : "NO");

    // Union used to view the same storage as a float/double or as raw bits
    union FloatType
    {
        float f;
        unsigned u;
        double d;
        unsigned long long ull;
    }
    qnanf = { .f = std::numeric_limits<float>::quiet_NaN() },
    qnand = { .d = std::numeric_limits<double>::quiet_NaN() },
    snanf = { .f = std::numeric_limits<float>::signaling_NaN() },
    snand = { .d = std::numeric_limits<double>::signaling_NaN() };

    printf("qnanf = 0x%08X\n", qnanf.u);
    printf("qnand = 0x%.16llX\n", qnand.ull);
    printf("snanf = 0x%08X\n", snanf.u);
    printf("snand = 0x%.16llX\n", snand.ull);

    // Explicitly set them to SNaNs: top mantissa bit clear, mantissa non-zero.
    // (0x7ff8'0000'0000'0000 would have bit 51 set and therefore be a QNaN.)
    snanf.u = 0x7fa0'0000U;
    snand.ull = 0x7ff4'0000'0000'0000ULL;

    constexpr FloatType normalInt = { .f = 0.5f };

    // 16-byte-aligned operands are required by vmovdqa in NanOpTest
    struct alignas(64)
    {
        unsigned dst[4];
        unsigned src1[4];
        unsigned src2[4];
    } opData = {
        // The low bits (1 and 2) tag which source each result lane came from
        .src1 = { qnanf.u | 1, snanf.u | 1, snanf.u | 1, normalInt.u | 1 },
        .src2 = { snanf.u | 2, qnanf.u | 2, snanf.u | 2, qnanf.u | 2 }
    };

    unsigned mxcsrReg = 0;

    // NanOpTest stores MXCSR before doing the addition
    NanOpTest(opData.dst, opData.src1, opData.src2, &mxcsrReg);
    printf("Before operation, MXCSR = 0x%04X\n", mxcsrReg);
    printf("Op result: 0x%08X 0x%08X 0x%08X 0x%08X\n",
        opData.dst[0], opData.dst[1], opData.dst[2], opData.dst[3]);

    // The second call reads MXCSR as left behind by the first addition
    NanOpTest(opData.dst, opData.src1, opData.src2, &mxcsrReg);
    printf("After operation, MXCSR = 0x%04X\n", mxcsrReg);
}
There is one detail worth noting in the code above. The C++ standard library already provides a constant for SNaN in a given environment. However, the single-precision SNaN value given by MSVC here is 0x7FC00001. This is not a real SNaN, because the most significant bit of the mantissa (bit 22) is 1, not 0. Therefore, the code explicitly assigns the constant 0x7fa0'0000U, which is also the value obtained with GCC on Linux and is a correct SNaN encoding.
The implementation of NanOpTest function is given below, which is in the test.asm assembly file:
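.code

; void NanOpTest(unsigned dst[4], unsigned srcOp1[4], unsigned srcOp2[4], unsigned *pMXCSR)
; Windows x64 calling convention: rcx = dst, rdx = srcOp1, r8 = srcOp2, r9 = pMXCSR
NanOpTest proc public

    stmxcsr dword ptr [r9]              ; store the current MXCSR to *pMXCSR
    vmovdqa xmm1, xmmword ptr [rdx]     ; load srcOp1
    vmovdqa xmm2, xmmword ptr [r8]      ; load srcOp2
    vaddps  xmm0, xmm1, xmm2            ; xmm0 = srcOp1 + srcOp2 (4 packed floats)
    vmovdqa xmmword ptr [rcx], xmm0     ; store the result to dst
    ret

NanOpTest endp

end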
Finally, set up the project, add the MASM build dependency so that the assembly file is assembled as part of the build, and then build and run it.