课程

教学相长

认真教书,认真学习

第一章 概述
第二章 Nios II处理器体系结构
第三章 Avalon接口规范
第四章 SOPC软硬件开发平台
第五章 Nios II处理器常用外设
第六章 μC/OS II操作系统移植
第七章 Nios II系统深入设计
第八章 调试技术

Double-precision floating-point format

Double-precision floating-point format is a computer number format which occupies 8 bytes (64 bits) in computer memory and represents a wide, dynamic range of values by using a floating point.

Computers with 32-bit storage locations use two memory locations to store a 64-bit double-precision number; each storage location holds a single-precision number. Double-precision floating-point format usually refers to binary64, as specified by the IEEE 754 standard, not to the 64-bit decimal format decimal64.

IEEE 754 double-precision binary floating-point format: binary64

Double-precision binary floating-point is a commonly used format on PCs, due to its wider range over single-precision floating point, in spite of its performance and bandwidth cost. As with single-precision floating-point format, it lacks precision on integer numbers when compared with an integer format of the same size. It is commonly known simply as double. The IEEE 754 standard specifies a binary64 as having:

This gives 15–17 significant decimal digits precision. If a decimal string with at most 15 significant digits is converted to IEEE 754 double precision representation and then converted back to a string with the same number of significant digits, then the final string should match the original. If an IEEE 754 double precision is converted to a decimal string with at least 17 significant digits and then converted back to double, then the final number must match the original.[1]

The format is written with the significand having an implicit integer bit of value 1 (except for special datums, see the exponent encoding below). With the 52 bits of the fraction significand appearing in the memory format, the total precision is therefore 53 bits (approximately 16 decimal digits, 53 log10(2) ≈ 15.955). The bits are laid out as follows:

The real value assumed by a given 64-bit double-precision datum with a given biased exponent e and a 52-bit fraction is

 (-1)^{\text{sign}}(1.b_{51}b_{50}...b_{0})_2 \times 2^{e-1023}

or

 (-1)^{\text{sign}}\left(1 + \sum_{i=1}^{52} b_{52-i} 2^{-i} \right)\times 2^{e-1023}

Between 252=4,503,599,627,370,496 and 253=9,007,199,254,740,992 the representable numbers are exactly the integers. For the next range, from 253 to 254, everything is multiplied by 2, so the representable numbers are the even ones, etc. Conversely, for the previous range from 251 to 252, the spacing is 0.5, etc.

The spacing as a fraction of the numbers in the range from 2n to 2n+1 is 2n−52. The maximum relative rounding error when rounding a number to the nearest representable one (the machine epsilon) is therefore 2−53.

The 11 bit width of the exponent allows the representation of numbers with a decimal exponent between 10−308 and 10308, with full 15–17 decimal digits precision. By compromising precision, subnormal representation allows values smaller than 10−323.