# Arithmetic for Computers 2: Floating Point Numbers

CS 154: Computer Architecture Lecture #9
Winter 2020
Dept. of Computer Science, UCSB

• Lab 4 due today!
• Lab 5 out soon
• Syllabus (Schedule Section) has been updated
2/5/20 Matni, CS154, Wi20

Midterm Exam (Wed. 2/12)
What’s on It?
• Everything we’ve done so far from start to Monday, 2/10
What Should I Bring?
• Your pencil(s), eraser, MIPS Reference Card (on 1 page)
• You can bring 1 sheet of hand-written notes (turn it in with exam). 2 sides ok.
What Else Should I Do?
• IMPORTANT: Come to the classroom 5-10 minutes EARLY
• If you are late, I may not let you take the exam
• IMPORTANT: Use the bathroom before the exam – once inside, you cannot leave
• Random seat assignments

Lecture Outline
• Floating Point Number Representations
• IEEE 754 F-P Standard
• Arithmetic in F-P
• Instructions for F-P
• Hardware implementations

Floating Point
• Representation for non-integral numbers
• Including very small and very large numbers
• Usually follows some “normalized” form of scientific notation

Floating Point Numbers in CPUs
We need 3 pieces of information to produce a binary floating point number:
$\pm N \times 2^{E}$
• The sign of the number (positive or negative)
• The mantissa (aka significand) of the number, $N$
• The exponent of the number, $E$

Representation in MIPS (Single Precision)
• The actual form is: $(-1)^{S} \times (1 + \text{Fraction}) \times 2^{(\text{Exponent} - \text{Bias})}$
• Called the IEEE 754 F-P Standard (more on this coming up)
• MIPS design for “single-precision” has:
1 bit for sign, 8 bits for exponent and 23 bits for fraction
• Gives a range from about $2.0 \times 10^{-38}$ to $2.0 \times 10^{38}$, which is quite large!
• Overflow can occur: here it means that the exponent is too large to be represented in the exponent field.
• If a negative exponent is too large in magnitude to fit in the exponent field, then we get underflow.

Double Precision Floating Points
• Single Precision is float in C/C++
• Double Precision is double in C/C++
• 64 bits (2 words) instead of 32 bits
• 11 bits for exponent (instead of 8)
• 52 bits for fraction (instead of 23)
Gives a wider range and greater precision than single-precision
Range is about: $2.0 \times 10^{-308}$ to $2.0 \times 10^{308}$

IEEE 754 Floating-Point Standard
• Includes single and double-precision definitions (since 1980s)
• Very widespread in almost all CPUs today
• S = 0 → positive, S = 1 → negative
• The “Bias” is 127 for single-precision and 1023 for double-precision
• The “1” in “1 + Fraction” is implicit
Examples with single-precision:
S=0, E=0x82, F=0 is: $(+1) \times (1 + 0) \times 2^{(130-127)} = 1 \times 2^{3} = 8$
S=0, E=0x83, F=0x600000 is: $(+1) \times (1 + 0.11_2) \times 2^{(131-127)} = 1.11_2 \times 2^{4} = 11100_2 = 28$
Useful website: https://www.h-schmidt.net/FloatConverter/IEEE754.html

More Examples!
• Hex word for single-precision F-P is: 0x3FA00000
• So:
0011 1111 1010 0000 … 0000
S=0 E=0x7F=127 F=010…0
• So:
Number = $(+1) \times (1 + 0.01_2) \times 2^{(127-127)} = 1.01_2 = 1 + 1 \times 2^{-2} = 1.25$

Yet More Examples!!
• Hex word for single-precision F-P is: 0xBF300000
• So:
1011 1111 0011 0000 … 0000
S=1 E=0x7E=126 F=011…0
• So:
Number = $(-1) \times (1 + 0.011_2) \times 2^{(126-127)} = -1.011_2 \times 2^{-1}$
$= -(1 + (1 \times 2^{-2}) + (1 \times 2^{-3})) \times 2^{-1}$
$= -(1 + 0.25 + 0.125) \times 0.5$
$= -0.6875$
For reference: $2^{-1} = 0.5$, $2^{-2} = 0.25$, $2^{-3} = 0.125$, $2^{-4} = 0.0625$, $2^{-5} = 0.03125$

Even More Examples!!!
• What is the single-precision word (in hex) of the F-P number 29.125?
• Ok, here we go:
I am reminded that $0.125 = 2^{-3}$
And, I know that 29 in binary is: 11101
So $29.125_{10} = 11101.001_2 = 1.1101001_2 \times 2^{4}$
This is a positive number, so S = 0
F = 1101001000…0 (23 bits in all)
E = 4 + 127 = 131 = 10000011
• So:
Number in bin = 0 10000011 1101001000…0
or 0100 0001 1110 1001 0…0 = 0x41E90000

Special Exponent Values
Consider Single-Precision Numbers:
• Exponents 0x00 and 0xFF are reserved
• Smallest exponent is 1 → Actual exponent = 1 - 127 = -126
• Smallest fraction is 0
• So, I get $\pm 1.0 \times 2^{-126} \approx \pm 1.2 \times 10^{-38}$
• Largest exponent is 0xFE = 254 → Actual exp. = 254 - 127 = 127
• Largest fraction is 111…11, which approaches 1
• So, I get $\pm 2.0 \times 2^{+127} \approx \pm 3.4 \times 10^{+38}$

Special IEEE 754 Values
• IEEE 754 allows for special symbols to represent “unusual events”
• When S=0, E=0xFF, F=0, IEEE calls the number “inf” (i.e. infinity)
• “-inf” is when S=1, E=0xFF, F=0
• These optionally allow a program to divide a nonzero number by 0 and get ±inf as the result.
• Allows for the result of invalid operations
These are called “Not a Number” or “NaN”
• Example: 0/0, inf - inf, etc…

Consider a 4-digit decimal example:
$9.999 \times 10^{1} + 1.610 \times 10^{-1}$
1. Align decimal points
• Shift number with smaller exponent
• $9.999 \times 10^{1} + 0.016 \times 10^{1}$
2. Add significands
• $10.015 \times 10^{1}$
3. Normalize result & check for over/underflow
• $1.0015 \times 10^{2}$
4. Round and renormalize if necessary (what? why? Be patient…)
• $1.002 \times 10^{2}$

Consider a 4-digit binary example:
$1.000 \times 2^{-1} + (-1.110 \times 2^{-2})$
1. Align binary points
• Shift number with smaller exponent
• $1.000 \times 2^{-1} + (-0.111 \times 2^{-1})$
2. Add significands
• $0.001 \times 2^{-1}$
3. Normalize result & check for over/underflow
• $1.000 \times 2^{-4}$
4. Round and renormalize if necessary
• $1.000 \times 2^{-4} = 0.0625$

Re: Rounding in Binary F-P
• Can we create ANY floating point number in binary?
• What about 0.3333… (i.e. 1/3)?
• In binary, 1/10 is the infinitely repeating fraction 0.0001100110011001100110011001100110011001100…
• Since we cannot represent ALL real numbers exactly in binary F-P, rounding (i.e. approximating) is necessary
• Many users are not aware of the approximation because of the way values are displayed
• The actual stored value is the nearest representable binary fraction

C++ Program to Illustrate Rounding in Binary F-P
#include <iostream>
#include <iomanip>
int main()
{
// Try running the program without the next 2 lines
// as a comparison. Or change the precision number around.