Floating Point

• Fixed point representations
• Big and small numbers
• Scientific notation
• IEEE 754 floating point standard
  - Special symbols
  - Underflow / overflow
• Floating point addition and multiplication
• Material from Section 3.5 of the textbook

How to Represent Real Numbers?

Real Numbers
• Positional notation allows for fractions:
  $a_n a_{n-1} \ldots a_1 a_0 . a_{-1} a_{-2} \ldots a_{-m}$
• Let's start with fixed point representation
  - Choose n and m
  - Radix point is always in the same position
  - Easy to implement
  - Limited range

• Examples: $152.3_{10}$, $1011.01_2$

Binary to Decimal
• Integers scaled by an appropriate factor
• Direct expansion with positional weights
• Example: $0.11001_2$
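As a rough illustration of the two methods, here is a minimal Java sketch; the class and method names are made up for this example, and it assumes a well-formed string of 0s and 1s with at most one point. For the slide's example, $0.11001_2 = 2^{-1} + 2^{-2} + 2^{-5} = 11001_2 / 2^{5} = 25/32 = 0.78125$.

```java
public class BinaryToDecimal {
    // Direct expansion: add 2^k for every 1 digit at position k
    static double byPositionalWeights(String bits) {
        int point = bits.indexOf('.');
        if (point < 0) point = bits.length();          // no point: a plain integer
        double value = 0.0;
        for (int i = 0; i < bits.length(); i++) {
            char c = bits.charAt(i);
            if (c == '.') continue;
            // left of the point: weights 2^0, 2^1, ...; right of it: 2^-1, 2^-2, ...
            int k = (i < point) ? point - 1 - i : point - i;
            if (c == '1') value += Math.pow(2, k);
        }
        return value;
    }

    // Integer scaling: read all digits as one integer, then divide by 2^(digits after the point)
    static double byIntegerScaling(String bits) {
        int point = bits.indexOf('.');
        int fractionDigits = (point < 0) ? 0 : bits.length() - point - 1;
        long asInteger = Long.parseLong(bits.replace(".", ""), 2);
        return asInteger / Math.pow(2, fractionDigits);
    }

    public static void main(String[] args) {
        System.out.println(byPositionalWeights("0.11001")); // 0.78125
        System.out.println(byIntegerScaling("0.11001"));    // 0.78125
        System.out.println(byPositionalWeights("1011.01")); // 11.25
    }
}
```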

Binary to Hexadecimal
• Use the same trick as for integers: group the bits in fours, starting at the binary point and padding with zeros on the outside
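A small Java sketch of the grouping trick (hypothetical class and method names; it assumes a well-formed binary string with a nonempty integer part). The integer part is padded with zeros on the left and the fraction on the right, so each group of four lines up with the binary point; for instance $1011.01_2 = \mathrm{B.4}_{16}$.

```java
public class BinaryToHex {
    // Convert a bit string whose length is a multiple of 4 into hex digits, 4 bits at a time
    static String groupsToHex(String padded) {
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < padded.length(); i += 4) {
            hex.append(Integer.toHexString(Integer.parseInt(padded.substring(i, i + 4), 2)));
        }
        return hex.toString();
    }

    static String toHex(String bits) {
        int point = bits.indexOf('.');
        String intPart = (point < 0) ? bits : bits.substring(0, point);
        String fracPart = (point < 0) ? "" : bits.substring(point + 1);
        // Pad the integer part on the left and the fraction on the right to whole groups of 4
        while (intPart.length() % 4 != 0) intPart = "0" + intPart;
        while (fracPart.length() % 4 != 0) fracPart = fracPart + "0";
        String result = groupsToHex(intPart);
        if (!fracPart.isEmpty()) result += "." + groupsToHex(fracPart);
        return result.toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(toHex("1011.01"));  // B.4
        System.out.println(toHex("0.11001"));  // 0.C8
    }
}
```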

Decimal to Binary
• Multiply by 2 and note the integer part
• Subtract the integer part and repeat until no fraction is left
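The multiply-by-2 procedure can be sketched in Java as follows. It uses `java.math.BigDecimal` so that the decimal input is held exactly instead of being rounded to a binary `double` first; the `maxBits` cap is an assumption added so that fractions which never terminate still stop.

```java
import java.math.BigDecimal;

public class DecimalFractionToBinary {
    // Binary digits of a decimal fraction 0 <= x < 1, at most maxBits of them
    static String toBinary(String decimalFraction, int maxBits) {
        BigDecimal x = new BigDecimal(decimalFraction);   // exact decimal value
        StringBuilder out = new StringBuilder("0.");
        for (int i = 0; i < maxBits && x.signum() != 0; i++) {
            x = x.multiply(BigDecimal.valueOf(2));                // multiply by 2
            int bit = x.compareTo(BigDecimal.ONE) >= 0 ? 1 : 0;   // note the integer part
            out.append(bit);
            if (bit == 1) x = x.subtract(BigDecimal.ONE);         // subtract it and repeat
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toBinary("0.625", 20));   // 0.101
        System.out.println(toBinary("0.78125", 20)); // 0.11001
        System.out.println(toBinary("0.1", 12));     // 0.000110011001 (cut off; the pattern repeats)
    }
}
```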

Decimal to Binary
• Can all decimal fractions be expressed exactly in binary? Try $0.1_{10}$
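No: $0.1_{10} = 0.0\overline{0011}_2$ repeats forever, so any fixed number of bits has to round it. One way to see the value that is actually stored is Java's `new BigDecimal(double)`, which converts the binary value exactly (a `float` argument is first widened to `double`, which is also exact). The helper name `exact` is illustrative.

```java
import java.math.BigDecimal;

public class PointOne {
    // Exact decimal expansion of the binary value stored in x
    static String exact(double x) {
        return new BigDecimal(x).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(exact(0.1f)); // 0.100000001490116119384765625 (single precision)
        System.out.println(exact(0.1));  // 0.1000000000000000055511151231257827021181583404541015625 (double)
    }
}
```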

How to Represent Small and Big Numbers in Decimal?

How big is Coronavirus?
[Figure: sizes in meters of a red blood cell, bacteria, the coronavirus, and particles filtered by masks; the values shown range from 0.00001 m down to 0.0000001 m]

What numbers do we need?
• Seconds per nanosecond: $1.0 \times 10^{-9}$
• Seconds per century: $3.15576 \times 10^{9}$
• US national debt (dollars): $1.47 \times 10^{13}$
• Speed of light in cm/s: $2.99792458 \times 10^{10}$
• Gravitational constant: $6.67300 \times 10^{-11}$
• Mass of the sun in kilograms: $1.98892 \times 10^{30}$
• Distance to Andromeda in meters: $2.08 \times 10^{22}$
• Size of a proton in meters: $1.0 \times 10^{-15}$

Scientific Notation for Decimal
• We use scientific notation for big and small numbers
  - Use a single digit to the left of the decimal point
  - Multiply by the base (e.g., 10) raised to some exponent
  - Use e or E to denote the exponent part: $1.0 \times 10^{-15}$ = 1.0e-15 = 1.0E-15
• A normalized number has exactly one nonzero digit to the left of the decimal point (no leading zero)
  - $1.0_{10} \times 10^{-9}$: normalized
  - $0.1_{10} \times 10^{-8}$: not normalized
  - $10.0_{10} \times 10^{-10}$: not normalized
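Java's `%e` format always produces normalized scientific notation, which makes it a convenient check. In this sketch (the helper name is hypothetical), the three spellings from the slide denote the same value, so they parse to the same `double` and print identically.

```java
import java.util.Locale;

public class NormalizedNotation {
    // Format x in normalized scientific notation: one nonzero digit before the point
    static String normalized(double x, int fractionDigits) {
        return String.format(Locale.ROOT, "%." + fractionDigits + "e", x);
    }

    public static void main(String[] args) {
        double a = 1.0e-9, b = 0.1e-8, c = 10.0e-10; // three spellings of the same number
        System.out.println(a == b && b == c);         // true
        System.out.println(normalized(a, 1));         // 1.0e-09
        System.out.println(normalized(b, 1));         // 1.0e-09
        System.out.println(normalized(c, 1));         // 1.0e-09
    }
}
```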

Scientific Notation for Binary
• How do we represent very small and big numbers in binary?
• Binary numbers can be written in scientific notation too:
  $1.0_2 \times 2^{-1}$, $1.1_2 \times 2^{11}$
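Java can print a `double` in binary scientific notation directly: `Double.toHexString` writes the significand in hexadecimal after `0x1.` and the power of two after `p`. A short check against the two examples above (reading the second exponent as decimal 11, so the value is $1.5 \times 2048 = 3072$):

```java
public class BinaryScientific {
    public static void main(String[] args) {
        // Double.toHexString prints 0x1.<fraction in hex>p<power of two>
        System.out.println(Double.toHexString(0.5));    // 0x1.0p-1  i.e. 1.0_2 x 2^-1
        System.out.println(Double.toHexString(3072.0)); // 0x1.8p11  i.e. 1.1_2 x 2^11 (hex 8 = binary 1000)
        System.out.println(Math.getExponent(3072.0));   // 11
    }
}
```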

How to Represent Floating Points?

Floating Point
• The binary point is not fixed, but instead can move based on the exponent
• A normalized binary number always has the form
  $1.xxxxxxx_2 \times 2^{yyyy}$
  - x is the fraction / significand / coefficient / mantissa
  - y is the exponent
  - There is always a 1 to the left of the binary point

Floating Point Standards
• Many options for representing floating point numbers
  - Number of bits for the fraction
  - Number of bits for the exponent
  - How to represent zero?
  - How to represent negative numbers?
• Standards are important for exchanging data

Floating Point Standards
• IEEE 754 is used in nearly all computers today
  - Its two most common formats are single precision (32 bits) and double precision (64 bits)
• In high-level languages, data of these types is called
  - float (single precision)
  - double (double precision)

Single Precision
Sign (1 bit) | Exponent (8 bits) | Fraction (23 bits)
A real number can be described as $(-1)^S \times (1+F) \times 2^E$
• IEEE 754 does not use two's complement: the sign is a separate bit (sign-magnitude), and the exponent is stored in biased form
• Clarification:
  - Fraction refers to the 23-bit number F
  - Mantissa refers to the 24-bit number 1 + F
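To see the three fields of a real `float`, one can pull them out of `Float.floatToIntBits` with shifts and masks. This is a sketch with illustrative method names; note that the exponent field it prints is the stored value, which IEEE 754 keeps with a bias of 127 (so $2^{2}$ is stored as 129). The sample value $-6.5 = -1.101_2 \times 2^{2}$ is arbitrary.

```java
public class FloatFields {
    // Extract the three IEEE 754 single-precision fields from the raw 32-bit pattern
    static int sign(float x)     { return Float.floatToIntBits(x) >>> 31; }          // 1 bit
    static int exponent(float x) { return (Float.floatToIntBits(x) >>> 23) & 0xFF; } // 8 bits, biased by 127
    static int fraction(float x) { return Float.floatToIntBits(x) & 0x7FFFFF; }      // 23 bits: F

    public static void main(String[] args) {
        float x = -6.5f; // -110.1_2 = -1.101_2 x 2^2
        String f = String.format("%23s", Integer.toBinaryString(fraction(x))).replace(' ', '0');
        System.out.println("S = " + sign(x));            // S = 1
        System.out.println("Exponent = " + exponent(x)); // Exponent = 129 (2 + 127)
        System.out.println("Fraction = " + f);           // 101 followed by 20 zeros
    }
}
```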

• Numbers are stored in normalized form. Why?
  Without normalization, one value has many encodings. In base 2:
  $0.0011_2 \times 2^{0} = 0.011_2 \times 2^{-1} = 0.11_2 \times 2^{-2}$
  All of these are the same real number ($0.1875_{10}$), so spending separate encodings on them is wasteful. The normalized form $1.1_2 \times 2^{-3}$ is unique, and because its leading 1 is always present it does not need to be stored (hence $1 + F$).

Biased Notation
Sign (1 bit) | Exponent (8 bits) | Fraction (23 bits)
• In IEEE 754, the actual representation is
  $(-1)^S \times (1 + \text{Fraction}) \times 2^{(\text{Exponent} - \text{Bias})}$
• In single precision, Bias = 127
• Why use a bias?
  - To represent negative exponents without a separate sign
  - To allow easy integer-style comparison/sorting
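A quick Java check of both reasons, relying only on the JDK's `Float.floatToIntBits` and `Float.intBitsToFloat`: the stored exponent is the true exponent plus 127, and for non-negative floats, sorting the raw bit patterns as plain `int`s gives numeric order. (Negative values, with the sign bit set, do not sort correctly this way without extra handling.) Method names are illustrative.

```java
import java.util.Arrays;

public class BiasedOrder {
    // Stored 8-bit exponent field = true exponent + 127
    static int storedExponent(float x) { return (Float.floatToIntBits(x) >>> 23) & 0xFF; }

    // Sort non-negative floats by comparing their raw bit patterns as ordinary ints
    static float[] sortByBits(float[] values) {
        int[] bits = new int[values.length];
        for (int i = 0; i < values.length; i++) bits[i] = Float.floatToIntBits(values[i]);
        Arrays.sort(bits);
        float[] sorted = new float[bits.length];
        for (int i = 0; i < bits.length; i++) sorted[i] = Float.intBitsToFloat(bits[i]);
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(storedExponent(0.5f)); // 126 (2^-1)
        System.out.println(storedExponent(1.0f)); // 127 (2^0)
        System.out.println(storedExponent(8.0f)); // 130 (2^3)
        float[] values = {3.0f, 0.001f, 1e20f, 0.5f, 2.0f};
        System.out.println(Arrays.toString(sortByBits(values))); // ascending numeric order
    }
}
```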

Single Precision Floating Point
Sign (1 bit) | Exponent (8 bits) | Fraction (23 bits)
• Largest number?
• Smallest number?
• How many numbers can we represent?
$(-1)^S \times (1+F) \times 2^E$

Single Precision Floating Point
• Convert -0.75 from decimal to single precision
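One possible worked solution, using $(-1)^S \times (1 + \text{Fraction}) \times 2^{(\text{Exponent} - 127)}$:

```latex
\begin{aligned}
-0.75_{10} &= -\tfrac{3}{4} = -0.11_2 = -1.1_2 \times 2^{-1} \\
S &= 1 \\
\text{Exponent} &= -1 + 127 = 126 = 01111110_2 \\
\text{Fraction} &= 1000\,0000\,0000\,0000\,0000\,000_2 \\
\text{bit pattern} &= 1\;01111110\;10000000000000000000000_2 = \mathtt{BF400000}_{16}
\end{aligned}
```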

Single Precision Floating Point
• Convert 1000000101000…0000 from single precision to decimal:
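A Java sketch that applies the decoding formula by hand and cross-checks it with `Float.intBitsToFloat`. It only handles normalized numbers (stored exponent 1 to 254), and the sample pattern is a hypothetical example, not necessarily the one from the exercise.

```java
public class DecodeSingle {
    // Decode 32 bits with (-1)^S x (1 + F) x 2^(Exponent - 127); normalized numbers only
    static double decode(String bits) {
        String b = bits.replace(" ", "");
        if (b.length() != 32) throw new IllegalArgumentException("need 32 bits");
        int s = b.charAt(0) - '0';
        int exponent = Integer.parseInt(b.substring(1, 9), 2);
        int fractionBits = Integer.parseInt(b.substring(9), 2);
        if (exponent == 0 || exponent == 255)
            throw new IllegalArgumentException("not a normalized number");
        double f = fractionBits / (double) (1 << 23);   // F = fraction bits / 2^23
        return (s == 1 ? -1 : 1) * (1 + f) * Math.pow(2, exponent - 127);
    }

    public static void main(String[] args) {
        String example = "1 10000001 01000000000000000000000"; // hypothetical sample pattern
        System.out.println(decode(example));                    // -5.0
        // Cross-check with the JDK's own interpretation of the same bits
        System.out.println(Float.intBitsToFloat(
                Integer.parseUnsignedInt(example.replace(" ", ""), 2))); // -5.0
    }
}
```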

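Double Precision
Sign (1 bit) | Exponent (11 bits) | Fraction (52 bits)
• More bits!
  - More fraction bits (52 vs. 23): more precision
  - More exponent bits (11 vs. 8): larger range
• Double precision uses a bias of 1023
• Can do more before underflow / overflow
  - Normalized magnitudes from about $2.2 \times 10^{-308}$ to $1.8 \times 10^{308}$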

Double Precision
• Convert 3.25 from decimal to double precision
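A Java sketch that reads the three fields of the `double` back out (method names are illustrative). By hand: $3.25 = 11.01_2 = 1.101_2 \times 2^{1}$, so $S = 0$, the stored exponent is $1 + 1023 = 1024 = 10000000000_2$, and the fraction starts with $101$.

```java
public class DoubleFields {
    // Extract the IEEE 754 double-precision fields from the raw 64-bit pattern
    static long sign(double x)     { return Double.doubleToLongBits(x) >>> 63; }
    static long exponent(double x) { return (Double.doubleToLongBits(x) >>> 52) & 0x7FF; }   // 11 bits
    static long fraction(double x) { return Double.doubleToLongBits(x) & ((1L << 52) - 1); } // 52 bits

    public static void main(String[] args) {
        double x = 3.25; // 11.01_2 = 1.101_2 x 2^1
        System.out.println(sign(x));                                      // 0
        System.out.println(exponent(x));                                  // 1024 = 1 + 1023
        System.out.println(Long.toHexString(fraction(x)));                // a000000000000 (binary 1010 then zeros)
        System.out.println(Long.toHexString(Double.doubleToLongBits(x))); // 400a000000000000
    }
}
```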

Tricky Questions
• What is the largest number that can be represented in single precision?
• What is the smallest number that can be represented in single precision?
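One way to check answers in Java: build each extreme from its bit-level definition and compare it with the JDK constants `Float.MAX_VALUE`, `Float.MIN_NORMAL`, and `Float.MIN_VALUE`. The sketch assumes "smallest" means smallest positive; if it means most negative, the answer is `-Float.MAX_VALUE`.

```java
public class SingleExtremes {
    // Largest finite: stored exponent 254 (2^127), fraction all ones (1 + F = 2 - 2^-23)
    static final float LARGEST = (float) ((2 - Math.pow(2, -23)) * Math.pow(2, 127));
    // Smallest positive normalized: stored exponent 1, fraction 0, i.e. 1.0 x 2^-126
    static final float SMALLEST_NORMALIZED = (float) Math.pow(2, -126);
    // Smallest positive denormalized: only the last fraction bit set, i.e. 2^-149
    static final float SMALLEST_DENORMALIZED = (float) Math.pow(2, -149);

    public static void main(String[] args) {
        System.out.println(LARGEST == Float.MAX_VALUE);               // true
        System.out.println(SMALLEST_NORMALIZED == Float.MIN_NORMAL);  // true
        System.out.println(SMALLEST_DENORMALIZED == Float.MIN_VALUE); // true
        System.out.println(LARGEST + " " + SMALLEST_NORMALIZED + " " + SMALLEST_DENORMALIZED);
        System.out.println(-Float.MAX_VALUE); // most negative finite value
    }
}
```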

Floating Point Arithmetic

Floating Point Addition
• Align the radix points
  - Shift the significand of the number with the smaller exponent right until its exponent matches the larger one
• Add the significands
• Normalize the result
  - What if one number is positive and the other negative? May need to shift a lot!
  - Check for overflow or underflow when shifting!
• Round so the number fits in the available digits/bits
  - If bad luck when rounding, renormalize

Floating Point Addition
Example: 9.999e1 + 1.610e-1 with 4 digits of precision
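The four steps can be simulated in decimal with `java.math.BigDecimal`. This is a sketch under stated assumptions: `Dec` is a hypothetical helper type, rounding is round-half-up, and the shifted operand keeps all its digits until the final rounding (real hardware keeps only a few guard bits). For the example: $9.999 + 0.01610 = 10.01510 \to 1.001510 \times 10^{2} \to 1.002 \times 10^{2}$.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class DecimalFloatAdd {
    // Keep 4 significant decimal digits, rounding half up (an assumed rounding mode)
    static final MathContext FOUR_DIGITS = new MathContext(4, RoundingMode.HALF_UP);

    // Hypothetical helper type: value = sig x 10^exp
    record Dec(BigDecimal sig, int exp) {
        @Override public String toString() { return sig.toPlainString() + "e" + exp; }
    }

    // Shift until exactly one nonzero digit is left of the decimal point
    static Dec normalize(BigDecimal sig, int exp) {
        if (sig.signum() == 0) return new Dec(sig, 0);
        while (sig.abs().compareTo(BigDecimal.TEN) >= 0) { sig = sig.movePointLeft(1); exp++; }
        while (sig.abs().compareTo(BigDecimal.ONE) < 0)  { sig = sig.movePointRight(1); exp--; }
        return new Dec(sig, exp);
    }

    static Dec add(Dec a, Dec b) {
        if (a.exp() < b.exp()) { Dec t = a; a = b; b = t; }            // make a the larger exponent
        BigDecimal aligned = b.sig().movePointLeft(a.exp() - b.exp()); // 1. align radix points
        BigDecimal sum = a.sig().add(aligned);                         // 2. add significands
        Dec n = normalize(sum, a.exp());                               // 3. normalize
        return normalize(n.sig().round(FOUR_DIGITS), n.exp());         // 4. round, renormalize if needed
    }

    public static void main(String[] args) {
        Dec x = new Dec(new BigDecimal("9.999"), 1);
        Dec y = new Dec(new BigDecimal("1.610"), -1);
        System.out.println(add(x, y)); // 1.002e2
    }
}
```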

Floating Point Multiplication
• Add the exponents
  - With biased exponents, subtract the bias once so that it is not counted twice
• Multiply the significands
• Normalize the result (check for overflow or underflow)
• Round to fit in the available digits/bits
  - Normalize again if necessary
• Compute the sign of the result
  - Positive if the signs of the operands match, negative otherwise

Floating Point Multiplication
Example: 1.110e10 times 9.200e-5 with 4 digits of precision
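The same style of decimal simulation works for multiplication. Again `Dec` is a hypothetical helper, rounding is assumed to be half-up, and exponent overflow is not checked. By hand: $10 + (-5) = 5$, $1.110 \times 9.200 = 10.212$, normalize to $1.0212 \times 10^{6}$, round to $1.021 \times 10^{6}$, and the sign is positive.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class DecimalFloatMultiply {
    // Keep 4 significant decimal digits, rounding half up (an assumed rounding mode)
    static final MathContext FOUR_DIGITS = new MathContext(4, RoundingMode.HALF_UP);

    // Hypothetical helper type: value = sig x 10^exp
    record Dec(BigDecimal sig, int exp) {
        @Override public String toString() { return sig.toPlainString() + "e" + exp; }
    }

    // Shift a non-negative significand until one nonzero digit is left of the point
    static Dec normalize(BigDecimal sig, int exp) {
        if (sig.signum() == 0) return new Dec(sig, 0);
        while (sig.compareTo(BigDecimal.TEN) >= 0) { sig = sig.movePointLeft(1); exp++; }
        while (sig.compareTo(BigDecimal.ONE) < 0)  { sig = sig.movePointRight(1); exp--; }
        return new Dec(sig, exp);
    }

    static Dec multiply(Dec a, Dec b) {
        int exp = a.exp() + b.exp();                                 // 1. add exponents
        BigDecimal product = a.sig().abs().multiply(b.sig().abs());  // 2. multiply significands
        Dec n = normalize(product, exp);                             // 3. normalize
        n = normalize(n.sig().round(FOUR_DIGITS), n.exp());          // 4. round, renormalize if needed
        boolean negative = a.sig().signum() * b.sig().signum() < 0;  // 5. sign: negative if signs differ
        return new Dec(negative ? n.sig().negate() : n.sig(), n.exp());
    }

    public static void main(String[] args) {
        Dec x = new Dec(new BigDecimal("1.110"), 10);
        Dec y = new Dec(new BigDecimal("9.200"), -5);
        System.out.println(multiply(x, y)); // 1.021e6
    }
}
```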

Special Cases?

Special symbols
Exponent | Fraction | Object represented
0        | 0        | ±0
0        | Nonzero  | ± denormalized number
1-254    | Anything | ± floating point number
255      | 0        | ± infinity
255      | Nonzero  | NaN (Not a Number)
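A small Java classifier that follows the table row by row, reading the fields with `Float.floatToRawIntBits` (the method name `classify` is illustrative):

```java
public class SpecialPatterns {
    // Classify a float by its exponent and fraction fields, following the table
    static String classify(float x) {
        int bits = Float.floatToRawIntBits(x);
        int exponent = (bits >>> 23) & 0xFF;
        int fraction = bits & 0x7FFFFF;
        if (exponent == 0)   return fraction == 0 ? "zero" : "denormalized";
        if (exponent == 255) return fraction == 0 ? "infinity" : "NaN";
        return "normalized";
    }

    public static void main(String[] args) {
        int[] patterns = {0x00000000, 0x80000000, 0x00000001, 0x3F800000,
                          0x7F800000, 0xFF800000, 0x7FC00000};
        for (int p : patterns) {
            float x = Float.intBitsToFloat(p);
            System.out.printf("0x%08X -> %s (%s)%n", p, classify(x), x);
        }
    }
}
```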

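Denormalized Numbers
• The exponent 00000000 is used to represent a set of numbers in the tiny interval $(-2^{-126}, 2^{-126})$
• This includes the number 0
• These are called denormalized numbers; their value is $(-1)^S \times (0 + F) \times 2^{-126}$ (no hidden 1)
  - Smallest positive normalized is $1.0_2 \times 2^{-126} = 2^{-126}$
  - Smallest positive denormalized is $0.00\ldots01_2 \times 2^{-126} = 2^{-149}$ (only the last of the 23 fraction bits is 1)
• Allows gradual underflow: values below $2^{-126}$ can still be represented, with fewer significant bits, instead of being flushed to zero
• Tricky to implement. We will come back to this topic later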
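Gradual underflow can be observed in Java by repeatedly halving `Float.MIN_NORMAL` ($2^{-126}$). Each halving lands on a denormalized value, down to $2^{-149}$ (`Float.MIN_VALUE`); one more halving rounds to zero. A minimal sketch (the method name is hypothetical):

```java
public class GradualUnderflow {
    // Halve 2^-126 until one more halving would round to zero; count the halvings
    static int halvingsBeforeZero() {
        float x = Float.MIN_NORMAL;   // smallest positive normalized float, 2^-126
        int steps = 0;
        while (x / 2 != 0) {          // x / 2 is still representable as a denormal
            x /= 2;
            steps++;
        }
        return steps;                 // x is now Float.MIN_VALUE, 2^-149
    }

    public static void main(String[] args) {
        System.out.println(halvingsBeforeZero()); // 23: 2^-127 down to 2^-149 are all denormalized
        System.out.println(Float.MIN_VALUE / 2);  // 0.0: below 2^-149 the result underflows to zero
    }
}
```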

Unusual events
• Nonzero divided by zero
  - Not the end of the world!
  - Results in positive or negative infinity
• 0/0 (invalid), or subtracting infinity from infinity
  - Results in NaN
• Notes on NaN
  - Basic arithmetic with a NaN operand produces NaN, and comparisons such as NaN == NaN are false
  - Allows us to avoid tests or decisions until a later time in our program
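These rules can be tried directly in Java, whose `float` and `double` arithmetic follows IEEE 754 here (integer division by zero, by contrast, throws `ArithmeticException`). A short sketch:

```java
public class UnusualEvents {
    public static void main(String[] args) {
        float zero = 0.0f;
        float inf = Float.POSITIVE_INFINITY;
        float nan = zero / zero;                 // 0/0 is invalid -> NaN

        System.out.println(1.0f / zero);         // Infinity
        System.out.println(-1.0f / zero);        // -Infinity
        System.out.println(nan);                 // NaN
        System.out.println(inf - inf);           // NaN
        System.out.println(nan * 0.0f + 1.0f);   // NaN: NaN propagates through arithmetic
        System.out.println(nan == nan);          // false: even NaN == NaN is false
        System.out.println(Float.isNaN(nan));    // true: the reliable test for NaN
    }
}
```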

What can go wrong?

Overflow / Underflow
• Largest number that can be represented in single precision:
  $\pm (2 - 2^{-23}) \times 2^{127} \approx \pm 3.4 \times 10^{38}$
• Smallest positive normalized number in single precision:
  $\pm 1.0 \times 2^{-126} \approx \pm 1.2 \times 10^{-38}$
• Overflow: a result whose magnitude is larger than the largest number above
• Underflow: a nonzero result whose magnitude is smaller than the smallest number above
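A quick Java illustration of both events at the single-precision limits; the last line is only there to show that the wider exponent range of double precision still holds the value that overflowed in single precision.

```java
public class OverUnderflow {
    public static void main(String[] args) {
        float big = Float.MAX_VALUE;   // about 3.4 x 10^38
        float tiny = Float.MIN_VALUE;  // 2^-149, the smallest positive (denormalized) float

        System.out.println(big * 2);   // Infinity: overflow
        System.out.println(tiny / 2);  // 0.0: underflow (rounds to zero)
        // Converting to double first avoids the overflow: double has an 11-bit exponent
        System.out.println((double) big * 2); // finite, about 6.8E38
    }
}
```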

Loss of Precision

Compare these for loops
```java
public class Loops {
    public static void main(String[] args) {
        // Loop 1: integer counter, divided by 10 on each pass
        for (int i = 0; i <= 10; i += 1) {
            System.out.println(i / 10f);
        }
        // Loop 2: float counter, incremented by 0.1f on each pass
        for (float y = 0; y <= 1; y += 0.1f) {
            System.out.println(y);
        }
    }
}
```
Same or different?

Questions
• Represent $0.1_{10}$ in IEEE 754 single precision floating point
• Represent $1.1_{10}$ in IEEE 754 single precision floating point

Review and more information
• Big and small numbers
• Scientific notation
• IEEE 754 floating point standard
• Floating point addition and multiplication
• Material from Section 3.5 of the textbook
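On the two loops: they differ. Loop 1 computes each value fresh, so it prints 11 values ending at exactly 1.0. Loop 2 adds `0.1f` (which is not exactly 0.1) over and over, and the rounding errors accumulate: after ten additions `y` is $1 + 2^{-23}$ (about 1.0000001), which fails `y <= 1`, so only 10 values are printed and 1.0 never appears. This sketch collects both sequences for comparison and also prints the stored bit patterns for the two questions (helper names are illustrative).

```java
import java.util.ArrayList;
import java.util.List;

public class LoopCompare {
    // Loop 1: integer counter; each value is computed fresh as i / 10f
    static List<Float> loop1() {
        List<Float> values = new ArrayList<>();
        for (int i = 0; i <= 10; i += 1) values.add(i / 10f);
        return values;
    }

    // Loop 2: float counter; 0.1f is added repeatedly, so rounding errors accumulate
    static List<Float> loop2() {
        List<Float> values = new ArrayList<>();
        for (float y = 0; y <= 1; y += 0.1f) values.add(y);
        return values;
    }

    public static void main(String[] args) {
        System.out.println(loop1().size() + " values: " + loop1());
        System.out.println(loop2().size() + " values: " + loop2());
        // Stored single-precision bit patterns for the two questions
        System.out.printf("0.1f = 0x%08X%n", Float.floatToIntBits(0.1f)); // 0x3DCCCCCD
        System.out.printf("1.1f = 0x%08X%n", Float.floatToIntBits(1.1f)); // 0x3F8CCCCD
    }
}
```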
