# Floating Point


Agenda

- Fixed-point representations
- Big and small numbers
- Scientific notation
- IEEE 754 floating-point standard
  - Special symbols
  - Underflow and overflow
- Floating-point addition and multiplication
- Material from Section 3.5 of the textbook

How to Represent Real Numbers?

Real Numbers

- Positional notation allows for fractions:

  $a_n a_{n-1} \ldots a_1 a_0 \,.\, a_{-1} a_{-2} \ldots a_{-m}$

- Let's start with a fixed-point representation
  - Choose n and m
  - The radix point is always in the same position
  - Easy to implement
  - Limited range
- Examples: $152.3_{10}$ and $1011.01_2$
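The value of a positional numeral in radix $r$ is the sum of each digit times its positional weight. Written out for the two examples above:

$$v = \sum_{i=-m}^{n} a_i \, r^{i}$$

$$152.3_{10} = 1 \cdot 10^{2} + 5 \cdot 10^{1} + 2 \cdot 10^{0} + 3 \cdot 10^{-1}$$

$$1011.01_2 = 1 \cdot 2^{3} + 0 \cdot 2^{2} + 1 \cdot 2^{1} + 1 \cdot 2^{0} + 0 \cdot 2^{-1} + 1 \cdot 2^{-2} = 11.25_{10}$$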

Binary to Decimal

- Integers scaled by an appropriate factor
- Direct expansion with positional weights
- Example: $0.11001_2$
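A minimal Java sketch of the two methods listed above, applied to the digits after the point of $0.11001_2$. The class and method names are ours, and the input is assumed to be a short string of fraction bits (a few dozen at most, so the `double` result stays exact).

```java
final class BinaryFraction {
    // Direct expansion: the k-th bit after the point has weight 2^-k.
    static double directExpansion(String bits) {
        double value = 0.0;
        double weight = 0.5;
        for (char c : bits.toCharArray()) {
            if (c == '1') {
                value += weight;
            }
            weight /= 2;
        }
        return value;
    }

    // Integer scaling: read the bits as an integer, then divide by 2^(number of bits).
    static double integerScaling(String bits) {
        long asInteger = Long.parseLong(bits, 2);
        return asInteger / (double) (1L << bits.length());
    }

    public static void main(String[] args) {
        String bits = "11001"; // the digits after the point in 0.11001_2
        System.out.println(directExpansion(bits)); // 0.78125  (0.5 + 0.25 + 0.03125)
        System.out.println(integerScaling(bits));  // 0.78125  (25 / 32)
    }
}
```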

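Binary to Hexadecimal

- Use the same trick as before: group the bits in fours, starting at the binary point, and pad the last group on the right with zeros
- Examples: $0.110101001_2$ and $0.2BE_{16}$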
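A sketch of the grouping rule for the digits after the point, in both directions. The helper names are hypothetical; the code handles only the fractional part and assumes valid input.

```java
final class BinaryToHex {
    // Group fraction bits in fours from the binary point; pad the last group with zeros
    // on the right (trailing zeros after the point do not change the value).
    static String fractionToHex(String bits) {
        StringBuilder padded = new StringBuilder(bits);
        while (padded.length() % 4 != 0) {
            padded.append('0');
        }
        StringBuilder hex = new StringBuilder("0.");
        for (int i = 0; i < padded.length(); i += 4) {
            int nibble = Integer.parseInt(padded.substring(i, i + 4), 2);
            hex.append(Character.toUpperCase(Character.forDigit(nibble, 16)));
        }
        return hex.toString();
    }

    // Reverse direction: each hex digit expands to exactly four bits.
    static String hexFractionToBinary(String hexDigits) {
        StringBuilder bin = new StringBuilder("0.");
        for (char c : hexDigits.toCharArray()) {
            String four = Integer.toBinaryString(Character.digit(c, 16));
            bin.append("0".repeat(4 - four.length())).append(four);
        }
        return bin.toString();
    }

    public static void main(String[] args) {
        System.out.println(fractionToHex("110101001")); // 0.D48  (1101 0100 1000)
        System.out.println(hexFractionToBinary("2BE")); // 0.001010111110
    }
}
```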

Decimal to Binary

- Multiply by 2 and note the integer part
- Subtract the integer part and repeat until no fraction is left
- Example: $0.625_{10}$
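A sketch of the multiply-by-2 loop. It takes the fraction as a Java `double`, which is exact for $0.625$ but is itself only an approximation for fractions such as $0.1$; `maxBits` is our own cap on the number of bits produced.

```java
final class DecimalToBinary {
    // Repeatedly double the fraction; the integer part (0 or 1) is the next bit.
    static String fractionToBinary(double fraction, int maxBits) {
        StringBuilder bits = new StringBuilder("0.");
        for (int i = 0; i < maxBits && fraction != 0.0; i++) {
            fraction *= 2;
            int integerPart = (int) fraction; // 0 or 1, since fraction < 1 before doubling
            bits.append(integerPart);
            fraction -= integerPart;          // keep only the fractional part
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        // 0.625 -> 1.25 (1), 0.25 -> 0.5 (0), 0.5 -> 1.0 (1), nothing left
        System.out.println(fractionToBinary(0.625, 32)); // 0.101
    }
}
```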

Decimal to Binary

- Can all decimal fractions be expressed exactly in binary? Example: $0.1_{10}$
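The answer is no: a fraction in lowest terms has a terminating binary expansion only if its denominator is a power of 2, and $0.1_{10} = 1/10$ is not. The sketch below (names are ours) runs the same multiply-by-2 procedure on an exact fraction `numerator/denominator` using integer arithmetic, and marks the repeating block in parentheses once a remainder recurs.

```java
import java.util.HashMap;
import java.util.Map;

final class RepeatingBinary {
    // Expands numerator/denominator (0 <= value < 1) in binary with exact integer arithmetic.
    static String expand(long numerator, long denominator) {
        StringBuilder bits = new StringBuilder();
        Map<Long, Integer> seenAt = new HashMap<>(); // remainder -> bit position where it appeared
        long remainder = numerator;
        while (remainder != 0 && !seenAt.containsKey(remainder)) {
            seenAt.put(remainder, bits.length());
            remainder *= 2;
            bits.append(remainder / denominator); // next bit: integer part after doubling
            remainder %= denominator;             // subtract the integer part
        }
        if (remainder == 0) {
            return "0." + bits;                   // terminating expansion
        }
        int start = seenAt.get(remainder);        // a remainder recurred: the bits cycle from here
        return "0." + bits.substring(0, start) + "(" + bits.substring(start) + ")";
    }

    public static void main(String[] args) {
        System.out.println(expand(5, 8));  // 0.101       (0.625 terminates)
        System.out.println(expand(1, 10)); // 0.0(0011)   (0.1 repeats forever)
    }
}
```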

How to Represent Small and Big Numbers in Decimal?

How big is Coronavirus?

| Particle | Size (meters) |
|---|---|
| PM10 | 0.00001 |
| Red blood cell | 0.000007 |
| PM2.5 | 0.0000025 |
| Bacteria | 0.0000005 |
| Coronavirus | 0.0000001 |
| Particles filtered by masks | 0.000000007 |

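What numbers do we need?

| Quantity | Value |
|---|---|
| π | 3.141592… |
| e | 2.71828… |
| Seconds per nanosecond | $1.0 \times 10^{-9}$ |
| Seconds per century | $3.15576 \times 10^{9}$ |
| US national debt | $1.47 \times 10^{13}$ |
| Speed of light in cm/s | $2.99792458 \times 10^{10}$ |
| Gravitational constant | $6.67300 \times 10^{-11}$ |
| Mass of the sun in kilograms | $1.98892 \times 10^{30}$ |
| Distance to Andromeda in meters | $2.08 \times 10^{22}$ |
| Size of a proton in meters | $1.0 \times 10^{-15}$ |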

Scientific Notation for Decimal

- We use scientific notation for big and small numbers
  - Use a single digit to the left of the decimal point
  - Multiply by the base (e.g., 10) raised to some exponent
  - Use e or E to denote the exponent part: $1.0 \times 10^{-15}$ = 1.0e-15 = 1.0E-15
- A normalized number has exactly one nonzero digit to the left of the decimal point (no leading zero)
  - $1.0_{10} \times 10^{-9}$: normalized
  - $0.1_{10} \times 10^{-8}$: not normalized (leading zero)
  - $10.0_{10} \times 10^{-10}$: not normalized (two digits before the point)
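A small Java illustration of e-notation and normalization. `normalized` is a hypothetical helper built on the standard `String.format` `%e` conversion, which always prints one nonzero digit before the point for nonzero values (Java pads the exponent to at least two digits); `Locale.ROOT` keeps `.` as the decimal separator.

```java
import java.util.Locale;

final class SciNotation {
    // Formats x in normalized scientific notation with the given digits after the point.
    static String normalized(double x, int digitsAfterPoint) {
        return String.format(Locale.ROOT, "%." + digitsAfterPoint + "e", x);
    }

    public static void main(String[] args) {
        // e and E are interchangeable in Java floating-point literals
        System.out.println(1.0e-15 == 1.0E-15);       // true
        // The three forms from the slide denote the same value; formatting normalizes them
        System.out.println(normalized(1.0e-9, 1));    // 1.0e-09
        System.out.println(normalized(0.1e-8, 1));    // 1.0e-09
        System.out.println(normalized(10.0e-10, 1));  // 1.0e-09
    }
}
```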

Scientific Notation for Binary

- How do we represent very small and big numbers in binary?
- Binary numbers can be written in scientific notation too: $1.0_2 \times 2^{-1}$, $1.1_2 \times 2^{11}$

How to Represent Floating Points?

Floating Point

- The binary point is not fixed, but instead can move based on the exponent
- A normalized binary number always has the form $1.xxxxxxx_2 \times 2^{yyyy}$
  - x is the fraction / significand / coefficient / mantissa
  - y is the exponent
  - There is always a 1 to the left of the binary point
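A sketch that recovers the normalized form $1.xxx_2 \times 2^{y}$ of a Java `float` using the standard `Math.getExponent` and `Math.scalb` methods. It assumes a positive, normal input (not zero, denormalized, infinite, or NaN); the class name is ours.

```java
final class NormalizedForm {
    // Writes x as significand x 2^e, where e comes from Math.getExponent and the
    // significand is x scaled by 2^-e (exact, since only the exponent changes).
    static String describe(float x) {
        int e = Math.getExponent(x);
        float significand = Math.scalb(x, -e); // in [1, 2) for a positive normal x
        return significand + " x 2^" + e;
    }

    public static void main(String[] args) {
        System.out.println(describe(6.5f));   // 1.625 x 2^2   (110.1_2 = 1.101_2 x 2^2)
        System.out.println(describe(0.375f)); // 1.5 x 2^-2    (0.011_2 = 1.1_2 x 2^-2)
    }
}
```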

Floating Point Standards

- Many options for representing floating point
  - Number of bits for the fraction
  - Number of bits for the exponent
  - How to represent zero?
  - How to represent negative numbers?
- Standards are important for exchanging data

Floating Point Standards

- IEEE 754 is used in nearly all computers today
  - Its two most widely used binary formats are:
    - single precision (32 bits)
    - double precision (64 bits)
- In high-level languages, data of these types is called
  - float (single precision)
  - double (double precision)

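Single Precision

| S (sign) | Exponent | Fraction |
|---|---|---|
| 1 bit | 8 bits | 23 bits |

- A real number can be described as $(-1)^S \times (1 + F) \times 2^E$
- IEEE 754 does not use two's complement: the sign is a separate bit (sign and magnitude)
- Clarification:
  - Fraction refers to the 23-bit number F
  - Mantissa refers to the 24-bit number 1 + F
- Numbers are in normalized form. Why? In base 2, the leading digit of a normalized number is always 1, so it does not need to be stored: 23 stored fraction bits give a 24-bit mantissa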
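To see the three fields of an actual value, a sketch using the standard `Float.floatToRawIntBits` and bit masks (the class name is ours). Note that the stored exponent field is biased, so for $-1.5 = -1.1_2 \times 2^{0}$ it holds 127 rather than the E = 0 of the formula.

```java
final class SingleFields {
    // 1-bit sign, 8-bit exponent field, 23-bit fraction, read straight from the raw bits.
    static int sign(float x)     { return Float.floatToRawIntBits(x) >>> 31; }
    static int exponent(float x) { return (Float.floatToRawIntBits(x) >>> 23) & 0xFF; }
    static int fraction(float x) { return Float.floatToRawIntBits(x) & 0x7FFFFF; }

    public static void main(String[] args) {
        float x = -1.5f; // -1.1_2 x 2^0
        System.out.println(sign(x));                             // 1
        System.out.println(exponent(x));                         // 127 (biased exponent field)
        System.out.println(Integer.toBinaryString(fraction(x))); // 10000000000000000000000
    }
}
```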

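Single Precision

Without normalization, the same real number has several encodings (the exponents below are written in two's complement):

| Value | S | Exponent | Fraction |
|---|---|---|---|
| $0.0011 \times 2^{0}$ | 0 | 0000000 | 001100… |
| $0.011 \times 2^{-1}$ | 0 | 1111111 | 011000… |
| $0.11 \times 2^{-2}$ | 0 | 1111110 | 110000… |

All are equivalent to the same real number, so the encoding is wasteful.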

Biased Notation

- In IEEE 754, the actual representation is

  $(-1)^S \times (1 + \text{Fraction}) \times 2^{(\text{Exponent} - \text{Bias})}$

- In single precision, Bias = 127
- Why a bias?
  - To represent negative exponents without a separate sign
  - We want easy integer-style comparison/sorting
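A sketch (helper names are ours) that reads the biased exponent field of a few floats and checks the ordering property for non-negative values. For negative floats the sign bit breaks plain integer ordering, so the check is limited to non-negative inputs.

```java
final class BiasedExponent {
    static final int BIAS = 127;

    // The 8-bit exponent field as stored: actual exponent + 127.
    static int exponentField(float x) {
        return (Float.floatToIntBits(x) >>> 23) & 0xFF;
    }

    // Undo the bias to get the actual exponent (valid for normalized values).
    static int actualExponent(float x) {
        return exponentField(x) - BIAS;
    }

    // Compares the raw bit patterns as signed ints.
    static boolean bitsOrdered(float a, float b) {
        return Float.floatToIntBits(a) < Float.floatToIntBits(b);
    }

    public static void main(String[] args) {
        System.out.println(exponentField(0.5f) + " " + actualExponent(0.5f)); // 126 -1
        System.out.println(exponentField(1.0f) + " " + actualExponent(1.0f)); // 127 0
        System.out.println(exponentField(4.0f) + " " + actualExponent(4.0f)); // 129 2

        // Exponent field above the fraction + bias: non-negative floats sort like their bits
        float[] ascending = {0.001f, 0.5f, 1.0f, 3.25f, 1.0e20f};
        for (int i = 1; i < ascending.length; i++) {
            System.out.println(bitsOrdered(ascending[i - 1], ascending[i])); // true each time
        }
    }
}
```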

Single Precision Floating Point

- Largest number?
- Smallest number?
- How many numbers can we represent?

$(-1)^S \times (1 + F) \times 2^E$
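One way to count, assuming the special encodings from the Special symbols table later in this section: exponent field 255 is reserved for infinities (fraction 0) and NaNs (fraction nonzero). The arithmetic below follows from the field widths alone; the class name is ours.

```java
final class SingleCount {
    static final long ALL_PATTERNS = 1L << 32;                  // every 32-bit pattern
    static final long EXPONENT_255 = 2L * (1L << 23);           // 2 signs x 2^23 fractions
    static final long INFINITIES = 2;                           // +infinity and -infinity
    static final long NANS = EXPONENT_255 - INFINITIES;         // fraction != 0
    static final long FINITE = ALL_PATTERNS - EXPONENT_255;     // normalized, denormalized, +0, -0
    static final long DISTINCT_FINITE_VALUES = FINITE - 1;      // +0 and -0 are the same value

    public static void main(String[] args) {
        System.out.println("bit patterns:           " + ALL_PATTERNS);           // 4294967296
        System.out.println("NaN patterns:           " + NANS);                   // 16777214
        System.out.println("finite patterns:        " + FINITE);                 // 4278190080
        System.out.println("distinct finite values: " + DISTINCT_FINITE_VALUES); // 4278190079
    }
}
```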

Single Precision Floating Point

- Convert -0.75 from decimal to single precision
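Working it by hand: $-0.75 = -0.11_2 = -1.1_2 \times 2^{-1}$, so S = 1, the exponent field is $-1 + 127 = 126 = 01111110_2$, and the fraction is $1000\ldots0$. A sketch that checks this against Java's own encoding (`fields` is a hypothetical helper):

```java
final class EncodeSingle {
    // Formats a float's bits as "S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF".
    static String fields(float x) {
        String bits = String.format("%32s", Integer.toBinaryString(Float.floatToIntBits(x)))
                .replace(' ', '0'); // left-pad to 32 bits
        return bits.charAt(0) + " " + bits.substring(1, 9) + " " + bits.substring(9);
    }

    public static void main(String[] args) {
        System.out.println(fields(-0.75f)); // 1 01111110 10000000000000000000000
        System.out.println(Integer.toHexString(Float.floatToIntBits(-0.75f))); // bf400000
    }
}
```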

Single Precision Floating Point

- Convert 1000000101000…0000 from single precision to decimal:
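A sketch that applies the decoding rules by hand and cross-checks them with the standard `Float.intBitsToFloat`. `decode` is our own helper, and the bit pattern in `main` is a hypothetical example chosen for illustration, not the one on the slide.

```java
final class DecodeSingle {
    // Decodes a 32-character bit string with the single-precision rules.
    // Handles normalized and denormalized values; exponent field 255 is rejected.
    static double decode(String bits) {
        int s = bits.charAt(0) - '0';
        int exponentField = Integer.parseInt(bits.substring(1, 9), 2);
        int fractionField = Integer.parseInt(bits.substring(9), 2);
        double f = fractionField / (double) (1 << 23); // F as a value in [0, 1)
        double magnitude;
        if (exponentField == 0) {
            magnitude = Math.scalb(f, -126);                        // denormalized: 0.F x 2^-126
        } else if (exponentField < 255) {
            magnitude = Math.scalb(1.0 + f, exponentField - 127);  // normalized: 1.F x 2^(E-127)
        } else {
            throw new IllegalArgumentException("exponent field 255: infinity or NaN");
        }
        return s == 1 ? -magnitude : magnitude;
    }

    public static void main(String[] args) {
        // Hypothetical example: 0 10000001 101000... = +1.101_2 x 2^2 = 6.5
        String bits = "0" + "10000001" + "101" + "0".repeat(20);
        System.out.println(decode(bits));                                             // 6.5
        System.out.println(Float.intBitsToFloat(Integer.parseUnsignedInt(bits, 2)));  // 6.5
    }
}
```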

Double Precision

| S (sign) | Exponent | Fraction |
|---|---|---|
| 1 bit | 11 bits | 52 bits |

- More bits!
- More precision
- Double precision uses a bias of 1023
- Can do more before underflow / overflow
  - Approximately 1E-308 to 1E308

Double Precision

- Convert 3.25 from decimal to double precision
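By hand: $3.25 = 11.01_2 = 1.101_2 \times 2^{1}$, so S = 0, the 11-bit exponent field is $1 + 1023 = 1024 = 10000000000_2$, and the 52-bit fraction is $1010\ldots0$. A sketch that checks the fields with the standard `Double.doubleToLongBits` (the class name is ours):

```java
final class EncodeDouble {
    // Sign (1 bit), exponent field (11 bits), fraction (52 bits).
    static long sign(double x)     { return Double.doubleToLongBits(x) >>> 63; }
    static long exponent(double x) { return (Double.doubleToLongBits(x) >>> 52) & 0x7FF; }
    static long fraction(double x) { return Double.doubleToLongBits(x) & 0xFFFFFFFFFFFFFL; }

    public static void main(String[] args) {
        double x = 3.25;
        System.out.println(sign(x));                                      // 0
        System.out.println(exponent(x));                                  // 1024
        System.out.println(Long.toHexString(fraction(x)));                // a000000000000
        System.out.println(Long.toHexString(Double.doubleToLongBits(x))); // 400a000000000000
    }
}
```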

Tricky Questions

- What is the largest number that can be represented in single precision?
- What is the smallest number that can be represented in single precision?
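A sketch that prints the relevant standard `Float` constants and checks them against their bit patterns. The largest finite value has exponent field 254 and an all-ones fraction; the smallest positive value overall is denormalized, a case covered under Denormalized Numbers in this section.

```java
final class SingleExtremes {
    public static void main(String[] args) {
        // Largest finite: 0 11111110 111...1 = (2 - 2^-23) x 2^127, about 3.4 x 10^38
        System.out.println(Float.MAX_VALUE);                                             // 3.4028235E38
        System.out.println(Integer.toHexString(Float.floatToIntBits(Float.MAX_VALUE)));  // 7f7fffff
        System.out.println(Float.MAX_VALUE == Math.scalb(2.0f - Math.scalb(1.0f, -23), 127)); // true

        // Smallest positive normalized: 0 00000001 000...0 = 2^-126
        System.out.println(Integer.toHexString(Float.floatToIntBits(Float.MIN_NORMAL))); // 800000
        System.out.println(Float.MIN_NORMAL == Math.scalb(1.0f, -126));                  // true

        // Smallest positive value overall (denormalized): 0 00000000 000...1 = 2^-149
        System.out.println(Integer.toHexString(Float.floatToIntBits(Float.MIN_VALUE)));  // 1
        System.out.println(Float.MIN_VALUE == Math.scalb(1.0f, -149));                   // true
    }
}
```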

Floating Point Arithmetic

Floating Point Addition

- Align the radix points
  - Shift the significand of the smaller number right until its exponent matches the larger one's
- Add the significands
- Normalize the result
  - What if one number is positive and the other negative? May need to shift a lot!
  - Check for overflow or underflow when shifting!
- Round so the number fits in the available digits/bits
  - If bad luck when rounding, renormalize

Floating Point Addition

- Example: 9.999e1 + 1.610e-1 with 4 digits of precision
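By hand: align $1.610 \times 10^{-1}$ to $0.01610 \times 10^{1}$, add to get $10.01510 \times 10^{1}$, normalize to $1.001510 \times 10^{2}$, and round to $1.002 \times 10^{2}$. The sketch below is a toy model of those four steps that uses `java.math.BigDecimal` for a 4-digit decimal significand. It rests on our own assumptions (round half to even, exact alignment with no guard-digit limit) and is not how hardware implements addition; the method name is hypothetical.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

final class DecimalFloatAdd {
    // Assumed toy format: value = significand x 10^exponent, significand kept to 4 digits.
    static final MathContext FOUR_DIGITS = new MathContext(4, RoundingMode.HALF_EVEN);

    static String add(String sigA, int expA, String sigB, int expB) {
        BigDecimal a = new BigDecimal(sigA);
        BigDecimal b = new BigDecimal(sigB);
        if (expA < expB) { // make a the operand with the larger exponent
            BigDecimal t = a; a = b; b = t;
            int e = expA; expA = expB; expB = e;
        }
        // 1. Align the radix points: shift the smaller operand's significand right
        b = b.movePointLeft(expA - expB);
        // 2. Add the significands
        BigDecimal sig = a.add(b);
        int exp = expA;
        if (sig.signum() == 0) {
            return "0";
        }
        // 3. Normalize: move the point so exactly one nonzero digit is to its left
        int shift = sig.precision() - sig.scale() - 1;
        sig = sig.movePointLeft(shift);
        exp += shift;
        // 4. Round to 4 digits; a carry (e.g. 9.9995 -> 10.00) forces a renormalization
        sig = sig.round(FOUR_DIGITS);
        if (sig.abs().compareTo(BigDecimal.TEN) >= 0) {
            sig = sig.movePointLeft(1).round(FOUR_DIGITS);
            exp += 1;
        }
        return sig.toPlainString() + "e" + exp;
    }

    public static void main(String[] args) {
        System.out.println(add("9.999", 1, "1.610", -1)); // 1.002e2
    }
}
```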

Floating Point Multiplication

- Add the exponents
  - With biased exponents, subtract the bias once: $\text{Exponent}_a + \text{Exponent}_b - \text{Bias}$
- Multiply the significands
- Normalize the result (check for overflow)
- Round to fit in the available digits/bits
  - Normalize again if necessary
- Compute the sign of the result
  - Positive if the signs of the operands match, negative otherwise
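At the bit level, the first step works on biased exponent fields, so adding them counts the bias twice. A sketch with hypothetical helper names; it only checks the exponent before normalization, so when the significands multiply to 2 or more (e.g. $1.5 \times 1.5$) normalization adds 1 afterwards.

```java
final class BiasedExponentProduct {
    static final int BIAS = 127;

    static int exponentField(float x) {
        return (Float.floatToIntBits(x) >>> 23) & 0xFF;
    }

    // Sum of the two biased fields, with the extra bias removed once.
    static int productExponentField(float a, float b) {
        return exponentField(a) + exponentField(b) - BIAS;
    }

    public static void main(String[] args) {
        float a = 2.0f, b = 4.0f; // 1.0 x 2^1 and 1.0 x 2^2
        System.out.println(exponentField(a) + exponentField(b)); // 257: bias counted twice
        System.out.println(productExponentField(a, b));          // 130 = 3 + 127
        System.out.println(exponentField(a * b));                // 130: the field of 8.0f
    }
}
```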

Floating Point Multiplication

- Example: 1.110e10 × 9.200e-5 with 4 digits of precision
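By hand: the exponents add to $10 + (-5) = 5$, the significands multiply to $1.110 \times 9.200 = 10.212$, the result normalizes to $1.0212 \times 10^{6}$ and rounds to $1.021 \times 10^{6}$, and the sign is positive. A sketch of the same steps in a toy `BigDecimal` model (our own assumptions: 4 significant digits, round half to even; the method name is hypothetical):

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

final class DecimalFloatMultiply {
    static final MathContext FOUR_DIGITS = new MathContext(4, RoundingMode.HALF_EVEN);

    static String multiply(String sigA, int expA, String sigB, int expB) {
        BigDecimal a = new BigDecimal(sigA);
        BigDecimal b = new BigDecimal(sigB);
        // 1. Add the exponents
        int exp = expA + expB;
        // 2. Multiply the significands (magnitudes only; the sign is handled last)
        BigDecimal sig = a.abs().multiply(b.abs());
        if (sig.signum() == 0) {
            return "0";
        }
        // 3. Normalize (for normalized inputs the product is in [1, 100), so shift is 0 or 1)
        int shift = sig.precision() - sig.scale() - 1;
        sig = sig.movePointLeft(shift);
        exp += shift;
        // 4. Round to 4 digits, normalizing again if rounding carries
        sig = sig.round(FOUR_DIGITS);
        if (sig.compareTo(BigDecimal.TEN) >= 0) {
            sig = sig.movePointLeft(1).round(FOUR_DIGITS);
            exp += 1;
        }
        // 5. Sign: positive if the operand signs match, negative otherwise
        boolean negative = a.signum() * b.signum() < 0;
        return (negative ? "-" : "") + sig.toPlainString() + "e" + exp;
    }

    public static void main(String[] args) {
        System.out.println(multiply("1.110", 10, "9.200", -5));  // 1.021e6
        System.out.println(multiply("-1.110", 10, "9.200", -5)); // -1.021e6
    }
}
```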

Special Cases?

Special symbols

| Exponent | Fraction | Object represented |
|---|---|---|
| 0 | 0 | 0 |
| 0 | Nonzero | ± denormalized number |
| 1 to 254 | Anything | ± floating-point number |
| 255 | 0 | ± infinity |
| 255 | Nonzero | NaN (Not a Number) |
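A sketch that applies this table to a Java `float` by inspecting only its raw exponent and fraction fields (`classify` is a hypothetical helper):

```java
final class SpecialValues {
    static String classify(float x) {
        int bits = Float.floatToRawIntBits(x);
        int exponent = (bits >>> 23) & 0xFF;
        int fraction = bits & 0x7FFFFF;
        if (exponent == 0) {
            return fraction == 0 ? "zero" : "denormalized";
        }
        if (exponent == 255) {
            return fraction == 0 ? "infinity" : "NaN";
        }
        return "normalized"; // exponent 1 to 254, any fraction
    }

    public static void main(String[] args) {
        System.out.println(classify(0.0f));                    // zero
        System.out.println(classify(Float.MIN_VALUE));         // denormalized
        System.out.println(classify(-0.75f));                  // normalized
        System.out.println(classify(Float.NEGATIVE_INFINITY)); // infinity
        System.out.println(classify(Float.NaN));               // NaN
    }
}
```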

Denormalized Numbers

- The exponent field 00000000 is used to represent a set of numbers in the tiny interval $(-2^{-126}, 2^{-126})$
- This includes the number 0
- These are called denormalized numbers; their value is $(-1)^S \times (0 + F) \times 2^{-126}$ (no hidden 1)
  - Smallest normalized: $1.0 \times 2^{-126} = 2^{-126}$
  - Smallest denormalized: $0.00\ldots01 \times 2^{-126} = 2^{-149}$
- Lets results very close to 0 keep some precision instead of being flushed to 0 (gradual underflow)
- Tricky to implement. We will come back to this topic later
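A sketch of gradual underflow with Java's standard `Float.MIN_NORMAL` and `Float.MIN_VALUE`: halving the smallest normalized number gives a denormalized value rather than 0. `Math.getExponent` is documented to return `Float.MIN_EXPONENT - 1` for denormalized (subnormal) arguments.

```java
final class Denormals {
    public static void main(String[] args) {
        float smallestNormal = Float.MIN_NORMAL; // 1.0 x 2^-126, exponent field 00000001
        float half = smallestNormal / 2;         // 2^-127, below the normalized range

        // Stored as 0.1_2 x 2^-126: exponent field 0, fraction bit 22 set
        System.out.println(half > 0);                                        // true
        System.out.println(Integer.toHexString(Float.floatToIntBits(half))); // 400000
        System.out.println(Math.getExponent(half));                          // -127

        // Smallest denormalized value: 0.000...01 x 2^-126 = 2^-149, raw bits equal to 1
        System.out.println(Float.floatToIntBits(Float.MIN_VALUE));           // 1
    }
}
```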

Unusual events

- Nonzero divided by zero
  - Not the end of the world!
  - Results in positive or negative infinity
- 0/0 (invalid), or subtracting infinity from infinity
  - Results in NaN
- Notes on NaN
  - Arithmetic on a NaN always results in NaN
  - Allows us to avoid tests or decisions until a later time in our program
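A short Java demonstration of these events. It uses `float` on purpose: Java integer division by zero throws `ArithmeticException`, whereas floating-point division follows IEEE 754.

```java
final class UnusualEvents {
    public static void main(String[] args) {
        float zero = 0.0f;
        System.out.println(1.0f / zero);   // Infinity
        System.out.println(-1.0f / zero);  // -Infinity
        System.out.println(zero / zero);   // NaN (invalid operation)

        float inf = Float.POSITIVE_INFINITY;
        System.out.println(inf - inf);     // NaN

        // NaN propagates, so the check can wait until the end of a computation
        float nan = zero / zero;
        float result = (nan * 2.0f + 1.0f) / 3.0f;
        System.out.println(Float.isNaN(result)); // true
        System.out.println(nan == nan);          // false: NaN is unequal even to itself
    }
}
```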

What can go wrong?

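Overflow / Underflow

- Largest number that can be represented in single precision:
  - $\pm(2 - 2^{-23}) \times 2^{127} \approx \pm 3.4 \times 10^{38}$
- Smallest positive normalized number that can be represented in single precision:
  - $1.0 \times 2^{-126} \approx 1.2 \times 10^{-38}$ (denormalized numbers extend this down to $2^{-149} \approx 1.4 \times 10^{-45}$)
- Overflow: a result whose magnitude is larger than the largest number above
- Underflow: a nonzero result whose magnitude is smaller than the smallest number above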
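A sketch of both events in Java single precision, with double precision for comparison:

```java
final class OverflowUnderflow {
    public static void main(String[] args) {
        // Overflow: the exact product is far above Float.MAX_VALUE, so it rounds to infinity
        float big = Float.MAX_VALUE;
        System.out.println(big * 2.0f);         // Infinity

        // Underflow: 2^-149 / 4 = 2^-151 is nonzero but below half the smallest
        // denormalized value, so it rounds to 0
        float tiny = Float.MIN_VALUE;
        System.out.println(tiny / 4.0f);        // 0.0

        // Double precision has a wider exponent range, so the same product stays finite
        System.out.println((double) big * 2.0); // finite, about 6.8E38
    }
}
```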

Loss of Precision

https://imgur.com/r/totallynotrobots/lsNcv

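Compare these for loops

```java
public class CompareLoops {
    public static void main(String[] args) {
        // Loop 1: integer counter; each printed value is computed fresh as i / 10
        for (int i = 0; i <= 10; i += 1) {
            System.out.println(i / 10f);
        }
        // Loop 2: float counter; 0.1f is added on every iteration
        for (float y = 0; y <= 1; y += 0.1f) {
            System.out.println(y);
        }
    }
}
```

Same or different?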
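A sketch that replaces the printing with counting to compare the two loops (method names are ours). The integer loop computes each value fresh from i, while the float loop repeatedly adds 0.1f, which is not exactly 0.1, so the rounding errors accumulate and y passes 1 one step early.

```java
final class LoopCompare {
    static int countIntLoop() {
        int count = 0;
        for (int i = 0; i <= 10; i += 1) {
            count++; // the original prints i / 10f here
        }
        return count;
    }

    static int countFloatLoop() {
        int count = 0;
        for (float y = 0; y <= 1; y += 0.1f) {
            count++; // the original prints y here
        }
        return count;
    }

    // The value of y that finally fails the test y <= 1
    static float floatLoopExitValue() {
        float y = 0;
        while (y <= 1) {
            y += 0.1f;
        }
        return y;
    }

    public static void main(String[] args) {
        System.out.println(countIntLoop());       // 11
        System.out.println(countFloatLoop());     // 10
        System.out.println(floatLoopExitValue()); // 1.0000001 (one step above 1.0f)
    }
}
```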
Questions

- Represent $0.1_{10}$ in IEEE 754 single-precision floating point
- Represent $1.1_{10}$ in IEEE 754 single-precision floating point
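A sketch for checking answers to these questions: it prints each value's bit pattern and the exact decimal value of the nearest float, using the standard `new BigDecimal(double)` constructor (a `float` widens to `double` without loss). The helper names are ours. In fields, 0.1f is 0 01111011 10011001100110011001101 and 1.1f is 0 01111111 00011001100110011001101; both fractions are the repeating pattern 0011 rounded up in the last bit.

```java
import java.math.BigDecimal;

final class ExactStored {
    // Exact decimal value of the float actually stored.
    static String exactValue(float x) {
        return new BigDecimal(x).toPlainString();
    }

    static String hexBits(float x) {
        return Integer.toHexString(Float.floatToIntBits(x));
    }

    public static void main(String[] args) {
        System.out.println(hexBits(0.1f) + "  " + exactValue(0.1f));
        // 3dcccccd  0.100000001490116119384765625
        System.out.println(hexBits(1.1f) + "  " + exactValue(1.1f));
        // 3f8ccccd  1.10000002384185791015625
    }
}
```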
Review and more information

- Big and small numbers
- Scientific notation
- IEEE 754 floating-point standard
- Floating-point addition and multiplication
- Material from Section 3.5 of the textbook