L.A., Friday, 29-Mar-2024 17:36:03 AEDT
Instructions:
Topics discussed in these lecture notes are examinable
unless otherwise indicated.
You need to follow instructions,
take more notes &
draw diagrams especially as [indicated] or
as done in lectures,
work through examples, and
do extra reading.
Hyper-links not in [square brackets] are mainly for revision,
for further reading, and for lecturers of the subject.
This topic: floating-point numbers (real / float / double).
Will take about two lectures.
<exponent, mantissa>
limited numerical accuracy
e.g. 1/3 and 1/5 are not represented exactly
perhaps 6 or 16 decimal digits
(cf. Babbage's Difference Engine had (has)
30 decimal digits of precision and his
Analytical Engine was to have
40 decimal digits of precision.)
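
The claim that 1/3 and 1/5 are not represented exactly can be checked directly. The following is a minimal sketch, assuming Python, whose `float` is an IEEE double; the helper name `is_exact` is made up for this illustration.

```python
from fractions import Fraction

def is_exact(num, den):
    """True if the double nearest to num/den equals num/den exactly."""
    stored = Fraction(num / den)   # exact rational value of the stored double
    return stored == Fraction(num, den)

for num, den in [(1, 3), (1, 5), (1, 4)]:
    # Fraction(num / den) shows the binary fraction that is really stored.
    print(f"{num}/{den}: exact={is_exact(num, den)}, stored={Fraction(num / den)}")
```

Only fractions whose denominator is a power of two, such as 1/4, survive unchanged; 1/5 is a recurring fraction in binary just as 1/3 is in decimal.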
Problem: Solve equation f(x)=0.
Problem: Integrate function . . .
Numerical accuracy and errors.
e.g. IEEE "single precision" floating-point representation:

  bits:   0   1 ... 8    9 ... 31
  field:  S   EEEEEEEE   FFFFFFFFFFFFFFFFFFFFFFF

  S = sign (1 bit), E = exponent (8), F = mantissa or fraction (23)
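normal value (0 < E < 255) =
$(-1)^S \times 2^{E-127} \times (1.F)_2$
+0: $E=0$, $F=0$, $S=0$
-0: $E=0$, $F=0$, $S=1$
NaN, not a number:
$E = \mathrm{FF}_{16} = 255_{10}$, $F \neq 0$
$+\infty$: $E = \mathrm{FF}_{16} = 255_{10}$, $F=0$, $S=0$
$-\infty$: $E = \mathrm{FF}_{16} = 255_{10}$, $F=0$, $S=1$
unnormalised if $E=0$ and $F \neq 0$,
$(-1)^S \times 2^{-126} \times (0.F)_2$
least +ve value
= $2^{-126} \times (0.00000000000000000000001)_2 = 2^{-149}$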
(Also IEEE "double precision":
64-bit, S 1-bit, E 11-bits, F 52-bits.)
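
To experiment with the single-precision layout, the fields can be pulled out of a stored value and the value rebuilt from them. This is a sketch only, assuming Python's standard `struct` module; the function names `fields` and `value` are invented for this example.

```python
import struct

def fields(x):
    """Round x to IEEE single precision and return its (S, E, F) fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def value(s, e, f):
    """Rebuild the numerical value from the S, E and F fields."""
    sign = -1.0 if s else 1.0
    if e == 0xFF:                     # all-ones exponent: infinity or NaN
        return sign * float("inf") if f == 0 else float("nan")
    if e == 0:                        # zero or unnormalised: (0.F) * 2^-126
        return sign * 2.0 ** -126 * (f / 2.0 ** 23)
    return sign * 2.0 ** (e - 127) * (1 + f / 2.0 ** 23)   # normal: (1.F)

for x in [1.0, -0.0, 0.15625, float("inf"), float("-inf")]:
    print(x, fields(x))
print("least +ve value:", value(0, 0, 1))
```

Class exercise: predict the fields of 0.5 and of -2.0 before running `fields` on them.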
It is generally better to combine small numbers first
before combining them with large numbers.
Consider $\sum_{i=1..} (1/i)$
-- sum to infinity is a divergent series
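
The effect of summation order can be seen by adding up the first n terms of this series in single precision, largest terms first and then smallest terms first. The following is a sketch under stated assumptions: single precision is emulated by rounding every intermediate result with `struct`, the choice n = 1,000,000 is arbitrary, and the names `f32` and `harmonic32` are invented for this example.

```python
import math
import struct

def f32(x):
    """Round a Python float (a double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]

def harmonic32(indices):
    """Sum 1/i over indices, rounding every result to single precision."""
    total = 0.0
    for i in indices:
        total = f32(total + f32(1.0 / i))
    return total

n = 1_000_000
forward = harmonic32(range(1, n + 1))    # large terms first
backward = harmonic32(range(n, 0, -1))   # small terms first
reference = math.fsum(1.0 / i for i in range(1, n + 1))  # double precision

print("forward  :", forward, " error", abs(forward - reference))
print("backward :", backward, " error", abs(backward - reference))
```

Summing the small terms first lets them accumulate to a size that still registers when the large terms are added; in the forward order each late term is only one or two units in the last place of the running total.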
There is a number `delta' s.t.
delta > 0 and yet apparently 1.0 = 1.0 - delta.
[lecturer: use the
demo';
class: note value of delta & probable representation.]
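
One way to look for such a delta is to keep halving a candidate until subtracting it from 1.0 no longer changes the result. This is a minimal sketch, assuming Python doubles and the default round-to-nearest mode; the lecture demo may use a different precision and so report a different value. The name `find_delta` is invented here.

```python
def find_delta():
    """Halve delta until 1.0 - delta is indistinguishable from 1.0."""
    delta = 1.0
    while 1.0 - delta != 1.0:
        delta /= 2.0
    return delta

delta = find_delta()
# float.hex() shows the stored representation: a power of two.
print(delta, delta.hex(), delta > 0.0, 1.0 - delta == 1.0)
```

The search only tries powers of two, so it finds one such delta, not necessarily the largest.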
In some cases limited numerical accuracy
can cause severe errors . . .
? Big - ( Big' - small )
= ( Big - Big' ) + small ?
if Big = Big'
then Big - ( Big' - small )
= Big - (Big - small) which
may equal Big - Big = 0
but ( Big - Big' ) + small
= 0 + small
= small <> 0
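
The two groupings can be compared numerically. The values below are assumptions chosen for illustration (Big = Big' = 2^60, small = 1.0, in Python doubles); any small value well below the spacing of doubles near Big behaves the same way.

```python
big = 2.0 ** 60       # doubles near 2^60 are more than 1.0 apart
big_prime = big       # Big' = Big
small = 1.0

left = big - (big_prime - small)     # small is lost inside the bracket
right = (big - big_prime) + small    # the big values cancel first

print("Big - (Big' - small) =", left)
print("(Big - Big') + small =", right)
```

Algebraically the two expressions are equal, but only the second keeps `small`.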