# 程序代写代做代考 Excel database chain cache Microsoft PowerPoint – 16- Indexing-p2

Microsoft PowerPoint – 16- Indexing-p2

© 2018 A. Alawini & A. Parameswaran

Indexing:
B+ Tree &

Hash Tables
Doris Xin, Abdu Alawini

University of Illinois at Urbana-Champaign

CS411: Database Systems

October 24, 2018

1

© 2018 A. Alawini & A. Parameswaran

Announcements

• HW 3: Due by Friday 10/26 (23:59)

(23:59)
https://wiki.illinois.edu/wiki/display/cs411sfa18/Project+Track+1
+Midterm+Demo+Signup

• Midterm review session: Friday 10/26 (4:00-4:50) SC 1404
• To suggest topics to discuss in the review session, please fill this

form: https://goo.gl/forms/5fDcm80cDjmtMJ0H3

•Please fill the early course feedback form:
https://goo.gl/forms/SC4BYcrDy8dai8PE2

•Midterm: 10/29 in class 11-12:15 pm
2

© 2018 A. Alawini & A. Parameswaran

Today’s lecture

•Indexing
• Continue with B+ Trees
• Hash Tables

3

© 2018 A. Alawini & A. Parameswaran

What is the best part of the class so far?

4

© 2018 A. Alawini & A. Parameswaran

What is the least useful part of the
course so far?

5

© 2018 A. Alawini & A. Parameswaran

What would you change to make the
course even better?

6

© 2018 A. Alawini & A. Parameswaran

B+ Trees (Recap)

•Multi-level index on a specific search key
•Nodes contain keys and pointers

• Internal nodes: point to other nodes
• Leaf nodes: point to data records; last pointer to the next leaf node

7

20 60

10

10 15 18 20 30 50

15 18 20 30 50

© 2018 A. Alawini & A. Parameswaran

B+ Trees (Recap con’t)

•Each node can contain up to n keys;
• All nodes have the same capacity (n max keys)
• Degree d = n/2  the minimum # of keys per node

(assume n is even for simplicity)
• Each node has between [d, 2d] keys at all times
• Except root node, which can have only one key.

• In practice, each node has its own file block
• For a 4KB block, we can accommodate up to 340 keys (d=170).
• In practice, d = 100, 66.5% fill-in factor  133 keys
• Visiting one node = one disk read (latency ~ 105-107 ns)
• First 3 levels of can be cached in main memory (latency ~ 100 ns) to

8

an artificial
requirement to
make B+ Trees
“balanced”

© 2018 A. Alawini & A. Parameswaran

B+ Trees (Recap con’t)

•Do B+ Trees always help?
• No. e.g., an array of sorted integers.

•Types of queries to answer with a B+ Tree:
• Exact key value, e.g., SELECT name FROM people WHERE age=20
• Range queries, e.g., SELECT name FROM people WHERE age>=20

< n) • Idea 2: allow overflow blocks (not expensive to overflow) • Convention: Read from the right (as opposed to the left) © 2018 A. Alawini & A. Parameswaran 50 Linear Hash Table Example • N=3 <= 22 = 4 • Therefore, only buckets until 10 • When inserting 0111, 11 is flipped => 01

(01)00

(10)10

i = 2

n = 3

r = 4

00
01
10

(01)01

(01)11 BIT FLIP

© 2018 A. Alawini & A. Parameswaran
51

Linear Hash Table Example

• Insert 1001:

(01)00

(10)10

00
01
10

(01)01

(01)11

i = 2

n = 3

r = 4

© 2018 A. Alawini & A. Parameswaran
52

Linear Hash Table Example

• Insert 1001: overflow blocks…

(01)00

(10)10

00
01
10

(01)01

(10)01

(01)11

i = 2

n = 3

r = 5

© 2018 A. Alawini & A. Parameswaran
53

Linear Hash Tables

• Extend n  n+1 when average number of records per bucket exceeds
(say) 80% of total number of records per block

• e.g., r/n <= 0.8 * 2 = 1.6 (for block size = 2) • Until then, use overflow blocks (cheaper than adding buckets) r/n = 5/3 = 1.67 > 1.6

(01)00

(10)10

00
01
10

(01)01

(10)01

(01)11

i = 2

n = 3

r = 5

© 2018 A. Alawini & A. Parameswaran
54

Linear Hash Table Extension

• From n=3 to n=4

• Only need to touch
one block (which one ?)

(01)00

(10)10

00
01
10

(01)01

(10)01

(01)11

00
01
10
11

i = 2

n = 3

r = 5

i = 2

n = 4

r = 5

(01)11

(01)00

(10)10

(01)01

(10)01

(01)11

© 2018 A. Alawini & A. Parameswaran
55

Linear Hash Table Extension

• From n=3 to n=4 finished

(01)01

(10)01

(01)11

00
01
10 (10)10

(01)00

11

i = 2

n = 4

r = 5

r/n = 5/4 = 1.25 < 1.6 ✔ © 2018 A. Alawini & A. Parameswaran 56 Summary • B+ Trees (search, insertion, deletion) • Good for point and range queries • Log time lookup, insertion and deletion because of balanced tree • Hash Tables (search, insertion) • Static hash tables: one I/O lookup, unless long chain of overflow • Extensible hash tables: one I/O lookup, extension can take long • Linear hash tables: ~ one I/O lookup, cheaper extension •No panacea; dependent on data and use case © 2018 A. Alawini & A. Parameswaran 57 Index 2.0 • Learn the best index from the data and queries logs • Machine Learning to the rescue! • Recall, an index is a function • Machine learning are good at learning functions from data • What’s your cool idea for a better index?

Posted in Uncategorized