# CS代考计算机代写 Each student must complete this assignment individually and independently; violation will automatically void the 20% weight of the assignment. Penalty for late submission: 30% deduction for 1 day late, 50% deduction for 2 days late. No mark will be given for 3 days late. If you need clarification of assignment questions, please contact the instructor. However, do not “fish” for the answers by asking answers specific questions. A reminder that all assignments are preparing you to the final exam. You must pass the final exam to pass the course. Therefore, it is essential to do the assignments by yourself to understand concepts and clarify confusion.

Each student must complete this assignment individually and independently; violation will automatically void the 20% weight of the assignment. Penalty for late submission: 30% deduction for 1 day late, 50% deduction for 2 days late. No mark will be given for 3 days late. If you need clarification of assignment questions, please contact the instructor. However, do not “fish” for the answers by asking answers specific questions. A reminder that all assignments are preparing you to the final exam. You must pass the final exam to pass the course. Therefore, it is essential to do the assignments by yourself to understand concepts and clarify confusion.

Do not write excessively long answers. The marker will assign marks by identifying several key points in the answer. Short and concise answers can help the marker easily identify the key points. Provide the logic behind the final answer. For example, instead of writing only the final I/O cost like 100, it helps the marker know how you get this final answer if you write 20*5=100.

Question 1 (15 marks, 3 marks each part). Consider an extendible hash index with all of directory pages (storing pointers to buckets), bucket pages (storing data entries), and data pages (storing data records) on disk. The index uses Alternative (2) for data entries. Assume there is no overflow page.

There are 1,000,000 records.

Each record has 100 Bytes.

Each data entry has 20 Bytes.

Each directory entry has 10 Bytes.

A disk page has 4000 Bytes.

Each bucket is 1/2 page full.

• What is the number of data pages? What is the number of bucket pages? What is the number of directory pages?

• What is number of I/O pages required to retrieve a record by a search key value k using this index, assuming that the search key is a candidate key?

• In (2), if the search key is not a candidate key and 10 records match the search key value k, what is the I/O cost to retrieve the matching records using the hash index. You can assume the hash index is clustered but first explain what is required of a clustered hash index.

• Same as (3) but assume the hash index is unclustered.

Question 2 (9 marks, 3 marks each part) Consider the linear hash structure below. Show the new structure (including Next, Level, N) after inserting each of the following records accumulatively in that order. Assume that adding a new overflow page will trigger splitting a bucket.

• a record with the search key 7 (i.e., binary 111)

• a record with the search key 2 (i.e., binary 010)

• a record with the search key 1 (i.e., binary 001)

Note that you need to show the new structure of inserting each of above records.

Level=0,N=4

H1

H0

Primary pages

Overflow pages

0000

000

32*, 8*

0001

001

9*, 25*, 1*, 9*

Next=3

Next=3

010

42*, 18*, 10*, 34*

0011

011

31*, 35*, 7*. 11*

43*

0100

100

44*, 36*

0101

101

5*, 29*

0110

110

14*, 30*, 22*, 38*

46*

Question 3 (9 marks, 3 marks each part) A Employee file has three attributes EmpID, Salary, and Start_date, where EmpID is a candidate key. We want to answer three queries on a continuous basis during some period of time:

Q1: EmpID=a,

Q2: a<=Salary<=b,
Q3: a<=Start_date<=b.
Assume that the frequency of asking Q1, Q2 and Q3 is 60%, 30%, and 10%, respectively. Suppose you want to build indexes on this file to answer these queries efficiently. What indexes (i.e., clustered or unclustered index, sorted or hashed index, or no index at all) would you choose? Which index (or no index at all) will you use to answer which query? Why?
Question 4 (10 marks, 5 marks each part). A B+tree index has the height of 4 (including the root), stored on disk using Alternative (2) for data entries. The number of data entry pages is 1,000 and the number of data record pages is 100,000, and the number of data records is 10M. Suppose that the search key value k=5 matches 10% of data records in the file. What is the I/O cost (in page) of retrieving the records matching the search key k=5 using the index under the following two cases:
• The index is clustered.
• The index is unclustered.
Question 5 (15 marks, 5 marks each part)
A disk block contains 512 bytes. The search key, record pointer and page pointer occupy 8 bytes, 6 bytes and 4 bytes, respectively. Data entries use Alternative (2). A leaf node in B+-tree has a “previous-leaf pointer” and a “next-leaf-pointer”.
• Compute the order d of the B+-tree. Note that the order for leaf nodes and index nodes may be different.
• Given the above parameters, what is the minimum number of data records that a 2-level B+-tree (not counting the root) can index.
• Given the above parameters, what is the maximum number of data records that a 2-level B+-tree (not counting the root) can index.