# CS计算机代考程序代写 data structure Hashes

Hashes

Quite often we modify how we represent our data for convenience

For example, in the Huffman code we took a string and made a tree (input=string, output=tree)

Hashes: who are they?

A hash function takes any input, but then outputs something a fixed size

If done well, the hash can be a fair approximation for the original (typically larger) input

Hashes: who are they?

Let¡¯s take a look at a very simple hash function: f(n) = n mod 10

5 mod 10 = 5

9 mod 10 = 9

10 mod 10 = 0 9856492059823457 mod 10 = 7 (always get a single digit as hash)

Hashes: who are they?

You¡¯ve probably used ¡°hashes¡± in your life without the fancy term:

Name initials (always 2 letters) Verifying identity (last 4 digits)

The same idea applies: not ¡°unique¡± but normally unique enough

Hashes: who are they?

Typically hashes (the output of hash functions) are used to differentiate or verify the inputs

Hash functions are typically one-way meaning performing the hash is easy but un-hashing is hard/impossible (how do you un-hash initials ¡°J.P.¡±?)

Hashes: what¡¯re they good for?

Hashes are used many places:

– Data structures (our focus)

– Twitter (¡°hashtag¡± is a hash) – Security (verification)

– File verification (ensure

downloaded files not corrupted)

Hashes: what¡¯re they good for?

Security often takes advantage that many hash functions are one-way

¡°Bypass codes¡± can utilize hashes

(backwards)

2359234

verific-

9361649

enter in order

hash

5040283

ation

h(9361649) = 2359234 h(5040283) = 9361649

Hashes: what¡¯re they good for?

When downloading large files, you may have seen ¡°SHA¡± or ¡°MD5¡±

This is a short (couple dozen hexadecimal numbers) hash of all the 1s and 0s in the large file

Hashes: what¡¯re they good for?

Suppose we want to store (key,value) pairs (called tables/maps/dictionaries)

These tables use the ¡°key¡± to look up the associated ¡°value¡± efficiently Examples:

key=umnID, value=your name key=house address, value=zip code

Tables

For a table we want 3 (or 2) things: -insert (put/add or update a key) -search (get the value of a key) -delete (remove an existing key)

To be useful, all of these commands must be quite fast (better than O(n), if ¡°n¡± keys are stored)

Tables

Let¡¯s make a table for (address, zip) We¡¯ll assume address are 4 numbers followed by a street name

Pad the left with 1, numbers with

zeros and letters as order in alphabet: 234 Main St –>1 0234 13 01 09 14 19 20

234 M A I N S T

Direct tables

Let¡¯s assume the zipcode of 234 Main St is 12345

We¡¯ll probably need to store many of these (key,value) pairs

The simplest ¡°multiple storage¡± unit is an array… so let¡¯s try that

Direct tables

We assume the keys are unique (no two houses with same address…)

So we could just directly use the key (a number) as the index in the array So for 234 Main St at zip 12345: table[10234130109141920] = 12345 … problems?

Direct tables

Problem 1: size of array

Based on the last index (from a short street name), you will need a very large array (what¡¯s the longest name)

Problem 2: Memory inefficient

The vast majority of indexes will be blank and wasted (no street ¡°aaaaa¡±)

Direct tables

Ignoring these memory issues… (1) How would you code each of

these operations?

(2) What are their runtime?

-search (get the value of a key) -insert (put/add or update a key) -delete (remove an existing key)

Direct tables

search(key)

return table[key]

insert(key, value) table[key] = value

delete(key) table[key] = null

All O(1) very nice!

Direct tables

You may have seen where this is going already…

We have a very large ¡°key¡± that we want to keep unique but find a more efficient indexing system….

Use a hash function!

(Fixed output length, semi-unique)

Hash tables

The only issue we need to resolve is how to handle hashes that ¡°overlap¡± (collide is technical term)

If this were initials ¡°James Parker¡±

and ¡°Jean-Luc Picard¡± are both ¡°J.P.¡± Need to still remember(and not lose one): (James Parker, teacher)

(Jean-Luc Picard, captain)

Hash tables

If there is a hash collision/overlap,

just store them all! (sorta like bucket

sort… but without the sorting)

stored as double linked list

(James Parker, teacher) (Jean-Luc Picard, captain)

Index: 1 2 3 250 676 Hash: aa ab ac … jp … zz Array:

(real key,value)

Hash tables

Book has a nice picture of the full double linked list implementation (double linked list makes easy insert and delete)

array index

Hash tables

How do the functions change when using hash tables?

search(key) insert(key, value) delete(key)

Hash tables: implementation

search(key)

index = hash(key) data = table[index] while data not null:

if data.key = key return data

data = data.next return null

Hash tables: implementation

insert(key, value) object = search(key) if object is null

index = hash(key)

data = (key,value)

data.next = table[index] table[index].previous = data table[index] = data

else

data.value = value

Hash tables: implementation

delete(key)

object = search(key) if object not null

object.previous.next = object.next object.next.previous = object.previous delete object

Note: need to add checks for null

Hash tables: implementation

Runtimes in hash table?

search(key) insert(key, value) delete(key)

Hash tables: analysis

Runtimes in hash table?

search(key) = O(max collision) insert(key, value) = O(1 + search) delete(key) = O(1 + search)

Worst case max collision = n (everyone¡¯s initials are ¡°J.P.¡±)

Hash tables: analysis

Average runtime of search()?

Math! For now we¡¯ll assume the hash function is good at mixing the input

If there are ¡°m¡± possible outputs of the has (for initials aa-zz, m=676), each key will have a probability 1/m of landing in any index

Hash tables: analysis

Average runtime of search() = average over keys:

length of each list with that key

Let Xij = 1 if keyi and keyj

have the same hash (h(ki)=h(kj)) otherwise Xij = 0

(called ¡°indicator function¡±)

Hash tables: analysis

if m proportional to n, this is O(1)

(like bucket sort)

i=1, j sums n-1 i=2, j sums n-2 i=3, j sums n-3 …

i=n-1, j sums 1 i=n, j sums 0 Total: 0+1+2+…+(n-1) =n(n-1)/2

Hash tables: analysis