
COMP6714: Information Retrieval & Web Search

Introduction to Information Retrieval
Lecture 5: Index Compression

Inverted index

▪ For each term t, we must store a list of all documents that contain t.
▪ Identify each doc by a docID, a document serial number.
▪ Can we use fixed-size arrays for this?

Brutus → 1 2 4 11 31 45 173 174
Caesar → 1 2 4 5 6 16 57 132
Calpurnia → 2 31 54 101

What happens if the word Caesar is added to document 14?

Inverted index

▪ We need variable-size postings lists.
  ▪ On disk, a continuous run of postings is normal and best.
  ▪ In memory, can use linked lists or variable length arrays.
  ▪ Some tradeoffs in size/ease of insertion.

Dictionary → Postings (each entry in a postings list is a posting):

Brutus → 1 2 4 11 31 45 173 174
Caesar → 1 2 4 5 6 16 57 132
Calpurnia → 2 31 54 101

Sorted by docID (more later on why).

Inverted index construction

Documents to be indexed: Friends, Romans, countrymen.
↓ Tokenizer
Token stream: Friends Romans Countrymen
↓ Linguistic modules
Modified tokens: friend roman countryman
↓ Indexer
Inverted index:
friend → 2 4
roman → 1 2
countryman → 13 16

Where do we pay in storage?

▪ Terms and counts (the dictionary)
▪ Lists of docIDs (the postings)
▪ Pointers from dictionary entries to postings lists

Today (Ch. 5)

▪ Why compression?
▪ Collection statistics in more detail (with RCV1)
▪ How big will the dictionary and postings be?
▪ Dictionary compression
▪ Postings compression

Why compression (in general)? (Ch. 5)

▪ Use less disk space
  ▪ Saves a little money
▪ Keep more stuff in memory
  ▪ Increases speed
▪ Increase speed of data transfer from disk to memory
  ▪ [read compressed data | decompress] is faster than [read uncompressed data]
  ▪ Premise: decompression algorithms are fast
  ▪ True of the decompression algorithms we use

Why compression for inverted indexes? (Ch. 5)

▪ Dictionary
  ▪ Make it small enough to keep in main memory
  ▪ Make it so small that you can keep some postings lists in main memory too
▪ Postings file(s)
  ▪ Reduce disk space needed
  ▪ Decrease time needed to read postings lists from disk
  ▪ Large search engines keep a significant part of the postings in memory
  ▪ Compression lets you keep more in memory
▪ We will devise various IR-specific compression schemes

RCV1: Our collection for this lecture

▪ Shakespeare’s collected works definitely aren’t large enough for demonstrating many of the points in this course.
▪ The collection we’ll use isn’t really large enough either, but it’s publicly available and is at least a more plausible example.
▪ As an example for applying scalable index construction algorithms, we will use the Reuters RCV1 collection.
▪ This is one year of Reuters newswire (part of 1995 and 1996).

Reuters RCV1 statistics (Sec. 5.1)

| symbol | statistic | value |
|---|---|---|
| N | documents | 800,000 |
| L | avg. # tokens per doc | 200 |
| M | terms (= word types) | ~400,000 |
| | avg. # bytes per token (incl. spaces/punct.) | 6 |
| | avg. # bytes per token (without spaces/punct.) | 4.5 |
| | avg. # bytes per term | 7.5 |

Index parameters vs. what we index (details IIR Table 5.1, p.80) (Sec. 5.1)

| | dictionary size (K) | ∆% | cumul % | non-positional postings size (K) | ∆% | cumul % | positional postings size (K) | ∆% | cumul % |
|---|---|---|---|---|---|---|---|---|---|
| Unfiltered | 484 | | | 109,971 | | | 197,879 | | |
| No numbers | 474 | −2 | −2 | 100,680 | −8 | −8 | 179,158 | −9 | −9 |
| Case folding | 392 | −17 | −19 | 96,969 | −3 | −12 | 179,158 | 0 | −9 |
| 30 stopwords | 391 | −0 | −19 | 83,390 | −14 | −24 | 121,858 | −31 | −38 |
| 150 stopwords | 391 | −0 | −19 | 67,002 | −30 | −39 | 94,517 | −47 | −52 |
| stemming | 322 | −17 | −33 | 63,812 | −4 | −42 | 94,517 | 0 | −52 |

Exercise: give intuitions for all the ‘0’ entries. Why do some zero entries correspond to big deltas in other columns?

Lossless vs. lossy compression (Sec. 5.1)

▪ Lossless compression: all information is preserved.
  ▪ What we mostly do in IR.
▪ Lossy compression: discard some information.
  ▪ Several of the preprocessing steps can be viewed as lossy compression: case folding, stop words, stemming, number elimination.
  ▪ Chap/Lecture 7: prune postings entries that are unlikely to turn up in the top k list for any query.
  ▪ Almost no loss of quality for the top k list.

Vocabulary vs. collection size (Sec. 5.1)

▪ How big is the term vocabulary?
  ▪ That is, how many distinct words are there?
▪ Can we assume an upper bound?
  ▪ Not really: at least 70^20 ≈ 10^37 different words of length 20
▪ In practice, the vocabulary will keep growing with the collection size
  ▪ Especially with Unicode ☺

Vocabulary vs. collection size (Sec. 5.1)

▪ Heaps’ law: M = kT^b
▪ M is the size of the vocabulary, T is the number of tokens in the collection
▪ Typical values: 30 ≤ k ≤ 100 and b ≈ 0.5
▪ In a log-log plot of vocabulary size M vs. T, Heaps’ law predicts a line with slope about ½
  ▪ It is the simplest possible relationship between the two in log-log space
▪ An empirical finding (“empirical law”)

Heaps’ Law (Fig 5.1 p81) (Sec. 5.1)

For RCV1, the dashed line log₁₀M = 0.49·log₁₀T + 1.64 is the best least squares fit. Thus M = 10^1.64 · T^0.49, so k = 10^1.64 ≈ 44 and b = 0.49.

Good empirical fit for Reuters RCV1!

For the first 1,000,020 tokens, the law predicts 38,323 terms; there are actually 38,365 terms.

Exercises (Sec. 5.1)

▪ What is the effect of including spelling errors, vs. automatically correcting spelling errors, on Heaps’ law?
▪ Compute the vocabulary size M for this scenario:
  ▪ Looking at a collection of web pages, you find that there are 3,000 different terms in the first 10,000 tokens and 30,000 different terms in the first 1,000,000 tokens.
  ▪ Assume a search engine indexes a total of 20,000,000,000 (2 × 10^10) pages, containing 200 tokens on average.
  ▪ What is the size of the vocabulary of the indexed collection as predicted by Heaps’ law?
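The second exercise can be worked through numerically. The sketch below fits k and b to the two observations and then applies Heaps’ law to the full collection (variable names are illustrative):

```python
import math

# Two observations: (tokens T, vocabulary M)
T1, M1 = 10_000, 3_000
T2, M2 = 1_000_000, 30_000

# Heaps' law: M = k * T^b. Fit b, then k, from the two points.
b = math.log(M2 / M1) / math.log(T2 / T1)   # log(10)/log(100) = 0.5
k = M1 / T1 ** b                            # 3000 / 100 = 30

# Full collection: 2e10 pages * 200 tokens/page = 4e12 tokens
T = 20_000_000_000 * 200
M = k * T ** b                              # predicted vocabulary size

print(b, k, M)   # 0.5 30.0 60000000.0
```

So Heaps’ law predicts about 60 million terms for the indexed collection.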

Zipf’s law (Sec. 5.1)

▪ Heaps’ law gives the vocabulary size in collections.
▪ We also study the relative frequencies of terms.
▪ In natural language, there are a few very frequent terms and very many very rare terms.
▪ Zipf’s law: the ith most frequent term has frequency proportional to 1/i.
  ▪ cf_i ∝ 1/i, i.e. cf_i = K/i where K is a normalizing constant
  ▪ cf_i is the collection frequency: the number of occurrences of the term t_i in the collection.

Zipf consequences (Sec. 5.1)

▪ If the most frequent term (the) occurs cf₁ times
  ▪ then the second most frequent term (of) occurs cf₁/2 times
  ▪ the third most frequent term (and) occurs cf₁/3 times …
▪ Equivalent: cf_i = K/i where K is a normalizing factor, so
  ▪ log cf_i = log K − log i
  ▪ Linear relationship between log cf_i and log i
▪ Another power law relationship
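As a quick sanity check of the log-linear relationship above, the sketch below generates ideal Zipf frequencies (cf₁ is an assumed value) and verifies that log cf_i vs. log i has slope −1:

```python
import math

cf1 = 1_000_000                          # assumed frequency of the top term
cf = [cf1 / i for i in range(1, 101)]    # ideal Zipf: cf_i = cf1 / i

# log(cf_i) = log(cf1) - log(i), so the slope between any two ranks is -1
slope = (math.log(cf[49]) - math.log(cf[9])) / (math.log(50) - math.log(10))
print(round(slope, 6))   # -1.0
```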

Zipf’s law for Reuters RCV1 (Sec. 5.1)

[Figure: log-log plot of collection frequency vs. rank for RCV1, approximately linear.]

Compression

▪ Now, we will consider compressing the space

for the dictionary and postings

▪ Basic Boolean index only

▪ No study of positional indexes, etc.

▪ We will consider compression schemes

Ch. 5

20

DICTIONARY COMPRESSION (Sec. 5.2)

Why compress the dictionary? (Sec. 5.2)

▪ Search begins with the dictionary
▪ We want to keep it in memory
▪ Memory footprint competition with other applications
▪ Embedded/mobile devices may have very little memory
▪ Even if the dictionary isn’t in memory, we want it to be small for a fast search startup time
▪ So, compressing the dictionary is important

Dictionary storage – first cut (Sec. 5.2)

▪ Array of fixed-width entries
  ▪ ~400,000 terms; 28 bytes/term = 11.2 MB
  ▪ Each entry: term (20 bytes), document frequency (4 bytes), pointer to postings list (4 bytes)
▪ A dictionary search structure sits on top of the array

Fixed-width terms are wasteful (Sec. 5.2)

▪ Most of the bytes in the Term column are wasted – we allot 20 bytes for 1-letter terms.
  ▪ And we still can’t handle supercalifragilisticexpialidocious or hydrochlorofluorocarbons.
▪ Written English averages ~4.5 characters/word.
  ▪ Exercise: why is/isn’t this the number to use for estimating the dictionary size?
▪ Avg. dictionary word in English: ~8 characters
  ▪ How do we use ~8 characters per dictionary term?
▪ Short words dominate token counts but not type average.

Compressing the term list: Dictionary-as-a-String (Sec. 5.2)

▪ Store the dictionary as a (long) string of characters:

….systilesyzygeticsyzygialsyzygyszaibelyiteszczecinszomo….

▪ Pointer to next word shows end of current word
▪ Hope to save up to 60% of dictionary space
▪ Total string length = 400K × 8B = 3.2MB
▪ Pointers resolve 3.2M positions: log₂(3.2M) ≈ 22 bits = 3 bytes

Space for dictionary as a string (Sec. 5.2)

▪ 4 bytes per term for frequency
▪ 4 bytes per term for pointer to postings
▪ 3 bytes per term pointer (into the string)
▪ Avg. 8 bytes per term in the term string
▪ Now avg. 19 bytes/term, not 28
▪ 400K terms × 19 bytes ➔ 7.6 MB (against 11.2 MB for fixed width)

Blocking (Sec. 5.2)

▪ Store pointers to every kth term string.
  ▪ Example below: k = 4.
▪ Need to store term lengths (1 extra byte)

….7systile9syzygetic8syzygial6syzygy11szaibelyite8szczecin9szomo….

Save 9 bytes on 3 pointers; lose 4 bytes on term lengths.

Net (Sec. 5.2)

▪ Example for block size k = 4
▪ Where we used 3 bytes/pointer without blocking (3 × 4 = 12 bytes), we now use 3 + 4 = 7 bytes.
▪ Shaved another ~0.5 MB. This reduces the size of the dictionary from 7.6 MB to 7.1 MB.
▪ We can save more with larger k. Why not go with larger k?

Exercise (Sec. 5.2)

▪ Estimate the space usage (and savings compared to 7.6 MB) with blocking, for block sizes of k = 4, 8 and 16.
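A minimal sketch of the arithmetic, using the per-term byte costs from the previous slides (4 for frequency, 4 for the postings pointer, 8 avg. bytes in the term string, a 3-byte string pointer per block, and 1 length byte per term; the unblocked baseline instead pays a 3-byte pointer per term and no length bytes, giving 7.6 MB):

```python
def dict_size_mb(k, n_terms=400_000):
    """Dictionary-as-a-string size with blocking factor k, in MB."""
    per_term = 4 + 4 + 8            # freq + postings pointer + avg term bytes
    pointers = (n_terms // k) * 3   # one 3-byte string pointer per block
    lengths = n_terms * 1           # one length byte per term
    return (n_terms * per_term + pointers + lengths) / 1e6

for k in (4, 8, 16):
    print(k, dict_size_mb(k))   # 7.1, 6.95, 6.875 MB
```

This reproduces the 7.1 MB figure for k = 4 and shows the diminishing returns for larger k.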

Dictionary search without blocking (Sec. 5.2)

▪ Assuming each dictionary term is equally likely in a query (not really so in practice!), the average number of comparisons is (1 + 2·2 + 4·3 + 4)/8 ≈ 2.6

Exercise: if the frequencies of query terms were non-uniform but known, how would you structure the dictionary search tree?

Dictionary search with blocking (Sec. 5.2)

▪ Binary search down to a 4-term block;
▪ Then linear search through terms in the block.
▪ Blocks of 4 (binary tree), avg. = (1 + 2·2 + 2·3 + 2·4 + 5)/8 = 3 compares

Exercise (Sec. 5.2)

▪ Estimate the impact on search performance (and slowdown compared to k = 1) with blocking, for block sizes of k = 4, 8 and 16.

Front coding (Sec. 5.2)

▪ Front-coding: sorted words commonly have a long common prefix – store differences only (for the last k−1 in a block of k)

8automata8automate9automatic10automation
→ 8automat*a1♢e2♢ic3♢ion

The * marks the end of the shared prefix (“automat” is encoded once); each number gives the extra length beyond automat.

Begins to resemble general string compression.

Front Encoding [Witten, Moffat, Bell]

▪ Sorted words commonly have a long common prefix – store differences only
▪ Complete front encoding: (prefix-len, suffix-len, suffix)
▪ Partial 3-in-4 front encoding:
  ▪ No encoding/compression for the first string in a block
  ▪ Enables binary search

Assume the previous string is “auto”:

| String | Complete Front Encoding | Partial 3-in-4 Front Encoding |
|---|---|---|
| 8, automata | 4, 4, mata | , 8, automata |
| 8, automate | 7, 1, e | 7, 1, e |
| 9, automatic | 7, 2, ic | 7, 2, ic |
| 10, automation | 8, 2, on | 8, , on |
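A sketch of complete front encoding under the table’s assumption (previous string “auto”), producing the (prefix-len, suffix-len, suffix) tuples shown above:

```python
def front_encode(terms, prev=""):
    """Complete front encoding: one (prefix_len, suffix_len, suffix) per term."""
    out = []
    for t in terms:
        p = 0
        while p < min(len(t), len(prev)) and t[p] == prev[p]:
            p += 1                      # length of common prefix with previous term
        out.append((p, len(t) - p, t[p:]))
        prev = t
    return out

codes = front_encode(["automata", "automate", "automatic", "automation"], prev="auto")
print(codes)  # [(4, 4, 'mata'), (7, 1, 'e'), (7, 2, 'ic'), (8, 2, 'on')]
```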

RCV1 dictionary compression summary (Sec. 5.2)

| Technique | Size in MB |
|---|---|
| Fixed width | 11.2 |
| Dictionary-as-String with pointers to every term | 7.6 |
| Also, blocking k = 4 | 7.1 |
| Also, blocking + front coding | 5.9 |

POSTINGS COMPRESSION (Sec. 5.3)

Postings compression (Sec. 5.3)

▪ The postings file is much larger than the dictionary – by a factor of at least 10.
▪ Key desideratum: store each posting compactly.
▪ A posting for our purposes is a docID.
▪ For Reuters (800,000 documents), we would use 32 bits per docID when using 4-byte integers.
▪ Alternatively, we can use log₂ 800,000 ≈ 20 bits per docID.
▪ Our goal: use a lot less than 20 bits per docID.

Postings: two conflicting forces (Sec. 5.3)

▪ A term like arachnocentric occurs in maybe one doc out of a million – we would like to store this posting using log₂ 1M ≈ 20 bits.
▪ A term like the occurs in virtually every doc, so 20 bits/posting is too expensive.
  ▪ Prefer a 0/1 bitmap vector in this case

Postings file entry (Sec. 5.3)

▪ We store the list of docs containing a term in increasing order of docID.
  ▪ computer: 33, 47, 154, 159, 202 …
▪ Consequence: it suffices to store gaps.
  ▪ 33, 14, 107, 5, 43 …
▪ Hope: most gaps can be encoded/stored with far fewer than 20 bits.
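The gap transform and its inverse can be sketched in a few lines, using the list from the slide:

```python
from itertools import accumulate

def to_gaps(doc_ids):
    """Keep the first docID; later entries become differences to the predecessor."""
    return [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]

def from_gaps(gaps):
    """Prefix sums restore the original docIDs."""
    return list(accumulate(gaps))

ids = [33, 47, 154, 159, 202]
print(to_gaps(ids))             # [33, 14, 107, 5, 43]
print(from_gaps(to_gaps(ids)))  # [33, 47, 154, 159, 202]
```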

Three postings entries (Sec. 5.3)

[Figure: example postings lists shown as docIDs and the corresponding gaps.]

Variable length encoding (Sec. 5.3)

▪ Aim:
  ▪ For arachnocentric, we will use ~20 bits/gap entry.
  ▪ For the, we will use ~1 bit/gap entry.
▪ If the average gap for a term is G, we want to use ~log₂G bits/gap entry.
▪ Key challenge: encode every integer (gap) with about as few bits as needed for that integer.
  ▪ This requires a variable length encoding
  ▪ Variable length codes achieve this by using short codes for small numbers

Variable Byte (VB) codes (Sec. 5.3)

▪ For a gap value G, we want to use close to the fewest bytes needed to hold log₂ G bits
▪ Begin with one byte to store G and dedicate 1 bit in it to be a continuation bit c
▪ If G ≤ 127, binary-encode it in the 7 available bits and set c = 1
▪ Else encode G’s lower-order 7 bits and then use additional bytes to encode the higher order bits using the same algorithm
▪ At the end set the continuation bit of the last byte to 1 (c = 1) – and for the other bytes c = 0.

Example (Sec. 5.3)

| docIDs | 824 | 829 | 215406 |
|---|---|---|---|
| gaps | | 5 | 214577 |
| VB code | 00000110 10111000 | 10000101 | 00001101 00001100 10110001 |

Postings stored as the byte concatenation 000001101011100010000101000011010000110010110001.

Key property: VB-encoded postings are uniquely prefix-decodable.

For a small gap (5), VB uses a whole byte.

(Hex(824) = 0x338, Hex(214577) = 0x34631.)
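A sketch of the VB scheme described above, following the slide’s convention (continuation bit set only on the last byte of each number):

```python
def vb_encode(n):
    """Encode one gap as variable-byte: 7 payload bits per byte,
    high bit set only on the final byte (c = 1)."""
    out = []
    while True:
        out.insert(0, n % 128)
        if n < 128:
            break
        n //= 128
    out[-1] += 128          # mark the last byte with the continuation bit
    return bytes(out)

def vb_decode(stream):
    """Decode a concatenation of VB-encoded gaps."""
    nums, n = [], 0
    for b in stream:
        if b < 128:
            n = 128 * n + b
        else:
            nums.append(128 * n + (b - 128))
            n = 0
    return nums

encoded = vb_encode(824) + vb_encode(5) + vb_encode(214577)
print(vb_decode(encoded))   # [824, 5, 214577]
print(" ".join(format(b, "08b") for b in vb_encode(824)))  # 00000110 10111000
```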

Other variable unit codes (Sec. 5.3)

▪ Instead of bytes, we can also use a different “unit of alignment”: 32 bits (words), 16 bits, 4 bits (nibbles).
▪ Variable byte alignment wastes space if you have many small gaps – nibbles do better in such cases.
▪ Variable byte codes:
  ▪ Used by many commercial/research systems
  ▪ Good low-tech blend of variable-length coding and sensitivity to computer memory alignment (vs. bit-level codes, which we look at next).
▪ There is also recent work on word-aligned codes that pack a variable number of gaps into one word (e.g., Simple9)

Simple9 (Sec. 5.3)

▪ Encodes as many gaps as possible in one DWORD
▪ 4 selector bits + 28 data bits
▪ The selector encodes 9 possible ways to “use” the data bits:

| Selector | # of gaps encoded | Length of each gap | Wasted bits |
|---|---|---|---|
| 0000 | 28 | 1 | 0 |
| 0001 | 14 | 2 | 0 |
| 0010 | 9 | 3 | 1 |
| 0011 | 7 | 4 | 0 |
| 0100 | 5 | 5 | 3 |
| 0101 | 4 | 7 | 0 |
| 0110 | 3 | 9 | 1 |
| 0111 | 2 | 14 | 0 |
| 1000 | 1 | 28 | 0 |
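A decoder sketch for one Simple9 word, using the selector table above; the exact bit layout (gaps packed from the high end of the 28 data bits) is an assumption for illustration:

```python
# (gap_count, bits_per_gap) for each selector value 0..8, as in the table
SELECTORS = [(28, 1), (14, 2), (9, 3), (7, 4), (5, 5), (4, 7), (3, 9), (2, 14), (1, 28)]

def simple9_decode_word(word):
    """Unpack one 32-bit Simple9 word into its list of gaps."""
    count, bits = SELECTORS[word >> 28]   # top 4 bits are the selector
    mask = (1 << bits) - 1
    return [(word >> (28 - bits * (i + 1))) & mask for i in range(count)]

# Pack 7 gaps of 4 bits each (selector 0011 = 3), then decode them back
gaps = [1, 2, 3, 4, 5, 6, 7]
word = 3 << 28
for i, g in enumerate(gaps):
    word |= g << (28 - 4 * (i + 1))
print(simple9_decode_word(word))  # [1, 2, 3, 4, 5, 6, 7]
```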

Unary code

▪ Represent n as n 1s with a final 0.
▪ Unary code for 3 is 1110.
▪ Unary code for 40 is 11111111111111111111111111111111111111110.
▪ Unary code for 80 is 111111111111111111111111111111111111111111111111111111111111111111111111111111110.
▪ This doesn’t look promising, but….

Bit-Aligned Codes

▪ Breaks between encoded numbers can occur after any bit position
▪ Unary code
  ▪ Encode k by k 1s followed by a 0
  ▪ The 0 at the end makes the code unambiguous

Unary and Binary Codes

▪ Unary is very efficient for small numbers such as 0 and 1, but quickly becomes very expensive
  ▪ 1023 can be represented in 10 binary bits, but requires 1024 bits in unary
▪ Binary is more efficient for large numbers, but it may be ambiguous

Elias-γ Code

▪ To encode a number k, split it into k_d and k_r:
  ▪ k_d = ⌊log₂ k⌋ – one less than the number of binary digits of k; encoded in unary
  ▪ k_r = k − 2^(k_d) – the remaining bits of k; encoded in binary using k_d bits
▪ Example: k = 13 = 1101₂ gives k_d = 3 (unary 1110) and k_r = 5 (binary 101), so the code is 1110 101.
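A sketch of γ encoding/decoding as described above (unary length part, then the offset bits):

```python
def gamma_encode(k):
    """Elias-gamma: unary code for the offset length, then the offset
    (binary representation of k without its leading 1 bit)."""
    assert k >= 1
    offset = bin(k)[3:]                  # drop '0b1': bits after the leading 1
    return "1" * len(offset) + "0" + offset

def gamma_decode(bits):
    """Decode a concatenation of gamma codes."""
    nums, i = [], 0
    while i < len(bits):
        length = 0
        while bits[i] == "1":            # unary part gives the offset length
            length += 1
            i += 1
        i += 1                           # skip the terminating 0
        nums.append(int("1" + bits[i:i + length], 2))
        i += length
    return nums

print(gamma_encode(13))                  # 1110101
print(gamma_decode(gamma_encode(13) + gamma_encode(1) + gamma_encode(1023)))
# [13, 1, 1023]
```

Note that gamma_encode(1023) is 19 bits long, matching the claim on the next slide.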

Elias-δ Code

▪ The Elias-γ code uses no more bits than unary, and many fewer for k > 2
  ▪ 1023 takes 19 bits instead of 1024 bits using unary
  ▪ In general, γ takes 2⌊log₂k⌋ + 1 bits
▪ To improve the coding of large numbers, use the Elias-δ code
  ▪ Instead of encoding k_d in unary, we encode k_d + 1 using Elias-γ
  ▪ Takes approximately 2 log₂ log₂ k + log₂ k bits

Elias-δ Code

▪ Split (k_d + 1) into k_dd and k_dr:
  ▪ k_dd = ⌊log₂(k_d + 1)⌋
  ▪ k_dr = (k_d + 1) − 2^(k_dd)
▪ Encode k_dd in unary, k_dr in binary, and k_r in binary


Gamma code properties (Sec. 5.3)

▪ G is encoded using 2⌊log₂ G⌋ + 1 bits
  ▪ Length of the offset is ⌊log₂ G⌋ bits
  ▪ Length of the length part is ⌊log₂ G⌋ + 1 bits
▪ All gamma codes have an odd number of bits
▪ Almost within a factor of 2 of the best possible, log₂ G
▪ The gamma code is uniquely prefix-decodable, like VB
▪ The gamma code can be used for any distribution
▪ The gamma code is parameter-free

Gamma seldom used in practice (Sec. 5.3)

▪ Machines have word boundaries – 8, 16, 32, 64 bits
  ▪ Operations that cross word boundaries are slower
▪ Compressing and manipulating at the granularity of bits can be slow
▪ Variable byte encoding is aligned and thus potentially more efficient
▪ Regardless of efficiency, variable byte is conceptually simpler at little additional space cost

Shannon Limit (Sec. 5.3)

▪ Is it possible to derive codes that are optimal (under certain assumptions)?
▪ What is the optimal average code length for a code that encodes each integer (gap length) independently?
▪ Lower bound on the average code length: Shannon entropy
  ▪ H(X) = − Σ_x Pr[X = x] · log₂ Pr[X = x]
▪ Asymptotically optimal codes (finite alphabets): arithmetic coding, Huffman codes

Global Bernoulli Model (Sec. 5.3)

▪ Assumption: term occurrences are Bernoulli events
▪ Notation:
  ▪ n: # of documents, m: # of terms in the vocabulary
  ▪ N: total # of (unique) term-document occurrences
▪ Probability of a term t_j occurring in document d_i: p = N/(nm)
▪ Each term-document occurrence is an independent event
▪ The probability of a gap of length x is given by the geometric distribution: Pr[gap = x] = (1 − p)^(x−1) · p
▪ How do we design an optimal code for the geometric distribution?

Golomb Code (Sec. 5.3)

▪ Golomb code (Golomb 1966): a highly efficient way to design an optimal Huffman-style code for the geometric distribution
▪ Parameter b
▪ For given x ≥ 1, compute the integer quotient q = ⌊(x − 1)/b⌋ and the remainder r = x − q·b − 1
▪ Assume b = 2^k
▪ Encode q in unary, followed by r coded in binary
  ▪ First step: (q + 1) bits
  ▪ Second step: log₂(b) bits
▪ A bit more complicated if b ≠ 2^k; see Wikipedia.
▪ The Golomb code can also be deemed a generalization of the unary code.
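A sketch of the b = 2^k special case described above (the Rice code of the next slide): quotient q in unary, then k remainder bits. k ≥ 1 is assumed.

```python
def rice_encode(x, k):
    """Rice code for gap x >= 1 with parameter b = 2**k:
    quotient q in unary (q ones + a 0), remainder r in k binary bits."""
    b = 1 << k
    q, r = (x - 1) // b, (x - 1) % b
    return "1" * q + "0" + format(r, f"0{k}b")

def rice_decode(bits, k):
    """Decode a concatenation of Rice codes sharing the same k."""
    nums, i = [], 0
    while i < len(bits):
        q = 0
        while bits[i] == "1":        # unary quotient
            q += 1
            i += 1
        i += 1                       # skip the terminating 0
        r = int(bits[i:i + k], 2)
        i += k
        nums.append(q * (1 << k) + r + 1)
    return nums

print(rice_encode(9, 2))                                      # 11000
print(rice_decode(rice_encode(9, 2) + rice_encode(1, 2), 2))  # [9, 1]
```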

Golomb Code & Rice Code (Sec. 5.3)

▪ How do we determine the optimal b*?
▪ Select the minimal b such that (1 − p)^b + (1 − p)^(b+1) ≤ 1
▪ Result due to Gallager and Van Voorhis 1975: this choice generates an optimal prefix code for the geometric distribution
▪ Small-p approximation: b* ≈ (ln 2)/p ≈ 0.69/p
▪ Rice code: only allow b = 2^k

Local Bernoulli Model (Sec. 5.3)

▪ If the length of each postings list is known, then a Bernoulli model on each individual inverted list can be used
▪ Frequent words are coded with smaller b, infrequent words with larger b
▪ Term frequencies then need to be encoded as well (use the γ-code)
▪ The local Bernoulli model outperforms the global Bernoulli model in practice (and is the method used in practice!)

RCV1 compression (Sec. 5.3)

| Data structure | Size in MB |
|---|---|
| dictionary, fixed-width | 11.2 |
| dictionary, term pointers into string | 7.6 |
| with blocking, k = 4 | 7.1 |
| with blocking & front coding | 5.9 |
| collection (text, xml markup etc) | 3,600.0 |
| collection (text) | 960.0 |
| term-doc incidence matrix | 40,000.0 |
| postings, uncompressed (32-bit words) | 400.0 |
| postings, uncompressed (20 bits) | 250.0 |
| postings, variable byte encoded | 116.0 |
| postings, γ-encoded | 101.0 |

Google’s Indexing Choice (Sec. 5.3; source: Jeff Dean’s WSDM 2009 keynote)

▪ Index shards: partition by doc, multiple replicas
▪ Disk-resident index
  ▪ Use the outer parts of the disk
  ▪ Use different compression methods for different fields: Rice_k (a special kind of Golomb code) for gaps, and γ for positions
▪ In-memory index
  ▪ All positions; no docIDs
  ▪ Keep track of document boundaries
  ▪ Group varint encoding
  ▪ Fast to decode

Other details (Sec. 5.3)

▪ Gap = docid_n − docid_(n−1) − 1
▪ Freq = freq − 1
▪ Pos_Gap = pos_n − pos_(n−1) − 1
▪ C.f., Jiangong Zhang, Xiaohui Long and Torsten Suel: Performance of Compressed Inverted List Caching in Search Engines. WWW 2008.

Index compression summary (Sec. 5.3)

▪ We can now create an index for highly efficient Boolean retrieval that is very space efficient
  ▪ Only 4% of the total size of the collection
  ▪ Only 10–15% of the total size of the text in the collection
▪ However, we’ve ignored positional information
▪ Hence, space savings are less for indexes used in practice
  ▪ But the techniques are substantially the same.

Resources for today’s lecture (Ch. 5)

▪ IIR Chapter 5
▪ MG 3.3, 3.4
▪ F. Scholer, H.E. Williams and J. Zobel. 2002. Compression of Inverted Indexes For Fast Query Evaluation. Proc. ACM SIGIR 2002. (Variable byte codes)
▪ V. N. Anh and A. Moffat. 2005. Inverted Index Compression Using Word-Aligned Binary Codes. Information Retrieval 8: 151–166. (Word-aligned codes)