# CCIT 4075: Data Mining 2020–21 Second Term


Tutorial 2: Data Compression Via SVD

Instructor: Wai-Yiu Keung

While it took me years to fully understand what singular value decomposition (SVD) actually is, it is surprisingly easy to implement SVD for data compression in practice. In this tutorial I will walk you through the basics and key insights needed to perform data compression via SVD, with an emphasis on the application of image compression.

Basics of Singular Value Decomposition

We start by stating the main result. Any real matrix X ∈ R^{m×n} admits a factorization X = UΣVᵀ.

We will be assuming that this X represents square image data throughout this note. Here,

U = [u1 u2 … un],   Σ = diag(σ1, σ2, …, σn),   Vᵀ = [v1ᵀ; v2ᵀ; …; vnᵀ],

where the middle diagonal matrix Σ has diagonal elements σi, with σi² being the eigenvalues of XᵀX and XXᵀ. Note that {ui}ⁿᵢ₌₁ and {vi}ⁿᵢ₌₁ are orthonormal sets, i.e. UᵀU = VᵀV = I. We adopt the convention from the linear algebra community and assume that σ1 ≥ σ2 ≥ ··· ≥ σn. Using vector outer-product notations,

X = σ1·u1v1ᵀ + σ2·u2v2ᵀ + ··· + σn·unvnᵀ = ∑_{i=1}^n σi·uiviᵀ.

Although this expression may not seem particularly interesting, it actually has a very profound connection with the task of data compression, as discussed in the subsequent sections.
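To make the outer-product expansion concrete, here is a small numerical check. It is written in Python/NumPy rather than the Octave used later in this tutorial, purely as an illustrative sketch; the 8×8 random matrix simply stands in for image data. It confirms that summing the rank-one terms σi·uiviᵀ reproduces X.

```python
import numpy as np

# A random matrix standing in for (square) image data X.
rng = np.random.default_rng(0)
X = rng.random((8, 8))

# Full SVD: X = U @ diag(s) @ Vt, with s already sorted in decreasing order.
U, s, Vt = np.linalg.svd(X)

# Rebuild X as the sum of the rank-one terms sigma_i * u_i * v_i^T.
X_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))

print(np.allclose(X, X_rebuilt))  # the outer-product expansion reproduces X
```

Note that NumPy returns Vᵀ directly (here `Vt`), so the i-th row of `Vt` is viᵀ.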

Using SVD to Compress a Matrix

With the above equation, together with the assumption that σ1 ≥ σ2 ≥ ··· ≥ σn, it makes sense to ignore the σi's for large i, particularly those with σ1 ≫ σi. This leads to the approximation

X ≈ ∑_{i=1}^k σi·uiviᵀ = ŪΣ̄V̄ᵀ = X̄,

where this time the matrices Ū, Σ̄ and V̄ are much smaller than the original matrices U, Σ and V. This is captured by Figure 1. As k increases, we ignore fewer of the σi components, and hence the image quality should be better. This claim is verified by the simulation performed during the lecture (and in the slides).
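The effect of truncating at different values of k can be sketched numerically as follows (again an illustrative Python/NumPy example on a random matrix, not part of the lecture code): the approximation error shrinks as k grows, and vanishes at full rank.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((50, 50))
U, s, Vt = np.linalg.svd(X)

def rank_k_approx(k):
    # Keep only the k largest singular values and vectors:
    # X_bar = U_bar @ Sigma_bar @ V_bar^T.
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Frobenius-norm error for increasing k; full rank (k = 50) recovers X exactly.
errors = [np.linalg.norm(X - rank_k_approx(k)) for k in (5, 20, 50)]
print(errors)
```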


Figure 1: Illustration of the SVD compression in image processing.

How Much Space Do We Save?

So far we’ve claimed that the SVD helps in compressing data. But how much space do we actually save by using the approximation X̄ instead of the original matrix X? This is nothing but a counting exercise. Let us assume that every pixel in X is of the same bit-length (say, 16-bit or 32-bit per pixel). The storage used for the original data matrix X is

storage(X) = n · m pixels.

Next, observe Figure 1 to see that

storage(Ū) = m · k and storage(V̄) = n · k,

and for the diagonal matrix Σ̄, it suffices to store (σ1, …, σk), implying storage(Σ̄) will be exactly k. Therefore,

storage(X̄) = k · (n + m + 1) pixels.

Putting the above together, we notice that for the SVD compression to be useful, we need to choose a k such that storage(X̄) ≤ storage(X), i.e.

k ≤ n · m / (n + m + 1).

Concretely, this states that the choice of k depends on the size of the data matrix X.

Remark 1. Alternatively, because the σi's of many data matrices tend to be “decreasing in i”, you can also use a heuristic approach: choose a k at which σk has already decreased to a relatively small value compared to σ1, i.e. choose k such that σk/σ1 ≤ ε for some small positive constant ε. This claim is supported by the various examples of testing images shown in Figure 2.
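Both ways of choosing k, the storage break-even bound above and the σk/σ1 ≤ ε heuristic of Remark 1, can be sketched in a few lines. The following is an illustrative Python/NumPy example; the random matrix, the seed, and ε = 0.05 are arbitrary choices for demonstration, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.random((100, 100))
s = np.linalg.svd(X, compute_uv=False)  # singular values, decreasing order

m, n = X.shape
# Storage break-even bound: k must satisfy k * (n + m + 1) <= n * m.
k_max = (n * m) // (n + m + 1)

# Heuristic from Remark 1: the smallest k with sigma_k / sigma_1 <= eps.
eps = 0.05
k_heur = int(np.argmax(s / s[0] <= eps)) + 1  # +1: k counts from 1

print(k_max, k_heur)
```

In practice one would take the smaller of the two values, so that the choice satisfies both the storage constraint and the quality heuristic.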


Figure 2: Illustration of the SVD compression in image processing. [Nine panels, Testing Images 1–9, each plotting the singular values σi against i.]

Octave Implementation of the SVD Compression

The Octave implementation of the SVD algorithm is surprisingly easy. It only reads the following:

```octave
X = imread('testing.png');
X = rgb2gray(X);
X = double(X); X = X./max(X(:));
% where the built-in SVD algorithm is used!
[U, S, V] = svd(X);
```

By using these few lines of commands, you are already given U, Σ and V! Next, we read off the corresponding smaller matrices

Ū = [u1 u2 … uk],   Σ̄ = diag(σ1, σ2, …, σk),   V̄ᵀ = [v1ᵀ; v2ᵀ; …; vkᵀ]

by the following command lines:

```octave
U_new = U(:, 1:k);
S_new = diag(S); S_new = diag(S_new(1:k));
V_new = V.'; V_new = V_new(1:k,:).';
```

and finally reassemble X̄ = ŪΣ̄V̄ᵀ by the command X_new = U_new*S_new*V_new.'. It remains to show the results of the approximation. You can visualize your compression result by using

```octave
figure();
subplot(2,1,1); imshow(X);
subplot(2,1,2); imshow(X_new);
```

Figure 3: Visualization of X (left) and its approximation X̄ (right).

Your Tasks Here

(a) Following the instructions stated above, build an Octave programme to implement the SVD data compression algorithm.

(b) Select an image by yourself. Implement the SVD data compression algorithm on it. Choose a suitable value of k and discuss how you obtained this k. Visualize the reconstructed image X̄ and the original image X in the format of Figure 3.

(c) Visualize the results of X̄ output by your SVD algorithm with compression rates of 10%, 20%, 40% and 80%. Here, the compression rate refers to the ratio

compression rate = storage(X̄) / storage(X).

List the corresponding k's and comment on which compression rate you would recommend for your image.
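As a hint for part (c): combining the storage counts derived earlier gives compression rate = k·(n + m + 1)/(n·m), so the k matching a target rate follows directly. The helper below is a hypothetical sketch (the function name and the 512×512 example size are my own choices, not part of the assignment); it rounds down so the achieved rate never exceeds the target.

```python
def k_for_rate(rate, m, n):
    # Invert rate = k*(m + n + 1)/(m*n); round down so the achieved
    # compression rate never exceeds the requested one.
    return int(rate * m * n / (m + n + 1))

m = n = 512  # hypothetical image size, for illustration only
for rate in (0.10, 0.20, 0.40, 0.80):
    print(rate, k_for_rate(rate, m, n))
```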

Submission files shall include (i) an executable programme; (ii) your source code; (iii) a readme file describing how to use your programme; and (iv) a PDF report addressing (a)–(c) above.
