COMP3308/COMP3608 Artificial Intelligence

Week 12 Tutorial exercises: Unsupervised Learning (Clustering)

Exercise 1. K-means clustering (Homework)

Given is the one-dimensional dataset: {5, 7, 10, 12}. Run the k-means clustering algorithm for 1 epoch to cluster this dataset into 2 clusters. Assume that the initial seeds (cluster centers) are c1=3 and c2=13 and that the distance measure is the absolute distance between the examples. Show the clusters at the end of the epoch and the new cluster centers.
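One epoch of k-means consists of an assignment step followed by a centre update. The sketch below is a minimal illustration for one-dimensional data with absolute distance; the function name `kmeans_epoch_1d` and the handling of ties and empty clusters are assumptions made for this example, not part of the exercise. Running it is one way to check a hand-worked answer.

```python
def kmeans_epoch_1d(points, centers):
    """Run one k-means epoch on 1-D data using absolute distance."""
    clusters = [[] for _ in centers]
    for x in points:
        # Index of the closest center; ties go to the lower-numbered
        # center (an assumption, the exercise does not specify this).
        nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
        clusters[nearest].append(x)
    # New center = mean of the members; an empty cluster keeps its old
    # center (also an assumption).
    new_centers = [
        sum(c) / len(c) if c else centers[i]
        for i, c in enumerate(clusters)
    ]
    return clusters, new_centers


clusters, new_centers = kmeans_epoch_1d([5, 7, 10, 12], [3, 13])
print("clusters:", clusters)
print("new centers:", new_centers)
```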

Exercise 2. Nearest neighbor clustering

Use the Nearest Neighbor clustering algorithm to cluster examples A, B, C and D described by the following distance matrix. Suppose that the threshold t is 3.

     A   B   C   D
A    0   1   4   5
B        0   2   6
C            0   4
D                0
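The Nearest Neighbor algorithm processes the examples one at a time: each new example joins the cluster of its closest already-clustered example if that distance is within the threshold, and otherwise starts a new cluster. The sketch below is one possible reading of that procedure; it assumes the examples are processed in the order A, B, C, D and that "within the threshold" means a distance less than or equal to t. The helper names are hypothetical.

```python
def nearest_neighbor_clustering(items, dist, t):
    """Cluster items one at a time, in the given order.

    dist(a, b) returns the distance between two items.
    """
    clusters = [[items[0]]]          # the first item starts the first cluster
    for item in items[1:]:
        # Find the already-clustered item closest to the new one.
        best_cluster, best_d = None, float("inf")
        for cluster in clusters:
            for member in cluster:
                d = dist(item, member)
                if d < best_d:
                    best_cluster, best_d = cluster, d
        if best_d <= t:              # assumption: within threshold means <= t
            best_cluster.append(item)
        else:
            clusters.append([item])  # too far from every cluster: start a new one
    return clusters


# Upper-triangular distances from the exercise's matrix.
D = {("A", "B"): 1, ("A", "C"): 4, ("A", "D"): 5,
     ("B", "C"): 2, ("B", "D"): 6, ("C", "D"): 4}


def dist(a, b):
    """Symmetric lookup into the upper-triangular matrix."""
    return 0 if a == b else D.get((a, b), D.get((b, a)))


print(nearest_neighbor_clustering(["A", "B", "C", "D"], dist, t=3))
```

Note that the result of this algorithm can depend on the order in which the examples are presented.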

Exercise 3. Hierarchical clustering – single link agglomerative algorithm

Use single link agglomerative clustering to group the data described by the following distance matrix. Draw the dendrogram.

     A   B   C   D
A    0   1   4   5
B        0   2   6
C            0   3
D                0
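Agglomerative clustering starts with every example in its own cluster and repeatedly merges the two closest clusters; the merge distances are the heights at which branches join in the dendrogram. The sketch below is a minimal, unoptimised illustration in which the cluster distance is a parameter: `min` over cross-cluster pairs gives single link and `max` gives complete link. The function name and the returned merge-history format are assumptions made for this example.

```python
from itertools import combinations


def agglomerative(items, dist, linkage=min):
    """Return the merge history as (cluster_a, cluster_b, distance) tuples.

    linkage=min gives single link; linkage=max gives complete link.
    """
    clusters = [frozenset([x]) for x in items]
    merges = []
    while len(clusters) > 1:
        # Cluster distance = linkage over all cross-cluster pairs of examples.
        def cluster_dist(pair):
            a, b = pair
            return linkage(dist(x, y) for x in a for y in b)

        # Pick the closest pair of clusters and merge them.
        a, b = min(combinations(clusters, 2), key=cluster_dist)
        merges.append((sorted(a), sorted(b), cluster_dist((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


# Upper-triangular distances from the exercise's matrix.
D = {("A", "B"): 1, ("A", "C"): 4, ("A", "D"): 5,
     ("B", "C"): 2, ("B", "D"): 6, ("C", "D"): 3}


def dist(a, b):
    """Symmetric lookup into the upper-triangular matrix."""
    return 0 if a == b else D.get((a, b), D.get((b, a)))


for step in agglomerative("ABCD", dist, linkage=min):
    print(step)
```

Calling the same function with `linkage=max` gives the complete link variant on the same data.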

Exercise 4. Hierarchical clustering – complete link agglomerative algorithm

The same task as in the previous exercise but using the complete link distance measure.

Exercise 5. Clustering using Weka

Load the glass.arff data. It describes different types of glass based on their chemical components. The identification of different types of glass is important for criminological investigations – it can be used as evidence.

There are 9 attributes: 1) RI: refractive index and 2)-9) measurement of the following chemical elements: Na (Sodium), Mg (Magnesium), Al (Aluminum), Si (Silicon), K (Potassium), Ca (Calcium), Ba (Barium) and Fe (Iron).

COMP3308/3608 Artificial Intelligence, s1 2022

There are 7 classes (types of glass), but one of them has no examples in the dataset, so only 6 classes actually occur:

Type of glass: (class attribute)

1) building_windows_float_processed

2) building_windows_non_float_processed

3) vehicle_windows_float_processed

4) vehicle_windows_non_float_processed (none in this database)

5) containers

6) tableware

7) headlamps

1. From the Preprocess tab:

a) Select the class attribute “type” and remove it. Clustering is an unsupervised method and doesn’t use the class attribute.

b) Normalise the data using unsupervised->attribute->Normalize filter. This is important as the clustering algorithms we will be applying (k-means and hierarchical) are distance-based.

2. To perform clustering using the k-means algorithm, click the Cluster tab and select the SimpleKMeans algorithm. By default it uses k=2 clusters and Euclidean distance. You can see and change the parameters of the algorithm by right-clicking on the name, then Show Properties.

3. Evaluation method: check that Use training set is selected and that Store clusters for visualization is also selected (both should be the default options).

4. Run the k-means algorithm and analyse the output. It shows the within cluster sum of squared errors (the smaller the better, i.e. high cohesion) and also the centroids for each of the two clusters (circled in the figure below):

You can visualize the data by using Visualize cluster assignments:

Try different attributes for Y to see which attributes are more discriminative for the 2 clusters. E.g. Mg separates the 2 clusters relatively well:

You can also save the clustering results by clicking Save on the Visualization panel. The results will be saved in a .arff file. For each example, at the end of the line, Weka will add the cluster of the example (cluster0 or cluster1). You can open and view the saved file with Weka:

Experiment with different numbers of clusters: right-click on SimpleKMeans -> Show properties, then change k:

5. Weka also includes an implementation of the hierarchical agglomerative algorithm; it is called HierarchicalClusterer:

It includes different ways to measure the distance between clusters – single link, complete link, etc. Explore them from Show properties -> More. Select one of them, e.g. complete link, and run the algorithm.

You can visualise the results pairwise as in the k-means algorithm and can also plot the hierarchical tree:
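To see why normalisation matters for distance-based clustering and what the within cluster sum of squared errors measures, the following sketch may help. It rescales every attribute to the range [0, 1] (which is what Weka's Normalize filter does by default) and sums the squared Euclidean distances from each example to its cluster centroid. The function names and the sample rows are hypothetical and are not taken from glass.arff; the number Weka reports may be computed differently in detail.

```python
def min_max_normalize(rows):
    """Rescale every column to [0, 1]; a constant column maps to 0 (assumption)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        tuple((v - lo) / span if span else 0.0
              for v, lo, span in zip(row, lows, spans))
        for row in rows
    ]


def within_cluster_sse(rows, labels):
    """Sum of squared Euclidean distances from each row to its cluster centroid."""
    sse = 0.0
    for label in set(labels):
        members = [r for r, l in zip(rows, labels) if l == label]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        sse += sum((v - c) ** 2 for r in members for v, c in zip(r, centroid))
    return sse


# Hypothetical two-attribute rows on very different scales.
rows = [(1.51, 13.0), (1.52, 14.0), (1.53, 11.0), (1.51, 12.0)]
labels = [0, 0, 1, 1]  # hypothetical cluster assignments

scaled = min_max_normalize(rows)
print("SSE on raw data:   ", within_cluster_sse(rows, labels))
print("SSE on scaled data:", within_cluster_sse(scaled, labels))
```

Without rescaling, the second attribute dominates the distance simply because its values are larger, which is the reason for applying the Normalize filter before clustering.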

Additional exercises to be done in your own time

Exercise 6. K-means clustering

Use the k-means algorithm and Euclidean distance to group the following 8 examples into 3 clusters: A1=(2,10), A2=(2,5), A3=(8,4), A4=(5,8), A5=(7,5), A6=(6,4), A7=(1,2), A8=(4,9).

The distance matrix based on the Euclidean distance is given below:

      A1    A2    A3    A4    A5    A6    A7    A8
A1    0     √25   √72   √13   √50   √52   √65   √5
A2          0     √37   √18   √25   √17   √10   √20
A3                0     √25   √2    2     √53   √41
A4                      0     √13   √17   √52   √2
A5                            0     √2    √45   √25
A6                                  0     √29   √29
A7                                        0     √58
A8                                              0

Suppose that the initial seeds (centroids) are A1, A4 and A7. Run the k-means algorithm for 1 epoch only. At the end of this epoch show:

a) the new clusters (i.e. the examples belonging to each cluster)

b) the centroids of the new clusters
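For checking a hand-worked answer, one epoch of k-means on two-dimensional points can be sketched as below. This is a minimal illustration, assuming ties go to the lower-numbered centroid and that an empty cluster keeps its old centroid; neither rule is specified in the exercise, and the function name `kmeans_epoch` is hypothetical. It uses `math.dist`, which requires Python 3.8 or later.

```python
import math


def kmeans_epoch(points, centroids):
    """One k-means epoch with Euclidean distance.

    points: dict mapping example name -> coordinate tuple
    centroids: list of coordinate tuples (the current seeds)
    """
    clusters = [[] for _ in centroids]
    for name, p in points.items():
        # Assign to the closest centroid; ties go to the lower index (assumption).
        nearest = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(name)

    new_centroids = []
    for i, members in enumerate(clusters):
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(centroids[i])
            continue
        coords = [points[n] for n in members]
        # Centroid = coordinate-wise mean of the cluster's members.
        new_centroids.append(tuple(sum(c) / len(coords) for c in zip(*coords)))
    return clusters, new_centroids


points = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
          "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9)}
seeds = [points["A1"], points["A4"], points["A7"]]

clusters, new_centroids = kmeans_epoch(points, seeds)
print("clusters:", clusters)
print("new centroids:", new_centroids)
```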
