# IFN647 Week 12 Workshop: WordCloud and Clustering

********************************************************************

Task 1. Working with csv files

CSV stands for "comma-separated values". A csv file is a simplified spreadsheet stored as a plain text file. Please see the attached example.csv and its example.xlsx.


Please try the following to read a csv file and save the contents in a list.

>>> import csv
>>> dFile = open('example.csv', newline='')
>>> dReader = csv.reader(dFile)
>>> df = list(dReader)
>>> dFile.close()
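The same steps can be wrapped in a small helper that also separates the header row from the data rows. This is only a sketch: the function name read_csv_rows and the in-memory sample text are hypothetical stand-ins, since the contents of example.csv are not shown here.

```python
import csv
import io


def read_csv_rows(file_obj):
    """Return (header, rows) read from an open CSV text stream."""
    rows = list(csv.reader(file_obj))
    if not rows:
        return [], []
    return rows[0], rows[1:]


# Hypothetical stand-in for example.csv, so the sketch runs without the file.
sample = io.StringIO('name,comment\nAlice,"Hello, world"\nBob,Hi\n')
header, rows = read_csv_rows(sample)
print(header)
print(rows)
```

With a real file, the helper could be called as `with open('example.csv', newline='') as f: header, rows = read_csv_rows(f)`, which also closes the file automatically.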

Task 2. Generating Word-Cloud in Python

Word-Cloud is a data visualization method that is used for representing text data in which the size of each word indicates its frequency or importance. Significant information can be highlighted using a word cloud, see more details at

https://www.geeksforgeeks.org/generating-word-cloud-python/

The following modules are needed for generating word cloud in Python: matplotlib, pandas and wordcloud.

To install them, run the following commands:

pip install matplotlib

pip install pandas

pip install wordcloud

The attached is a csv file that you can find from UCI Machine Learning Repository. It consists of YouTube comments on videos of popular artists (Dataset Link: https://archive.ics.uci.edu/ml/machine-learning-databases/00380/).

You are required to write a python program to open this csv file, store the csv file into a list of rows, select the CONTENT column, and produce a word cloud figure to show the important information in

(a) All CONTENTS

(b) The positive CONTENTS (the class = 1)

(c) The negative CONTENTS (the class = 0)
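One possible way to organise such a program is sketched below. It assumes the csv file has header cells named CONTENT and CLASS (the CLASS name is an assumption; check the header of the attached file), and the sample rows are made up so the selection step can be run without the dataset. The helper names select_content and show_wordcloud are hypothetical.

```python
import csv
import io


def select_content(rows, label=None):
    """Join the CONTENT cells of all rows, or only rows whose CLASS equals label."""
    header, body = rows[0], rows[1:]
    content_idx = header.index("CONTENT")  # column name taken from the task text
    class_idx = header.index("CLASS")      # assumed column name for the class
    return " ".join(
        row[content_idx]
        for row in body
        if label is None or row[class_idx] == str(label)
    )


def show_wordcloud(text, title):
    """Draw a word cloud for text; requires matplotlib and wordcloud."""
    # Imported here so that select_content works without these packages.
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud, STOPWORDS

    cloud = WordCloud(width=800, height=400, background_color="white",
                      stopwords=STOPWORDS).generate(text)
    plt.figure(figsize=(8, 4))
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.title(title)
    plt.show()


# Made-up rows standing in for the real csv file.
sample = io.StringIO(
    "CONTENT,CLASS\n"
    "check out my channel,1\n"
    "great song love it,0\n"
    "subscribe to my channel,1\n"
)
rows = list(csv.reader(sample))
all_text = select_content(rows)
positive_text = select_content(rows, label=1)
negative_text = select_content(rows, label=0)
```

Once the packages are installed, a call such as `show_wordcloud(positive_text, "class = 1")` would display the figure for case (b); the other two cases follow the same pattern.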

Task 3. k-Means clustering using python

The k-means algorithm aims to partition n documents X into k clusters C in which each document belongs to the cluster with the nearest mean μ_j (the cluster centre or cluster centroid).

The k-means algorithm aims to choose centroids that minimise the inertia, or within-cluster sum-of-squares criterion:

$$\sum_{i=1}^{n} \min_{\mu_j \in C} \lVert x_i - \mu_j \rVert^2$$

where x_i is the i-th document in X.

In practice, the k-means algorithm is very fast (one of the fastest clustering algorithms available), but it can converge to a local minimum. That's why it can be useful to restart it several times with different initial centroids.
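To make the idea of restarts concrete, here is a minimal pure-Python sketch of k-means that runs several random initialisations and keeps the run with the lowest inertia. It is an illustration under simple assumptions (initial centres sampled from the data, a fixed iteration cap), not the implementation used by scikit-learn, and all function names are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centres):
    """Index of the nearest centre for every point."""
    return [min(range(len(centres)), key=lambda j: sq_dist(p, centres[j]))
            for p in points]


def inertia(points, centres):
    """Within-cluster sum of squares for the given centres."""
    return sum(min(sq_dist(p, c) for c in centres) for p in points)


def kmeans_once(points, k, rng, max_iter=100):
    """One run: start from k randomly chosen data points, iterate to a fixed point."""
    centres = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        labels = assign(points, centres)
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centres.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centres.append(centres[j])  # keep the centre of an empty cluster
        if new_centres == centres:
            break
        centres = new_centres
    return centres, assign(points, centres), inertia(points, centres)


def kmeans(points, k, n_init=10, seed=0):
    """Restart n_init times and return the run with the smallest inertia."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[2])


# Toy data (made up): two well-separated groups.
points = [[0, 0], [0, 1], [10, 10], [10, 11]]
centres, labels, best_inertia = kmeans(points, k=2)
print(centres, labels, best_inertia)
```

Each single run can stop at a different local minimum depending on its starting centres; taking the minimum over runs is what the restart strategy amounts to.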

Methods of sklearn.cluster.KMeans:

fit(X[, y, sample_weight]): Compute k-means clustering.

fit_predict(X[, y, sample_weight]): Compute cluster centers and predict cluster index for each sample.

fit_transform(X[, y, sample_weight]): Compute clustering and transform X to cluster-distance space.

get_params([deep]): Get parameters for this estimator.

predict(X[, sample_weight]): Predict the closest cluster each sample in X belongs to.

score(X[, y, sample_weight]): Opposite of the value of X on the K-means objective.

set_params(**params): Set the parameters of this estimator.

transform(X): Transform X to a cluster-distance space.
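A short sketch of how a few of these methods fit together is given below. The two-dimensional toy data and the parameter values (n_init=10, random_state=0) are assumptions chosen for illustration; scikit-learn and numpy need to be installed.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data (made up): two groups separated along the first coordinate.
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

km = KMeans(n_clusters=2, n_init=10, random_state=0)

labels = km.fit_predict(X)   # fit the model and return a label per sample
distances = km.transform(X)  # distance from each sample to each centre
objective = km.score(X)      # negative of the k-means objective on X

print(labels)
print(km.cluster_centers_)
print(distances.shape)
print(objective, km.inertia_)
```

Here score(X) returns the negative of the inertia, which is why the method description calls it the "opposite" of the objective value.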

Design a python program to

(a) Cluster the following six documents X (where each document is represented as a triple) into 3 clusters (i.e., assign labels (0, 1, 2) to them) and print the centre of each cluster.

[[1 2 1] [1 4 2] [1 0 0] [10 2 0] [10 4 1] [10 0 5]]

(b) Assign cluster labels to four incoming documents [0, 0, 0], [12, 3, 5], [11, 0, 6] and [11, 2, 0] based on the centres calculated in (a).
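One way to approach both parts with sklearn.cluster.KMeans is sketched below. The values n_init=10 and random_state=0 are assumptions made for reproducibility, not requirements of the task, and the numbering of the clusters (which group receives label 0, 1 or 2) is arbitrary and may differ between runs with other settings.

```python
import numpy as np
from sklearn.cluster import KMeans

# The six documents from part (a), one triple per row.
X = np.array([[1, 2, 1], [1, 4, 2], [1, 0, 0],
              [10, 2, 0], [10, 4, 1], [10, 0, 5]])

# (a) Fit 3 clusters, then print the label of each document and the centres.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("labels:", km.labels_)
print("centres:")
print(km.cluster_centers_)

# (b) Assign the incoming documents to the nearest centre found in (a).
new_docs = np.array([[0, 0, 0], [12, 3, 5], [11, 0, 6], [11, 2, 0]])
new_labels = km.predict(new_docs)
print("labels of incoming documents:", new_labels)
```

predict does not refit the model; it only compares each incoming document with the centres stored in cluster_centers_.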
