IFN647 Week 12 Workshop: WordCloud and Clustering
Task 1. Working with csv files
CSV stands for ¡°comma-separated values¡±. A csv file is a simplified spreadsheet stored as a plain text file. Please see the attached example.csv and its example.xlsx.
Copyright By cscodehelp代写 加微信 cscodehelp
Please try the following to read a csv file and save the contents in a list.
>>> df = list(dReader)
Task 2. Generating Word-Cloud in Python
Word-Cloud is a data visualization method that is used for representing text data in which the size of each word indicates its frequency or importance. Significant information can be
highlighted using a word cloud, see more details at
The following modules are needed for
To install them, run the following commands:
The attached is a csv file that you can find
You are required to write a python program to open this csv file, store the csv file into a list of rows, select the CONTENT
generating word cloud in
Python: matplotlib, pandas and wordcloud.
pip install matplotlib
pip install pandas
pip install wordcloud
Learning Repository. It consists of YouTube comments on videos
from UCI Machine
of popular artists (Dataset
column, and produce a word cloud figure to show the important information in
(a) All CONTENTS
(b) The positive CONTENTS (the class = 1)
(c) The negative CONTENTS (the class = 0)
Task 3. k-Means clustering using python
The k-means algorithm aims to partition n documents X into k clusters C in which each document belongs to the cluster with the nearest mean ¦Ìj (the cluster centre or cluster centroid)
The k-means algorithm aims to choose centroids that minimise the inertia, or within-cluster sum-of-squares criterion:
In practice, the k-means algorithm is very fast (one of the fastest clustering algorithms available), but it falls in local minima. That¡¯s why it can be useful to restart it several times.
fit(X[, y, sample_weight]) Compute k-means clustering.
fit_predict(X[, y, sample_weight]) Compute cluster centers and predict cluster index for each sample.
fit_transform(X[, y, sample_weight]) Compute clustering and transform X to cluster-distance space.
get_params([deep]) Get parameters for this estimator.
predict(X[, sample_weight]) Predict the closest cluster each sample in X belongs to.
score(X[, y, sample_weight]) Opposite of the value of X on the K-means objective.
set_params(**params) Set the parameters of this estimator.
transform(X) Transform X to a cluster-distance space.
Design a python program to
(a) Cluster the following six documents X (where each document is represented as a tripe) into 3 clusters (i.e., assign labels (0, 1, 2) to them) and print the centres of each cluster.
[[1 2 1] [1 4 2] [1 0 0] [10 2 0] [10 4 1] [10 0 5]]
(b) Assign cluster labels to four incoming documents [0, 0,
0], [12, 3, 5], [11, 0, 6] and [11, 2, 0] based on the
centres calculated in (a).
程序代写 CS代考 加微信: cscodehelp QQ: 2235208643 Email: email@example.com