COMP3308/3608, Lecture 13a

ARTIFICIAL INTELLIGENCE

Applications of Artificial Intelligence – Recommender Systems

COMP3308/3608 AI, week 13a, 2022


• Introduction

• Collaborative filtering approaches

• Content-based approaches


Recommender Systems

Recommender Systems (RS) recommend items to people:

Which digital camera (mobile phone, washing machine, etc.) should I buy?

What is the best holiday for me?

Which movie should I rent?

Which web sites will I find interesting?

Which book should I buy for my next holiday?

Which degree and university are the best for my future?


Motivation

• WWW has become the primary source of information

• It has huge content -> need to reduce the information overload

• RS help users to find products, services and information, by predicting their relevance


Three Main Approaches

1) Collaborative Filtering

• “Show me books that are popular among my peers”

• Uses the preferences of the community and not the item’s features

2) Content-Based

• “Show me more of the type of books I like”

• Uses only the item’s features and not the preferences of the user community

3) Hybrid – combinations of collaborative filtering and content-based


Collaborative Filtering


Collaborative Filtering (CF)

The most prominent and widely used approach. Well understood, many variations.

Main idea: use the “wisdom of the crowd”

Input:

a matrix of user–item ratings, i.e. it is assumed that the target user U has rated some of the items

Output:

a predicted rating for a new (unseen) item, indicating how much U will like it

a list of predicted items (top-N list) sorted in decreasing order based on their scores


Example – Movielens

• User U rates some movies (actual ratings)

• The system uses U’s ratings, together with the ratings of other users, to give recommendations to U (predicted ratings)


Main CF Methods

• Nearest neighbour

• user-to-user – uses the similarity between users

• item-to-item – uses the similarity between items

• Advanced

• Matrix factorization – maps users and items to a joint latent factor space

• Probabilistic

• Clustering-based

• Using association rules


User-to-User CF

• Makes predictions using the similarity between users

• Finds like-minded users whose ratings can complement each other’s

Given: a target user Alice and an item 𝑖 not yet seen by Alice

• Find (neighbourhood) N – a set of users (nearest neighbors), who liked the same items as Alice in the past and who have rated item 𝑖

• Combine the ratings of the users in N (e.g. take the average) to predict if Alice will like item 𝑖

• Do this for all items Alice has not seen and recommend the best-rated ones


• Given: a matrix of ratings of Alice and other users

• Goal: Find if Alice will like or dislike Item5, which Alice has not rated (seen) yet

Questions:

1) How do we measure similarity between users?

2) How many neighbors should we consider?

3) How do we generate a prediction from the neighbors’ ratings?


Prediction for Alice

1) Correlation as a similarity (distance measure)

2) 2-Nearest-Neighbour – closest neighbours are User1 and User3

3) Average of neighbours’ ratings:

=> P(Alice, item5) = (3+4)/2 = 3.5

sim(Alice, User1) = 0.85, sim(Alice, User2) = 0.00, sim(Alice, User3) = 0.70, sim(Alice, User4) = -0.79


Similarity Measure – Correlation

• Pearson correlation coefficient

• Most popular in user-based collaborative filtering

$$sim(a,b) = \frac{\sum_{p \in P}(r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P}(r_{a,p} - \bar{r}_a)^2}\,\sqrt{\sum_{p \in P}(r_{b,p} - \bar{r}_b)^2}}$$

$a, b$ – users

$P$ – set of items rated by both $a$ and $b$

$r_{a,p}$ – rating of user $a$ for item $p$

$\bar{r}_a$ – average rating of user $a$
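The coefficient can be sketched directly from these definitions. The data layout (a dict of item -> rating per user) and the function name are my own choices; following the slide, each user's mean is taken over all of that user's ratings:

```python
from math import sqrt

# Pearson correlation between users a and b over their co-rated items P.
# Assumed layout: each user is a dict {item_id: rating}.
def pearson_sim(ratings_a, ratings_b):
    common = set(ratings_a) & set(ratings_b)          # P: items rated by both
    if len(common) < 2:
        return 0.0                                    # not enough co-rated items
    mean_a = sum(ratings_a.values()) / len(ratings_a) # user averages over all ratings
    mean_b = sum(ratings_b.values()) / len(ratings_b)
    num = sum((ratings_a[p] - mean_a) * (ratings_b[p] - mean_b) for p in common)
    den_a = sqrt(sum((ratings_a[p] - mean_a) ** 2 for p in common))
    den_b = sqrt(sum((ratings_b[p] - mean_b) ** 2 for p in common))
    if den_a == 0 or den_b == 0:
        return 0.0                                    # a user rated everything the same
    return num / (den_a * den_b)
```

The result lies in [-1, 1]; two users whose co-ratings move together perfectly get similarity 1.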


Combining the Ratings – Various Ways

• How many neighbors?

• use a similarity threshold or fixed number of neighbors

• How to combine the ratings of the neighbors?

• Simple average

• Weighted average (weighted by the similarity between the neighbour and the target user)

• Weighted normalised adjusted average – commonly used:

$$pred(a,p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a,b)\,(r_{b,p} - \bar{r}_b)}{\sum_{b \in N} sim(a,b)}$$

where $\bar{r}_a$ is $a$’s average rating, $sim(a,b)$ weights each neighbour by its similarity to $a$, $(r_{b,p} - \bar{r}_b)$ captures whether $b$’s rating of $p$ is higher or lower than $b$’s average, and the denominator normalises the weights.

• Adjusted = considers rating differences between users
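A sketch of the weighted, normalised, adjusted prediction. It assumes the neighbour similarities are already computed; the (similarity, ratings) pair layout and the function name are my own:

```python
# pred(a, p) = r_bar_a + sum_b sim(a,b) * (r_bp - r_bar_b) / sum_b sim(a,b)
# Assumed layout: `target` is {item: rating}; `neighbours` is a list of
# (similarity, ratings_dict) pairs for the users in N.
def predict_user_based(target, neighbours, item):
    mean_a = sum(target.values()) / len(target)   # a's average rating
    num = den = 0.0
    for sim, ratings in neighbours:
        if item not in ratings:                   # neighbours must have rated the item
            continue
        mean_b = sum(ratings.values()) / len(ratings)
        num += sim * (ratings[item] - mean_b)     # above or below b's average?
        den += sim                                # normalisation term
    return mean_a if den == 0 else mean_a + num / den
```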


Improving the Prediction – Variations

• Not all neighbor ratings are equally valuable

• E.g. agreement on commonly liked items is not as informative as agreement on controversial items

• Solution: Give more weight to items that have a higher variance

• Number of co-rated items

• Reduce the weight when the number of co-rated items is low

• Case amplification

• Higher weight to very similar neighbors, i.e. similarity close to 1


Item-to-Item CF

• Makes predictions using the similarity between items

• Main idea:

• Find items that are similar to the new item (item5) based on the ratings

• Use Alice’s ratings for these items to predict her rating for the new item

e.g. P(Alice, item5) = (5+4)/2 = 4.5

Recommendation for Alice: Alice will like item 5, as she bought and liked items 1 and 4, and item 5 is similar to items 1 and 4 based on other people’s ratings


Similarity Measure – Cosine

• Most popular in item-based collaborative filtering

• Ratings are seen as vectors in n-dimensional space

• Cosine similarity is the cosine of the angle between the vectors:

$$sim(a,b) = \frac{a \cdot b}{|a| * |b|}$$

• Adjusted cosine similarity

• Also considers the average user ratings, to transform the original ratings

• $U$: set of users who have rated both items $a$ and $b$

$$sim(a,b) = \frac{\sum_{u \in U}(r_{u,a} - \bar{r}_u)(r_{u,b} - \bar{r}_u)}{\sqrt{\sum_{u \in U}(r_{u,a} - \bar{r}_u)^2}\,\sqrt{\sum_{u \in U}(r_{u,b} - \bar{r}_u)^2}}$$
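The adjusted cosine can be sketched as follows (the ratings layout, mapping user -> {item: rating}, and the function name are my own assumptions):

```python
from math import sqrt

# Adjusted cosine similarity between items a and b, subtracting each user's
# average rating before computing the cosine.
def adjusted_cosine(ratings, a, b):
    num = den_a = den_b = 0.0
    for r in ratings.values():
        if a not in r or b not in r:          # U: users who rated both items
            continue
        mean_u = sum(r.values()) / len(r)     # the user's average rating
        da, db = r[a] - mean_u, r[b] - mean_u
        num += da * db
        den_a += da * da
        den_b += db * db
    if den_a == 0 or den_b == 0:
        return 0.0
    return num / (sqrt(den_a) * sqrt(den_b))
```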


Combining the Ratings

• Common prediction function – weighted prediction:

$$pred(u,p) = \frac{\sum_{i \in ratedItems(u)} sim(i,p) * r_{u,i}}{\sum_{i \in ratedItems(u)} sim(i,p)}$$
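A minimal sketch of this weighted prediction (the function name and the precomputed-similarity layout are assumptions). With both similarities equal to 1 it reproduces the (5+4)/2 = 4.5 example from the previous slide:

```python
# pred(u, p) = sum_i sim(i, p) * r_ui / sum_i sim(i, p), over items i rated by u.
# Assumed layout: `user_ratings` is {item: rating}; `sims_to_p` is {item: sim(item, p)}.
def predict_item_based(user_ratings, sims_to_p):
    num = den = 0.0
    for i, r in user_ratings.items():
        s = sims_to_p.get(i, 0.0)
        if s > 0:                     # use only positively similar items
            num += s * r
            den += s
    return None if den == 0 else num / den
```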


Can we Precompute the Similarities?

• Rating matrix: a large number of items and a small number of ratings per user

• User-to-user collaborative filtering

• similarity between users is unstable (it changes considerably when even a few ratings change, e.g. new ones are added, because of the small number of commonly rated items)

• => pre-computing the similarities leads to poor performance

• Item-to-item collaborative filtering

• similarity between items is more stable

• we can pre-compute the item-to-item similarity and the nearest neighbours

• prediction involves a lookup of these values and computing the weighted sum (Amazon does this)


Explicit vs Implicit Ratings


Explicit Ratings

• Explicitly provided by the user

• Most commonly used scales: 1 to 5 or 1 to 7 (Likert scale)

• Explicit ratings are reliable but it is often difficult to obtain a sufficient number of ratings

• Small number of available ratings => sparse rating matrices => poor recommendation quality

• How to encourage users to rate more items?


Implicit Ratings

• Collected automatically by the web shop or application in which the recommender system is embedded

• When a customer buys an item, many recommender systems interpret this behaviour as a positive rating

• Clicks, page views, time spent on some page, demo downloads, etc.

• Main problem – might not be reliable

• One cannot be sure whether the user behaviour is correctly interpreted

• E.g. a user might not like all the books he bought or might have bought a book for someone else

• Typically used in conjunction with the explicit ratings


Advanced CF Methods


SVD-based CF

• Idea: Generate predictions faster by reducing the dimensionality of the rating matrix

• Based on Singular Value Decomposition (SVD) for dimensionality reduction of the rating matrix

• SVD is widely used for dimensionality reduction – e.g. feature selection in ML, image compression, efficient indexing and retrieval in IR (Latent Semantic Analysis)


• Any n x m matrix X (n >= m) can be written as the product of 3 matrices: X = U Σ Vᵀ

U – n x m orthogonal matrix

Vᵀ – the transpose of matrix V, where V is an m x m orthogonal matrix

Σ – m x m diagonal matrix containing the singular values along the main diagonal, sorted in decreasing order

• X is the original data

• V defines a new set of axes which provide information about the variance in data (the 1st axis is the most important – it goes in the direction of the highest variance, the 2nd axis is orthogonal to the 1st and goes in the direction of the second highest variance, etc)

• U is the transformed data

• We can approximate X by retaining only the most important axes, those with the largest singular values, and projecting the data on them -> a smaller number of new features

• In our example (slide 28), we calculate 𝑈, 𝑉, and Σ but retain only the 2 most important axes by taking only the first 2 columns of 𝑈 and the first two rows of 𝑉t
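The truncation step can be sketched with NumPy. The matrix values below are illustrative only, not the lecture's slide-28 example:

```python
import numpy as np

# Rank-k approximation of a small rating-like matrix via SVD.
X = np.array([[5., 3., 4., 4.],
              [3., 1., 2., 3.],
              [4., 3., 4., 3.],
              [3., 3., 1., 5.],
              [1., 5., 5., 2.]])

U, s, Vt = np.linalg.svd(X, full_matrices=False)  # s comes out sorted in decreasing order
k = 2
# Keep only the first k columns of U, the k largest singular values,
# and the first k rows of Vt:
Xk = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```

By the Eckart–Young theorem, `Xk` is the best rank-k approximation of `X` in the Frobenius norm.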


SVD – Graphical Representation

[Figure: the full decomposition X = U Σ Vᵀ (without data reduction) next to the rank-k approximation that keeps only the first k columns of U, the k largest singular values, and the first k rows of Vᵀ (with data reduction)]

Compression Ratio

• Space needed before SVD: n x m

• Space needed after SVD: n*k + k + k*m

• The first k columns of U (n-dimensional vectors), the k singular values, and the first k columns of V (m-dimensional vectors) => total = k(n+1+m)

• Compression ratio:

$$r = \frac{nk + k + km}{nm}$$

• For n >> m >k, this ratio is approximately k/m
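As a quick numeric check of the ratio (the helper name is my own):

```python
# Compression ratio r = (nk + k + km) / (nm).
def compression_ratio(n, m, k):
    return (n * k + k + k * m) / (n * m)

# For n >> m > k the ratio approaches k/m:
# e.g. n = 10**6, m = 100, k = 10 gives roughly 10/100 = 0.1
```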


SVD-Based Recommendation

The rating matrix is factorised into new features (factors):

$$M_k = U_k \Sigma_k V_k^T$$

Calculating the prediction for user u and item i = EPL:

$$\hat{r}_{ui} = \bar{r}_u + U_k(\text{Alice}) \cdot \Sigma_k \cdot V_k^T(\text{EPL}) = 3 + 0.84 = 3.84$$

Predictions are generated faster on the low-dimensional data


Matrix Factorization

• An extension of SVD-based recommendation

• The rating matrix is sparse and the standard SVD cannot be directly applied

• Earlier systems filled in the missing data and then applied SVD – data distortion, inaccurate predictions

• Current approach

• Use only the available ratings

• Formulate the problem of finding U and V as an optimisation problem (minimizing the regularized squared error on the set of known ratings)

• Use stochastic gradient descent to solve it
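The three steps above can be sketched as a minimal SGD matrix factorisation. The function name, latent dimension, and hyperparameters are my own choices, not values from the Koren et al. paper:

```python
import random

# SGD matrix factorisation trained on the known ratings only.
# known: list of (user, item, rating) triples; f latent factors per user/item.
def factorise(known, n_users, n_items, f=2, lr=0.02, reg=0.01, epochs=5000):
    random.seed(0)                                # deterministic small random init
    P = [[random.uniform(-0.1, 0.1) for _ in range(f)] for _ in range(n_users)]
    Q = [[random.uniform(-0.1, 0.1) for _ in range(f)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in known:                     # only the observed ratings
            e = r - sum(P[u][j] * Q[i][j] for j in range(f))
            for j in range(f):
                pu, qi = P[u][j], Q[i][j]
                P[u][j] += lr * (e * qi - reg * pu)   # step along the gradient of the
                Q[i][j] += lr * (e * pu - reg * qi)   # regularised squared error
    return P, Q
```

A predicted rating is then the dot product of the user's and the item's factor vectors.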

• Matrix factorization is the state-of-the-art approach for CF

Y. Koren, R. Bell and C. Volinsky (2009). Matrix Factorization Techniques for Recommender Systems. IEEE Computer, Volume 42, Issue 8, p.30-37.


Probabilistic CF

• We can use Naïve Bayes

• What is the probability that Alice will give rating 1 to item 5, given her previous ratings?

• Evidence E (Alice’s previous ratings) – 4 pieces:

E = item1=1, item2=3, item3=3, item4=2

• 5 hypotheses H (possible ratings for item5): H1: item5=1, H2: item5=2, …, H5: item5=5

• Assumption: the previous ratings (the 4 pieces of E) are independent of each other given a hypothesis:


$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$

$$P(H \mid E) \propto P(H) \prod_i P(E_i \mid H)$$

Probabilistic CF

Estimate the right hand side from the data for the other users:

$$P(item5{=}1 \mid E) \propto P(Item1{=}1 \mid Item5{=}1) \times P(Item2{=}3 \mid Item5{=}1) \times P(Item3{=}3 \mid Item5{=}1) \times P(Item4{=}2 \mid Item5{=}1) = \frac{2}{2} \times \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} = 0.125$$

$$P(item5{=}2 \mid E) \propto P(Item1{=}1 \mid Item5{=}2) \times P(Item2{=}3 \mid Item5{=}2) \times P(Item3{=}3 \mid Item5{=}2) \times P(Item4{=}2 \mid Item5{=}2) = 0 \times \cdots = 0$$

$$P(item5{=}3 \mid E) = \ldots \quad P(item5{=}4 \mid E) = \ldots \quad P(item5{=}5 \mid E) = \ldots$$

Select the hypothesis (rating for item 5) with highest probability
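The hypothesis selection can be sketched as follows. The function name and data layout are assumptions; unlike the slide's worked example, a prior P(H) estimated from the data is included, as in the generic Naive Bayes formula:

```python
# Naive Bayes prediction of the target user's rating for `target_item`.
# `others` is a list of {item: rating} dicts for the other users;
# `user_ratings` holds the target user's known ratings (the evidence E).
def nb_predict(others, user_ratings, target_item, rating_values=(1, 2, 3, 4, 5)):
    best, best_score = None, -1.0
    for h in rating_values:                      # hypothesis: target_item = h
        group = [u for u in others if u.get(target_item) == h]
        if not group:
            continue                             # no user gave this rating
        score = len(group) / len(others)         # prior P(H)
        for item, r in user_ratings.items():     # independence assumption:
            score *= sum(1 for u in group if u.get(item) == r) / len(group)
        if score > best_score:
            best, best_score = h, score
    return best                                  # hypothesis with highest probability
```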


Clustering-Based CF

• Cluster the users based on their ratings

• Various distance measures and clustering algorithms

• Each cluster is represented with its centroid, a vector whose i-th component is the average rating of all users in the cluster for item i

• To make a recommendation for a new user U, find the closest centroid to U’s ratings and use the predictions of the users in this cluster
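The prediction step can be sketched as a nearest-centroid lookup, assuming dense rating vectors and precomputed centroids (names are hypothetical):

```python
from math import sqrt

# Find the index of the cluster centroid closest to a new user's rating vector.
def closest_centroid(user, centroids):
    def dist(a, b):                               # Euclidean distance
        return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(range(len(centroids)), key=lambda c: dist(user, centroids[c]))
```

The centroid's average ratings for the user's unseen items then serve as the predictions.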


Using Association Rules

• Association rules are used to analyse purchasing behaviour of customers

– Customers who buy X and Y also buy Z

• Rule quality measures:

$$support(X \rightarrow Y) = \frac{\#(X \wedge Y)}{n}$$

$$confidence(X \rightarrow Y) = \frac{\#(X \wedge Y)}{\#X}$$


Association Rules-Based CF

Transform 5-point ratings into binary

• e.g. rating = 1, if it is above the user’s average score and 0 otherwise

Mine rules such as Item1 → Item5

• support = 2/4

• confidence = 2/2 (computed without Alice)

To make recommendations for Alice:

• Left-hand side – find “relevant” rules based on Alice’s transactions (the above rule will be relevant as Alice bought Item1)

• Right-hand side – find items not already bought by Alice and sort them based on the rules’ confidence values
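The support and confidence counts used above can be computed directly over binary transactions (the helper names are my own):

```python
# Support and confidence for a one-item rule X -> Y over binary transactions,
# where each transaction is the set of items a user rated above average.
def support(transactions, x, y):
    return sum(1 for t in transactions if x in t and y in t) / len(transactions)

def confidence(transactions, x, y):
    have_x = [t for t in transactions if x in t]
    return sum(1 for t in have_x if y in t) / len(have_x)
```

With four transactions, of which two contain both Item1 and Item5 and no others contain Item1, this reproduces the slide's support = 2/4 and confidence = 2/2.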

Different variations possible

• dislike statements, user associations, etc.


CF – Strengths and Weaknesses

• Strengths

• Minimum knowledge engineering – requires only a rating matrix and no knowledge about the features of the products

• Weaknesses

• Requires user community

• Requires sufficient number of co-rated items

• Sparse rating matrix

• Cold start problem – users need to rate items before a recommendation can be made

• Does not provide explanation for the recommendation


Content-Based Recommenders


Content-Based Recommender

• Uses information about the items and not about the user community

• e.g. recommend fantasy novels to people who liked fantasy novels in the past

• Has its roots in Information Retrieval

• What we need:

• Information about the content of the items (e.g. for movies: genre, leading actors, director, awards, etc.)

• Information about what the user likes (user preferences, also called user profile) – explicit (e.g. movie rankings by the user) or implicit

• Task: recommend items that match the user preferences


Classification Task

• Content-based recommendation can be formulated as a classification/regression task

• Using the rated data as a training set, build a classifier, for each user, that predicts the rating of new items

Item’s features (genre, leading actors, awards, keywords or movie review) and the class (like/dislike):

• The Lives of Others – drama, thriller – Koch, … – Oscar and 66 other awards – “In 1984 East Berlin, an agent of the secret police, conducting surveillance on a writer and his lover, …”

• The Spanish Apartment – comedy, drama, romance – France, Spain – Romain Duris, … – “A French student moves into an apartment in Barcelona with six other European students. Together, they speak the international language of love and friendship. …”

• The Sea Inside – drama, biography – Javier Bardem, Belén Rueda, … – Oscar and 61 other awards – “The real-life story of …, who …”

• … – Mike Myers, … – “A 1960s hipster secret agent is brought out of cryofreeze …”


Representing Text Documents

• Content-based approaches are mainly used for recommending text- based products, e.g. web pages, news, etc.

• The task becomes text classification

• Standard representation of a document: a vector of tf-idf values for each selected word

• Each document is represented as a vector of words; each word is represented by its tf-idf value (weight)

• tf (term frequency) part: measures how often a term (word) appears in a document; assumes that important terms appear more often

• idf (inverse document frequency) part: aims to reduce the weight of terms that appear in many documents

• Instead of single words, bigrams and n-grams can be used

• An example of a bigram: “machine learning”
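A minimal sketch of the weighting, using raw counts for tf and log(N/df) for idf (one of several common tf-idf variants; the function name is my own):

```python
from math import log

# tf-idf weights: weight(t, d) = tf(t, d) * log(N / df(t)).
def tfidf(docs):
    # docs: list of token lists; returns one {term: weight} dict per document
    N = len(docs)
    df = {}                                   # document frequency per term
    for d in docs:
        for t in set(d):
            df[t] = df.get(t, 0) + 1
    out = []
    for d in docs:
        # tf is the raw count of the term in this document
        out.append({t: d.count(t) * log(N / df[t]) for t in set(d)})
    return out
```

Note that a term appearing in every document gets weight 0, which is exactly the idf penalty described above.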


Example of tf-idf Representation

• tf part – term frequency:

• Each document is a count vector in ℕ^|V|, where V is the vocabulary

• e.g. “Antony” appears 73 times in the play Antony and Cleopatra

[Table of term counts per Shakespeare play omitted]

Example from http://informationretrieval.org


Example of tf-idf Representation (2)

• tf-idf weights

• The idf part penalizes terms that appear in many documents

[Table of tf-idf weights per play (e.g. Antony and Cleopatra, The Tempest) omitted]
