Lecture 20: Anomaly Detection

Introduction to Machine Learning Semester 1, 2022

C 2022 The University of Melbourne Acknowledgement:

Copyright By cscodehelp代写 加微信 cscodehelp

Lecture Outline

• AnomalyDetection • Definition

• Importance • Structure

• AnomalyDetectionAlgorithms • Statistical

• Proximity-based • Density-based

• Clustering-based

What are Outliers/Anomalies?

• Anomaly: A data object that deviates significantly from the normal objects as if it were generated by a different mechanism

• Ex.: Unusual credit card purchase, sports: , …

• Anomalies are different from noise

• Anomalies are interesting:

• They violate the mechanism that generates the normal data

• translate to significant (often critical) real life entities

• Cyber intrusions

• Credit card fraud

Noise is random error or variance in a measured variable Noise should be removed before anomaly detection

Importance of Anomaly Detection?

Ozone Depletion History

• In 1985 three researchers (Farman, Gardinar and Shanklin) were puzzled by data gathered by the British Antarctic Survey showing that ozone levels for Antarctica had dropped 10% below normal levels

• Why did the Nimbus 7 satellite, which had instruments aboard for recording ozone levels, not record similarly low ozone concentrations?

• The ozone concentrations recorded by the satellite were so low they were being treated as noise by a computer program and discarded!

Variants of Anomaly Detection Problem

• VariantsofAnomaly/OutlierDetectionProblems

• Given a database D, find all the data points x ∈ D having the top-n largest anomaly scores f(x)

• Given a database D, containing mostly normal (but unlabeled) data points, and a test point x, compute the anomaly score of x with respect to D

Given a database D, find all the data points x ∈ D with anomaly scores greater than some threshold t

Structure of Anomalies

• Global/Point anomalies

• Contextual/Conditional anomalies • Collective anomalies

Global/Point anomalies

• GlobalAnomaly(orpoint)

– Object is Og if it significantly deviates from the rest of the data set

– Ex.Intrusiondetectionincomputernetworks

– Issue:Findanappropriatemeasurementofdeviation

Contextual anomalies

• Contextual Anomaly (or conditional)

• Object is Oc if it deviates significantly based on a selected

• Attributes of data objects should be divided into two groups

• Contextual attributes: defines the context, e.g., time & location

• Behavioral attributes: characteristics of the object, used in anomaly evaluation, e.g., temperature

• Can be viewed as a generalization of local anomalies—whose density significantly deviates from its local area

• Issue: How to define or formulate meaningful context?

* Song, et al, “Conditional Anomaly Detection”, IEEE Transactions on Data and Knowledge Engineering, 2006.

Example of Contextual Anomalies

Ex. 10o C in Paris: Is this an anomaly?

Collective anomalies

Collective Anomalies

• A subset of data objects that collectively deviate significantly from the whole data set, even if the individual data objects may not be anomalies

E.g., intrusion detection:

When a number of computers keep sending denial-of-service • Detection of collective anomalies

packages to each other

Consider not only behavior of individual objects, but also that of

groups of objects

Requires background knowledge about the relationship among

data objects, such as a distance or similarity measure on objects.

Example of Collective anomalies

• Requires a relationship among data instances

• The individual instances within a collective anomaly are not anomalous by themselves

anomalous subsequence

Sequential data Spatial data Graph data

Anomaly detection paradigms: supervised, semi-supervised, and unsupervised

Supervised Anomaly Detection

Supervised anomaly detection

• Labels available for both normal data and anomalies

• Samples examined by domain experts used for training & testing • Challenges

Require both labels from both normal and anomaly class Imbalanced classes, i.e., anomalies are rare: Boost the

anomaly class and make up some artificial anomalies

Cannot detect unknown and emerging anomalies

Catch as many outliers as possible, i.e., recall is more important than accuracy (i.e., not mislabeling normal objects as outliers)

Semi-supervised Anomaly Detection

Semi-Supervised anomaly detection

Labels available only for normal data

Model normal objects & report those not matching the model as

Challenges

Require labels from normal class

Possible high false alarm rate – previously unseen (yet legitimate)

data records may be recognized as anomalies

Unsupervised Anomaly Detection I

Unsupervised anomaly detection

• Assume the normal objects are somewhat “clustered” into multiple

groups, each having some distinct features

• An outlier is expected to be far away from any groups of normal

General steps

Build a profile of “normal” behavior

summary statistics for overall population model of multivariate data distribution

• Use the “normal” profile to detect anomalies

anomalies are observations whose characteristics

differ significantly from the normal profile

Unsupervised Anomaly Detection II

Unsupervised anomaly detection Challenges

• Normal objects may not share any strong patterns, but the

collective outliers may share high similarity in a small area

• Ex. In some intrusion or virus detection, normal activities are diverse

• Unsupervised methods may have a high false positive rate but still miss many real outliers.

Many clustering methods can be adapted for unsupervised methods

• Find clusters, then outliers: not belonging to any cluster

• Problem 1: Hard to distinguish noise from outliers

• Problem 2: Costly since first clustering: but far less outliers than normal objects

Unsupervised anomaly detection: Approaches

• Statistical (or: model-based)

• Assume that normal data follow some statistical model

• Proximity-based

• An object is an outlier if the nearest neighbors of the object are far away

• Density-based

• Outliers are objects in regions of low density

• Clustering-based

Normal data belong to large and dense clusters

Statistical Anomaly detection

Statistical anomaly detection

Anomalies are objects that are fit poorly by a statistical model. • Idea: learn a model fitting the given data set, and then identify the

objects in low probability regions of the model as anomalies

• Assumption:normaldataisgeneratedbyaparametricdistribution

with parameter θ

– Theprobabilitydensityfunctionoftheparametricdistributionf(x,θ)

gives the probability that object x is generated by the distribution – Thesmallerthisvalue,themorelikelyxisanoutlier

Challenges of Statistical testing:

– highly depends on whether the assumption of statistical model holds in the real data

Visualizing the data

Graphical Approaches

• Boxplot (1-D), Scatter plot (2-D), Spin plot (3-D)

• Limitations: Time consuming, Subjective

Image: https://en.wikipedia.org/wiki/Box_plot#/media/File:Boxplot_with_outlier.png

Univariate data — General Approach

Avg. temp.: x={24.0, 28.9, 28.9, 29.0, 29.1, 29.1, 29.2, 29.2, 29.3, 29.4} • Use the maximum likelihood method to estimate μ and σ

• Fortheabovedataxwithn=10:

• Decide on a confidence limits, e.g.,

• Then 24 is an outlier since:

(24 – 28.61) /1.51 = – 3.04 < –3
Image: https://en.wikipedia.org/wiki/Standard_deviation#/media/File:Standard_deviation_diagram.svg
Multivariate Data
• Multivariate Gaussian distribution
– OutlierdefinedbyMahalanobisdistance – Grubb’stestonthedistances
Mahalanobis
Mahalanobis Distance
Mahalanobis Distance
• Mahalanobis Distance
• S is the covariance matrix: • For 2-dimensional data:
Likelihood approach
• Assume the dataset D contains samples from a mixture of two probability distributions:
• M (majority distribution)
• A (anomalous distribution)
• Generalapproach:
• Initially, assume all the data points belong to M
• Let Lt(D) be the log likelihood of D at time t
• For each point xt that belongs to M, move it to A
Let Lt+1 (D) be the new log likelihood.
Compute the difference, Δ = Lt(D) – Lt+1 (D)
If Δ > c (some threshold), then xt is declared as an anomaly and moved permanently from M to A

Likelihood approach

Data distribution, D = (1 –λ) M +λA

• M is a probability distribution estimated from data • A is initially assumed to be uniform distribution

• Likelihood at time t:

L(D) P(x)(1λ)t

|A| P(x)λt

i1xM xA

LL(D)M log(1λ)logP (x)A logλlogP (x)

xM xA it it

ttMitAi tt

Statistical Anomaly detection

• Statistical tests are well-understood and well-validated.

• Quantitative measure of degree to which object is an outlier.

• Data may be hard to model parametrically.

• multiple modes

• variable density

• In high dimensions, data may be insufficient to estimate true distribution.

Proximity-based Anomaly detection

Proximity-based Anomaly detection

Anomalies are objects far away from other objects.

• An object is an anomaly if the nearest neighbors of the object are far away, i.e., the proximity of the object significantly deviates from the proximity of most of the other objects in the same data set

• Common approach:

– Outlier score is distance to kth nearest neighbor. – Scoresensitivetochoiceofk.

Proximity-based anomaly detection

Proximity-based anomaly detection

Proximity-based outlier detection

– Easiertodefineaproximitymeasureforadatasetthandetermineits statistical distribution.

– Quantitativemeasureofdegreetowhichobjectisanoutlier.

– Dealsnaturallywithmultiplemodes.

– O(n2)complexity.

– Scoresensitivetochoiceofk.

– Doesnotworkwellifdatahaswidelyvariabledensity.

Density-based Anomaly detection

Density-based outlier detection

Outliers are objects in regions of low density. • Outlierscoreistheinverseofthedensity

around a point

• Scoresusuallybasedonproximities. • Examplescores:

• #pointswithinafixedradiusd

• Reciprocalofaveragedistancetok

nearest neighbors:

density(x,k)1 distance(x,y)1 k yN(x,k)

Tend to work poorly if data has variable density.

Image: https://en.wikipedia.org/wiki/Local_outlier_factor#/media/File:Reachability-distance.svg

Density-based outlier detection

Relative density outlier score

Local Outlier Factor (LOF)

Reciprocal of average distance to k nearest neighbors, relative to that of those k neighbors.

relative density(x, k ) density(x, k )

1 density(y,k) k yN(x,k)

Image: https://en.wikipedia.org/wiki/File:LOF-idea.svg

Density-based outlier detection

In the NN approach, o2 is not

considered as outlier, while LOF approach find both o1

and o2 as outliers!

Density-based outlier detection

• Quantitative measure of degree to which object is an outlier.

• O(n2) complexity

• Must choose parameters

• k for nearest neighbor

• d for distance threshold

Can work well even if data has variable density.

Cluster-based Anomaly Detection

Cluster-based outlier detection

Outliers are objects that do not belong strongly to any cluster.

Approaches:

– Assess degree to which object belongs to any cluster.

– Eliminate object(s) to improve objective function.

– Discard small clusters far from other clusters

– Outliers may affect initial formation of clusters.

Cluster-based outlier detection

Assess degree to which object belongs to any cluster.

For prototype-based clustering (e.g. k-means), use distance to cluster centers.

To deal with variable density clusters, use relative distance:

distance(x,centroidC ) medianx’C distance(x’,centroidC )

Similar concepts for density-based or connectivity-based clusters.

Cluster-based outlier detection

distance of points from nearest centroid

Cluster-based outlier detection

relative distance of points from nearest centroid

Cluster-based outlier detection

Eliminate object(s) to improve objective function.

1) Form initial set of clusters.

2) Remove the object which most improves objective function.

3) Repeat step 2) until …

Discard small clusters far from other clusters.

• Need to define thresholds for “small” and “far”.

Cluster-based outlier detection

• Some clustering techniques have O(n) complexity.

• Extends concept of outlier from single objects to groups of objects.

• Requires thresholds for minimum size and distance.

• Sensitive to number of clusters chosen.

• Hard to associate outlier score with objects.

• Outliers may affect initial formation of clusters.

• Types of outliers

• Supervised, semi-supervised, or unsupervised

• Statistical, proximity-based, clustering-based approaches

• Ethics in Machine Learing

• Friday: Defining and measuring (un)fairness in ML

Next Wednesday: Guest lecture (live)

References

• Tan et al (2006) Introduction to Data Mining. Section 4.3, pp 150-171. (Chapter 10)

• V. Chandola, A. Banerjee, and V. Kumar, (2009). Anomaly detection: A survey. ACM computing surveys (CSUR), 41(3), 1-58.

• A. Banerjee, et al (2008). Tutorial session on anomaly detection. The SIAM Data Mining Conference (SDM08)

程序代写 CS代考 加微信: cscodehelp QQ: 2235208643 Email: kyit630461@163.com