# 程序代写代做代考 Excel algorithm finance Ensemble

Ensemble

Lecture 11: Online Learning

Prof. Michael R. Lyu

Computer Science & Engineering Dept.

The Chinese University of Hong Kong

1

CMSC5741 Big Data Tech. & Apps.

1

A Motivating Example– Spam Filtering

2

Incoming Emails

Spams

Not Spams

Spam Filter

Incoming Emails

Spams

Not Spams

Spam Filter

Traditional Method: Training

3

Feature extraction: X

A batch of Emails

Linear classifier- WTX

Model selection

Incoming Emails

Spams

Not Spams

Spam Filter

Traditional Method: Test

4

Feature extraction – X

A batch of Emails

Linear classifier- WTX

Prediction:

Non-spam – Y’=1

Prediction: spam – Y’=-1

Determination

A new mail

Incoming Emails

Spams

Not Spams

Spam Filter

Online Protocol

5

Feature extraction – X

An online mail

Linear classifier- WTX

Prediction: Non-spam – Y’=1

Prediction: spam – Y’=-1

User judge

Update model

Outline

Introduction

Learning paradigms

Online learning and its applications

Online learning algorithms

Perceptron

Online non-sparse learning

Online sparse learning

Online unsupervised learning

Conclusion

6

Outline

Introduction

Learning paradigms

Online learning and its applications

Online learning algorithms

Perceptron

Online non-sparse learning

Online sparse learning

Online unsupervised learning

Conclusion

7

Outline

Introduction

Learning paradigms

Online learning and its applications

Online learning algorithms

Perceptron

Online non-sparse learning

Online sparse learning

Online unsupervised learning

Conclusion

8

Learning Paradigms Overview

Learning paradigms

Supervised learning

Semisupervised learning

Transductive learning

Unsupervised learning

Universum learning

Transfer learning

9

9

Learning Paradigms Overview

Learning paradigms

Supervised learning

Semisupervised learning

Transductive learning

Unsupervised learning

Universum learning

Transfer learning

10

10

Train on labeled data

Test on test data

Supervised Learning

11

X

–

X

–

?

?

Horse and donkey

11

Train on labeled and unlabeled data

Test on test data

Semisupervised Learning

12

X

–

X

–

o

o

o

o

o

o

?

?

Horse and donkey

12

Train on labeled and test data

Test on test data

Transductive Learning

13

?

?

X

–

X

–

o

o

o

o

o

o

Horse and donkey

13

Unsupervised Learning

Train on unlabeled data

Test on test data (Test reconstruction error)

14

jump

height

?

?

Train on labeled and universum data

Test on test data

Universum Learning

15

?

?

X

–

X

–

Train on horse and donkey, universum is mule

15

Transfer Learning

Train on labeled from source and target domains

Test on test data

16

?

?

Source: goat and sheep

Target: horse and donkey

16

Introduction

Learning paradigms

Online learning and its applications

Online learning algorithms

Perceptron

Online non-sparse learning

Online sparse learning

Online unsupervised learning

Conclusion

17

What is Online Learning?

Batch/Offline learning

Observe a batch of training data

Learn a model from them

Predict new samples accurately

Online learning

Observe a sequence of data

Learn a model incrementally as instances come

Make the sequence of online predictions accurately

18

Update a model

True response

user

Make prediction

Online learning is the process of answering a sequence of questions given (maybe partial) knowledge of the correct answers to previous questions and possibly additional available information. [Shal11]

Difference between online and incremental learning

Incremental learning includes informationally incremental learning and operationally incremental learning

Informationally incremental learning: work incrementally, no permission to look back at the whole history of information presented.

Operationally incremental learning: permission to look back, not allow to use information of the past in some effective way.

Incremental learning: incremental, not looking back

Online learning: the key defining characteristic is that soon after the prediction is made, the true label of the instance is discovered. This information can then be used to refine the prediction hypothesis used by the algorithm. The goal of the algorithm is to make predictions that are close to the true labels.

Stochastic gradient descent: in machine learning, the problem is usually to minimize an objective function.

18

Online Prediction Algorithm

An initial prediction rule

For t = 1, 2, …

We observe and make a prediction

We observe the true outcome yt and then compute a loss

The online algorithm updates the prediction rule using the new example and construct

19

Update ft

yt

user

19

Online Prediction Algorithm

The total error of the method is

Goal: this error to be as small as possible

Predict unknown future one step a time: similar to generalization error

20

yt

user

Regret Analysis

: optimal prediction function from a class H, e.g., the class of linear classifiers

with minimum error after seeing all examples

Regret for the online learning algorithm

21

We want regret as small as possible

Why Low Regret?

Regret for the online learning algorithm

Advantages

We do not lose much from not knowing future events

We can perform almost as well as someone who observes the entire sequence and picks the best prediction strategy in hindsight

We can also compete with changing environment

22

Advantages of Online Learning

Meet many applications for data arriving sequentially while predictions are required on-the-fly

Avoid re-training when adding new data

Applicable in adversarial and competitive environment

Strong adaptability to changing environment

High efficiency and excellent scalability

Simple to understand and easy to implement

Easy to be parallelized

Theoretical guarantees

23

In many cases, data arrives sequentially while predictions are required on-the-fly

Applicable also in adversarial and competitive environments (e.g. spam filtering, stock market)

Can adapt to changing environment

Efficient to run (scalability) and easy to code

Theoretical guarantees

23

Where to Apply Online Learning?

24

Online Learning

Social Media

Finance

Internet Security

Robot Motion Planning

Online Learning for Social Media

Recommendation, sentiment/emotion analysis

25

Where to Apply Online Learning?

26

Online Learning

Social Media

Finance

Internet Security

Robot Motion Planning

Online Learning for Robot Motion Planning

Tasks

Exploring an unknown terrain

Finding a destination

27

Rock-Paper-Scissors: You vs. the Computer

Robot Dog

27

Where to Apply Online Learning?

28

Online Learning

Social Media

Finance

Internet Security

Robot Motion Planning

Online Learning for Internet Security

Electronic business sectors

Spam email filtering

Fraud credit card transaction detection

Network intrusion detection system, etc.

29

29

Where to Apply Online Learning?

30

Online Learning

Social Media

Finance

Internet Security

Robot Motion Planning

Online Learning for Financial Decision

Financial decision

Online portfolio selection

Sequential investment, etc.

31

31

Introduction

Learning paradigms

Online learning and its applications

Online learning algorithms

Perceptron

Online non-sparse learning

Online sparse learning

Online unsupervised learning

Conclusion

32

Introduction

Learning paradigms

Online learning and its applications

Online learning algorithms

Perceptron

Online non-sparse learning

Online sparse learning

Online unsupervised learning

Conclusion

33

Perceptron Algorithm (F. Rosenblatt, 1958)

One of the oldest machine learning algorithm

Online algorithm for learning a linear threshold function with small error

34

n

Perceptron Algorithm (F. Rosenblatt, 1958)

Goal: find a linear classifier with small error

35

If no error, keeping the same; otherwise, update.

Intuition Explanation

Want positive margin:

Effect of Perceptron update on margin:

So margin increases

36

37

Geometric View

w0

Initialization

37

38

Geometric View

w0

t=1

(x1, y1(=1))

Misclassification!

38

39

Geometric View

w0

Update

(x1, y1(=1))

w1=w0+x1y1

39

40

Geometric View

t=2

(x1, y1(=1))

w1

(x2, y2(=-1))

Misclassification!

40

41

Geometric View

(x1, y1)

w1

(x2, y2(=-1))

w2=w1+x2y2

Update

Continue…

41

In-class Practice

Go to practice

42

Perceptron Mistake Bound

Consider w* separate the data:

Define margin

The number of mistakes perceptron makes is at most

43

Norm of x: the larger, the larger mistake bound

The larger, the more confidence

43

Proof of Perceptron Mistake Bound [Novikoff, 1963]

Proof: Let be the hypothesis before the k-th mistake. Assume that the k-th mistake occurs on the input example .

44

First,

Second,

Hence,

In the left (first),

1. y_i*(v_k^Tx_i)<0;
2. |x|_i^2<=R^2
3. v_k+1 has made k-th mistakes
In the second,
y_ix_i^Tu = |x_i^Tu|>=gamma R

44

Introduction

Learning paradigms

Online learning and its applications

Online learning algorithms

Perceptron

Online non-sparse learning

Online sparse learning

Online unsupervised learning

Conclusion

45

Overview

46

Online Learning

Non-sparse learning

Sparse learning

Unsupervised learning

Online/Stochastic Gradient Descent

Online gradient descent

Stochastic gradient descent

47

Update ft

yt

user

Update ft

yt

user

In stochastic (or “on-line”) gradient descent, the true gradient of is approximated by a gradient at a single example: w(j) w(j) – (j,i)

As the algorithm sweeps through the training set, it performs the above update for each training example. Several passes can be made over the training set until the algorithm converges. If this is done, the data can be shuffled for each pass to prevent cycles.

47

Introduction

Learning paradigms

Online learning and its applications

Online learning algorithms

Perceptron

Online non-sparse learning

Online sparse learning

Online unsupervised learning

Conclusion

48

Online Non-Sparse Learning

Decision function can be linear and non-linear as

First order learning methods

Online gradient descent (Zinkevich, 2003)

Passive aggressive learning (Crammer et al., 2006)

Others (including but not limited)

ALMA: A New Approximate Maximal Margin Classification Algorithm (Gentile, 2001)

ROMMA: Relaxed Online Maximum Margin Algorithm (Li and Long, 2002)

MIRA: Margin Infused Relaxed Algorithm (Crammer and Singer, 2003)

DUOL: A Double Updating Approach for Online Learning (Zhao et al., 2009)

49

Online margin-based prediction algorithms

49

Online Gradient Descent (OGD)

(Zinkevich, 2003)

Online convex optimization

Consider a convex objective function

where is a bounded convex set

Update by Online Gradient Descent (OGD) or Stochastic Gradient Descent (SGD)

where is a learning rate

50

gradient descent

projection

Provide a framework to prove regret bound for online convex optimization

Online learning does not necessarily need gradient descent.

But in classification/regression, they have a minimization objective function. One typical solution is gradient descent.

abla f (w_t) can be derivative of w_t.

Stochastic gradient descent is equivalent to online gradient descent in this case.

The contribution of this work triggers the investigation of online convex optimization, where the objective is a convex function and is performend in the online mode.

50

Online Gradient Descent (OGD) (Zinkevich, 2003)

For t = 1, 2, …

An unlabeled sample arrives

Make a prediction based on existing weights

Observe the true class label

Update the weights by

where is a learning rate

51

Update wt+1

yt

user

regret bound is established.

This is the first generalized online convex optimization. Regret bound on the gradient descent algorithm is established.

O(sqrt{T})

51

Passive-Aggressive Online Learning (Crammer et al., 2006)

Each example defines a set of consistent hypotheses:

The new vector is set to be the projection of onto

52

Classification

Regression

Uniclass

assume that there exist a weight vector w* and an insensitivity parameter epsilon* for which the data is perfectly realizable. Denote the set by C.

For classification, C is a half space C={w: -y_t w*x_tleepsilon}

For regression, C is an epsilon-hyper-slab, C={w: |w*x_t-y_t|leepsilon}

For uniclass, it is a ball of radius epsilon centered at x_t, C={w: |w-x_t|leepsilon}

The optimization problem attempts to keep w_{t+1} as close to w_t as possible, while forcing w_{t+1} to achieve a zero loss on the most recent example.

The resulting algorithm is passive whenever the loss is zero.

52

Passive-Aggressive

53

achieve a zero loss on the most recent example

53

Passive Aggressive Online Learning

(Crammer et al., 2006)

PA (Binary classification)

PA-I (C-SVM)

PA-II (Relaxed C-SVM)

Closed-form solution

54

PA: hinge loss for binary classification

PA-I: hinge loss with a slack variable (hinge loss)

PA-II: slack variable is not necessary non-negative (square loss)

54

Passive Aggressive Online Learning

(Crammer et al., 2006)

Algorithm

Objective

Closed-form solutions

55

Perceptron: ao_t=1

Margin based online learning algorithm, binary classification, C-SVM, regression

the core of our construction can be viewed as finding a support vector machine based on a single example while replacing the norm constraint of SVM with a proximity constraint to the current classifier. The benefit of this approach is two fold. First, we get a closed form solution for the next classifier. Second, we are able to provide a unified analysis of the cumulative loss for various online algorithms used to solve different decision problems. Specifically, we derive and analyze versions for regression problems, uniclass prediction, multiclass problems, and sequence

prediction tasks.

The resulting algorithm is passive whenever the hinge is zero. If the loss is positive, the algorithm aggressively forces w_{t+1} to satisfy the constraint regardless of the step-size required.

55

Online Non-Sparse Learning

First order methods

Learn a linear weight vector (first order) of model

Pros and Cons

Simple and easy to implement

Efficient and scalable for high-dimensional data

Relatively slow convergence rate

56

Online Non-Sparse Learning

Second order online learning methods

Update the weight vector w by maintaining and exploring both first-order and second-order information

Some representative methods, but not limited

SOP: Second Order Perceptron (Cesa-Bianchi et al., 2005)

CW: Confidence Weighted learning (Dredze et al., 2008)

AROW: Adaptive Regularization of Weights (Crammer et al., 2009)

IELLIP: Online Learning by Ellipsoid Method (Yang et al., 2009)

NHERD: Gaussian Herding (Crammer & Lee 2010)

NAROW: New variant of AROW algorithm (Orabona & Crammer 2010)

SCW: Soft Confidence Weighted (SCW) (Hoi et al., 2012)

Pros and Cons

Faster convergence rate

Expensive for high-dimensional data

Relatively sensitive to noise

57

Introduction

Learning paradigms

Online learning and its applications

Online learning algorithms

Perceptron

Online non-sparse learning

Online sparse learning

Online unsupervised learning

Conclusion

58

Sparse Learning

59

Natural Images

Learned bases (f1 , …, f64): “Edges”

» 0.8 * + 0.3 * + 0.5 *

Test example

x » 0.8 * f36 + 0.3 * f42 + 0.5 * f63

59

Online Sparse Learning

Motivation

Space constraint: RAM overflow

Test-time constraint

How to induce Sparsity in the weights of online learning algorithms?

60

60

Online Sparse Learning

Objective function

Problem in online learning

Standard stochastic gradient descent

It does not yield sparse solution

Some representative work

Truncated gradient (Langford et al., 2009)

FOBOS: Forward Looking Subgradients (Duchi and Singer, 2009)

Dual averaging (Xiao, 2009)

etc.

61

w0

Objective function

Stochastic gradient descent

Simple coefficient rounding

when the coefficient is small

Truncated Gradient (Langford et al., 2009)

62

Truncated gradient: impose sparsity by modifying the stochastic gradient descent

w0

SGD does not attain sparse solution. Truncated gradient: if the f(wi) is less than theta, then truncate the value of wi to 0.

62

Simple Coefficient Rounding vs. Less aggressive truncation

Truncated Gradient (Langford et al., 2009)

63

If i/K is an integer, update by rounding. How to select K is a main drawback of the method.

If K is too large, maintain too many non-zero elements.

If K=1, the step size is small, rounding pulls to zero.

Directly rounding to zero is too aggressive.

A less aggressive version is to shrink the coefficient to zero by a smaller amount.

63

K, …

Truncated Gradient (Langford et al., 2009)

The amount of shrinkage is measured by a gravity parameter

When , the update rule is identical to the standard SGD

The truncation can be performed every K online steps

Loss functions:

Logistic

SVM (hinge)

Least square

64

The larger the parameters g and heta are, the more sparsity is incurred.

64

Theoretical result (T: No. of samples)

Let , the regret is

Truncated Gradient (Langford et al., 2009)

65

regret bound is established.

Truncated gradient is consistently competitive with the other two online algorithms and significantly outperformed them in some problems. This suggests the effectiveness of truncated gradient.

it is interesting to observe that the qualitative behavior of truncated gradient is often similar to LASSO, especially when very sparse weight vectors are allowed.

LASSO usually has worse performance when the allowed number of nonzero weights is set too large.

it is worth emphasizing that the experiments in this subsection try to shed some light on the relative strengths of these algorithms in terms of feature sparsification. For large data sets such as Big_Ads only truncated gradient, coefficient rounding, and the sub-gradient algorithms are

applicable.

65

Dual Averaging (Xiao, 2010)

Objective function

Problem: truncated gradient doesn’t produce truly sparse weight due to small learning rate

Fix: dual averaging which keeps two state representations:

parameter

average gradient vector

66

Dual Averaging (Xiao, 2010)

has entry-wise closed-form solution

Advantage: sparse on the weight

Disadvantage: keep a non-sparse subgradient

67

Algorithm

Average regret

Theoretical bound: similar to gradient descent

Convergence and Regret

68

average regret bound is established.

Variants of Online Sparse Learning Models

Online feature selection (OFS)

A variant of sparse online learning

The key difference is that OFS focuses on selecting a fixed subset of features in online learning process

Could be used as an alternative tool for batch feature selection when dealing with big data

Other existing work

Online learning for Group Lasso (Yang et al., 2010) and online learning for multi-task feature selection (Yang et al. 2013) to select features in group manner or features among similar tasks

69

Online Sparse Learning

Objective

Induce sparsity in the weights of online learning algorithms

Pros and Cons

Simple and easy to implement

Efficient and scalable for high-dimensional data

Relatively slow convergence rate

No perfect way to attain sparsity solution yet

70

Introduction

Learning paradigms

Online learning and its applications

Online learning algorithms

Perceptron

Online non-sparse learning

Online sparse learning

Online unsupervised learning

Conclusion

71

Online Unsupervised Learning

Assumption: data generated from some underlying parametric probabilistic density function

Goal: estimate the parameters of the density to give a suitable compact representation

72

72

Online Unsupervised Learning

Some representative work

Online singular value decomposition (SVD) (Brand, 2003)

Online principal component analysis (PCA) (Warmuth and Kuzmin, 2006)

Online dictionary learning for sparse coding (Mairal et al. 2009)

Online learning for latent Dirichlet allocation (LDA) (Hoffman et al., 2010)

Online variational inference for the hierarchical Dirichlet process (HDP) (Wang et al. 2011)

Online Learning for Collaborative Filtering (Ling et al. 2012)

…

73

Adiabatic (隔热)

73

: input data matrix

matrix (e.g. documents, terms)

: left singular vectors

matrix (documents, topics)

: singular values

diagonal matrix (strength of each “topic”)

rank of matrix

Nonnegative and sorted

: right singular vectors

matrix ( terms, topics)

SVD: Definition

74

: scalar

: vector

: vector

A

𝑉𝑇

Online SVD (Brand, 2003)

Challenges: storage and computation

Idea: an incremental algorithm computes the principal eigenvectors of a matrix without storing the entire matrix in memory

75

Online SVD (Brand, 2003)

76

m

Online SVD (Brand, 2003)

Complexity

Store

The online SVD has more error, but it is comparable to PCA (SVD)

77

77

Online SVD

Unsupervised learning: minimizing the reconstruction errors

Each update will increase the rank by at most one, until a user-specified ceiling is reached

Pros and Cons

Simple to implement

Fast computation

Comparable performance

Lack of theoretical guarantee

78

Introduction

Learning paradigms

Online learning and its applications

Online learning algorithms

Perceptron

Online non-sparse learning

Online sparse learning

Online unsupervised learning

Conclusion

79

One-slide Takeaway

Basic concepts

What is online learning?

What is regret analysis?

Online learning algorithms

Perceptron

Online gradient descent

Passive aggressive

Truncated gradient

Dual averaging

Online SVD

80

Resources

Book and Video:

Prediction Learning and Games. N. Cesa-Bianchi and G. Lugosi. Cambridge university press, 2006.

[Shal11] Online Learning and Online Convex Optimization. Shai Shalev-Shwartz. Foundations and Trends in Machine Learning, Vol. 4, No. 2, 2011, 107-194, DOI: 10.1561/2200000018

http://videolectures.net/site/search/?q=online+learning

Software:

Pegasos: http://www.cs.huji.ac.il/~shais/code/index.html

VW: hunch.net/~vw/

SGD by Leon Bottou: http://leon.bottou.org/projects/sgd

81

81

References

Cesa-Bianchi, Nicol`o, Conconi, Alex, and Gentile, Claudio. A second-order perceptron algorithm. SIAM J. Comput., 34(3):640–668, 2005.

M. Brand. Fast online svd revisions for lightweight recommender systems. In SDM, 2003.

N. Cesa-Bianchi, A. Conconi, and C. Gentile. A second-order perceptron algorithm. SIAM J. Comput., 34(3):640–668, 2005.

K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. Journal of Machine Learning Research, 7:551–585, 2006.

K. Crammer, A. Kulesza, and M. Dredze. Adaptive regularization of weight vectors. In NIPS, pages 414–422, 2009.

K. Crammer and D. D. Lee. Learning via gaussian herding. In NIPS, pages 451–459, 2010.

K. Crammer and Y. Singer. Ultraconservative online algorithms for multiclass problems. Journal of Machine Learning Research, 3:951–991, 2003.

M. Dredze, K. Crammer, and F. Pereira. Confidence-weighted linear classification. In ICML, pages 264–271, 2008.

C. Gentile. A new approximate maximal margin classification algorithm. Journal of Machine Learning Research, 2:213–242, 2001.

M. D. Hoffman, D. M. Blei, and F. R. Bach. Online learning for latent dirichlet allocation. In NIPS, pages 856–864, 2010.

S. C. H. Hoi, J. Wang, and P. Zhao. Exact soft confidence-weighted learning. In ICML, 2012.

82

82

References

J. Langford, L. Li, and T. Zhang. Sparse online learning via truncated gradient. Journal of Machine Learning Research, 10:777–801, 2009.

Y. Li and P. M. Long. The relaxed online maximum margin algorithm. Machine Learning, 46(1-3):361–387, 2002.

Guang Ling, Haiqin Yang, Irwin King and M.R. Lyu. Online Learning for Collaborative Filtering, IJCNN, Brisbane, Australia, 2012.

P. L. Lions and B. Mercier. Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal., 16(6):964–979, 1979.

J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online dictionary learning for sparse coding. In ICML, page 87, 2009.

Y. Nesterov. Gradient methods for minimizing composite objective function. CORE Discussion Paper 2007/76, Catholic University of Louvain, Center for Operations Research and Econometrics, 2007.

F. Orabona and K. Crammer. New adaptive algorithms for online classification. In NIPS, pages 1840–1848, 2010.

F. Rosenblatt. The Perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65:386–408, 1958.

83

References

C. Wang, J. W. Paisley, and D. M. Blei. Online variational inference for the hierarchical dirichlet process. Journal of Machine Learning Research – Proceedings Track, 15:752–760, 2011.

M. K. Warmuth and D. Kuzmin. Randomized pca algorithms with regret bounds that are logarithmic in the dimension. In NIPS, pages 1481–1488, 2006.

S. J. Wright, R. D. Nowak, and M. A. T. Figueiredo. Sparse reconstruction by separable approximation. IEEE Transactions on Signal Processing, 57(7):2479–2493, 2009.

L. Xiao. Dual averaging methods for regularized stochastic learning and online optimization. Journal of Machine Learning Research, 11:2543–2596, 2010.

H. Yang, M. R. Lyu, and I. King. Efficient online learning for multi-task feature selection. ACM Transactions on Knowledge Discovery from Data, 2013.

H. Yang, Z. Xu, I. King, and M. R. Lyu. Online learning for group lasso. In ICML, pages 1191–1198, Haifa, Israel, 2010.

L. Yang, R. Jin, and J. Ye. Online learning by ellipsoid method. In ICML, page 145, 2009.

P. Zhao, S. C. H. Hoi, and R. Jin. Duol: A double updating approach for online learning. In NIPS, pages 2259–2267, 2009.

M. Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In ICML, pages 928–936, 2003.

84

In-class Practice

We have two data and , how to get a classifier by Perceptron learning rule?

Assume

is in class (the first data)

is in class

Data points are linearly separable and can be applied repeatedly (for validation).

85

(

)

{

}

N

i

i

i

y

1

,

=

x

(

)

(

)

t

t

y

y

,

,

,

,

1

1

x

x

K

)

(

0

×

f

)

(

1

t

t

f

x

–

)

),

(

(

1

t

t

t

y

f

l

x

–

)

(

x

t

f

t

x

t

x

x

)

(

1

t

t

f

x

–

å

=

–

T

t

t

t

t

y

f

l

1

1

)

),

(

(

x

(

)

å

=

Î

=

×

T

t

t

t

H

f

y

f

l

f

1

*

)

),

(

(

min

arg

x

)

(

*

×

f

[

]

å

=

–

–

=

T

t

t

t

t

t

t

y

f

l

y

f

l

T

1

*

1

)

),

(

(

)

),

(

(

1

regret

x

x

[

]

å

=

–

–

=

T

t

t

t

t

t

t

y

f

l

y

f

l

T

1

*

1

)

),

(

(

)

),

(

(

1

regret

x

x

0

Lavf56.36.100

0

*

>

i

i

T

y

x

w

2

2

*

*

sup

min

i

i

i

T

i

x

w

x

w

=

g

2

–

g

k

v

(

)

i

i

y

,

x

2

2

*

*

sup

min

i

i

i

T

i

x

w

x

w

=

g

)

sgn(

ˆ

t

T

t

t

y

x

w

=

{

}

1

,

1

+

–

Î

t

y

t

y

ˆ

50100150200250300350400450500

50

100

150

200

250

300

350

400

450

500

50100150200250300350400450500

50

100

150

200

250

300

350

400

450

500

50100150200250300350400450500

50

100

150

200

250

300

350

400

450

500

t

w

(

)

å

=

=

t

i

i

i

t

w

f

t

g

1

1

thr?

<
p
/docProps/thumbnail.jpeg