CSC 311: Introduction to Machine Learning

Lecture 11 – Convolutional Neural Networks

Anthony Bonner &

Slides by , Amir-massoud Farahmand, and

Intro ML (UofT) CSC311-Lec11 1 / 1

Neural Nets for Visual Object Recognition

People are very good at recognizing shapes

- Intrinsically difficult; computers are bad at it

Why is it difficult?

CSC411 Lec11 2 / 43

Why is it a Problem?

Difficult scene conditions

[From: Grauman & Leibe]

Why is it a Problem?

Huge within-class variations. Recognition is mainly about modeling variation.

[Pic from: S. Lazebnik]

Why is it a Problem?

Tons of classes

[Biederman]

Neural Nets for Object Recognition

People are very good at recognizing objects

- Intrinsically difficult; computers are bad at it

Some reasons why it is difficult:

- Segmentation: Real scenes are cluttered
- Invariances: We are very good at ignoring all sorts of variations that do not affect class
- Deformations: Natural object classes allow variations (faces, letters, ...)
- A huge amount of computation is required

How to Deal with Large Input Spaces

How can we apply neural nets to images?

Images can have millions of pixels, i.e., x is very high dimensional

How many parameters do I have?

Prohibitive to have fully-connected layers

What can we do?

We can use a locally connected layer

Locally Connected Layer

Example: 200×200 image, 40K hidden units, filter size 10×10: 4M parameters

Note: This parameterization is good when the input image is registered (e.g., face recognition). [Ranzato]
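
As a rough check on these numbers, the sketch below counts weights for a fully connected layer and for a locally connected layer on the same 200×200 input with 40K hidden units. It assumes a single input channel and ignores biases; the variable names are illustrative and not from the slides.

```python
# Weight counts for one hidden layer on a 200x200 single-channel image.
# Assumptions: one input channel, no bias terms.
image_h, image_w = 200, 200
num_hidden = 40_000          # 40K hidden units
filter_h, filter_w = 10, 10  # each hidden unit sees a 10x10 patch

num_pixels = image_h * image_w

# Fully connected: every hidden unit is wired to every pixel.
fully_connected_params = num_pixels * num_hidden

# Locally connected: every hidden unit has its own private 10x10 filter.
locally_connected_params = num_hidden * filter_h * filter_w

print(f"fully connected:   {fully_connected_params:,}")
print(f"locally connected: {locally_connected_params:,}")
```

Under these assumptions the fully connected layer needs 1.6 billion weights, while local connectivity brings that down to the 4M quoted on the slide.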

When Will this Work?

This is good when the input is (roughly) registered

General Images

The object can be anywhere

[Slide: Y. Zhu]

The Invariance Problem

Our perceptual systems are very good at dealing with invariances:
- translation, rotation, scaling
- deformation, contrast, lighting

We are so good at this that it's hard to appreciate how difficult it is.
- It's one of the main difficulties in making computers perceive
- We still don't have generally accepted solutions

Locally Connected Layer

Stationarity? Statistics are similar at different locations.

The replicated feature approach

Adopt the approach apparently used in monkey visual systems.

Use many different copies of the same feature detector.
- Copies have slightly different positions.
- Could also replicate across scale and orientation.
  - Tricky and expensive
- Replication reduces the number of free parameters to be learned.

Use several different feature types, each with its own replicated pool of detectors.
- Allows each patch of image to be represented in several ways.

[Figure: The red connections all have the same weight.]

Convolutional Neural Net

Idea: statistics are similar at different locations (LeCun, 1998)

Connect each hidden unit to a small input patch and share the weights across space

This is called a convolution layer and the network is a convolutional network

Convolutional Layer

$h^{n} = \max(0,\ h^{n-1} * w^{n})$
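
The layer equation can be sketched in a few lines of NumPy. This is a minimal 1-D, single-channel illustration only: the function name `conv_relu_layer`, the example values, and the choice of `mode="valid"` (no padding) are assumptions, not part of the slides.

```python
import numpy as np

def conv_relu_layer(h_prev, w):
    """One 1-D convolutional layer: h^n = max(0, h^{n-1} * w^n)."""
    # np.convolve performs a true convolution (the kernel is flipped).
    pre_activation = np.convolve(h_prev, w, mode="valid")
    # Elementwise max(0, .), i.e. the ReLU nonlinearity.
    return np.maximum(0.0, pre_activation)

h_prev = np.array([1.0, 2.0, 0.0, -1.0, 3.0])  # activations of layer n-1
w = np.array([1.0, -1.0])                      # one shared filter
h_next = conv_relu_layer(h_prev, w)
print(h_next)
```

The same two weights are reused at every position, so the layer has 2 parameters regardless of the input length.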

Zemel, Urtasun, Fidler (UofT) CSC 411: 11-Neural Networks II 17 / 55

Convolution

Convolution layers are named after the convolution operation. If a and b are two arrays,

$(a * b)_t = \sum_{\tau} a_{\tau}\, b_{t-\tau}$
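
The definition translates directly into a double loop. The sketch below assumes both arrays are zero outside their stored range (a "full" convolution) and compares the result with NumPy's built-in `np.convolve`; the helper name `conv1d` is illustrative.

```python
import numpy as np

def conv1d(a, b):
    """Full 1-D convolution: (a * b)_t = sum_tau a_tau * b_{t - tau}."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    out = np.zeros(len(a) + len(b) - 1)
    for t in range(len(out)):
        for tau in range(len(a)):
            if 0 <= t - tau < len(b):  # entries outside b are treated as zero
                out[t] += a[tau] * b[t - tau]
    return out

a = [2.0, -1.0, 1.0]
b = [1.0, 1.0, 2.0]
result = conv1d(a, b)
print(result)
print(np.convolve(a, b))  # NumPy's implementation, for comparison
```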

Convolution

“Flip and Filter” interpretation:
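
One way to read "flip and filter": reverse the kernel, then slide it across the (zero-padded) signal and take a dot product at each position. The sketch below is an illustration of that reading, with a hypothetical helper name, checked against `np.convolve`.

```python
import numpy as np

def flip_and_filter(a, b):
    """Full 1-D convolution computed by flipping b and sliding it over a."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    flipped = b[::-1]                             # "flip"
    # Zero-pad so the kernel can hang off both ends of the signal.
    padded = np.pad(a, (len(b) - 1, len(b) - 1))
    out = np.empty(len(a) + len(b) - 1)
    for t in range(len(out)):
        window = padded[t : t + len(b)]
        out[t] = np.dot(window, flipped)          # "filter": a plain dot product
    return out

a = [2.0, -1.0, 1.0]
b = [1.0, 1.0, 2.0]
flipped_result = flip_and_filter(a, b)
print(flipped_result)
```

Without the flip, the same sliding dot product is a cross-correlation, which is what many deep learning libraries actually compute under the name "convolution".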

2-D Convolution

2-D convolution is analogous:

$(A * B)_{ij} = \sum_{s} \sum_{t} A_{st}\, B_{i-s,\, j-t}$
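
A small NumPy sketch of the 2-D definition, again assuming both arrays are zero outside their stored range. Each entry A[s, t] adds a copy of B shifted by (s, t), which is the double sum rearranged. The function name and the example arrays are illustrative.

```python
import numpy as np

def conv2d_full(A, B):
    """Full 2-D convolution: (A * B)_{ij} = sum_s sum_t A_{st} B_{i-s, j-t}."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    out = np.zeros((A.shape[0] + B.shape[0] - 1, A.shape[1] + B.shape[1] - 1))
    for s in range(A.shape[0]):
        for t in range(A.shape[1]):
            # A[s, t] contributes a copy of B shifted by (s, t).
            out[s : s + B.shape[0], t : t + B.shape[1]] += A[s, t] * B
    return out

image = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
kernel = np.array([[0.0, 1.0],
                   [1.0, 0.0]])
conv_result = conv2d_full(image, kernel)
print(conv_result)
```

Convolution is commutative, so swapping the two arguments gives the same output.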

2-D Convolution

The thing we convolve by is called a kernel, or filter.

What does this convolution kernel do?

Convolutional Layer

Learn multiple filters.

E.g.: 200×200 image, 100 filters, filter size 10×10: 10K parameters
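
The parameter count follows from weight sharing: each filter is stored once, however many positions it is applied at. The sketch below assumes a single input channel and no biases, and compares against the 40K-unit locally connected layer from the earlier example; variable names are illustrative.

```python
# Weight count for a convolutional layer (one input channel, no biases assumed).
num_filters = 100
filter_h, filter_w = 10, 10

# Each filter is shared across all spatial positions, so it is counted once.
conv_params = num_filters * filter_h * filter_w

# Locally connected layer from the earlier example: 40K units, private 10x10 filters.
num_hidden_local = 40_000
local_params = num_hidden_local * filter_h * filter_w

print(f"convolutional:     {conv_params:,}")
print(f"locally connected: {local_params:,}")
print(f"reduction factor:  {local_params // conv_params}x")
```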

Convolutional Layer

Figure: Left: CNN, right: Each neuron computes a linear function followed by an activation function

Hyperparameters of a convolutional layer:
- The number of filters (controls the depth of the output volume)
- The stride: how many units apart we apply a filter spatially (this controls the spatial size of the output volume)
- The size w × h of the filters

[http://cs231n.github.io/convolutional-networks/]
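
To see how these hyperparameters determine the output volume, here is a small helper. It uses the standard output-size formula (W - F + 2P)/S + 1; the zero-padding parameter P is an assumption added for completeness and is not mentioned on the slide, and the function name is hypothetical.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Spatial output size along one dimension: (W - F + 2P) / S + 1."""
    span = input_size + 2 * padding - filter_size
    if span < 0 or span % stride != 0:
        raise ValueError("filter does not tile the input with this stride/padding")
    return span // stride + 1

# 200x200 input, 100 filters of size 10x10, stride 1, no padding.
num_filters = 100
out_h = conv_output_size(200, 10, stride=1)
out_w = conv_output_size(200, 10, stride=1)
output_volume = (out_h, out_w, num_filters)  # depth equals the number of filters
print(output_volume)

# A larger stride shrinks the spatial size.
print(conv_output_size(200, 10, stride=10))
```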

MLP vs ConvNet

Figure: Top: MLP, bottom: Convolutional neural network

[http://cs231n.github.io/convolutional-networks/]

Pooling Layer

By “pooling” (e.g., taking max) filter responses at different locations we gain robustness to the exact spatial location of features.

Pooling Options

Max Pooling: return the maximal argument

Average Pooling: return the average of the arguments

Other types of pooling exist.

Figure: Left: Pooling, right: max pooling example

Hyperparameters of a pooling layer:
- The spatial extent F
- The stride

[http://cs231n.github.io/convolutional-networks/]
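
A minimal NumPy sketch of pooling with a spatial extent F and a stride, covering both the max and average variants. The helper `pool2d` and the 4×4 input are illustrative assumptions, and the loop-based implementation favors clarity over speed.

```python
import numpy as np

def pool2d(x, extent, stride, reduce_fn=np.max):
    """Apply reduce_fn to each extent x extent window, moving by `stride`."""
    x = np.asarray(x, dtype=float)
    out_h = (x.shape[0] - extent) // stride + 1
    out_w = (x.shape[1] - extent) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride : i * stride + extent,
                       j * stride : j * stride + extent]
            out[i, j] = reduce_fn(window)
    return out

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
max_pooled = pool2d(x, extent=2, stride=2)                     # max pooling
avg_pooled = pool2d(x, extent=2, stride=2, reduce_fn=np.mean)  # average pooling
print(max_pooled)
print(avg_pooled)
```

With F = 2 and stride 2, each output value summarizes a non-overlapping 2×2 block, so small shifts of a feature inside a block leave the max-pooled output unchanged.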

Backpropagation with Weight Constraints

The backprop procedure from last lecture can be applied directly to conv nets.

This is covered in CSC421.

As a user, you don't need to worry about the details, since they're handled by automatic differentiation packages.
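
For intuition only, the effect of the weight-sharing constraint on backprop can be sketched as follows: because the same filter is used at every position, its gradient is the sum of the contributions from all positions. The toy loss, the 1-D setup, and all names below are assumptions made for illustration; the analytic gradient is checked against finite differences.

```python
import numpy as np

def forward(x, w):
    """Slide the shared filter w over x (no flip, valid positions only)."""
    k = len(w)
    return np.array([np.dot(x[i : i + k], w) for i in range(len(x) - k + 1)])

def loss(x, w):
    """Toy loss: half the sum of squared outputs."""
    return 0.5 * np.sum(forward(x, w) ** 2)

def grad_w(x, w):
    """dL/dw: add up the contribution from every position that uses w."""
    k = len(w)
    y = forward(x, w)                 # for this loss, dL/dy_i = y_i
    grad = np.zeros_like(w)
    for i, y_i in enumerate(y):
        grad += y_i * x[i : i + k]    # contribution of the copy at position i
    return grad

x = np.array([1.0, -2.0, 3.0, 0.5, -1.0])
w = np.array([0.3, -0.7, 0.2])

# Central finite differences, one coordinate of w at a time.
eps = 1e-6
numeric = np.array([
    (loss(x, w + eps * e) - loss(x, w - eps * e)) / (2 * eps)
    for e in np.eye(len(w))
])
analytic = grad_w(x, w)
print(analytic)
print(numeric)
```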

Here’s the LeNet architecture, which was applied to handwritten digit recognition on MNIST in 1998:

[Figure: The architecture of LeNet5]

ImageNet, biggest dataset for object classification: http://image-net.org/

1000 classes, 1.2M training images, 150K for test

AlexNet, 2012. 8 weight layers. 16.4% top-5 error (i.e. the network gets 5 tries to guess the right category).
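
To make the top-5 metric concrete, the sketch below computes a top-k error from a matrix of class scores. The function name, the toy scores, and the use of k = 2 on six classes are illustrative assumptions; the example scores contain no ties inside the top k.

```python
import numpy as np

def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is not among the k highest scores."""
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    # Indices of the k highest-scoring classes per example (order within the k is irrelevant).
    top_k = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    hit = (top_k == labels[:, None]).any(axis=1)
    return 1.0 - hit.mean()

# Three examples, six classes (toy numbers).
scores = np.array([
    [0.10, 0.50, 0.20, 0.05, 0.12, 0.03],
    [0.60, 0.10, 0.05, 0.15, 0.07, 0.03],
    [0.02, 0.08, 0.09, 0.30, 0.40, 0.11],
])
labels = np.array([2, 1, 4])

print(top_k_error(scores, labels, k=1))  # ordinary (top-1) error
print(top_k_error(scores, labels, k=2))  # the network gets 2 tries
```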

Figure 2: An illustration of the architecture of our CNN, explicitly showing the delineation of responsibilities between the two GPUs. One GPU runs the layer-parts at the top of the figure while the other runs the layer-parts at the bottom. The GPUs communicate only at certain layers. The network's input is 150,528-dimensional, and the number of neurons in the network's remaining layers is given by 253,440–186,624–64,896–64,896–43,264–4096–4096–1000.

The second convolutional layer takes as input the (response-normalized and pooled) output of the first convolutional layer and filters it with 256 kernels of size 5 × 5 × 48.

(Krizhevsky et al., 2012)

The two processing pathways correspond to 2 GPUs. (At the time, the network couldn't fit on one GPU.)

AlexNet's stunning performance on the ILSVRC is what set off the deep learning boom of the last 6 years.

150 Layers!

Networks are now at 150 layers

They use skip connections of a special form

In fact, they don’t fit on this screen

Amazing performance!

A lot of “mistakes” are due to wrong ground-truth

[He, K., Zhang, X., Ren, S. and Sun, J., 2015. Deep Residual Learning for Image Recognition. arXiv:1512.03385, 2016]
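
A rough sketch of a skip connection, assuming the identity-shortcut form y = relu(x + F(x)) from the cited paper. To keep it short, F here is two small fully connected maps rather than convolutions, and the weights are made up, so treat it as an illustration and not the actual architecture.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def residual_block(x, w1, w2):
    """y = relu(x + F(x)), with F(x) = w2 @ relu(w1 @ x)."""
    residual = w2 @ relu(w1 @ x)
    return relu(x + residual)      # the "+ x" is the skip connection

rng = np.random.default_rng(0)
dim = 4
x = relu(rng.normal(size=dim))     # non-negative input, like the output of a previous ReLU
w1 = rng.normal(size=(dim, dim))
w2 = np.zeros((dim, dim))          # with F(x) = 0 the block reduces to the identity

y = residual_block(x, w1, w2)
print(x)
print(y)
```

Because the block only has to learn a correction on top of the identity, stacking many such blocks does not force the signal through many arbitrary transformations.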

Results: Object Classification

Slide: R. Liao, Paper: [He, K., Zhang, X., Ren, S. and Sun, J., 2015. Deep Residual Learning for Image Recognition. arXiv:1512.03385, 2016]

Results: Object Detection

Slide: R. Liao, Paper: [He, K., Zhang, X., Ren, S. and Sun, J., 2015. Deep Residual Learning for Image Recognition. arXiv:1512.03385, 2016]

What do CNNs Learn?

Figure: Filters in the first convolutional layer of Krizhevsky et al

What do CNNs Learn?

Figure: Filters in the second layer [http://arxiv.org/pdf/1311.2901v3.pdf]

What do CNNs Learn?

Figure: Filters in the third layer [http://arxiv.org/pdf/1311.2901v3.pdf]

Tricking a Neural Net

Read about it here (and try it!): https://codewords.recurse.com/issues/five/why-do-neural-networks-think-a-panda-is-a-vulture

Watch: https://www.youtube.com/watch?v=M2IebCN9Ht4

More on NNs

Figure: Generate images: http://arxiv.org/pdf/1511.06434v1.pdf

Generate text: https://vimeo.com/146492001, https://github.com/karpathy/neuraltalk2, https://github.com/ryankiros/visual-semantic-embedding

Figure: Compose music: https://www.youtube.com/watch?v=0VTI1BBLydE

Great course dedicated to NN: http://cs231n.stanford.edu

Open source frameworks:
- PyTorch http://pytorch.org/
- TensorFlow https://www.tensorflow.org/
- Caffe http://caffe.berkeleyvision.org/

Most cited NN papers:

https://github.com/terryum/awesome-deep-learning-papers
