
CS 4610/5335

Deep Learning and Computer Vision

Robert Platt, Northeastern University

Material adapted from:

1. Lawson Wong, CS 5100

Classification

Use features (x) to predict targets (y).

Targets y are now either:
– Binary: {0, 1}
– Multi-class: {1, 2, …, K}

We will focus on the binary case (Ex5 Q6 covers multi-class).

Classification

Focus: Supervised learning (e.g., regression, classification).
Use features (x) to predict targets (y).

Input: Dataset of n samples {x(i), y(i)}, i = 1, …, n
– Each x(i) is a p-dimensional vector of feature values

Output: Hypothesis hθ(x) in some hypothesis class H
– H is parameterized by a d-dimensional parameter vector θ

Goal: Find the best hypothesis θ* within H

What does “best” mean? It optimizes the objective function

J(θ) = Σᵢ L(hθ(x(i)), y(i)),

where J(θ) is the error function and L(pred, y) is the per-sample loss function.

A learning algorithm is the procedure for optimizing J(θ).
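To make these pieces concrete, here is a minimal sketch (not from the slides) that minimizes J(θ) by gradient descent, assuming a linear hypothesis hθ(x) = θ·x and squared-error loss:

```python
# Minimal sketch (assumptions: linear hypothesis, squared-error loss):
# batch gradient descent on J(theta) = sum_i L(h_theta(x_i), y_i).
import numpy as np

def J(theta, X, y):
    """Error function: sum of per-sample squared-error losses."""
    return np.sum((X @ theta - y) ** 2)

def grad_J(theta, X, y):
    """Gradient of J with respect to theta."""
    return 2 * X.T @ (X @ theta - y)

def gradient_descent(X, y, lr=0.001, steps=500):
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        theta -= lr * grad_J(theta, X, y)   # step downhill on J
    return theta

# Toy usage: recover theta* = [2.0, -1.0] from noiseless data.
X = np.random.randn(100, 2)
y = X @ np.array([2.0, -1.0])
print(gradient_descent(X, y))
```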

Biological neuron

[Figure: a biological neuron, with dendrite, cell body, nucleus, axon, myelin sheath (Schwann cells), node of Ranvier, and axon terminal labeled.]

Artificial neuron

McCulloch-Pitts model (1943): fixed weights.
Rosenblatt (1957): learnable weights + a bias term. Learning algorithm: Perceptron.

Artificial neuron

An artificial neuron can represent basic logic gates (assume the threshold fires when the weighted sum ≥ 0).
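For instance, a single threshold unit can realize AND, OR, and NOT. A minimal sketch, with weights chosen by hand (one of many valid choices):

```python
# Sketch: a McCulloch-Pitts-style neuron with a hard threshold that
# fires when the weighted sum (including bias w0) is >= 0, as on the slide.
import numpy as np

def neuron(w0, w, x):
    """Output 1 if w0 + w . x >= 0, else 0."""
    return int(w0 + np.dot(w, x) >= 0)

# Hand-picked weights realizing logic gates:
AND = lambda a, b: neuron(-1.5, [1, 1], [a, b])   # fires only when a = b = 1
OR  = lambda a, b: neuron(-0.5, [1, 1], [a, b])   # fires when a = 1 or b = 1
NOT = lambda a:    neuron( 0.5, [-1],   [a])      # fires when a = 0

for a in (0, 1):
    for b in (0, 1):
        print(a, b, AND(a, b), OR(a, b))
```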

Artificial neural networks

Since a single neuron can represent basic logic gates, an artificial neural network (ANN) can represent any logical circuit / function!

Artificial neural networks

How do we train a neuron?

Parameters: w (weights on the input links)
Hypothesis: Output = g(w0 + w1·a1 + … + wp·ap)
Objective: Error / loss function between the output and the target y

Artificial neural networks

g = hard threshold: the Perceptron algorithm. Works well for single neurons, but not for networks.

How about gradient descent? That needs a smooth g.
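A minimal sketch of the perceptron update (not the slides' code; assumes targets in {0, 1} and a bias folded in as x0 = 1):

```python
# Sketch of the perceptron learning rule (hard-threshold g): on each
# misclassified sample, nudge the weights toward the correct label.
import numpy as np

def perceptron(X, y, lr=1.0, epochs=100):
    X = np.hstack([np.ones((len(X), 1)), X])   # prepend bias feature x0 = 1
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = int(np.dot(w, xi) >= 0)     # hard threshold
            w += lr * (yi - pred) * xi         # update only when pred != yi
    return w

# Toy usage: learn the OR function (linearly separable).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])
print(perceptron(X, y))
```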

Artificial neural networks

There are many choices of activation function! The most popular is the rectified linear unit (ReLU). We will consider the sigmoid (logistic) function.
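As a quick reference, a sketch of both activations together with the derivatives that gradient descent needs (the sigmoid's convenient derivative σ(z)(1 − σ(z)) reappears below):

```python
# Sketch: the two activation functions mentioned on the slides, plus
# their derivatives.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1 - s)      # sigma'(z) = sigma(z) (1 - sigma(z))

def relu(z):
    return np.maximum(0, z)

def relu_prime(z):
    return (z > 0).astype(float)   # undefined at z = 0; 0 by convention here

z = np.linspace(-5, 5, 11)
print(sigmoid(z), relu(z))
```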

Artificial neural networks

Parameters: w (weights on the input links)
Hypothesis: Output = σ(w0 + w1·a1 + … + wp·ap)

Seem familiar? Logistic regression = learning a single neuron.

Artificial neural networks

Input: x (x0 = bias)
Parameters: w
Weighted input: z11 (the weighted sum Σ w·x)
Activation function: σ (sigmoid)
Activation: a11 = σ(z11)
Prediction = a11

Artificial neural networks

Assume a squared-error loss; compute the gradient and perform SGD.
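A minimal sketch of that gradient step (my notation, not the slides'): with L = (a − y)² and a = σ(w·x), the chain rule gives dL/dw = 2(a − y) · a(1 − a) · x.

```python
# Sketch: one SGD step for a single sigmoid neuron with squared-error loss.
import numpy as np

def sgd_step(w, x, y, lr=0.1):
    z = np.dot(w, x)                 # weighted input (bias folded into w, x)
    a = 1.0 / (1.0 + np.exp(-z))     # activation = prediction
    grad = 2 * (a - y) * a * (1 - a) * x   # chain rule, sigma'(z) = a (1 - a)
    return w - lr * grad

# Toy usage: push the prediction for x toward y = 1.
w = np.zeros(3)
x = np.array([1.0, 0.5, -0.2])       # x[0] = 1 acts as the bias input x0
for _ in range(1000):
    w = sgd_step(w, x, 1.0)
print(1.0 / (1.0 + np.exp(-np.dot(w, x))))   # approaches 1
```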

Artificial neural networks

Input: x (x0 = bias)
Parameters: v (layer 1), w (layer 2)
Weighted input: z21 (the weighted sum of the layer-1 activations)
Activation function: σ (sigmoid)
Activation: a21 = σ(z21)
Prediction = a21
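Only the forward pass changes with the extra layer. A minimal sketch (the layer sizes are made up for illustration; the layer-1 weights v become a matrix V here):

```python
# Sketch of the forward pass for the two-layer network on the slide:
# layer 1 has weights V, layer 2 has weights w.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, V, w):
    z1 = V @ x            # weighted inputs, layer 1
    a1 = sigmoid(z1)      # layer-1 activations
    z2 = np.dot(w, a1)    # weighted input of the single output unit
    a2 = sigmoid(z2)      # prediction
    return a1, a2

x = np.array([1.0, 0.3, -0.7])        # x[0] = 1 as bias input
V = np.random.randn(4, 3) * 0.1       # 4 hidden units (made-up size)
w = np.random.randn(4) * 0.1
a1, prediction = forward(x, V, w)
print(prediction)
```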

Artificial neural networks

Again assume a squared-error loss; compute the gradient and perform SGD.

Artificial neural networks

[Slides 21-23: step-by-step gradient derivation for the two-layer network; the equations did not survive in this transcript.]

Artificial neural networks

The underlined terms are the same! They will appear in every gradient term in all layers.

Avoiding recomputing this term is the key idea of backpropagation:
Bryson & Ho (1969); Linnainmaa (1970); Werbos (1974); Rumelhart, Hinton & Williams (1986)

Learning with backprop = using gradient descent to learn neural networks, where the gradients are computed efficiently.


Artificial neural networks

Backpropagation:
Forward pass: compute activations (a)
Backward pass: compute errors (Δ), then adjust the weights
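Putting it together, a minimal sketch of one backprop step for the two-layer sigmoid network above, with squared-error loss (my variable names, not the slides'):

```python
# Sketch of backprop: a forward pass to compute activations, a backward
# pass to compute errors (deltas), then a gradient step on both layers.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y, V, w, lr=0.5):
    # Forward pass: compute activations.
    z1 = V @ x
    a1 = sigmoid(z1)
    z2 = np.dot(w, a1)
    a2 = sigmoid(z2)
    # Backward pass: compute errors. delta2 is the shared term that
    # reappears in every gradient; backprop computes it once.
    delta2 = 2 * (a2 - y) * a2 * (1 - a2)     # output-layer error
    delta1 = delta2 * w * a1 * (1 - a1)       # hidden-layer errors
    # Adjust weights.
    w = w - lr * delta2 * a1                  # dL/dw_j = delta2 * a1_j
    V = V - lr * np.outer(delta1, x)          # dL/dV_jk = delta1_j * x_k
    return V, w

# Toy usage: fit a single training pair.
x, y = np.array([1.0, 0.5, -0.5]), 1.0
V, w = np.random.randn(4, 3) * 0.1, np.random.randn(4) * 0.1
for _ in range(2000):
    V, w = backprop_step(x, y, V, w)
print(sigmoid(np.dot(w, sigmoid(V @ x))))     # approaches 1 after training
```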

Convolutional layers

Deep multi-layer perceptron networks are:
– general purpose
– but involve huge numbers of weights

We want:
– a special-purpose network for image and NLP data
– fewer parameters
– fewer local minima

Answer: convolutional layers!

Convolutional layers

[Figure: a filter of a given size slides across the image pixels with a given stride, producing one weighted sum per position.]

All of these weight groupings are tied to each other.

Because of the way the weights are tied together, convolution:
– reduces the number of parameters (dramatically)
– encodes a prior on the structure of the data

In practice, convolutional layers are essential to computer vision…

Convolutional layers

Two-dimensional example. Why do you think they call this “convolution”?

Think-pair-share: what would the convolved feature map be for this kernel? [Kernel and feature map shown on the slide.]
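A minimal sketch of the sliding-window operation (strictly speaking this is cross-correlation; flipping the kernel gives the textbook convolution):

```python
# Sketch of the 2-D operation from the slide: slide the kernel across the
# image with a given stride and take a weighted sum per position.
import numpy as np

def conv2d(image, kernel, stride=1):
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh = (ih - kh) // stride + 1    # output size: (N - F) / stride + 1
    ow = (iw - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride : i*stride+kh, j*stride : j*stride+kw]
            out[i, j] = np.sum(patch * kernel)   # one weighted sum per position
    return out

# Toy usage with a made-up 5x5 image and a 3x3 vertical-edge kernel.
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], dtype=float)
print(conv2d(image, kernel))   # the 3x3 convolved feature map
```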

Example: MNIST digit classification with LeNet

MNIST dataset: images of 10,000 handwritten digits.
Objective: classify each image as the corresponding digit.

Example: MNIST digit classification with LeNet

LeNet:
– two convolutional layers (conv, ReLU, pooling)
– two fully connected layers (ReLU)
– the last layer has a logistic activation function

Example: MNIST digit classification with LeNet

Load the dataset and create train/test splits.

Example: MNIST digit classification with LeNet

Define the neural network structure: Input → Conv1 → Conv2 → FC1 → FC2

Example: MNIST digit classification with LeNet

Train the network, classify the test set, and measure accuracy.
– Notice we test on a different set (a holdout set) than the one we trained on.

Using the GPU makes a huge difference…
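The slides' code is in Matlab; here is a rough Python/Keras equivalent of the same pipeline. The layer sizes are my assumptions, and I use a softmax output, the multi-class analogue of the slide's logistic output:

```python
# Sketch of a LeNet-style MNIST pipeline: two conv layers (conv, ReLU,
# pooling), two fully connected layers, then train and evaluate on a holdout.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train[..., None] / 255.0, x_test[..., None] / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(6, 5, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Conv2D(16, 5, activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(120, activation="relu"),     # FC1
    tf.keras.layers.Dense(10, activation="softmax"),   # FC2: class scores
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2)
print(model.evaluate(x_test, y_test))   # accuracy on the holdout set
```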

Deep learning packages

You don’t need to use Matlab (obviously). TensorFlow is probably the most popular platform; Caffe and Theano are also big.

Another example: image classification with AlexNet

ImageNet dataset: millions of images of objects.
Objective: classify each image as the corresponding object (1,000 categories in ILSVRC).

AlexNet has 8 layers: five convolutional followed by three fully connected.
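For reference, a sketch of that 8-layer stack in Keras. The filter counts follow the 2012 paper, but this is approximate: it omits local response normalization, dropout, and the original two-GPU weight split.

```python
# Sketch of AlexNet's layer stack: five conv layers, then three FC layers.
import tensorflow as tf

alexnet = tf.keras.Sequential([
    tf.keras.layers.Conv2D(96, 11, strides=4, activation="relu",
                           input_shape=(227, 227, 3)),                  # conv1
    tf.keras.layers.MaxPooling2D(3, strides=2),
    tf.keras.layers.Conv2D(256, 5, padding="same", activation="relu"),  # conv2
    tf.keras.layers.MaxPooling2D(3, strides=2),
    tf.keras.layers.Conv2D(384, 3, padding="same", activation="relu"),  # conv3
    tf.keras.layers.Conv2D(384, 3, padding="same", activation="relu"),  # conv4
    tf.keras.layers.Conv2D(256, 3, padding="same", activation="relu"),  # conv5
    tf.keras.layers.MaxPooling2D(3, strides=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4096, activation="relu"),    # FC6
    tf.keras.layers.Dense(4096, activation="relu"),    # FC7
    tf.keras.layers.Dense(1000, activation="softmax")  # FC8: 1k ILSVRC classes
])
alexnet.summary()
```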

Another example: image classification with AlexNet

AlexNet won the 2012 ILSVRC challenge and sparked the deep learning craze.