# CS 189 (CDSS offering), Lecture 22: Decision trees


2022/03/16

Today’s lecture


• Today, we will lay the foundations of probably the second most intuitive model after nearest neighbors: decision trees

• A decision tree describes how to make a decision/prediction based on the input

• Decision tree learning concerns how to build decision trees from data

• Decision trees are widely used in real world applications, and we’ll discuss why

• One reason is that they can be ensembled to give very good performance — this is the topic of Friday’s lecture

What’s a decision tree?

Decision trees

• We are given a dataset (x1, y1), …, (xN, yN)

• A decision tree for this problem will repeatedly split on select features of x (at each internal node of the tree) in order to predict y (at the leaf nodes of the tree)

• Each node contains the subset of the data that matches the splits made so far

• Therefore, ideally, the leaf nodes contain very little (or no) variability in y

• The prediction for a new point x′ proceeds down the tree to a leaf node, where the prediction is a majority vote (for classification) or average (for regression)
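Concretely, prediction with an already built tree can be sketched as follows. This is a minimal illustration with a hypothetical `Node` class (internal nodes split one feature at a threshold; leaves store the training labels that reached them), not code from the course:

```python
from collections import Counter
from statistics import mean

class Node:
    """Hypothetical tree node: internal nodes hold (feature, threshold, left, right);
    leaf nodes hold the training labels that reached them."""
    def __init__(self, feature=None, threshold=None, left=None, right=None, labels=None):
        self.feature, self.threshold = feature, threshold
        self.left, self.right = left, right
        self.labels = labels  # non-None only at leaves

def predict(node, x, classification=True):
    """Walk from the root down to a leaf, then vote (classification) or average (regression)."""
    while node.labels is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    if classification:
        return Counter(node.labels).most_common(1)[0][0]  # majority vote among the leaf's labels
    return mean(node.labels)  # average of the leaf's targets
```

For example, a depth-one tree splitting on feature 0 at threshold 0.5 sends `x = [0.3]` to the left leaf and predicts that leaf's majority label.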

Learning decision trees

• Learning optimal decision trees, for reasonable definitions of optimality, is known to be NP-complete (very, very hard)

• Algorithms for learning trees therefore only try to find “good” trees

• By far, the most common approaches for learning decision trees are greedy algorithms that operate in a top-down manner

• Top-down means we start with a tree that is a single node with all the data, and we continually “build out” the tree to get better and better leaf nodes

• Greedy means that, every time we split a node, we only consider how good the immediate split will be, rather than looking further ahead at potential future splits
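As a hypothetical sketch (not the course’s reference implementation), the greedy top-down procedure might look like the following. Here `score` stands in for any splitting metric that rates (parent labels, left labels, right labels), and the tree is represented as nested dicts:

```python
def build_tree(X, y, score, depth=0, max_depth=3):
    """Greedy, top-down tree growth: take the best immediate split, then recurse.

    X is a list of feature vectors, y the matching labels; `score` rates a
    candidate split (parent, left, right labels), higher being better.
    """
    if depth == max_depth or len(set(y)) <= 1:
        return {"labels": list(y)}  # leaf: depth limit hit, or node is already pure
    best = None
    for f in range(len(X[0])):                       # consider every feature...
        for t in sorted(set(row[f] for row in X)):   # ...and every observed value
            left = [i for i in range(len(y)) if X[i][f] <= t]
            right = [i for i in range(len(y)) if X[i][f] > t]
            if not left or not right:                # degenerate split, skip
                continue
            s = score(y, [y[i] for i in left], [y[i] for i in right])
            if best is None or s > best[0]:          # greedy: keep the best immediate split
                best = (s, f, t, left, right)
    if best is None:                                 # no valid split exists
        return {"labels": list(y)}
    _, f, t, left, right = best
    return {
        "feature": f, "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left],
                           score, depth + 1, max_depth),
        "right": build_tree([X[i] for i in right], [y[i] for i in right],
                            score, depth + 1, max_depth),
    }
```

Note the two nested loops: this is exactly the “all features, all observed splitting values” search described on the next slide, which is what makes naive tree learning expensive.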

Splitting metrics

• When we are learning the tree, at every node, we consider all features and all possible splitting values, and we try to figure out which will make the best split

• If this sounds expensive, that’s because it certainly can be — we’ll see next time that considering all possible splits for all features may not be necessary

• How do we define whether one split is better than another?

• Various metrics have been proposed to try and characterize the “goodness” of a split, and they are pretty much all based on concepts from information theory

One metric: information gain

• Recall entropy: for a node whose class proportions are $p_1, \dots, p_K$, the entropy is $H = -\sum_{k=1}^{K} p_k \log_2 p_k$

• Splitting a node with $N$ points into children with $N_L$ and $N_R$ points yields information gain $IG = H(\text{node}) - \frac{N_L}{N} H(\text{left}) - \frac{N_R}{N} H(\text{right})$

• We prefer the split with the largest information gain
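In code, entropy and information gain can be computed as follows (a minimal Python sketch, not library code):

```python
import math
from collections import Counter

def entropy(labels):
    """H = -sum_k p_k log2 p_k over the class proportions in `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropies of the two children."""
    n = len(parent)
    return entropy(parent) - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
```

A 50/50 binary node has entropy 1 bit; a split that separates the classes perfectly therefore has information gain 1, while a split that leaves both children 50/50 has gain 0.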

Another metric: Gini impurity

• Intuition: if we randomly sample two points (with replacement) from a node, how often do their labels disagree?

• For two classes with proportions $p$ and $1 - p$: $G = p(1 - p) + (1 - p)p = 2p(1 - p)$

• In general, for $K$ classes: $G = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2$

• We can use $G$ in place of $H$ on the previous slide
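The Gini impurity is, if anything, even simpler to compute than entropy; a minimal sketch:

```python
from collections import Counter

def gini(labels):
    """G = 1 - sum_k p_k^2: the chance two points sampled (with replacement) disagree."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())
```

As with entropy, a pure node scores 0 and a 50/50 binary node is the worst case (G = 0.5); for K balanced classes the impurity is 1 - 1/K.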

Metrics for continuous targets

• What if each $y_i \in \mathbb{R}$? The prediction at a node with $L$ points is given by the mean $\bar{y} = \frac{1}{L} \sum_{i=1}^{L} y_i$

• We can then score a node by its squared error, e.g., $\sum_{i=1}^{L} (y_i - \bar{y})^2$

• We could instead use absolute error, and maybe set the prediction to the median

• Either way, such a metric is used in the same way as $H$ and $G$
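The squared-error node score above can be sketched in a few lines; like entropy and Gini impurity, lower is better and a split is scored by the size-weighted children:

```python
def squared_error(ys):
    """Sum of squared deviations from the node mean: the regression analogue of impurity."""
    ybar = sum(ys) / len(ys)
    return sum((y - ybar) ** 2 for y in ys)
```

A node whose targets are all equal scores 0, just as a pure node has zero entropy or Gini impurity.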

Advantages of decision trees

• Decision trees are often said to be simple to understand and interpret

• At the least, they are more likely to be understandable and interpretable than the other models we have studied and will study

• Decision trees are a natural fit for categorical data and often require little to no data preprocessing

• Decision trees can give the practitioner an intuition for which features are the most important (split first), which features are irrelevant, etc.

Disadvantages of decision trees

• Decision trees can overfit, e.g., imagine a tree where each leaf node contains only one data point

• To tackle this, we need regularization! E.g.: restrict the maximum depth and/or minimum number of points in any leaf; stop splitting when information gain is not above some minimum threshold value; pruning (more on this next time)

• Decision trees by themselves don’t actually work very well…

• But they can work very well when ensembled!

• However, ensembling takes away some of the advantages, such as being simple to understand and interpret, so it’s all a trade-off
