# All-NBA Predict #23 – Classifying All-NBA Players (Part V – Decision Tree)

Alrighty, so we’ve taken a look at the following classifiers: a manually selected linear boundary, LDA, QDA, and logistic regression. First one, not so scientific. Next 3, a bit of math involved, nothing too crazy. We now jump to one of the most interpretable models that exist. ESL describes the decision tree as about as close to an off-the-shelf model as you can get.

According to the book, trees have the following outstanding qualities:

• Fast to construct
• Interpretable models (if trees are relatively small)
• Can naturally incorporate numeric and categorical features
• Immune to outliers
• Perform internal feature selection

The only thing that prevents decision trees from being completely ideal is accuracy: they seldom provide predictive accuracy comparable to the best that can be achieved with the data at hand. This really depends on what your application is, though. For many applications, interpretability may be just as important as accuracy, if not A LOT more important. In an application like we’re exploring now, where we’re literally trying to make sense of how VORP and WS affect all-NBA status, we’d likely welcome something that says “hey bozo, anything over a WS of 7 generally makes all-NBA”. This contrasts with something like Neural Networks, which we’ll look at later, but which generally stack non-linear transformations of linear combinations of the inputs and usually end up very complex. For much more complex applications like image detection, NN’s have become more popular because of their flexibility, but if you try to interpret the model to understand how the NN is telling the difference between a picture of a meerkat and a picture of a lion, you’d be hard-pressed to find an explanation that someone else will also be able to understand!

Now, back to trees. Essentially what we’re trying to do is split our space up with basic logic / rules that an analyst might come up with. Let’s say we’re trying to figure out peoples’ salaries based on their education. Maybe you’d say somebody with no post secondary education might have a lower salary. Perhaps someone with 0-4 years of schooling will have a bit higher salary. 4-8 even more. And maybe 8-12 is too much school and your salary goes down a bit (you stay in academia or something like that). Your rules might look like:

• If # years schooling == 0: Salary ~ \$40K
• If 0 < # years schooling <= 4: Salary ~ \$80K
• If 4 < # years schooling <= 8: Salary ~ \$120K
• If 8 < # years schooling <= 12: Salary ~ \$80K
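
The rules above can be sketched as a plain Python function. This is only an illustration: the name `estimate_salary` is made up, the dollar figures are the made-up ones from the bullets, and treating each upper bound as inclusive is an assumption so that every value from 0 to 12 lands in exactly one rule.

```python
def estimate_salary(years_schooling):
    """Rough salary guess (in dollars) from years of post-secondary schooling.

    Hypothetical hand-written rules, not something fitted from data.
    Upper bounds are treated as inclusive (an assumption).
    """
    if years_schooling < 0 or years_schooling > 12:
        raise ValueError("rules only cover 0 to 12 years of schooling")
    if years_schooling == 0:
        return 40000
    if years_schooling <= 4:
        return 80000
    if years_schooling <= 8:
        return 120000
    return 80000


# Walk one example through each rule
for years in (0, 2, 6, 10):
    print("%d years -> $%d" % (years, estimate_salary(years)))
```

A decision tree is essentially this kind of nested if/else, except that the cut points are learned from the data instead of being guessed by an analyst.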

If we extended this to 2 features, maybe our 2D space would look something like that:

And if we broke down the rules, our rules could be mapped out in a plot like this:

The above two plots are saying the same thing!

From this graphic in ISL (ESL’s more application-based sister book), we can see further advantages and drawbacks of the decision tree.

We see for linear boundaries, linear models could work better. For non-linear and more compartmentalized boundaries, decision trees can perform well. Decision trees can perform well for linear boundaries as well, but like the top-right plot above, we see our decision tree becomes more complicated and to an extent less interpretable.

We actually do have more of a linear boundary here, so the decision tree might not be the best model, but let’s see what happens.

In [1]:
# Load libraries & initial config

%R library(ggplot2)
%R library(gridExtra)
%R library(scales)
%R library(ggbiplot)
%R library(dplyr)

%matplotlib nbagg
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import boto3
from io import StringIO
import warnings
warnings.filterwarnings('ignore')

In [2]:
# Retrieve player stats from S3
playerAggDfAllNbaAllStar = pd.read_csv('https://s3.ca-central-1.amazonaws.com/2017edmfasatb/fas_boto/data/playerAggDfAllNbaAllStar.csv', index_col = 0)

# Temporarily lift the row display limit so every column's dtype is printed
pd.set_option('display.max_rows', len(playerAggDfAllNbaAllStar.dtypes))
print(playerAggDfAllNbaAllStar.dtypes)
pd.reset_option('display.max_rows')

season_start_year          int64
perGameStats_Player       object
perGameStats_Pos          object
perGameStats_Age           int64
perGameStats_Tm           object
perGameStats_G             int64
perGameStats_GS          float64
perGameStats_MP          float64
per100Stats_FG           float64
per100Stats_FGA          float64
per100Stats_FGPerc       float64
per100Stats_3P           float64
per100Stats_3PA          float64
per100Stats_3PPerc       float64
per100Stats_2P           float64
per100Stats_2PA          float64
per100Stats_2PPerc       float64
per100Stats_FT           float64
per100Stats_FTA          float64
per100Stats_FTPerc       float64
per100Stats_ORB          float64
per100Stats_DRB          float64
per100Stats_TRB          float64
per100Stats_AST          float64
per100Stats_STL          float64
per100Stats_BLK          float64
per100Stats_TOV          float64
per100Stats_PF           float64
per100Stats_PTS          float64
per100Stats_ORtg         float64
per100Stats_DRtg         float64
player_formatted          object
Tm                        object
Player_x                  object
Player_y                  object
all_star                  object
VORP_WS_sum              float64
dtype: object


Let’s remind ourselves first what the data looks like.

In [3]:
%%R -i playerAggDfAllNbaAllStar -w 700 -u px

# Plot WS vs VORP scatter with predictions colored in
allNbaPlot = ggplot(
NULL
) +
geom_point(
data = playerAggDfAllNbaAllStar,
aes(
)
)

allNbaPlot


There seem to be a few libraries capable of fitting decision trees in R. It seems that the most basic one is simply a package called ‘tree’. Let’s give it a go.

In [5]:
%%R -i playerAggDfAllNbaAllStar -o treeModel
library(tree)

# Build a tree using the 'tree' function
treeModel = tree(
data = playerAggDfAllNbaAllStar
)

# Plot the tree
plot(treeModel)
text(treeModel)


Alright, so we’ve got a tree. First thing I notice here right off the bat… where’s VORP? We’re wanting to build a model off of WS and VORP right? Well where is it? My first instinct seeing the tree is that… well… does VORP not matter in the presence of WS? It seems that, from the tree, a WS > 10.55 will get you onto an all-NBA team. With the eye test, this seems to pass. WS = 10.55 is the line about where the blues start to fade out and pinks start to come in (in our first plot).

Second observation… certain terminal nodes actually repeat themselves. For example, if we look at the WS > 10.55 guys, the tree splits further into WS < 12.75 and WS > 12.75… except the problem here is that both of them lead to All-NBA! What’s the point of even making that last WS = 12.75 split if both of them lead to All-NBA? Can’t we just leave it at WS > 10.55 leads to All-NBA?

My last thought is… why did we stop at 5 terminal nodes? I like the tree and all, but tying into our last point, why not 4 terminal nodes, or better yet, 6 terminal nodes? 100 terminal nodes? 1000 terminal nodes? I know from reading about trees that we can have a ton of nodes and overfit the hell out of this model. Why did we get such a (seemingly) reasonable tree here?

To recap all my concerns:

1. Where’s VORP?
2. Why do some splits lead to the same result in both terminal nodes?
3. Why did the tree splits stop where they stopped?

Let’s park 1. for now and look a bit more at 2. and 3.

From reading about trees in ESL, I have a basic understanding of how trees are measured in terms of accuracy. To get into this discussion, we have to explore a few options first of how exactly to measure node accuracy or “purity”.

Our classic vanilla method of measuring “accuracy” in classification is simply misclassification error: the share of observations in a node that don’t belong to the node’s most common class.

$E=1-\max_{k}(\hat{p}_{mk})$

where $\hat{p}_{mk}$ is the proportion of training observations in the $m$th region from the $k$th class. Generally, trees are not used with misclassification because it doesn’t answer the entire question of accuracy. Let’s say we have three classes that we’re trying to classify (A, B, C), and we get two nodes with the following results:

• Node 1:
• Class A: 60%
• Class B: 39%
• Class C: 1%
• Node 2:
• Class A: 60%
• Class B: 20%
• Class C: 20%

Which node would you say is more “pure”? The first node seems like a pretty good shot at class A, but class B is not too far behind, still capturing 39% of the population. If I told you I had a 39% chance of getting something right, that’s not too far off from a coin flip, right?

The second node seems to give us a lot more confidence in our estimate that the node represents class A. With each of the other classes at 20%, we feel a bit more comfortable going with class A because 20% is like pulling a yellow marble out of a bag with 4 other red marbles. 1/5th of a chance, right?

The kicker here is that both nodes would have the same misclassification error because both of them would end up with 40%.
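
As a quick check of that claim, here is a minimal sketch that applies the misclassification formula above to the two example nodes. The function name `misclassification_error` is just an illustrative choice.

```python
def misclassification_error(proportions):
    """E = 1 - max_k(p_mk) for a single node's class proportions."""
    return 1.0 - max(proportions)


node_1 = [0.60, 0.39, 0.01]  # classes A, B, C
node_2 = [0.60, 0.20, 0.20]

# Only the largest proportion matters, so both nodes score the same
print("node 1: %.2f" % misclassification_error(node_1))
print("node 2: %.2f" % misclassification_error(node_2))
```

The measure ignores how the remaining 40% is spread across the other classes, which is exactly why it can’t tell these two nodes apart.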

Because of this logic, we tend to go with either the Gini index:

$G=\sum_{k=1}^{K}\hat{p}_{mk}(1-\hat{p}_{mk})$

or Cross-Entropy measurement:

$D=-\sum_{k=1}^{K}\hat{p}_{mk}\log\hat{p}_{mk}$

Both of these measure node impurity: the values get smaller as a node becomes more “pure” in a single class.

Reviewing our two scenarios again, we would get the following Gini and Entropy values for node 1 (the entropy numbers here use base-10 logarithms; a different base only rescales the value and doesn’t change which node comes out lower):

$G=\sum_{k=1}^{K}\hat{p}_{mk}(1-\hat{p}_{mk})=(0.6)(1-0.6)+(0.39)(1-0.39)+(0.01)(1-0.01)=0.49$
$D=-\sum_{k=1}^{K}\hat{p}_{mk}\log_{10}\hat{p}_{mk}=-((0.6)(\log_{10}0.6)+(0.39)(\log_{10}0.39)+(0.01)(\log_{10}0.01))=0.31$

and the following values for node 2:

$G=\sum_{k=1}^{K}\hat{p}_{mk}(1-\hat{p}_{mk})=(0.6)(1-0.6)+(0.2)(1-0.2)+(0.2)(1-0.2)=0.56$
$D=-\sum_{k=1}^{K}\hat{p}_{mk}\log_{10}\hat{p}_{mk}=-((0.6)(\log_{10}0.6)+(0.2)(\log_{10}0.2)+(0.2)(\log_{10}0.2))=0.41$
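
The arithmetic above can be reproduced with a few lines of Python. This is a sketch with illustrative function names; the base-10 logarithm is an assumption made because it is the base that matches the 0.31 and 0.41 figures.

```python
import math


def gini(proportions):
    """Gini index: sum over classes of p * (1 - p)."""
    return sum(p * (1.0 - p) for p in proportions)


def cross_entropy(proportions, base=10):
    """Cross-entropy: -sum over classes of p * log(p).

    Classes with p == 0 are skipped, since p * log(p) tends to 0 there.
    """
    return -sum(p * math.log(p, base) for p in proportions if p > 0)


node_1 = [0.60, 0.39, 0.01]
node_2 = [0.60, 0.20, 0.20]

for name, node in (("node 1", node_1), ("node 2", node_2)):
    print("%s: Gini = %.2f, entropy = %.2f" % (name, gini(node), cross_entropy(node)))
```

Passing a different `base` changes the entropy values but not their ordering, so node 1 still comes out lower on both measures.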

ALRIGHT THEN. This is slightly embarrassing… it looks like my logic initially was wrong haha. It looks like node 1 is actually more pure. I guess my intuition runs counter to what the math of Gini and Entropy says. Let’s check out the graphs of misclassification error, Gini, and Entropy:

Okay, it looks like the logic kinda checks out. If I map 0.6-0.39-0.01 out on the Gini and Entropy indexes and 0.6-0.2-0.2 as well, it looks like the sum of these values would be higher for the 0.6-0.2-0.2 case. I guess Gini and Entropy really value where the proportion is closer to 1 or 0. That 0.01 really lowers the Gini value (it basically has negligible contribution to the entire index value).

Alright, well now that I know a bit more about the types of metrics I can measure trees on, let’s see if I can get a bit more understanding about the questions I had.

Looking through the documentation for the tree function, the default metric tree uses (within its split parameter) is “deviance”, which is actually equivalent to Entropy.

Let’s check out the deviance for the nodes above:

In [6]:
# Print the tree branches
print(treeModel)

node), split, n, deviance, yval, (yprob)

* denotes terminal node

1) root 13220 4269.0 Not All-NBA ( 0.0379728 0.9620272 )

2) advancedStats_WS < 7.95 12081  659.5 Not All-NBA ( 0.0042215 0.9957785 )

4) advancedStats_WS < 5.55 10613  116.5 Not All-NBA ( 0.0006596 0.9993404 ) *

5) advancedStats_WS > 5.55 1468  395.3 Not All-NBA ( 0.0299728 0.9700272 ) *

3) advancedStats_WS > 7.95 1139 1529.0 Not All-NBA ( 0.3959614 0.6040386 )

6) advancedStats_WS < 10.55 719  714.6 Not All-NBA ( 0.1974965 0.8025035 ) *

7) advancedStats_WS > 10.55 420  485.1 All-NBA ( 0.7357143 0.2642857 )

14) advancedStats_WS < 12.75 240  323.0 All-NBA ( 0.6000000 0.4000000 ) *

15) advancedStats_WS > 12.75 180  103.3 All-NBA ( 0.9166667 0.0833333 ) *



Here, I’m getting a chance to see the logic behind how the tree is estimating some of these nodes. I still don’t quite understand why there are 5 terminal nodes, but I can see that many of the terminal nodes boast pretty confident probabilities.
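
One way to read the printout is to compare each parent’s deviance with the combined deviance of its two children. The sketch below does that with numbers copied from the output above; the helper name `deviance_reduction` is hypothetical, and this is only a reading of the printed values, not a look at how the tree package decides internally.

```python
# (split, parent deviance, left child deviance, right child deviance)
splits = [
    ("root   (WS < 7.95)", 4269.0, 659.5, 1529.0),
    ("node 2 (WS < 5.55)", 659.5, 116.5, 395.3),
    ("node 3 (WS < 10.55)", 1529.0, 714.6, 485.1),
    ("node 7 (WS < 12.75)", 485.1, 323.0, 103.3),
]


def deviance_reduction(parent, left, right):
    """How much total deviance drops when a node is replaced by its two children."""
    return parent - (left + right)


for name, parent, left, right in splits:
    print("%s: reduction = %.1f" % (name, deviance_reduction(parent, left, right)))
```

Even the WS = 12.75 split, where both children predict All-NBA, lowers the deviance (by roughly 59). The predicted label is the same on both sides, but the probabilities are not: about 60% All-NBA below 12.75 versus about 92% above it.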

The first thing I want to point out again is the overwhelming number of non all-NBA players. Only 3.8% of the entire population is all-NBA calibre! This can be observed in the “root” node (first one).

The second thing I notice is how high the deviance values are… why are they so high? I only have 2 classes, and each class cannot have a deviance value of over 1… so how are we getting values in the 100’s?

After googling a bit, I’m seeing some sources calculate deviance like:

$D_{m}=-2n_{m}\sum_{k=1}^{K}\hat{p}_{mk}\log\hat{p}_{mk}$

where we also multiply by the number of observations in each node and scale by a factor of 2. I won’t pretend like I quite understand why we do this now, and after running through some calculations by myself the numbers still don’t align completely, but it does get me closer to the value that I’m seeing in the output of the tree function. I’ll just accept this for now, assume that the underlying calculation is still based off Cross Entropy, and understand that the smaller the value, the better.
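
For what it’s worth, here is a minimal sketch of that formula applied to the node sizes and class proportions printed above. The function name `node_deviance` is illustrative, and the use of natural logarithms is an assumption. With that choice the result lands very close to the printed deviances for the root and for node 2, whereas base-10 logs would be off by a factor of about 2.3, which is one possible reason a hand calculation might not line up. The factor of $2n$ turns the per-observation entropy into minus twice the log-likelihood of the node’s observations under its class proportions.

```python
import math


def node_deviance(n, proportions):
    """-2 * n * sum_k p_k * ln(p_k) for one node (natural log assumed)."""
    return -2.0 * n * sum(p * math.log(p) for p in proportions if p > 0)


# n and (yprob) copied from the printed tree
root = node_deviance(13220, [0.0379728, 0.9620272])    # printed deviance: 4269.0
node_2 = node_deviance(12081, [0.0042215, 0.9957785])  # printed deviance: 659.5

print("root:   %.1f" % root)
print("node 2: %.1f" % node_2)
```

Small differences from the printout are expected because the printed proportions are rounded.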

Understanding Cross Entropy, or deviance, helps me gauge a bit better why the tree may have stopped growing. I’m going to poke around the documentation a bit more and see if the tree function has some default stop setting.

It turns out that there is a parameter within tree, control, which sets some default stop values. There are the minsize and mindev parameters, which control, respectively, the minimum size of each node and the minimum deviance (as a fraction of the root node’s deviance) a node must have to be split. The root node has a deviance of about 4200, so we’d expect a node to be split only if its deviance was above mindev times that value.

Let’s see what happens when I change these values. Let’s lower the threshold on the minimum deviance. In fact, let’s just go all the way to 0 and tell the library to grow the whole tree. Whoop de doo!!

In [8]:
%%R -i playerAggDfAllNbaAllStar -o treeModelMinDev
library(tree)

# Build a tree using the 'tree' function with mindev = 0, indicating that we want to grow the whole tree
treeModelMinDev = tree(
data = playerAggDfAllNbaAllStar,
mindev = 0
)

# Plot the tree
plot(treeModelMinDev)
text(treeModelMinDev)

In [9]:
# Print the tree branches
print(treeModelMinDev)

node), split, n, deviance, yval, (yprob)

* denotes terminal node

1) root 13220 4269.000 Not All-NBA ( 0.0379728 0.9620272 )

2) advancedStats_WS < 7.95 12081  659.500 Not All-NBA ( 0.0042215 0.9957785 )

4) advancedStats_WS < 5.55 10613  116.500 Not All-NBA ( 0.0006596 0.9993404 )

8) advancedStats_VORP < 0.45 7150    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

9) advancedStats_VORP > 0.45 3463  100.800 Not All-NBA ( 0.0020214 0.9979786 )

18) advancedStats_VORP < 1.25 2277   32.150 Not All-NBA ( 0.0008783 0.9991217 )

36) advancedStats_WS < 3.75 1468   30.390 Not All-NBA ( 0.0013624 0.9986376 )

72) advancedStats_WS < 3.65 1387   16.470 Not All-NBA ( 0.0007210 0.9992790 )

144) advancedStats_VORP < 0.55 309   13.460 Not All-NBA ( 0.0032362 0.9967638 )

288) advancedStats_WS < 2.55 181    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

289) advancedStats_WS > 2.55 128   11.700 Not All-NBA ( 0.0078125 0.9921875 )

578) advancedStats_WS < 2.65 15    7.348 Not All-NBA ( 0.0666667 0.9333333 ) *

579) advancedStats_WS > 2.65 113    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

145) advancedStats_VORP > 0.55 1078    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

73) advancedStats_WS > 3.65 81   10.780 Not All-NBA ( 0.0123457 0.9876543 )

146) advancedStats_VORP < 0.95 59    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

147) advancedStats_VORP > 0.95 22    8.136 Not All-NBA ( 0.0454545 0.9545455 )

294) advancedStats_VORP < 1.05 12    6.884 Not All-NBA ( 0.0833333 0.9166667 ) *

295) advancedStats_VORP > 1.05 10    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

37) advancedStats_WS > 3.75 809    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

19) advancedStats_VORP > 1.25 1186   64.670 Not All-NBA ( 0.0042159 0.9957841 )

38) advancedStats_VORP < 1.35 165   29.990 Not All-NBA ( 0.0181818 0.9818182 )

76) advancedStats_WS < 3.45 39    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

77) advancedStats_WS > 3.45 126   28.350 Not All-NBA ( 0.0238095 0.9761905 )

154) advancedStats_WS < 3.55 8    6.028 Not All-NBA ( 0.1250000 0.8750000 ) *

155) advancedStats_WS > 3.55 118   20.280 Not All-NBA ( 0.0169492 0.9830508 )

310) advancedStats_WS < 4.45 66    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

311) advancedStats_WS > 4.45 52   16.950 Not All-NBA ( 0.0384615 0.9615385 )

622) advancedStats_WS < 4.55 5    5.004 Not All-NBA ( 0.2000000 0.8000000 ) *

623) advancedStats_WS > 4.55 47    9.679 Not All-NBA ( 0.0212766 0.9787234 )

1246) advancedStats_WS < 5.05 26    8.477 Not All-NBA ( 0.0384615 0.9615385 )

2492) advancedStats_WS < 4.95 18    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

2493) advancedStats_WS > 4.95 8    6.028 Not All-NBA ( 0.1250000 0.8750000 ) *

1247) advancedStats_WS > 5.05 21    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

39) advancedStats_VORP > 1.35 1021   28.940 Not All-NBA ( 0.0019589 0.9980411 )

78) advancedStats_WS < 4.85 634    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

79) advancedStats_WS > 4.85 387   25.050 Not All-NBA ( 0.0051680 0.9948320 )

158) advancedStats_WS < 5.25 214   22.670 Not All-NBA ( 0.0093458 0.9906542 )

316) advancedStats_VORP < 2.45 188   12.470 Not All-NBA ( 0.0053191 0.9946809 )

632) advancedStats_WS < 4.95 53    9.922 Not All-NBA ( 0.0188679 0.9811321 )

1264) advancedStats_VORP < 1.65 19    7.835 Not All-NBA ( 0.0526316 0.9473684 )

2528) advancedStats_VORP < 1.55 14    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

2529) advancedStats_VORP > 1.55 5    5.004 Not All-NBA ( 0.2000000 0.8000000 ) *

1265) advancedStats_VORP > 1.65 34    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

633) advancedStats_WS > 4.95 135    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

317) advancedStats_VORP > 2.45 26    8.477 Not All-NBA ( 0.0384615 0.9615385 )

634) advancedStats_VORP < 2.55 7    5.742 Not All-NBA ( 0.1428571 0.8571429 ) *

635) advancedStats_VORP > 2.55 19    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

159) advancedStats_WS > 5.25 173    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

5) advancedStats_WS > 5.55 1468  395.300 Not All-NBA ( 0.0299728 0.9700272 )

10) advancedStats_WS < 6.85 937  153.800 Not All-NBA ( 0.0160085 0.9839915 )

20) advancedStats_VORP < 2.05 518   46.880 Not All-NBA ( 0.0077220 0.9922780 )

40) advancedStats_WS < 6.45 398   44.760 Not All-NBA ( 0.0100503 0.9899497 )

80) advancedStats_VORP < 1.85 332   43.300 Not All-NBA ( 0.0120482 0.9879518 )

160) advancedStats_VORP < 0.75 48    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

161) advancedStats_VORP > 0.75 284   42.040 Not All-NBA ( 0.0140845 0.9859155 )

322) advancedStats_VORP < 0.85 17    7.606 Not All-NBA ( 0.0588235 0.9411765 )

644) advancedStats_WS < 5.85 8    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

645) advancedStats_WS > 5.85 9    6.279 Not All-NBA ( 0.1111111 0.8888889 ) *

323) advancedStats_VORP > 0.85 267   32.900 Not All-NBA ( 0.0112360 0.9887640 )

646) advancedStats_VORP < 1.35 107    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

647) advancedStats_VORP > 1.35 160   29.800 Not All-NBA ( 0.0187500 0.9812500 )

1294) advancedStats_WS < 5.85 54   17.110 Not All-NBA ( 0.0370370 0.9629630 )

2588) advancedStats_VORP < 1.45 8    6.028 Not All-NBA ( 0.1250000 0.8750000 ) *

2589) advancedStats_VORP > 1.45 46    9.635 Not All-NBA ( 0.0217391 0.9782609 )

5178) advancedStats_VORP < 1.75 37    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

5179) advancedStats_VORP > 1.75 9    6.279 Not All-NBA ( 0.1111111 0.8888889 ) *

1295) advancedStats_WS > 5.85 106   11.320 Not All-NBA ( 0.0094340 0.9905660 )

2590) advancedStats_WS < 6.35 86    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

2591) advancedStats_WS > 6.35 20    7.941 Not All-NBA ( 0.0500000 0.9500000 )

5182) advancedStats_VORP < 1.55 6    5.407 Not All-NBA ( 0.1666667 0.8333333 ) *

5183) advancedStats_VORP > 1.55 14    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

81) advancedStats_VORP > 1.85 66    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

41) advancedStats_WS > 6.45 120    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

21) advancedStats_VORP > 2.05 419  101.800 Not All-NBA ( 0.0262530 0.9737470 )

42) advancedStats_WS < 6.65 339   97.060 Not All-NBA ( 0.0324484 0.9675516 )

84) advancedStats_WS < 5.65 24    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

85) advancedStats_WS > 5.65 315   95.410 Not All-NBA ( 0.0349206 0.9650794 )

170) advancedStats_VORP < 3.15 274   72.300 Not All-NBA ( 0.0291971 0.9708029 )

340) advancedStats_VORP < 2.75 212   68.130 Not All-NBA ( 0.0377358 0.9622642 )

680) advancedStats_WS < 6.45 172   64.710 Not All-NBA ( 0.0465116 0.9534884 )

1360) advancedStats_WS < 6.15 106   27.300 Not All-NBA ( 0.0283019 0.9716981 )

2720) advancedStats_WS < 5.85 40   21.310 Not All-NBA ( 0.0750000 0.9250000 )

5440) advancedStats_VORP < 2.55 35    9.082 Not All-NBA ( 0.0285714 0.9714286 )

10880) advancedStats_VORP < 2.25 18    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

10881) advancedStats_VORP > 2.25 17    7.606 Not All-NBA ( 0.0588235 0.9411765 )

21762) advancedStats_WS < 5.75 9    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

21763) advancedStats_WS > 5.75 8    6.028 Not All-NBA ( 0.1250000 0.8750000 ) *

5441) advancedStats_VORP > 2.55 5    6.730 Not All-NBA ( 0.4000000 0.6000000 ) *

2721) advancedStats_WS > 5.85 66    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

1361) advancedStats_WS > 6.15 66   35.410 Not All-NBA ( 0.0757576 0.9242424 )

2722) advancedStats_VORP < 2.45 49   32.300 Not All-NBA ( 0.1020408 0.8979592 )

5444) advancedStats_WS < 6.25 22   20.860 Not All-NBA ( 0.1818182 0.8181818 )

10888) advancedStats_VORP < 2.15 7    8.376 Not All-NBA ( 0.2857143 0.7142857 ) *

10889) advancedStats_VORP > 2.15 15   11.780 Not All-NBA ( 0.1333333 0.8666667 )

21778) advancedStats_VORP < 2.25 6    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

21779) advancedStats_VORP > 2.25 9    9.535 Not All-NBA ( 0.2222222 0.7777778 ) *

5445) advancedStats_WS > 6.25 27    8.554 Not All-NBA ( 0.0370370 0.9629630 )

10890) advancedStats_VORP < 2.25 16    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

10891) advancedStats_VORP > 2.25 11    6.702 Not All-NBA ( 0.0909091 0.9090909 ) *

2723) advancedStats_VORP > 2.45 17    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

681) advancedStats_WS > 6.45 40    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

341) advancedStats_VORP > 2.75 62    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

171) advancedStats_VORP > 3.15 41   21.460 Not All-NBA ( 0.0731707 0.9268293 )

342) advancedStats_VORP < 3.45 21   17.220 Not All-NBA ( 0.1428571 0.8571429 )

684) advancedStats_WS < 6.45 15    7.348 Not All-NBA ( 0.0666667 0.9333333 )

1368) advancedStats_WS < 6.1 7    5.742 Not All-NBA ( 0.1428571 0.8571429 ) *

1369) advancedStats_WS > 6.1 8    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

685) advancedStats_WS > 6.45 6    7.638 Not All-NBA ( 0.3333333 0.6666667 ) *

343) advancedStats_VORP > 3.45 20    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

43) advancedStats_WS > 6.65 80    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

11) advancedStats_WS > 6.85 531  225.000 Not All-NBA ( 0.0546139 0.9453861 )

22) advancedStats_VORP < 2.25 189   53.210 Not All-NBA ( 0.0317460 0.9682540 )

44) advancedStats_VORP < 2.05 135   49.090 Not All-NBA ( 0.0444444 0.9555556 )

88) advancedStats_WS < 7.65 109   46.460 Not All-NBA ( 0.0550459 0.9449541 )

176) advancedStats_WS < 7.35 70   18.160 Not All-NBA ( 0.0285714 0.9714286 )

352) advancedStats_VORP < 0.95 5    5.004 Not All-NBA ( 0.2000000 0.8000000 ) *

353) advancedStats_VORP > 0.95 65   10.330 Not All-NBA ( 0.0153846 0.9846154 )

706) advancedStats_VORP < 1.75 38    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

707) advancedStats_VORP > 1.75 27    8.554 Not All-NBA ( 0.0370370 0.9629630 )

1414) advancedStats_VORP < 1.85 8    6.028 Not All-NBA ( 0.1250000 0.8750000 ) *

1415) advancedStats_VORP > 1.85 19    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

177) advancedStats_WS > 7.35 39   25.790 Not All-NBA ( 0.1025641 0.8974359 )

354) advancedStats_VORP < 1.05 5    5.004 Not All-NBA ( 0.2000000 0.8000000 ) *

355) advancedStats_VORP > 1.05 34   20.290 Not All-NBA ( 0.0882353 0.9117647 )

710) advancedStats_VORP < 1.45 6    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

711) advancedStats_VORP > 1.45 28   19.070 Not All-NBA ( 0.1071429 0.8928571 )

1422) advancedStats_VORP < 1.75 12    6.884 Not All-NBA ( 0.0833333 0.9166667 ) *

1423) advancedStats_VORP > 1.75 16   12.060 Not All-NBA ( 0.1250000 0.8750000 )

2846) advancedStats_VORP < 1.85 5    5.004 Not All-NBA ( 0.2000000 0.8000000 ) *

2847) advancedStats_VORP > 1.85 11    6.702 Not All-NBA ( 0.0909091 0.9090909 ) *

89) advancedStats_WS > 7.65 26    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

45) advancedStats_VORP > 2.05 54    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

23) advancedStats_VORP > 2.25 342  168.600 Not All-NBA ( 0.0672515 0.9327485 )

46) advancedStats_WS < 7.35 148   56.380 Not All-NBA ( 0.0472973 0.9527027 )

92) advancedStats_VORP < 3.75 135   36.030 Not All-NBA ( 0.0296296 0.9703704 )

184) advancedStats_VORP < 3.35 112   34.510 Not All-NBA ( 0.0357143 0.9642857 )

368) advancedStats_VORP < 3.25 98   19.530 Not All-NBA ( 0.0204082 0.9795918 )

736) advancedStats_VORP < 2.75 55   17.180 Not All-NBA ( 0.0363636 0.9636364 )

1472) advancedStats_WS < 7.25 48    9.721 Not All-NBA ( 0.0208333 0.9791667 )

2944) advancedStats_VORP < 2.65 38    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

2945) advancedStats_VORP > 2.65 10    6.502 Not All-NBA ( 0.1000000 0.9000000 )

5890) advancedStats_WS < 6.95 5    5.004 Not All-NBA ( 0.2000000 0.8000000 ) *

5891) advancedStats_WS > 6.95 5    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

1473) advancedStats_WS > 7.25 7    5.742 Not All-NBA ( 0.1428571 0.8571429 ) *

737) advancedStats_VORP > 2.75 43    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

369) advancedStats_VORP > 3.25 14   11.480 Not All-NBA ( 0.1428571 0.8571429 )

738) advancedStats_WS < 7.15 9    9.535 Not All-NBA ( 0.2222222 0.7777778 ) *

739) advancedStats_WS > 7.15 5    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

185) advancedStats_VORP > 3.35 23    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

93) advancedStats_VORP > 3.75 13   14.050 Not All-NBA ( 0.2307692 0.7692308 )

186) advancedStats_VORP < 4.15 7    9.561 Not All-NBA ( 0.4285714 0.5714286 ) *

187) advancedStats_VORP > 4.15 6    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

47) advancedStats_WS > 7.35 194  110.500 Not All-NBA ( 0.0824742 0.9175258 )

94) advancedStats_VORP < 2.35 12   13.500 Not All-NBA ( 0.2500000 0.7500000 )

188) advancedStats_WS < 7.75 7    5.742 Not All-NBA ( 0.1428571 0.8571429 ) *

189) advancedStats_WS > 7.75 5    6.730 Not All-NBA ( 0.4000000 0.6000000 ) *

95) advancedStats_VORP > 2.35 182   93.660 Not All-NBA ( 0.0714286 0.9285714 )

190) advancedStats_VORP < 2.45 11    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

191) advancedStats_VORP > 2.45 171   91.980 Not All-NBA ( 0.0760234 0.9239766 )

382) advancedStats_VORP < 3.45 120   73.530 Not All-NBA ( 0.0916667 0.9083333 )

764) advancedStats_VORP < 3.35 112   62.640 Not All-NBA ( 0.0803571 0.9196429 )

1528) advancedStats_VORP < 3.25 105   61.430 Not All-NBA ( 0.0857143 0.9142857 )

3056) advancedStats_WS < 7.65 65   30.050 Not All-NBA ( 0.0615385 0.9384615 )

6112) advancedStats_VORP < 2.7 22    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

6113) advancedStats_VORP > 2.7 43   26.620 Not All-NBA ( 0.0930233 0.9069767 )

12226) advancedStats_WS < 7.55 29   23.270 Not All-NBA ( 0.1379310 0.8620690 )

24452) advancedStats_VORP < 2.85 7    8.376 Not All-NBA ( 0.2857143 0.7142857 ) *

24453) advancedStats_VORP > 2.85 22   13.400 Not All-NBA ( 0.0909091 0.9090909 )

48906) advancedStats_WS < 7.45 8    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

48907) advancedStats_WS > 7.45 14   11.480 Not All-NBA ( 0.1428571 0.8571429 )

97814) advancedStats_VORP < 3.1 9    6.279 Not All-NBA ( 0.1111111 0.8888889 ) *

97815) advancedStats_VORP > 3.1 5    5.004 Not All-NBA ( 0.2000000 0.8000000 ) *

12227) advancedStats_WS > 7.55 14    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

3057) advancedStats_WS > 7.65 40   30.140 Not All-NBA ( 0.1250000 0.8750000 )

6114) advancedStats_VORP < 3.05 35   28.710 Not All-NBA ( 0.1428571 0.8571429 )

12228) advancedStats_VORP < 2.95 30   19.500 Not All-NBA ( 0.1000000 0.9000000 )

24456) advancedStats_VORP < 2.85 22   17.530 Not All-NBA ( 0.1363636 0.8636364 )

48912) advancedStats_VORP < 2.65 9    9.535 Not All-NBA ( 0.2222222 0.7777778 ) *

48913) advancedStats_VORP > 2.65 13    7.051 Not All-NBA ( 0.0769231 0.9230769 )

97826) advancedStats_WS < 7.85 6    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

97827) advancedStats_WS > 7.85 7    5.742 Not All-NBA ( 0.1428571 0.8571429 ) *

24457) advancedStats_VORP > 2.85 8    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

12229) advancedStats_VORP > 2.95 5    6.730 Not All-NBA ( 0.4000000 0.6000000 ) *

6115) advancedStats_VORP > 3.05 5    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

1529) advancedStats_VORP > 3.25 7    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

765) advancedStats_VORP > 3.35 8    8.997 Not All-NBA ( 0.2500000 0.7500000 ) *

383) advancedStats_VORP > 3.45 51   16.880 Not All-NBA ( 0.0392157 0.9607843 )

766) advancedStats_WS < 7.65 23   13.590 Not All-NBA ( 0.0869565 0.9130435 )

1532) advancedStats_VORP < 3.75 10    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

1533) advancedStats_VORP > 3.75 13   11.160 Not All-NBA ( 0.1538462 0.8461538 )

3066) advancedStats_VORP < 4.25 8    6.028 Not All-NBA ( 0.1250000 0.8750000 ) *

3067) advancedStats_VORP > 4.25 5    5.004 Not All-NBA ( 0.2000000 0.8000000 ) *

767) advancedStats_WS > 7.65 28    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

3) advancedStats_WS > 7.95 1139 1529.000 Not All-NBA ( 0.3959614 0.6040386 )

6) advancedStats_WS < 10.55 719  714.600 Not All-NBA ( 0.1974965 0.8025035 )

12) advancedStats_WS < 9.05 366  284.400 Not All-NBA ( 0.1311475 0.8688525 )

24) advancedStats_VORP < 4.95 353  249.500 Not All-NBA ( 0.1133144 0.8866856 )

48) advancedStats_VORP < 4.05 318  207.600 Not All-NBA ( 0.1006289 0.8993711 )

96) advancedStats_WS < 8.05 44   38.560 Not All-NBA ( 0.1590909 0.8409091 )

192) advancedStats_VORP < 2.05 11   12.890 Not All-NBA ( 0.2727273 0.7272727 ) *

193) advancedStats_VORP > 2.05 33   24.380 Not All-NBA ( 0.1212121 0.8787879 )

386) advancedStats_VORP < 3.55 25   13.940 Not All-NBA ( 0.0800000 0.9200000 )

772) advancedStats_VORP < 2.35 5    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

773) advancedStats_VORP > 2.35 20   13.000 Not All-NBA ( 0.1000000 0.9000000 )

1546) advancedStats_VORP < 2.75 8    6.028 Not All-NBA ( 0.1250000 0.8750000 ) *

1547) advancedStats_VORP > 2.75 12    6.884 Not All-NBA ( 0.0833333 0.9166667 ) *

387) advancedStats_VORP > 3.55 8    8.997 Not All-NBA ( 0.2500000 0.7500000 ) *

97) advancedStats_WS > 8.05 274  167.400 Not All-NBA ( 0.0912409 0.9087591 )

194) advancedStats_VORP < 3.95 265  165.600 Not All-NBA ( 0.0943396 0.9056604 )

388) advancedStats_VORP < 1.3 5    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

389) advancedStats_VORP > 1.3 260  164.600 Not All-NBA ( 0.0961538 0.9038462 )

778) advancedStats_WS < 8.95 233  141.100 Not All-NBA ( 0.0901288 0.9098712 )

1556) advancedStats_VORP < 2.35 49   40.190 Not All-NBA ( 0.1428571 0.8571429 )

3112) advancedStats_WS < 8.55 33   15.090 Not All-NBA ( 0.0606061 0.9393939 )

6224) advancedStats_VORP < 1.85 10    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

6225) advancedStats_VORP > 1.85 23   13.590 Not All-NBA ( 0.0869565 0.9130435 )

12450) advancedStats_VORP < 2.25 17   12.320 Not All-NBA ( 0.1176471 0.8823529 )

24900) advancedStats_VORP < 2.05 5    5.004 Not All-NBA ( 0.2000000 0.8000000 ) *

24901) advancedStats_VORP > 2.05 12    6.884 Not All-NBA ( 0.0833333 0.9166667 )

49802) advancedStats_WS < 8.15 5    5.004 Not All-NBA ( 0.2000000 0.8000000 ) *

49803) advancedStats_WS > 8.15 7    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

12451) advancedStats_VORP > 2.25 6    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

3113) advancedStats_WS > 8.55 16   19.870 Not All-NBA ( 0.3125000 0.6875000 )

6226) advancedStats_WS < 8.75 9   12.370 Not All-NBA ( 0.4444444 0.5555556 ) *

6227) advancedStats_WS > 8.75 7    5.742 Not All-NBA ( 0.1428571 0.8571429 ) *

1557) advancedStats_VORP > 2.35 184   99.030 Not All-NBA ( 0.0760870 0.9239130 )

3114) advancedStats_VORP < 2.55 25    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

3115) advancedStats_VORP > 2.55 159   94.770 Not All-NBA ( 0.0880503 0.9119497 )

6230) advancedStats_VORP < 2.65 6    7.638 Not All-NBA ( 0.3333333 0.6666667 ) *

6231) advancedStats_VORP > 2.65 153   84.130 Not All-NBA ( 0.0784314 0.9215686 )

12462) advancedStats_WS < 8.45 63   43.950 Not All-NBA ( 0.1111111 0.8888889 )

24924) advancedStats_VORP < 3.85 57   42.460 Not All-NBA ( 0.1228070 0.8771930 )

49848) advancedStats_VORP < 3.6 49   32.300 Not All-NBA ( 0.1020408 0.8979592 )

99696) advancedStats_WS < 8.35 37   20.820 Not All-NBA ( 0.0810811 0.9189189 )

199392) advancedStats_VORP < 3.15 23    8.227 Not All-NBA ( 0.0434783 0.9565217 )

398784) advancedStats_VORP < 2.85 11    6.702 Not All-NBA ( 0.0909091 0.9090909 )

797568) advancedStats_VORP < 2.75 5    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

797569) advancedStats_VORP > 2.75 6    5.407 Not All-NBA ( 0.1666667 0.8333333 ) *

398785) advancedStats_VORP > 2.85 12    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

199393) advancedStats_VORP > 3.15 14   11.480 Not All-NBA ( 0.1428571 0.8571429 )

398786) advancedStats_WS < 8.25 9    6.279 Not All-NBA ( 0.1111111 0.8888889 ) *

398787) advancedStats_WS > 8.25 5    5.004 Not All-NBA ( 0.2000000 0.8000000 ) *

99697) advancedStats_WS > 8.35 12   10.810 Not All-NBA ( 0.1666667 0.8333333 )

199394) advancedStats_VORP < 2.75 5    5.004 Not All-NBA ( 0.2000000 0.8000000 ) *

199395) advancedStats_VORP > 2.75 7    5.742 Not All-NBA ( 0.1428571 0.8571429 ) *

49849) advancedStats_VORP > 3.6 8    8.997 Not All-NBA ( 0.2500000 0.7500000 ) *

24925) advancedStats_VORP > 3.85 6    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

12463) advancedStats_WS > 8.45 90   38.620 Not All-NBA ( 0.0555556 0.9444444 )

24926) advancedStats_VORP < 2.85 16    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

24927) advancedStats_VORP > 2.85 74   36.600 Not All-NBA ( 0.0675676 0.9324324 )

49854) advancedStats_VORP < 3.85 69   30.550 Not All-NBA ( 0.0579710 0.9420290 )

99708) advancedStats_VORP < 3.45 48   27.540 Not All-NBA ( 0.0833333 0.9166667 )

199416) advancedStats_WS < 8.55 11    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

199417) advancedStats_WS > 8.55 37   25.350 Not All-NBA ( 0.1081081 0.8918919 )

398834) advancedStats_VORP < 3.15 16    7.481 Not All-NBA ( 0.0625000 0.9375000 )

797668) advancedStats_WS < 8.75 9    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

797669) advancedStats_WS > 8.75 7    5.742 Not All-NBA ( 0.1428571 0.8571429 ) *

398835) advancedStats_VORP > 3.15 21   17.220 Not All-NBA ( 0.1428571 0.8571429 )

797670) advancedStats_VORP < 3.25 7    8.376 Not All-NBA ( 0.2857143 0.7142857 ) *

797671) advancedStats_VORP > 3.25 14    7.205 Not All-NBA ( 0.0714286 0.9285714 )

1595342) advancedStats_VORP < 3.35 7    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

1595343) advancedStats_VORP > 3.35 7    5.742 Not All-NBA ( 0.1428571 0.8571429 ) *

99709) advancedStats_VORP > 3.45 21    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

49855) advancedStats_VORP > 3.85 5    5.004 Not All-NBA ( 0.2000000 0.8000000 ) *

779) advancedStats_WS > 8.95 27   22.650 Not All-NBA ( 0.1481481 0.8518519 )

1558) advancedStats_VORP < 2.6 10    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

1559) advancedStats_VORP > 2.6 17   18.550 Not All-NBA ( 0.2352941 0.7647059 )

3118) advancedStats_VORP < 3.35 9   11.460 Not All-NBA ( 0.3333333 0.6666667 ) *

3119) advancedStats_VORP > 3.35 8    6.028 Not All-NBA ( 0.1250000 0.8750000 ) *

195) advancedStats_VORP > 3.95 9    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

49) advancedStats_VORP > 4.05 35   37.630 Not All-NBA ( 0.2285714 0.7714286 )

98) advancedStats_VORP < 4.55 27   32.820 Not All-NBA ( 0.2962963 0.7037037 )

196) advancedStats_WS < 8.15 5    6.730 All-NBA ( 0.6000000 0.4000000 ) *

197) advancedStats_WS > 8.15 22   23.580 Not All-NBA ( 0.2272727 0.7727273 )

394) advancedStats_VORP < 4.25 10    6.502 Not All-NBA ( 0.1000000 0.9000000 ) *

395) advancedStats_VORP > 4.25 12   15.280 Not All-NBA ( 0.3333333 0.6666667 )

790) advancedStats_WS < 8.65 6    7.638 Not All-NBA ( 0.3333333 0.6666667 ) *

791) advancedStats_WS > 8.65 6    7.638 Not All-NBA ( 0.3333333 0.6666667 ) *

99) advancedStats_VORP > 4.55 8    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

25) advancedStats_VORP > 4.95 13   17.320 All-NBA ( 0.6153846 0.3846154 )

50) advancedStats_VORP < 5.25 7    8.376 All-NBA ( 0.7142857 0.2857143 ) *

51) advancedStats_VORP > 5.25 6    8.318 Not All-NBA ( 0.5000000 0.5000000 ) *

13) advancedStats_WS > 9.05 353  409.200 Not All-NBA ( 0.2662890 0.7337110 )

26) advancedStats_VORP < 4.45 272  299.100 Not All-NBA ( 0.2389706 0.7610294 )

52) advancedStats_WS < 10.35 247  262.000 Not All-NBA ( 0.2226721 0.7773279 )

104) advancedStats_VORP < 3.75 175  192.900 Not All-NBA ( 0.2400000 0.7600000 )

208) advancedStats_VORP < 3.65 156  163.600 Not All-NBA ( 0.2179487 0.7820513 )

416) advancedStats_VORP < 3.25 103  114.200 Not All-NBA ( 0.2427184 0.7572816 )

832) advancedStats_WS < 10.15 93  108.300 Not All-NBA ( 0.2688172 0.7311828 )

1664) advancedStats_WS < 9.45 41   40.470 Not All-NBA ( 0.1951220 0.8048780 )

3328) advancedStats_VORP < 2.65 23   28.270 Not All-NBA ( 0.3043478 0.6956522 )

6656) advancedStats_VORP < 2.35 12   10.810 Not All-NBA ( 0.1666667 0.8333333 )

13312) advancedStats_WS < 9.25 6    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

13313) advancedStats_WS > 9.25 6    7.638 Not All-NBA ( 0.3333333 0.6666667 ) *

6657) advancedStats_VORP > 2.35 11   15.160 Not All-NBA ( 0.4545455 0.5454545 )

13314) advancedStats_WS < 9.35 5    6.730 All-NBA ( 0.6000000 0.4000000 ) *

13315) advancedStats_WS > 9.35 6    7.638 Not All-NBA ( 0.3333333 0.6666667 ) *

3329) advancedStats_VORP > 2.65 18    7.724 Not All-NBA ( 0.0555556 0.9444444 )

6658) advancedStats_WS < 9.25 6    5.407 Not All-NBA ( 0.1666667 0.8333333 ) *

6659) advancedStats_WS > 9.25 12    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

1665) advancedStats_WS > 9.45 52   65.730 Not All-NBA ( 0.3269231 0.6730769 )

3330) advancedStats_VORP < 3.05 32   35.990 Not All-NBA ( 0.2500000 0.7500000 )

6660) advancedStats_VORP < 2.95 26   32.100 Not All-NBA ( 0.3076923 0.6923077 )

13320) advancedStats_WS < 9.65 7    9.561 All-NBA ( 0.5714286 0.4285714 ) *

13321) advancedStats_WS > 9.65 19   19.560 Not All-NBA ( 0.2105263 0.7894737 )

26642) advancedStats_VORP < 2.5 10    6.502 Not All-NBA ( 0.1000000 0.9000000 ) *

26643) advancedStats_VORP > 2.5 9   11.460 Not All-NBA ( 0.3333333 0.6666667 ) *

6661) advancedStats_VORP > 2.95 6    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

3331) advancedStats_VORP > 3.05 20   27.530 Not All-NBA ( 0.4500000 0.5500000 )

6662) advancedStats_VORP < 3.15 13   17.320 Not All-NBA ( 0.3846154 0.6153846 )

13324) advancedStats_WS < 9.85 7    8.376 Not All-NBA ( 0.2857143 0.7142857 ) *

13325) advancedStats_WS > 9.85 6    8.318 Not All-NBA ( 0.5000000 0.5000000 ) *

6663) advancedStats_VORP > 3.15 7    9.561 All-NBA ( 0.5714286 0.4285714 ) *

833) advancedStats_WS > 10.15 10    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

417) advancedStats_VORP > 3.25 53   48.290 Not All-NBA ( 0.1698113 0.8301887 )

834) advancedStats_VORP < 3.45 25   13.940 Not All-NBA ( 0.0800000 0.9200000 )

1668) advancedStats_WS < 9.15 6    5.407 Not All-NBA ( 0.1666667 0.8333333 ) *

1669) advancedStats_WS > 9.15 19    7.835 Not All-NBA ( 0.0526316 0.9473684 )

3338) advancedStats_WS < 9.75 13    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

3339) advancedStats_WS > 9.75 6    5.407 Not All-NBA ( 0.1666667 0.8333333 ) *

835) advancedStats_VORP > 3.45 28   31.490 Not All-NBA ( 0.2500000 0.7500000 )

1670) advancedStats_WS < 9.65 13   17.320 Not All-NBA ( 0.3846154 0.6153846 )

3340) advancedStats_WS < 9.45 6    7.638 Not All-NBA ( 0.3333333 0.6666667 ) *

3341) advancedStats_WS > 9.45 7    9.561 Not All-NBA ( 0.4285714 0.5714286 ) *

1671) advancedStats_WS > 9.65 15   11.780 Not All-NBA ( 0.1333333 0.8666667 )

3342) advancedStats_VORP < 3.55 6    5.407 Not All-NBA ( 0.1666667 0.8333333 ) *

3343) advancedStats_VORP > 3.55 9    6.279 Not All-NBA ( 0.1111111 0.8888889 ) *

209) advancedStats_VORP > 3.65 19   25.860 Not All-NBA ( 0.4210526 0.5789474 )

418) advancedStats_WS < 9.95 12   15.280 Not All-NBA ( 0.3333333 0.6666667 )

836) advancedStats_WS < 9.4 5    6.730 All-NBA ( 0.6000000 0.4000000 ) *

837) advancedStats_WS > 9.4 7    5.742 Not All-NBA ( 0.1428571 0.8571429 ) *

419) advancedStats_WS > 9.95 7    9.561 All-NBA ( 0.5714286 0.4285714 ) *

105) advancedStats_VORP > 3.75 72   68.000 Not All-NBA ( 0.1805556 0.8194444 )

210) advancedStats_WS < 10.25 63   55.130 Not All-NBA ( 0.1587302 0.8412698 )

420) advancedStats_WS < 9.25 15   17.400 Not All-NBA ( 0.2666667 0.7333333 )

840) advancedStats_WS < 9.15 9    6.279 Not All-NBA ( 0.1111111 0.8888889 ) *

841) advancedStats_WS > 9.15 6    8.318 Not All-NBA ( 0.5000000 0.5000000 ) *

421) advancedStats_WS > 9.25 48   36.170 Not All-NBA ( 0.1250000 0.8750000 )

842) advancedStats_VORP < 4.15 27   14.260 Not All-NBA ( 0.0740741 0.9259259 )

1684) advancedStats_WS < 10.05 20   13.000 Not All-NBA ( 0.1000000 0.9000000 )

3368) advancedStats_VORP < 4.05 13   11.160 Not All-NBA ( 0.1538462 0.8461538 )

6736) advancedStats_WS < 9.5 5    5.004 Not All-NBA ( 0.2000000 0.8000000 ) *

6737) advancedStats_WS > 9.5 8    6.028 Not All-NBA ( 0.1250000 0.8750000 ) *

3369) advancedStats_VORP > 4.05 7    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

1685) advancedStats_WS > 10.05 7    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

843) advancedStats_VORP > 4.15 21   20.450 Not All-NBA ( 0.1904762 0.8095238 )

1686) advancedStats_WS < 9.95 16   12.060 Not All-NBA ( 0.1250000 0.8750000 )

3372) advancedStats_WS < 9.75 6    7.638 Not All-NBA ( 0.3333333 0.6666667 ) *

3373) advancedStats_WS > 9.75 10    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

1687) advancedStats_WS > 9.95 5    6.730 Not All-NBA ( 0.4000000 0.6000000 ) *

211) advancedStats_WS > 10.25 9   11.460 Not All-NBA ( 0.3333333 0.6666667 ) *

53) advancedStats_WS > 10.35 25   33.650 Not All-NBA ( 0.4000000 0.6000000 )

106) advancedStats_VORP < 3.75 19   26.290 All-NBA ( 0.5263158 0.4736842 )

212) advancedStats_VORP < 3.25 9   11.460 All-NBA ( 0.6666667 0.3333333 ) *

213) advancedStats_VORP > 3.25 10   13.460 Not All-NBA ( 0.4000000 0.6000000 ) *

107) advancedStats_VORP > 3.75 6    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

27) advancedStats_VORP > 4.45 81  105.700 Not All-NBA ( 0.3580247 0.6419753 )

54) advancedStats_WS < 9.45 18   16.220 Not All-NBA ( 0.1666667 0.8333333 )

108) advancedStats_VORP < 5.1 12   13.500 Not All-NBA ( 0.2500000 0.7500000 )

216) advancedStats_WS < 9.25 6    7.638 Not All-NBA ( 0.3333333 0.6666667 ) *

217) advancedStats_WS > 9.25 6    5.407 Not All-NBA ( 0.1666667 0.8333333 ) *

109) advancedStats_VORP > 5.1 6    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

55) advancedStats_WS > 9.45 63   85.410 Not All-NBA ( 0.4126984 0.5873016 )

110) advancedStats_WS < 9.55 6    7.638 All-NBA ( 0.6666667 0.3333333 ) *

111) advancedStats_WS > 9.55 57   76.030 Not All-NBA ( 0.3859649 0.6140351 )

222) advancedStats_WS < 9.85 17   18.550 Not All-NBA ( 0.2352941 0.7647059 )

444) advancedStats_WS < 9.75 11   14.420 Not All-NBA ( 0.3636364 0.6363636 )

888) advancedStats_WS < 9.65 6    7.638 Not All-NBA ( 0.3333333 0.6666667 ) *

889) advancedStats_WS > 9.65 5    6.730 Not All-NBA ( 0.4000000 0.6000000 ) *

445) advancedStats_WS > 9.75 6    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

223) advancedStats_WS > 9.85 40   55.050 Not All-NBA ( 0.4500000 0.5500000 )

446) advancedStats_VORP < 4.55 5    5.004 All-NBA ( 0.8000000 0.2000000 ) *

447) advancedStats_VORP > 4.55 35   47.110 Not All-NBA ( 0.4000000 0.6000000 )

894) advancedStats_WS < 10.25 22   30.500 Not All-NBA ( 0.5000000 0.5000000 )

1788) advancedStats_VORP < 5.05 13   17.320 Not All-NBA ( 0.3846154 0.6153846 )

3576) advancedStats_VORP < 4.7 6    8.318 Not All-NBA ( 0.5000000 0.5000000 ) *

3577) advancedStats_VORP > 4.7 7    8.376 Not All-NBA ( 0.2857143 0.7142857 ) *

1789) advancedStats_VORP > 5.05 9   11.460 All-NBA ( 0.6666667 0.3333333 ) *

895) advancedStats_WS > 10.25 13   14.050 Not All-NBA ( 0.2307692 0.7692308 )

1790) advancedStats_VORP < 5.15 6    5.407 Not All-NBA ( 0.1666667 0.8333333 ) *

1791) advancedStats_VORP > 5.15 7    8.376 Not All-NBA ( 0.2857143 0.7142857 ) *

7) advancedStats_WS > 10.55 420  485.100 All-NBA ( 0.7357143 0.2642857 )

14) advancedStats_WS < 12.75 240  323.000 All-NBA ( 0.6000000 0.4000000 )

28) advancedStats_VORP < 4.45 93  128.700 Not All-NBA ( 0.4731183 0.5268817 )

56) advancedStats_WS < 11.8 84  114.700 Not All-NBA ( 0.4285714 0.5714286 )

112) advancedStats_WS < 11.65 79  108.900 Not All-NBA ( 0.4556962 0.5443038 )

224) advancedStats_WS < 10.65 9    9.535 All-NBA ( 0.7777778 0.2222222 ) *

225) advancedStats_WS > 10.65 70   94.970 Not All-NBA ( 0.4142857 0.5857143 )

450) advancedStats_VORP < 2.65 8    8.997 All-NBA ( 0.7500000 0.2500000 ) *

451) advancedStats_VORP > 2.65 62   81.770 Not All-NBA ( 0.3709677 0.6290323 )

902) advancedStats_WS < 11.15 37   45.030 Not All-NBA ( 0.2972973 0.7027027 )

1804) advancedStats_WS < 10.85 19   25.860 Not All-NBA ( 0.4210526 0.5789474 )

3608) advancedStats_WS < 10.75 8    8.997 Not All-NBA ( 0.2500000 0.7500000 ) *

3609) advancedStats_WS > 10.75 11   15.160 All-NBA ( 0.5454545 0.4545455 )

7218) advancedStats_VORP < 3.4 5    6.730 All-NBA ( 0.6000000 0.4000000 ) *

7219) advancedStats_VORP > 3.4 6    8.318 Not All-NBA ( 0.5000000 0.5000000 ) *

1805) advancedStats_WS > 10.85 18   16.220 Not All-NBA ( 0.1666667 0.8333333 )

3610) advancedStats_VORP < 3.85 9   11.460 Not All-NBA ( 0.3333333 0.6666667 ) *

3611) advancedStats_VORP > 3.85 9    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

903) advancedStats_WS > 11.15 25   34.620 Not All-NBA ( 0.4800000 0.5200000 )

1806) advancedStats_VORP < 3.85 14   18.250 Not All-NBA ( 0.3571429 0.6428571 )

3612) advancedStats_WS < 11.35 6    8.318 Not All-NBA ( 0.5000000 0.5000000 ) *

3613) advancedStats_WS > 11.35 8    8.997 Not All-NBA ( 0.2500000 0.7500000 ) *

1807) advancedStats_VORP > 3.85 11   14.420 All-NBA ( 0.6363636 0.3636364 )

3614) advancedStats_WS < 11.35 5    6.730 Not All-NBA ( 0.4000000 0.6000000 ) *

3615) advancedStats_WS > 11.35 6    5.407 All-NBA ( 0.8333333 0.1666667 ) *

113) advancedStats_WS > 11.65 5    0.000 Not All-NBA ( 0.0000000 1.0000000 ) *

57) advancedStats_WS > 11.8 9    6.279 All-NBA ( 0.8888889 0.1111111 ) *

29) advancedStats_VORP > 4.45 147  184.200 All-NBA ( 0.6802721 0.3197279 )

58) advancedStats_WS < 11.9 91  105.000 All-NBA ( 0.7362637 0.2637363 )

116) advancedStats_VORP < 5.05 42   54.750 All-NBA ( 0.6428571 0.3571429 )

232) advancedStats_VORP < 4.75 25   29.650 All-NBA ( 0.7200000 0.2800000 )

464) advancedStats_WS < 10.75 6    0.000 All-NBA ( 1.0000000 0.0000000 ) *

465) advancedStats_WS > 10.75 19   25.010 All-NBA ( 0.6315789 0.3684211 )

930) advancedStats_VORP < 4.55 6    8.318 Not All-NBA ( 0.5000000 0.5000000 ) *

931) advancedStats_VORP > 4.55 13   16.050 All-NBA ( 0.6923077 0.3076923 )

1862) advancedStats_VORP < 4.65 6    5.407 All-NBA ( 0.8333333 0.1666667 ) *

1863) advancedStats_VORP > 4.65 7    9.561 All-NBA ( 0.5714286 0.4285714 ) *

233) advancedStats_VORP > 4.75 17   23.510 All-NBA ( 0.5294118 0.4705882 )

466) advancedStats_WS < 10.85 6    7.638 Not All-NBA ( 0.3333333 0.6666667 ) *

467) advancedStats_WS > 10.85 11   14.420 All-NBA ( 0.6363636 0.3636364 )

934) advancedStats_WS < 11.05 5    5.004 All-NBA ( 0.8000000 0.2000000 ) *

935) advancedStats_WS > 11.05 6    8.318 Not All-NBA ( 0.5000000 0.5000000 ) *

117) advancedStats_VORP > 5.05 49   46.740 All-NBA ( 0.8163265 0.1836735 )

234) advancedStats_WS < 11 14    7.205 All-NBA ( 0.9285714 0.0714286 )

468) advancedStats_WS < 10.75 7    5.742 All-NBA ( 0.8571429 0.1428571 ) *

469) advancedStats_WS > 10.75 7    0.000 All-NBA ( 1.0000000 0.0000000 ) *

235) advancedStats_WS > 11 35   37.630 All-NBA ( 0.7714286 0.2285714 )

470) advancedStats_VORP < 5.45 10    6.502 All-NBA ( 0.9000000 0.1000000 )

940) advancedStats_WS < 11.55 5    0.000 All-NBA ( 1.0000000 0.0000000 ) *

941) advancedStats_WS > 11.55 5    5.004 All-NBA ( 0.8000000 0.2000000 ) *

471) advancedStats_VORP > 5.45 25   29.650 All-NBA ( 0.7200000 0.2800000 )

942) advancedStats_VORP < 5.85 10   13.860 Not All-NBA ( 0.5000000 0.5000000 ) *

943) advancedStats_VORP > 5.85 15   11.780 All-NBA ( 0.8666667 0.1333333 )

1886) advancedStats_VORP < 6.35 8    0.000 All-NBA ( 1.0000000 0.0000000 ) *

1887) advancedStats_VORP > 6.35 7    8.376 All-NBA ( 0.7142857 0.2857143 ) *

59) advancedStats_WS > 11.9 56   75.840 All-NBA ( 0.5892857 0.4107143 )

118) advancedStats_WS < 12.65 44   60.630 All-NBA ( 0.5454545 0.4545455 )

236) advancedStats_VORP < 6.25 32   44.360 Not All-NBA ( 0.5000000 0.5000000 )

472) advancedStats_WS < 12.25 17   23.030 All-NBA ( 0.5882353 0.4117647 )

944) advancedStats_VORP < 5.5 12   13.500 All-NBA ( 0.7500000 0.2500000 )

1888) advancedStats_WS < 12.15 5    6.730 All-NBA ( 0.6000000 0.4000000 ) *

1889) advancedStats_WS > 12.15 7    5.742 All-NBA ( 0.8571429 0.1428571 ) *

945) advancedStats_VORP > 5.5 5    5.004 Not All-NBA ( 0.2000000 0.8000000 ) *

473) advancedStats_WS > 12.25 15   20.190 Not All-NBA ( 0.4000000 0.6000000 )

946) advancedStats_VORP < 5.4 8    6.028 Not All-NBA ( 0.1250000 0.8750000 ) *

947) advancedStats_VORP > 5.4 7    8.376 All-NBA ( 0.7142857 0.2857143 ) *

237) advancedStats_VORP > 6.25 12   15.280 All-NBA ( 0.6666667 0.3333333 )

474) advancedStats_WS < 12.45 6    5.407 All-NBA ( 0.8333333 0.1666667 ) *

475) advancedStats_WS > 12.45 6    8.318 Not All-NBA ( 0.5000000 0.5000000 ) *

119) advancedStats_WS > 12.65 12   13.500 All-NBA ( 0.7500000 0.2500000 )

238) advancedStats_VORP < 5.15 5    5.004 All-NBA ( 0.8000000 0.2000000 ) *

239) advancedStats_VORP > 5.15 7    8.376 All-NBA ( 0.7142857 0.2857143 ) *

15) advancedStats_WS > 12.75 180  103.300 All-NBA ( 0.9166667 0.0833333 )

30) advancedStats_WS < 14.95 111   87.920 All-NBA ( 0.8648649 0.1351351 )

60) advancedStats_VORP < 4.75 13    0.000 All-NBA ( 1.0000000 0.0000000 ) *

61) advancedStats_VORP > 4.75 98   83.880 All-NBA ( 0.8469388 0.1530612 )

122) advancedStats_WS < 13.65 54   54.590 All-NBA ( 0.7962963 0.2037037 )

244) advancedStats_WS < 13.45 44   38.560 All-NBA ( 0.8409091 0.1590909 )

488) advancedStats_WS < 13.35 38   36.310 All-NBA ( 0.8157895 0.1842105 )

976) advancedStats_VORP < 6.35 25   18.350 All-NBA ( 0.8800000 0.1200000 )

1952) advancedStats_VORP < 5.55 13   14.050 All-NBA ( 0.7692308 0.2307692 )

3904) advancedStats_WS < 13.15 8   10.590 All-NBA ( 0.6250000 0.3750000 ) *

3905) advancedStats_WS > 13.15 5    0.000 All-NBA ( 1.0000000 0.0000000 ) *

1953) advancedStats_VORP > 5.55 12    0.000 All-NBA ( 1.0000000 0.0000000 ) *

977) advancedStats_VORP > 6.35 13   16.050 All-NBA ( 0.6923077 0.3076923 )

1954) advancedStats_WS < 12.95 5    5.004 All-NBA ( 0.8000000 0.2000000 ) *

1955) advancedStats_WS > 12.95 8   10.590 All-NBA ( 0.6250000 0.3750000 ) *

489) advancedStats_WS > 13.35 6    0.000 All-NBA ( 1.0000000 0.0000000 ) *

245) advancedStats_WS > 13.45 10   13.460 All-NBA ( 0.6000000 0.4000000 )

490) advancedStats_WS < 13.55 5    6.730 Not All-NBA ( 0.4000000 0.6000000 ) *

491) advancedStats_WS > 13.55 5    5.004 All-NBA ( 0.8000000 0.2000000 ) *

123) advancedStats_WS > 13.65 44   26.810 All-NBA ( 0.9090909 0.0909091 )

246) advancedStats_WS < 14.05 19    0.000 All-NBA ( 1.0000000 0.0000000 ) *

247) advancedStats_WS > 14.05 25   21.980 All-NBA ( 0.8400000 0.1600000 )

494) advancedStats_WS < 14.35 6    7.638 All-NBA ( 0.6666667 0.3333333 ) *

495) advancedStats_WS > 14.35 19   12.790 All-NBA ( 0.8947368 0.1052632 )

990) advancedStats_VORP < 7.25 13    0.000 All-NBA ( 1.0000000 0.0000000 ) *

991) advancedStats_VORP > 7.25 6    7.638 All-NBA ( 0.6666667 0.3333333 ) *

31) advancedStats_WS > 14.95 69    0.000 All-NBA ( 1.0000000 0.0000000 ) *
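The deviance column in this printout can be checked by hand. Here is a minimal Python sketch, assuming the ‘tree’ package reports the multinomial deviance, -2 × Σ n_k × log(n_k / n), for each node. The class counts are reconstructed from the printed node sizes and class proportions, and `node_deviance` is just a name made up for illustration.

```python
import math

def node_deviance(counts):
    """Multinomial deviance of one node: -2 * sum(n_k * log(n_k / n))."""
    n = sum(counts)
    # Empty classes contribute nothing (0 * log 0 is treated as 0)
    return -2.0 * sum(c * math.log(c / n) for c in counts if c > 0)

# Node 7 (WS > 10.55): 420 seasons, 73.57% All-NBA -> 309 vs 111
parent = node_deviance([309, 111])
# Its children, nodes 14 and 15
left = node_deviance([144, 96])    # 240 seasons, 60.0% All-NBA
right = node_deviance([165, 15])   # 180 seasons, 91.7% All-NBA

print(round(parent, 1), round(left, 1), round(right, 1))
# Drop in deviance gained by splitting node 7 at WS = 12.75
print(round(parent - (left + right), 1))
```

The three values line up with the 485.1, 323.0 and 103.3 printed for nodes 7, 14 and 15, and a perfectly pure node (like node 31) comes out at 0.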



Welp, there are a lot of nodes unsurprisingly. I guess the tree is basically being grown until it literally cannot get any better. It may be a good model, but unfortunately it’s likely to overfit (1000+ nodes on a simple possibly linear decision boundary…).

BUT HEY!! THERE’S VORP!!! Although I never suspected that the model was broken or anything, it warms my heart to actually see VORP on there. I guess this just means that, within the first 5 terminal nodes (as in the first model), VORP simply doesn’t matter!

Back to the huge ass tree though: it’s common to grow out the full tree and then prune it back. A little more about trees first. It’s really impossible to explore every possible tree. The number of candidate trees explodes combinatorially with every extra split we allow, and we’d have to fit and score each one to know how well it does. So in practice, trees are fit greedily and recursively for computational efficiency: the best first split is made, then the best next split is made within each resulting region, and so on. Once we have grown out a tree, let’s say the 1000+ node tree above, we essentially have a whole family of smaller sub-trees to choose from, each obtained by pruning the big one back. One common way to select the best one is to cross-validate across the candidate sizes and see which does best. The ‘tree’ library gives us a built-in function to do that!
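To make the greedy, recursive idea concrete, here is a minimal single-feature sketch in Python. It is not how the ‘tree’ package is implemented internally. It assumes deviance as the split criterion, and the WS values, labels and function names (`best_split`, `grow`) are all made up for illustration.

```python
import math

def node_deviance(labels):
    """Multinomial deviance of a node given its list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -2.0 * sum(c * math.log(c / n) for c in counts.values())

def best_split(xs, ys):
    """Try every midpoint between distinct sorted x values and keep the one
    whose two children have the lowest total deviance."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = node_deviance(left) + node_deviance(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best

def grow(xs, ys, min_size=2):
    """Split greedily, then recurse into each child until a node is pure
    or too small to split."""
    split = best_split(xs, ys)
    if len(set(ys)) == 1 or len(ys) < 2 * min_size or split is None:
        return {"n": len(ys), "labels": sorted(set(ys))}
    threshold = split[0]
    left = [(x, y) for x, y in zip(xs, ys) if x < threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x > threshold]
    return {
        "threshold": threshold,
        "left": grow([x for x, _ in left], [y for _, y in left], min_size),
        "right": grow([x for x, _ in right], [y for _, y in right], min_size),
    }

# Hypothetical win-share values and All-NBA labels
ws = [2.0, 4.0, 6.0, 8.0, 11.0, 12.0, 13.0, 15.0]
labels = ["no", "no", "no", "no", "yes", "yes", "yes", "yes"]
tree = grow(ws, labels)
print(tree)
```

Each call only ever looks one split ahead, which is why the result is fast to build but not guaranteed to be the best tree overall.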

In [10]:
%%R -o bestSize
# Run 10-fold CV on our tree model generated from 'tree'
treeModelMinDevCv = cv.tree(treeModelMinDev)

# Plot CV graph
plot(treeModelMinDevCv)

# Get best size
bestSize <- min(treeModelMinDevCv$size[which(treeModelMinDevCv$dev == min(treeModelMinDevCv$dev))])

In [11]:
# See which number of terminal nodes yields the best tree
print bestSize

[5]

In [12]:
%%R -o treeModelMinDevCvPruned
treeModelMinDevCvPruned <- prune.tree(treeModelMinDev, best = bestSize)

In [13]:
print treeModelMinDevCvPruned

node), split, n, deviance, yval, (yprob)
      * denotes terminal node

 1) root 13220 4269.0 Not All-NBA ( 0.0379728 0.9620272 )
   2) advancedStats_WS < 7.95 12081 659.5 Not All-NBA ( 0.0042215 0.9957785 )
     4) advancedStats_WS < 5.55 10613 116.5 Not All-NBA ( 0.0006596 0.9993404 ) *
     5) advancedStats_WS > 5.55 1468 395.3 Not All-NBA ( 0.0299728 0.9700272 ) *
   3) advancedStats_WS > 7.95 1139 1529.0 Not All-NBA ( 0.3959614 0.6040386 )
     6) advancedStats_WS < 10.55 719 714.6 Not All-NBA ( 0.1974965 0.8025035 ) *
     7) advancedStats_WS > 10.55 420 485.1 All-NBA ( 0.7357143 0.2642857 )
      14) advancedStats_WS < 12.75 240 323.0 All-NBA ( 0.6000000 0.4000000 ) *
      15) advancedStats_WS > 12.75 180 103.3 All-NBA ( 0.9166667 0.0833333 ) *

We end up with the default tree we started with, with no VORP (coincidentally). Let’s see how this performs in terms of obtaining a prediction and ROC curve.

In [14]:
%%R
library(ROCR)

# Use the tree library to predict probabilities
treeModelMinDevCvPrunedPred = predict(treeModelMinDevCvPruned)

# Use the ROCR library to build the ROC curve
treePredObj = prediction(as.data.frame(treeModelMinDevCvPrunedPred[,1]), ifelse(playerAggDfAllNbaAllStar['accolades_all_nba'] == 'All-NBA', TRUE, FALSE))

# Evaluate sensitivity against specificity across all cut-offs
treeRocEval = performance(treePredObj, 'sens', 'spec')
plot(treeRocEval, colorize = T)
text(
    0.2,
    0.08,
    labels = paste("AUC = ", round(performance(treePredObj, 'auc')@y.values[[1]], digits = 4), sep= ""),
    adj = 1
)

The ROC curve here looks kind of like the one yielded by logistic regression. I mean, they probably pretty much will all be the same… but this one is a bit coarser and it doesn’t seem to hug the top as much. The AUC pretty much says that it’s on par with what logistic regression was giving us. Not much difference here!

In [15]:
# Retrieve the cut-off sweep that ROCR ran behind the scenes.
# performance(obj, 'sens', 'spec') stores specificity in x.values and sensitivity in y.values.
%R cutoffs = data.frame(cut = treeRocEval@alpha.values[[1]], spec = treeRocEval@x.values[[1]], sens = treeRocEval@y.values[[1]])

# Calculate the metric sensitivity + specificity. This will help us gauge the accuracy of both classes simultaneously.
#   E.g. if we were guessing each class 100% correctly (there is a very distinct decision boundary), then we would have 1 + 1 = 2
%R cutoffs['sens_plus_spec'] = cutoffs['sens'] + cutoffs['spec']

# See the last few rows of this dataframe, where sensitivity + specificity is at its max
%R tail(cutoffs[order(cutoffs$sens_plus_spec),])

Out[15]:
  cut       spec      sens      sens_plus_spec
1 inf       1.000000  0.000000  1.000000
6 0.000660  0.000000  1.000000  1.000000
2 0.916667  0.998821  0.328685  1.327506
3 0.600000  0.991272  0.615538  1.606810
5 0.029973  0.833936  0.986056  1.819992
4 0.197497  0.945903  0.898406  1.844310

Here, we see a pretty good cutoff for All-NBA. We’re sitting at around 95% / 90%, results that logistic regression was showing us as well! Essentially, a few quick splits just on WS gets us the same results! How’s that for interpretability? Amazing.
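Because the pruned tree only has five terminal nodes, the whole cut-off sweep can be rebuilt by hand. The Python sketch below assumes a season is flagged as All-NBA when its leaf probability is at or above the cut-off. The per-leaf counts are reconstructed from the printed node sizes and class proportions, and the helper name `rates` is made up. Sensitivity here means the share of actual All-NBA seasons that get flagged, and specificity the share of non-All-NBA seasons that are correctly left alone.

```python
# Terminal nodes of the pruned tree: (seasons in node, All-NBA seasons in node).
# Counts are reconstructed from the printed n and class proportions.
leaves = [
    (10613, 7),    # WS < 5.55
    (1468, 44),    # 5.55 < WS < 7.95
    (719, 142),    # 7.95 < WS < 10.55
    (240, 144),    # 10.55 < WS < 12.75
    (180, 165),    # WS > 12.75
]

total_pos = sum(pos for _, pos in leaves)
total_neg = sum(n - pos for n, pos in leaves)

def rates(cutoff):
    """Sensitivity and specificity when leaves with P(All-NBA) >= cutoff are flagged."""
    flagged = [(n, pos) for n, pos in leaves if pos / n >= cutoff]
    true_pos = sum(pos for _, pos in flagged)
    false_pos = sum(n - pos for n, pos in flagged)
    return true_pos / total_pos, 1 - false_pos / total_neg

# Use each leaf's own probability as a candidate cut-off and keep the best sum
best_sum, best_cut = max((sum(rates(pos / n)), pos / n) for n, pos in leaves)
sens, spec = rates(best_cut)
print(round(best_cut, 6), round(sens, 6), round(spec, 6), round(best_sum, 6))
```

The winning cut-off is the 0.1975 leaf, i.e. “flag everything with WS > 7.95”, which gives the same pair of rates and the same 1.84431 total as the best row of the table.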

Let’s just try out another decision tree package. There seems to be a pretty nice visual package under the ‘party’ library. From what I’ve seen of it, it’s got a more intuitive decision tree graph where you can see the node sizes and node purity without printing out the text rendition of the tree. Let’s give it a shot.

In [16]:
%%R -w 1500 -u px
library('party')

# Build tree using the ctree tool within the party package
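ctreeModel = ctree(
    # Formula assumed here: All-NBA status against the two features used throughout this post
    accolades_all_nba ~ advancedStats_WS + advancedStats_VORP,
    data = playerAggDfAllNbaAllStar
)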

# Plot tree
plot(ctreeModel)


Alright, well this default tree is much different than our other default tree (using default model parameters) generated by the ‘tree’ package. First of all, VORP is the first split! Our other one didn’t even have VORP, period! Not only that, but there are many more nodes as well.

The visuals here are definitely much better: we see the nodes clearly labelled, the decision rules clearly labelled, and we immediately get a sense of node purity.

Other than nodes 15 and 20, where we only have a total of ~200 samples out of 13220 (~1% of all samples), all other terminal nodes are pretty pure! The following 2 scenarios lead to successful all-NBA predictions:

• VORP > 4.4, WS > 10.5
• VORP <= 4.4, WS > 11.7

In fact, that first rule is honestly a lot like the manual box boundary that we chose in post #19! We arbitrarily chose WS > 5 and VORP > 10 then, not too far off from what we’re getting now!
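Written out as code, the two All-NBA paths amount to a tiny rule. This is a Python sketch using only the thresholds quoted above as read off the plot. The function name is made up, the sample inputs are hypothetical, and the tree’s other (non-All-NBA) paths are simply lumped into `False`.

```python
def ctree_predicts_all_nba(vorp, ws):
    """The two All-NBA paths read off the ctree plot."""
    if vorp > 4.4:
        return ws > 10.5   # high-VORP branch needs WS > 10.5
    return ws > 11.7       # lower-VORP branch needs a higher WS

# Hypothetical player-seasons
print(ctree_predicts_all_nba(vorp=5.0, ws=11.0))   # high VORP, WS above 10.5
print(ctree_predicts_all_nba(vorp=4.0, ws=11.0))   # lower VORP, WS short of 11.7
```

A lower-VORP season can still get in, but it has to make up for it with more win shares.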

Let’s check the ROC on this real quick.

In [17]:
%%R
# Use the party library to predict class probabilities
cTreeModelPred = predict(ctreeModel, type = 'prob')

# Predict on ctree returns a list of vectors instead of a data frame, so we convert to a data frame here
cTreeModelPredDf = do.call(rbind, cTreeModelPred)

# Use the ROCR library to build the ROC curve
cTreePredObj = prediction(as.data.frame(cTreeModelPredDf[,1]), ifelse(playerAggDfAllNbaAllStar['accolades_all_nba'] == 'All-NBA', TRUE, FALSE))

# Evaluate sensitivity against specificity across all cut-offs
cTreeRocEval = performance(cTreePredObj, 'sens', 'spec')
plot(cTreeRocEval, colorize = T)
text(
    0.2,
    0.08,
    labels = paste("AUC = ", round(performance(cTreePredObj, 'auc')@y.values[[1]], digits = 4), sep= ""),
    adj = 1
)

In [18]:
# Retrieve the cut-off sweep that ROCR ran behind the scenes.
# performance(obj, 'sens', 'spec') stores specificity in x.values and sensitivity in y.values.
%R cutoffs = data.frame(cut = cTreeRocEval@alpha.values[[1]], spec = cTreeRocEval@x.values[[1]], sens = cTreeRocEval@y.values[[1]])

# Calculate the metric sensitivity + specificity. This will help us gauge the accuracy of both classes simultaneously.
#   E.g. if we were guessing each class 100% correctly (there is a very distinct decision boundary), then we would have 1 + 1 = 2
%R cutoffs['sens_plus_spec'] = cutoffs['sens'] + cutoffs['spec']

# See the last few rows of this dataframe, where sensitivity + specificity is at its max
%R tail(cutoffs[order(cutoffs$sens_plus_spec),])

Out[18]:
   cut       spec      sens      sens_plus_spec
12 0.004255  0.741076  0.996016  1.737092
7  0.222672  0.970436  0.818725  1.789161
8  0.103448  0.968391  0.824701  1.793092
11 0.015152  0.833071  0.986056  1.819127
9  0.091935  0.924123  0.938247  1.862370
10 0.063492  0.914845  0.954183  1.869028

Alright, that’s definitely the highest AUC we’ve seen, no doubt. Checking back to what we had with logistic regression, these prediction splits are better as well! Logistic regression gave us ~96.0% / ~89.0%, and here with the ctree library we are seeing ~95.5% / ~91.5%. Absolutely the best we’ve seen thus far!

One thing that I would like to mention at this point is that we could probably keep growing the tree and get better and better prediction results on our data. The one we’ve built seems pretty reasonable in terms of the balance between accuracy and interpretability: it’s only got a handful of terminal nodes, nothing too overwhelming. However, we’ve been measuring results for ctree on the same data that we trained on, so these numbers are likely optimistic. I’m sure if we kept growing the tree, it would find ways to yield even better in-sample results. In the tree package, we had a cross-validation option where we pruned the tree back. I wonder if ctree has anything like that.

—10 minutes later…—

So apparently we don’t even need to prune or cross-validate trees generated by ctree. As the party documentation states:

The implementation utilizes a unified framework for conditional inference, or permutation tests,
developed by Strasser and Weber (1999). The stop criterion in step 1) is either based on multiplicity
adjusted p-values (testtype = “Bonferroni” in ctree_control) or on the univariate
p-values (testtype = “Univariate”). In both cases, the criterion is maximized, i.e., 1 – p-value
is used. A split is implemented when the criterion exceeds the value given by mincriterion as
specified in ctree_control. For example, when mincriterion = 0.95, the p-value must be
smaller than 0.05 in order to split this node. This statistical approach ensures that the right-sized
tree is grown without additional (post-)pruning or cross-validation.
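The stopping rule in that passage can be illustrated with a brute-force permutation test. To be clear, this Python sketch shows only the general idea and is not the party implementation: the test statistic (absolute difference in class means), the Monte Carlo shuffling, the data and the function names are all assumptions made for illustration. Only the rule “split when 1 - p-value exceeds `mincriterion`” comes from the quoted documentation.

```python
import random

def permutation_p_value(xs, labels, n_perm=2000, seed=0):
    """Monte Carlo p-value for the null 'xs is unrelated to labels'.

    Statistic: absolute difference in mean x between the two classes.
    """
    def stat(lbls):
        pos = [x for x, flag in zip(xs, lbls) if flag]
        neg = [x for x, flag in zip(xs, lbls) if not flag]
        return abs(sum(pos) / len(pos) - sum(neg) / len(neg))

    rng = random.Random(seed)
    observed = stat(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real link between x and the label
        if stat(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero
    return (hits + 1) / (n_perm + 1)

def should_split(p_value, mincriterion=0.95):
    """Stop rule from the quote: split only when 1 - p exceeds mincriterion."""
    return 1 - p_value > mincriterion

# Hypothetical data: one feature that separates the classes, one that does not
all_nba = [False] * 5 + [True] * 5
ws = [3.1, 4.0, 5.2, 6.3, 7.1, 10.8, 11.5, 12.2, 13.0, 14.1]
noise = [1, 10, 2, 9, 3, 8, 4, 7, 5, 6]

p_ws = permutation_p_value(ws, all_nba)
p_noise = permutation_p_value(noise, all_nba)
print(should_split(p_ws), should_split(p_noise))
```

A feature that clears the bar gets split on; once no feature does, the node stops growing, which is why no pruning pass is needed afterwards.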

I won’t pretend that I quite know how splitting by “conditional inference” works, but essentially the split process is based on a statistical inference approach that tests each feature against a null hypothesis of no association with the response, rather than directly measuring misclassification error, Gini index, cross-entropy, etc. I guess that does it for my tree post. Again, these things are amazing!