Decision Tree Interactive Demo

This demo uses the UCI Adult Dataset (also known as the "Census Income" dataset), extracted from the 1994 U.S. Census by Ronny Kohavi and Barry Becker. It is a classic binary classification benchmark where the goal is to predict whether an individual's annual income exceeds $50,000 based on demographic attributes.

For this interactive demonstration, we use a subset of 2,000 training samples and 1,000 test samples, with a selection of features including age, education level, hours worked per week, capital gains, marital status, occupation, and sex.
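
The demo's exact data pipeline is not shown here, but a roughly comparable subset can be assembled with scikit-learn. In the sketch below, the OpenML dataset name "adult" and the column names are assumptions; only the 2,000/1,000 sample split and the feature list come from the description above.

```python
# Minimal sketch of assembling a comparable Adult subset (not the demo's actual pipeline).
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

adult = fetch_openml("adult", version=2, as_frame=True)   # UCI Adult via OpenML (assumed source)
features = ["age", "education-num", "hours-per-week", "capital-gain",
            "marital-status", "occupation", "sex"]         # column names may differ by version
X = pd.get_dummies(adult.data[features])   # one-hot encode the categorical columns
y = adult.target                           # ">50K" vs. "<=50K"

# Draw 2,000 training and 1,000 test samples, as in the demo.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=2000, test_size=1000, stratify=y, random_state=0
)
```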

TREE STRUCTURE

[Interactive tree diagram: the root node holds 100% of the samples; node color indicates whether a node predicts >50K or ≤50K, with "Yes" branches to the left and "No" branches to the right.]

Feature Importance

[Feature importance chart; build a tree to populate it.]

BUILD YOUR TREE

[Metrics panel: Accuracy, Precision, Recall, and F1, displayed once you build a tree.]

YOUR TREE STRUCTURE

[Interactive tree diagram for the tree you build, using the same layout and legend as above.]

Feature Distribution

[Distribution chart for a feature selected in your tree; add a split node to see it.]

Understanding Split Criteria

Decision trees use impurity measures to determine the best feature and threshold for splitting data. The goal is to create child nodes that are as "pure" as possible (containing mostly one class).

Gini Impurity

Measures the probability of incorrectly classifying a randomly chosen element if it were randomly labeled according to the distribution of labels in the node.

$$\text{Gini}(D) = 1 - \sum_{i=1}^{C} p_i^2$$

Where \(p_i\) is the proportion of samples belonging to class \(i\) in dataset \(D\), and \(C\) is the number of classes.

Example: For a node with 70% class A and 30% class B:
$$\text{Gini} = 1 - (0.7^2 + 0.3^2) = 1 - (0.49 + 0.09) = 0.42$$
  • Range: 0 (pure) to 0.5 (maximum impurity for binary classification)
  • Computationally efficient (no logarithms)
  • Default in scikit-learn's DecisionTreeClassifier
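
As a quick sanity check of the worked example above, here is a minimal Gini computation in plain Python:

```python
def gini(proportions):
    """Gini impurity for a node, given its class proportions p_i."""
    return 1.0 - sum(p ** 2 for p in proportions)

print(gini([0.7, 0.3]))  # ≈ 0.42, matching the example
```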

Entropy (Information Gain)

Based on information theory, entropy measures the average amount of information needed to identify the class of an element. Information gain is the reduction in entropy after a split.

$$\text{Entropy}(D) = -\sum_{i=1}^{C} p_i \log_2(p_i)$$

The information gain from splitting on feature \(A\) is:

$$\text{Gain}(D, A) = \text{Entropy}(D) - \sum_{v \in \text{values}(A)} \frac{|D_v|}{|D|} \text{Entropy}(D_v)$$

Example: For a node with 70% class A and 30% class B:
$$\text{Entropy} = -(0.7 \log_2 0.7 + 0.3 \log_2 0.3) \approx 0.88 \text{ bits}$$
  • Range: 0 (pure) to \(\log_2(C)\) (for \(C\) classes with equal distribution)
  • Has roots in information theory (Shannon entropy)
  • Used in the classic ID3 and C4.5 algorithms
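
A matching sketch for entropy and information gain; the split below is arbitrary and only meant to exercise the formulas:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Reduction in entropy from splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

node = ["A"] * 7 + ["B"] * 3                       # 70% class A, 30% class B
print(round(entropy(node), 2))                     # 0.88 bits, matching the example
print(information_gain(node, node[:5], node[5:]))  # gain for one arbitrary split
```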

Choosing a Criterion

In practice, Gini impurity and entropy often produce similar trees. Key considerations:

  • Speed: Gini is slightly faster (no log computation)
  • Tendency: Entropy may create slightly more balanced trees
  • Multi-class: Both work well, but entropy's range scales with the number of classes
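
In scikit-learn the criterion is a single constructor argument, so the two can be compared directly on the same data. The sketch below assumes the X_train/y_train variables from the earlier loading example; the max_depth value is chosen purely for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

# Assumes X_train, X_test, y_train, y_test from the earlier loading sketch.
for criterion in ("gini", "entropy"):
    clf = DecisionTreeClassifier(criterion=criterion, max_depth=4, random_state=0)
    clf.fit(X_train, y_train)
    print(criterion, round(clf.score(X_test, y_test), 3))
```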

The weighted impurity decrease for a split is calculated as:

$$\Delta \text{Impurity} = \text{Impurity}(D) - \frac{n_L}{n} \text{Impurity}(D_L) - \frac{n_R}{n} \text{Impurity}(D_R)$$

Where \(D_L\) and \(D_R\) are the left and right child datasets, \(n_L\) and \(n_R\) are their sample counts, and \(n\) is the number of samples in the parent node.
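
The same quantity in code; this reuses the gini helper from the earlier sketch, but any impurity function over class proportions could be plugged in:

```python
from collections import Counter

def proportions(labels):
    """Class proportions p_i for a list of labels."""
    n = len(labels)
    return [c / n for c in Counter(labels).values()]

def impurity_decrease(parent, left, right, impurity):
    """Weighted impurity decrease from splitting `parent` into `left` and `right`."""
    n = len(parent)
    return (impurity(proportions(parent))
            - (len(left) / n) * impurity(proportions(left))
            - (len(right) / n) * impurity(proportions(right)))

# 70/30 parent split into a pure left child and a mixed right child.
parent = ["A"] * 7 + ["B"] * 3
print(impurity_decrease(parent, parent[:5], parent[5:], gini))  # assumes gini() from above
```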