
entropy

Last updated Nov 26, 2022

In statistics, entropy is a measure of information; in decision trees, it quantifies the impurity of a node.

Let’s assume that a dataset $T$ associated with a node contains examples from $n$ classes. Then, its entropy is:

$$entropy(T) = -\sum_{j=1}^{n} p_j \log_2 p_j$$

where $p_j$ is the relative frequency of class $j$ in $T$.

As is the case with the gini-impurity-index, a node is pure when $entropy(T)$ takes its minimum value, zero, and maximally impure when it takes its highest value, $\log_2 n$ (which equals 1 when there are two classes).

# Example
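As a minimal sketch of the definition above (the class counts are illustrative, not taken from a source dataset), the following Python snippet computes a node’s entropy from its class labels:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a node, in bits, from its list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    # entropy(T) = -sum over classes j of p_j * log2(p_j)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

# A pure node (a single class) has the minimum entropy, 0.
print(entropy(["yes"] * 8))                   # 0.0

# A 50/50 binary node has the maximum binary entropy, 1 bit.
print(entropy(["yes"] * 4 + ["no"] * 4))      # 1.0

# A mixed node with 9 "yes" and 5 "no" examples.
print(entropy(["yes"] * 9 + ["no"] * 5))      # ~0.940
```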

# Information Gain

The information gain is the difference between a parent node’s entropy and the weighted sum of its child node entropies.

Let’s assume a dataset $T$ with $N$ objects is partitioned into two datasets, $T_1$ and $T_2$, of sizes $N_1$ and $N_2$. Then, the split’s Information Gain ($Gain_{split}$) is:

$$Gain_{split} = entropy(T) - \frac{N_1}{N}\, entropy(T_1) - \frac{N_2}{N}\, entropy(T_2)$$

In general, if splitting $T$ into $m$ subsets $T_1, T_2, \ldots, T_m$ with $N_1, N_2, \ldots, N_m$ objects, respectively, the split’s Information Gain ($Gain_{split}$) is:

$$Gain_{split} = entropy(T) - \sum_{i=1}^{m} \frac{N_i}{N}\, entropy(T_i)$$
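As a minimal sketch building on the `entropy` function from the earlier snippet (the split below is illustrative), this computes the gain of a two-way split:

```python
def information_gain(parent_labels, children_labels):
    """Gain_split: parent entropy minus the weighted sum of child entropies."""
    n = len(parent_labels)
    weighted = sum((len(c) / n) * entropy(c) for c in children_labels)
    return entropy(parent_labels) - weighted

parent = ["yes"] * 9 + ["no"] * 5               # entropy ~0.940
left = ["yes"] * 6 + ["no"] * 2                 # entropy ~0.811, weight 8/14
right = ["yes"] * 3 + ["no"] * 3                # entropy 1.0,    weight 6/14
print(information_gain(parent, [left, right]))  # ~0.048
```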

# Example Splitting by Information Gain

https://www.baeldung.com/cs/impurity-entropy-gini-index#2-example-splitting-by-information-gain

Steps:

1. Compute the entropy of the parent node.
2. For each candidate split, compute the entropy of every child node it produces.
3. Take the weighted sum of the child entropies, weighting each child by its fraction of the parent’s objects.
4. The split’s information gain is the parent entropy minus that weighted sum; pick the split with the highest gain (see the sketch below).
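A minimal sketch of those steps, reusing `entropy` and `information_gain` from the snippets above (the candidate splits are hypothetical, not the ones from the linked article):

```python
# Two hypothetical candidate splits of the same parent node.
candidates = {
    "attribute_A": [["yes"] * 6 + ["no"] * 2, ["yes"] * 3 + ["no"] * 3],
    "attribute_B": [["yes"] * 9, ["no"] * 5],  # separates the classes perfectly
}

for name, children in candidates.items():
    print(name, round(information_gain(parent, children), 3))
# attribute_A 0.048
# attribute_B 0.94

best = max(candidates, key=lambda name: information_gain(parent, candidates[name]))
print("best split:", best)                     # best split: attribute_B
```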

# References

- Baeldung CS, “Gini Impurity and Entropy in Decision Trees”: https://www.baeldung.com/cs/impurity-entropy-gini-index