Inequality Measures
Introduction
It is important to understand how inequality measures can be used to study the spread of a disease over a territory, the public…
Gini index
Entropy
Idea
Entropy is a concept from Shannon that offers a perspective… it comes from information theory…
\(p_{i}\) is the probability of occurrence of event \(i\), and \(h(p_{i})\) is the value of learning that event \(i\) has occurred before anyone else knows it. There is little value in knowing in advance that a likely event, one that everyone expects, will occur.
If we have \(N\) events, the entropy is the expected value of the information: \(H = \sum_{i=1}^{N} p_{i} h(p_{i})\)
The entropy reaches its maximum value when all the events have the same probability.
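A minimal sketch of this property, assuming the information value is \(h(p) = -\ln(p)\). The function name `entropy` and the two example distributions are illustrative choices, not part of the original notes.

```python
import math

def entropy(probabilities):
    """Shannon entropy in nats, using h(p) = -ln(p) as the information value."""
    # Zero-probability events contribute nothing (p * ln(p) -> 0 as p -> 0).
    return -sum(p * math.log(p) for p in probabilities if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]   # illustrative values
skewed = [0.70, 0.10, 0.10, 0.10]    # illustrative values

print(entropy(uniform))  # equals ln(4), the maximum for four events
print(entropy(skewed))   # strictly smaller than ln(4)
```

With four events, no distribution can exceed \(\ln(4)\); any departure from equal probabilities lowers the value.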
# Bag model
We can think of \(p_{i}\) as the probability that one 'peso', drawn from a bag of 'pesos', belongs to person or unit \(i\).
The Theil index is the difference between the maximum possible entropy and the actual entropy.
If we assume that \(h()\) is \(-\ln()\), we can take advantage of some of its properties: remember that a decreasing function is what we want, and we can use some logarithmic properties…
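Applying this to the equations we have:
\(T = H_{max} - H = \sum_{i=1}^{N} \frac{1}{N} h\left(\frac{1}{N}\right) - \sum_{i=1}^{N} p_{i} h(p_{i})\)
Changing the value function we have:
\(T = \sum_{i=1}^{N} \frac{1}{N} \ln(N) + \sum_{i=1}^{N} p_{i} \ln(p_{i}) = \ln(N) + \sum_{i=1}^{N} p_{i} \ln(p_{i})\)
Applying logarithmic properties, and using \(\sum_{i} p_{i} = 1\) to write \(\ln(N) = \sum_{i} p_{i} \ln(N)\), we have:
\(T = \sum_{i=1}^{N} p_{i} \left(\ln(N) + \ln(p_{i})\right)\)
Therefore:
\(T = \sum_{i=1}^{N} p_{i} \ln(N p_{i})\)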
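The result of this derivation, that the maximum entropy \(\ln(N)\) minus the actual entropy equals \(\sum_{i} p_{i} \ln(N p_{i})\), can be checked numerically. The sketch below assumes \(h(p) = -\ln(p)\); the function names and the example shares are hypothetical.

```python
import math

def theil_from_entropy(shares):
    """Maximum entropy ln(N) minus the actual entropy of the shares."""
    n = len(shares)
    actual = -sum(p * math.log(p) for p in shares if p > 0)
    return math.log(n) - actual

def theil_direct(shares):
    """Closed form: sum of p_i * ln(N * p_i)."""
    n = len(shares)
    return sum(p * math.log(n * p) for p in shares if p > 0)

shares = [0.5, 0.3, 0.2]  # illustrative shares of the total, summing to 1
print(theil_from_entropy(shares), theil_direct(shares))
```

Both functions should print the same value up to floating-point rounding, and both return zero when every share equals \(1/N\).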
# Theil index restricted to \([0, 1]\)
Another point of view, from Sen:
Remember that \(\sum_{i=1}^{n} x_{i} = 1\) and \(h(x) = \ln(1/x)\): the more improbable \(x\) is, the more value it has. The entropy is \(H(x) = \sum_{i} x_{i} \ln\left(\frac{1}{x_{i}}\right)\); therefore, with \(x_{i} = \frac{1}{n}\), \(H(x) = \ln(n)\). Remember that we can write \(\ln(n) = \sum_{i} x_{i} \ln(n)\), thus the Theil index is \(T = \ln(n) - H(x) = \sum_{i} x_{i} \ln(n x_{i})\)
We can bound the Theil index: \(0 \le \ln(n) - H(x) \le \ln(n)\)
If income is evenly distributed then \(H(x) = \ln(n)\). If one person concentrates all the income then \(H(x) = 0\), because \(1 \cdot \ln(1/1) = 0\).
Therefore, after dividing by \(\ln(n)\), our index will be \(1\) for perfect inequality and zero for a uniform distribution.
code:
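import math

def theil(array):
    # Normalised Theil index: 0 for equal shares, 1 when one unit holds everything.
    Y = sum(array)
    N = len(array)
    relative = [y / Y for y in array]
    # Skip zero shares: y * log(y) tends to 0 as y -> 0, and log(0) would raise.
    Theil = sum(y * math.log(y * N) for y in relative if y > 0) / math.log(N)
    return Theil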
Theil in pandas Theil pandas
# Discrete Gini
Now we can see that
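The Gini index can be calculated with the trapezoid rule over the Lorenz curve, where \(X_{i}\) is the cumulative share of the population and \(Y_{i}\) is the cumulative share of income: \(G = 1 - \sum_{i=1}^{n} (X_{i} - X_{i-1})(Y_{i} + Y_{i-1})\)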
code:
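def gini(array):
    def cumsum(array):
        # Cumulative shares of the total, starting at 0 (Lorenz curve ordinates).
        output = [0]
        counter = 0
        total = sum(array)
        for e in array:
            counter = counter + e
            output.append(counter / total)
        return output

    # Sort a copy so the caller's list is not modified.
    array = sorted(array)
    # Cumulative population shares: 0, 1/n, ..., 1.
    n = [i / len(array) for i in range(0, len(array) + 1)]
    acum_array = cumsum(array)
    area = 0
    # Trapezoid rule: this sum is twice the area under the Lorenz curve.
    for i in range(1, len(acum_array)):
        area = area + (acum_array[i] + acum_array[i - 1]) * (n[i] - n[i - 1])
    return 1 - area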
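As a sanity check on the trapezoid approach, a compact variant can be run on two cases with known answers. Equal incomes should give 0, and when one unit out of \(n\) holds everything the discrete value is \(1 - 1/n\). The name `gini_trapezoid` and the example inputs are hypothetical; this is a sketch, not a replacement for the function above.

```python
def gini_trapezoid(values):
    """Gini index as 1 minus twice the trapezoid area under the Lorenz curve."""
    ordered = sorted(values)          # sorted copy, the input is left untouched
    n = len(ordered)
    total = sum(ordered)
    area = 0.0
    previous = 0.0                    # cumulative income share at the previous step
    running = 0.0
    for value in ordered:
        running += value
        current = running / total
        area += (current + previous) / n   # each population step has width 1/n
        previous = current
    return 1 - area

print(gini_trapezoid([1, 1, 1, 1]))   # equal incomes
print(gini_trapezoid([0, 0, 0, 1]))   # one unit holds everything
```

The second case shows why the discrete index only approaches 1 as \(n\) grows: with four units the maximum is 0.75.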
Theil index
The Theil index was defined by the economist Theil and is based on information theory…
This lecture is very important for constraining the Gini index to a value.
The Theil index will be defined as:
The equations are important: \(h(X)=\sum_{i=1}^{N}x\)
This is an excellent rendering; however, it is never enough…
Hamming distance
Hamming is an important distance.
Sometimes we need to establish…
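For reference, the Hamming distance between two equal-length sequences is the number of positions at which they differ. A minimal sketch, with a hypothetical function name and illustrative inputs:

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

print(hamming_distance("karolin", "kathrin"))        # 3
print(hamming_distance([1, 0, 1, 1], [1, 1, 1, 0]))  # 2
```

The length check is a design choice: the distance is only defined for sequences of the same length, so the sketch raises an error instead of silently truncating.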
K-means
The idea of similarity has also been linked to distance. Put simply, deciding how to measure distance is a real problem, given that there are so many distances to choose from…
Euclidean
Manhattan
Another distance
We can test that there are a lot of distances.
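To make the difference between two of the distances named above concrete, here is a small sketch of the Euclidean and Manhattan distances. The function names and the example points are illustrative assumptions.

```python
import math

def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """City-block distance: sum of the absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

p, q = (0, 0), (3, 4)   # illustrative points
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7
```

The same pair of points is 5 apart under one distance and 7 apart under the other, which is why the choice of distance matters for a method such as K-means.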