Computing the Entropy (H-Function)
Assume we have m classes in our clustering problem, and the vector (p1,…,pm) represents the proportions of the examples belonging to the m classes in a cluster. A clustering algorithm subdivides the set of examples D = (p1,…,pm) into k subsets D1 = (p11,…,p1m),…, Dk = (pk1,…,pkm). H is defined as follows:
H(D=(p_1,\dots,p_m)) = \sum_{i=1}^{m} p_i \log_2(1/p_i)   (called the entropy function)
Remarks:
- |D| denotes the number of elements in set D.
- When computing H, we assume that if p_i = 0, the term p_i \log_2(1/p_i) is 0.
- For example, H((1/2,1/4,1/8,1/8,0)) = 1/2*log2(2) + 1/4*log2(4) + 2*(1/8)*log2(8) + 0 = 1/2 + 1/2 + 2*(1/8)*3 = 1.75.
- D = (p1,…,pm) implies that p1+…+pm = 1 and indicates that of the |D| examples, p1*|D| examples belong to the first class, p2*|D| examples belong to the second class, …, and pm*|D| belong to the m-th (last) class.
- H((0,1)) = H((1,0)) = 0; H((1/2,1/2)) = 1; H((1/4,1/4,1/4,1/4)) = 2; H((1/p,…,1/p)) = log2(p).
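A minimal Python sketch of the H-function (the name `entropy` and the list-of-proportions interface are our own choices, not from the slides); skipping zero proportions implements the p_i = 0 convention above:

```python
import math

def entropy(proportions):
    """H(D) = sum over classes of p_i * log2(1/p_i); terms with p_i = 0 contribute 0."""
    return sum(p * math.log2(1.0 / p) for p in proportions if p > 0)

# Worked examples from the remarks above:
print(entropy([1/2, 1/4, 1/8, 1/8, 0]))  # 1.75
print(entropy([1/2, 1/2]))               # 1.0
print(entropy([1/4, 1/4, 1/4, 1/4]))     # 2.0
```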
Entropy: Entropy has often been loosely associated with the amount of order or disorder, or of chaos, in a thermodynamic system. (https://en.wikipedia.org/wiki/Thermodynamic_system)
Entropy of a Clustering X
Assume we have m classes in our clustering problem; for each cluster C_i we have proportions p_i = (p_i1,…,p_im) of examples belonging to the m different classes (for cluster numbers i = 1,…,k). The entropy of a clustering X is the size-weighted sum of the entropies of the individual clusters:
H(X) = \sum_{r=1}^{k} (|C_r| / \sum_{p=1}^{k} |C_p|) \cdot H(p_r)
Remarks:
- In the above formula, "|…|" denotes the set-cardinality function; e.g. |{2,3,5}| = 3 and |∅| = 0!
- Moreover, we assume that X = {C1,…,Ck} is a clustering with k clusters C1,…,Ck.
- In R, cluster 0 is assumed to contain outliers; it is ignored when computing H(X).
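A companion Python sketch for H(X) (the helper name `clustering_entropy` and the dict-based interface are assumptions); clusters are given as a dict from cluster number to the class labels of its members, and cluster 0 is skipped, following the R outlier convention mentioned above. It reuses the `entropy` function from the earlier sketch:

```python
from collections import Counter

def clustering_entropy(clusters):
    """H(X): size-weighted sum of the per-cluster entropies."""
    # Ignore cluster 0 (outliers, per the R convention above).
    kept = {c: members for c, members in clusters.items() if c != 0}
    n = sum(len(members) for members in kept.values())
    h = 0.0
    for members in kept.values():
        counts = Counter(members)  # class label -> count within the cluster
        proportions = [cnt / len(members) for cnt in counts.values()]
        h += (len(members) / n) * entropy(proportions)
    return h
```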
Example: Entropy of a Clustering X
x, o, and * represent examples belonging to 3 classes C1, C2, and C3.
[Figure: clustering X with Cluster 1, Cluster 2, Cluster 3, and Cluster 0 (really outliers!)]
H(X) = 3/10 * H((1/3,1/3,1/3)) + 4/10 * H((1/2,1/2,0)) + 3/10 * H((0,1,0)) = 3/10 * log2(3) + 4/10 * 1 + 3/10 * 0 ≈ 0.8754888
H(X) = \sum_{r=1}^{k} (|C_r| / \sum_{p=1}^{k} |C_p|) \cdot H(p_r)
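Checking this computation with the sketch from earlier (the exact members of cluster 0 are not given in the figure; placeholders are used, since that cluster is ignored anyway):

```python
X = {
    0: ["x", "o"],            # outliers, ignored by clustering_entropy
    1: ["x", "o", "*"],       # proportions (1/3, 1/3, 1/3)
    2: ["x", "x", "o", "o"],  # proportions (1/2, 1/2, 0)
    3: ["o", "o", "o"],       # proportions (0, 1, 0)
}
print(clustering_entropy(X))  # ≈ 0.8754888
```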
Example: Variance of a Clustering X
x, o, and * represent examples belonging to 3 classes C1, C2, and C3.
[Figure: clustering X with Cluster 1, Cluster 2, Cluster 3, and Cluster 0 (really outliers!)]
Var(X) = 3/8 * Var((3,3,3)) + 1/8 * Var((4)) + 4/8 * Var((4,5,6,7)) = 0 + 0 + 0.5 * 1.667 = 5/6 ≈ 0.833
Var(X) = \sum_{r=1}^{k} (|C_r| / \sum_{p=1}^{k} |C_p|) \cdot Var(C_r)
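An analogous Python sketch for Var(X) (the name `clustering_variance` is assumed, as is which values belong to which cluster number). Sample variance (dividing by n-1) reproduces the 1.667 in the example, single-element clusters are given variance 0, and cluster 0 would be skipped as before:

```python
def clustering_variance(clusters):
    """Var(X): size-weighted sum of the per-cluster sample variances."""
    kept = {c: values for c, values in clusters.items() if c != 0}  # skip outlier cluster 0
    n = sum(len(values) for values in kept.values())
    total = 0.0
    for values in kept.values():
        if len(values) < 2:
            continue  # variance of a single-element cluster is taken as 0
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)  # sample variance
        total += (len(values) / n) * var
    return total

print(clustering_variance({1: [3, 3, 3], 2: [4], 3: [4, 5, 6, 7]}))  # 5/6 ≈ 0.833
```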
1. Randomized Hill Climbing
[Figure: sampling in the neighborhood of the current solution]
Randomized Hill Climbing: Sample p points randomly in the neighborhood of the currently best solution; determine the best of the p sampled points. If it is better than the current solution, make it the new current solution and continue the search; otherwise, terminate, returning the current solution.
Advantages: easy to apply, does not need many resources, usually fast.
Problems: How do I define my neighborhood? What parameter p should I choose?
Example: Randomized Hill Climbing
Maximize f(x,y,z) = |x-y-0.2| * |x*z-0.8| * |0.3-z*z*y| with x, y, z in [0,1].
Neighborhood Design: Create 50 solutions s such that:
s = (min(1, max(0, x+r1)), min(1, max(0, y+r2)), min(1, max(0, z+r3)))
with r1, r2, r3 being random numbers in [-0.05, +0.05].
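A runnable Python sketch of this example under the stated neighborhood design (p = 50 sampled neighbors, radius 0.05); the names `f`, `neighbors`, and `randomized_hill_climbing` are our own:

```python
import random

def f(x, y, z):
    """Objective to maximize: |x-y-0.2| * |x*z-0.8| * |0.3-z*z*y|."""
    return abs(x - y - 0.2) * abs(x * z - 0.8) * abs(0.3 - z * z * y)

def neighbors(s, p=50, r=0.05):
    """Sample p points around s, clipped to [0,1]^3 as in the neighborhood design."""
    clip = lambda v: min(1.0, max(0.0, v))
    return [tuple(clip(v + random.uniform(-r, r)) for v in s) for _ in range(p)]

def randomized_hill_climbing(start):
    current = start
    while True:
        best = max(neighbors(current), key=lambda s: f(*s))
        if f(*best) <= f(*current):
            return current  # no sampled neighbor improves on current: terminate
        current = best      # otherwise continue the search from the best sample

random.seed(1)
solution = randomized_hill_climbing((0.5, 0.5, 0.5))
print(solution, f(*solution))
```

Note that this only finds a local maximum; the result depends on the starting point, the sampling radius, and p, which is exactly the parameter-choice problem raised above.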