More on Choosing #Clusters in General
References
Breckenridge, James N. (2000), “Validating Cluster Analysis: Consistent Replication and Symmetry,” Multivariate Behavioral Research, 35 (2), 261-285.
Calinski, T. and J. Harabasz (1974), “A Dendrite Method for Cluster Analysis,” Communications in Statistics, 3, 1-27.
Krolak-Schwerdt, Sabine and Thomas Eckes (1992), “A Graph Theoretic Criterion for Determining the Number of Clusters in a Data Set,” Multivariate Behavioral Research, 27 (4), 541-565.
Milligan, Glenn W. and Martha C. Cooper (1985), “An Examination of Procedures for Determining the Number of Clusters in a Data Set,” Psychometrika, 50, 159-179.
Steinley, Douglas and Michael J. Brusco (2011), “Choosing the Number of Clusters in K-Means Clustering,” Psychological Methods, 16 (3), 285-297.
Goodman, Leo A. and William H. Kruskal (1954), “Measures of Association for Cross Classifications,” Journal of the American Statistical Association, 49, 732-764.
  Measures like correlations (r’s) but for categorical data
Hartigan, John A. and M. A. Wong (1979), “A K-Means Clustering Algorithm,” Applied Statistics, 28, 100-108.
  K-means and the Fortran code (hehehe, how cool & nerdy is that?!)
Johnson, Stephen C. (1967), “Hierarchical Clustering Schemes,” Psychometrika, 32 (3), 241-254.
  “Hierarchy” is defined, single-link & complete-link are introduced
Lance, G. N. and W. T. Williams (1967), “A General Theory of Classificatory Sorting Strategies, I. Hierarchical Systems,” Computer Journal, 9, 373-380.
  The equation that subsumes single, complete, average, Ward’s, etc.
Milligan, Glenn W. (1979), “Ultrametric Hierarchical Clustering Algorithms,” Psychometrika, 44 (3), 343-346.
  Extends ultrametric distances
Ward, Joe H., Jr. (1963), “Hierarchical Grouping to Optimize an Objective Function,” Journal of the American Statistical Association, 58 (301, March), 236-244.
  The Ward of Ward’s method
Aldenderfer, Mark S., and Roger K. Blashfield (1984), Cluster Analysis, Newbury Park, CA: Sage.
  Great succinct intro
Hartigan, John (1975), Clustering Algorithms, NY: Wiley.
  Has the Fortran code for a bunch of algorithms
Sneath, Peter H. A. and Robert R. Sokal (1973), Principles of Numerical Taxonomy, San Francisco: Freeman.
  Solid; examples are from a different field (bio), but refreshing at the same time
Cluster analysis also appears as a chapter in most multivariate stats books, such as:
Seber, G. A. F. (1984), Multivariate Observations, NY: Wiley, Ch. 7, pp. 347-394.
Arabie, Phipps, J. Douglas Carroll, Wayne DeSarbo, and Jerry Wind (1981), “Overlapping Clustering: A New Method for Product Positioning,” Journal of Marketing Research, 18 (Aug.), 310-317.
  Cool model for non-hierarchical clustering
Punj, Girish, and David W. Stewart (1983), “Cluster Analysis in Marketing Research: Review and Suggestions for Application,” Journal of Marketing Research, 20 (May), 134-148.
  Illustrates a wide variety of applications of clustering
Recommendation Engines & Clustering
Iacobucci, Dawn, Phipps Arabie, and Anand Bodapati (2000), “Recommendation Agents on the Internet,” Journal of Interactive Marketing, 14 (3), 2-11.
Bodapati, Anand V. (2008), “Recommendation Systems with Purchase Data,” Journal of Marketing Research, 45 (Feb.), 77-93.
Other Clustering Applications
Parkman, Margaret A. and Jack Sawyer (1967), “Dimensions of Ethnic Intermarriage in Hawaii,” American Sociological Review, 32 (4), 593-607.
McCutcheon, Allan L. (1987), Latent Class Analysis, Newbury Park, CA: Sage.
Smithson, Michael and Jay Verkuilen (2006), Fuzzy Set Theory: Applications in the Social Sciences, Thousand Oaks, CA: Sage.