Clustering is the task of segmenting a heterogeneous population into a number of more homogeneous subgroups or clusters. What distinguishes clustering from classification is that clustering does not rely on predefined classes. In classification, each record is assigned a predefined class on the basis of a model developed through training on preclassified examples.
CLUSTERING AS A SEGMENTING TECHNIQUE
In clustering, there are no predefined classes and no examples. The records are grouped together on the basis of self-similarity. It is up to the user to determine what meaning, if any, to attach to the resulting clusters. Clusters of symptoms might indicate different diseases. Clusters of customer attributes might indicate different market segments.
Clustering is often done as a prelude to some other form of data mining or modeling. For example, clustering might be the first step in a market segmentation effort: Instead of trying to come up with a one-size-fits-all rule for “what kind of promotion do customers respond to best,” first divide the customer base into clusters of people with similar buying habits, and then ask what kind of promotion works best for each cluster.
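The two-step idea in the market segmentation example can be sketched in a few lines of Python. This is a minimal illustration, not a recommended implementation: the customer records, the promotion names, and the helper functions `standardize` and `k_means` are all invented for the sketch, and k-means is just one of several clustering methods that could be used here. Note that the clustering step sees only the buying-habit attributes; no class labels are supplied.

```python
from collections import Counter, defaultdict
from math import dist

# Hypothetical toy data: (purchases per month, average order value in dollars).
customers = [
    (1, 210.0), (2, 190.0), (1, 230.0), (2, 205.0),   # infrequent, large orders
    (9, 25.0), (11, 30.0), (10, 20.0), (12, 28.0),    # frequent, small orders
]
# Hypothetical record of the promotion each customer responded to.
responses = ["free_shipping", "free_shipping", "coupon", "free_shipping",
             "coupon", "coupon", "coupon", "free_shipping"]


def standardize(points):
    """Rescale each attribute to zero mean and unit variance so none dominates."""
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(p, means, stds)) for p in points]


def k_means(points, k, iterations=20):
    """Plain k-means; returns one cluster index per point."""
    # Deterministic start: the first point, then the point farthest from
    # the centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = []
    for _ in range(iterations):
        # Assign each record to its nearest center (self-similarity).
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        # Move each center to the mean of the records assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


# Step 1: divide the customer base into clusters of similar buying habits.
labels = k_means(standardize(customers), k=2)

# Step 2: ask which promotion works best within each cluster.
by_cluster = defaultdict(Counter)
for label, response in zip(labels, responses):
    by_cluster[label][response] += 1

best_promotion = {label: counts.most_common(1)[0][0]
                  for label, counts in by_cluster.items()}
print(best_promotion)  # {0: 'free_shipping', 1: 'coupon'}
```

On this made-up data, the infrequent large-order customers and the frequent small-order customers land in separate clusters, and each cluster favors a different promotion. The cluster numbers themselves carry no meaning; as noted above, it is up to the user to decide what, if anything, each cluster represents.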