Classification consists of examining the features of a newly presented object and assigning it to one of a predefined set of classes. The objects to be classified are generally represented by records in a database table or a file, and the act of classification consists of adding a new column with a class code of some kind. The classification task is characterized by a clear definition of the classes and a training set consisting of preclassified examples. The task is to build a model of some kind that can be applied to unclassified data in order to classify it.
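The workflow described above (learn from preclassified records, then add a class column to unclassified ones) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names `debt_pct` and `risk`, the helper names `train_stump` and `classify`, and all data values are hypothetical, and the model is deliberately the simplest possible one, a single threshold on one numeric field (a one-level decision tree, sometimes called a decision stump).

```python
from collections import Counter

# Hypothetical training set: preclassified records (all values are made up).
TRAINING = [
    {"debt_pct": 10, "risk": "low"},
    {"debt_pct": 20, "risk": "low"},
    {"debt_pct": 30, "risk": "low"},
    {"debt_pct": 60, "risk": "high"},
    {"debt_pct": 70, "risk": "high"},
    {"debt_pct": 80, "risk": "high"},
]


def train_stump(records, feature, label):
    """Pick the threshold on one numeric feature that misclassifies
    the fewest training records, with a majority class on each side."""
    values = sorted({r[feature] for r in records})
    best = None
    # Candidate thresholds are midpoints between adjacent distinct values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        below = Counter(r[label] for r in records if r[feature] <= threshold)
        above = Counter(r[label] for r in records if r[feature] > threshold)
        below_class = below.most_common(1)[0][0]
        above_class = above.most_common(1)[0][0]
        errors = (sum(below.values()) - below[below_class]
                  + sum(above.values()) - above[above_class])
        if best is None or errors < best[0]:
            best = (errors, threshold, below_class, above_class)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    _, threshold, below_class, above_class = best
    return {"feature": feature, "threshold": threshold,
            "below": below_class, "above": above_class}


def classify(model, records, column="predicted_class"):
    """Return copies of the records with a new column holding the class code."""
    result = []
    for r in records:
        side = "below" if r[model["feature"]] <= model["threshold"] else "above"
        result.append({**r, column: model[side]})
    return result


model = train_stump(TRAINING, "debt_pct", "risk")
unclassified = [{"debt_pct": 25}, {"debt_pct": 75}]
classified = classify(model, unclassified)
```

Here `classified` holds the same records with an extra `predicted_class` field, which mirrors the idea of adding a class-code column to a table. A realistic model would use many fields and be checked against held-out examples rather than only the training set.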
Examples of classification tasks that have been addressed using the techniques described in this book include:

- Classifying credit applicants as low, medium, or high risk
- Choosing content to be displayed on a Web page
- Determining which phone numbers correspond to fax machines
- Spotting fraudulent insurance claims
- Assigning industry codes and job designations on the basis of free-text job descriptions

In all of these examples, there are a limited number of classes, and we expect to be able to assign any record to one or another of them. Decision trees and nearest neighbor techniques are well suited to classification. Neural networks and link analysis are also useful for classification in certain circumstances.
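As a rough illustration of the nearest neighbor approach mentioned above, the following Python sketch assigns a record to the class that is most common among its k closest preclassified examples. The function name `knn_classify`, the two features (income in thousands and debt as a percentage), and every data value are hypothetical and chosen only for illustration. Plain Euclidean distance is an assumption as well; in practice the features would usually be rescaled first so that no single field dominates the distance.

```python
import math
from collections import Counter

# Hypothetical preclassified examples: ((income_k, debt_pct), risk class).
# All values are made up for illustration.
EXAMPLES = [
    ((90, 10), "low"), ((85, 15), "low"), ((95, 5), "low"),
    ((55, 35), "medium"), ((60, 40), "medium"), ((50, 30), "medium"),
    ((20, 70), "high"), ((25, 80), "high"), ((15, 75), "high"),
]


def knn_classify(examples, point, k=3):
    """Return the majority class among the k examples nearest to point."""
    if not 1 <= k <= len(examples):
        raise ValueError("k must be between 1 and the number of examples")
    # Sort by Euclidean distance to the new record and keep the k nearest.
    neighbors = sorted(examples, key=lambda ex: math.dist(ex[0], point))[:k]
    # Count one vote per neighbor and return the class with the most votes.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


predictions = [
    knn_classify(EXAMPLES, (88, 12)),
    knn_classify(EXAMPLES, (57, 36)),
    knn_classify(EXAMPLES, (22, 72)),
]
```

With this toy data each of the three new records sits inside one cluster, so all three neighbors agree on the class. Unlike a decision tree, this method builds no explicit model in advance: the preclassified examples themselves serve as the model, and the work happens when a new record is classified.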