Image Classification

The overall objective of image classification is to automatically categorize all pixels in an image into land cover classes or themes. Normally, multispectral data are used to perform the classification, and the spectral pattern present within the data for each pixel is used as the numerical basis for categorization. That is, different feature types manifest different combinations of digital numbers (DNs) based on their inherent spectral reflectance and emittance properties.

The term classifier refers loosely to a computer program that implements a specific procedure for image classification. Over the years scientists have devised many classification strategies, and from these alternatives the analyst must select the classifier that will best accomplish a specific task. At present it is not possible to state that a given classifier is "best" for all situations, because the characteristics of each image and the circumstances of each study vary greatly. It is therefore essential that the analyst understands the alternative strategies for image classification.

The traditional methods of classification mainly follow two approaches: unsupervised and supervised. The unsupervised approach attempts spectral grouping that may have an unclear meaning from the user's point of view. Having established these groupings, the analyst then tries to associate an information class with each group. The unsupervised approach is often referred to as clustering and results in statistics for spectral, statistical clusters. In the supervised approach to classification, the image analyst supervises the pixel categorization process by supplying to the computer algorithm numerical descriptors of the various land cover types present in the scene. To do this, representative sample sites of known cover type, called training areas or training sites, are used to compile a numerical interpretation key that describes the spectral attributes of each feature type of interest. Each pixel in the data set is then compared numerically with each category in the interpretation key and labeled with the name of the category it most resembles. In the supervised approach the user defines useful information categories and then examines their spectral separability, whereas in the unsupervised approach the analyst first determines spectrally separable classes and then defines their informational utility.

It has been found that in areas of complex terrain the unsupervised approach is preferable to the supervised one. In such conditions, if the supervised approach is used, the user will have difficulty selecting training sites because of the variability of spectral response within each class, and a priori ground data collection can be very time consuming. The supervised approach is also subjective in the sense that the analyst tries to classify information categories, which are often composed of several spectral classes, whereas spectrally distinguishable classes will be revealed by the unsupervised approach, and hence ground data collection requirements may be reduced. Additionally, the unsupervised approach has the potential advantage of revealing discriminable classes unknown from previous work. However, when definition of representative training areas is possible and the statistical classes and information classes show a close correspondence, the results of supervised classification will be superior to those of unsupervised classification.

Unsupervised classification

Unsupervised classifiers do not utilize training data as the basis for classification. Rather, this family of classifiers involves algorithms that examine the unknown pixels in an image and aggregate them into a number of classes based on the natural groupings or clusters present in the image values. Unsupervised classification performs well when the values within a given cover type are close together in measurement space and the data in different classes are comparatively well separated.

The classes that result from unsupervised classification are spectral classes: because they are based solely on the natural groupings in the image values, the identity of the spectral classes is not initially known. The analyst must compare the classified data with some form of reference data (such as larger scale imagery or maps) to determine the identity and informational value of the spectral classes. In the supervised approach we define useful information categories and then examine their spectral separability; in the unsupervised approach we determine spectrally separable classes and then define their informational utility.

There are numerous clustering algorithms that can be used to determine the natural spectral groupings present in a data set. One common form of clustering, called the "K-means" approach and also referred to as ISODATA (Iterative Self-Organizing Data Analysis Technique), accepts from the analyst the number of clusters to be located in the data. The algorithm then arbitrarily "seeds", or locates, that number of cluster centers in the multidimensional measurement space. Each pixel in the image is then assigned to the cluster whose arbitrary mean vector is closest. After all pixels have been classified in this manner, revised mean vectors for each of the clusters are computed. The revised means are then used as the basis for reclassification of the image data. The procedure continues until there is no significant change in the location of the class mean vectors between successive iterations of the algorithm. Once this point is reached, the analyst determines the land cover identity of each spectral class. Because the K-means approach is iterative, it is computationally intensive; it is therefore often applied only to image sub-areas rather than to full scenes.
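
As a rough sketch of this iterative procedure, the Python code below (assuming NumPy and an image array of shape rows x columns x bands; the cluster count, iteration limit and convergence tolerance are illustrative choices rather than prescribed values) repeatedly assigns each pixel to the nearest cluster mean and recomputes the means until they stop moving.

import numpy as np

def kmeans_cluster(image, n_clusters=8, max_iter=50, tol=1e-3, seed=0):
    """Cluster a (rows, cols, bands) image into spectral classes."""
    rows, cols, bands = image.shape
    pixels = image.reshape(-1, bands).astype(float)

    # Arbitrarily "seed" the cluster centres with randomly chosen pixels.
    rng = np.random.default_rng(seed)
    centres = pixels[rng.choice(pixels.shape[0], n_clusters, replace=False)]

    for _ in range(max_iter):
        # Assign each pixel to the cluster whose mean vector is closest.
        dists = np.linalg.norm(pixels[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)

        # Recompute the mean vector of every cluster from its members.
        new_centres = np.array([
            pixels[labels == k].mean(axis=0) if np.any(labels == k) else centres[k]
            for k in range(n_clusters)
        ])

        # Stop when the class means no longer move significantly.
        if np.linalg.norm(new_centres - centres) < tol:
            centres = new_centres
            break
        centres = new_centres

    return labels.reshape(rows, cols), centres

The distance computation touches every pixel against every cluster mean on every iteration, which is why the text notes that the approach is often restricted to image sub-areas.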

Supervised classification

Supervised classification can be defined as the process of using samples of known identity to classify pixels of unknown identity. Samples of known identity are those pixels located within training areas. Pixels located within these areas form the training samples used to guide the classification algorithm in assigning specific spectral values to the appropriate informational class.

The basic steps involved in a typical supervised classification procedure are illustrated in Fig. 6:

The training stage

Feature selection

Selection of appropriate classification algorithm

Post-classification smoothing

Accuracy assessment

Figure 6: Basic Steps in Supervised Classification. (1) Training stage: collect numerical data from training areas on the spectral response patterns of land cover categories. (2) Classification stage: compare each unknown pixel of the image data set (several digital numbers per pixel) to the spectral patterns and assign it to the most similar category (e.g. sand, forest, urban, corn). (3) Output stage: present the results as maps, tables of area data and GIS data files, with the digital numbers replaced by category types.

Training data

Training fields are areas of known identity delineated on the digital image, usually by specifying the corner points of a rectangular or polygonal area using line and column numbers within the coordinate system of the digital image. The analyst must, of course, know the correct class for each area. Usually the analyst begins by assembling maps and aerial photographs of the area to be classified. Specific training areas are identified for each informational category following the guidelines outlined below. The objective is to identify a set of pixels that accurately represents the spectral variation present within each information class (Fig. 7a).
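
A minimal sketch of how such training statistics might be compiled is given below. It assumes the training areas have already been rasterized into a label array aligned with the image (0 meaning "outside every training area"); that label array is an assumption of this illustration, not a step described above.

import numpy as np

def training_statistics(image, training_labels):
    """Compute per-class mean vectors and standard deviations.

    image           : (rows, cols, bands) array of digital numbers
    training_labels : (rows, cols) array, 0 = untrained, 1..m = class id
    """
    bands = image.shape[2]
    pixels = image.reshape(-1, bands).astype(float)
    labels = training_labels.ravel()

    stats = {}
    for c in np.unique(labels):
        if c == 0:          # skip pixels outside any training area
            continue
        samples = pixels[labels == c]
        stats[c] = {"mean": samples.mean(axis=0),   # Mck for each band k
                    "std": samples.std(axis=0)}     # Sck for each band k
    return stats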

Select the Appropriate Classification Algorithm

Various supervised classification algorithms may be used to assign an unknown pixel to one of a number of classes. The choice of a particular classifier or decision rule depends on the nature of the input data and the desired output. Parametric classification algorithms assume that the observed measurement vectors Xc for each class in each spectral band during the training phase of the supervised classification are Gaussian in nature; that is, they are normally distributed. Nonparametric classification algorithms make no such assumption. Among the most frequently used classification algorithms are the parallelepiped, minimum distance, and maximum likelihood decision rules.

Parallelepiped Classification Algorithm

This is a widely used decision rule based on simple Boolean "and/or" logic. Training data in n spectral bands are used in performing the classification. Brightness values from each pixel of the multispectral imagery are used to produce an n-dimensional mean vector, Mc = (Mc1, Mc2, Mc3, ..., Mcn), with Mck being the mean value of the training data obtained for class c in band k out of m possible classes. Sck is the standard deviation of the training data for class c in band k out of m possible classes.

The decision boundaries form an n-dimensional parallelepiped in feature space. If the pixel value lies above the lower threshold and below the upper threshold for all n bands evaluated, it is assigned to that class; if it does not fall within any of the class parallelepipeds, it is assigned to an unclassified category (Figs. 7c and 7d). Although it is only possible to analyze visually up to three dimensions, as described in the section on computer graphic feature analysis, it is possible to create an n-dimensional parallelepiped for classification purposes.

The parallelepiped algorithm is a computationally efficient method of classifying remote sensor data. Unfortunately, because some parallelepipeds overlap, it is possible that an unknown candidate pixel might satisfy the criteria of more than one class. In such cases it is usually assigned to the first class for which it meets all criteria. A more elegant solution is to take such a pixel and use a minimum distance to means decision rule to assign it to just one class.
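
The following sketch illustrates the parallelepiped rule under the assumption that each box extends a fixed number of standard deviations about the class mean (the default half-width of two standard deviations is an illustrative choice). It applies the first-match rule for overlapping boxes described above and uses the per-class statistics from the earlier training sketch.

import numpy as np

def parallelepiped_classify(pixels, stats, n_std=2.0):
    """Assign each pixel to the first class whose box contains it.

    pixels : (n_pixels, bands) array of digital numbers
    stats  : {class_id: {"mean": ..., "std": ...}} built from training data
    n_std  : half-width of each box in standard deviations (assumed value)
    Returns 0 for pixels that fall outside every parallelepiped.
    """
    labels = np.zeros(pixels.shape[0], dtype=int)   # 0 = unclassified
    for c, s in stats.items():
        low = s["mean"] - n_std * s["std"]
        high = s["mean"] + n_std * s["std"]
        inside = np.all((pixels >= low) & (pixels <= high), axis=1)
        # First class whose criteria are met keeps the pixel (overlap rule).
        labels[(labels == 0) & inside] = c
    return labels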

Minimum Distance to Means Classification Algorithm

This decision rule is computationally simple and commonly used. When used properly it can result in classification accuracy comparable to that of other, more computationally intensive algorithms such as the maximum likelihood algorithm. Like the parallelepiped algorithm, it requires that the user provide the mean vectors for each class in each band, Mck, from the training data. To perform a minimum distance classification, a program must calculate the distance from each unknown pixel (BVijk) to each mean vector Mck. It is possible to calculate this distance as the Euclidean distance based on the Pythagorean theorem (Fig. 7b).

The Euclidean distance from an unknown pixel to the mean of class c, measured in two bands k and l, is computed as

Dist = sqrt[ (BVijk − μck)² + (BVijl − μcl)² ]

where μck and μcl represent the mean values of class c measured in bands k and l.

Many minimum-distance algorithms let the analyst specify a distance or threshold from the class means beyond which a pixel will not be assigned to a category even though it is nearest to the mean of that category.
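
A minimal sketch of the minimum distance rule, including the optional distance threshold just described, might look as follows (the stats dictionary is the hypothetical structure used in the earlier sketches).

import numpy as np

def minimum_distance_classify(pixels, stats, max_distance=None):
    """Assign each pixel to the class with the nearest mean vector.

    max_distance : optional threshold beyond which a pixel is left
                   unclassified (label 0), as described above.
    """
    classes = sorted(stats)
    means = np.array([stats[c]["mean"] for c in classes])          # (m, bands)
    dists = np.linalg.norm(pixels[:, None, :] - means[None, :, :], axis=2)

    nearest = dists.argmin(axis=1)
    labels = np.array(classes)[nearest]

    if max_distance is not None:
        labels[dists.min(axis=1) > max_distance] = 0
    return labels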

Maximum Likelihood Classification Algorithm

The maximum likelihood decision rule assigns each pixel having pattern measurements or features X to the class c whose units are most probable, or most likely, to have given rise to feature vector X. It assumes that the training data statistics for each class in each band are normally distributed, that is, Gaussian. In other words, training data with bi- or trimodal histograms in a single band are not ideal. In such cases, the individual modes probably represent individual classes that should be trained upon individually and labeled as separate classes. This would then produce unimodal, Gaussian training class statistics that fulfil the normal distribution requirement.

The Bayes's decision rule is identical to the maximum likelihood decision rule except that it does not assume that each class has equal prior probability. A priori probabilities have been used successfully as a way of incorporating the effects of relief and other terrain characteristics to improve classification accuracy. The maximum likelihood and Bayes's classifications require many more computations per pixel than either the parallelepiped or minimum distance classification algorithms, and they do not always produce superior results.
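
The sketch below illustrates one common way to implement a Gaussian maximum likelihood rule with optional prior probabilities (supplying priors turns it into the Bayes's rule described above). It estimates each class mean and covariance from hypothetical per-class training samples and compares log-likelihoods; this is a sketch under those assumptions, not a definitive implementation.

import numpy as np

def maximum_likelihood_classify(pixels, training_samples, priors=None):
    """Gaussian maximum likelihood / Bayes classification sketch.

    pixels           : (n_pixels, bands) array of digital numbers
    training_samples : {class_id: (n_c, bands) array of training pixels}
    priors           : optional {class_id: prior probability}; equal if omitted
    """
    classes = sorted(training_samples)
    scores = np.empty((pixels.shape[0], len(classes)))

    for j, c in enumerate(classes):
        X = training_samples[c].astype(float)
        mean = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)          # requires enough training pixels
        inv_cov = np.linalg.inv(cov)
        _, log_det = np.linalg.slogdet(cov)

        diff = pixels - mean
        mahal = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
        log_prior = np.log(priors[c]) if priors else 0.0
        # Log of the Gaussian density (constant term dropped) plus log prior.
        scores[:, j] = -0.5 * (log_det + mahal) + log_prior

    return np.array(classes)[scores.argmax(axis=1)]

The per-pixel covariance and Mahalanobis computations account for the heavier computational load mentioned above relative to the parallelepiped and minimum distance rules.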

Classification Accuracy Assessment

Quantitatively assessing classification accuracy requires the collection of some in situ data or a priori knowledge about some parts of the terrain, which can then be compared with the remote sensing derived classification map. Thus, to assess classification accuracy it is necessary to compare two maps: 1) the remote sensing derived map, and 2) the assumed true map (which in fact may contain some error). The assumed true map may be derived from in situ investigation or, quite often, from the interpretation of remotely sensed data obtained at a larger scale or higher resolution.

Figure 7a: Pixel observations from selected training sites plotted on a scatter diagram

Figure 7b: Minimum Distance to Means classification strategy

Figure 7c: Parallelepiped classification strategy

Figure 7d: Stepped parallelepipeds to avoid overlap (source Lillesand and Kiefer, 1993)

Classification Error Matrix

One of the most common means of expressing classification accuracy is the preparation of a classification error matrix, sometimes called a confusion matrix or contingency table. Error matrices compare, on a category by category basis, the relationship between known reference data (ground truth) and the corresponding results of an automated classification. Such matrices are square, with the number of rows and columns equal to the number of categories whose classification accuracy is being assessed. Table 1 is an error matrix that an image analyst has prepared to determine how well a classification has categorized a representative subset of pixels used in the training process of a supervised classification. This matrix stems from classifying the sampled training set pixels and listing the known cover types used for training (columns) versus the pixels actually classified into each land cover category by the classifier (rows).
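
Before turning to Table 1, note that an error matrix with this row/column convention can be tallied directly from paired reference and classified labels, as in the sketch below (the 1-based class codes are an assumption of the example).

import numpy as np

def error_matrix(reference, classified, n_classes):
    """Build an error matrix: rows = classified labels, columns = reference data.

    reference, classified : arrays of class ids in the range 1..n_classes
    """
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for ref, cls in zip(reference.ravel(), classified.ravel()):
        matrix[cls - 1, ref - 1] += 1   # row = classifier output, column = truth
    return matrix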

Table 1. Error matrix resulting from classifying training set pixels
(rows: classification data; columns: training set data, i.e. known cover types)

                     W      S      F      U      C      H    Row total
W                  480      0      5      0      0      0      485
S                    0     52      0     20      0      0       72
F                    0      0    313     40      0      0      353
U                    0     16      0    126      0      0      142
C                    0      0      0     38    342     79      459
H                    0      0     38     24     60    359      481
Column total       480     68    356    248    402    438     1992

Producer's accuracy: W = 480/480 = 100%; S = 52/68 = 76%; F = 313/356 = 88%; U = 126/248 = 51%; C = 342/402 = 85%; H = 359/438 = 82%

User's accuracy: W = 480/485 = 99%; S = 52/72 = 72%; F = 313/353 = 89%; U = 126/142 = 89%; C = 342/459 = 75%; H = 359/481 = 75%

Overall accuracy = (480 + 52 + 313 + 126 + 342 + 359) / 1992 = 84%

W, water; S, sand; F, forest; U, urban; C, corn; H, hay (source: Lillesand and Kiefer, 1993).

An error matrix expresses several characteristics of classification performance. For example, one can study the various classification errors of omission (exclusion) and commission (inclusion). Note in Table 1 that the training set pixels classified into the proper land cover categories are located along the major diagonal of the error matrix (running from upper left to lower right). All non-diagonal elements of the matrix represent errors of omission or commission. Omission errors correspond to the non-diagonal column elements (e.g. 16 pixels that should have been classified as "sand" were omitted from that category). Commission errors are represented by the non-diagonal row elements (e.g. 38 urban pixels plus 79 hay pixels were improperly included in the corn category).

Several other measures, such as the overall accuracy of classification, can be computed from the error matrix. The overall accuracy is determined by dividing the total number of correctly classified pixels (the sum of the elements along the major diagonal) by the total number of reference pixels. Likewise, the accuracies of individual categories can be calculated by dividing the number of correctly classified pixels in each category by either the total number of pixels in the corresponding row or column. Producer's accuracy, which indicates how well the training set pixels of a given cover type are classified, is determined by dividing the number of correctly classified pixels in each category by the number of training pixels used for that category (the column total). User's accuracy is computed by dividing the number of correctly classified pixels in each category by the total number of pixels that were classified into that category (the row total). This figure is a measure of commission error and indicates the probability that a pixel classified into a given category actually represents that category on the ground.
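
These measures can be read directly off the matrix; the short sketch below computes them for an error matrix whose rows are classified categories and whose columns are reference categories, as in Table 1.

import numpy as np

def accuracy_measures(matrix):
    """Overall, producer's and user's accuracy from an error matrix
    whose rows are classified categories and columns are reference data."""
    diagonal = np.diag(matrix).astype(float)
    overall = diagonal.sum() / matrix.sum()
    producers = diagonal / matrix.sum(axis=0)   # correct / column total
    users = diagonal / matrix.sum(axis=1)       # correct / row total
    return overall, producers, users

Applied to the matrix in Table 1, this reproduces the 84% overall accuracy and the per-category producer's and user's accuracies listed with the table.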

Note that the error matrix in the table indicates an overall accuracy of 84%. However, producer's accuracy ranges from just 51% (urban) to 100% (water), and user's accuracy ranges from 72% (sand) to 99% (water). This error matrix is based on training data. If the results are good, they indicate that the training samples are spectrally separable and that the classification works well in the training areas. This aids the training set refinement process but indicates little about classifier performance elsewhere in the scene.

Kappa coefficient

Kappa analysis is a discrete multivariate technique for accuracy assessment. It yields a Khat statistic that is a measure of agreement or accuracy. The Khat statistic is computed as

Khat = (N Σ xii − Σ (xi+ × x+i)) / (N² − Σ (xi+ × x+i))

where r is the number of rows in the matrix, xii is the number of observations in row i and column i, xi+ and x+i are the marginal totals for row i and column i respectively, N is the total number of observations, and each summation runs from i = 1 to r.
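
A direct transcription of this formula into code, using the same error matrix convention as above, is sketched below.

import numpy as np

def kappa_coefficient(matrix):
    """Khat statistic from an r x r error matrix."""
    matrix = matrix.astype(float)
    N = matrix.sum()
    observed = np.trace(matrix)                                # sum of xii
    chance = (matrix.sum(axis=1) * matrix.sum(axis=0)).sum()   # sum of xi+ * x+i
    return (N * observed - chance) / (N**2 - chance)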
