Analysing Data

Different molecular markers will generate results in different forms. Simple RFLPs and some fingerprinting methods generally produce simple band patterns, usually of between 1 and 20 bands. These patterns can be translated into a simple binary form in which each band obtained in the analysis is treated as an independent character and scored as present or absent. In most cases, these binary records have then been compared using one or more distance or association coefficients, and the results represented as some form of dendrogram. A wide range of coefficients is available for such comparisons. These include coefficients that do not consider matching negative characters (bands absent from both isolates), and others that give double weighting to shared bands to reflect the presence of common restriction sites or primer sequences at each end of the bands [e.g., Nei and Li (1979)].
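As a minimal sketch of the binary scoring and comparison step, the Python below scores three hypothetical isolates for band presence/absence and computes three common coefficients from the usual 2x2 counts: the simple matching coefficient (which counts negative matches as agreement), Jaccard's coefficient (which ignores negative matches), and Dice's coefficient (which ignores negative matches and double-weights shared bands). The isolate names, band profiles, and function names are illustrative assumptions, not data or code from the studies cited.

```python
from itertools import combinations

# Hypothetical band scores: 1 = band present, 0 = band absent, with every
# isolate scored at the same ordered set of band positions.
profiles = {
    "isolate_A": [1, 1, 0, 1, 0, 0],
    "isolate_B": [1, 1, 0, 0, 1, 0],
    "isolate_C": [0, 1, 1, 0, 1, 1],
}


def match_counts(x, y):
    """Return (a, b, c, d) for two binary profiles:
    a = bands present in both, b = present only in x,
    c = present only in y, d = absent from both (negative matches)."""
    pairs = list(zip(x, y))
    a = pairs.count((1, 1))
    b = pairs.count((1, 0))
    c = pairs.count((0, 1))
    d = pairs.count((0, 0))
    return a, b, c, d


def simple_matching(x, y):
    # Negative matches (d) count as agreement.
    a, b, c, d = match_counts(x, y)
    return (a + d) / (a + b + c + d)


def jaccard(x, y):
    # Negative matches are ignored; undefined if neither isolate has a band.
    a, b, c, _ = match_counts(x, y)
    return a / (a + b + c)


def dice(x, y):
    # Negative matches are ignored and shared bands are weighted twice.
    a, b, c, _ = match_counts(x, y)
    return 2 * a / (2 * a + b + c)


for (name1, p1), (name2, p2) in combinations(profiles.items(), 2):
    print(f"{name1} vs {name2}: SM={simple_matching(p1, p2):.3f} "
          f"Jaccard={jaccard(p1, p2):.3f} Dice={dice(p1, p2):.3f}")
```

The resulting similarity values (or 1 minus them, as distances) are what a clustering routine would then take as input to build a dendrogram.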

It should be remembered that some of these coefficients have been described independently on more than one occasion, and that others can be related to each other by simple transformations. For example, Nei and Li's genetic distance is equivalent to 1 minus Sorensen's coefficient, and Sorensen's coefficient is mathematically identical to Dice's coefficient [see Bridge and Saddler (1998); Sneath and Sokal (1973)]. Similarly, association coefficients can be related to distance measures: for binary data, taxonomic distance can be defined as the square root of 1 minus the simple matching coefficient. It is therefore important, if more than one measure is used, to ensure that those selected are genuinely independent rather than simple transformations of one another.
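The two equivalences above can be checked numerically. In this sketch (hypothetical profiles, illustrative function names), Nei and Li's distance is computed from its own definition, 1 - 2n_xy/(n_x + n_y), where n_x + n_y = 2a + b + c, so it always equals 1 minus Dice. Taxonomic distance is taken here to mean average taxonomic distance, the root mean squared character difference; for 0/1 characters each mismatch contributes exactly 1 to the sum, which gives the square root of 1 minus the simple matching coefficient.

```python
import math

# Two hypothetical binary band profiles scored at the same positions.
x = [1, 1, 0, 1, 0, 0, 1, 0]
y = [1, 0, 0, 1, 1, 0, 1, 1]


def nei_li_distance(x, y):
    # D = 1 - 2*n_xy / (n_x + n_y): n_xy shared bands, n_x and n_y the
    # number of bands in each isolate.
    n_xy = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)
    return 1 - 2 * n_xy / (sum(x) + sum(y))


def dice(x, y):
    # Dice (= Sorensen) coefficient: 2a / (2a + b + c).
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)
    return 2 * a / (2 * a + b + c)


def simple_matching(x, y):
    # Proportion of characters on which the two isolates agree.
    return sum(1 for i, j in zip(x, y) if i == j) / len(x)


def taxonomic_distance(x, y):
    # Average taxonomic distance: root mean squared character difference.
    return math.sqrt(sum((i - j) ** 2 for i, j in zip(x, y)) / len(x))


# Each pair of printed values is identical, so reporting both members of a
# pair adds no independent information.
print(nei_li_distance(x, y), 1 - dice(x, y))
print(taxonomic_distance(x, y), math.sqrt(1 - simple_matching(x, y)))
```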

Although cluster analysis is a common way of showing relationships within and between fungal populations, the method has some limitations. One obvious limitation of any tree diagram is that all the isolates must be linked, as there is no provision for an isolate that is unrelated to the others. A second limitation is that, although cluster analysis is good at showing the membership of a group, it is less precise in showing relationships between groups. This is a particular failing of average-linkage methods, but is also true of most other clustering approaches [see Abbott et al. (1985)]. A further limitation is the tendency of isolates that are unrelated to the main population to form a separate cluster together, even though they may be only distantly related to each other. Such clusters are sometimes described as sharing only the single property that they are not related to the main population. Some of these limitations can be overcome by using an ordination-based method such as principal component analysis (PCA). In PCA, correlated variation among the characters is combined to produce a new set of axes, each of which is an additive (linear) combination of the original characters. Each axis accounts for a proportion of the total variance in the data, and isolates are placed by plotting their positions in relation to the first 2 or 3 axes [see Alderson (1985); Dudzinski (1975)]. A refinement of PCA is principal coordinate analysis (PCO). PCO has been shown to be appropriate for binary data, such as those obtained from band patterns, and unlike PCA it does not require the use of strictly metric coefficients (Gower 1966; Sneath and Sokal 1973). Unlike cluster analysis, ordination does not produce a series of groups, but a scatter plot in which similar isolates tend to be placed near each other.
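A minimal sketch of classical principal coordinate analysis, assuming NumPy is available: the squared distance matrix is double-centred, its eigenvectors are scaled by the square roots of their eigenvalues, and the first two columns give the scatter-plot coordinates. The band profiles are hypothetical, and the distance used (1 minus Dice) is chosen only as an example of a coefficient that need not be Euclidean; such coefficients can produce negative eigenvalues, which this sketch simply clips to zero rather than applying any formal correction.

```python
import numpy as np


def principal_coordinates(dist, n_axes=2):
    """Classical principal coordinate analysis (Gower's method) of a
    symmetric isolate-by-isolate distance matrix.

    Returns the coordinates of each isolate on the first n_axes axes and
    the proportion of the (positive) variance represented by each axis."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-centre -0.5 * D^2 with the centring matrix J = I - (1/n) 11^T.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # eigh handles symmetric matrices; reorder eigenvalues largest first.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Non-Euclidean coefficients can give negative eigenvalues; this
    # sketch simply clips them to zero for the retained axes.
    kept = np.clip(eigvals[:n_axes], 0.0, None)
    coords = eigvecs[:, :n_axes] * np.sqrt(kept)
    proportion = kept / eigvals[eigvals > 0].sum()
    return coords, proportion


# Hypothetical band profiles (rows = isolates, columns = band positions).
bands = np.array([
    [1, 1, 0, 1, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 0],
])
shared = bands @ bands.T          # bands shared by each pair of isolates
totals = bands.sum(axis=1)        # number of bands in each isolate
dice_distance = 1 - 2 * shared / (totals[:, None] + totals[None, :])

coords, proportion = principal_coordinates(dice_distance)
print(coords)       # one row of (axis 1, axis 2) coordinates per isolate
print(proportion)   # share of the variance carried by each plotted axis
```

Unlike a dendrogram, nothing here forces an outlying isolate to join a group: it simply appears far from the others on the plot.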
These ordination methods behave differently from cluster analysis, and are typically better at representing between-group relationships than close within-group relationships. A further property of PCA is that under some circumstances it may filter random variation from a complex data set, as correlated variation will tend to be concentrated in the first few axes [see Bridge (1998)].
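The noise-filtering property can be illustrated with simulated data, assuming NumPy is available. In this sketch, six hypothetical characters all vary with one underlying factor, and each also carries small independent noise; PCA is computed by singular value decomposition of the column-centred data. The correlated variation collapses onto the first axis, and rebuilding the data from that axis alone removes most of the uncorrelated noise. This is a constructed best case, not a claim about any real data set.

```python
import numpy as np


def pca(data, n_axes=2):
    """PCA via singular value decomposition of the column-centred data.
    Returns isolate scores on the first n_axes axes, the proportion of total
    variance carried by every axis, and a reconstruction of the data from
    the first n_axes axes only."""
    x = np.asarray(data, dtype=float)
    mean = x.mean(axis=0)
    u, s, vt = np.linalg.svd(x - mean, full_matrices=False)
    proportion = s ** 2 / np.sum(s ** 2)
    scores = u[:, :n_axes] * s[:n_axes]
    reconstruction = mean + scores @ vt[:n_axes]
    return scores, proportion, reconstruction


# Simulated data: 30 isolates x 6 characters that all vary with one
# underlying factor, plus small independent random noise on each character.
rng = np.random.default_rng(0)
factor = rng.normal(size=(30, 1))
signal = factor @ np.ones((1, 6))
data = signal + 0.1 * rng.normal(size=(30, 6))

scores, proportion, rebuilt = pca(data, n_axes=1)
print(np.round(proportion, 3))  # variance share of each axis, largest first
# Squared error against the noise-free signal: raw data vs first-axis rebuild.
print(np.sum((data - signal) ** 2), np.sum((rebuilt - signal) ** 2))
```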

Band analysis methods are essentially the same for simple and complex patterns. However, band-reading software becomes necessary for the very complex patterns that may be produced by techniques such as AFLP, as the large number of bands cannot easily be scored by eye. A range of band reading and matching software packages is available, and most of these convert the band patterns in a gel into densitometric traces in which peak presence, height, and shape correspond to band presence, intensity, and thickness, respectively. These packages commonly have manual and automated routines for correcting gels for shift and stretch events, and for including standard size markers and other reference bands. Trace data can readily be converted to quantitative values as "x,y co-ordinates", which are suitable largely for distance-based analyses. Quantitative data can also be compared for overall pattern similarity using methods such as correlation coefficients and concentration-independent measures such as cosine theta [see Feltham and Sneath (1979)].
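To show why cosine theta is described as concentration independent, the sketch below compares hypothetical densitometric traces that are assumed to have already been corrected for shift and stretch, so that each value corresponds to the same gel position in every lane. Multiplying a trace by a constant (for example, more DNA loaded) leaves cosine theta and the correlation coefficient unchanged, whereas a plain Euclidean distance between the traces does change. The lane values and function names are illustrative only.

```python
import math


def cosine_theta(x, y):
    # Cosine of the angle between two traces treated as vectors;
    # unchanged if either trace is multiplied by a positive constant.
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def pearson(x, y):
    # Correlation coefficient: cosine theta of the mean-centred traces.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return cosine_theta([a - mx for a in x], [b - my for b in y])


def euclidean(x, y):
    # Straight-line distance; sensitive to overall band intensity.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical traces sampled at the same aligned gel positions.
lane_1 = [0.1, 0.8, 0.2, 0.1, 0.6, 0.1, 0.3, 0.1]
lane_2 = [2.5 * v for v in lane_1]                  # same pattern, stronger lane
lane_3 = [0.1, 0.2, 0.8, 0.1, 0.1, 0.6, 0.1, 0.3]   # peaks at other positions

print(cosine_theta(lane_1, lane_2), pearson(lane_1, lane_2))  # both close to 1
print(euclidean(lane_1, lane_2))    # non-zero despite the identical pattern
print(cosine_theta(lane_1, lane_3))  # lower: a genuinely different pattern
```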

Analysis of sequence data is more complicated, as the likelihood of particular kinds of change may also be included in the analysis. The first stage in comparing DNA sequences is to align them to each other. Alignment routines seek the best alignment of the particular set of sequences being studied, so adding sequences to, or removing them from, a data set requires a new alignment to be made. In determining an alignment, and in calculating a measure of difference between the sequences involved, it is also necessary to consider the effects of transitions, transversions, and gaps. DNA bases are either purines (A and G) or pyrimidines (C and T), and in the double helix a purine always pairs with a pyrimidine. When aligning sequences, a change from one purine to another or from one pyrimidine to another (a transition) may be considered of less importance than a change between a purine and a pyrimidine (a transversion), partly because transitions generally occur more often. The weights given to transitions and transversions can therefore be varied to reflect their relative importance, and this may also depend on the particular sequences being considered. When aligning sequences it may also be necessary to insert gaps in some sequences where insertion/deletion (indel) events have occurred. Again, the relative penalties for opening a gap and for extending that gap can be adjusted according to the perceived significance of such events in the sequences under consideration [see Thompson et al. (1994)]. It is common in sequence analysis to use phylogenetic techniques to produce trees [see Swofford and Olsen (1990)]. While these approaches are suitable for comparing different species and genera, they are less appropriate for comparisons of closely related populations. Analysis of DNA sequence data is an area that is currently receiving further attention, and some of these developments are described in more detail in chapter 33.
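Finding the best alignment itself requires a dynamic-programming alignment routine; the sketch below only shows how transition/transversion weights and separate gap-opening and gap-extension penalties enter the score of an alignment that already exists. All weight values are hypothetical illustrations, not recommended settings. The second function computes the Kimura two-parameter distance, one standard example of a difference measure that treats transitions and transversions separately, over sites where neither sequence has a gap.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def is_transition(a, b):
    # Purine <-> purine or pyrimidine <-> pyrimidine substitution.
    return a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)


def score_alignment(s1, s2, match=1.0, transition=-0.5, transversion=-1.0,
                    gap_open=-3.0, gap_extend=-0.5):
    """Score two sequences that are already aligned ('-' marks a gap).
    Weights are illustrative only; expects A, C, G, T and '-'."""
    if len(s1) != len(s2):
        raise ValueError("aligned sequences must have the same length")
    score = 0.0
    gap_in_1 = gap_in_2 = False  # is a gap currently open in s1 / s2?
    for a, b in zip(s1.upper(), s2.upper()):
        if a == "-" and b == "-":
            continue  # column gapped in both sequences: ignored
        if a == "-" or b == "-":
            in_1 = a == "-"
            already_open = gap_in_1 if in_1 else gap_in_2
            # The first position of a gap costs more than each extension.
            score += gap_extend if already_open else gap_open
            gap_in_1, gap_in_2 = in_1, not in_1
        else:
            gap_in_1 = gap_in_2 = False
            if a == b:
                score += match
            elif is_transition(a, b):
                score += transition
            else:
                score += transversion
    return score


def kimura_2p(s1, s2):
    """Kimura two-parameter distance from aligned sequences, using only
    sites where both sequences have A, C, G or T (gaps are skipped)."""
    sites = [(a, b) for a, b in zip(s1.upper(), s2.upper())
             if a in BASES and b in BASES]
    n = len(sites)
    p = sum(is_transition(a, b) for a, b in sites) / n  # transitions
    q = sum(a != b and not is_transition(a, b) for a, b in sites) / n  # transversions
    # Undefined (log of a non-positive number) for very divergent sequences.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


print(score_alignment("AC--GT", "ACTTGT"))  # 0.5: one gap of length 2
print(score_alignment("AC-G-T", "ACTGTT"))  # -2.0: two separate gaps
print(kimura_2p("ACGTTGCAAC", "ACATTGCTAC"))
```

With these weights, two separate single-base gaps are penalised more heavily than one two-base gap, which is the behaviour the gap-opening and gap-extension settings are meant to control.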
