Molecular data - Quality checks
The analysis of molecular marker data should always begin by checking the quality
of the data.
Errors
Molecular marker data will always contain (some) errors. In general, it will be difficult to say whether data are correct or not without
going back to the source of the data, e.g. an electrophoretic gel. In linkage studies, 'unlikely' recombination events may be an
indication of errors in the data. However, to determine whether a recombination event is 'unlikely' or not, a number of assumptions about the
statistical behaviour of marker data are required.
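As an illustration of how 'unlikely' recombination events might be flagged, the sketch below looks for singletons: a marker whose genotype differs from both of its nearest scored neighbours while those neighbours agree with each other. This implies two recombination events in a short interval. The sketch assumes that the markers are listed in map order, that the map is dense enough for such double recombinants to be rare, and that the population has two genotype classes (coded here as "A" and "B", with None for a missing score). The function name and the coding are hypothetical, chosen only for this example.

```python
def find_singletons(genotypes):
    """Return the indices of markers that differ from both nearest scored neighbours.

    genotypes: marker scores of one individual, in map order; None means missing.
    Assumption: a call flanked by two identical, different calls implies a double
    recombination in a short interval and is therefore suspect.
    """
    # Keep only scored markers, remembering their original positions.
    called = [(i, g) for i, g in enumerate(genotypes) if g is not None]
    suspects = []
    for k in range(1, len(called) - 1):
        left = called[k - 1][1]
        index, middle = called[k]
        right = called[k + 1][1]
        if left == right and middle != left:
            suspects.append(index)
    return suspects


individual = ["A", "A", "B", "A", None, "A", "B", "B", "B"]
print(find_singletons(individual))
```

A flagged marker is only a candidate for re-inspection of the source data, not proof of an error; on a sparse map a true double recombinant is entirely possible.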
Missing data
Missing data are present in nearly all sets of molecular marker data. A marker with many missing data
may have been difficult to score (e.g. due to very faint bands on a gel). Sometimes, individuals have many missing data. This may be due to
deteriorated DNA. Graphical tools can be used to reveal patterns in the missing data. Since they carry little information, markers
and individuals with many missing data can be excluded from further analysis.
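The exclusion step can be sketched as follows. The example computes the fraction of missing scores per individual and per marker and keeps only those at or below a threshold. The data layout (a dictionary of individuals, each with a list of scores in marker order, None for missing), the function names, and the threshold of 0.5 are all assumptions made for this illustration; a suitable threshold depends on the data set.

```python
def missing_rates(data, markers):
    """Return (per_individual, per_marker) fractions of missing scores.

    data: dict mapping individual name -> list of scores, one per marker.
    markers: list of marker names, in the same order as the scores.
    """
    per_individual = {
        ind: sum(score is None for score in scores) / len(scores)
        for ind, scores in data.items()
    }
    per_marker = {
        name: sum(scores[j] is None for scores in data.values()) / len(data)
        for j, name in enumerate(markers)
    }
    return per_individual, per_marker


def keep_below(rates, max_missing):
    """Return the names whose missing fraction does not exceed max_missing."""
    return [name for name, rate in rates.items() if rate <= max_missing]


markers = ["M1", "M2", "M3", "M4"]
data = {
    "ind1": ["A", "B", None, "A"],
    "ind2": ["A", "A", None, "B"],
    "ind3": [None, None, None, "B"],
    "ind4": ["B", "B", "A", "A"],
}

per_individual, per_marker = missing_rates(data, markers)
print(keep_below(per_individual, max_missing=0.5))  # individuals to retain
print(keep_below(per_marker, max_missing=0.5))      # markers to retain
```

Here both rates are computed on the full data set. Removing poor individuals first and then recomputing the marker rates (or the reverse) is an equally defensible design and can give a slightly different selection.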
Allele frequencies
It is important to consider the frequencies of the alleles of markers in the population under study.
Markers with extremely small or large allele frequencies can be useful to distinguish one or a few individuals from the large majority. Markers with intermediate
allele frequencies can be used to divide a population into a few groups of reasonable size. The latter may be more suitable for association studies.
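Allele frequencies for a single marker can be estimated by simple counting, as in the sketch below. It assumes a diploid, codominant marker whose genotypes are stored as pairs of alleles, with None for a missing score; the function names and the example data are hypothetical.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return a dict allele -> frequency, ignoring missing genotypes.

    genotypes: list of (allele, allele) tuples, or None where the score is missing.
    """
    counts = Counter(
        allele for genotype in genotypes if genotype is not None for allele in genotype
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in counts.items()}


def minor_allele_frequency(freqs):
    """Return the smallest allele frequency, or 0.0 for a monomorphic marker."""
    return min(freqs.values()) if len(freqs) > 1 else 0.0


marker = [("A", "A"), ("A", "G"), ("G", "G"), None, ("A", "A")]
freqs = allele_frequencies(marker)
print(freqs)
print(minor_allele_frequency(freqs))
```

In this example the four scored individuals carry eight alleles, five of them A and three G, so the marker has intermediate allele frequencies.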
Consistency with pedigree information
One way to check the consistency of the marker data of an individual is to compare its marker data with
those of its parents or other relatives. Inconsistencies may point either to an error in the marker data or to an error in the pedigree.
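A comparison with the parents can be sketched as a simple Mendelian check: an offspring genotype is consistent if one of its alleles can have come from the mother and the other from the father. The sketch assumes a diploid, codominant marker, no mutation, and genotypes stored as pairs of alleles with None for a missing score. The function names and the example trio are hypothetical.

```python
def is_consistent(child, mother, father):
    """Return True if the child genotype can be derived from the two parents.

    Each genotype is an (allele, allele) tuple, or None if missing.
    A trio with any missing genotype cannot be checked and is treated as consistent.
    """
    if child is None or mother is None or father is None:
        return True
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def inconsistent_markers(child, mother, father):
    """Return the names of markers at which the trio is inconsistent.

    Each argument is a dict mapping marker name -> genotype.
    """
    return [
        name
        for name in child
        if not is_consistent(child[name], mother.get(name), father.get(name))
    ]


mother = {"M1": ("A", "A"), "M2": ("A", "G"), "M3": ("C", "C")}
father = {"M1": ("G", "G"), "M2": ("A", "A"), "M3": ("C", "T")}
child = {"M1": ("A", "G"), "M2": ("G", "G"), "M3": None}

print(inconsistent_markers(child, mother, father))
```

Here the child is G/G at M2 although the father carries only A, so M2 is reported. A single inconsistent marker suggests a scoring error at that marker, whereas inconsistencies at many markers make an error in the pedigree more plausible.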