Alignments to the human genome (NCBI 36, Ensembl 42) were run using Eland, Illumina's (Solexa) alignment program. Each line of the resulting alignment files contains the following fields:

1. Sequence name (derived from the file name and line number if the format is not FASTA).
2. Sequence.
3. Type of match:
   NM - no match found.
   QC - no matching done: QC failure (essentially, too many Ns).
   RM - no matching done: repeat masked (may be seen if repeatFile.txt was specified).
   U0 - best match found was a unique exact match.
   U1 - best match found was a unique 1-error match.
   U2 - best match found was a unique 2-error match.
   R0 - multiple exact matches found.
   R1 - multiple 1-error matches found, no exact matches.
   R2 - multiple 2-error matches found, no exact or 1-error matches.
4. Number of exact matches found.
5. Number of 1-error matches found.
6. Number of 2-error matches found.

The following fields are only present if a unique best match was found (i.e. the match code in field 3 begins with "U"):

7. Genome file in which the match was found.
8. Position of the match (bases in the file are numbered starting at 1).
9. Direction of the match (F = forward strand, R = reverse).
10. How N characters in the read were interpreted ("." = not applicable, "D" = deletion, "I" = insertion).

The following fields are only present for a unique inexact match (i.e. the match code was U1 or U2):

11. Position and type of the first substitution error (e.g. 12A: base 12 was A, not whatever it was in the read).
12. Position and type of the second substitution error, as above.
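
The field layout described above can be read with a short script. The following Python is a minimal sketch, not part of Eland itself: it assumes the fields on each line are separated by whitespace (tabs or spaces), which the description above does not state, and the names ElandHit and parse_eland_line are hypothetical, chosen only for this illustration. Missing count fields are treated as 0 as a defensive assumption.

```python
from typing import NamedTuple, Optional, Tuple


class ElandHit(NamedTuple):
    """One line of Eland output (hypothetical container, for illustration)."""
    name: str                               # field 1
    sequence: str                           # field 2
    match_type: str                         # field 3: NM, QC, RM, U0-U2, R0-R2
    exact: int                              # field 4
    one_error: int                          # field 5
    two_error: int                          # field 6
    genome_file: Optional[str] = None       # field 7 (unique matches only)
    position: Optional[int] = None          # field 8, 1-based
    strand: Optional[str] = None            # field 9: "F" or "R"
    n_interpretation: Optional[str] = None  # field 10
    substitutions: Tuple[str, ...] = ()     # fields 11-12 (U1/U2 only)


def parse_eland_line(line: str) -> ElandHit:
    # Assumption: fields are whitespace-separated.
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("expected at least name, sequence and match type")
    name, sequence, match_type = fields[:3]

    # Fields 4-6: match counts; pad with zeros if a line omits them.
    counts = [int(x) for x in fields[3:6]]
    counts += [0] * (3 - len(counts))
    hit = ElandHit(name, sequence, match_type, *counts)

    # Fields 7-10 appear only for unique best matches (code starts with "U").
    if match_type.startswith("U") and len(fields) >= 10:
        hit = hit._replace(
            genome_file=fields[6],
            position=int(fields[7]),
            strand=fields[8],
            n_interpretation=fields[9],
            # Fields 11-12: zero, one or two substitution descriptions.
            substitutions=tuple(fields[10:12]),
        )
    return hit


# Made-up example line, not real data.
example = "read_1\tACGTACGTACGTACGTACGTACGTA\tU1\t0\t1\t0\tchr1.fa\t12345\tF\t..\t12A"
hit = parse_eland_line(example)
```

With this sketch, uniquely aligned reads can be selected by testing hit.match_type.startswith("U"), and the substitution fields are available as a tuple whose length is at most 2.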