README

We offer four types of data files for the H3K4me1/me3 study:

a) compressed (gzipped) WIG (http://genome.ucsc.edu/google/goldenPath/help/wiggle.html) format files that can be loaded directly into the UCSC genome browser to show coverage profiles.

b) compact, tabular, tab-delimited 'peaks' files that support high throughput calculations and can readily be transformed into UCSC-compatible BED-format files (http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#BED).

c) SEQ and PRB files generated by the base calling section of the Illumina data processing pipeline.

For example, the HeLa 'H3K4me1 untreated' folder contains:

    H3K4me1_unstim_hg18_xset200_dupsN_ht7.sub.peaks     23-Jul-2008 18:06   7.9M
    H3K4me1_unstim_hg18_xset200_dupsN_ht7.sub.wig.gz    23-Jul-2008 18:06   35M
    H3K4me1_unstimulated_1_prb.txt.gz                   24-Jul-2008 11:14   1.1G
    H3K4me1_unstimulated_1_seq.txt.gz                   24-Jul-2008 11:11   123M
    H3K4me1_unstimulated_2_prb.txt.gz                   01-Apr-2008 17:41   1G
    H3K4me1_unstimulated_2_seq.txt.gz                   24-Jul-2008 11:11   125M
    H3K4me1_unstimulated_3_prb.txt.gz                   24-Jul-2008 11:13   1.4G
    H3K4me1_unstimulated_3_seq.txt.gz                   24-Jul-2008 11:10   173M
    H3K4me1_unstimulated_4_prb.txt.gz                   24-Jul-2008 11:14   1.4G
    H3K4me1_unstimulated_4_seq.txt.gz                   24-Jul-2008 11:11   175M

For WIG and 'peaks' files, filenames indicate the assembly against which reads were aligned (hg18 or mm8) and the XSET extension length (200 bp) used by FindPeaks 2.0 to generate the coverage profile. Both types of files contain information only for enriched regions that have a maximum XSET overlap value that is at least as significant as the FDR ~0.01 threshold that we estimated independently for each Eland read set (for the above dataset, the threshold height was 7).

While the thresholded WIG files display no information below the FDR threshold, they are compact enough that the whole genome dataset can be loaded as one file, which facilitates browsing genomic locations on different chromosomes.
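The filename convention for WIG and 'peaks' files described above can be unpacked programmatically. The following is a minimal Python sketch, not part of the original data release: the regular expression is inferred only from the example filenames in this README, and the names FILENAME_RE and parse_name are hypothetical. The 'dups' flag and the '.sub' suffix are captured or matched as-is, without interpretation, because their meaning is not stated here.

```python
import re

# Pattern inferred from the example filenames in this README (an assumption);
# files from other datasets may not follow it exactly.
FILENAME_RE = re.compile(
    r"^(?P<sample>.+)_(?P<assembly>hg18|mm8)_xset(?P<xset>\d+)"
    r"_dups(?P<dups>[A-Za-z]+)_ht(?P<height>\d+)\.sub\.(?P<kind>peaks|wig\.gz)$"
)


def parse_name(filename):
    """Split a WIG or 'peaks' filename into its labelled parts.

    Returns a dict with the sample label, the assembly, the XSET
    extension length in bp, the uninterpreted 'dups' flag, the
    threshold height, and the file kind.
    """
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError("unrecognised filename: %r" % filename)
    info = match.groupdict()
    info["xset"] = int(info["xset"])      # extension length in bp
    info["height"] = int(info["height"])  # FDR-derived threshold height
    return info


if __name__ == "__main__":
    print(parse_name("H3K4me1_unstim_hg18_xset200_dupsN_ht7.sub.peaks"))
```

For the example file above, this yields assembly hg18, an XSET length of 200 and a threshold height of 7, matching the values quoted in the text.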
The '*.peaks' files use one-based start coordinates, rather than being zero-based, half-open (http://genome.ucsc.edu/google/FAQ/FAQtracks#tracks1). The files contain the following six fields:

For SEQ and PRB files, the number in the filename is an ID for a flow cell lane. In the above example, H3K4me1_unstimulated, there were 4 lanes of data. All data are single-end reads that were generated on Illumina 1G sequencers.
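Because the '*.peaks' files use one-based start coordinates while BED is zero-based and half-open, converting a peak to a BED record requires shifting the start by one. The following Python sketch illustrates only that coordinate shift. It assumes that the chromosome, start and end of a peak have already been read from the file, and that the end coordinate is inclusive (so it is numerically unchanged in BED); neither the column order nor the end convention is confirmed by this README. The function names peak_to_bed and bed_line are hypothetical.

```python
def peak_to_bed(chrom, start_1based, end_inclusive):
    """Convert a one-based, inclusive interval to zero-based, half-open.

    Assumption: the end coordinate is inclusive. Under that assumption
    only the start changes (start - 1); the end keeps its value.
    """
    if start_1based < 1 or end_inclusive < start_1based:
        raise ValueError("invalid one-based interval")
    return (chrom, start_1based - 1, end_inclusive)


def bed_line(chrom, start_1based, end_inclusive):
    """Format the converted interval as a minimal three-column BED line."""
    fields = peak_to_bed(chrom, start_1based, end_inclusive)
    return "\t".join(str(field) for field in fields)


if __name__ == "__main__":
    # A peak covering bases 1001..1200 (one-based, inclusive).
    print(bed_line("chr1", 1001, 1200))
```

A quick sanity check on any conversion of this kind is that the interval length is preserved: a peak spanning bases 1001 to 1200 covers 200 bases, and the BED interval 1000 to 1200 has end minus start equal to 200.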