Reference:

Montgomery S.B., Griffith O.L., Schuetz J.M., Brooks-Wilson A., Jones S.J.M. "A Computational Discrimination Strategy For Regulatory Polymorphisms In The Upstream Non-Coding Regions of Homo Sapiens" (Submitted).

Supplemental Information:

Download Everything:

Download all data and scripts (warning: 140 MB total; 132 MB of data and 8 MB of scripts)

Analysis pipeline:

The pipeline for generating and analyzing the data in this study has 17 steps, listed below, followed by a link to download the pipeline scripts. Running the pipeline requires the CHuM modules and scripts to be installed.

  1. rSNP to Transcript Mapping: Using the ORegAnno rSNPs, retrieve the ENST IDs from EnsEMBL

  2. rSNP to dbSNP Mapping: Using the ORegAnno rSNPs, retrieve the dbSNP IDs

  3. rSNP to GFF: Convert the ORegAnno rSNPs into GFF data files

  4. rSNP groups to process: Build a group file of the ufSNPs and rSNPs to process (grouped by ENST ID)

  5. Groups to GFF: Make a GFF file from the group file

  6. Generate stacks: Get orthologues from EnsEMBL using BLASTZ_NET

  7. Reciprocal BLAST stacks: Ensure orthologues are reciprocal best BLAST hits

  8. Extended orthologue building (optional): Use the THOR package to find orthologous sequences from incomplete genomes (trace archives)

  9. Make RSNPA XML file: Generate the XML processing file from the sequences and IDs

  10. Run all property analyses for each SNP

  11. Summarize results: Verify which values were missed in the previous step

  12. Build a table of all the results

  13. SVM work: Run SVM

  14. SVM scoring: Run cross-fold analysis on the SVM

  15. Visualization: Generate plots of SNPs in transcripts

  16. Range testing: Generate group files based on specific ranges (such as the 152 bp range used in the study)

  17. Population testing: Supplementary analyses using HapMap population data

  18. Download the pipeline scripts
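As an illustration of step 3 (rSNP to GFF), the following is a minimal Python sketch of writing a single-base SNP as one nine-column, tab-separated GFF line. It is not taken from the pipeline scripts, which are written in Perl. The function name snp_to_gff, the "SNP" feature label, the "ID=" attribute style, and the example identifier are all assumptions made for illustration; the actual files may use different conventions.

```python
def snp_to_gff(seqid, pos, snp_id, source="ORegAnno", strand="+"):
    """Format one SNP as a single GFF line (hypothetical helper).

    GFF uses nine tab-separated columns:
    seqname, source, feature, start, end, score, strand, frame, attributes.
    A SNP occupies one base, so start and end are the same 1-based position.
    """
    fields = [
        seqid,           # sequence name, e.g. a chromosome
        source,          # where the annotation came from
        "SNP",           # feature type (assumed label)
        str(pos),        # start
        str(pos),        # end (same as start for a single base)
        ".",             # score: not applicable
        strand,          # strand
        ".",             # frame: not applicable to SNPs
        f"ID={snp_id}",  # attributes (assumed key=value style)
    ]
    return "\t".join(fields)


# "example_snp" is a placeholder identifier, not a real record.
line = snp_to_gff("chr1", 12345, "example_snp")
print(line)
```

Each rSNP record would produce one such line, and the lines for all records together form the GFF data file.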
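The reciprocal check in step 7 can be sketched in Python as follows. This is a generic reciprocal-best-hit filter under stated assumptions, not the pipeline's own implementation: hits are assumed to be available as (query, subject, score) tuples for both search directions, higher scores are assumed to be better, and the names best_hits and reciprocal_best_hits and the sequence labels are hypothetical.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward, reverse):
    """Return sorted (a, b) pairs where b is a's best hit and a is b's best hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


# Placeholder sequence names and scores, for illustration only.
forward = [("h1", "m1", 90.0), ("h1", "m2", 80.0), ("h2", "m2", 70.0)]
reverse = [("m1", "h1", 88.0), ("m2", "h1", 85.0), ("m2", "h2", 60.0)]
pairs = reciprocal_best_hits(forward, reverse)
print(pairs)  # [('h1', 'm1')]
```

In this example h2's best hit is m2, but m2's best hit is h1, so the h2/m2 pair is not reciprocal and is dropped.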

Perl Modules:

Data:

Statistics:

(c) Stephen Montgomery, Creative Commons Attribution-NonCommercial, 2007