### ARCS: Assembly Roundup by Chromium Scaffolding

Sarah Yeo, Lauren Coombe, Justin Chu, Rene L Warren, and Inanc Birol

BC Cancer Agency, Genome Sciences Centre, Vancouver, BC V5Z 4S6, Canada

### Summary of the commands we ran for each scaffolding tool

A) ARCS was run in 3 stages with the commands:
----------------------------------------------

1) arcs -f hsapiens.fa -a human-alignments.fof -s 98 -c 5 -m 50-1000 -d 0 -r 0.05 -e 30000 -i 16 -z 3000 -v 1

2) python makeTSVfile.py hsapiens.fa.scaff_s98_c5_l0_d0_e30000_r0.05_original.gv human_c5r0.05e30000.tigpair_checkpoint.tsv hsapiens.fa

3) LINKS -f hsapiens.fa -s empty.fof -k 20 -b human_c5r0.05e30000 -l 5 -t 2 -a 0.3

- human-alignments.fof is a file of BAM filenames (the BAM files contain Chromium reads aligned to the draft assembly).
- The script makeTSVfile.py can be found at: https://github.com/bcgsc/arcs/blob/binomialx2/Examples/makeTSVfile.py
- empty.fof is an empty file created with the command: touch empty.fof
- Stage 3 (LINKS) was run multiple times with the following values of -a: 0.3, 0.5, 0.7, 0.9

B) Architect was run in 4 stages with the commands:
-----------------------------------------------------

1) python bam2containment.py -b human-scaffolds_over3kb.sorted.bam -t 5 -c human-scaffold_t5.containment

2) python barcode-to-int.py human-scaffold_t5.containment human-scaffold_t5-intBarcodes.containment

3) python architect.py scaffold --fasta human-scaffolds_over3kb.fa --containment human-scaffold_t5-intBarcodes.containment --out human-scaffolds-architect_t5-abs3-rel0.1-prun0.2 --rc-abs-thr 3 --rc-rel-edge-thr 0.1 --rc-rel-prun-thr 0.2

4) cat human-scaffolds-architect_t5-abs3-rel0.1-prun0.2'.fasta' |perl -ne 'chomp; if(/(>\S+)/){print "$1_architect\n";} else {print "$_\n";} ' > human-scaffolds-architect_t5-abs3-rel0.1-prun0.2-headerRename.fasta

   cat human-scaffolds-architect_t5-abs3-rel0.1-prun0.2-headerRename.fasta human-scaffolds_under3kb.fasta > hsapiens-scaffolds_t5_abs3_rel0.1_prun0.2.fasta

- The script barcode-to-int.py can be found in the scripts folder in assemblies/Architect.
- bam2containment.py was altered to accept barcodes as strings instead of integers when generating the containment file.
- Stage 2 converts the barcodes in the containment file from strings to integers, which makes the file compatible with architect.py.
- Stage 4 adds back the scaffolds/contigs shorter than 3 kb so that the analysis is consistent with the other tools.
- Stage 1 was run multiple times with the following values of -t: 5, 10, 20
- Stage 3 was run multiple times with the following values:
  - --rc-abs-thr: 1, 3, 5, 7
  - --rc-rel-edge-thr: 0.1, 0.2, 0.3, 0.4
  - --rc-rel-prun-thr: 0.2 (contigs); 0.1, 0.2, 0.3, 0.4 (scaffolds)

C) fragScaff was run in 3 stages with the commands:
---------------------------------------------------

1) fragScaff.pl -B human-alignments.bam -b 1 -G H -N hsapiens_Nbase.bed -J hsapiens_repeats.bed -m 3000 -E 30000

2) fragScaff.pl -B human-alignments.E30000.o10000.J.N.bamParse -A -C 5 -t 64 -m 3000

3) fragScaff.pl -B human-alignments.E30000.o10000.J.N.bamParse -K human-alignments.bam.E30000.o10000.J.N.r1.links.txt -F hsapiens.fa -j 1 -u 5

- fragScaff requires @RG tags in the header of the alignment file specifying the read groups (@RG\t). Add these tags to the header of the alignment file using samtools reheader.
- To make use of the -G H option in stage 1, change the fragScaff.pl script so that it detects the barcode of an aligned read from the same "readname_" format that ARCS recognizes. Change this line in fragScaff.pl:

  ($null,$group_name) = split(/#/, $P[0]);

  to:

  ($null,$group_name) = split(/_/, $P[0]);

- The hsapiens_Nbase.bed file was generated with the following command:

  1) fasta_make_Nbase_bed.pl hsapiens.fa > hsapiens_Nbase.bed

- The hsapiens_repeats.bed file was generated with the following commands:

  1) blastn -query hsapiens.fa -word_size 36 -perc_identity 95 -outfmt 6 -db hsapiens.fa -out hsapiens-blast.out -num_threads 64

  2) blast_self_alignment_filter.pl hsapiens-blast.out 95 &> out.bed &

  3) sortBed -i out.bed &> out.srt.bed &

  4) mergeBed -i out.srt.bed &> hsapiens_repeats.bed &

- Stage 1 was run multiple times with the following values of -E: 5000, 30000
- Stage 2 was run multiple times with the following values of -C: 5, 10 (contigs); 1, 3, 5 (scaffolds)
- Stage 3 was run multiple times with the following values of -j: 1, 1.25, 2, 3 and -u: 2, 3, 4, 5

D) Supernova was run in 2 stages with the commands:
---------------------------------------------------

1) supernova run --id=NA12878 --fastqs=/path_to_fastqs/NA12878_fastqs --localcores=64 --localmem=500

2) supernova mkoutput --asmdir=/path_to_supernova_output_files/outs/assembly --outprefix=NA12878 --style=pseudohap
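
Several stages in sections A to C were repeated over a list of parameter values. As a minimal sketch, the LINKS sweep from section A (stage 3, -a set to 0.3, 0.5, 0.7 and 0.9) could be scripted as below. The function name links_sweep is hypothetical and the loop only prints the commands (a dry run). How the outputs of successive runs were kept apart is not stated in this summary, so the -b base name is left exactly as in the original command.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one LINKS stage 3 command per value of -a.
# Nothing is executed here; every flag other than -a is copied from section A.
# links_sweep is a hypothetical helper name, not part of ARCS or LINKS.

links_sweep() {
  local a
  for a in 0.3 0.5 0.7 0.9; do
    echo "LINKS -f hsapiens.fa -s empty.fof -k 20 -b human_c5r0.05e30000 -l 5 -t 2 -a $a"
  done
}

links_sweep
```

To run the commands instead of printing them, the output can be piped to a shell (links_sweep | bash), ideally after deciding how each run's output files should be named or moved. The same pattern, with nested loops, would cover the two-parameter sweeps in sections B and C, assuming those were run as full grids.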