constants module

module providing small utility functions and constants used throughout the structural_variant package

mavis.constants.CALL_METHOD = MavisNamespace(CONTIG='contig', FLANK='flanking reads', INPUT='input', SPAN='spanning reads', SPLIT='split reads', _defns={}, _types={'CONTIG': <class 'str'>, 'SPLIT': <class 'str'>, 'FLANK': <class 'str'>, 'SPAN': <class 'str'>, 'INPUT': <class 'str'>})

MavisNamespace – holds controlled vocabulary for allowed call methods

mavis.constants.CIGAR = MavisNamespace(D=2, EQ=7, H=5, I=1, M=0, N=3, P=6, S=4, X=8, _defns={}, _types={'M': <class 'int'>, 'I': <class 'int'>, 'D': <class 'int'>, 'N': <class 'int'>, 'S': <class 'int'>, 'H': <class 'int'>, 'P': <class 'int'>, 'X': <class 'int'>, 'EQ': <class 'int'>})

MavisNamespace – Enum-like. For readable cigar values

  • M: alignment match (can be a sequence match or mismatch)
  • I: insertion to the reference
  • D: deletion from the reference
  • N: skipped region from the reference
  • S: soft clipping (clipped sequences present in SEQ)
  • H: hard clipping (clipped sequences NOT present in SEQ)
  • P: padding (silent deletion from padded reference)
  • EQ: sequence match (=)
  • X: sequence mismatch

note: descriptions are taken from the samfile documentation
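
As a rough illustration of how these integer codes can be used, the sketch below computes how many reference bases an alignment covers from a list of (operation, length) tuples. The `CIGAR` dictionary is a local stand-in that mirrors the values listed above, and `reference_span` is a hypothetical helper that is not part of mavis. The set of reference-consuming operations follows the SAM specification.

```python
# Local stand-in mirroring the values of mavis.constants.CIGAR shown above.
CIGAR = {'M': 0, 'I': 1, 'D': 2, 'N': 3, 'S': 4, 'H': 5, 'P': 6, 'EQ': 7, 'X': 8}

# Operations that consume reference bases according to the SAM specification.
REFERENCE_CONSUMING = {CIGAR['M'], CIGAR['D'], CIGAR['N'], CIGAR['EQ'], CIGAR['X']}


def reference_span(cigar_tuples):
    """Hypothetical helper: sum the lengths of reference-consuming operations."""
    return sum(length for op, length in cigar_tuples if op in REFERENCE_CONSUMING)


# Equivalent to the CIGAR string 5S10M2I3D20M
example = [
    (CIGAR['S'], 5),
    (CIGAR['M'], 10),
    (CIGAR['I'], 2),
    (CIGAR['D'], 3),
    (CIGAR['M'], 20),
]
print(reference_span(example))  # 10 + 3 + 20 = 33
```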

mavis.constants.CODON_SIZE = 3

int – the number of bases making up a codon

mavis.constants.COLUMNS = MavisNamespace(_defns={}, _types={'tracking_id': <class 'str'>, 'library': <class 'str'>, 'cluster_id': <class 'str'>, 'cluster_size': <class 'str'>, 'validation_id': <class 'str'>, 'annotation_id': <class 'str'>, 'product_id': <class 'str'>, 'event_type': <class 'str'>, 'pairing': <class 'str'>, 'inferred_pairing': <class 'str'>, 'gene1': <class 'str'>, 'gene1_direction': <class 'str'>, 'gene2': <class 'str'>, 'gene2_direction': <class 'str'>, 'gene1_aliases': <class 'str'>, 'gene2_aliases': <class 'str'>, 'gene_product_type': <class 'str'>, 'transcript1': <class 'str'>, 'transcript2': <class 'str'>, 'fusion_splicing_pattern': <class 'str'>, 'fusion_cdna_coding_start': <class 'str'>, 'fusion_cdna_coding_end': <class 'str'>, 'fusion_mapped_domains': <class 'str'>, 'fusion_sequence_fasta_id': <class 'str'>, 'fusion_sequence_fasta_file': <class 'str'>, 'annotation_figure': <class 'str'>, 'annotation_figure_legend': <class 'str'>, 'genes_encompassed': <class 'str'>, 'genes_overlapping_break1': <class 'str'>, 'genes_overlapping_break2': <class 'str'>, 'genes_proximal_to_break1': <class 'str'>, 'genes_proximal_to_break2': <class 'str'>, 'break1_chromosome': <class 'str'>, 'break1_position_start': <class 'str'>, 'break1_position_end': <class 'str'>, 'break1_orientation': <class 'str'>, 'exon_last_5prime': <class 'str'>, 'exon_first_3prime': <class 'str'>, 'break1_strand': <class 'str'>, 'break1_seq': <class 'str'>, 'break2_chromosome': <class 'str'>, 'break2_position_start': <class 'str'>, 'break2_position_end': <class 'str'>, 'break2_orientation': <class 'str'>, 'break2_strand': <class 'str'>, 'break2_seq': <class 'str'>, 'opposing_strands': <class 'str'>, 'stranded': <class 'str'>, 'protocol': <class 'str'>, 'disease_status': <class 'str'>, 'tools': <class 'str'>, 'call_method': <class 'str'>, 'break1_ewindow': <class 'str'>, 'break1_ewindow_count': <class 'str'>, 'break1_ewindow_practical_coverage': <class 'str'>, 'break1_homologous_seq': 
<class 'str'>, 'break1_split_read_names': <class 'str'>, 'break1_split_reads': <class 'str'>, 'break1_split_reads_forced': <class 'str'>, 'break2_ewindow': <class 'str'>, 'break2_ewindow_count': <class 'str'>, 'break2_ewindow_practical_coverage': <class 'str'>, 'break2_homologous_seq': <class 'str'>, 'break2_split_read_names': <class 'str'>, 'break2_split_reads': <class 'str'>, 'break2_split_reads_forced': <class 'str'>, 'contig_alignment_query_consumption': <class 'str'>, 'contig_alignment_score': <class 'str'>, 'contig_alignment_query_name': <class 'str'>, 'contig_read_depth': <class 'str'>, 'contig_break1_read_depth': <class 'str'>, 'contig_break2_read_depth': <class 'str'>, 'contig_blat_rank': <class 'str'>, 'contig_build_score': <class 'str'>, 'contig_remap_score': <class 'str'>, 'contig_remap_coverage': <class 'str'>, 'contig_remapped_read_names': <class 'str'>, 'contig_remapped_reads': <class 'str'>, 'contig_seq': <class 'str'>, 'contig_strand_specific': <class 'str'>, 'contigs_aligned': <class 'str'>, 'contigs_assembled': <class 'str'>, 'spanning_reads': <class 'str'>, 'spanning_read_names': <class 'str'>, 'flanking_median_fragment_size': <class 'str'>, 'flanking_pairs': <class 'str'>, 'flanking_pairs_compatible': <class 'str'>, 'flanking_pairs_read_names': <class 'str'>, 'flanking_pairs_compatible_read_names': <class 'str'>, 'flanking_stdev_fragment_size': <class 'str'>, 'linking_split_read_names': <class 'str'>, 'linking_split_reads': <class 'str'>, 'raw_break1_half_mapped_reads': <class 'str'>, 'raw_break1_split_reads': <class 'str'>, 'raw_break2_half_mapped_reads': <class 'str'>, 'raw_break2_split_reads': <class 'str'>, 'raw_flanking_pairs': <class 'str'>, 'raw_spanning_reads': <class 'str'>, 'untemplated_seq': <class 'str'>, 'filter_comment': <class 'str'>, 'cdna_synon': <class 'str'>, 'protein_synon': <class 'str'>}, annotation_figure='annotation_figure', annotation_figure_legend='annotation_figure_legend', annotation_id='annotation_id', 
break1_chromosome='break1_chromosome', break1_ewindow='break1_ewindow', break1_ewindow_count='break1_ewindow_count', break1_ewindow_practical_coverage='break1_ewindow_practical_coverage', break1_homologous_seq='break1_homologous_seq', break1_orientation='break1_orientation', break1_position_end='break1_position_end', break1_position_start='break1_position_start', break1_seq='break1_seq', break1_split_read_names='break1_split_read_names', break1_split_reads='break1_split_reads', break1_split_reads_forced='break1_split_reads_forced', break1_strand='break1_strand', break2_chromosome='break2_chromosome', break2_ewindow='break2_ewindow', break2_ewindow_count='break2_ewindow_count', break2_ewindow_practical_coverage='break2_ewindow_practical_coverage', break2_homologous_seq='break2_homologous_seq', break2_orientation='break2_orientation', break2_position_end='break2_position_end', break2_position_start='break2_position_start', break2_seq='break2_seq', break2_split_read_names='break2_split_read_names', break2_split_reads='break2_split_reads', break2_split_reads_forced='break2_split_reads_forced', break2_strand='break2_strand', call_method='call_method', cdna_synon='cdna_synon', cluster_id='cluster_id', cluster_size='cluster_size', contig_alignment_query_consumption='contig_alignment_query_consumption', contig_alignment_query_name='contig_alignment_query_name', contig_alignment_score='contig_alignment_score', contig_blat_rank='contig_blat_rank', contig_break1_read_depth='contig_break1_read_depth', contig_break2_read_depth='contig_break2_read_depth', contig_build_score='contig_build_score', contig_read_depth='contig_read_depth', contig_remap_coverage='contig_remap_coverage', contig_remap_score='contig_remap_score', contig_remapped_read_names='contig_remapped_read_names', contig_remapped_reads='contig_remapped_reads', contig_seq='contig_seq', contig_strand_specific='contig_strand_specific', contigs_aligned='contigs_aligned', contigs_assembled='contigs_assembled', 
disease_status='disease_status', event_type='event_type', exon_first_3prime='exon_first_3prime', exon_last_5prime='exon_last_5prime', filter_comment='filter_comment', flanking_median_fragment_size='flanking_median_fragment_size', flanking_pairs='flanking_pairs', flanking_pairs_compatible='flanking_pairs_compatible', flanking_pairs_compatible_read_names='flanking_pairs_compatible_read_names', flanking_pairs_read_names='flanking_pairs_read_names', flanking_stdev_fragment_size='flanking_stdev_fragment_size', fusion_cdna_coding_end='fusion_cdna_coding_end', fusion_cdna_coding_start='fusion_cdna_coding_start', fusion_mapped_domains='fusion_mapped_domains', fusion_sequence_fasta_file='fusion_sequence_fasta_file', fusion_sequence_fasta_id='fusion_sequence_fasta_id', fusion_splicing_pattern='fusion_splicing_pattern', gene1='gene1', gene1_aliases='gene1_aliases', gene1_direction='gene1_direction', gene2='gene2', gene2_aliases='gene2_aliases', gene2_direction='gene2_direction', gene_product_type='gene_product_type', genes_encompassed='genes_encompassed', genes_overlapping_break1='genes_overlapping_break1', genes_overlapping_break2='genes_overlapping_break2', genes_proximal_to_break1='genes_proximal_to_break1', genes_proximal_to_break2='genes_proximal_to_break2', inferred_pairing='inferred_pairing', library='library', linking_split_read_names='linking_split_read_names', linking_split_reads='linking_split_reads', opposing_strands='opposing_strands', pairing='pairing', product_id='product_id', protein_synon='protein_synon', protocol='protocol', raw_break1_half_mapped_reads='raw_break1_half_mapped_reads', raw_break1_split_reads='raw_break1_split_reads', raw_break2_half_mapped_reads='raw_break2_half_mapped_reads', raw_break2_split_reads='raw_break2_split_reads', raw_flanking_pairs='raw_flanking_pairs', raw_spanning_reads='raw_spanning_reads', spanning_read_names='spanning_read_names', spanning_reads='spanning_reads', stranded='stranded', tools='tools', tracking_id='tracking_id', 
transcript1='transcript1', transcript2='transcript2', untemplated_seq='untemplated_seq', validation_id='validation_id')

MavisNamespace – Column names for i/o files used throughout the pipeline

mavis.constants.DISEASE_STATUS = MavisNamespace(DISEASED='diseased', NORMAL='normal', _defns={}, _types={'DISEASED': <class 'str'>, 'NORMAL': <class 'str'>})

MavisNamespace – holds controlled vocabulary for allowed disease status

  • DISEASED: diseased
  • NORMAL: normal
mavis.constants.GENE_PRODUCT_TYPE = MavisNamespace(ANTI_SENSE='anti-sense', SENSE='sense', _defns={}, _types={'SENSE': <class 'str'>, 'ANTI_SENSE': <class 'str'>})

MavisNamespace – controlled vocabulary for gene products

  • SENSE: the gene product is a sense fusion
  • ANTI_SENSE: the gene product is anti-sense
mavis.constants.GIEMSA_STAIN = MavisNamespace(ACEN='acen', GNEG='gneg', GPOS100='gpos100', GPOS25='gpos25', GPOS50='gpos50', GPOS75='gpos75', GVAR='gvar', STALK='stalk', _defns={}, _types={'GNEG': <class 'str'>, 'GPOS50': <class 'str'>, 'GPOS75': <class 'str'>, 'GPOS25': <class 'str'>, 'GPOS100': <class 'str'>, 'ACEN': <class 'str'>, 'GVAR': <class 'str'>, 'STALK': <class 'str'>})

MavisNamespace – holds controlled vocabulary relating to stains of chromosome bands

class mavis.constants.MavisNamespace(**kwargs)[source]

Bases: argparse.Namespace

Namespace to hold module constants

Example

>>> nspace = MavisNamespace(thing=1, otherthing=2)
>>> nspace.thing
1
>>> nspace.otherthing
2
add(attr, *pos, **kwargs)[source]

Add an attribute to the namespace. Optionally include a cast_type and a definition

Example

>>> nspace = MavisNamespace()
>>> nspace.add('thing', 1, int, 'I am a thing')
>>> nspace = MavisNamespace()
>>> nspace.add('thing', 1, int)
>>> nspace = MavisNamespace()
>>> nspace.add('thing', 1)
>>> nspace = MavisNamespace()
>>> nspace.add('thing', value=1, cast_type=int, defn='I am a thing')
define(attr, *pos)[source]

Get the definition of a given attribute or return a default (when given) if the attribute does not exist

Returns:definition for the attribute
Return type:str
Raises:KeyError – the attribute does not exist and a default was not given

Example

>>> nspace = MavisNamespace()
>>> nspace.add('thing', 1, defn='I am a thing')
>>> nspace.add('otherthing', 2)
>>> nspace.define('thing')
'I am a thing'
>>> nspace.define('otherthing')
Traceback (most recent call last):
....
>>> nspace.define('otherthing', 'I am some other thing')
'I am some other thing'
enforce(value)[source]

checks that the current namespace has a given value

Returns:the input value
Raises:KeyError – the value did not exist

Example

>>> nspace = MavisNamespace(thing=1, otherthing=2)
>>> nspace.enforce(1)
1
>>> nspace.enforce(3)
Traceback (most recent call last):
....
flatten()[source]

returns the namespace (minus types and definitions) as a dictionary

Example

>>> MavisNamespace(thing=1, otherthing=2).flatten()
{'thing': 1, 'otherthing': 2}
get(key, *pos)[source]

get an attribute; if the attribute does not exist, return the default (when given)

Example

>>> nspace = MavisNamespace(thing=1, otherthing=2)
>>> nspace.get('thing', 2)
1
>>> nspace.get('nonexistent_thing', 2)
2
>>> nspace.get('nonexistent_thing')
Traceback (most recent call last):
....
items()[source]

Example

>>> MavisNamespace(thing=1, otherthing=2).items()
[('thing', 1), ('otherthing', 2)]
keys()[source]

get the attribute keys as a list

Example

>>> MavisNamespace(thing=1, otherthing=2).keys()
['thing', 'otherthing']
reserved_attr = ['_types', '_defns']
reverse(value)[source]

for a given value, return the associated key

Parameters:value – the value to get the key/attribute name for

Raises:

Example

>>> nspace = MavisNamespace(thing=1, otherthing=2)
>>> nspace.reverse(1)
'thing'
type(attr)[source]

returns the type of the given attribute

Example

>>> nspace = MavisNamespace(thing=1, otherthing=2)
>>> nspace.type('thing')
<class 'int'>
values()[source]

get the attribute values as a list

Example

>>> MavisNamespace(thing=1, otherthing=2).values()
[1, 2]
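
To make the relationship between the methods above more concrete, here is a minimal sketch of a namespace with the same documented behaviour. `MiniNamespace` is a hypothetical stand-in and not the mavis implementation. In particular, raising `KeyError` from `reverse` when the value is missing or ambiguous is an assumption, as is inferring the type from the value when no cast_type is given.

```python
import argparse


class MiniNamespace(argparse.Namespace):
    """Hypothetical stand-in that mimics the documented MavisNamespace methods."""

    reserved_attr = ['_types', '_defns']

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self._defns = {}
        # Assumption: the type defaults to the type of the initial value.
        self._types = {key: type(val) for key, val in kwargs.items()}

    def keys(self):
        # Skip the bookkeeping attributes so only constants are listed.
        return [k for k in vars(self) if k not in self.reserved_attr]

    def add(self, attr, value, cast_type=None, defn=None):
        setattr(self, attr, value)
        self._types[attr] = cast_type if cast_type is not None else type(value)
        if defn is not None:
            self._defns[attr] = defn

    def define(self, attr, *pos):
        try:
            return self._defns[attr]
        except KeyError:
            if pos:
                return pos[0]
            raise

    def reverse(self, value):
        matches = [k for k in self.keys() if getattr(self, k) == value]
        if len(matches) != 1:
            # Assumption: missing and ambiguous values are both errors.
            raise KeyError(value)
        return matches[0]

    def enforce(self, value):
        self.reverse(value)
        return value


nspace = MiniNamespace(thing=1, otherthing=2)
nspace.add('third', 3, defn='I am third')
print(nspace.keys())
print(nspace.reverse(1))
print(nspace.define('thing', 'no definition'))
```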
mavis.constants.NA_MAPPING_QUALITY = 255

int – mapping quality value to indicate mapping was not performed/calculated

mavis.constants.ORIENT = MavisNamespace(LEFT='L', NS='?', RIGHT='R', _defns={}, _types={'LEFT': <class 'str'>, 'RIGHT': <class 'str'>, 'NS': <class 'str'>}, compare=<function <lambda>>, expand=<function <lambda>>)

MavisNamespace – holds controlled vocabulary for allowed orientation values

  • LEFT: left with respect to the positive/forward strand
  • RIGHT: right with respect to the positive/forward strand
  • NS: orientation is not specified
mavis.constants.PRIME = MavisNamespace(FIVE=5, THREE=3, _defns={}, _types={'FIVE': <class 'int'>, 'THREE': <class 'int'>})

MavisNamespace – holds controlled vocabulary for five prime and three prime values

  • FIVE: five prime
  • THREE: three prime
mavis.constants.PROTOCOL = MavisNamespace(GENOME='genome', TRANS='transcriptome', _defns={}, _types={'GENOME': <class 'str'>, 'TRANS': <class 'str'>})

MavisNamespace – holds controlled vocabulary for allowed protocol values

  • GENOME: genome
  • TRANS: transcriptome
mavis.constants.PYSAM_READ_FLAGS = MavisNamespace(BLAT_ALIGNMENTS='ba', BLAT_PERCENT_IDENTITY='bi', BLAT_PMS='bp', BLAT_RANK='br', BLAT_SCORE='bs', FIRST_IN_PAIR=64, LAST_IN_PAIR=128, MATE_REVERSE=32, MATE_UNMAPPED=8, MULTIMAP=1, RECOMPUTED_CIGAR='rc', REVERSE=16, SECONDARY=256, SUPPLEMENTARY=2048, TARGETED_ALIGNMENT='ta', UNMAPPED=4, _defns={}, _types={'REVERSE': <class 'int'>, 'MATE_REVERSE': <class 'int'>, 'UNMAPPED': <class 'int'>, 'MATE_UNMAPPED': <class 'int'>, 'FIRST_IN_PAIR': <class 'int'>, 'LAST_IN_PAIR': <class 'int'>, 'SECONDARY': <class 'int'>, 'MULTIMAP': <class 'int'>, 'SUPPLEMENTARY': <class 'int'>, 'TARGETED_ALIGNMENT': <class 'str'>, 'RECOMPUTED_CIGAR': <class 'str'>, 'BLAT_RANK': <class 'str'>, 'BLAT_SCORE': <class 'str'>, 'BLAT_ALIGNMENTS': <class 'str'>, 'BLAT_PERCENT_IDENTITY': <class 'str'>, 'BLAT_PMS': <class 'str'>})

MavisNamespace – Enum-like. For readable PYSAM flag constants

  • MULTIMAP: template having multiple segments in sequencing
  • UNMAPPED: segment unmapped
  • MATE_UNMAPPED: next segment in the template unmapped
  • REVERSE: SEQ being reverse complemented
  • MATE_REVERSE: SEQ of the next segment in the template being reverse complemented
  • FIRST_IN_PAIR: the first segment in the template
  • LAST_IN_PAIR: the last segment in the template
  • SECONDARY: secondary alignment
  • SUPPLEMENTARY: supplementary alignment

note: descriptions are taken from the samfile documentation
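
The integer members are bit flags, so a single SAM flag value can carry several of them at once. The sketch below shows one way to decode such a value. `FLAGS` is a local stand-in that copies only the integer members listed above (the string-valued members are left out), and `decode_flag` is a hypothetical helper, not a mavis function.

```python
# Local stand-in mirroring the integer members of mavis.constants.PYSAM_READ_FLAGS.
FLAGS = {
    'MULTIMAP': 1,
    'UNMAPPED': 4,
    'MATE_UNMAPPED': 8,
    'REVERSE': 16,
    'MATE_REVERSE': 32,
    'FIRST_IN_PAIR': 64,
    'LAST_IN_PAIR': 128,
    'SECONDARY': 256,
    'SUPPLEMENTARY': 2048,
}


def decode_flag(flag):
    """Hypothetical helper: names of the bits set in a SAM flag, sorted by name."""
    return sorted(name for name, bit in FLAGS.items() if flag & bit)


# 81 = 64 (FIRST_IN_PAIR) + 16 (REVERSE) + 1 (MULTIMAP)
print(decode_flag(81))
```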

mavis.constants.START_AA = 'M'

str – The amino acid expected to start translation

mavis.constants.STOP_AA = '*'

str – The symbol for the stop codon, expected to end translation

mavis.constants.STRAND = MavisNamespace(NEG='-', NS='?', POS='+', _defns={}, _types={'POS': <class 'str'>, 'NEG': <class 'str'>, 'NS': <class 'str'>}, compare=<function <lambda>>, expand=<function <lambda>>)

MavisNamespace – holds controlled vocabulary for allowed strand values

  • POS: the positive/forward strand
  • NEG: the negative/reverse strand
  • NS: strand is not specified
mavis.constants.SUBCOMMAND = MavisNamespace(ANNOTATE='annotate', CHECKER='checker', CLUSTER='cluster', CONFIG='config', CONVERT='convert', OVERLAY='overlay', PAIR='pairing', PIPELINE='pipeline', SUMMARY='summary', VALIDATE='validate', _defns={}, _types={'ANNOTATE': <class 'str'>, 'VALIDATE': <class 'str'>, 'PIPELINE': <class 'str'>, 'CLUSTER': <class 'str'>, 'PAIR': <class 'str'>, 'SUMMARY': <class 'str'>, 'CHECKER': <class 'str'>, 'CONFIG': <class 'str'>, 'CONVERT': <class 'str'>, 'OVERLAY': <class 'str'>})

MavisNamespace – holds controlled vocabulary for allowed pipeline stage values

  • annotate
  • validate
  • pipeline
  • cluster
  • pairing
  • summary
  • checker
  • config
  • convert
  • overlay
mavis.constants.SVTYPE = MavisNamespace(DEL='deletion', DUP='duplication', INS='insertion', INV='inversion', ITRANS='inverted translocation', TRANS='translocation', _defns={}, _types={'DEL': <class 'str'>, 'TRANS': <class 'str'>, 'ITRANS': <class 'str'>, 'INV': <class 'str'>, 'INS': <class 'str'>, 'DUP': <class 'str'>})

MavisNamespace – holds controlled vocabulary for acceptable structural variant classifications

  • DEL: deletion
  • TRANS: translocation
  • ITRANS: inverted translocation
  • INV: inversion
  • INS: insertion
  • DUP: duplication
mavis.constants.float_fraction(num)[source]

cast input to a float and check that it lies between 0 and 1

Parameters:num – input to cast
Returns:float
Raises:TypeError – if the input cannot be cast to a float or the number is not between 0 and 1
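
A minimal sketch of the documented behaviour, for illustration only. This is a stand-in written from the description above rather than the mavis source; treating 0 and 1 themselves as valid is an assumption.

```python
def float_fraction(num):
    """Stand-in: cast to float and require the result to lie in [0, 1]."""
    try:
        value = float(num)
    except (TypeError, ValueError):
        raise TypeError('cannot cast to a float: {!r}'.format(num))
    # Assumption: the bounds are inclusive.
    if not 0 <= value <= 1:
        raise TypeError('must be a value between 0 and 1: {!r}'.format(num))
    return value


print(float_fraction('0.25'))
```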
mavis.constants.reverse_complement(s)[source]

wrapper for the Bio.Seq reverse_complement method

Parameters:s (str) – the input DNA sequence
Returns:the reverse complement of the input sequence
Return type:str

Warning

assumes the input is a DNA sequence

Example

>>> reverse_complement('ATCCGGT')
'ACCGGAT'
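
For readers without Biopython at hand, the operation can be approximated in plain Python. The function below is a stand-in that handles only unambiguous DNA bases (A, C, G, T in either case); it is not the mavis wrapper and does not call Bio.Seq.

```python
# Translation table mapping each base to its complement, preserving case.
_COMPLEMENT = str.maketrans('ACGTacgt', 'TGCAtgca')


def reverse_complement(s):
    """Stand-in: complement every base, then reverse the sequence."""
    return s.translate(_COMPLEMENT)[::-1]


print(reverse_complement('ATCCGGT'))  # 'ACCGGAT', matching the example above
```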
mavis.constants.sort_columns(input_columns)[source]
mavis.constants.translate(s, reading_frame=0)[source]

given a DNA sequence, translates it and returns the protein amino acid sequence

Parameters:
  • s (str) – the input DNA sequence
  • reading_frame (int) – where to start translating the sequence
Returns:the amino acid sequence
Return type:str
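
The sketch below illustrates how a reading frame offset and the codon size interact during translation. It is a stand-in built on the standard genetic code, not the mavis implementation. Dropping a trailing incomplete codon and supporting only the bases A, C, G and T are assumptions made for the example.

```python
from itertools import product

CODON_SIZE = 3
BASES = 'TCAG'
# Standard genetic code, ordered to match product(BASES, repeat=3).
AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
CODON_TABLE = {
    ''.join(codon): aa
    for codon, aa in zip(product(BASES, repeat=CODON_SIZE), AMINO_ACIDS)
}


def translate(s, reading_frame=0):
    """Stand-in: translate DNA starting at the given offset."""
    s = s.upper()[reading_frame:]
    # Assumption: bases left over after the last full codon are ignored.
    usable = len(s) - len(s) % CODON_SIZE
    return ''.join(
        CODON_TABLE[s[i:i + CODON_SIZE]] for i in range(0, usable, CODON_SIZE)
    )


print(translate('ATGGCCTAA'))                    # MA*
print(translate('CATGGCCTAA', reading_frame=1))  # MA*
```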