variant module

class mavis.annotate.variant.Annotation(bpp, transcript1=None, transcript2=None, data={}, event_type=None, proximity=5000)

Bases: mavis.breakpoint.BreakpointPair

A fusion of two transcripts created by the associated breakpoint pair. Also holds the other annotations for overlapping, encompassed, and nearest genes.

Holds a breakpoint call and a set of transcripts; other information is gathered relative to these.

Parameters:
  • bpp (BreakpointPair) – the breakpoint pair call, which is adjusted based on the transcripts and then stored
  • transcript1 (Transcript) – the transcript at the first breakpoint
  • transcript2 (Transcript) – the transcript at the second breakpoint
  • data (dict) – optional dictionary to hold related attributes
  • event_type (SVTYPE) – the type of event
add_gene(gene)

Adds a gene to the current set of annotations, after checking which set (overlapping, encompassed, or nearest) it should be added to.

Parameters:gene (Gene) – the gene being added
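
The set selection described for add_gene can be pictured with a small standalone sketch. It is not the MAVIS implementation: the `Interval` tuple, the helper names, the set labels, and the assumption that both breakpoints lie on the same chromosome are all hypothetical. Using `proximity` as a distance cutoff for "nearest" genes is also an assumption based on the constructor signature.

```python
from collections import namedtuple

# Hypothetical stand-in for a genomic interval (1-based, inclusive).
Interval = namedtuple("Interval", ["start", "end"])


def overlaps(a, b):
    """True if two inclusive intervals share at least one position."""
    return a.start <= b.end and b.start <= a.end


def gap(a, b):
    """Distance between the closest ends of two intervals (0 if they overlap)."""
    if overlaps(a, b):
        return 0
    return b.start - a.end if a.end < b.start else a.start - b.end


def classify_gene(gene, break1, break2, proximity=5000):
    """Name the set a gene would join, or None if it is too far away.

    Assumes break1 lies to the left of break2 on the same chromosome.
    """
    if overlaps(gene, break1) or overlaps(gene, break2):
        return "overlapping"
    if break1.end < gene.start and gene.end < break2.start:
        return "encompassed"
    if min(gap(gene, break1), gap(gene, break2)) <= proximity:
        return "nearest"
    return None
```

With breakpoints at 100-110 and 500-510, a gene spanning 200-300 falls between them and would be labelled "encompassed" in this sketch.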
flatten()

generates a dictionary of the annotation information as strings

Returns:dictionary of attribute names and values
Return type:dict of str by str
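
As a rough illustration of the "dict of str by str" return type, the sketch below converts an arbitrary attribute mapping into string keys and string values. The function name, the `;` separator for collections, and the sample attribute names are assumptions made for this example. They are not taken from the MAVIS source.

```python
def flatten_attributes(attributes):
    """Return a dict of str by str built from an attribute mapping."""
    flat = {}
    for name, value in attributes.items():
        # Sets have no stable order, so sort them for reproducible output.
        if isinstance(value, (set, frozenset)):
            value = sorted(value, key=str)
        if isinstance(value, (list, tuple)):
            # Assumption: collections are joined with ';'.
            flat[str(name)] = ";".join(str(item) for item in value)
        else:
            flat[str(name)] = str(value)
    return flat
```

Keeping every value a string makes such a dictionary easy to write out as one row of a tab-delimited file.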
class mavis.annotate.variant.FusionTranscript

Bases: mavis.annotate.genomic.usTranscript

classmethod build(ann, REFERENCE_GENOME, min_orf_size=None, max_orf_cap=None, min_domain_mapping_match=None)
Parameters:
  • ann (Annotation) – the annotation object we want to build a FusionTranscript for
  • REFERENCE_GENOME (dict of Bio.SeqRecord by str) – dict of reference sequence by template/chr name
Returns:the newly built fusion transcript
Return type:FusionTranscript

exon_number(exon)
Parameters:exon (Exon) – the exon to be numbered
Returns:the number of the exon in the original transcript (prior to fusion)
Return type:int
get_seq(REFERENCE_GENOME=None, ignore_cache=False)
get_spliced_cdna_seq(splicing_pattern, REFERENCE_GENOME=None, ignore_cache=False)
Parameters:
  • splicing_pattern (list of int) – the list of splicing positions
  • REFERENCE_GENOME (dict of Bio.SeqRecord by str) – dict of reference seq by template/chr name
Returns:the spliced cDNA seq
Return type:str
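
To show how a list of splicing positions can turn an unspliced sequence into a cDNA sequence, here is a minimal sketch. It assumes the positions are 1-based, inclusive, relative to the start of the sequence, and alternate between the last base of one exon and the first base of the next. The function name and these conventions are assumptions for illustration, not the documented behaviour of get_spliced_cdna_seq.

```python
def splice_sequence(seq, splicing_pattern):
    """Join the exonic segments of seq delimited by the splicing positions."""
    # Add the sequence ends so that consecutive pairs delimit exons.
    bounds = sorted([1] + list(splicing_pattern) + [len(seq)])
    if len(bounds) % 2 != 0:
        raise ValueError("splicing positions must come in donor/acceptor pairs")
    exons = []
    for i in range(0, len(bounds), 2):
        start, end = bounds[i], bounds[i + 1]
        # Convert 1-based inclusive coordinates to a Python slice.
        exons.append(seq[start - 1:end])
    return "".join(exons)
```

For example, with the 12-base sequence "AAAAggggCCCC" and the pattern [4, 9], bases 5 to 8 are treated as an intron and the two flanking exons are joined.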

mavis.annotate.variant.annotate_events(bpps, annotations, reference_genome, max_proximity=5000, min_orf_size=200, min_domain_mapping_match=0.95, max_orf_cap=3, log=<function <lambda>>)
mavis.annotate.variant.determine_prime(transcript, breakpoint)

Determines which side of the transcript (5’ or 3’) is ‘kept’ given the breakpoint.

Parameters:
  • transcript (Transcript) – the transcript
  • breakpoint (Breakpoint) – the breakpoint
Returns:5’ or 3’
Return type:PRIME
Raises:AttributeError – if the orientation of the breakpoint or the strand of the transcript is not specified
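
The decision can be sketched as a lookup on strand and orientation. The sketch below uses plain strings instead of the library's constants, and it assumes that a left ("L") orientation means the sequence to the left of the breakpoint is retained. The function name and the string values are hypothetical.

```python
def determine_prime_sketch(strand, orient):
    """Return "5'" or "3'" for the retained side of a transcript.

    strand: "+" or "-" (strand of the transcript)
    orient: "L" or "R" (side of the breakpoint that is retained; assumption)
    """
    if strand not in ("+", "-"):
        raise AttributeError("the strand of the transcript is not specified")
    if orient not in ("L", "R"):
        raise AttributeError("the orientation of the breakpoint is not specified")
    if strand == "+":
        # On the forward strand the left side holds the 5' end.
        return "5'" if orient == "L" else "3'"
    # On the reverse strand the left side holds the 3' end.
    return "3'" if orient == "L" else "5'"
```

A transcript on the reverse strand is read right to left, so retaining the left side of the breakpoint keeps its 3’ end in this sketch.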

mavis.annotate.variant.overlapping_transcripts(ref_ann, breakpoint)
Parameters:
  • ref_ann (dict of list of Gene by str) – the reference list of genes split by chromosome
  • breakpoint (Breakpoint) – the breakpoint in question
Returns:a list of possible transcripts
Return type:list of usTranscript
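
The lookup described here amounts to selecting the genes on the breakpoint's chromosome and keeping the transcripts whose coordinates overlap the breakpoint. A minimal sketch follows, using hypothetical namedtuple stand-ins for Gene, Transcript, and Breakpoint. It ignores strand, which the real function may or may not take into account.

```python
from collections import namedtuple

# Hypothetical stand-ins for the library's classes (inclusive coordinates).
Transcript = namedtuple("Transcript", ["name", "start", "end"])
Gene = namedtuple("Gene", ["name", "transcripts"])
Breakpoint = namedtuple("Breakpoint", ["chr", "start", "end"])


def overlapping_transcripts_sketch(ref_ann, breakpoint):
    """Collect transcripts on the breakpoint's chromosome that overlap it."""
    hits = []
    # Chromosomes missing from the reference simply yield no transcripts.
    for gene in ref_ann.get(breakpoint.chr, []):
        for transcript in gene.transcripts:
            if transcript.start <= breakpoint.end and breakpoint.start <= transcript.end:
                hits.append(transcript)
    return hits
```

Splitting the reference by chromosome, as the ref_ann parameter does, keeps the search limited to genes that could possibly overlap the breakpoint.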