variant module

class mavis.annotate.variant.Annotation(bpp, transcript1=None, transcript2=None, proximity=5000, data=None, **kwargs)

Bases: mavis.breakpoint.BreakpointPair

A fusion of two transcripts created by the associated breakpoint pair. Also holds the other annotations for overlapping, encompassed, and nearest genes.

Holds a breakpoint call and a set of transcripts; other information is gathered relative to these.

Parameters:
	bpp (BreakpointPair) – the breakpoint pair call. Will be adjusted and then stored based on the transcripts
	transcript1 (Transcript) – transcript at the first breakpoint
	transcript2 (Transcript) – transcript at the second breakpoint
	data (dict) – optional dictionary to hold related attributes
	event_type (SVTYPE) – the type of event

add_gene(gene)

Adds a gene to the current set of annotations, after checking which set it should be added to.

Parameters:	gene (Gene) – the gene being added

flatten()

Generates a dictionary of the annotation information as strings.

Returns:	dictionary of attribute names and values
Return type:	`dict` of `str` by `str`
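To make the `dict` of `str` by `str` return shape concrete, here is a minimal sketch on a stand-in object. `ToyAnnotation` and its attribute names are hypothetical and are not part of mavis; the real `Annotation.flatten()` chooses its own keys.

```python
class ToyAnnotation:
    """Hypothetical stand-in for an annotation; not the mavis class."""

    def __init__(self, gene1, gene2, event_type, data=None):
        self.gene1 = gene1
        self.gene2 = gene2
        self.event_type = event_type
        self.data = dict(data or {})  # optional related attributes

    def flatten(self):
        """Return attribute names mapped to string values."""
        row = {
            'gene1': self.gene1,
            'gene2': self.gene2,
            'event_type': self.event_type,
        }
        row.update(self.data)
        # coerce every key and value to str so the row can be written as text
        return {str(key): str(value) for key, value in row.items()}


toy_row = ToyAnnotation('GENE_A', 'GENE_B', 'deletion', {'proximity': 5000}).flatten()
```

Coercing everything to strings means a flattened row can be written straight to a delimited text file without per-column formatting.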

class mavis.annotate.variant.FusionTranscript

Bases: mavis.annotate.genomic.usTranscript

classmethod build(ann, REFERENCE_GENOME, min_orf_size=None, max_orf_cap=None, min_domain_mapping_match=None)

Parameters:
	ann (Annotation) – the annotation object we want to build a FusionTranscript for
	REFERENCE_GENOME (`dict` of `Bio.SeqRecord` by `str`) – dict of reference sequence by template/chr name
Returns:	the newly built fusion transcript
Return type:	FusionTranscript

exon_number(exon)

Parameters:	exon (Exon) – the exon to be numbered
Returns:	the number of the exon in the original transcript (prior to fusion)
Return type:	int

get_seq(REFERENCE_GENOME=None, ignore_cache=False)

get_spliced_cdna_seq(splicing_pattern, REFERENCE_GENOME=None, ignore_cache=False)

Parameters:
	splicing_pattern (`list` of `int`) – the list of splicing positions
	REFERENCE_GENOME (`dict` of `Bio.SeqRecord` by `str`) – dict of reference seq by template/chr name
Returns:	the spliced cDNA seq
Return type:	str
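One way a list of splicing positions could be turned into a spliced sequence is sketched below. This is an illustration under stated assumptions, not the mavis implementation: `splice_sequence` is a hypothetical helper, coordinates are taken to be 1-based and inclusive, the positions are assumed to alternate between the last base of one exon and the first base of the next, and strand handling is ignored.

```python
def splice_sequence(seq, start, end, splicing_pattern):
    """Join the exonic pieces of seq, which spans positions start..end."""
    # Assumption: start, the splice positions and end pair up as exon bounds
    bounds = sorted([start] + list(splicing_pattern) + [end])
    if len(bounds) % 2:
        raise ValueError('splice positions must come in pairs')
    pieces = []
    for i in range(0, len(bounds), 2):
        exon_start, exon_end = bounds[i], bounds[i + 1]
        # shift 1-based inclusive coordinates to 0-based slice indices
        pieces.append(seq[exon_start - start:exon_end - start + 1])
    return ''.join(pieces)


# two 5-base exons separated by a 5-base intron
toy_seq = 'AAAAA' + 'GTTAG' + 'CCCCC'
spliced = splice_sequence(toy_seq, 1, 15, [5, 11])
```

With an empty pattern the whole sequence is returned, which corresponds to an unspliced transcript.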

map_region_to_genome(chr, interval_on_fusion, genome_interval, flipped=False)

mavis.annotate.variant.annotate_events(bpps, annotations, reference_genome, max_proximity=5000, min_orf_size=200, min_domain_mapping_match=0.95, max_orf_cap=3, log=<function devnull>, filters=None)

Parameters:
	bpps (list of `BreakpointPair`) – list of events
	annotations – reference annotations
	reference_genome (dict of string by string) – dictionary of reference sequences by name
	max_proximity (int) – see max_proximity
	min_orf_size (int) – see min_orf_size
	min_domain_mapping_match (float) – see min_domain_mapping_match
	max_orf_cap (int) – see max_orf_cap
	log (callable) – callable function to take in strings and time_stamp args
	filters (list of callable) – list of functions taking in a list and returning a list for filtering
Returns:	list of the putative annotations
Return type:	list of `Annotation`
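The `filters` argument is described as a list of functions that each take a list and return a list. A minimal sketch of how such filters could be chained follows. `apply_filters`, `drop_empty`, `keep_first` and the toy records are hypothetical, and applying the filters left to right is an assumption rather than documented behaviour of annotate_events.

```python
def apply_filters(items, filters=None):
    """Pass a list through each filter in turn (assumed order: left to right)."""
    result = list(items)
    for func in filters or []:
        result = func(result)
    return result


def drop_empty(records):
    """Hypothetical filter: keep records that name at least one transcript."""
    return [rec for rec in records if rec['transcripts']]


def keep_first(records):
    """Hypothetical filter: keep only the first remaining record."""
    return records[:1]


toy_records = [
    {'id': 'a', 'transcripts': []},
    {'id': 'b', 'transcripts': ['T1']},
    {'id': 'c', 'transcripts': ['T2', 'T3']},
]
filtered = apply_filters(toy_records, [drop_empty, keep_first])
```

Because every filter has the same list-in, list-out shape, functions such as choose_more_annotated and choose_transcripts_by_priority fit this interface.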

mavis.annotate.variant.choose_more_annotated(ann_list)

For a given set of annotations, if some annotations contain transcripts and others are simply intergenic regions, discards the intergenic-region annotations.

Similarly, if some annotations have both breakpoints in a transcript and others have one or more breakpoints in an intergenic region, discards those with a breakpoint in an intergenic region.

Parameters:	ann_list (list of `Annotation`) – list of input annotations

Warning

Input annotations are assumed to be the same event (the same validation_id); the logic used would not apply to different events.

Returns:	the filtered list
Return type:	list of `Annotation`
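Read together, the two rules above amount to keeping the annotations with the most breakpoints inside a transcript. The sketch below illustrates that reading on stand-in data. `ToyAnn`, `transcript_count` and `keep_more_annotated` are hypothetical, and using `None` to mark an intergenic breakpoint is an assumption of this example.

```python
from collections import namedtuple

# Hypothetical stand-in: a transcript of None marks an intergenic breakpoint
ToyAnn = namedtuple('ToyAnn', ['name', 'transcript1', 'transcript2'])


def transcript_count(ann):
    """Number of breakpoints (0, 1 or 2) that fall in a transcript."""
    return sum(t is not None for t in (ann.transcript1, ann.transcript2))


def keep_more_annotated(ann_list):
    """Keep only the annotations with the most breakpoints in transcripts."""
    if not ann_list:
        return []
    best = max(transcript_count(ann) for ann in ann_list)
    return [ann for ann in ann_list if transcript_count(ann) == best]


# three candidate annotations for one event
same_event = [
    ToyAnn('intergenic', None, None),
    ToyAnn('one_sided', 'T1', None),
    ToyAnn('two_sided', 'T1', 'T2'),
]
kept = keep_more_annotated(same_event)
```

If every candidate is intergenic, nothing is discarded, since there is no better-annotated alternative to prefer.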

mavis.annotate.variant.choose_transcripts_by_priority(ann_list)

For each set of annotations with the same combination of genes, chooses the annotation with the most “best_transcripts” or the most “alphanumeric” choice of transcripts. Throws an error if the candidates are identical.

Parameters:	ann_list (list of `Annotation`) – input annotations

Warning

Input annotations are assumed to be the same event (the same validation_id); the logic used would not apply to different events.

Returns:	the filtered list
Return type:	list of `Annotation`
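A rough sketch of this kind of prioritisation is given below. Everything in it is an assumption made for illustration: the `ToyTranscript` and `ToyPair` stand-ins, the `is_best` flag, reading “alphanumeric” as alphabetical order of transcript names, the use of `ValueError`, and the simplification that both transcripts are always set.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-ins; the real Transcript and Annotation carry far more state
ToyTranscript = namedtuple('ToyTranscript', ['name', 'gene', 'is_best'])
ToyPair = namedtuple('ToyPair', ['transcript1', 'transcript2'])


def sort_key(pair):
    """More best transcripts first, then alphabetical transcript names."""
    transcripts = (pair.transcript1, pair.transcript2)
    best_count = sum(t.is_best for t in transcripts)
    return (-best_count, tuple(t.name for t in transcripts))


def pick_by_priority(pairs):
    """Choose one annotation per gene combination."""
    by_genes = defaultdict(list)
    for pair in pairs:
        by_genes[(pair.transcript1.gene, pair.transcript2.gene)].append(pair)
    chosen = []
    for group in by_genes.values():
        ranked = sorted(group, key=sort_key)
        # an exact tie cannot be resolved, so fail loudly (error type assumed)
        if len(ranked) > 1 and sort_key(ranked[0]) == sort_key(ranked[1]):
            raise ValueError('cannot choose between identical annotations')
        chosen.append(ranked[0])
    return chosen


t_a1 = ToyTranscript('A-001', 'GENE_A', True)
t_a2 = ToyTranscript('A-002', 'GENE_A', False)
t_b1 = ToyTranscript('B-001', 'GENE_B', False)
t_b2 = ToyTranscript('B-002', 'GENE_B', False)
picked = pick_by_priority([ToyPair(t_a2, t_b1), ToyPair(t_a1, t_b2), ToyPair(t_a2, t_b2)])
```

Here the pair containing the flagged transcript `A-001` wins; without any flagged transcript the choice falls back to name order.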

mavis.annotate.variant.determine_prime(transcript, breakpoint)

Determines which side of the transcript, 5’ or 3’, is ‘kept’ given the breakpoint.

Parameters:
	transcript (Transcript) – the transcript
	breakpoint (Breakpoint) – the breakpoint
Returns:	5’ or 3’
Return type:	PRIME
Raises:	`AttributeError` – if the orientation of the breakpoint or the strand of the transcript is not specified
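The decision can be sketched from strand and orientation alone. The enums and `kept_side` below are hypothetical stand-ins rather than the mavis constants, and the sketch assumes that a left orientation means the sequence to the left of the breakpoint (lower genomic coordinates) is the part retained.

```python
from enum import Enum


class Strand(Enum):   # hypothetical stand-in for the strand constants
    POS = '+'
    NEG = '-'


class Orient(Enum):   # hypothetical stand-in for the orientation constants
    LEFT = 'L'
    RIGHT = 'R'


class Prime(Enum):    # hypothetical stand-in for PRIME
    FIVE = 5
    THREE = 3


def kept_side(strand, orient):
    """Return which end of the transcript is retained at a breakpoint."""
    if strand not in (Strand.POS, Strand.NEG):
        raise AttributeError('transcript strand must be specified')
    if orient not in (Orient.LEFT, Orient.RIGHT):
        raise AttributeError('breakpoint orientation must be specified')
    # On the + strand the 5' end lies at the lower genomic coordinate, so
    # keeping the left side keeps the 5' end; the - strand is the mirror image.
    left_is_five_prime = strand is Strand.POS
    keeps_left = orient is Orient.LEFT
    return Prime.FIVE if keeps_left == left_is_five_prime else Prime.THREE
```

An unspecified strand or orientation raises `AttributeError`, mirroring the documented failure mode, because the kept side is undefined without both pieces of information.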

mavis.annotate.variant.overlapping_transcripts(ref_ann, breakpoint)

Parameters:
	ref_ann (`dict` of `list` of `Gene` by `str`) – the reference list of genes split by chromosome
	breakpoint (Breakpoint) – the breakpoint in question
Returns:	a list of possible transcripts
Return type:	`list` of `usTranscript`
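The lookup can be pictured as an interval-overlap search over the genes on the breakpoint's chromosome. The sketch below uses hypothetical stand-ins (`ToyTx`, `ToyGene`, `ToyBreakpoint`, `find_overlapping`) with 1-based inclusive coordinates, and it ignores strand; whether the real function also checks strand is not stated here.

```python
from collections import namedtuple

# Hypothetical stand-ins using 1-based inclusive coordinates
ToyTx = namedtuple('ToyTx', ['name', 'start', 'end'])
ToyGene = namedtuple('ToyGene', ['name', 'transcripts'])
ToyBreakpoint = namedtuple('ToyBreakpoint', ['chr', 'start', 'end'])


def find_overlapping(ref_ann, breakpoint):
    """List transcripts on the breakpoint's chromosome that overlap its interval."""
    hits = []
    for gene in ref_ann.get(breakpoint.chr, []):
        for tx in gene.transcripts:
            # two closed intervals overlap when each starts before the other ends
            if tx.start <= breakpoint.end and breakpoint.start <= tx.end:
                hits.append(tx)
    return hits


toy_ref = {
    '1': [ToyGene('G1', [ToyTx('G1-001', 100, 500), ToyTx('G1-002', 300, 900)])],
    '2': [ToyGene('G2', [ToyTx('G2-001', 100, 500)])],
}
hits = find_overlapping(toy_ref, ToyBreakpoint('1', 600, 610))
```

Keying the reference by chromosome means only genes on the relevant chromosome are scanned, and a chromosome missing from the reference simply yields an empty list.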