file_io module

Module that holds all functions relating to loading reference files.

mavis.annotate.file_io.load_masking_regions(filepath)[source]

Reads a file of regions. The expected input format is tab-delimited, and the header should contain the following columns:

  • chr: the chromosome
  • start: start of the region, 1-based inclusive
  • end: end of the region, 1-based inclusive
  • name: the name/label of the region

For example:

#chr    start   end     name
chr20   25600000        27500000        centromere

Parameters: filepath (str) – path to the input tab-delimited file
Returns: a dictionary keyed by chromosome name with values of lists of regions on the chromosome
Return type: dict of list of BioInterval by str

Example

>>> m = load_masking_regions('filename')
>>> m['1']
[BioInterval(), BioInterval(), ...]
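
The file format described above can be illustrated with a minimal, standalone sketch. It is not the MAVIS implementation: `parse_masking_regions` and the `Region` tuple are hypothetical names used here as stand-ins for `load_masking_regions` and `BioInterval`, and the sketch reads from an in-memory handle rather than a path.

```python
import csv
import io
from collections import namedtuple

# Hypothetical stand-in for BioInterval; coordinates are kept 1-based inclusive
Region = namedtuple('Region', ['chr', 'start', 'end', 'name'])

REQUIRED_COLUMNS = {'chr', 'start', 'end', 'name'}


def parse_masking_regions(handle):
    """Group regions by chromosome from a tab-delimited handle with a header row."""
    regions = {}
    header = None
    for row in csv.reader(handle, delimiter='\t'):
        if not row or all(not cell.strip() for cell in row):
            continue  # skip blank lines
        if header is None:
            # the header line may start with '#', as in the example above
            header = [col.lstrip('#').strip() for col in row]
            missing = REQUIRED_COLUMNS - set(header)
            if missing:
                raise KeyError('missing required columns: {}'.format(sorted(missing)))
            continue
        record = dict(zip(header, row))
        region = Region(
            record['chr'], int(record['start']), int(record['end']), record['name']
        )
        regions.setdefault(region.chr, []).append(region)
    return regions


example = io.StringIO('#chr\tstart\tend\tname\nchr20\t25600000\t27500000\tcentromere\n')
masks = parse_masking_regions(example)
print(masks['chr20'])
```

Keying the result by chromosome name means a lookup such as `masks['chr20']` returns only the regions on that chromosome, which matches the return shape documented above.
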
mavis.annotate.file_io.load_reference_genes(filepath, verbose=True, REFERENCE_GENOME=None, filetype=None, best_transcripts_only=False)[source]

Loads gene models from an input file. Expects a tab-delimited or JSON file.

Parameters:
  • filepath (str) – path to the input file
  • verbose (bool) – output extra information to stdout
  • REFERENCE_GENOME (dict of Bio.SeqRecord by str) – dict of reference sequence by template/chr name
  • filetype (str) – json or tab/tsv. Only required if the file type can’t be inferred from the path extension
Returns: lists of genes keyed by chromosome name
Return type: dict of list of Gene by str
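
As a rough illustration of how the `filetype` argument might interact with the path extension, the sketch below falls back to the extension only when no explicit type is given. The helper name `infer_filetype` and the exact set of recognized extensions are assumptions made for this example, not something taken from the MAVIS source.

```python
import os


def infer_filetype(filepath, filetype=None):
    """Return 'json' or 'tab', preferring an explicit filetype over the extension."""
    if filetype is None:
        # fall back to the path extension, e.g. 'genes.JSON' -> 'json'
        filetype = os.path.splitext(filepath)[1].lstrip('.')
    filetype = filetype.lower()
    if filetype == 'json':
        return 'json'
    if filetype in ('tab', 'tsv'):
        return 'tab'
    raise ValueError(
        'cannot determine file type for {!r}; pass filetype explicitly'.format(filepath)
    )


print(infer_filetype('annotations.json'))
print(infer_filetype('annotations.tsv'))
print(infer_filetype('annotations.txt', filetype='tab'))
```

Raising an error for an unrecognized extension, rather than guessing, makes it obvious to the caller when `filetype` has to be supplied.
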

mavis.annotate.file_io.load_reference_genome(filename, low_mem=False)[source]
Parameters: filename (str) – the path to the file containing the input fasta genome
Returns: a dictionary representing the sequences in the fasta file
Return type: dict of Bio.SeqRecord by str
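
To show the shape of the returned mapping, here is a dependency-free sketch that reads FASTA text into a dictionary keyed by sequence name. It is an illustration only: `read_fasta` is a hypothetical helper, plain strings stand in for `Bio.SeqRecord` objects, and the `low_mem` option is not modelled.

```python
import io


def read_fasta(handle):
    """Map each sequence name (first word of the '>' line) to its sequence string."""
    sequences = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            if name is not None:
                sequences[name] = ''.join(chunks)  # store the previous record
            words = line[1:].split()
            name = words[0] if words else ''
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequence lines may wrap across several lines
    if name is not None:
        sequences[name] = ''.join(chunks)  # store the final record
    return sequences


genome = read_fasta(io.StringIO('>chr1 example\nACGT\nTTAA\n>chr2\nGGCC\n'))
print(sorted(genome))
```
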
mavis.annotate.file_io.load_templates(filename)[source]

Primarily useful if template drawings are required; not necessary otherwise. Assumes the input file is 0-indexed with [start,end) style. Columns are expected in the following order, tab-delimited. A header should not be given.

  1. name
  2. start
  3. end
  4. band_name
  5. giesma_stain

For example:

chr1    0       2300000 p36.33  gneg
chr1    2300000 5400000 p36.32  gpos25

Parameters: filename (str) – the path to the file with the cytoband template information
Returns: list of the templates loaded
Return type: list of Template
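
The column order and coordinate convention described above can be sketched as follows. This is a standalone illustration, not the MAVIS code: `parse_cytobands` and `Band` are hypothetical names, and the conversion from 0-indexed [start,end) to 1-based inclusive coordinates is shown only to make the convention concrete; whether `load_templates` performs the same conversion is an assumption.

```python
import io
from collections import namedtuple

# Hypothetical record for one band; start and end are stored 1-based inclusive
Band = namedtuple('Band', ['template', 'start', 'end', 'band_name', 'stain'])


def parse_cytobands(handle):
    """Read headerless, tab-delimited cytoband rows and group them by template name."""
    bands = {}
    for line in handle:
        line = line.rstrip('\r\n')
        if not line.strip():
            continue
        template, start, end, band_name, stain = line.split('\t')
        # 0-indexed half-open [start, end) -> 1-based inclusive: shift start, keep end
        band = Band(template, int(start) + 1, int(end), band_name, stain)
        bands.setdefault(template, []).append(band)
    return bands


example = io.StringIO(
    'chr1\t0\t2300000\tp36.33\tgneg\n'
    'chr1\t2300000\t5400000\tp36.32\tgpos25\n'
)
bands = parse_cytobands(example)
print(bands['chr1'])
```

With half-open input, the end of one band equals the start of the next; after conversion the two bands are adjacent without overlapping.
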