assemble module

class mavis.assemble.Contig(sequence, score)[source]

Bases: object

add_mapped_sequence(read, multimap=1)[source]
remap_score()[source]
class mavis.assemble.DeBruijnGraph(data=None, **attr)[source]

Bases: networkx.classes.digraph.DiGraph

wrapper for a basic digraph that enforces edge weights

Initialize a graph with edges, name, graph attributes.

Parameters:
  • data (input graph) – Data to initialize graph. If data=None (default) an empty graph is created. The data can be an edge list, or any NetworkX graph object. If the corresponding optional Python packages are installed the data can also be a NumPy matrix or 2d ndarray, a SciPy sparse matrix, or a PyGraphviz graph.
  • name (string, optional (default='')) – An optional name for the graph.
  • attr (keyword arguments, optional (default= no attributes)) – Attributes to add to graph as key=value pairs.

See also

convert

Examples

>>> G = nx.Graph()   # or DiGraph, MultiGraph, MultiDiGraph, etc
>>> G = nx.Graph(name='my graph')
>>> e = [(1,2),(2,3),(3,4)] # list of edges
>>> G = nx.Graph(e)

Arbitrary graph attribute pairs (key=value) may be assigned

>>> G=nx.Graph(e, day="Friday")
>>> G.graph
{'day': 'Friday'}
add_edge(n1, n2, freq=1)[source]

adds the given edge to the graph; if the edge already exists, freq is added to its existing frequency count

get_edge_freq(n1, n2)[source]

returns the frequency (freq) stored in the data attribute of the specified edge
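
The accumulate-on-add behaviour described for add_edge and get_edge_freq can be illustrated without NetworkX. The following is a minimal sketch, not the MAVIS implementation: the class name EdgeFreqSketch and its dict-based storage are assumptions made for illustration only.

```python
class EdgeFreqSketch:
    """Hypothetical stand-in for the frequency bookkeeping (not the MAVIS class)."""

    def __init__(self):
        # maps (n1, n2) -> accumulated frequency
        self.freq = {}

    def add_edge(self, n1, n2, freq=1):
        # adding an edge that already exists increases its count instead of replacing it
        self.freq[(n1, n2)] = self.freq.get((n1, n2), 0) + freq

    def get_edge_freq(self, n1, n2):
        # in this sketch, an edge that was never added raises KeyError
        return self.freq[(n1, n2)]


g = EdgeFreqSketch()
g.add_edge('AC', 'CG')          # new edge, freq 1
g.add_edge('AC', 'CG', freq=2)  # existing edge, freq becomes 3
print(g.get_edge_freq('AC', 'CG'))  # 3
```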

get_sinks(subgraph=None)[source]

returns all nodes with an outgoing degree of zero

get_sources(subgraph=None)[source]

returns all nodes with an incoming degree of zero
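
As an illustration of the two definitions above, sources and sinks can be derived directly from an edge list. This is a sketch under stated assumptions: the helper name sources_and_sinks is hypothetical, and the optional subgraph argument of the real methods is not modelled.

```python
def sources_and_sinks(edges):
    """Return (sources, sinks) for a list of (source, target) edges."""
    nodes = set()
    has_incoming = set()
    has_outgoing = set()
    for src, tgt in edges:
        nodes.update((src, tgt))
        has_outgoing.add(src)
        has_incoming.add(tgt)
    # sources have an incoming degree of zero, sinks an outgoing degree of zero
    return nodes - has_incoming, nodes - has_outgoing


edges = [('a', 'b'), ('b', 'c'), ('b', 'd')]
sources, sinks = sources_and_sinks(edges)
print(sorted(sources))  # ['a']
print(sorted(sinks))    # ['c', 'd']
```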

trim_noncutting_paths_by_freq(min_weight)[source]

trims any low-weight edge for which another path of higher weight exists between its source and target

trim_tails_by_freq(min_weight)[source]

trims any path in which every edge is lower than the minimum weight

Parameters:min_weight (int) – the minimum weight for an edge to be retained
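
One possible reading of tail trimming is sketched below on a plain mapping of edges to frequencies. It is a simplified illustration and not the MAVIS algorithm: the function name trim_tails, the dict representation, and the exact definition of a tail (a low-weight edge hanging off a dead-end node) are all assumptions.

```python
def trim_tails(edge_freq, min_weight):
    """Return a copy of edge_freq with low-weight leading/trailing edges removed.

    edge_freq maps (source, target) -> frequency.
    """
    edges = dict(edge_freq)
    changed = True
    while changed:
        changed = False
        in_deg, out_deg = {}, {}
        for src, tgt in edges:
            out_deg[src] = out_deg.get(src, 0) + 1
            in_deg[tgt] = in_deg.get(tgt, 0) + 1
        for (src, tgt), freq in list(edges.items()):
            if freq >= min_weight:
                continue  # edge is heavy enough to keep
            # leading tail: starts at a node with no incoming and one outgoing edge
            leading = in_deg.get(src, 0) == 0 and out_deg.get(src, 0) == 1
            # trailing tail: ends at a node with no outgoing and one incoming edge
            trailing = out_deg.get(tgt, 0) == 0 and in_deg.get(tgt, 0) == 1
            if leading or trailing:
                del edges[(src, tgt)]
                changed = True
                break  # degrees are now stale, so recompute them
    return edges


graph = {('a', 'b'): 1, ('b', 'c'): 5, ('c', 'd'): 5, ('d', 'e'): 1, ('e', 'f'): 2}
print(trim_tails(graph, 3))  # {('b', 'c'): 5, ('c', 'd'): 5}
```

In this sketch a low-weight edge in the middle of a path is left alone, because removing it would split the path rather than shorten a tail.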
mavis.assemble.assemble(sequences, assembly_max_kmer_size=None, assembly_min_edge_weight=3, assembly_min_match_quality=0.95, assembly_min_read_mapping_overlap=None, assembly_min_contig_length=None, assembly_min_exact_match_to_remap=6, assembly_max_paths=20, assembly_max_kmer_strict=False, log=<function <lambda>>)[source]

for a set of sequences, creates a DeBruijnGraph, simplifies trailing and leading paths where edges fall below a weight threshold, and then returns all possible unitigs/contigs

Parameters:
  • sequences (list of str) – a list of strings/sequences to assemble
  • assembly_max_kmer_size (int) – the size of the kmer to use
  • assembly_min_edge_weight (int) – see assembly_min_edge_weight
  • assembly_min_match_quality (float) – percent match for re-aligned reads to contigs
  • assembly_min_read_mapping_overlap (int) – the minimum amount of overlap required when aligning reads to contigs
  • assembly_max_paths (int) – see assembly_max_paths
Returns:

a list of putative contigs

Return type:

list of Contig
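
The graph-building step that precedes trimming can be sketched as follows. This shows the standard De Bruijn construction, in which each kmer contributes an edge from its prefix to its suffix. Whether MAVIS builds its graph in exactly this way is an assumption, and the helper name debruijn_edge_weights is hypothetical.

```python
from collections import Counter


def debruijn_edge_weights(sequences, kmer_size):
    """Count (prefix, suffix) edges over all kmers of all input sequences."""
    weights = Counter()
    for seq in sequences:
        for i in range(len(seq) - kmer_size + 1):
            kmer = seq[i:i + kmer_size]
            # nodes are the (k-1)-length prefix and suffix of the kmer
            weights[(kmer[:-1], kmer[1:])] += 1
    return weights


weights = debruijn_edge_weights(['ACGTA', 'CGTAC'], 3)
print(weights[('CG', 'GT')])  # 2, supported by both sequences
print(weights[('AC', 'CG')])  # 1, supported by the first sequence only
```

Edges whose count falls below a threshold such as assembly_min_edge_weight are the candidates for trimming.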

mavis.assemble.digraph_connected_components(graph, subgraph=None)[source]

the networkx module does not support deriving connected components from digraphs (only simple graphs). This function assumes that connection != reachable, which means there is no difference between connected components in a simple graph and a digraph

Parameters:graph (networkx.DiGraph) – the input graph to gather components from
Returns:a list of components, each of which is a list of node names
Return type:list of list
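
Ignoring edge direction in this way yields what are commonly called weakly connected components. The following is a minimal sketch of that idea on a plain edge list. It is not the MAVIS implementation: the helper name weak_components is hypothetical, and sorting the output is a choice made here only to keep the result deterministic.

```python
def weak_components(edges):
    """Group nodes into components, treating every edge as undirected."""
    neighbours = {}
    for src, tgt in edges:
        neighbours.setdefault(src, set()).add(tgt)
        neighbours.setdefault(tgt, set()).add(src)
    components = []
    unvisited = set(neighbours)
    while unvisited:
        start = unvisited.pop()
        component = {start}
        stack = [start]
        # depth-first walk over everything connected to start
        while stack:
            node = stack.pop()
            for adj in neighbours[node]:
                if adj in unvisited:
                    unvisited.remove(adj)
                    component.add(adj)
                    stack.append(adj)
        components.append(sorted(component))
    return sorted(components)


print(weak_components([('a', 'b'), ('c', 'b'), ('x', 'y')]))
# [['a', 'b', 'c'], ['x', 'y']]
```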
mavis.assemble.kmers(s, size)[source]

for a sequence, compute and return a list of all kmers of a specified size

Parameters:
  • s (str) – the input sequence
  • size (int) – the size of the kmers
Returns:

the list of kmers

Return type:

list of str

Example

>>> kmers('abcdef', 2)
['ab', 'bc', 'cd', 'de', 'ef']
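
The documented behaviour can be reproduced with a sliding window. The following is a sketch consistent with the example above, not necessarily the MAVIS source: the name kmers_sketch is hypothetical, and returning an empty list for a sequence shorter than size is a property of this sketch only.

```python
def kmers_sketch(s, size):
    """Return every substring of length size, in order of start position."""
    # a sequence of length n has n - size + 1 kmers (none if n < size)
    return [s[i:i + size] for i in range(len(s) - size + 1)]


print(kmers_sketch('abcdef', 2))  # ['ab', 'bc', 'cd', 'de', 'ef']
print(kmers_sketch('ab', 3))      # []
```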