lib5c.parsers.genes module

Module for parsing .bed files containing gene track information.

lib5c.parsers.genes.load_gene_table(tablefile)

Similar to load_genes(), but reads a gzipped UCSC table file instead of a .bed file.

The main advantage of this approach is that genes parsed this way include human-readable gene symbols.

Parameters

tablefile (str) – Path to the gzipped table file to read.

Returns

A dict whose keys are chromosome names and whose values are lists of the genes on that chromosome. Each gene is a dict with the following structure:

{
    'start' : int,
    'end'   : int,
    'name'  : str,
    'id'    : str,
    'strand': '+' or '-',
    'blocks': list of dicts
}

Blocks typically represent exons. Each block is a dict with the following structure:

{
    'start': int,
    'end'  : int
}

Return type

dict of lists of dicts
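The returned structure can be queried with ordinary dict and list operations. The following is a minimal sketch only: the `genes` dict is hand-written sample data in the documented shape (not the output of a real call), `genes_overlapping` is a hypothetical helper that is not part of lib5c, and treating the intervals as half-open is an assumption.

```python
# Hand-written sample data in the documented shape; all values are illustrative.
genes = {
    'chr1': [
        {'start': 100, 'end': 500, 'name': 'GeneA', 'id': 'id_0001',
         'strand': '+',
         'blocks': [{'start': 100, 'end': 200}, {'start': 400, 'end': 500}]},
        {'start': 800, 'end': 1200, 'name': 'GeneB', 'id': 'id_0002',
         'strand': '-',
         'blocks': [{'start': 800, 'end': 1200}]},
    ],
}


def genes_overlapping(genes, chrom, start, end):
    """Return names of genes on chrom that overlap [start, end).

    Hypothetical helper; assumes half-open intervals.
    """
    return [g['name'] for g in genes.get(chrom, [])
            if g['start'] < end and g['end'] > start]


hits = genes_overlapping(genes, 'chr1', 450, 900)
print(hits)
```

Because the top-level keys are chromosome names, a query for a chromosome that is absent from the dict simply returns an empty list here.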

lib5c.parsers.genes.load_genes(bedfile)

Loads gene information from a .bed file into dicts and returns them.

Parameters

bedfile (str) – Path to the .bed file to load genes from.

Returns

A dict whose keys are chromosome names and whose values are lists of the genes on that chromosome. Each gene is a dict with the following structure:

{
    'start' : int,
    'end'   : int,
    'name'  : str,
    'strand': '+' or '-',
    'blocks': list of dicts
}

Blocks typically represent exons. Each block is a dict with the following structure:

{
    'start': int,
    'end'  : int
}

Return type

dict of lists of dicts
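To make the mapping from a .bed record to this structure concrete, here is a rough sketch of how a single line could be converted. It is not lib5c's implementation: `parse_bed12_line` is a hypothetical function, the input is assumed to follow the standard 12-column BED layout, and reporting blocks in absolute coordinates is an assumption about the output.

```python
def parse_bed12_line(line):
    """Parse one BED12 line into (chrom, gene dict). Illustrative only."""
    fields = line.rstrip('\n').split('\t')
    chrom = fields[0]
    start, end = int(fields[1]), int(fields[2])
    name, strand = fields[3], fields[5]
    # BED12 stores block sizes and block starts as comma-separated lists;
    # starts are relative to chromStart and a trailing comma is allowed.
    sizes = [int(x) for x in fields[10].split(',') if x]
    offsets = [int(x) for x in fields[11].split(',') if x]
    # Assumption: blocks are converted to absolute coordinates.
    blocks = [{'start': start + o, 'end': start + o + s}
              for o, s in zip(offsets, sizes)]
    gene = {'start': start, 'end': end, 'name': name,
            'strand': strand, 'blocks': blocks}
    return chrom, gene


# Made-up example record with two blocks.
line = 'chr1\t1000\t2000\tgeneX\t0\t+\t1000\t2000\t0\t2\t100,200,\t0,800,\n'
chrom, gene = parse_bed12_line(line)

# Group genes by chromosome, as in the documented return value.
genes = {}
genes.setdefault(chrom, []).append(gene)
```

Grouping with `setdefault` yields the "dict of lists of dicts" shape described above, with one list of gene dicts per chromosome.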

lib5c.parsers.genes.main()