lib5c.parsers.primers module

Module for parsing .bed files containing 5C primer and bin information.

lib5c.parsers.primers.get_pixelmap_legacy(bedfile, name_parser=<function default_bin_parser>)
Parameters
  • bedfile (str) – String reference to a binned primer bedfile to use to generate the pixelmap.

  • name_parser (Optional[Callable[[str], Dict[str, Any]]]) –

    Function that takes in the bin name column of the bedfile (the fourth column) and returns a dict containing key-value pairs to be added to the dict that represents that bin. At a minimum, this dict must have the following structure:

    {
        'region': str
    }
    

    If the dict includes any keys that are already typically included in the bin dict, the values returned by this function will overwrite the usual values.
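As an illustration of the name_parser contract described above, here is a minimal sketch of a custom parser. The `<region>_BIN_<index>` naming scheme, the function name `example_bin_name_parser`, and the extra `'index'` key are assumptions made for this example; they are not taken from lib5c and should be adapted to the names in your own bedfile.

```python
from typing import Any, Dict


def example_bin_name_parser(name: str) -> Dict[str, Any]:
    """Parse a hypothetical bin name such as 'Sox2_BIN_004'.

    The '<region>_BIN_<index>' naming scheme is an assumption made for this
    sketch; adapt the splitting logic to the names in your own bedfile.
    """
    region, _, index = name.rpartition('_BIN_')
    if not region:
        raise ValueError(f"unrecognized bin name: {name!r}")
    # 'region' is the only required key; 'index' is an extra key that
    # would be added to the dict representing the bin.
    return {'region': region, 'index': int(index)}


parsed = example_bin_name_parser('Sox2_BIN_004')
print(parsed)
```

A function like this could then be passed as `name_parser=example_bin_name_parser`.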

Returns

The keys of the outer dict are region names. The values are lists, where the \(i\)-th entry represents the \(i\)-th bin in that region. Bins are represented as dicts with the following structure:

{
    'chrom': str,
    'start': int,
    'end'  : int,
    'name' : str
}

Additional keys may be present if returned by name_parser.
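To make the shape of the return value concrete, the sketch below builds a small pixelmap by hand and looks up the genomic range of one bin. The region name, coordinates, and the helper `bin_range` are made up for illustration; only the dict layout follows the structure documented above.

```python
from typing import Any, Dict, List, Tuple

# A hand-written pixelmap with made-up coordinates, following the
# documented structure: region name -> list of bin dicts.
pixelmap: Dict[str, List[Dict[str, Any]]] = {
    'Sox2': [
        {'chrom': 'chr3', 'start': 34000000, 'end': 34004000,
         'name': 'Sox2_BIN_000'},
        {'chrom': 'chr3', 'start': 34004000, 'end': 34008000,
         'name': 'Sox2_BIN_001'},
    ],
}


def bin_range(pixelmap: Dict[str, List[Dict[str, Any]]],
              region: str, index: int) -> Tuple[str, int, int]:
    """Return (chrom, start, end) for the index-th bin of a region."""
    bin_dict = pixelmap[region][index]
    return bin_dict['chrom'], bin_dict['start'], bin_dict['end']


second_bin = bin_range(pixelmap, 'Sox2', 1)
print(second_bin)
```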

Return type

Dict[str, List[Dict[str, Any]]]

Notes

A pixelmap is a mapping from bins (specified by a region name and bin or primer index) to the genomic range covered by those bins.

lib5c.parsers.primers.load_primermap(bedfile, name_parser=None, strand_index=5, region_index=None, column_names=None)
Parameters
  • bedfile (str) – String reference to a primer bedfile to use to generate the primermap.

  • name_parser (Optional[Callable[[str], Dict[str, Any]]]) –

    Function that takes in the primer name column of the bedfile (the fourth column) and returns a dict containing key-value pairs to be added to the dict that represents that primer. At a minimum, this dict must have the following structure:

    {
        'region': str
    }
    

    If the dict includes any keys that are already typically included in the primer dict, the values returned by this function will overwrite the usual values. If None is passed, an appropriate name parser will be guessed based on the primer/bin names.

  • strand_index (Optional[int]) – If an int is passed, the column with that index will be used to determine strand information for the primer. If None is passed, the algorithm will try to guess which column contains this information. If this fails, strand information will not be included in the primer dict. Acceptable strings to indicate primer strand are 'F'/'R', 'FOR'/'REV', and '+'/'-'. Primers on the + strand will be assumed to be oriented in the 3' direction, and primers on the - strand will be assumed to be oriented in the 5' direction, unless an 'orientation' key is provided in the dict returned by name_parser.

  • region_index (Optional[int]) – If an int is passed, the column with that index will be used to determine the region the primer is in. This makes specifying name_parser optional and overrides the region it returns.

  • column_names (Optional[List[str]]) – A list of strings describing the columns of the bedfile; its length must equal the number of columns. The first four elements will be ignored. Special values include 'strand', which will set strand_index, and 'region', which will override region_index. All other values will end up as keys in the primer dicts. If this is not passed, this function will look for a header line in the bedfile, and if one is not found, a default header will be assumed.
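The strand and orientation rules described for strand_index can be sketched as a small standalone function. This is not lib5c's implementation: the helper name `strand_fields`, the lookup tables, and the assumption that 'F'/'FOR' correspond to '+' and 'R'/'REV' to '-' are all illustrative.

```python
from typing import Dict, Optional

# Assumed mapping from the accepted strand strings to '+'/'-'.
_STRAND_ALIASES = {
    'F': '+', 'FOR': '+', '+': '+',
    'R': '-', 'REV': '-', '-': '-',
}
# Default orientation per strand, as stated in the parameter description.
_DEFAULT_ORIENTATION = {'+': "3'", '-': "5'"}


def strand_fields(raw: str,
                  parsed_name: Optional[Dict[str, str]] = None
                  ) -> Dict[str, str]:
    """Return the strand-related keys for one primer (hypothetical helper).

    An unrecognized strand string yields an empty dict, mirroring the
    documented behavior of leaving strand information out of the primer dict.
    """
    strand = _STRAND_ALIASES.get(raw.strip())
    if strand is None:
        return {}
    fields = {'strand': strand,
              'orientation': _DEFAULT_ORIENTATION[strand]}
    # An 'orientation' key returned by name_parser takes precedence.
    if parsed_name and 'orientation' in parsed_name:
        fields['orientation'] = parsed_name['orientation']
    return fields


print(strand_fields('FOR'))
print(strand_fields('-', {'orientation': "3'"}))
```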

Returns

The keys of the outer dict are region names. The values are lists, where the \(i\)-th entry represents the \(i\)-th primer in that region. Primers are represented as dicts with the following structure:

{
    'region'     : str,
    'chrom'      : str,
    'start'      : int,
    'end'        : int,
    'name'       : str,
    'strand'     : '+' or '-',
    'orientation': "3'" or "5'"
}

though strand and orientation may not be present, and additional keys may be present if returned by name_parser, passed in column_names, or if a header line is present.
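Because strand and orientation are optional, code that consumes a primermap should not assume those keys exist. The sketch below uses a hand-written primermap with made-up names and coordinates; the helper `count_by_strand` is hypothetical and only illustrates defensive access with `dict.get`.

```python
from typing import Any, Dict, List

# A hand-written primermap with made-up values, following the documented
# structure. The second primer deliberately lacks strand information.
primermap: Dict[str, List[Dict[str, Any]]] = {
    'Sox2': [
        {'region': 'Sox2', 'chrom': 'chr3', 'start': 34000100,
         'end': 34000130, 'name': 'Sox2_FOR_0',
         'strand': '+', 'orientation': "3'"},
        {'region': 'Sox2', 'chrom': 'chr3', 'start': 34004200,
         'end': 34004230, 'name': 'Sox2_REV_1'},
    ],
}


def count_by_strand(primermap: Dict[str, List[Dict[str, Any]]],
                    region: str) -> Dict[str, int]:
    """Count primers per strand, grouping missing strands as 'unknown'."""
    counts: Dict[str, int] = {}
    for primer in primermap[region]:
        key = primer.get('strand', 'unknown')
        counts[key] = counts.get(key, 0) + 1
    return counts


counts = count_by_strand(primermap, 'Sox2')
print(counts)
```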

Return type

Dict[str, List[Dict[str, Any]]]

Notes

A primermap is a mapping from primers (specified by a region name and primer index) to the genomic range covered by those primers.

lib5c.parsers.primers.main()