lib5c.parsers.primers module
Module for parsing .bed files containing 5C primer and bin information.
lib5c.parsers.primers.get_pixelmap_legacy(bedfile, name_parser=<function default_bin_parser>)
- Parameters
bedfile (str) – String reference to a binned primer bedfile to use to generate the pixelmap.
name_parser (Optional[Callable[[str], Dict[str, Any]]]) –
Function that takes in the bin name column of the bedfile (the fourth column) and returns a dict containing key-value pairs to be added to the dict that represents that bin. At a minimum, this dict must have the following structure:
{ 'region': str }
If the dict includes any keys that are already typically included in the bin dict, the values returned by this function will overwrite the usual values.
- Returns
The keys of the outer dict are region names. The values are lists, where the \(i\) th entry represents the \(i\) th bin in that region. Bins are represented as dicts with the following structure:
{ 'chrom': str, 'start': int, 'end' : int, 'name' : str }
Additional keys may be present if returned by name_parser.
- Return type
Dict[str, List[Dict[str, Any]]]
Notes
A pixelmap is a mapping from bins (specified by a region name and bin or primer index) to the genomic range covered by those bins.
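The name_parser contract and the resulting pixelmap structure can be illustrated with a small, self-contained sketch. It does not call lib5c and is not the library's implementation: `example_bin_parser`, `build_pixelmap`, the `<region>_BIN_<index>` naming convention, the extra `index` key, and the coordinates are all assumptions made up for illustration.

```python
from typing import Any, Callable, Dict, List


def example_bin_parser(name: str) -> Dict[str, Any]:
    """Hypothetical name_parser; assumes names look like '<region>_BIN_<index>'."""
    region, sep, index = name.rpartition('_BIN_')
    if not sep:
        raise ValueError('unexpected bin name: %r' % name)
    # 'region' is the required key; 'index' is an extra key added for illustration.
    return {'region': region, 'index': int(index)}


def build_pixelmap(lines: List[str],
                   name_parser: Callable[[str], Dict[str, Any]]
                   ) -> Dict[str, List[Dict[str, Any]]]:
    """Sketch of the documented behavior, not lib5c's actual code."""
    pixelmap: Dict[str, List[Dict[str, Any]]] = {}
    for line in lines:
        chrom, start, end, name = line.split('\t')[:4]
        bin_dict = {'chrom': chrom, 'start': int(start),
                    'end': int(end), 'name': name}
        parsed = name_parser(name)
        # Keys returned by the parser overwrite the usual values.
        bin_dict.update(parsed)
        pixelmap.setdefault(parsed['region'], []).append(bin_dict)
    return pixelmap


# Illustrative bedfile rows (tab-separated; fourth column is the bin name).
bed_lines = [
    'chr3\t34000000\t34004000\tSox2_BIN_000',
    'chr3\t34004000\t34008000\tSox2_BIN_001',
    'chr6\t122000000\t122004000\tNanog_BIN_000',
]
pixelmap = build_pixelmap(bed_lines, example_bin_parser)

# Look up the genomic range of the second bin in the 'Sox2' region.
second_sox2_bin = pixelmap['Sox2'][1]
```

Indexing with a region name and then a bin index, as in `pixelmap['Sox2'][1]`, is the lookup described in the Notes above.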
lib5c.parsers.primers.load_primermap(bedfile, name_parser=None, strand_index=5, region_index=None, column_names=None)
- Parameters
bedfile (str) – String reference to a primer bedfile to use to generate the primermap.
name_parser (Optional[Callable[[str], Dict[str, Any]]]) –
Function that takes in the primer name column of the bedfile (the fourth column) and returns a dict containing key-value pairs to be added to the dict that represents that primer. At a minimum, this dict must have the following structure:
{ 'region': str }
If the dict includes any keys that are already typically included in the primer dict, the values returned by this function will overwrite the usual values. If None is passed, an appropriate name parser will be guessed based on the primer/bin names.
strand_index (Optional[int]) – If an int is passed, the column with that index will be used to determine strand information for the primer. If None is passed, the algorithm will try to guess which column contains this information. If this fails, strand information will not be included in the primer dict. Acceptable strings to indicate primer strand are 'F'/'R', 'FOR'/'REV', and '+'/'-'. Primers on the + strand will be assumed to be oriented in the 3' direction, and primers on the - strand will be assumed to be oriented in the 5' direction, unless an 'orientation' key is provided in the dict returned by name_parser.
region_index (Optional[int]) – If an int is passed, the column with that index will be used to determine the region the primer is in. This makes specifying region_parser optional and overrides the region it returns.
column_names (Optional[List[str]]) – Pass a list of strings, one per column in the bedfile, describing the columns. The first four elements will be ignored. Special values include 'strand', which will set strand_index, and 'region', which will override region_index. All other values will end up as keys in the primer dicts. If this is not passed, this function will look for a header line in the bedfile, and if one is not found, a default header will be assumed.
- Returns
The keys of the outer dict are region names. The values are lists, where the \(i\) th entry represents the \(i\) th primer in that region. Primers are represented as dicts with the following structure:
{ 'region' : str, 'chrom' : str, 'start' : int, 'end' : int, 'name' : str, 'strand' : '+' or '-', 'orientation': "3'" or "5'" }
though strand and orientation may not be present, and additional keys may be present if returned by name_parser, passed in column_names, or if a header line is present.
- Return type
Dict[str, List[Dict[str, Any]]]
Notes
A primermap is a mapping from primers (specified by a region name and primer index) to the genomic range covered by those primers.
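The strand and orientation rules described for strand_index can be sketched as follows. This is a minimal illustration of the documented behavior, not lib5c's implementation: `STRAND_ALIASES`, `DEFAULT_ORIENTATION`, `add_strand_info`, and the example primer (name and coordinates) are hypothetical.

```python
from typing import Any, Dict, Optional

# Accepted strand strings, normalized to '+' or '-'.
STRAND_ALIASES = {'F': '+', 'FOR': '+', '+': '+',
                  'R': '-', 'REV': '-', '-': '-'}
# Default orientation assumed for each strand.
DEFAULT_ORIENTATION = {'+': "3'", '-': "5'"}


def add_strand_info(primer: Dict[str, Any],
                    raw_strand: Optional[str],
                    parsed: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return a copy of `primer` with strand and orientation filled in.

    `parsed` stands in for the dict returned by a name_parser; its keys
    (for example 'orientation') overwrite the defaults.
    """
    result = dict(primer)
    strand = STRAND_ALIASES.get((raw_strand or '').strip())
    if strand is not None:
        result['strand'] = strand
        result['orientation'] = DEFAULT_ORIENTATION[strand]
    # If the strand string is not recognized, no strand keys are added.
    result.update(parsed or {})
    return result


# Illustrative primer dict with the documented base keys.
primer = {'region': 'Sox2', 'chrom': 'chr3', 'start': 34000100,
          'end': 34000130, 'name': 'Sox2_FOR_1'}

forward = add_strand_info(primer, 'FOR')
reverse_override = add_strand_info(primer, 'R', {'orientation': "3'"})
unknown = add_strand_info(primer, '.')
```

Here `forward` gets the default 3' orientation, `reverse_override` keeps the '-' strand but takes its orientation from the parser dict, and `unknown` has no strand or orientation keys at all, matching the case where strand information cannot be determined.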