lib5c.util.bed module
Module containing utilities for manipulating BED files and BED features.
BED features are commonly represented as dicts with the following structure:
{
'chrom': str,
'start': int,
'end' : int,
}
but may also contain additional fields.

lib5c.util.bed.check_intersect(a, b)
Checks to see if two features intersect.
Parameters: a, b – The two features to check for intersection.
Returns: True if the features intersect, False otherwise.
Return type: bool
Notes
Features are represented as dicts with the following structure:
{ 'chrom': str, 'start': int, 'end' : int, }
See lib5c.parsers.bed.load_features().
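To illustrate the kind of check described above, here is a minimal sketch. It is not the lib5c implementation: the helper name `intersects` is hypothetical, and the code assumes half-open BED intervals, so two features that merely touch are treated as non-intersecting. lib5c's own boundary handling may differ.

```python
def intersects(a, b):
    """Hypothetical helper: True if two feature dicts overlap.

    Assumes half-open [start, end) intervals; this boundary
    convention is an assumption, not taken from lib5c.
    """
    # Features on different chromosomes can never intersect.
    if a['chrom'] != b['chrom']:
        return False
    # Two half-open intervals overlap when each starts before the other ends.
    return a['start'] < b['end'] and b['start'] < a['end']


a = {'chrom': 'chr1', 'start': 100, 'end': 200}
b = {'chrom': 'chr1', 'start': 150, 'end': 250}
c = {'chrom': 'chr2', 'start': 150, 'end': 250}
print(intersects(a, b))  # True
print(intersects(a, c))  # False
```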

lib5c.util.bed.count_intersections(query_feature, feature_set)
Counts the number of times a query feature is hit by a set of other features.
Parameters:
- query_feature (Dict[str, Any]) – The feature to count intersections for.
- feature_set (List[Dict[str, Any]]) – The set of features to intersect with the query feature.
Returns: The number of intersections
Return type: int
Notes
Features are represented as dicts with the following structure:
{ 'chrom': str, 'start': int, 'end' : int, }
See lib5c.parsers.bed.load_features().
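The counting described above can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: `count_hits` is a hypothetical name, and overlap is tested with half-open interval semantics.

```python
def count_hits(query_feature, feature_set):
    """Hypothetical helper: count features that overlap the query.

    Assumes half-open [start, end) intervals (an assumption, not
    confirmed against lib5c).
    """
    return sum(
        1
        for f in feature_set
        if f['chrom'] == query_feature['chrom']
        and f['start'] < query_feature['end']
        and query_feature['start'] < f['end']
    )


query = {'chrom': 'chr1', 'start': 100, 'end': 200}
peaks = [
    {'chrom': 'chr1', 'start': 50, 'end': 120},   # overlaps the query
    {'chrom': 'chr1', 'start': 180, 'end': 300},  # overlaps the query
    {'chrom': 'chr1', 'start': 400, 'end': 500},  # same chromosome, no overlap
    {'chrom': 'chr2', 'start': 100, 'end': 200},  # different chromosome
]
print(count_hits(query, peaks))  # 2
```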

lib5c.util.bed.flatten_features(features)
Flattens a features dict and returns a flat list of features.
Typically, BED features are kept in dicts organized by chromosome. For example, this is the data structure returned by lib5c.parsers.bed.load_features(). When a flat list is desired, this function can be used to flatten the dictionary into a simple list.
Parameters: features (Dict[str, List[Dict[str, Any]]]) – The keys are chromosome names. The values are lists of features for that chromosome. The features are represented as dicts with at least the following keys:
{ 'start': int, 'end' : int }
Returns: A flat list of dicts. These dicts, which represent the same features as those contained in the original dict, have the following keys:
{ 'chrom': str, 'start': int, 'end' : int }
as well as any additional keys that were present in the inner dicts of the features dict passed to this function.
Return type: List[Dict[str, Any]]
Notes
If the dicts that describe the features already contain a ‘chrom’ key, that key’s value will get overwritten during the flattening.
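A minimal sketch of this flattening, including the overwrite of an existing 'chrom' key, might look like the following. The name `flatten` is hypothetical, and copying each feature dict (rather than modifying it in place) is an assumption of this sketch; the library may behave differently in that respect.

```python
def flatten(features):
    """Hypothetical helper: flatten {chrom: [feature, ...]} into one list."""
    flat = []
    for chrom, chrom_features in features.items():
        for feature in chrom_features:
            flat_feature = dict(feature)   # copy, so the input is left untouched (assumption)
            flat_feature['chrom'] = chrom  # overwrites any existing 'chrom' value
            flat.append(flat_feature)
    return flat


features = {
    'chr1': [{'start': 0, 'end': 10, 'name': 'x'}],
    'chr2': [{'start': 5, 'end': 15, 'chrom': 'stale'}],
}
flat = flatten(features)
for feature in flat:
    print(feature['chrom'], feature['start'], feature['end'])
```

In this sketch the second feature's stale 'chrom' value is replaced by the chromosome name it was filed under, and the extra 'name' key on the first feature is carried through.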

lib5c.util.bed.get_mid_to_mid_distance(fragment_a, fragment_b)
Gets the mid-to-mid distance between two fragments.
Parameters: fragment_a, fragment_b – The fragments to find the distance between. The fragments must be represented as dicts with at least the following keys:
{ 'start': int, 'end': int }
Returns: The mid-to-mid distance.
Return type: float
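As a worked illustration, the mid-to-mid distance can be sketched as the absolute difference between the two fragment midpoints. Both helper names below are hypothetical, and taking the absolute value (so that argument order does not matter) is an assumption of this sketch.

```python
def midpoint(fragment):
    """Hypothetical helper: arithmetic mean of 'start' and 'end'."""
    return (fragment['start'] + fragment['end']) / 2


def mid_to_mid(fragment_a, fragment_b):
    """Hypothetical helper: unsigned distance between the two midpoints."""
    return abs(midpoint(fragment_a) - midpoint(fragment_b))


frag_a = {'start': 50, 'end': 100}   # midpoint 75.0
frag_b = {'start': 200, 'end': 300}  # midpoint 250.0
print(mid_to_mid(frag_a, frag_b))  # 175.0
```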

lib5c.util.bed.get_midpoint(fragment, force_int=False)
Gets the midpoint of a fragment.
Parameters:
- fragment (Dict[str, Any]) – The fragment to find the midpoint of. The fragment must be represented as a dict with at least the following keys:
{ 'start': int, 'end': int }
- force_int (bool) – Return an int rounded towards zero instead of a float.
Returns: The midpoint of the fragment, rounded towards zero if force_int is True.
Return type: float
Examples
>>> fragment = {'start': 50, 'end': 100}
>>> get_midpoint(fragment)
75.0
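The effect of force_int is easiest to see on a fragment whose midpoint is not a whole number. The sketch below is a stand-in written for illustration, not the library function: `midpoint` is a hypothetical name, and it assumes that rounding towards zero is equivalent to Python's `int()` truncation.

```python
def midpoint(fragment, force_int=False):
    """Hypothetical stand-in for the midpoint calculation described above."""
    mid = (fragment['start'] + fragment['end']) / 2
    # int() truncates towards zero, matching the documented rounding (assumption).
    return int(mid) if force_int else mid


odd = {'start': 50, 'end': 101}
print(midpoint(odd))                  # 75.5
print(midpoint(odd, force_int=True))  # 75
```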

lib5c.util.bed.parse_feature_from_string(grange_string)
Parses a BED feature from a string specifying the genomic range.
Parameters: grange_string (str) – The genomic range to parse, specified as a string of the form <chrom>:<start>-<end>. The interval is interpreted as a BED interval (0-based index, half-open interval).
Returns: The BED feature dict, which has keys ‘chrom’, ‘start’, and ‘end’.
Return type: dict
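For illustration, parsing a string of the form <chrom>:<start>-<end> can be sketched as below. `parse_grange` is a hypothetical name, and the sketch assumes plain digits for the coordinates (no thousands separators) and does no validation; the library function may be stricter or more lenient.

```python
def parse_grange(grange_string):
    """Hypothetical helper: parse '<chrom>:<start>-<end>' into a feature dict."""
    # Split on the last ':' so the chromosome name is everything before it.
    chrom, _, span = grange_string.rpartition(':')
    start, _, end = span.partition('-')
    return {'chrom': chrom, 'start': int(start), 'end': int(end)}


feature = parse_grange('chr3:1000-2000')
print(feature)  # {'chrom': 'chr3', 'start': 1000, 'end': 2000}
```

Because the interval is half-open, the feature in this example covers 1000 bases: positions 1000 through 1999 in 0-based coordinates.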