lib5c.util.bed module
Module containing utilities for manipulating BED files and BED features.
BED features are commonly represented as dicts with the following structure:
{
    'chrom': str,
    'start': int,
    'end': int,
}
but may also contain additional fields.
-
lib5c.util.bed.check_intersect(a, b)
Checks to see if two features intersect.
- Parameters
a, b – The two features to check for intersection.
- Returns
True if the features intersect, False otherwise.
- Return type
bool
Notes
Features are represented as dicts with the following structure:
{ 'chrom': str, 'start': int, 'end': int }
See lib5c.parsers.bed.load_features().
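As a rough illustration of what an intersection test on such dicts might look like, here is a minimal sketch. It is not the library's implementation: the helper name `intersects_half_open` is hypothetical, and it assumes half-open BED intervals, so two features that merely touch end-to-start do not count as intersecting. The actual check_intersect may treat boundaries differently.

```python
def intersects_half_open(a, b):
    """Hypothetical helper: True if two feature dicts overlap.

    Assumes half-open [start, end) intervals; lib5c's own boundary
    handling may differ.
    """
    return (
        a['chrom'] == b['chrom']      # features on different chromosomes never overlap
        and a['start'] < b['end']     # a begins before b ends
        and b['start'] < a['end']     # b begins before a ends
    )


a = {'chrom': 'chr1', 'start': 100, 'end': 200}
b = {'chrom': 'chr1', 'start': 150, 'end': 250}
c = {'chrom': 'chr2', 'start': 150, 'end': 250}

print(intersects_half_open(a, b))  # same chromosome, overlapping coordinates
print(intersects_half_open(a, c))  # same coordinates, different chromosome
```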
-
lib5c.util.bed.count_intersections(query_feature, feature_set)
Counts the number of times a query feature is hit by a set of other features.
- Parameters
query_feature (Dict[str, Any]) – The feature to count intersections for.
feature_set (List[Dict[str, Any]]) – The set of features to intersect with the query feature.
- Returns
The number of intersections
- Return type
int
Notes
Features are represented as dicts with the following structure:
{ 'chrom': str, 'start': int, 'end': int }
See lib5c.parsers.bed.load_features().
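Counting hits can be pictured as one overlap test per feature in the set. The sketch below is an assumption-laden illustration, not the library's code: `count_hits_sketch` and its inner `overlaps` helper are hypothetical names, and the overlap test assumes half-open intervals.

```python
def count_hits_sketch(query_feature, feature_set):
    """Hypothetical helper: count features in feature_set overlapping the query."""

    def overlaps(a, b):
        # Assumed half-open [start, end) semantics; the library may differ.
        return (
            a['chrom'] == b['chrom']
            and a['start'] < b['end']
            and b['start'] < a['end']
        )

    return sum(1 for feature in feature_set if overlaps(query_feature, feature))


query = {'chrom': 'chr1', 'start': 100, 'end': 200}
others = [
    {'chrom': 'chr1', 'start': 50, 'end': 120},   # overlaps the query
    {'chrom': 'chr1', 'start': 180, 'end': 400},  # overlaps the query
    {'chrom': 'chr1', 'start': 300, 'end': 400},  # downstream, no overlap
    {'chrom': 'chr2', 'start': 100, 'end': 200},  # different chromosome
]
print(count_hits_sketch(query, others))  # 2
```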
-
lib5c.util.bed.flatten_features(features)
Flattens a features dict and returns a flat list of features.
Typically, BED features are kept in dicts organized by chromosome. For example, this is the data structure returned by lib5c.parsers.bed.load_features(). When a flat list is desired, this function can be used to flatten the dictionary into a simple list.
- Parameters
features (Dict[str, List[Dict[str, Any]]]) –
The keys are chromosome names. The values are lists of features for that chromosome. The features are represented as dicts with at least the following keys:
{ 'start': int, 'end' : int }
- Returns
A flat list of feature dicts. These dicts, which represent the same features as those contained in the original dict, have the following keys:
{ 'chrom': str, 'start': int, 'end' : int }
as well as any additional keys that were present in the inner dicts of the features dict passed to this function.
- Return type
List[Dict[str, Any]]
Notes
If the dicts that describe the features already contain a ‘chrom’ key, that key’s value will get overwritten during the flattening.
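The flattening described above can be sketched as follows. This is only an illustration under stated assumptions: `flatten_sketch` is a hypothetical name, and copying each inner dict (rather than modifying it in place) is a choice made here that the documentation does not specify.

```python
def flatten_sketch(features):
    """Hypothetical helper: turn {chrom: [feature, ...]} into a flat list."""
    flat = []
    for chrom, chrom_features in features.items():
        for feature in chrom_features:
            flat_feature = dict(feature)   # copy (assumption) so the input is left untouched
            flat_feature['chrom'] = chrom  # overwrites any existing 'chrom' value
            flat.append(flat_feature)
    return flat


by_chrom = {
    'chr1': [{'start': 0, 'end': 10, 'name': 'x'}],
    'chr2': [{'start': 5, 'end': 15, 'chrom': 'stale'}],
}
flat = flatten_sketch(by_chrom)
print(flat[0])           # extra keys such as 'name' are carried over
print(flat[1]['chrom'])  # the pre-existing 'chrom' value has been replaced
```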
-
lib5c.util.bed.get_mid_to_mid_distance(fragment_a, fragment_b)
Gets the mid-to-mid distance between two fragments.
- Parameters
fragment_a, fragment_b –
The fragments to find the distance between. The fragments must be represented as dicts with at least the following keys:
{ 'start': int, 'end': int }
- Returns
The mid-to-mid distance
- Return type
float
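A minimal sketch of a mid-to-mid distance calculation is shown below. The helper names are hypothetical, the midpoint is taken as the mean of 'start' and 'end', and returning the absolute difference (so that argument order does not matter) is an assumption rather than documented behavior.

```python
def midpoint_sketch(fragment):
    """Hypothetical helper: mean of the fragment's start and end, as a float."""
    return (fragment['start'] + fragment['end']) / 2


def mid_to_mid_sketch(fragment_a, fragment_b):
    """Hypothetical helper: absolute distance between the two midpoints."""
    return abs(midpoint_sketch(fragment_a) - midpoint_sketch(fragment_b))


frag_a = {'start': 50, 'end': 100}    # midpoint 75.0
frag_b = {'start': 200, 'end': 300}   # midpoint 250.0
print(mid_to_mid_sketch(frag_a, frag_b))  # 175.0
```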
-
lib5c.util.bed.get_midpoint(fragment, force_int=False)
Gets the midpoint of a fragment.
- Parameters
fragment (Dict[str, Any]) –
The fragment to find the midpoint of. The fragment must be represented as a dict with at least the following keys:
{ 'start': int, 'end': int }
force_int (bool) – Return an int rounded towards zero instead of a float.
- Returns
The midpoint of the fragment, rounded towards zero if force_int is True.
- Return type
float, or int if force_int is True
Examples
>>> fragment = {'start': 50, 'end': 100}
>>> get_midpoint(fragment)
75.0
-
lib5c.util.bed.parse_feature_from_string(grange_string)
Parses a BED feature from a string specifying the genomic range.
- Parameters
grange_string (str) – The genomic range to parse, specified as a string of the form <chrom>:<start>-<end>. The interval is interpreted as a BED interval (0-based index, half-open interval).
- Returns
The BED feature dict, which has keys ‘chrom’, ‘start’, and ‘end’.
- Return type
dict
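To make the <chrom>:<start>-<end> form concrete, here is a small parsing sketch. `parse_grange_sketch` is a hypothetical stand-in, not the library function, and it assumes the string is well formed (no thousands separators, no validation of start against end).

```python
def parse_grange_sketch(grange_string):
    """Hypothetical helper: parse '<chrom>:<start>-<end>' into a feature dict."""
    # Split on the last ':' so the chromosome name is everything before it.
    chrom, _, interval = grange_string.rpartition(':')
    start, _, end = interval.partition('-')
    return {'chrom': chrom, 'start': int(start), 'end': int(end)}


feature = parse_grange_sketch('chr3:34100000-34200000')
print(feature)  # {'chrom': 'chr3', 'start': 34100000, 'end': 34200000}
```

Because the interval is read as a half-open BED interval, the feature above spans 100000 bases ('end' minus 'start').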