lib5c.util.bed module

Module containing utilities for manipulating BED files and BED features.

BED features are commonly represented as dicts with the following structure:

{
    'chrom': str,
    'start': int,
    'end'  : int,
}

but may also contain additional fields.
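
As an illustration of that structure, a feature carrying extra fields might look like the sketch below. The 'name' and 'strand' keys and all values are hypothetical and are not prescribed by this module.

```python
# A hypothetical feature: the three core keys plus two illustrative extras.
feature = {
    'chrom': 'chr3',
    'start': 34109023,
    'end': 34113109,
    'name': 'example_fragment',  # hypothetical additional field
    'strand': '+',               # hypothetical additional field
}

# Code that relies only on the core keys can ignore the extras.
length = feature['end'] - feature['start']
```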

lib5c.util.bed.check_intersect(a, b)[source]

Checks to see if two features intersect.

Parameters:a, b – The two features to check for intersection.
Returns:True if the features intersect, False otherwise.
Return type:bool

Notes

Features are represented as dicts with the following structure:

{
    'chrom': str,
    'start': int,
    'end'  : int,
}

See lib5c.parsers.bed.load_features().
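
The kind of check described here can be sketched as follows. This is a minimal stand-in, not lib5c's implementation: the name intervals_overlap is hypothetical, and it assumes half-open BED intervals, so features that merely touch end-to-start do not count as intersecting. The library may treat that boundary case differently.

```python
def intervals_overlap(a, b):
    """Hypothetical stand-in for an intersection check.

    Assumes half-open intervals [start, end) on named chromosomes.
    """
    return (a['chrom'] == b['chrom']
            and a['start'] < b['end']
            and b['start'] < a['end'])


a = {'chrom': 'chr1', 'start': 100, 'end': 200}
b = {'chrom': 'chr1', 'start': 150, 'end': 250}  # overlaps a
c = {'chrom': 'chr2', 'start': 150, 'end': 250}  # different chromosome
d = {'chrom': 'chr1', 'start': 200, 'end': 300}  # touches a at 200 only

results = [intervals_overlap(a, other) for other in (b, c, d)]
```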

lib5c.util.bed.count_intersections(query_feature, feature_set)[source]

Counts the number of times a query feature is hit by a set of other features.

Parameters:
  • query_feature (Dict[str, Any]) – The feature to count intersections for.
  • feature_set (List[Dict[str, Any]]) – The set of features to intersect with the query feature.
Returns:

The number of intersections

Return type:

int

Notes

Features are represented as dicts with the following structure:

{
    'chrom': str,
    'start': int,
    'end'  : int,
}

See lib5c.parsers.bed.load_features().
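
Counting hits amounts to applying an intersection test to every feature in the set. The sketch below is self-contained and uses hypothetical names (overlaps, count_hits). It assumes half-open intervals and a simple linear scan, which may not match how the library does it.

```python
def overlaps(a, b):
    """Hypothetical half-open intersection test on the same chromosome."""
    return (a['chrom'] == b['chrom']
            and a['start'] < b['end']
            and b['start'] < a['end'])


def count_hits(query_feature, feature_set):
    """Count how many features in feature_set intersect query_feature."""
    return sum(1 for feature in feature_set if overlaps(query_feature, feature))


query = {'chrom': 'chr1', 'start': 100, 'end': 200}
feature_set = [
    {'chrom': 'chr1', 'start': 50, 'end': 120},   # hit
    {'chrom': 'chr1', 'start': 180, 'end': 400},  # hit
    {'chrom': 'chr1', 'start': 500, 'end': 600},  # miss: no overlap
    {'chrom': 'chr2', 'start': 100, 'end': 200},  # miss: other chromosome
]
hits = count_hits(query, feature_set)
```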

lib5c.util.bed.flatten_features(features)[source]

Flattens a features dict and returns a flat list of features.

Typically, BED features are kept in dicts organized by chromosome. For example, this is the data structure returned by lib5c.parsers.bed.load_features(). When a flat list is desired, this function can be used to flatten the dictionary into a simple list.

Parameters:features (Dict[str, List[Dict[str, Any]]]) –

The keys are chromosome names. The values are lists of features for that chromosome. The features are represented as dicts with at least the following keys:

{
    'start': int,
    'end'  : int
}
Returns:A flat list of feature dicts. These dicts, which represent the same features as those contained in the original dict, have the following keys:
{
    'chrom': str,
    'start': int,
    'end'  : int
}

as well as any additional keys that were present in the inner dicts of the features dict passed to this function.

Return type:List[Dict[str, Any]]

Notes

If the dicts that describe the features already contain a ‘chrom’ key, that key’s value is overwritten during the flattening.
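
The flattening described above can be sketched as follows. The function name flatten is hypothetical, and copying each inner dict (rather than modifying it in place) is an assumption of this sketch, not a documented property of the library.

```python
def flatten(features):
    """Hypothetical sketch: turn {chrom: [feature, ...]} into a flat list."""
    flat = []
    for chrom, chrom_features in features.items():
        for feature in chrom_features:
            flat_feature = dict(feature)   # shallow copy (an assumption)
            flat_feature['chrom'] = chrom  # overwrites any existing 'chrom'
            flat.append(flat_feature)
    return flat


features = {
    'chr1': [{'start': 0, 'end': 10, 'chrom': 'stale'}],
    'chr2': [{'start': 5, 'end': 15, 'name': 'extra'}],
}
flat = flatten(features)
by_start = {f['start']: f for f in flat}
```

Here the 'stale' value is replaced by 'chr1', and the extra 'name' key survives the flattening.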

lib5c.util.bed.get_mid_to_mid_distance(fragment_a, fragment_b)[source]

Gets the mid-to-mid distance between two fragments.

Parameters:fragment_a, fragment_b –

The fragments to find the distance between. The fragments must be represented as dicts with at least the following keys:

{
    'start': int,
    'end': int
}
Returns:The mid-to-mid distance
Return type:float
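
As a rough illustration, the mid-to-mid distance can be computed from the two midpoints. The names midpoint and mid_to_mid are hypothetical. Taking the absolute value and ignoring chromosome are assumptions of this sketch, since the documentation above requires only 'start' and 'end' keys.

```python
def midpoint(fragment):
    """Midpoint of a fragment as a float."""
    return (fragment['start'] + fragment['end']) / 2.0


def mid_to_mid(fragment_a, fragment_b):
    """Hypothetical sketch: unsigned distance between fragment midpoints."""
    return abs(midpoint(fragment_a) - midpoint(fragment_b))


fragment_a = {'start': 50, 'end': 100}   # midpoint 75.0
fragment_b = {'start': 200, 'end': 300}  # midpoint 250.0
distance = mid_to_mid(fragment_a, fragment_b)
```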

lib5c.util.bed.get_midpoint(fragment, force_int=False)[source]

Gets the midpoint of a fragment.

Parameters:
  • fragment (Dict[str, Any]) –

    The fragment to find the midpoint of. The fragment must be represented as a dict with at least the following keys:

    {
        'start': int,
        'end': int
    }
    
  • force_int (bool) – Return an int rounded towards zero instead of a float.
Returns:

The midpoint of the fragment, rounded towards zero if force_int is True.

Return type:

float, or int if force_int is True

Examples

>>> fragment = {'start': 50, 'end': 100}
>>> get_midpoint(fragment)
75.0
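
The effect of force_int is easiest to see on a fragment whose midpoint is not a whole number. The sketch below reimplements the described behaviour under a hypothetical name (midpoint) and is not the library's code. It relies on Python's int(), which truncates towards zero.

```python
def midpoint(fragment, force_int=False):
    """Hypothetical sketch of a midpoint with optional truncation."""
    mid = (fragment['start'] + fragment['end']) / 2.0
    # int() discards the fractional part, i.e. rounds towards zero.
    return int(mid) if force_int else mid


odd_fragment = {'start': 50, 'end': 101}
as_float = midpoint(odd_fragment)
as_int = midpoint(odd_fragment, force_int=True)
```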

lib5c.util.bed.main()[source]

lib5c.util.bed.parse_feature_from_string(grange_string)[source]

Parses a BED feature from a string specifying a genomic range.

Parameters:grange_string (str) – The genomic range to parse, specified as a string of the form <chrom>:<start>-<end>. The interval is interpreted as a BED interval (0-based index, half-open interval).
Returns:The BED feature dict, which has keys ‘chrom’, ‘start’, and ‘end’.
Return type:dict
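
Parsing a string of the form <chrom>:<start>-<end> can be sketched as below. The name parse_grange is hypothetical. The sketch assumes plain integer coordinates with no thousands separators and does no validation, which may differ from what the library accepts.

```python
def parse_grange(grange_string):
    """Hypothetical sketch: parse '<chrom>:<start>-<end>' into a feature dict."""
    chrom, _, interval = grange_string.partition(':')
    start, _, end = interval.partition('-')
    return {'chrom': chrom, 'start': int(start), 'end': int(end)}


feature = parse_grange('chr3:34109023-34113109')
```

Because the interval is half-open, the feature above spans feature['end'] - feature['start'] bases.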