lib5c.util.bed module

Module containing utilities for manipulating BED files and BED features.

BED features are commonly represented as dicts with the following structure:

{
    'chrom': str,
    'start': int,
    'end'  : int,
}

but may also contain additional fields.

lib5c.util.bed.check_intersect(a, b)[source]

Checks to see if two features intersect.

Parameters

a, b (Dict[str, Any]) – The two features to check for intersection.

Returns

True if the features intersect, False otherwise.

Return type

bool

Notes

Features are represented as dicts with the following structure:

{
    'chrom': str,
    'start': int,
    'end'  : int,
}

See lib5c.parsers.bed.load_features().
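As a rough illustration of this kind of check, the sketch below compares two feature dicts. It assumes BED-style half-open intervals, so features that merely touch end-to-start are not counted as intersecting. The helper name `features_intersect` is hypothetical, and the library's own handling of boundary cases may differ.

```python
def features_intersect(a, b):
    """Return True if two feature dicts overlap (hypothetical helper)."""
    # Features on different chromosomes can never intersect.
    if a['chrom'] != b['chrom']:
        return False
    # Assumption: half-open [start, end) intervals, as in the BED format.
    return a['start'] < b['end'] and b['start'] < a['end']


a = {'chrom': 'chr1', 'start': 100, 'end': 200}
b = {'chrom': 'chr1', 'start': 150, 'end': 250}
c = {'chrom': 'chr2', 'start': 150, 'end': 250}

same_chrom_overlap = features_intersect(a, b)   # True
different_chrom = features_intersect(a, c)      # False
```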

lib5c.util.bed.count_intersections(query_feature, feature_set)[source]

Counts the number of times a query feature is hit by a set of other features.

Parameters
  • query_feature (Dict[str, Any]) – The feature to count intersections for.

  • feature_set (List[Dict[str, Any]]) – The set of features to intersect with the query feature.

Returns

The number of features in feature_set that intersect query_feature.

Return type

int

Notes

Features are represented as dicts with the following structure:

{
    'chrom': str,
    'start': int,
    'end'  : int,
}

See lib5c.parsers.bed.load_features().
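The counting described above can be sketched as a single pass over the feature set. This is a minimal illustration, not the library's implementation: the name `count_hits` is hypothetical, and the overlap test assumes half-open BED intervals.

```python
def count_hits(query_feature, feature_set):
    """Count the features in feature_set that overlap query_feature.

    Hypothetical helper; assumes half-open [start, end) intervals.
    """
    return sum(
        1
        for feature in feature_set
        if feature['chrom'] == query_feature['chrom']
        and feature['start'] < query_feature['end']
        and query_feature['start'] < feature['end']
    )


query = {'chrom': 'chr1', 'start': 100, 'end': 200}
peaks = [
    {'chrom': 'chr1', 'start': 50, 'end': 120},   # overlaps the query
    {'chrom': 'chr1', 'start': 180, 'end': 300},  # overlaps the query
    {'chrom': 'chr1', 'start': 400, 'end': 500},  # same chromosome, no overlap
    {'chrom': 'chr2', 'start': 100, 'end': 200},  # different chromosome
]
hits = count_hits(query, peaks)  # 2
```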

lib5c.util.bed.flatten_features(features)[source]

Flattens a features dict and returns a flat list of features.

Typically, BED features are kept in dicts organized by chromosome. For example, this is the data structure returned by lib5c.parsers.bed.load_features(). When a flat list is desired, this function can be used to flatten the dictionary into a simple list.

Parameters

features (Dict[str, List[Dict[str, Any]]]) –

The keys are chromosome names. The values are lists of features for that chromosome. The features are represented as dicts with at least the following keys:

{
    'start': int,
    'end'  : int
}

Returns

A flat list of feature dicts. These dicts, which represent the same features as those contained in the original dict, have the following keys:

{
    'chrom': str,
    'start': int,
    'end'  : int
}

as well as any additional keys that were present in the inner dicts of the features dict passed to this function.

Return type

List[Dict[str, Any]]

Notes

If the dicts that describe the features already contain a ‘chrom’ key, that key’s value will get overwritten during the flattening.
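The flattening and the overwrite behaviour noted above can be sketched as follows. The function name `flatten` is hypothetical, and copying each inner dict (rather than modifying it in place) is an assumption made for this example; the library may behave differently in that respect.

```python
def flatten(features):
    """Flatten a chromosome-keyed dict of features into one list (sketch)."""
    flat = []
    for chrom, chrom_features in features.items():
        for feature in chrom_features:
            # Assumption: copy the feature so the input is left untouched.
            flat_feature = dict(feature)
            # Any existing 'chrom' value is replaced by the outer dict key.
            flat_feature['chrom'] = chrom
            flat.append(flat_feature)
    return flat


features = {
    'chr1': [{'start': 0, 'end': 10, 'name': 'a'}],
    'chr2': [{'start': 5, 'end': 15, 'chrom': 'stale'}],
}
flat = flatten(features)
# flat[0] == {'start': 0, 'end': 10, 'name': 'a', 'chrom': 'chr1'}
# flat[1]['chrom'] == 'chr2'  (the 'stale' value was overwritten)
```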

lib5c.util.bed.get_mid_to_mid_distance(fragment_a, fragment_b)[source]

Gets the mid-to-mid distance between two fragments.

Parameters

fragment_a, fragment_b (Dict[str, Any]) –

The fragments to find the distance between. The fragments must be represented as dicts with at least the following keys:

{
    'start': int,
    'end': int
}

Returns

The distance between the midpoints of the two fragments.

Return type

float
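A minimal sketch of the calculation, assuming the distance is the absolute difference between the two midpoints and that both fragments lie on the same chromosome. The name `mid_to_mid` is hypothetical.

```python
def mid_to_mid(fragment_a, fragment_b):
    """Distance between the midpoints of two fragments (sketch)."""
    mid_a = (fragment_a['start'] + fragment_a['end']) / 2
    mid_b = (fragment_b['start'] + fragment_b['end']) / 2
    # Assumption: the distance is unsigned, so argument order does not matter.
    return abs(mid_a - mid_b)


fragment_a = {'start': 50, 'end': 100}    # midpoint 75.0
fragment_b = {'start': 200, 'end': 300}   # midpoint 250.0
distance = mid_to_mid(fragment_a, fragment_b)  # 175.0
```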

lib5c.util.bed.get_midpoint(fragment, force_int=False)[source]

Gets the midpoint of a fragment.

Parameters
  • fragment (Dict[str, Any]) –

    The fragment to find the midpoint of. The fragment must be represented as a dict with at least the following keys:

    {
        'start': int,
        'end': int
    }
    

  • force_int (bool) – Return an int rounded towards zero instead of a float.
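The effect of force_int can be sketched with a fragment whose midpoint is not a whole number. The function name `midpoint` is hypothetical; the sketch uses `int()`, which truncates towards zero, to match the rounding described for this parameter.

```python
def midpoint(fragment, force_int=False):
    """Midpoint of a fragment, optionally truncated to an int (sketch)."""
    mid = (fragment['start'] + fragment['end']) / 2
    # int() truncates towards zero, matching the documented rounding.
    return int(mid) if force_int else mid


fragment = {'start': 50, 'end': 101}
exact = midpoint(fragment)                      # 75.5
truncated = midpoint(fragment, force_int=True)  # 75
```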

Returns

The midpoint of the fragment, rounded towards zero if force_int is True.

Return type

float, or int if force_int is True

Examples

>>> fragment = {'start': 50, 'end': 100}
>>> get_midpoint(fragment)
75.0

lib5c.util.bed.main()[source]

lib5c.util.bed.parse_feature_from_string(grange_string)[source]

Parses a BED feature from a string specifying a genomic range.

Parameters

grange_string (str) – The genomic range to parse, specified as a string of the form <chrom>:<start>-<end>. The interval is interpreted as a BED interval (0-based index, half-open interval).

Returns

The BED feature dict, which has keys ‘chrom’, ‘start’, and ‘end’.

Return type

dict
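Parsing a string of the form <chrom>:<start>-<end> can be sketched as below. The name `parse_grange` is hypothetical, and the sketch assumes plain integer coordinates with no thousands separators and performs no validation. The coordinates are stored as given, since the string is already interpreted as a 0-based, half-open BED interval.

```python
def parse_grange(grange_string):
    """Parse '<chrom>:<start>-<end>' into a BED feature dict (sketch)."""
    # Split on the last ':' so the chromosome name is kept whole.
    chrom, _, interval = grange_string.rpartition(':')
    start, _, end = interval.partition('-')
    return {'chrom': chrom, 'start': int(start), 'end': int(end)}


feature = parse_grange('chr3:34109023-34113109')
# {'chrom': 'chr3', 'start': 34109023, 'end': 34113109}
```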