lib5c.algorithms.filtering.fragment_bin_filtering module

lib5c.algorithms.filtering.fragment_bin_filtering.find_nearby_fragments(index, regional_pixelmap, regional_primermap, upstream_primer_mapping, threshold, midpoint=False)

Finds the primers near a target bin as specified by an index.

Parameters
  • index (int) – The index of the bin to look near.

  • regional_pixelmap (List[Dict[str, Any]]) – The pixelmap describing the bins for this region.

  • regional_primermap (List[Dict[str, Any]]) – The primermap describing the primers for this region.

  • upstream_primer_mapping (Dict[int, int]) – A mapping from each bin index to the index of its nearest upstream primer. See lib5c.algorithms.filtering.fragment_bin_filtering.find_upstream_primers().

  • threshold (int) – The threshold for deciding if a fragment is “nearby” or not, as a distance in base pairs.

  • midpoint (bool) – Pass True to restore the legacy behavior, in which distances to fragments were measured from their midpoints. The default (False) measures the distance to each fragment from its closest endpoint.

Returns

A list of nearby fragments, where each nearby fragment is represented as a dict of the following form:

{
    'index': int,
    'distance': int
}

where ‘index’ is the index of the fragment within the region and ‘distance’ is the distance between this fragment and the target bin.

Return type

List[Dict[str, int]]
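
The difference between the two distance conventions selected by midpoint, and the shape of the returned list, can be illustrated with a small standalone sketch. This is not the lib5c implementation: the helper names fragment_distance and nearby_fragments_sketch are hypothetical, the 'start' and 'end' keys on each fragment dict are an assumed layout, the inclusive comparison against threshold is an assumption, and the sketch scans every fragment rather than using upstream_primer_mapping.

```python
def fragment_distance(bin_position, fragment, midpoint=False):
    """Distance in bp from a bin position to a fragment (illustrative only).

    Assumes `fragment` is a dict with integer 'start' and 'end' keys; this
    layout is an assumption, not taken from the lib5c source.
    """
    if midpoint:
        # Legacy-style convention: measure to the fragment's midpoint.
        return abs(bin_position - (fragment['start'] + fragment['end']) // 2)
    # Endpoint convention: measure to whichever end is closer.
    return min(abs(bin_position - fragment['start']),
               abs(bin_position - fragment['end']))


def nearby_fragments_sketch(bin_position, fragments, threshold, midpoint=False):
    """Brute-force scan returning dicts shaped like the documented return value."""
    nearby = []
    for i, fragment in enumerate(fragments):
        distance = fragment_distance(bin_position, fragment, midpoint)
        if distance <= threshold:  # inclusive cutoff is an assumption
            nearby.append({'index': i, 'distance': distance})
    return nearby


fragments = [{'start': 1000, 'end': 3000}, {'start': 9000, 'end': 9500}]
by_endpoint = nearby_fragments_sketch(3500, fragments, threshold=1000)
by_midpoint = nearby_fragments_sketch(3500, fragments, threshold=1000,
                                      midpoint=True)
```

With these inputs the first fragment is 500 bp away by its closest endpoint but 1500 bp away by its midpoint, so it falls inside a 1000 bp threshold only under the endpoint convention.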

lib5c.algorithms.filtering.fragment_bin_filtering.find_upstream_primers(regional_pixelmap, regional_primermap)

Creates a mapping from a bin index to the index of its nearest upstream primer.

Parameters
  • regional_pixelmap (List[Dict[str, Any]]) – The pixelmap describing the bins for this region.

  • regional_primermap (List[Dict[str, Any]]) – The primermap describing the primers for this region.

Returns

A map from each bin index to the index of its nearest upstream primer.

Return type

Dict[int, int]
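
One way to picture a bin-to-upstream-primer mapping is the sketch below. It is only an illustration under stated assumptions, not the lib5c code: the function name upstream_primers_sketch is hypothetical, bins and primers are assumed to be dicts with 'start' and 'end' keys, primers are assumed to be sorted by 'start', "upstream" is taken to mean the last primer starting at or before the bin midpoint, and bins with no such primer are simply left out of the mapping.

```python
from bisect import bisect_right


def upstream_primers_sketch(regional_pixelmap, regional_primermap):
    """Map each bin index to the index of the last primer starting at or
    before the bin's midpoint (an assumed definition of "upstream")."""
    # Assumes the primermap is sorted by 'start'.
    starts = [primer['start'] for primer in regional_primermap]
    mapping = {}
    for bin_index, bin_ in enumerate(regional_pixelmap):
        bin_mid = (bin_['start'] + bin_['end']) // 2
        primer_index = bisect_right(starts, bin_mid) - 1
        if primer_index >= 0:  # skip bins that precede every primer
            mapping[bin_index] = primer_index
    return mapping


bins = [{'start': 0, 'end': 1000},
        {'start': 1000, 'end': 2000},
        {'start': 2000, 'end': 3000}]
primers = [{'start': 400, 'end': 900}, {'start': 1800, 'end': 2600}]
mapping = upstream_primers_sketch(bins, primers)
```

A precomputed mapping of this kind lets a neighborhood search start from a known primer index instead of scanning the whole region for every bin.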

lib5c.algorithms.filtering.fragment_bin_filtering.fragment_bin_filter(array, filter_function, regional_pixelmap, regional_primermap, threshold, filter_kwargs=None, midpoint=False)

Convenience function for filtering a fragment-level matrix to a bin-level matrix.

Parameters
  • array (np.ndarray) – The matrix to filter.

  • filter_function (Callable[[List[Dict[str, Any]]], float]) –

    The filter function to use when filtering. This function should take in a “neighborhood” and return the filtered value given that neighborhood. A neighborhood is represented as a list of “nearby points” where each nearby point is represented as a dict of the following form:

    {
        'value': float,
        'x_dist': int,
        'y_dist': int
    }
    

    where ‘value’ is the value at the point and ‘x_dist’ and ‘y_dist’ are its distances from the center of the neighborhood along the x- and y-axis, respectively, in base pairs. See lib5c.algorithms.filtering.filter_functions for examples of filter functions and how they can be created.

  • regional_pixelmap (List[Dict[str, Any]]) – The pixelmap describing the bins for this region.

  • regional_primermap (List[Dict[str, Any]]) – The primermap describing the primers for this region.

  • threshold (int) – The threshold for defining the size of the neighborhood passed to the filter function, in base pairs.

  • filter_kwargs (Optional[Dict[str, Any]]) – Kwargs to be passed to the filter_function.

  • midpoint (bool) – Pass True to restore the legacy behavior, in which distances to fragments were measured from their midpoints. The default (False) measures the distance to each fragment from its closest endpoint.

Returns

The filtered matrix.

Return type

np.ndarray
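
As an illustration of the neighborhood contract described for filter_function, here is a minimal sketch of a filter that averages the values in a neighborhood. The name mean_filter and its min_points keyword are hypothetical (the keyword stands in for something that could be supplied through filter_kwargs), and returning NaN for sparse neighborhoods is an assumption rather than documented lib5c behavior. The ready-made filters live in lib5c.algorithms.filtering.filter_functions.

```python
import math


def mean_filter(neighborhood, min_points=1):
    """Arithmetic mean of the non-NaN values in a neighborhood.

    `neighborhood` is a list of dicts with 'value', 'x_dist' and 'y_dist'
    keys, as described for filter_function. `min_points` is a hypothetical
    keyword argument used here only for illustration.
    """
    values = [point['value'] for point in neighborhood
              if not math.isnan(point['value'])]
    if len(values) < min_points:
        # Too few usable points; NaN as the "no value" marker is an assumption.
        return float('nan')
    return sum(values) / len(values)


neighborhood = [
    {'value': 2.0, 'x_dist': 0, 'y_dist': 0},
    {'value': 4.0, 'x_dist': 4000, 'y_dist': 0},
    {'value': 6.0, 'x_dist': 0, 'y_dist': 8000},
]
filtered_value = mean_filter(neighborhood)  # 4.0
```

This filter ignores x_dist and y_dist; a distance-weighted filter would use them to down-weight points far from the center of the neighborhood.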

lib5c.algorithms.filtering.fragment_bin_filtering.fragment_bin_filter_counts(counts, function, pixelmap, primermap, threshold, function_kwargs=None, midpoint=False)

Non-parallel wrapper for fragment_bin_filter(). Deprecated now that fragment_bin_filter() is decorated with @parallelize_regions.

Parameters
  • counts (Dict[str, np.ndarray]) – The counts dict to filter.

  • function (Callable[[List[Dict[str, Any]]], float]) – The filter function to use for filtering. See the description of the filter_function arg in fragment_bin_filter().

  • pixelmap (Dict[str, List[Dict[str, Any]]]) – The pixelmap describing the bins.

  • primermap (Dict[str, List[Dict[str, Any]]]) – The primermap describing the fragments.

  • threshold (int) – The threshold for defining the size of the neighborhood passed to the filter function, in base pairs.

  • function_kwargs (Optional[Dict[str, Any]]) – Kwargs to be passed to the function.

  • midpoint (bool) – Pass True to restore the legacy behavior, in which distances to fragments were measured from their midpoints. The default (False) measures the distance to each fragment from its closest endpoint.

Returns

The dict of filtered counts.

Return type

Dict[str, np.ndarray]
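
The relationship between the region-keyed arguments here and the per-region arguments of fragment_bin_filter() can be sketched as a plain serial loop over regions. This is an assumption about the general pattern, not the lib5c source: filter_all_regions and bin_level_zeros are hypothetical names, the stand-in filter does no real filtering, and nested lists are used in place of np.ndarray so the sketch has no dependencies.

```python
def filter_all_regions(counts, per_region_function, pixelmap, primermap,
                       **kwargs):
    """Apply a per-region function to every region of a counts dict, serially."""
    return {
        region: per_region_function(
            counts[region], pixelmap[region], primermap[region], **kwargs)
        for region in counts
    }


def bin_level_zeros(matrix, regional_pixelmap, regional_primermap):
    # Hypothetical stand-in for a real per-region filter: returns a
    # bins-by-bins matrix of zeros so the output shape is easy to check.
    n_bins = len(regional_pixelmap)
    return [[0.0] * n_bins for _ in range(n_bins)]


# One region with two fragments and three bins (made-up coordinates).
counts = {'regionA': [[1.0, 2.0], [2.0, 3.0]]}
primermap = {'regionA': [{'start': 0, 'end': 900},
                         {'start': 900, 'end': 2100}]}
pixelmap = {'regionA': [{'start': 0, 'end': 1000},
                        {'start': 1000, 'end': 2000},
                        {'start': 2000, 'end': 3000}]}
filtered = filter_all_regions(counts, bin_level_zeros, pixelmap, primermap)
```

The output keeps the region keys of the input, while each matrix changes from fragment-level dimensions to bin-level dimensions.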