lib5c.algorithms.filtering.unsmoothable_columns module

Module for identifying “unsmoothable columns” - sets of bins that don’t contain any non-zero fragments and are too wide to smooth over.

lib5c.algorithms.filtering.unsmoothable_columns.find_prebinned_unsmoothable_columns(regional_counts, regional_pixelmap, window_width)

Identifies the unsmoothable columns in a region assuming that the smoothing was a filtering operation applied on data that was already bin-level.

Parameters
  • regional_counts (np.ndarray) – The matrix of counts for this region.

  • regional_pixelmap (List[Dict[str, Any]]) – The pixelmap describing the bins for this region.

  • window_width (int) – The width of the filtering window in base pairs.

Returns

A list of boolean values with length equal to the number of bins in the region. The i-th element of this list is True if the i-th bin in the region is an “unsmoothable column”.

Return type

List[bool]
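The idea of flagging runs of empty bins can be illustrated with a minimal sketch. This is not the library's implementation: the helper name find_zero_runs and its min_run_length argument are hypothetical, and it assumes that an "empty" bin is one whose whole column is zero or NaN. In practice the run-length cutoff would be derived from window_width and the bin step.

```python
import numpy as np


def find_zero_runs(regional_counts, min_run_length):
    """Flag bins that sit inside a run of at least min_run_length empty columns.

    Hypothetical helper for illustration only; not part of lib5c.
    """
    counts = np.asarray(regional_counts, dtype=float)
    # Assumption: a bin is "empty" if every entry in its column is zero or NaN.
    empty = np.all(np.nan_to_num(counts) == 0, axis=0)
    flags = [False] * len(empty)
    run_start = None
    # The trailing False is a sentinel that closes a run ending at the last bin.
    for i, is_empty in enumerate(list(empty) + [False]):
        if is_empty and run_start is None:
            run_start = i
        elif not is_empty and run_start is not None:
            if i - run_start >= min_run_length:
                for j in range(run_start, i):
                    flags[j] = True
            run_start = None
    return flags


# Six bins; bins 2, 3 and 4 have no counts at all.
counts = np.ones((6, 6))
counts[:, 2:5] = 0
counts[2:5, :] = 0
flags = find_zero_runs(counts, min_run_length=3)
```

With a cutoff of 3 the three empty bins are flagged; raising the cutoff to 4 leaves every bin unflagged, because the run is then considered narrow enough to smooth over.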

lib5c.algorithms.filtering.unsmoothable_columns.find_unsmoothable_columns(regional_primermap, regional_pixelmap, window_width, upstream_primer_mapping=None, midpoint=False)

Identifies the unsmoothable columns in a region assuming that the smoothing was a filtering operation applied on fragment-level data.

Parameters
  • regional_primermap (List[Dict[str, Any]]) – The primermap describing the primers for this region.

  • regional_pixelmap (List[Dict[str, Any]]) – The pixelmap describing the bins for this region.

  • window_width (int) – The width of the filtering window in base pairs.

  • upstream_primer_mapping (Dict[int, int]) – A mapping from each bin index to the index of its nearest upstream primer. See lib5c.algorithms.filtering.fragment_bin_filtering.find_upstream_primers().

  • midpoint (bool) – Pass True to restore the legacy behavior, in which distances to fragments were based on their midpoints. The current behavior (with this kwarg set to False) is to compute distances to fragments based on their closest endpoint.

Returns

A list of boolean values with length equal to the number of bins in the region. The i-th element of this list is True if the i-th bin in the region is an “unsmoothable column”.

Return type

List[bool]
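The difference between the two distance conventions selected by midpoint can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: the helper names are hypothetical, primermap and pixelmap entries are assumed to carry integer "start" and "end" keys, and the window is assumed to be centered on the bin midpoint with a half-width of window_width / 2.

```python
def distance_to_fragment(bin_center, fragment, midpoint=False):
    """Distance in base pairs from a bin center to one fragment (hypothetical helper)."""
    start, end = fragment["start"], fragment["end"]
    if midpoint:
        # Legacy convention: measure to the fragment's midpoint.
        return abs(bin_center - (start + end) / 2)
    # Default convention: measure to whichever endpoint is closer.
    return min(abs(bin_center - start), abs(bin_center - end))


def bin_has_nearby_fragment(bin_, fragments, window_width, midpoint=False):
    """True if any fragment lies within half a window of the bin center.

    Assumption: the window is centered on the bin midpoint.
    """
    center = (bin_["start"] + bin_["end"]) / 2
    return any(
        distance_to_fragment(center, f, midpoint) <= window_width / 2
        for f in fragments
    )


# One 2 kb fragment directly upstream of a 2 kb bin.
fragments = [{"start": 1000, "end": 3000}]
bin_ = {"start": 3000, "end": 5000}
by_endpoint = bin_has_nearby_fragment(bin_, fragments, window_width=3000)
by_midpoint = bin_has_nearby_fragment(bin_, fragments, window_width=3000, midpoint=True)
```

Here the closest endpoint is 1000 bp from the bin center while the fragment midpoint is 2000 bp away, so with a 3000 bp window the fragment counts as nearby only under the endpoint convention. Long fragments are where the two conventions are most likely to disagree.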

lib5c.algorithms.filtering.unsmoothable_columns.unsmoothable_column_threshold_heuristic(window_width, bin_step)

This function defines the heuristic that determines how long a run of fragment-less bins must be before it is considered “unsmoothable”.

Parameters
  • window_width (int) – The width of the filtering window in base pairs.

  • bin_step (int) – The “sampling rate” or “bin step”.

Returns

The length that a run of fragment-less bins must reach before it is considered “unsmoothable”.

Return type

int
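The formula behind this heuristic is not stated here. Purely to illustrate how a window width and a bin step can be turned into a run length, the sketch below uses a hypothetical rule (not the library's): a run counts as too wide once it spans at least one full window. The function name example_threshold is an assumption made for this example.

```python
import math


def example_threshold(window_width, bin_step):
    """Hypothetical rule: number of consecutive bins needed to span a full window.

    Not the formula used by unsmoothable_column_threshold_heuristic().
    """
    return math.ceil(window_width / bin_step)


# A 20 kb window sampled every 4 kb is spanned by 5 consecutive bins.
threshold = example_threshold(20000, 4000)
```

Under this rule a wider window or a finer bin step both raise the number of consecutive empty bins that can be tolerated.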

lib5c.algorithms.filtering.unsmoothable_columns.wipe_prebinned_unsmoothable_columns(smoothed_counts, prebinned_counts, pixelmap, window_width)

Convenience function for wiping the unsmoothable columns in a binned counts matrix assuming that the smoothing was a filtering operation applied on bin-level data.

Parameters
  • smoothed_counts (np.ndarray) – The matrix of smoothed counts to wipe unsmoothable columns from.

  • prebinned_counts (np.ndarray) – The original binned counts matrix to use to identify zero-count columns.

  • pixelmap (List[Dict[str, Any]]) – The pixelmap describing the bins for this region.

  • window_width (int) – The width of the filtering window in base pairs.

Returns

The wiped matrix of binned counts.

Return type

np.ndarray

lib5c.algorithms.filtering.unsmoothable_columns.wipe_unsmoothable_columns(binned_counts, primermap, pixelmap, window_width, midpoint=False)

Convenience function for wiping the unsmoothable columns in a binned counts matrix assuming that the smoothing was a filtering operation applied on fragment-level data.

Parameters
  • binned_counts (np.ndarray) – The matrix of binned counts to wipe unsmoothable columns from.

  • primermap (List[Dict[str, Any]]) – The primermap describing the primers for this region.

  • pixelmap (List[Dict[str, Any]]) – The pixelmap describing the bins for this region.

  • window_width (int) – The width of the filtering window in base pairs.

  • midpoint (bool) – Pass True to restore the legacy behavior, in which distances to fragments were based on their midpoints. The current behavior (with this kwarg set to False) is to compute distances to fragments based on their closest endpoint.

Returns

The wiped matrix of binned counts.

Return type

np.ndarray
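Both wiping functions combine a list of per-bin flags with a square counts matrix. A minimal sketch of that step is shown below. It is not the library's implementation: the helper name wipe_columns is hypothetical, and it assumes that "wiping" means overwriting both the row and the column of each flagged bin with NaN and returning a copy. The fill value and the in-place versus copy behavior of the real functions may differ.

```python
import numpy as np


def wipe_columns(counts, unsmoothable, fill_value=np.nan):
    """Return a copy of counts with flagged rows and columns overwritten.

    Hypothetical helper; the NaN fill value is an assumption.
    """
    wiped = np.array(counts, dtype=float)  # np.array copies, so the input is untouched
    idx = np.flatnonzero(unsmoothable)     # indices of the flagged bins
    wiped[idx, :] = fill_value
    wiped[:, idx] = fill_value
    return wiped


counts = np.ones((4, 4))
wiped = wipe_columns(counts, [False, True, False, False])
```

Wiping rows as well as columns keeps a symmetric contact matrix symmetric, and using NaN rather than zero lets downstream code distinguish "no data" from "no interactions".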