lib5c.algorithms.filtering.unsmoothable_columns module
Module for identifying “unsmoothable columns”: runs of bins that contain no non-zero fragments and are too wide to smooth over.
lib5c.algorithms.filtering.unsmoothable_columns.find_prebinned_unsmoothable_columns(regional_counts, regional_pixelmap, window_width)
Identifies the unsmoothable columns in a region, assuming that the smoothing was a filtering operation applied to data that was already bin-level.
- Parameters
regional_counts (np.ndarray) – The matrix of counts for this region.
regional_pixelmap (List[Dict[str, Any]]) – The pixelmap describing the bins for this region.
window_width (int) – The width of the filtering window in base pairs.
- Returns
A list of boolean values with length equal to the number of bins in the region. The i-th element of this list is True if the i-th bin in the region is an “unsmoothable column”.
- Return type
List[bool]
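At its core, the bin-level case detects runs of empty columns. The sketch below is a minimal illustration and not lib5c's implementation. It assumes that a column counts as “empty” when its finite entries sum to zero. It also takes the run-length threshold directly as a hypothetical `max_run_length` argument instead of deriving it from `window_width` and the pixelmap. The function name `find_long_empty_runs` is likewise hypothetical.

```python
import numpy as np


def find_long_empty_runs(counts, max_run_length):
    """Hypothetical sketch: flag columns that sit in a run of empty columns
    longer than ``max_run_length`` bins.

    A column is treated as empty (an assumption) when its finite entries sum
    to zero; NaNs are ignored by ``np.nansum``.
    """
    empty = np.nansum(np.asarray(counts, dtype=float), axis=0) == 0
    n = len(empty)
    mask = [False] * n
    i = 0
    while i < n:
        if not empty[i]:
            i += 1
            continue
        # Advance j to the end of this maximal run of empty columns.
        j = i
        while j < n and empty[j]:
            j += 1
        # Only runs strictly longer than the threshold are flagged.
        if j - i > max_run_length:
            for k in range(i, j):
                mask[k] = True
        i = j
    return mask


counts = np.ones((6, 6))
counts[:, 1:4] = 0  # a run of three empty columns
counts[:, 5] = 0    # an isolated empty column
print(find_long_empty_runs(counts, max_run_length=2))
# [False, True, True, True, False, False]
```

The isolated empty column at index 5 is not flagged. A single missing bin is narrow enough for a window to bridge, so only the three-column run exceeds the threshold.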
lib5c.algorithms.filtering.unsmoothable_columns.find_unsmoothable_columns(regional_primermap, regional_pixelmap, window_width, upstream_primer_mapping=None, midpoint=False)
Identifies the unsmoothable columns in a region, assuming that the smoothing was a filtering operation applied to fragment-level data.
- Parameters
regional_primermap (List[Dict[str, Any]]) – The primermap describing the primers for this region.
regional_pixelmap (List[Dict[str, Any]]) – The pixelmap describing the bins for this region.
window_width (int) – The width of the filtering window in base pairs.
upstream_primer_mapping (Dict[int, int]) – A mapping from each bin index to the index of its nearest upstream primer. See lib5c.algorithms.filtering.fragment_bin_filtering.find_upstream_primers().
midpoint (bool) – Pass True to restore the legacy behavior, in which distances to fragments were computed from their midpoints. The new behavior (with this kwarg set to False) is to compute distances to fragments from their closest endpoint.
- Returns
A list of boolean values with length equal to the number of bins in the region. The i-th element of this list is True if the i-th bin in the region is an “unsmoothable column”.
- Return type
List[bool]
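The `midpoint` flag changes how the distance from a bin to a fragment is measured. The sketch below shows that difference under several stated assumptions, none of which come from lib5c:

- Primermap and pixelmap entries are dicts with integer `'start'` and `'end'` keys.
- Distances are measured from each bin's center.
- A bin is flagged when no fragment lies within half a window of it.

It ignores `upstream_primer_mapping` and searches all fragments directly. The names `nearest_fragment_distance` and `find_unsmoothable_bins` are hypothetical.

```python
def nearest_fragment_distance(position, primermap, midpoint=False):
    """Distance (bp) from ``position`` to the nearest fragment in ``primermap``.

    midpoint=True measures to each fragment's midpoint (legacy-style);
    otherwise to whichever of the fragment's two endpoints is closer.
    """
    if midpoint:
        return min(abs(position - (f['start'] + f['end']) / 2) for f in primermap)
    return min(min(abs(position - f['start']), abs(position - f['end']))
               for f in primermap)


def find_unsmoothable_bins(primermap, pixelmap, window_width, midpoint=False):
    """Hypothetical criterion: a bin is unsmoothable when no fragment lies
    within half a window of the bin's center."""
    half_window = window_width / 2
    flags = []
    for b in pixelmap:
        center = (b['start'] + b['end']) / 2
        flags.append(nearest_fragment_distance(center, primermap, midpoint) > half_window)
    return flags


# Two fragments at the region's edges and five 2 kb bins between 0 and 10 kb.
primermap = [{'start': 0, 'end': 1000}, {'start': 9000, 'end': 10000}]
pixelmap = [{'start': s, 'end': s + 2000} for s in range(0, 10000, 2000)]
print(find_unsmoothable_bins(primermap, pixelmap, 4000))
# [False, False, True, False, False]
print(find_unsmoothable_bins(primermap, pixelmap, 4000, midpoint=True))
# [False, True, True, True, False]
```

Measuring to the closest endpoint places a long fragment nearer to its neighboring bins than measuring to its midpoint does. As a result, the endpoint-based rule flags fewer bins in this example.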
lib5c.algorithms.filtering.unsmoothable_columns.unsmoothable_column_threshold_heuristic(window_width, bin_step)
This function defines the heuristic that determines how long a run of fragment-less bins must be before it is considered “unsmoothable”.
- Parameters
window_width (int) – The width of the filtering window in base pairs.
bin_step (int) – The “sampling rate” or “bin step”.
- Returns
The maximum length a run of fragment-less bins can have before it is considered “unsmoothable”.
- Return type
int
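A heuristic of this kind converts a window width in base pairs into a run length in bins. The sketch below uses one plausible rule, assumed here and not taken from lib5c: the number of whole bin steps spanned by a single window. The function name `run_length_threshold` is hypothetical.

```python
def run_length_threshold(window_width, bin_step):
    """Hypothetical heuristic: the number of whole bin steps spanned by one
    filtering window. Runs of fragment-less bins longer than this would be
    treated as unsmoothable."""
    if bin_step <= 0:
        raise ValueError("bin_step must be positive")
    # Integer division rounds down, so a partial bin step never lengthens
    # the allowed run.
    return window_width // bin_step


print(run_length_threshold(20000, 4000))  # 5
print(run_length_threshold(5000, 2000))   # 2
```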
lib5c.algorithms.filtering.unsmoothable_columns.wipe_prebinned_unsmoothable_columns(smoothed_counts, prebinned_counts, pixelmap, window_width)
Convenience function for wiping the unsmoothable columns in a binned counts matrix, assuming that the smoothing was a filtering operation applied to bin-level data.
- Parameters
smoothed_counts (np.ndarray) – The matrix of smoothed counts to wipe unsmoothable columns from.
prebinned_counts (np.ndarray) – The original binned counts matrix to use to identify zero-count columns.
pixelmap (List[Dict[str, Any]]) – The pixelmap describing the bins for this region.
window_width (int) – The width of the filtering window in base pairs.
- Returns
The wiped matrix of binned counts.
- Return type
np.ndarray
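This function takes two matrices because empty columns are identified in `prebinned_counts` but wiped in `smoothed_counts`. Presumably this is because smoothing spreads signal into columns that were originally empty. The sketch below covers only the wiping step, given a boolean mask. It assumes that “wiping” means setting both the row and the column of each flagged bin to NaN in a copy of the matrix. The function name `wipe_columns` is hypothetical.

```python
import numpy as np


def wipe_columns(smoothed_counts, mask):
    """Return a copy of ``smoothed_counts`` with the rows and columns flagged
    in ``mask`` set to NaN. The input matrix is left unmodified."""
    mask = np.asarray(mask, dtype=bool)
    wiped = np.array(smoothed_counts, dtype=float)  # np.array copies by default
    # Wipe each flagged bin's row as well as its column, assuming a symmetric
    # regional matrix in which bin i indexes both axes.
    wiped[mask, :] = np.nan
    wiped[:, mask] = np.nan
    return wiped


smoothed = np.arange(16, dtype=float).reshape(4, 4)
wiped = wipe_columns(smoothed, [False, True, False, False])
print(int(np.isnan(wiped).sum()))  # 7 (row 1 and column 1 share one entry)
```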
lib5c.algorithms.filtering.unsmoothable_columns.wipe_unsmoothable_columns(binned_counts, primermap, pixelmap, window_width, midpoint=False)
Convenience function for wiping the unsmoothable columns in a binned counts matrix, assuming that the smoothing was a filtering operation applied to fragment-level data.
- Parameters
binned_counts (np.ndarray) – The matrix of binned counts to wipe unsmoothable columns from.
primermap (List[Dict[str, Any]]) – The primermap describing the primers for this region.
pixelmap (List[Dict[str, Any]]) – The pixelmap describing the bins for this region.
window_width (int) – The width of the filtering window in base pairs.
midpoint (bool) – Pass True to restore the legacy behavior, in which distances to fragments were computed from their midpoints. The new behavior (with this kwarg set to False) is to compute distances to fragments from their closest endpoint.
- Returns
The wiped matrix of binned counts.
- Return type
np.ndarray
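Unlike the bin-level variant, this function needs no unsmoothed matrix. The columns to wipe follow from fragment and bin positions alone. The vectorized sketch below runs end to end under assumptions that are not taken from lib5c:

- Map entries are dicts with `'start'` and `'end'` keys.
- Distances are measured from bin centers.
- A bin is unsmoothable when no fragment lies within half a window.
- Wiping sets the bin's row and column to NaN.

The function name `wipe_far_from_fragments` is hypothetical.

```python
import numpy as np


def wipe_far_from_fragments(binned_counts, primermap, pixelmap, window_width,
                            midpoint=False):
    """Hypothetical sketch: NaN out rows/columns of bins whose nearest
    fragment lies more than half a window from the bin's center."""
    centers = np.array([(b['start'] + b['end']) / 2 for b in pixelmap], dtype=float)
    starts = np.array([f['start'] for f in primermap], dtype=float)
    ends = np.array([f['end'] for f in primermap], dtype=float)
    # Broadcasting gives a (n_bins, n_fragments) matrix of distances.
    if midpoint:
        dist = np.abs(centers[:, None] - ((starts + ends) / 2)[None, :])
    else:
        dist = np.minimum(np.abs(centers[:, None] - starts[None, :]),
                          np.abs(centers[:, None] - ends[None, :]))
    mask = dist.min(axis=1) > window_width / 2
    wiped = np.array(binned_counts, dtype=float)
    wiped[mask, :] = np.nan
    wiped[:, mask] = np.nan
    return wiped


primermap = [{'start': 0, 'end': 1000}, {'start': 9000, 'end': 10000}]
pixelmap = [{'start': s, 'end': s + 2000} for s in range(0, 10000, 2000)]
counts = np.ones((5, 5))
print(int(np.isnan(wipe_far_from_fragments(counts, primermap, pixelmap, 4000)).sum()))
# 9 (only bin 2 is wiped)
```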