lib5c.algorithms.outliers module
Module for identifying and removing high spatial outliers from 5C contact matrices.
-
lib5c.algorithms.outliers.flag_array_high_spatial_outliers(array, size=5, fold_threshold=8.0)
Identifies which elements of an array are high spatial outliers.
- Parameters
array (np.ndarray) – The array to look for outliers in.
size (int) – The size of the window to look in around each element when deciding if it is an outlier. Should be an odd integer.
fold_threshold (float) – Elements will be flagged as outliers if they are greater than this number or greater than this many times the local median (as estimated using the window size in size).
- Returns
A matrix of the same size and shape as the input matrix, with 1’s at positions flagged as high spatial outliers and 0’s everywhere else.
- Return type
np.ndarray
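The flagging rule described above can be sketched in plain NumPy. This is only an illustration, not lib5c's implementation: the function name flag_high_outliers_sketch is hypothetical, the input is assumed to be 2-D, the edges are assumed to be padded with NaN, and the two conditions are combined with a literal "or" as the parameter description reads. The real function may handle edges, missing values, and the combination of the two conditions differently.

```python
import numpy as np


def flag_high_outliers_sketch(array, size=5, fold_threshold=8.0):
    """Hypothetical stand-in for illustration; not lib5c's implementation."""
    array = np.asarray(array, dtype=float)
    half = size // 2
    # Assumption: pad with NaN so that windows at the matrix edges simply
    # contain fewer values instead of wrapping or reflecting.
    padded = np.pad(array, half, mode='constant', constant_values=np.nan)
    flags = np.zeros(array.shape, dtype=int)
    n_rows, n_cols = array.shape
    for i in range(n_rows):
        for j in range(n_cols):
            value = array[i, j]
            if np.isnan(value):
                continue  # assumption: missing values are never flagged
            # size-by-size window centered on (i, j), including the element itself
            window = padded[i:i + size, j:j + size]
            local_median = np.nanmedian(window)
            # Literal reading of the fold_threshold description: either
            # condition alone is enough to flag the element.
            if value > fold_threshold or value > fold_threshold * local_median:
                flags[i, j] = 1
    return flags


# A flat background of ones with a single spike in the middle.
counts = np.ones((7, 7))
counts[3, 3] = 100.0
flags = flag_high_outliers_sketch(counts)
```

In this toy input only the spike at position [3, 3] is flagged: every background element equals 1, which exceeds neither 8 nor 8 times its local median of 1.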
-
lib5c.algorithms.outliers.remove_high_spatial_outliers(counts, size=5, fold_threshold=8.0, overwrite_value='nan', primermap=None, level='fragment')
Convenience function for removing high spatial outliers from counts matrices.
- Parameters
counts (np.ndarray) – The matrix to remove outliers from.
size (int) – The size of the window to look in around each element when deciding if it is an outlier. Should be an odd integer.
fold_threshold (float) – Elements will be flagged as outliers if they are greater than this number or greater than this many times the local median (as estimated using the window size in size).
overwrite_value ({'nan', 'zero', 'median'}) – The value to overwrite elements flagged as outliers with.
primermap (List[Dict[str, Any]]) – The list of fragments for this region corresponding to counts.
level ({'fragment', 'bin'}) – Whether to interpret counts as bin- or fragment-level. The difference is that bin-level matrices are assumed to have equal distance between elements.
- Returns
The input matrix with all spatial outliers overwritten.
- Return type
np.ndarray
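To illustrate the three overwrite_value options, here is a minimal sketch of the overwrite step alone. The helper overwrite_flagged_sketch and its flags and medians arguments are hypothetical and are not part of lib5c. The sketch assumes that 'median' means replacing each outlier with a local median supplied by the caller, which the documentation above does not spell out, and it ignores primermap and level entirely.

```python
import numpy as np


def overwrite_flagged_sketch(counts, flags, overwrite_value='nan', medians=None):
    """Hypothetical helper for illustration; not lib5c's implementation."""
    result = np.array(counts, dtype=float)  # copy, so the input is left untouched
    mask = np.asarray(flags).astype(bool)   # 1 -> overwrite, 0 -> keep
    if overwrite_value == 'nan':
        result[mask] = np.nan
    elif overwrite_value == 'zero':
        result[mask] = 0.0
    elif overwrite_value == 'median':
        # Assumption: the caller provides a matrix of local medians.
        if medians is None:
            raise ValueError("medians is required when overwrite_value='median'")
        result[mask] = np.asarray(medians, dtype=float)[mask]
    else:
        raise ValueError('unknown overwrite_value: %r' % (overwrite_value,))
    return result


counts = np.array([[1.0, 2.0], [3.0, 100.0]])
flags = np.array([[0, 0], [0, 1]])  # only the bottom-right element is flagged

as_nan = overwrite_flagged_sketch(counts, flags, overwrite_value='nan')
as_zero = overwrite_flagged_sketch(counts, flags, overwrite_value='zero')
as_median = overwrite_flagged_sketch(
    counts, flags, overwrite_value='median', medians=np.full((2, 2), 2.0))
```

Overwriting with NaN marks the position as missing for downstream steps, whereas 'zero' and 'median' keep the matrix fully numeric; which is appropriate depends on how later processing treats missing values.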
-
lib5c.algorithms.outliers.remove_primer_primer_pairs(counts_superdict, primermap, threshold=5.0, num_reps=None, fraction_reps=None, all_reps=False, inplace=True)
Removes primer-primer pairs from a set of replicates according to criteria specified by the kwargs.
Legacy code inherited from https://bitbucket.org/creminslab/primer-primer-pair-remover
- Parameters
counts_superdict (Dict[str, Dict[str, np.ndarray]]) – The counts superdict data structure to remove primer-primer pairs from.
primermap (Dict[str, List[Dict[str, Any]]]) – The primermap or pixelmap describing the loci whose interaction frequencies are quantified in the counts_superdict.
threshold (float) – Sets the threshold. A rep passes the threshold if it is greater than or equal to this number.
num_reps (Optional[int]) – Pass an int to make the condition be that this many reps must clear the threshold.
fraction_reps (Optional[float]) – Pass a fraction (between 0 and 1) as a float to make the condition be that this fraction of the reps must clear the threshold.
all_reps (bool) – Pass True to make the condition be that the sum across all replicates must clear the threshold. This is the default mode if neither num_reps nor fraction_reps is passed.
inplace (bool) – Pass True to operate in-place on the passed counts_superdict; pass False to return a new counts superdict.
- Returns
The result of the primer-primer pair removal, in the form of a counts superdict data structure analogous to the counts_superdict passed to this function.
- Return type
Dict[str, Dict[str, np.ndarray]]
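The three selection modes (num_reps, fraction_reps, and the default sum across replicates) can be sketched as follows. This is not the legacy implementation: the function names are hypothetical, the superdict is assumed to be keyed first by replicate and then by region, removed pairs are assumed to be set to NaN, and the sketch always returns a new structure rather than honoring inplace. It also assumes that pairs failing the condition are the ones removed.

```python
import numpy as np


def keep_mask_sketch(rep_matrices, threshold=5.0, num_reps=None, fraction_reps=None):
    """Boolean matrix: True where a primer-primer pair satisfies the condition."""
    stacked = np.stack(rep_matrices)                 # shape (n_reps, n, n)
    n_passing = (stacked >= threshold).sum(axis=0)   # reps clearing the threshold
    if num_reps is not None:
        return n_passing >= num_reps
    if fraction_reps is not None:
        return n_passing / stacked.shape[0] >= fraction_reps
    # Default mode: the sum across all replicates must clear the threshold.
    return stacked.sum(axis=0) >= threshold


def remove_pairs_sketch(counts_superdict, threshold=5.0, num_reps=None,
                        fraction_reps=None):
    """Hypothetical illustration; returns a new superdict with failing pairs as NaN."""
    reps = list(counts_superdict)
    # Assumption: every replicate holds the same set of regions.
    regions = list(counts_superdict[reps[0]])
    result = {rep: {} for rep in reps}
    for region in regions:
        keep = keep_mask_sketch(
            [counts_superdict[rep][region] for rep in reps],
            threshold=threshold, num_reps=num_reps, fraction_reps=fraction_reps)
        for rep in reps:
            # Assumption: a removed pair is represented by NaN.
            result[rep][region] = np.where(keep, counts_superdict[rep][region], np.nan)
    return result


superdict = {
    'rep1': {'region': np.array([[10.0, 1.0], [1.0, 6.0]])},
    'rep2': {'region': np.array([[10.0, 1.0], [1.0, 0.0]])},
}
by_sum = remove_pairs_sketch(superdict, threshold=5.0)
by_count = remove_pairs_sketch(superdict, threshold=5.0, num_reps=2)
```

With these toy matrices the default mode keeps both diagonal pairs, because their sums across replicates (20 and 6) clear the threshold of 5. Requiring num_reps=2 keeps only the top-left pair, since the bottom-right pair clears the threshold in just one replicate.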