lib5c.algorithms.outliers module
Module for identifying and removing high spatial outliers from 5C contact matrices.
-
lib5c.algorithms.outliers.flag_array_high_spatial_outliers(array, size=5, fold_threshold=8.0)
Identifies which elements of an array are high spatial outliers.
Parameters: - array (np.ndarray) – The array to look for outliers in.
- size (int) – The size of the window to look in around each element when deciding if it is an outlier. Should be an odd integer.
- fold_threshold (float) – Elements will be flagged as outliers if they are greater than this number or greater than this many times the local median (as estimated using the window size in size).
Returns: A matrix of the same size and shape as the input matrix, with 1’s at positions flagged as high spatial outliers and 0’s everywhere else.
Return type: np.ndarray
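The flagging rule described above can be sketched with plain NumPy. This is an illustrative sketch only, not the lib5c implementation: the helper names `local_median_sketch` and `flag_high_outliers_sketch` are hypothetical, the NaN padding at the matrix edges is an assumption, and the two comparisons are combined with "or" by reading the parameter description literally.

```python
import numpy as np


def local_median_sketch(array, size=5):
    """Median of the size x size window centered on each element.

    Assumption: edges are padded with NaN and NaNs are ignored, so
    border windows only use values that exist in the input.
    """
    half = size // 2
    padded = np.pad(np.asarray(array, dtype=float), half,
                    mode='constant', constant_values=np.nan)
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    # Flatten each window (this copies) and take its NaN-aware median.
    flat = windows.reshape(windows.shape[0], windows.shape[1], -1)
    return np.nanmedian(flat, axis=-1)


def flag_high_outliers_sketch(array, size=5, fold_threshold=8.0):
    """Return an int array with 1 at flagged positions and 0 elsewhere."""
    array = np.asarray(array, dtype=float)
    medians = local_median_sketch(array, size=size)
    # Literal reading of the docs: above the threshold itself, OR above
    # fold_threshold times the local median (an assumption).
    flags = (array > fold_threshold) | (array > fold_threshold * medians)
    return flags.astype(int)


# A flat background of ones with a single spike in the middle.
example = np.ones((7, 7))
example[3, 3] = 100.0
example_flags = flag_high_outliers_sketch(example, size=3)
```

An odd `size` keeps the window centered on the element being tested, which is presumably why the parameter description asks for an odd integer.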
-
lib5c.algorithms.outliers.remove_high_spatial_outliers(counts, size=5, fold_threshold=8.0, overwrite_value='nan', primermap=None, level='fragment')
Convenience function for removing high spatial outliers from counts matrices.
Parameters: - counts (np.ndarray) – The matrix to remove outliers from.
- size (int) – The size of the window to look in around each element when deciding if it is an outlier. Should be an odd integer.
- fold_threshold (float) – Elements will be flagged as outliers if they are greater than this number or greater than this many times the local median (as estimated using the window size in size).
- overwrite_value ({'nan', 'zero', 'median'}) – The value to overwrite elements flagged as outliers with.
- primermap (List[Dict[str, Any]]) – The list of fragments for this region corresponding to counts.
- level ({'fragment', 'bin'}) – Whether to interpret counts as bin- or fragment-level. The difference is that bin-level matrices are assumed to have equal distance between elements.
Returns: The input matrix with all spatial outliers overwritten.
Return type: np.ndarray
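To illustrate the three overwrite_value options, here is a minimal sketch of the overwriting step alone. It is not the lib5c code: `overwrite_flagged_sketch` and its `flags` and `local_medians` arguments are hypothetical, and it assumes that 'median' means the local median at each flagged position. The primermap and level handling is not modeled.

```python
import numpy as np


def overwrite_flagged_sketch(counts, flags, overwrite_value='nan',
                             local_medians=None):
    """Return a float copy of counts with flagged entries overwritten.

    flags: array of 0/1 (or bool) marking outliers, same shape as counts.
    local_medians: per-element local medians, only needed for 'median'
    (assumption: 'median' refers to the local median).
    """
    result = np.array(counts, dtype=float)  # copy; input is left untouched
    mask = np.asarray(flags).astype(bool)
    if overwrite_value == 'nan':
        result[mask] = np.nan
    elif overwrite_value == 'zero':
        result[mask] = 0.0
    elif overwrite_value == 'median':
        if local_medians is None:
            raise ValueError("local_medians is required for 'median'")
        result[mask] = np.asarray(local_medians, dtype=float)[mask]
    else:
        raise ValueError("overwrite_value must be 'nan', 'zero' or 'median'")
    return result


demo_counts = np.array([[1.0, 2.0], [3.0, 400.0]])
demo_flags = np.array([[0, 0], [0, 1]])
demo_medians = np.full((2, 2), 2.0)

as_nan = overwrite_flagged_sketch(demo_counts, demo_flags, 'nan')
as_zero = overwrite_flagged_sketch(demo_counts, demo_flags, 'zero')
as_median = overwrite_flagged_sketch(demo_counts, demo_flags, 'median',
                                     local_medians=demo_medians)
```

Overwriting with NaN keeps the flagged positions distinguishable from true zero counts, which may be why 'nan' is the default.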
-
lib5c.algorithms.outliers.remove_primer_primer_pairs(counts_superdict, primermap, threshold=5.0, num_reps=None, fraction_reps=None, all_reps=False, inplace=True)
Removes primer-primer pairs from a set of replicates according to criteria specified by the kwargs.
Legacy code inherited from https://bitbucket.org/creminslab/primer-primer-pair-remover
Parameters: - counts_superdict (Dict[str, Dict[str, np.ndarray]]) – The counts superdict data structure to remove primer-primer pairs from.
- primermap (Dict[str, List[Dict[str, Any]]]) – The primermap or pixelmap describing the loci whose interaction frequencies are quantified in the counts_superdict.
- threshold (float) – Sets the threshold. A rep passes the threshold if it is greater than or equal to this number.
- num_reps (Optional[int]) – Pass an int to make the condition be that this many reps must clear the threshold.
- fraction_reps (Optional[float]) – Pass a fraction (between 0 and 1) as a float to make the condition be that this fraction of the reps must clear the threshold.
- all_reps (bool) – Pass True to make the condition be that the sum across all replicates must clear the threshold. This is the default mode if neither num_reps nor fraction_reps is passed.
- inplace (bool) – Pass True to operate in-place on the passed counts_superdict; pass False to return a new counts superdict.
Returns: The result of the primer-primer pair removal, in the form of a counts superdict data structure analogous to the counts_superdict passed to this function.
Return type: Dict[str, Dict[str, np.ndarray]]
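The three replicate conditions described in the parameters can be sketched as a boolean mask over one region's matrices. This is a hedged illustration, not the library's code: `pairs_passing_sketch` is a hypothetical helper, it works on a single region (a dict mapping replicate name to matrix) instead of a full counts superdict, it assumes num_reps takes precedence when both num_reps and fraction_reps are given, and it does not model how failing pairs are actually removed.

```python
import numpy as np


def pairs_passing_sketch(rep_matrices, threshold=5.0, num_reps=None,
                         fraction_reps=None):
    """Return a bool matrix: True where a primer-primer pair passes.

    rep_matrices: dict of replicate name -> square counts matrix for
    one region (a simplification of the counts superdict).
    """
    stack = np.stack([np.asarray(m, dtype=float)
                      for m in rep_matrices.values()])
    n_reps = stack.shape[0]
    # Number of replicates in which each pair is >= threshold.
    n_passing = (stack >= threshold).sum(axis=0)
    if num_reps is not None:
        return n_passing >= num_reps
    if fraction_reps is not None:
        # Assumption: "this fraction" means at least this fraction.
        return n_passing >= fraction_reps * n_reps
    # Default mode: the sum across all replicates must clear the threshold.
    return stack.sum(axis=0) >= threshold


region_reps = {
    'rep1': np.array([[6.0, 1.0], [1.0, 6.0]]),
    'rep2': np.array([[6.0, 4.0], [4.0, 0.0]]),
}

keep_two_reps = pairs_passing_sketch(region_reps, num_reps=2)
keep_half = pairs_passing_sketch(region_reps, fraction_reps=0.5)
keep_sum = pairs_passing_sketch(region_reps)
```

With these toy matrices, requiring both replicates keeps only the first diagonal pair, requiring half of them also keeps the second diagonal pair, and the summed mode keeps every pair.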