lib5c.algorithms.outliers module
Module for identifying and removing high spatial outliers from 5C contact matrices.
-
lib5c.algorithms.outliers.flag_array_high_spatial_outliers(array, size=5, fold_threshold=8.0)
Identifies which elements of an array are high spatial outliers.
- Parameters
array (np.ndarray) – The array to look for outliers in.
size (int) – The size of the window to look in around each element when deciding if it is an outlier. Should be an odd integer.
fold_threshold (float) – Elements will be flagged as outliers if they are greater than this number or greater than this many times the local median (as estimated using the window size in size).
- Returns
An array of the same size and shape as the input array, with 1s at positions flagged as high spatial outliers and 0s everywhere else.
- Return type
np.ndarray
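The flagging rule described above can be illustrated with a minimal NumPy sketch. This is not lib5c's implementation: the function name flag_high_outliers_sketch is hypothetical, the input is assumed to be 2D, the NaN padding at the edges is an assumption, and the two conditions are combined with "or" exactly as the parameter description reads, which may differ from what the library actually does.

```python
import numpy as np


def flag_high_outliers_sketch(array, size=5, fold_threshold=8.0):
    """Hypothetical sketch of the documented rule; not lib5c's own code."""
    half = size // 2
    # Assumption: pad with NaN so windows near the edges only use real values.
    padded = np.pad(array.astype(float), half, mode="constant",
                    constant_values=np.nan)
    flags = np.zeros(array.shape, dtype=int)
    for i in range(array.shape[0]):
        for j in range(array.shape[1]):
            # size x size window centered on element (i, j)
            window = padded[i:i + size, j:j + size]
            local_median = np.nanmedian(window)
            value = array[i, j]
            # Literal reading of the parameter description ("or").
            if value > fold_threshold or value > fold_threshold * local_median:
                flags[i, j] = 1
    return flags


# A flat background of ones with a single spike in the middle.
demo = np.ones((7, 7))
demo[3, 3] = 100.0
flags = flag_high_outliers_sketch(demo)
```

In this toy input only the spike at position (3, 3) is flagged, since every other element equals its local median.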
-
lib5c.algorithms.outliers.remove_high_spatial_outliers(counts, size=5, fold_threshold=8.0, overwrite_value='nan', primermap=None, level='fragment')
Convenience function for removing high spatial outliers from counts matrices.
- Parameters
counts (np.ndarray) – The matrix to remove outliers from.
size (int) – The size of the window to look in around each element when deciding if it is an outlier. Should be an odd integer.
fold_threshold (float) – Elements will be flagged as outliers if they are greater than this number or greater than this many times the local median (as estimated using the window size in size).
overwrite_value ({'nan', 'zero', 'median'}) – The value to overwrite elements flagged as outliers with.
primermap (List[Dict[str, Any]]) – The list of fragments for this region corresponding to counts.
level ({'fragment', 'bin'}) – Whether to interpret counts as bin- or fragment-level. The difference is that bin-level matrices are assumed to have equal distance between elements.
- Returns
The input matrix with all spatial outliers overwritten.
- Return type
np.ndarray
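As a rough illustration of the three overwrite_value options, the sketch below overwrites flagged positions in a copy of a counts matrix. The helper overwrite_flagged_sketch and its local_medians argument are hypothetical, and treating 'median' as "replace with the local median" is an assumption. The sketch does not model how primermap or level affect the result.

```python
import numpy as np


def overwrite_flagged_sketch(counts, flags, overwrite_value="nan",
                             local_medians=None):
    """Hypothetical helper: overwrite flagged entries in a copy of counts."""
    result = counts.astype(float)  # astype returns a new array by default
    mask = flags.astype(bool)
    if overwrite_value == "nan":
        result[mask] = np.nan
    elif overwrite_value == "zero":
        result[mask] = 0.0
    elif overwrite_value == "median":
        # Assumption: 'median' means the local median at each flagged position.
        if local_medians is None:
            raise ValueError("local_medians is required for 'median'")
        result[mask] = local_medians[mask]
    else:
        raise ValueError(f"unknown overwrite_value: {overwrite_value!r}")
    return result


counts = np.array([[1.0, 2.0], [50.0, 1.0]])
flags = np.array([[0, 0], [1, 0]])  # 1 marks the outlier at row 1, column 0
as_nan = overwrite_flagged_sketch(counts, flags, "nan")
as_zero = overwrite_flagged_sketch(counts, flags, "zero")
as_median = overwrite_flagged_sketch(counts, flags, "median",
                                     local_medians=np.full((2, 2), 1.5))
```

The input array is left untouched here; whether the library function copies or modifies its input is not stated in this reference.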
-
lib5c.algorithms.outliers.remove_primer_primer_pairs(counts_superdict, primermap, threshold=5.0, num_reps=None, fraction_reps=None, all_reps=False, inplace=True)
Removes primer-primer pairs from a set of replicates according to criteria specified by the kwargs.
Legacy code inherited from https://bitbucket.org/creminslab/primer-primer-pair-remover
- Parameters
counts_superdict (Dict[str, Dict[str, np.ndarray]]) – The counts superdict data structure to remove primer-primer pairs from.
primermap (Dict[str, List[Dict[str, Any]]]) – The primermap or pixelmap describing the loci whose interaction frequencies are quantified in the counts_superdict.
threshold (float) – Sets the threshold. A rep passes the threshold if it is greater than or equal to this number.
num_reps (Optional[int]) – Pass an int to make the condition be that this many reps must clear the threshold.
fraction_reps (Optional[float]) – Pass a fraction (between 0 and 1) as a float to make the condition be that this fraction of the reps must clear the threshold.
all_reps (bool) – Pass True to make the condition be that the sum across all replicates must clear the threshold. This is the default mode if neither num_reps nor fraction_reps is passed.
inplace (bool) – Pass True to operate in-place on the passed counts_superdict; pass False to return a new counts superdict.
- Returns
The result of the primer-primer pair removal, in the form of a counts superdict data structure analogous to the counts_superdict passed to this function.
- Return type
Dict[str, Dict[str, np.ndarray]]
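The three replicate conditions described in the parameters can be sketched as follows for a single region. This is an illustrative reading of the parameter descriptions, not the library's code. The helper pairs_passing_sketch is hypothetical, and it takes a plain dict mapping replicate names to matrices rather than a full counts superdict. It only computes which pairs satisfy the condition and does not show what is done with pairs that fail it.

```python
import numpy as np


def pairs_passing_sketch(rep_matrices, threshold=5.0, num_reps=None,
                         fraction_reps=None):
    """Hypothetical helper: boolean mask of pairs meeting the condition."""
    stacked = np.stack(list(rep_matrices.values()))  # shape (n_reps, n, n)
    n_reps = stacked.shape[0]
    # Number of replicates in which each pair is >= threshold.
    passing_counts = (stacked >= threshold).sum(axis=0)
    if num_reps is not None:
        return passing_counts >= num_reps
    if fraction_reps is not None:
        return passing_counts >= fraction_reps * n_reps
    # Fallback: the sum across all replicates must clear the threshold.
    return stacked.sum(axis=0) >= threshold


reps = {
    "rep1": np.array([[6.0, 1.0], [1.0, 0.0]]),
    "rep2": np.array([[7.0, 5.0], [5.0, 0.0]]),
}
by_count = pairs_passing_sketch(reps, num_reps=2)
by_fraction = pairs_passing_sketch(reps, fraction_reps=0.5)
by_sum = pairs_passing_sketch(reps)
```

With these toy matrices, requiring both replicates to pass keeps only the (0, 0) pair, while the fraction and sum conditions also keep the off-diagonal pairs.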