lib5c.algorithms.outliers module

Module for identifying and removing high spatial outliers from 5C contact matrices.

lib5c.algorithms.outliers.flag_array_high_spatial_outliers(array, size=5, fold_threshold=8.0)

Identifies which elements of an array are high spatial outliers.

Parameters
  • array (np.ndarray) – The array to look for outliers in.

  • size (int) – The size of the window to look in around each element when deciding if it is an outlier. Should be an odd integer.

  • fold_threshold (float) – Elements will be flagged as outliers if they are greater than this number or greater than this many times the local median (estimated over the window whose width is given by size).

Returns

A matrix of the same size and shape as the input matrix, with 1’s at positions flagged as high spatial outliers and 0’s everywhere else.

Return type

np.ndarray
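
The local-median test described above can be illustrated with a minimal NumPy-only sketch. It is not lib5c's implementation: the function name flag_high_outliers_sketch is hypothetical, the input is assumed to be a 2-D array, edge windows are assumed to ignore out-of-bounds positions, and only the "greater than fold_threshold times the local median" criterion is modeled, so the library's exact rule may differ.

```python
import numpy as np


def flag_high_outliers_sketch(array, size=5, fold_threshold=8.0):
    """Illustrative only (hypothetical helper, not part of lib5c).

    Flags elements of a 2-D array that exceed fold_threshold times the
    median of the size x size window centered on them.
    """
    array = np.asarray(array, dtype=float)
    half = size // 2
    # Pad with NaN so that windows at the edges skip out-of-bounds cells
    # (an assumption about edge handling).
    padded = np.pad(array, half, mode="constant", constant_values=np.nan)
    flags = np.zeros(array.shape, dtype=int)
    for i in range(array.shape[0]):
        for j in range(array.shape[1]):
            window = padded[i:i + size, j:j + size]
            local_median = np.nanmedian(window)
            if array[i, j] > fold_threshold * local_median:
                flags[i, j] = 1
    return flags


# A flat background with a single spike in the middle.
demo = np.ones((7, 7))
demo[3, 3] = 100.0
demo_flags = flag_high_outliers_sketch(demo)
```

In this demo only the spiked element is flagged, because every window median stays at the background level. The explicit double loop keeps the window logic visible; a filter-based approach would be faster on large matrices.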

lib5c.algorithms.outliers.remove_high_spatial_outliers(counts, size=5, fold_threshold=8.0, overwrite_value='nan', primermap=None, level='fragment')

Convenience function for removing high spatial outliers from counts matrices.

Parameters
  • counts (np.ndarray) – The matrix to remove outliers from.

  • size (int) – The size of the window to look in around each element when deciding if it is an outlier. Should be an odd integer.

  • fold_threshold (float) – Elements will be flagged as outliers if they are greater than this number or greater than this many times the local median (estimated over the window whose width is given by size).

  • overwrite_value ({'nan', 'zero', 'median'}) – The value to overwrite elements flagged as outliers with.

  • primermap (List[Dict[str, Any]]) – The list of fragments for this region corresponding to counts.

  • level ({'fragment', 'bin'}) – Whether to interpret counts as bin- or fragment-level. The difference is that bin-level matrices are assumed to have equal distance between elements.

Returns

The input matrix with all high spatial outliers overwritten.

Return type

np.ndarray
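
The three overwrite_value options can be sketched as follows. This is a hedged illustration rather than the library's code: overwrite_flagged_sketch and its local_medians argument are hypothetical, 'median' is assumed to mean the local median at each flagged position, and the primermap and level parameters are not modeled.

```python
import numpy as np


def overwrite_flagged_sketch(counts, flags, overwrite_value="nan",
                             local_medians=None):
    """Illustrative only (hypothetical helper, not part of lib5c).

    Returns a copy of counts with flagged positions overwritten.
    """
    result = np.array(counts, dtype=float)  # copy; the input is untouched
    mask = np.asarray(flags).astype(bool)
    if overwrite_value == "nan":
        result[mask] = np.nan
    elif overwrite_value == "zero":
        result[mask] = 0.0
    elif overwrite_value == "median":
        # Assumption: the replacement is a precomputed local median.
        if local_medians is None:
            raise ValueError("local_medians is required for 'median'")
        result[mask] = np.asarray(local_medians, dtype=float)[mask]
    else:
        raise ValueError("unknown overwrite_value: %r" % (overwrite_value,))
    return result


counts = np.ones((3, 3))
counts[1, 1] = 50.0
flags = np.zeros((3, 3), dtype=int)
flags[1, 1] = 1

as_nan = overwrite_flagged_sketch(counts, flags, "nan")
as_zero = overwrite_flagged_sketch(counts, flags, "zero")
as_median = overwrite_flagged_sketch(counts, flags, "median",
                                     local_medians=np.ones((3, 3)))
```

Overwriting with NaN marks the position as missing, while 'zero' and 'median' substitute a usable value; which is appropriate depends on how later steps treat missing data.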

lib5c.algorithms.outliers.remove_primer_primer_pairs(counts_superdict, primermap, threshold=5.0, num_reps=None, fraction_reps=None, all_reps=False, inplace=True)

Removes primer-primer pairs from a set of replicates according to criteria specified by the kwargs.

Legacy code inherited from https://bitbucket.org/creminslab/primer-primer-pair-remover

Parameters
  • counts_superdict (Dict[str, Dict[str, np.ndarray]]) – The counts superdict data structure to remove primer-primer pairs from.

  • primermap (Dict[str, List[Dict[str, Any]]]) – The primermap or pixelmap describing the loci whose interaction frequencies are quantified in the counts_superdict.

  • threshold (float) – The threshold value. A rep passes the threshold if its value is greater than or equal to this number.

  • num_reps (Optional[int]) – Pass an int to make the condition be that this many reps must clear the threshold.

  • fraction_reps (Optional[float]) – Pass a fraction (between 0 and 1) as a float to make the condition be that this fraction of the reps must clear the threshold.

  • all_reps (bool) – Pass True to make the condition be that the sum across all replicates must clear the threshold. This is the default mode if neither num_reps nor fraction_reps is passed.

  • inplace (bool) – Pass True to operate in-place on the passed counts_superdict; pass False to return a new counts superdict.

Returns

The result of the primer-primer pair removal, in the form of a counts superdict data structure analogous to the counts_superdict passed to this function.

Return type

Dict[str, Dict[str, np.ndarray]]
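
The three threshold conditions can be compared side by side with a small sketch. It is an illustration under stated assumptions, not the legacy implementation: passing_pairs_sketch is a hypothetical name, it works on a plain list of replicate matrices for one region rather than on a counts superdict, it checks num_reps before fraction_reps (the real precedence is not documented here), it does not handle NaN values, and it only reports which pairs pass without modeling how failing pairs are removed.

```python
import numpy as np


def passing_pairs_sketch(rep_matrices, threshold=5.0, num_reps=None,
                         fraction_reps=None):
    """Illustrative only (hypothetical helper, not part of lib5c).

    Returns a boolean matrix that is True where a primer-primer pair
    satisfies the selected condition.
    """
    stack = np.stack([np.asarray(m, dtype=float) for m in rep_matrices])
    clears = stack >= threshold  # per-replicate pass/fail
    if num_reps is not None:
        # At least num_reps replicates must clear the threshold.
        return clears.sum(axis=0) >= num_reps
    if fraction_reps is not None:
        # At least this fraction of replicates must clear the threshold.
        return clears.mean(axis=0) >= fraction_reps
    # Default mode: the sum across all replicates must clear the threshold.
    return stack.sum(axis=0) >= threshold


rep_a = np.array([[6.0, 1.0], [1.0, 6.0]])
rep_b = np.array([[2.0, 1.0], [1.0, 6.0]])

by_count = passing_pairs_sketch([rep_a, rep_b], num_reps=2)
by_fraction = passing_pairs_sketch([rep_a, rep_b], fraction_reps=0.5)
by_sum = passing_pairs_sketch([rep_a, rep_b])
```

With these two replicates, the pair at position (0, 0) fails the num_reps=2 condition (only one replicate reaches 5) but passes both the fraction_reps=0.5 condition and the summed condition, which shows how the choice of mode changes the stringency.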