lib5c.algorithms.outliers module

Module for identifying and removing high spatial outliers from 5C contact matrices.

lib5c.algorithms.outliers.flag_array_high_spatial_outliers(array, size=5, fold_threshold=8.0)

Identifies which elements of an array are high spatial outliers.

  • array (np.ndarray) – The array to look for outliers in.
  • size (int) – The size of the window to look in around each element when deciding if it is an outlier. Should be an odd integer.
  • fold_threshold (float) – Elements will be flagged as outliers if they are greater than this number or greater than this many times the local median (estimated over the window whose width is set by size).

A matrix of the same size and shape as the input matrix, with 1’s at positions flagged as high spatial outliers and 0’s everywhere else.
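A minimal sketch of the local-median fold test described by size and fold_threshold, written with plain NumPy. The helper name flag_fold_over_local_median and the NaN padding at the edges are assumptions made for illustration, not lib5c's implementation, and the sketch applies only the comparison against fold_threshold times the local median.

```python
import numpy as np


def flag_fold_over_local_median(array, size=5, fold_threshold=8.0):
    """Hypothetical helper: flag elements above fold_threshold times the local median."""
    array = np.asarray(array, dtype=float)
    half = size // 2
    # Assumption: pad with NaN so windows at the edges just hold fewer valid values.
    padded = np.pad(array, half, mode="constant", constant_values=np.nan)
    flags = np.zeros(array.shape, dtype=int)
    for i in range(array.shape[0]):
        for j in range(array.shape[1]):
            # size x size window centred on element (i, j)
            window = padded[i:i + size, j:j + size]
            local_median = np.nanmedian(window)
            if array[i, j] > fold_threshold * local_median:
                flags[i, j] = 1
    return flags


counts = np.ones((7, 7))
counts[3, 3] = 100.0  # a single spike in an otherwise flat matrix
flags = flag_fold_over_local_median(counts)
```

In this toy matrix every window median is 1, so only the spike at position (3, 3) is flagged.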

Return type:

np.ndarray

lib5c.algorithms.outliers.remove_high_spatial_outliers(counts, size=5, fold_threshold=8.0, overwrite_value='nan', primermap=None, level='fragment')

Convenience function for removing high spatial outliers from counts matrices.

  • counts (np.ndarray) – The matrix to remove outliers from.
  • size (int) – The size of the window to look in around each element when deciding if it is an outlier. Should be an odd integer.
  • fold_threshold (float) – Elements will be flagged as outliers if they are greater than this number or greater than this many times the local median (estimated over the window whose width is set by size).
  • overwrite_value ({'nan', 'zero', 'median'}) – The value to overwrite elements flagged as outliers with.
  • primermap (List[Dict[str, Any]]) – The list of fragments for this region corresponding to counts.
  • level ({'fragment', 'bin'}) – Whether to interpret counts as bin- or fragment-level. The difference is that bin-level matrices are assumed to have equal distance between elements.

The input matrix with all spatial outliers overwritten.
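A rough illustration of the overwrite step, assuming a flag matrix like the one returned by flag_array_high_spatial_outliers is already in hand. The helper overwrite_flagged is hypothetical and covers only the 'nan' and 'zero' options; the 'median' option, primermap, and level are left out because their exact behavior is not described here.

```python
import numpy as np


def overwrite_flagged(counts, flags, overwrite_value="nan"):
    """Hypothetical helper: return a copy of counts with flagged entries overwritten."""
    fill = {"nan": np.nan, "zero": 0.0}[overwrite_value]
    result = np.array(counts, dtype=float)  # copy so the input is left untouched
    result[np.asarray(flags, dtype=bool)] = fill
    return result


counts = np.array([[1.0, 2.0, 1.0],
                   [2.0, 90.0, 2.0],
                   [1.0, 2.0, 1.0]])
flags = np.zeros((3, 3), dtype=int)
flags[1, 1] = 1  # pretend the centre element was flagged as an outlier

as_nan = overwrite_flagged(counts, flags, "nan")
as_zero = overwrite_flagged(counts, flags, "zero")
```

Overwriting with NaN keeps the removed positions distinguishable from true zero counts, at the cost of needing NaN-aware functions downstream.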

Return type:

np.ndarray

lib5c.algorithms.outliers.remove_primer_primer_pairs(counts_superdict, primermap, threshold=5.0, num_reps=None, fraction_reps=None, all_reps=False, inplace=True)

Removes primer-primer pairs from a set of replicates according to criteria specified by the kwargs.

Legacy code inherited from

  • counts_superdict (Dict[str, Dict[str, np.ndarray]]) – The counts superdict data structure to remove primer-primer pairs from.
  • primermap (Dict[str, List[Dict[str, Any]]]) – The primermap or pixelmap describing the loci whose interaction frequencies are quantified in the counts_superdict.
  • threshold (float) – The count threshold. A rep passes the threshold if its value for a primer-primer pair is greater than or equal to this number.
  • num_reps (Optional[int]) – Pass an int to make the condition be that this many reps must clear the threshold.
  • fraction_reps (Optional[float]) – Pass a fraction (between 0 and 1) as a float to make the condition be that this fraction of the reps must clear the threshold.
  • all_reps (bool) – Pass True to make the condition be that the sum across all replicates must clear the threshold. This is the default mode if neither num_reps nor fraction_reps is passed.
  • inplace (bool) – Pass True to operate in-place on the passed counts_superdict; pass False to return a new counts superdict.

The result of the primer-primer pair removal, in the form of a counts superdict data structure analogous to the counts_superdict passed to this function.

Return type:

Dict[str, Dict[str, np.ndarray]]
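
The three conditions can be sketched for a single region as a boolean mask over primer-primer pairs. The helper passing_pairs is hypothetical; it assumes the replicate matrices share a shape, that "this many" and "this fraction" mean "at least", and that num_reps takes precedence over fraction_reps when both are given. What lib5c writes into the removed positions is not shown.

```python
import numpy as np


def passing_pairs(rep_matrices, threshold=5.0, num_reps=None, fraction_reps=None):
    """Hypothetical helper: boolean mask of primer-primer pairs meeting the condition."""
    stack = np.stack(rep_matrices)  # shape: (n_reps, n, n)
    # Number of replicates clearing the threshold at each primer-primer pair.
    n_passing = (stack >= threshold).sum(axis=0)
    if num_reps is not None:
        return n_passing >= num_reps
    if fraction_reps is not None:
        return n_passing >= fraction_reps * stack.shape[0]
    # Fallback (the all_reps mode): the sum across replicates must clear the threshold.
    return stack.sum(axis=0) >= threshold


rep1 = np.array([[6.0, 1.0], [1.0, 6.0]])
rep2 = np.array([[7.0, 2.0], [5.0, 0.0]])
reps = [rep1, rep2]

two_reps = passing_pairs(reps, num_reps=2)
half_reps = passing_pairs(reps, fraction_reps=0.5)
summed = passing_pairs(reps)
```

Pairs where the mask is False are the ones that would be removed under the chosen condition.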