lib5c.algorithms.thresholding module

lib5c.algorithms.thresholding.color_confusion(d)[source]

Extract the across-condition color confusion matrix.

Parameters

d (Dataset) – Dataset processed by two_way_thresholding().

Returns

The 2x2 confusion matrix.

Return type

np.ndarray

lib5c.algorithms.thresholding.concordance_confusion(d)[source]

Extract the within-condition concordance confusion matrices.

Parameters

d (Dataset) – Dataset processed by two_way_thresholding().

Returns

The keys are condition names, the values are the 2x2 confusion matrices.

Return type

dict

lib5c.algorithms.thresholding.count_clusters(d)[source]

Extract the final cluster counts.

Parameters

d (Dataset) – Dataset processed by two_way_thresholding() called with report_clusters=True.

Returns

The keys are the color names as strings, the values are integers representing the cluster counts.

Return type

dict

lib5c.algorithms.thresholding.filter_near_diagonal(df, distance=24000, drop=True)[source]

Drops rows from df whose ‘distance’ column is less than distance.

With the default drop=True, dropping occurs in-place.

Parameters
  • df (pd.DataFrame) – Must have a ‘distance’ column.

  • distance (int) – Threshold for distance (in bp).

  • drop (bool) – Pass True to drop the filtered rows in-place. Pass False to return an index subset for the filtered rows instead.
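The filtering rule can be sketched with plain pandas. This is an illustration rather than the library's code: filter_near_diagonal_sketch is a hypothetical name, and returning the index of the rows that survive the filter when drop=False is an assumption, since the description above does not say which rows the returned index covers.

```python
import pandas as pd


def filter_near_diagonal_sketch(df, distance=24000, drop=True):
    """Hypothetical stand-in for filter_near_diagonal(), for illustration only."""
    # Rows whose 'distance' value is strictly below the threshold are "near diagonal".
    near = df['distance'] < distance
    if drop:
        # Mutate the caller's DataFrame, mirroring the documented in-place behavior.
        df.drop(index=df.index[near], inplace=True)
        return None
    # Assumption: with drop=False, hand back the index of the rows that are kept.
    return df.index[~near]


example = pd.DataFrame({'distance': [1000, 24000, 50000]})
kept_index = filter_near_diagonal_sketch(example, drop=False)  # example is unchanged
filter_near_diagonal_sketch(example)                           # example is now filtered
```

Under this assumption, example.loc[kept_index] and the in-place call select the same rows.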

lib5c.algorithms.thresholding.kappa(d)[source]

Compute Cohen’s kappa between the replicates of each condition.

Parameters

d (Dataset) – Dataset processed by two_way_thresholding().

Returns

The keys are condition names, the values are the kappa values.

Return type

dict
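For reference, Cohen's kappa for a single 2x2 confusion matrix can be computed as below. This is a minimal sketch, assuming the matrix holds counts of agreement and disagreement between two replicates. cohens_kappa_sketch is a hypothetical helper, and how the library treats conditions with more than two replicates is not covered here.

```python
import numpy as np


def cohens_kappa_sketch(confusion):
    """Cohen's kappa from a square confusion matrix of counts (illustrative helper)."""
    m = np.asarray(confusion, dtype=float)
    total = m.sum()
    # Observed agreement: fraction of entries on the diagonal.
    p_observed = np.trace(m) / total
    # Expected agreement by chance: product of the matching row and column marginals.
    p_expected = (m.sum(axis=1) * m.sum(axis=0)).sum() / total ** 2
    if p_expected == 1.0:
        # Degenerate case (everything in one class); kappa is undefined.
        return float('nan')
    return (p_observed - p_expected) / (1.0 - p_expected)


kappa_value = cohens_kappa_sketch([[20, 5], [10, 15]])
```

A value of 1 indicates perfect agreement, and 0 indicates agreement no better than chance.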

lib5c.algorithms.thresholding.label_connected_components(colors, color)[source]

Labels the connected components of a specific loop color.

Parameters
  • colors (np.ndarray with string dtype) – The matrix of colors.

  • color (str) – The color to label.

Returns

Same size and shape as colors; entries are integer labels of the connected components, with 0 marking entries that do not have the requested color.

Return type

np.ndarray

Examples

>>> colors = np.array([['a', 'a', 'b', 'a'],
...                    ['a', 'a', 'b', 'b'],
...                    ['b', 'b', 'b', 'a'],
...                    ['a', 'b', 'a', 'a']])
>>> print(label_connected_components(colors, 'a'))
[[1 1 0 2]
 [1 1 0 0]
 [0 0 0 3]
 [2 0 3 3]]
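The labels in this example are consistent with labelling only the upper triangle of the matrix and mirroring the result onto the lower triangle. That reading is inferred from the example output, not stated in the documentation, so the following is a sketch under that assumption. It uses scipy.ndimage.label with its default 4-connectivity, and label_upper_triangle_sketch is a hypothetical name.

```python
import numpy as np
from scipy import ndimage


def label_upper_triangle_sketch(colors, color):
    """Label connected components of one color (illustrative, assumes symmetry)."""
    # Keep only upper-triangle entries (diagonal included) that match the color.
    mask = np.triu(colors == color)
    # Default structuring element: neighbors share an edge (4-connectivity).
    labels, _ = ndimage.label(mask)
    # Mirror the strictly-upper labels onto the lower triangle.
    return np.triu(labels) + np.triu(labels, k=1).T


colors = np.array([['a', 'a', 'b', 'a'],
                   ['a', 'a', 'b', 'b'],
                   ['b', 'b', 'b', 'a'],
                   ['a', 'b', 'a', 'a']])
labels = label_upper_triangle_sketch(colors, 'a')
```

Labelling the full matrix directly would instead give the isolated entry in the bottom-left corner its own label.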

lib5c.algorithms.thresholding.size_filter(calls, threshold)[source]

Removes calls which are in connected components smaller than a threshold.

Parameters
  • calls (np.ndarray) – Boolean matrix of calls.

  • threshold (int) – Connected components smaller than this will be removed.

Returns

The filtered calls.

Return type

np.ndarray

Examples

>>> calls = np.array([[ True,  True, False,  True],
...                   [ True,  True, False, False],
...                   [False, False, False,  True],
...                   [ True, False,  True,  True]])
>>> size_filter(calls, 3)
array([[ True,  True, False, False],
       [ True,  True, False, False],
       [False, False, False, False],
       [False, False, False, False]])
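One way to arrive at this output is to measure component sizes on the upper triangle only and then mirror the surviving calls. This is inferred from the example, not stated in the documentation, so treat the following as a sketch under that assumption. size_filter_sketch is a hypothetical name, and the labelling uses scipy.ndimage.label with its default 4-connectivity.

```python
import numpy as np
from scipy import ndimage


def size_filter_sketch(calls, threshold):
    """Drop calls in small connected components (illustrative, assumes symmetry)."""
    # Work on the upper triangle (diagonal included) so each call is counted once.
    upper = np.triu(calls)
    labels, _ = ndimage.label(upper)
    # sizes[k] is the number of entries carrying label k; label 0 is background.
    sizes = np.bincount(labels.ravel())
    keep = sizes >= threshold
    keep[0] = False
    # Look up, for every entry, whether its component is large enough.
    kept_upper = keep[labels]
    # Mirror the surviving calls onto the lower triangle.
    return kept_upper | kept_upper.T


calls = np.array([[True, True, False, True],
                  [True, True, False, False],
                  [False, False, False, True],
                  [True, False, True, True]])
filtered = size_filter_sketch(calls, 3)
```

On the upper triangle the three components have sizes 3, 1, and 2, so only the first survives a threshold of 3.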

lib5c.algorithms.thresholding.two_way_thresholding(pvalues_superdict, primermap, conditions=None, significance_threshold=1e-15, bh_fdr=False, two_tail=False, concordant=False, distance_threshold=24000, size_threshold=3, background_threshold=0.6, report_clusters=True)[source]

All-in-one heavy-lifting function for thresholding.

Parameters
  • pvalues_superdict (dict of dict of np.ndarray) – The p-values to threshold.

  • primermap (primermap) – The primermap associated with the pvalues_superdict.

  • conditions (list of str, optional) – The list of condition names. Pass None to skip condition comparisons.

  • significance_threshold (float) – The p-value or q-value to threshold significance with.

  • bh_fdr (bool) – Pass True to apply multiple testing correction (BH-FDR) before checking the significance_threshold.

  • two_tail (bool) – If bh_fdr=True, pass True here to perform the BH-FDR on two-tailed p-values, but only report the significant right-tail events as loops. Note that two-tailed p-values are only accurate if p-values were called using a continuous distribution.

  • concordant (bool) – Pass True to report only those interactions which are significant in all replicates in each condition. Pass False to combine evidence from all replicates within each condition instead.

  • distance_threshold (int) – Interactions with interaction distance (in bp) shorter than this will not be called.

  • size_threshold (int) – Interactions within connected components smaller than this will not be called.

  • background_threshold (float, optional) – The p-value threshold to use to call a background loop class. Pass None to skip calling a background class.

  • report_clusters (bool) – Pass True to perform a second pass of connected component counting at the very end, reporting the numbers of clusters in each color category to the returned Dataset.

Returns

The results of the thresholding.

Return type

Dataset
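When bh_fdr=True, p-values are corrected before being compared to significance_threshold. The standard Benjamini-Hochberg adjustment can be sketched as follows. bh_qvalues_sketch is a hypothetical helper operating on a flat array of p-values; the library's own handling (for example, which entries are pooled into one correction) may differ.

```python
import numpy as np


def bh_qvalues_sketch(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values) for a 1-D array."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    # Sort ascending and scale the i-th smallest p-value by n / i.
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: each q-value is the minimum of the scaled values
    # at its own rank and every larger rank.
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    # Undo the sort and cap at 1.
    q = np.empty(n)
    q[order] = np.clip(q_sorted, 0.0, 1.0)
    return q


qvalues = bh_qvalues_sketch([0.01, 0.04, 0.03, 0.005])
significant = qvalues <= 0.03  # compare against a chosen q-value threshold
```

The resulting q-values would then play the role of the p-values when the significance threshold is applied.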