lib5c.algorithms.thresholding module
-
lib5c.algorithms.thresholding.color_confusion(d)
Extract the across-condition color confusion matrix.
- Parameters
d (Dataset) – Dataset processed by two_way_thresholding().
- Returns
The 2x2 confusion matrix.
- Return type
np.ndarray
-
lib5c.algorithms.thresholding.concordance_confusion(d)
Extract the within-condition concordance confusion matrices.
- Parameters
d (Dataset) – Dataset processed by two_way_thresholding().
- Returns
The keys are condition names, the values are the 2x2 confusion matrices.
- Return type
dict
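As an illustration of what a 2x2 concordance matrix between two replicates can look like, here is a minimal NumPy sketch. It does not use lib5c: `confusion_2x2` is a hypothetical helper, and the row/column layout (called vs. not called) is an assumption, since the layout lib5c uses is not stated here.

```python
import numpy as np


def confusion_2x2(calls_a, calls_b):
    """Cross-tabulate two boolean call arrays of identical shape."""
    a = np.asarray(calls_a, dtype=bool).ravel()
    b = np.asarray(calls_b, dtype=bool).ravel()
    # Assumed layout: rows index replicate A (called, not called),
    # columns index replicate B (called, not called).
    return np.array([[np.sum(a & b), np.sum(a & ~b)],
                     [np.sum(~a & b), np.sum(~a & ~b)]])


# Toy calls for two replicates of one condition (made-up values).
rep1 = [True, True, False, False, True, False, True, False]
rep2 = [True, True, False, False, False, False, True, True]
matrix = confusion_2x2(rep1, rep2)
print(matrix)
```

The diagonal counts the interactions on which the two replicates agree; the off-diagonal entries count the disagreements.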
-
lib5c.algorithms.thresholding.count_clusters(d)
Extract the final cluster counts.
- Parameters
d (Dataset) – Dataset processed by two_way_thresholding() called with report_clusters=True.
- Returns
The keys are the color names as strings, the values are integers representing the cluster counts.
- Return type
dict
-
lib5c.algorithms.thresholding.filter_near_diagonal(df, distance=24000, drop=True)
Drops rows from df whose ‘distance’ column is less than distance.
By default (drop=True), dropping occurs in-place.
- Parameters
df (pd.DataFrame) – Must have a ‘distance’ column.
distance (int) – Threshold for distance (in bp).
drop (bool) – Pass True to drop the filtered rows in-place. Pass False to return an index subset for the filtered rows instead.
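The in-place dropping described above can be sketched with plain pandas. This is not lib5c's implementation: `drop_short_range` is a hypothetical stand-in, the toy table is made up, and returning the index of the removed rows is an illustrative choice rather than documented behavior.

```python
import pandas as pd


def drop_short_range(df, distance=24000):
    """Drop rows whose 'distance' is strictly below the threshold, in place."""
    too_close = df.index[df["distance"] < distance]
    df.drop(too_close, inplace=True)
    # Returning the removed index is a choice made for this sketch only.
    return too_close


# Toy interaction table; distances are in bp.
table = pd.DataFrame({
    "distance": [8000, 24000, 100000],
    "pvalue": [1e-20, 1e-18, 1e-3],
})
removed = drop_short_range(table)
print(table)
```

Because the comparison is strict, a row whose distance equals the threshold exactly is kept.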
-
lib5c.algorithms.thresholding.kappa(d)
Compute the Cohen’s kappa values between the replicates of each condition.
- Parameters
d (Dataset) – Dataset processed by two_way_thresholding().
- Returns
The keys are condition names, the values are the kappa values.
- Return type
dict
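For reference, Cohen's kappa for two raters can be computed from a 2x2 confusion matrix as (observed agreement - chance agreement) / (1 - chance agreement). The sketch below shows that arithmetic with NumPy. `kappa_from_confusion` is a hypothetical helper, not part of lib5c, and the input matrix is a made-up example.

```python
import numpy as np


def kappa_from_confusion(matrix):
    """Cohen's kappa from a square confusion matrix of counts."""
    m = np.asarray(matrix, dtype=float)
    total = m.sum()
    # Observed agreement: fraction of items on the diagonal.
    p_observed = np.trace(m) / total
    # Chance agreement: product of the marginal frequencies, summed over classes.
    p_expected = np.sum(m.sum(axis=1) * m.sum(axis=0)) / total ** 2
    # Undefined (division by zero) when chance agreement is exactly 1.
    return (p_observed - p_expected) / (1.0 - p_expected)


# 6 of 8 calls agree; each replicate calls half of the interactions.
value = kappa_from_confusion([[3, 1], [1, 3]])
print(value)  # 0.5
```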
-
lib5c.algorithms.thresholding.label_connected_components(colors, color)
Labels the connected components of a specific loop color.
- Parameters
colors (np.ndarray with string dtype) – The matrix of colors.
color (str) – The color to label.
- Returns
Same size and shape as colors; the entries are integer component labels.
- Return type
np.ndarray
Examples
>>> colors = np.array([['a', 'a', 'b', 'a'],
...                    ['a', 'a', 'b', 'b'],
...                    ['b', 'b', 'b', 'a'],
...                    ['a', 'b', 'a', 'a']])
>>> print(label_connected_components(colors, 'a'))
[[1 1 0 2]
 [1 1 0 0]
 [0 0 0 3]
 [2 0 3 3]]
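In the example output, entries (0, 3) and (3, 0) share label 2 even though they are not adjacent. That is consistent with labeling on the upper triangle of a symmetric matrix and mirroring the result. The sketch below reproduces the example under that assumption, using 4-connectivity and a breadth-first flood fill. It is an inference from the doctest, not lib5c's implementation, and `label_upper_triangle` is a hypothetical name.

```python
from collections import deque

import numpy as np


def label_upper_triangle(mask):
    """Label 4-connected components on the upper triangle, then mirror."""
    n = mask.shape[0]
    labels = np.zeros(mask.shape, dtype=int)
    next_label = 0
    for i in range(n):
        for j in range(i, n):  # visit upper-triangle cells in row-major order
            if not mask[i, j] or labels[i, j]:
                continue
            next_label += 1
            labels[i, j] = next_label
            queue = deque([(i, j)])
            while queue:
                r, c = queue.popleft()
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    # Stay inside the matrix and on or above the diagonal.
                    if 0 <= rr <= cc < n and mask[rr, cc] and not labels[rr, cc]:
                        labels[rr, cc] = next_label
                        queue.append((rr, cc))
    # The lower triangle is still zero, so an elementwise max mirrors the labels.
    return np.maximum(labels, labels.T)


colors = np.array([['a', 'a', 'b', 'a'],
                   ['a', 'a', 'b', 'b'],
                   ['b', 'b', 'b', 'a'],
                   ['a', 'b', 'a', 'a']])
labels = label_upper_triangle(colors == 'a')
print(labels)
```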
-
lib5c.algorithms.thresholding.size_filter(calls, threshold)
Removes calls which are in connected components smaller than a threshold.
- Parameters
calls (np.ndarray) – Boolean matrix of calls.
threshold (int) – Connected components smaller than this will be removed.
- Returns
The filtered calls.
- Return type
np.ndarray
Examples
>>> calls = np.array([[ True,  True, False,  True],
...                   [ True,  True, False, False],
...                   [False, False, False,  True],
...                   [ True, False,  True,  True]])
>>> size_filter(calls, 3)
array([[ True,  True, False, False],
       [ True,  True, False, False],
       [False, False, False, False],
       [False, False, False, False]])
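The example output is consistent with component sizes being counted on the upper triangle only, so that each symmetric pair counts once. Counted over the full matrix, the lower-right component would have three entries and would survive a threshold of 3. The sketch below applies that assumed rule to the label matrix from the label_connected_components example. `filter_by_component_size` is a hypothetical helper, not lib5c code.

```python
import numpy as np


def filter_by_component_size(labels, threshold):
    """Keep entries whose component has at least `threshold` upper-triangle cells."""
    upper = np.triu(labels)             # count each symmetric pair once
    sizes = np.bincount(upper.ravel())  # sizes[k] = upper-triangle cells labeled k
    keep = sizes >= threshold
    keep[0] = False                     # label 0 is background, never a call
    return keep[labels]                 # look up the decision for every entry


# Label matrix taken from the label_connected_components example.
labels = np.array([[1, 1, 0, 2],
                   [1, 1, 0, 0],
                   [0, 0, 0, 3],
                   [2, 0, 3, 3]])
filtered = filter_by_component_size(labels, 3)
print(filtered)
```

Here the components have upper-triangle sizes 3, 1, and 2, so only the first one passes a threshold of 3.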
-
lib5c.algorithms.thresholding.two_way_thresholding(pvalues_superdict, primermap, conditions=None, significance_threshold=1e-15, bh_fdr=False, two_tail=False, concordant=False, distance_threshold=24000, size_threshold=3, background_threshold=0.6, report_clusters=True)
All-in-one heavy-lifting function for thresholding.
- Parameters
pvalues_superdict (dict of dict of np.ndarray) – The p-values to threshold.
primermap (primermap) – The primermap associated with the pvalues_superdict.
conditions (list of str, optional) – The list of condition names. Pass None to skip condition comparisons.
significance_threshold (float) – The p-value or q-value to threshold significance with.
bh_fdr (bool) – Pass True to apply multiple testing correction (BH-FDR) before checking the significance_threshold.
two_tail (bool) – If bh_fdr=True, pass True here to perform the BH-FDR on two-tailed p-values, but only report the significant right-tail events as loops. Note that two-tailed p-values are only accurate if p-values were called using a continuous distribution.
concordant (bool) – Pass True to report only those interactions which are significant in all replicates in each condition. Pass False to combine evidence from all replicates within each condition instead.
distance_threshold (int) – Interactions with interaction distance (in bp) shorter than this will not be called.
size_threshold (int) – Interactions within connected components smaller than this will not be called.
background_threshold (float, optional) – The p-value threshold to use to call a background loop class. Pass None to skip calling a background class.
report_clusters (bool) – Pass True to perform a second pass of connected component counting at the very end, reporting the numbers of clusters in each color category to the returned Dataset.
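To make the bh_fdr option more concrete, the following is a minimal sketch of the standard Benjamini-Hochberg adjustment, which converts p-values to q-values before they are compared against a threshold. It is a generic NumPy version for a flat array of p-values, not lib5c's implementation. `bh_qvalues`, the toy p-values, and the 0.05 threshold are illustrative assumptions.

```python
import numpy as np


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values) for a 1D array."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale the k-th smallest p-value by n / k.
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: each q-value is the minimum over all larger ranks.
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(q_sorted, 1.0)
    return q


pvals = [0.01, 0.04, 0.03, 0.5]
qvals = bh_qvalues(pvals)
significant = qvals <= 0.05  # illustrative threshold, not the library default
print(qvals, significant)
```

In this toy case only the first p-value remains significant after correction, although three of the four raw p-values are below 0.05.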
- Returns
The results of the thresholding.
- Return type