lib5c.algorithms.clustering.valley module¶
Module for splitting clusters using a valley heuristic.
-
lib5c.algorithms.clustering.valley.
split_cluster
(parent_cluster, guides, reassign=1.0)[source]¶ Splits a cluster using the valley heuristic.
- Parameters
parent_cluster (cluster) – The cluster to split.
guides (list of clusters) – The guides used to seed the child clusters. See
lib5c.algorithms.clustering.valley.split_clusters()
.reassign (float between 0.0 and 1.0) – When a cluster is split, the peaks in the parent cluster with pvalues above this threshold may be discarded instead of being assigned to one or the other of the child clusters. When the value of this kwarg is 1.0, no peaks are discarded and all peaks are reassigned to one or the other of the child clusters. When the value of this kwarg is 0.0, all peaks are discarded and no peaks are reassigned to the child clusters.
- Returns
The child clusters of the parent cluster.
- Return type
list of clusters
Notes
See
lib5c.algorithms.clustering.valley.split_clusters()
.
-
lib5c.algorithms.clustering.valley.
split_clusters
(clusters, reassign=1.0, size_threshold=3)[source]¶ Splits all clusters in a list of clusters using a recursive valley-splitting heuristic.
- Parameters
clusters (list of clusters) – The list of clusters to split.
reassign (float between 0.0 and 1.0) – When a cluster is split, the peaks in the parent cluster with pvalues above this threshold may be discarded instead of being assigned to one or the other of the child clusters. When the value of this kwarg is 1.0, no peaks are discarded and all peaks are reassigned to one or the other of the child clusters. When the value of this kwarg is 0.0, all peaks are discarded and no peaks are reassigned to the child clusters.
size_threshold (int) – The minimum size of child clusters that will be created by the splitting. If a splitting operation would result in clusters smaller than this number, that splitting operation will not be performed.
- Returns
The split clusters.
- Return type
list of clusters
Notes
A cluster will get split if there exists a p-value such that thresholding the cluster at that p-value results in the creation of at least two contiguous groups of high-confidence peaks (p-value below the threshold) larger than the
size_threshold
separated by a “valley” of low-confidence peaks (p-value above the threshold). This rule is applied recursively until only unsplittable, or “atomic”, clusters remain.