lib5c.algorithms.clustering.valley module

Module for splitting clusters using a valley heuristic.

lib5c.algorithms.clustering.valley.split_cluster(parent_cluster, guides, reassign=1.0)

Splits a cluster using the valley heuristic.

Parameters
  • parent_cluster (cluster) – The cluster to split.

  • guides (list of clusters) – The guides used to seed the child clusters. See lib5c.algorithms.clustering.valley.split_clusters().

  • reassign (float between 0.0 and 1.0) – When a cluster is split, peaks in the parent cluster with p-values above this threshold may be discarded instead of being assigned to one of the child clusters. When this kwarg is 1.0, no peaks are discarded and every peak is reassigned to one of the child clusters. When it is 0.0, all peaks are discarded and none are reassigned to the child clusters.

Returns

The child clusters of the parent cluster.

Return type

list of clusters

Notes

See lib5c.algorithms.clustering.valley.split_clusters().
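
The effect of the reassign threshold described under Parameters can be sketched in isolation. This is a minimal illustration, not lib5c's implementation: the function name partition_by_reassign and the representation of a peak as a dict with a 'pvalue' key are hypothetical, and the handling of a peak whose p-value equals the threshold exactly is an assumption.

```python
def partition_by_reassign(peaks, reassign=1.0):
    """Partition peaks into (kept, discarded) by comparing p-values to `reassign`.

    `peaks` is a hypothetical list of dicts with a 'pvalue' key; it is not
    lib5c's actual peak or cluster representation.
    """
    # Peaks at or below the threshold remain candidates for reassignment
    # to a child cluster (boundary handling is an assumption).
    kept = [p for p in peaks if p['pvalue'] <= reassign]
    # Peaks strictly above the threshold are the ones that may be discarded.
    discarded = [p for p in peaks if p['pvalue'] > reassign]
    return kept, discarded
```

With reassign=1.0 every peak lands in the kept list, and with reassign=0.0 every peak with a nonzero p-value lands in the discarded list, matching the two limiting cases described for this kwarg.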

lib5c.algorithms.clustering.valley.split_clusters(clusters, reassign=1.0, size_threshold=3)

Splits all clusters in a list of clusters using a recursive valley-splitting heuristic.

Parameters
  • clusters (list of clusters) – The list of clusters to split.

  • reassign (float between 0.0 and 1.0) – When a cluster is split, peaks in the parent cluster with p-values above this threshold may be discarded instead of being assigned to one of the child clusters. When this kwarg is 1.0, no peaks are discarded and every peak is reassigned to one of the child clusters. When it is 0.0, all peaks are discarded and none are reassigned to the child clusters.

  • size_threshold (int) – The minimum size of child clusters that will be created by the splitting. If a splitting operation would result in clusters smaller than this number, that splitting operation will not be performed.

Returns

The split clusters.

Return type

list of clusters

Notes

A cluster is split if there exists a p-value threshold such that thresholding the cluster at that value yields at least two contiguous groups of high-confidence peaks (p-value below the threshold), each larger than the size_threshold, separated by a “valley” of low-confidence peaks (p-value above the threshold). This rule is applied recursively until only unsplittable, or “atomic”, clusters remain.
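
The splitting test can be illustrated on a simplified one-dimensional profile. The sketch below assumes the peaks of a cluster are laid out along a single axis as a list of p-values, which is a simplification of real clusters. The names contiguous_runs and find_valley_split are hypothetical and are not part of lib5c. It also treats size_threshold as a minimum group size, following the parameter description. It covers only the single-split test, not the recursion, the guides, or the reassignment of peaks.

```python
def contiguous_runs(mask):
    """Return half-open (start, end) index ranges of consecutive True values."""
    runs, start = [], None
    for i, flag in enumerate(mask):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(mask)))
    return runs


def find_valley_split(pvalues, size_threshold=3):
    """Look for a threshold that separates two or more large groups.

    Returns (threshold, groups) for the lowest candidate threshold that
    works, or None if the profile is "atomic" under this simplified test.
    """
    distinct = sorted(set(pvalues))
    # Candidate thresholds sit midway between consecutive distinct p-values,
    # so no peak lies exactly on a threshold.
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2.0
        # High-confidence peaks are those with p-value below the threshold.
        mask = [p < threshold for p in pvalues]
        groups = [r for r in contiguous_runs(mask)
                  if r[1] - r[0] >= size_threshold]
        # Two or more large groups imply a low-confidence valley between them.
        if len(groups) >= 2:
            return threshold, groups
    return None
```

For the profile [0.01, 0.02, 0.01, 0.5, 0.6, 0.02, 0.01, 0.03], the two peaks with p-values 0.5 and 0.6 form the valley, and the function reports the groups at indices 0 to 2 and 5 to 7. A monotonically increasing profile yields None, since thresholding it always produces a single contiguous group.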