lib5c.algorithms.clustering.knn module
Module for assembling clusters using an unassisted k-nearest neighbors heuristic.
lib5c.algorithms.clustering.knn.classify_peak(peak, neighbors, clusters, peak_to_clusters, automove=True, weighted=False)
Assigns the most fitting cluster to a peak given a list of its neighbors.
- Parameters
peak (peak) – The peak to classify.
neighbors (list of peaks) – The peaks that should determine the query peak’s classification.
clusters (list of clusters) – A list of clusters to classify the query peak into.
peak_to_clusters (dict mapping (int, int) tuples to int) – The keys are (x, y) tuples that represent peak locations. Each value is the index of the cluster that the peak belongs to, or -1 if the peak does not belong to any cluster.
automove (bool) – If True, the peak will automatically be moved to the appropriate cluster, or a new cluster will be appended to clusters. If False, the index of the appropriate target cluster will be returned. If there is no appropriate target cluster, -1 will be returned.
weighted (bool) – If True, weigh the votes using peak weight and distance. If False, treat the votes from each peak as equal.
- Returns
The index of the appropriate target cluster, or -1 if there is no appropriate target cluster. A value is returned only when automove is False; when automove is True the function returns None.
- Return type
int or None
Notes
This function uses a simple unweighted voting heuristic to determine which existing cluster in clusters the query peak should be classified into. If no suitable existing cluster is found, a new cluster is created containing only the query peak. This cluster is then appended to clusters. Pass the kwarg weighted=True to use a weighted voting heuristic.
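The unweighted vote described in these notes can be sketched roughly as follows. This is only an illustration, not the library's implementation: the helper name vote_for_cluster is hypothetical, peaks are assumed to be dicts with 'x' and 'y' keys, and both the tie-breaking rule (first cluster seen wins) and the rule that unclustered neighbors cast no vote are assumptions.

```python
from collections import Counter


def vote_for_cluster(neighbors, peak_to_clusters):
    """Return the cluster index most common among the neighbors, or -1.

    Hypothetical helper: assumes each peak is a dict with 'x' and 'y' keys.
    """
    votes = Counter()
    for neighbor in neighbors:
        cluster_index = peak_to_clusters.get((neighbor['x'], neighbor['y']), -1)
        if cluster_index != -1:  # assumption: unclustered neighbors cast no vote
            votes[cluster_index] += 1
    if not votes:
        return -1  # no neighbor belongs to any cluster
    # Each neighbor counts equally; ties go to the cluster seen first.
    return votes.most_common(1)[0][0]


neighbors = [{'x': 1, 'y': 2}, {'x': 2, 'y': 2}, {'x': 5, 'y': 7}, {'x': 9, 'y': 9}]
peak_to_clusters = {(1, 2): 0, (2, 2): 0, (5, 7): 1, (9, 9): -1}
winner = vote_for_cluster(neighbors, peak_to_clusters)
```

Here two neighbors vote for cluster 0 and one for cluster 1, so winner is 0. A result of -1 corresponds to the case where a new cluster containing only the query peak would be created.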
lib5c.algorithms.clustering.knn.direction_score(peak, neighbors)
Calculates a direction-score for a peak given its neighbors.
- Parameters
peak (peak) – The query peak.
neighbors (list of peaks) – The query peak’s neighbors.
- Returns
The direction-score.
- Return type
float
Notes
Higher direction-scores are better.
lib5c.algorithms.clustering.knn.distance_score(neighbors)
Calculates a distance-score for a peak given its neighbors.
- Parameters
neighbors (list of peaks) – The query peak’s neighbors.
- Returns
The distance-score.
- Return type
float
Notes
Lower distance-scores are better.
lib5c.algorithms.clustering.knn.get_knn(peak, peaks, k)
Given a list of peaks and a query peak, returns the k nearest neighbors of the query peak.
- Parameters
peak (peak) – The query peak for which nearest neighbors should be identified.
peaks (list of peaks) – The peaks that are candidates to be nearest neighbors.
k (int) – The number of nearest neighbors to return.
- Returns
The k nearest neighbors of the query peak among peaks. If fewer than k peaks were provided, the length of this list will be shorter than k.
- Return type
list of peaks
Notes
peak may be present in peaks, but it will not be returned as a neighbor.
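The behavior described for get_knn (exclude the query peak, return at most k neighbors) might look like the sketch below. It is a stand-in written under assumptions, not the library's code: the name k_nearest is hypothetical, peaks are assumed to be dicts with 'x' and 'y' keys, and Euclidean distance between (x, y) locations is assumed as the metric.

```python
import math


def k_nearest(peak, peaks, k):
    """Hypothetical stand-in for get_knn using Euclidean distance (an assumption)."""
    # Drop the query peak itself if it appears among the candidates.
    candidates = [p for p in peaks
                  if (p['x'], p['y']) != (peak['x'], peak['y'])]
    # Sort the remaining candidates by distance to the query peak.
    candidates.sort(
        key=lambda p: math.hypot(p['x'] - peak['x'], p['y'] - peak['y']))
    # Slicing yields fewer than k items when not enough candidates exist.
    return candidates[:k]


query = {'x': 0, 'y': 0}
peaks = [{'x': 0, 'y': 0}, {'x': 3, 'y': 4}, {'x': 1, 'y': 0}, {'x': 0, 'y': 2}]
nearest = k_nearest(query, peaks, 2)
```

With these inputs, nearest holds the peaks at (1, 0) and (0, 2), in that order. Asking for k=10 returns only the three candidates that are not the query peak.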
lib5c.algorithms.clustering.knn.make_clusters(peaks, k=8, dist_score=5, dist_k=8, dir_score=0.3, dir_k=8)
Performs k-nearest neighbors clustering of peaks.
- Parameters
peaks (list of peaks) – The peaks to cluster.
k (int) – The number of nearest neighbors to consider when clustering.
dist_score (float) – The distance-score threshold to use when clustering.
dist_k (int) – The number of nearest neighbors to consider when calculating the distance score.
dir_score (float) – The direction-score threshold to use when clustering.
dir_k (int) – The number of nearest neighbors to consider when calculating the direction score.
- Returns
A tuple whose first element is the list of merged clusters, whose second element is the list of peaks that did not pass the distance score threshold, and whose third element is the list of peaks that did not pass the direction score threshold.
- Return type
list of clusters, list of peaks, list of peaks
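The two threshold filters behind the second and third return values can be illustrated with the sketch below. It rests on several assumptions: the helper split_by_threshold is hypothetical, the scores are made up rather than computed by distance_score or direction_score, the comparisons are assumed to be inclusive, and the order in which the library applies the filters is not confirmed by this page. The only sourced facts are that lower distance-scores and higher direction-scores are better.

```python
def split_by_threshold(scored_peaks, threshold, higher_is_better):
    """Split (peak, score) pairs into (passed, failed) lists of peaks.

    Hypothetical helper; inclusive comparison is an assumption.
    """
    passed, failed = [], []
    for peak, score in scored_peaks:
        if higher_is_better:
            ok = score >= threshold
        else:
            ok = score <= threshold
        (passed if ok else failed).append(peak)
    return passed, failed


# Made-up scores for illustration only; peaks are represented by labels.
distance_scored = [('a', 2.0), ('b', 7.5), ('c', 5.0)]
kept, failed_distance = split_by_threshold(
    distance_scored, 5, higher_is_better=False)   # lower is better

direction_scored = [('a', 0.1), ('c', 0.6)]
kept, failed_direction = split_by_threshold(
    direction_scored, 0.3, higher_is_better=True)  # higher is better
```

In this toy run, peak 'b' fails the distance threshold, peak 'a' fails the direction threshold, and only 'c' would go on to be clustered.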