lib5c.algorithms.clustering.knn module

Module for assembling clusters using an unassisted k-nearest neighbors heuristic.

lib5c.algorithms.clustering.knn.classify_peak(peak, neighbors, clusters, peak_to_clusters, automove=True, weighted=False)

Assigns a peak to its best-fitting cluster, based on a list of its neighbors.

Parameters
  • peak (peak) – The peak to classify.

  • neighbors (list of peaks) – The peaks that should determine the query peak’s classification.

  • clusters (list of clusters) – A list of clusters to classify the query peak into.

  • peak_to_clusters (dict mapping (int, int) tuples to int) – The keys are (x, y) tuples representing peak locations. Each value is the index in clusters of the cluster containing that peak, or -1 if the peak does not belong to any cluster.

  • automove (bool) – If True, the peak is moved into the appropriate cluster or, if there is no appropriate cluster, a new cluster is appended to clusters. If False, the index of the appropriate target cluster is returned instead, or -1 if there is no appropriate target cluster.

  • weighted (bool) – If True, weigh the votes using peak weight and distance. If False, treat the votes from each peak as equal.

Returns

The index of the appropriate target cluster, or -1 if there is none. A value is returned only when automove is False; otherwise the function returns None.

Return type

int or None

Notes

By default, this function uses a simple unweighted voting heuristic to decide which existing cluster in clusters the query peak should be classified into; pass the kwarg weighted=True to use a weighted voting heuristic instead. If no suitable existing cluster is found and automove is True, a new cluster containing only the query peak is created and appended to clusters.
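The unweighted voting described in the Notes can be illustrated with a minimal sketch. It is not lib5c's implementation: it assumes each peak is a dict with 'x' and 'y' keys, that a cluster is a plain list of peaks, and that the winning cluster is a simple plurality of the neighbors' cluster indices (unclustered neighbors do not vote). The weighted=True mode is not covered.

```python
from collections import Counter


def classify_peak(peak, neighbors, clusters, peak_to_clusters, automove=True):
    """Hypothetical plurality-vote sketch of classify_peak (unweighted only).

    Assumes peaks are dicts with 'x' and 'y' keys, clusters is a list of
    lists of peaks, and peak_to_clusters maps (x, y) -> cluster index, with
    -1 meaning "no cluster".
    """
    # Each neighbor casts one vote for the cluster it currently belongs to.
    votes = Counter(peak_to_clusters.get((n['x'], n['y']), -1) for n in neighbors)
    votes.pop(-1, None)  # unclustered neighbors do not vote
    target = votes.most_common(1)[0][0] if votes else -1

    if not automove:
        return target

    key = (peak['x'], peak['y'])
    current = peak_to_clusters.get(key, -1)
    # Leave the peak's current cluster if it is moving somewhere else.
    if current != -1 and current != target:
        clusters[current].remove(peak)
    if target == -1:
        # No existing cluster won: start a new cluster holding only this peak.
        clusters.append([peak])
        peak_to_clusters[key] = len(clusters) - 1
    elif current != target:
        clusters[target].append(peak)
        peak_to_clusters[key] = target
    return None
```

Keeping peak_to_clusters in sync inside the function is what lets later calls see the updated assignment without rescanning clusters.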

lib5c.algorithms.clustering.knn.direction_score(peak, neighbors)

Calculates a direction-score for a peak given its neighbors.

Parameters
  • peak (peak) – The query peak.

  • neighbors (list of peaks) – The query peak’s neighbors.

Returns

The direction-score.

Return type

float

Notes

Higher direction-scores are better.
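This page does not say how the direction-score is computed. As a purely illustrative stand-in (an assumption, not lib5c's formula), the sketch below scores how evenly the neighbors surround the query peak: it averages the unit vectors from the peak to each neighbor and returns 1 minus the length of that average. Evenly surrounded peaks score near 1 and peaks whose neighbors all lie on one side score near 0, so higher is better, consistent with the Notes. Peaks are assumed to be dicts with 'x' and 'y' keys.

```python
import math


def direction_score(peak, neighbors):
    """Hypothetical direction-score in [0, 1]; higher means better surrounded.

    Assumes peaks are dicts with 'x' and 'y' keys. Neighbors sitting exactly
    on the query peak have no direction and are skipped.
    """
    sum_x = sum_y = 0.0
    count = 0
    for n in neighbors:
        dx, dy = n['x'] - peak['x'], n['y'] - peak['y']
        norm = math.hypot(dx, dy)
        if norm == 0:
            continue
        # Accumulate the unit vector pointing from the peak to this neighbor.
        sum_x += dx / norm
        sum_y += dy / norm
        count += 1
    if count == 0:
        return 0.0  # no usable neighbors: treat as worst score
    # Length of the mean unit vector is 0 for balanced, 1 for one-sided.
    return 1.0 - math.hypot(sum_x, sum_y) / count
```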

lib5c.algorithms.clustering.knn.distance_score(neighbors)

Calculates a distance-score for a peak given its neighbors.

Parameters

neighbors (list of peaks) – The query peak’s neighbors.

Returns

The distance-score.

Return type

float

Notes

Lower distance-scores are better.
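The formula behind the distance-score is likewise not given here. Because the documented signature takes only neighbors, one plausible reading (an assumption, not lib5c's definition) is a measure of how tightly the neighbors are packed, such as their mean pairwise Euclidean distance, where lower is better. The sketch assumes peaks are dicts with 'x' and 'y' keys and returns math.inf for fewer than two neighbors so that such peaks never pass a finite threshold.

```python
import math
from itertools import combinations


def distance_score(neighbors):
    """Hypothetical distance-score: mean pairwise distance among neighbors.

    Assumes peaks are dicts with 'x' and 'y' keys. Lower values indicate a
    more compact neighborhood.
    """
    pairs = list(combinations(neighbors, 2))
    if not pairs:
        return math.inf  # too few neighbors to measure compactness
    total = sum(math.hypot(a['x'] - b['x'], a['y'] - b['y']) for a, b in pairs)
    return total / len(pairs)
```

For neighbors at (0, 0), (3, 0) and (0, 4) the pairwise distances are 3, 4 and 5, giving a score of 4.0.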

lib5c.algorithms.clustering.knn.get_knn(peak, peaks, k)

Given a list of peaks and a query peak, returns the k nearest neighbors of the query peak.

Parameters
  • peak (peak) – The query peak for which nearest neighbors should be identified.

  • peaks (list of peaks) – The peaks that are candidates to be nearest neighbors.

  • k (int) – The number of nearest neighbors to return.

Returns

The k nearest neighbors of the query peak among peaks. If fewer than k candidates are available (not counting peak itself), the returned list will contain fewer than k peaks.

Return type

list of peaks

Notes

peak may be present in peaks, but it will not be returned as a neighbor.
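The behavior described above (up to k results, query peak excluded) can be sketched as follows. This is a hypothetical illustration, not lib5c's code: it assumes peaks are dicts with 'x' and 'y' keys, uses Euclidean distance, and treats any candidate at the query's (x, y) location as the query peak itself.

```python
import heapq
import math


def get_knn(peak, peaks, k):
    """Return up to k peaks from `peaks`, nearest first, excluding `peak`.

    Hypothetical sketch: peaks are dicts with 'x' and 'y' keys and distance
    is Euclidean. Candidates located at the query's (x, y) are skipped.
    """
    origin = (peak['x'], peak['y'])
    candidates = (p for p in peaks if (p['x'], p['y']) != origin)
    # nsmallest returns results sorted by increasing distance.
    return heapq.nsmallest(
        k, candidates,
        key=lambda p: math.hypot(p['x'] - origin[0], p['y'] - origin[1]),
    )
```

heapq.nsmallest avoids fully sorting the candidates, which helps when k is much smaller than len(peaks); if fewer than k candidates remain, it simply returns all of them.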

lib5c.algorithms.clustering.knn.make_clusters(peaks, k=8, dist_score=5, dist_k=8, dir_score=0.3, dir_k=8)

Performs k-nearest neighbors clustering of peaks.

Parameters
  • peaks (list of peaks) – The peaks to cluster.

  • k (int) – The number of nearest neighbors to consider when clustering.

  • dist_score (float) – The distance-score threshold to use when clustering.

  • dist_k (int) – The number of nearest neighbors to consider when calculating the distance score.

  • dir_score (float) – The direction-score threshold to use when clustering.

  • dir_k (int) – The number of nearest neighbors to consider when calculating the direction score.

Returns

A tuple whose first element is the list of merged clusters, whose second element is the list of peaks that did not pass the distance score threshold, and whose third element is the list of peaks that did not pass the direction score threshold.

Return type

tuple of (list of clusters, list of peaks, list of peaks)
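The internals of make_clusters are not documented on this page. The sketch below illustrates only one plausible screening step that would yield the second and third elements of the returned tuple; it is an assumption, not lib5c's code. The hypothetical helper screen_peaks takes caller-supplied score functions, checks the distance-score first, and treats a score exactly equal to its threshold as passing (both assumptions). Lower distance-scores and higher direction-scores are better, as the Notes above state, and the default thresholds are copied from the make_clusters signature.

```python
def screen_peaks(peaks, dist_fn, dir_fn, dist_score=5, dir_score=0.3):
    """Hypothetical screening step: split peaks by score thresholds.

    dist_fn(peak) and dir_fn(peak) are caller-supplied score functions.
    Returns (kept, failed_distance, failed_direction).
    """
    kept, failed_distance, failed_direction = [], [], []
    for peak in peaks:
        if dist_fn(peak) > dist_score:
            # Lower distance-scores are better: too high fails.
            failed_distance.append(peak)
        elif dir_fn(peak) < dir_score:
            # Higher direction-scores are better: too low fails.
            failed_direction.append(peak)
        else:
            kept.append(peak)
    return kept, failed_distance, failed_direction
```

In this reading, only the kept peaks would go on to be assigned to clusters. The real function's three-element result can be unpacked the same way: clusters, failed_distance, failed_direction = make_clusters(peaks).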