lib5c.algorithms.clustering.util module
Module containing utility functions for clustering 5C interactions.
-
lib5c.algorithms.clustering.util.array_index_to_peaks(idx)
Convert a dense boolean array to a sparse list of peaks.
- Parameters
idx (np.ndarray) – Boolean array to convert.
- Returns
The peaks.
- Return type
list of peaks
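The page does not define what a peak is. This is a minimal sketch that assumes a peak is a dict with 'x', 'y' and 'value' keys, like the point dicts documented for reshape_cluster_array_to_dict. Because a boolean array carries no magnitude, the sketch gives each peak a placeholder value of 0. The name array_index_to_peaks_sketch is hypothetical, and the sketch is not the library's implementation.

```python
import numpy as np


def array_index_to_peaks_sketch(idx):
    """Return one peak dict per True entry of a 2-D boolean array.

    Assumption: peaks are dicts with 'x', 'y' and 'value' keys; 'value' is a
    placeholder (0) because a boolean index carries no magnitude.
    """
    # np.argwhere returns the (row, column) coordinates of every True entry
    return [{'x': int(x), 'y': int(y), 'value': 0}
            for x, y in np.argwhere(np.asarray(idx, dtype=bool))]


idx = np.array([[False, True],
                [True, False]])
peaks = array_index_to_peaks_sketch(idx)
```

np.argwhere walks the array in row-major order, so the peaks come out sorted by row and then by column.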
-
lib5c.algorithms.clustering.util.belongs_to(peak, cluster)
Checks if a peak belongs to a cluster.
- Parameters
peak (peak) – The query peak.
cluster (cluster) – The cluster to search for it in.
- Returns
True if peak belongs to cluster, False otherwise.
- Return type
bool
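This is a minimal sketch of the membership test. It makes two assumptions that the page does not confirm: a peak is a dict with 'x', 'y' and 'value' keys, and a cluster is a plain list of such peaks. It also assumes that two peaks are "the same" when their coordinates match. The name belongs_to_sketch is hypothetical.

```python
def belongs_to_sketch(peak, cluster):
    """Return True if a peak with the same (x, y) coordinates is in cluster.

    Assumption: peaks are dicts with 'x', 'y' and 'value' keys and are
    identified by their coordinates alone; 'value' is not compared.
    """
    return any(p['x'] == peak['x'] and p['y'] == peak['y'] for p in cluster)


cluster = [{'x': 0, 'y': 0, 'value': 2.0},
           {'x': 0, 'y': 1, 'value': 1.5}]
```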
-
lib5c.algorithms.clustering.util.belongs_to_which(peak, clusters)
Identifies which cluster out of a list of clusters, if any, a peak belongs to.
- Parameters
peak (peak) – The query peak to consider.
clusters (list of clusters) – The clusters to look for the query peak in.
- Returns
The index of the cluster within the list of clusters which contains the query peak, or -1 if no cluster in the list of clusters contains the query peak.
- Return type
int
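This sketch shows the documented contract: it returns the index of the first containing cluster, or -1 if no cluster contains the peak. It assumes peaks are dicts with 'x', 'y' and 'value' keys that are matched by coordinates. The name belongs_to_which_sketch is hypothetical.

```python
def belongs_to_which_sketch(peak, clusters):
    """Return the index of the cluster containing peak, or -1 if none does.

    Assumption: peaks are dicts with 'x' and 'y' keys, matched by coordinates.
    """
    for i, cluster in enumerate(clusters):
        if any(p['x'] == peak['x'] and p['y'] == peak['y'] for p in cluster):
            return i
    return -1  # sentinel documented for "not found"


clusters = [[{'x': 0, 'y': 0, 'value': 1.0}],
            [{'x': 2, 'y': 3, 'value': 1.0}]]
```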
-
lib5c.algorithms.clustering.util.center_of_mass(cluster)
Computes the center of mass, or centroid, of a cluster.
- Parameters
cluster (cluster) – The cluster to consider.
- Returns
The centroid of the cluster.
- Return type
1D numpy array of length 2
Notes
For the purpose of this calculation, the mass of a peak is taken to be its value.
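If each peak's value is its mass, the centroid is the value-weighted mean position: the sum of value times (x, y), divided by the sum of the values. The sketch below computes this with numpy.average. It assumes peaks are dicts with 'x', 'y' and 'value' keys, and the name center_of_mass_sketch is hypothetical.

```python
import numpy as np


def center_of_mass_sketch(cluster):
    """Return the value-weighted centroid of a cluster as a length-2 array.

    Assumption: peaks are dicts with 'x', 'y' and 'value' keys.
    """
    coords = np.array([[p['x'], p['y']] for p in cluster], dtype=float)
    masses = np.array([p['value'] for p in cluster], dtype=float)
    # weighted mean over peaks (axis 0); raises ZeroDivisionError if the
    # values sum to zero
    return np.average(coords, axis=0, weights=masses)


cluster = [{'x': 0, 'y': 0, 'value': 1.0},
           {'x': 2, 'y': 0, 'value': 3.0}]
centroid = center_of_mass_sketch(cluster)
```

Here the heavier peak at x = 2 pulls the centroid to x = (0·1 + 2·3) / 4 = 1.5.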
-
lib5c.algorithms.clustering.util.clusters_to_array(clusters, size)
Assembles clusters into a 2-D array for plotting on a heatmap.
- Parameters
clusters (list of clusters) – The clusters to be converted to a 2-D array.
size (int) – The height and width of the array to generate. This should be equal to the number of bins in the region.
- Returns
A 2-D array in which all pixels of each cluster are assigned that cluster's own sequential integer value. The next consecutive integer is a gap value, and the integer after that is the default value for all pixels not in any cluster.
- Return type
2-D array
Notes
It is recommended to plot the resulting array using a rapidly changing colorscale such as plt.get_cmap('gist_ncar').
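This sketch shows one way to implement the labeling scheme described above. It rests on several assumptions the page does not state:

- cluster labels start at 0;
- the gap value is reserved but never written into the array;
- only the (x, y) pixel of each peak is filled, not its mirror (y, x);
- peaks are dicts with 'x', 'y' and 'value' keys.

The name clusters_to_array_sketch is hypothetical.

```python
import numpy as np


def clusters_to_array_sketch(clusters, size):
    """Label each cluster's pixels with its index in a size x size int array.

    Assumptions: labels start at 0 and peaks are dicts with 'x'/'y' keys.
    """
    n = len(clusters)
    # labels 0..n-1 mark the clusters; n is the gap value and is left unused
    # here; n + 1 is the default for pixels that belong to no cluster
    array = np.full((size, size), n + 1, dtype=int)
    for label, cluster in enumerate(clusters):
        for peak in cluster:
            array[peak['x'], peak['y']] = label
    return array


clusters = [[{'x': 0, 'y': 0, 'value': 1.0}],
            [{'x': 1, 'y': 2, 'value': 1.0}]]
arr = clusters_to_array_sketch(clusters, 3)
```

Leaving an unused integer between the last cluster label and the background value gives a visible jump in a rapidly changing colormap.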
-
lib5c.algorithms.clustering.util.flatten_clusters(clusters)
Flattens a list of clusters to a flat list of peaks.
- Parameters
clusters (list of list of peaks) – The clusters to flatten.
- Returns
The flattened peaks.
- Return type
list of peaks
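Flattening a list of lists takes a single call to itertools.chain.from_iterable. This is a sketch, and the name flatten_clusters_sketch is hypothetical. The peak dicts in the example assume the 'x'/'y'/'value' layout used by reshape_cluster_array_to_dict.

```python
from itertools import chain


def flatten_clusters_sketch(clusters):
    """Concatenate the clusters, in order, into one flat list of peaks."""
    return list(chain.from_iterable(clusters))


clusters = [[{'x': 0, 'y': 0, 'value': 1.0}],
            [{'x': 1, 'y': 1, 'value': 2.0}, {'x': 1, 'y': 2, 'value': 3.0}]]
flat = flatten_clusters_sketch(clusters)
```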
-
lib5c.algorithms.clustering.util.get_cluster(x, y, clusters)
Identifies which cluster, if any, a specified point belongs to.
- Parameters
x (int) – x-coordinate of the point to consider.
y (int) – y-coordinate of the point to consider.
clusters (list of clusters) – List of clusters to search.
- Returns
The index of the cluster which contains the point (x,y). If no cluster in the list contains the point, the value is -1.
- Return type
int
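This is a sketch of the coordinate-based lookup. It assumes peaks are dicts with 'x' and 'y' keys and returns the documented sentinel -1 on a miss. The name get_cluster_sketch is hypothetical.

```python
def get_cluster_sketch(x, y, clusters):
    """Return the index of the cluster containing point (x, y), else -1.

    Assumption: peaks are dicts with 'x' and 'y' keys.
    """
    for i, cluster in enumerate(clusters):
        if any(p['x'] == x and p['y'] == y for p in cluster):
            return i
    return -1


clusters = [[{'x': 4, 'y': 7, 'value': 1.0}],
            [{'x': 1, 'y': 1, 'value': 2.0}, {'x': 1, 'y': 2, 'value': 2.5}]]
```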
-
lib5c.algorithms.clustering.util.get_vector(peak)
Gets an array representing a peak’s location.
- Parameters
peak (peak) – The peak whose location to return.
- Returns
The peak’s location as an (x, y) ordered pair.
- Return type
1D numpy array of length 2
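This is a minimal sketch that assumes a peak is a dict with 'x', 'y' and 'value' keys. The name get_vector_sketch is hypothetical.

```python
import numpy as np


def get_vector_sketch(peak):
    """Return the peak's location as a length-2 array (x, y).

    Assumption: peaks are dicts with 'x' and 'y' keys.
    """
    return np.array([peak['x'], peak['y']])


v = get_vector_sketch({'x': 3, 'y': 5, 'value': 1.0})
```

Returning an array rather than a tuple lets callers do vector arithmetic, such as subtracting two peak locations, directly.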
-
lib5c.algorithms.clustering.util.ident(peak1, peak2)
Checks whether two peaks are the same peak.
- Parameters
peak1 (peak) – The first peak to compare.
peak2 (peak) – The second peak to compare.
- Returns
True if the peaks are the same peak, False otherwise.
- Return type
bool
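The page does not say what makes two peaks "the same". This sketch assumes peaks are dicts with 'x', 'y' and 'value' keys and treats two peaks as identical when their coordinates match, whatever their values. The name ident_sketch is hypothetical.

```python
def ident_sketch(peak1, peak2):
    """Return True if both peaks sit at the same (x, y) position.

    Assumption: identity is decided by coordinates only; 'value' is ignored.
    """
    return peak1['x'] == peak2['x'] and peak1['y'] == peak2['y']
```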
-
lib5c.algorithms.clustering.util.identify_nearby_clusters(cluster, clusters)
Identifies which clusters from a list of clusters are adjacent to a query cluster.
- Parameters
cluster (cluster) – The query cluster to consider.
clusters (list of clusters) – The clusters to check for adjacency to the query cluster.
- Returns
The indices of clusters within the list of clusters that were found to be adjacent to the query cluster.
- Return type
list of int
Notes
This may include duplicates. To get rid of them, use:
set(identify_nearby_clusters(cluster, clusters))
To pick one arbitrary nearby cluster, use:
nearby_clusters = identify_nearby_clusters(cluster, clusters)
if nearby_clusters:
    nearby_index = nearby_clusters[0]
To see which clusters are near a single peak, use:
identify_nearby_clusters([peak], clusters)
If cluster is in clusters, it will be reported as adjacent to itself. Where this is undesirable, filter out its own index; for example, for clusters[0] (whose index is 0), use:
list(filter(lambda x: x != 0, identify_nearby_clusters(clusters[0], clusters)))
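The page does not define "adjacent". This sketch assumes 8-connectivity: two peaks are adjacent if their coordinates differ by at most 1 in each direction. It also includes the zero offset (0, 0), so a cluster that appears in clusters is reported as adjacent to itself, as the Notes describe. Peaks are assumed to be dicts with 'x', 'y' and 'value' keys. The name identify_nearby_clusters_sketch is hypothetical.

```python
def identify_nearby_clusters_sketch(cluster, clusters):
    """Return indices of clusters adjacent to `cluster`, duplicates included.

    Assumptions: 8-connected adjacency, the zero offset counts (so shared
    peaks count as adjacent), and peaks are dicts with 'x'/'y' keys.
    """
    offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
    # map every occupied pixel to the index of the first cluster covering it
    owner = {}
    for i, other in enumerate(clusters):
        for p in other:
            owner.setdefault((p['x'], p['y']), i)
    nearby = []
    for p in cluster:
        for dx, dy in offsets:
            i = owner.get((p['x'] + dx, p['y'] + dy))
            if i is not None:
                nearby.append(i)  # duplicates are kept, as documented
    return nearby


clusters = [[{'x': 0, 'y': 0, 'value': 1.0}],
            [{'x': 1, 'y': 1, 'value': 1.0}],
            [{'x': 5, 'y': 5, 'value': 1.0}]]
```

Building the pixel-to-cluster lookup once makes each neighbor check a dict lookup instead of a scan over every cluster.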
-
lib5c.algorithms.clustering.util.merge_clusters(clusters, merge_to_which)
Recursively merges clusters together from smallest to largest according to a specified merge function.
- Parameters
clusters (list of clusters) – The clusters to be merged. All elements will be removed from this list when this function is called.
merge_to_which (function) – Function that takes in a list of clusters and returns the index of the cluster the first cluster in the list should be merged into. If the first cluster in the list should not be merged, this function should return -1.
- Returns
The list of merged clusters.
- Return type
list of clusters
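This sketch follows the documented contract for merge_to_which, but it is written as a loop rather than recursion and is not the library's algorithm. On each step it:

1. sorts the remaining clusters by size;
2. asks merge_to_which about the smallest one;
3. either keeps that cluster as final or folds it into the chosen target.

As documented, the input list ends up empty. The names merge_clusters_sketch and merge_singletons are hypothetical.

```python
def merge_clusters_sketch(clusters, merge_to_which):
    """Iteratively merge clusters, always processing the smallest one first.

    merge_to_which(clusters) returns the index (into the list it was given)
    of the cluster that clusters[0] should join, or -1 to keep it as-is.
    `clusters` is emptied as a side effect.
    """
    merged = []
    while clusters:
        clusters.sort(key=len)  # smallest cluster moves to the front
        target = merge_to_which(clusters)
        first = clusters.pop(0)
        if target < 1:
            # -1 (or a self-reference at index 0) means: keep as a final cluster
            merged.append(first)
        else:
            # pop(0) shifted every index down by one; build a new list rather
            # than extending the caller's cluster object in place
            clusters[target - 1] = clusters[target - 1] + first
    return merged


def merge_singletons(cs):
    """Hypothetical merge rule: fold a one-peak cluster into the next cluster."""
    return 1 if len(cs[0]) == 1 and len(cs) > 1 else -1


# strings stand in for peaks; this sketch never inspects them
clusters = [['a', 'b', 'c'], ['d'], ['e', 'f']]
result = merge_clusters_sketch(clusters, merge_singletons)
```

Re-sorting after every step means a cluster that grew through a merge is considered again at its new size.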
-
lib5c.algorithms.clustering.util.peaks_to_array_index(peaks, shape)
Convert a sparse list of peaks to a dense boolean array.
- Parameters
peaks (list of peaks) – The peaks to convert.
shape (tuple of int) – The shape of the resulting array.
- Returns
The dense boolean array.
- Return type
np.ndarray
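This sketch does the inverse of array_index_to_peaks. It starts from an all-False array and sets one entry per peak. It assumes peaks are dicts with 'x' (row) and 'y' (column) keys, the orientation used in the reshape_cluster_array_to_dict example. The name peaks_to_array_index_sketch is hypothetical.

```python
import numpy as np


def peaks_to_array_index_sketch(peaks, shape):
    """Return a boolean array of `shape` that is True at each peak's (x, y).

    Assumption: peaks are dicts with 'x' (row) and 'y' (column) keys.
    """
    idx = np.zeros(shape, dtype=bool)
    for peak in peaks:
        idx[peak['x'], peak['y']] = True
    return idx


idx = peaks_to_array_index_sketch([{'x': 0, 'y': 1, 'value': 5.0}], (2, 2))
```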
-
lib5c.algorithms.clustering.util.reshape_cluster_array_to_dict(cluster_array, ignored_values=None)
Reshapes a 2-D array of cluster IDs into a dict mapping each cluster ID to the points belonging to that cluster.
- Parameters
cluster_array (np.ndarray) – The entries of this array are cluster IDs. By default, the values '', 'n.s.', 'NA', 'NaN', and np.nan are ignored.
ignored_values (set, optional) – Set of values in cluster_array that should not be treated as cluster IDs. Defaults to {'', 'n.s.', 'NA', 'NaN', np.nan}.
- Returns
The outer dict’s keys are cluster IDs, and its values are lists of the points belonging to each cluster, with each point provided as a dict with the following structure:
{ 'x': int, 'y': int, 'value': 0 }
- Return type
Dict[Any, List[Dict[str, Any]]]
Notes
To rectangularize the returned data structure against a full list of cluster IDs, use something like:
cluster_dict = reshape_cluster_array_to_dict(cluster_array)
for cluster_id in all_cluster_ids:
    if cluster_id not in cluster_dict:
        cluster_dict[cluster_id] = []
Examples
>>> import numpy as np
>>> cluster_array = np.array([['', 'cow'], ['cow', 'grass']])
>>> reshape_cluster_array_to_dict(cluster_array) == \
...     {'grass': [{'x': 1, 'y': 1, 'value': 0}],
...      'cow': [{'x': 0, 'y': 1, 'value': 0},
...              {'x': 1, 'y': 0, 'value': 0}]}
True
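The example above shows 'x' as the row index, 'y' as the column index, and 'value' fixed at 0. The sketch below reproduces that behavior; it is not the library's code, and the name reshape_cluster_array_to_dict_sketch is hypothetical.

NaN never compares equal to itself, and a NaN read from an array is a different object from np.nan. A plain `in` check against a set containing np.nan therefore does not reliably catch it. For that reason this sketch always skips NaN explicitly, even when a custom ignored_values omits it; that is a simplification the library may not share.

```python
import math

import numpy as np

DEFAULT_IGNORED = {'', 'n.s.', 'NA', 'NaN'}  # NaN floats are handled separately


def _is_ignored(value, ignored_values):
    # NaN != NaN, so set membership cannot be trusted for it
    if isinstance(value, float) and math.isnan(value):
        return True
    return value in ignored_values


def reshape_cluster_array_to_dict_sketch(cluster_array, ignored_values=None):
    """Map each cluster ID to a list of {'x': row, 'y': col, 'value': 0}."""
    if ignored_values is None:
        ignored_values = DEFAULT_IGNORED
    cluster_dict = {}
    # tolist() converts numpy scalars to plain Python str/float/int values
    for x, row in enumerate(np.asarray(cluster_array).tolist()):
        for y, value in enumerate(row):
            if _is_ignored(value, ignored_values):
                continue
            cluster_dict.setdefault(value, []).append(
                {'x': x, 'y': y, 'value': 0})
    return cluster_dict


result = reshape_cluster_array_to_dict_sketch(
    np.array([['', 'cow'], ['cow', 'grass']]))
```

Iterating row by row yields each cluster's points in row-major order, which matches the ordering in the doctest.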