lib5c.algorithms.clustering.util module

Module containing utility functions for clustering 5C interactions.

lib5c.algorithms.clustering.util.array_index_to_peaks(idx)

Convert a dense boolean array to a sparse list of peaks.

Parameters: idx (np.ndarray) – Boolean array to convert.
Returns: The peaks.
Return type: list of peaks

lib5c.algorithms.clustering.util.belongs_to(peak, cluster)

Checks if a peak belongs to a cluster.

Parameters

peak (peak) – The query peak.
cluster (cluster) – The cluster to search for it in.

Returns

True if peak belongs to cluster, False otherwise.

Return type

bool

lib5c.algorithms.clustering.util.belongs_to_which(peak, clusters)

Identifies which cluster out of a list of clusters, if any, a peak belongs to.

Parameters

peak (peak) – The query peak to consider.
clusters (list of clusters) – The clusters to look for the query peak in.

Returns

The index of the cluster within the list of clusters which contains the query peak, or -1 if no cluster in the list of clusters contains the query peak.

Return type

int
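
The return convention above (an index, or -1 when nothing matches) can be illustrated with a minimal sketch. It assumes that a peak is a dict with 'x', 'y', and 'value' keys (the structure returned by reshape_cluster_array_to_dict) and that a cluster is a list of such dicts. The name `find_cluster_index` is hypothetical; this is not the library's implementation.

```python
def find_cluster_index(peak, clusters):
    """Return the index of the first cluster containing `peak`, or -1."""
    for i, cluster in enumerate(clusters):
        # Assumption: two peaks are the same peak when their coordinates match.
        if any(p['x'] == peak['x'] and p['y'] == peak['y'] for p in cluster):
            return i
    return -1


clusters = [
    [{'x': 0, 'y': 1, 'value': 2.0}],
    [{'x': 3, 'y': 4, 'value': 1.5}, {'x': 3, 'y': 5, 'value': 0.5}],
]
hit = find_cluster_index({'x': 3, 'y': 5, 'value': 0.5}, clusters)
miss = find_cluster_index({'x': 9, 'y': 9, 'value': 0.0}, clusters)
```

Because -1 is also a valid Python list index (it selects the last element), callers should test for -1 before using the result to index into the list of clusters.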

lib5c.algorithms.clustering.util.center_of_mass(cluster)

Computes the center of mass, or centroid, of a cluster.

Parameters: cluster (cluster) – The cluster to consider.
Returns: The centroid of the cluster.
Return type: 1D numpy array of length 2

Notes

For the purpose of this calculation, the mass of a peak is taken to be its value.
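
As an illustration of the value-weighted centroid described in this note, here is a minimal sketch. It assumes peaks are dicts with 'x', 'y', and 'value' keys and that a cluster is a list of peaks; `weighted_centroid` is a hypothetical name, not the library function.

```python
import numpy as np


def weighted_centroid(cluster):
    """Centroid of a cluster, weighting each peak by its 'value'."""
    coords = np.array([[p['x'], p['y']] for p in cluster], dtype=float)
    masses = np.array([p['value'] for p in cluster], dtype=float)
    # Weighted mean over peaks (axis 0), one result per coordinate.
    return np.average(coords, axis=0, weights=masses)


cluster = [
    {'x': 0, 'y': 0, 'value': 1.0},
    {'x': 4, 'y': 2, 'value': 3.0},
]
centroid = weighted_centroid(cluster)  # array([3. , 1.5])
```

The heavier peak at (4, 2) pulls the centroid three quarters of the way towards itself, since it carries three quarters of the total mass.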

lib5c.algorithms.clustering.util.clusters_to_array(clusters, size)

Assembles clusters into a 2-D array for plotting on a heatmap.

Parameters

clusters (list of clusters) – The clusters to be converted to a 2-D array.
size (int) – The height and width of the array to generate. This should be equal to the number of bins in the region.

Returns

A 2-D array in which each cluster has been assigned a different sequential integer value for all of its pixels. The next consecutive integer is a gap value, and the one after that is the default value for all pixels not in any cluster.

Return type

2-D array

Notes

It is recommended to plot the resulting array using a rapidly changing colorscale such as plt.get_cmap('gist_ncar').
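
The labelling scheme described under Returns can be sketched as follows. This is an illustration only: it assumes peaks are dicts with 'x' and 'y' keys, that cluster labels start at 0, and that only the listed pixels are filled in (no symmetrization). The name `label_clusters` is hypothetical, and the real function may differ on each of these points.

```python
import numpy as np


def label_clusters(clusters, size):
    """Build a size x size integer array with one label per cluster."""
    n = len(clusters)
    # Assumption: labels are 0..n-1, n is left unused as the gap value,
    # and n + 1 is the default value for pixels outside every cluster.
    array = np.full((size, size), n + 1, dtype=int)
    for label, cluster in enumerate(clusters):
        for p in cluster:
            array[p['x'], p['y']] = label
    return array


clusters = [
    [{'x': 0, 'y': 1, 'value': 1.0}],
    [{'x': 2, 'y': 2, 'value': 1.0}, {'x': 2, 'y': 3, 'value': 1.0}],
]
labels = label_clusters(clusters, 4)
```

The unused gap value keeps the background color visually separated from the last cluster's color. Assuming matplotlib.pyplot is imported as plt, such an array could then be drawn with plt.imshow(labels, cmap=plt.get_cmap('gist_ncar')).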

lib5c.algorithms.clustering.util.flatten_clusters(clusters)

Flattens a list of clusters to a flat list of peaks.

Parameters: clusters (list of list of peaks) – The clusters to flatten.
Returns: The flattened peaks.
Return type: list of peaks

lib5c.algorithms.clustering.util.get_cluster(x, y, clusters)

Identifies which cluster, if any, a specified point belongs to.

Parameters

x (int) – x-coordinate of the point to consider.
y (int) – y-coordinate of the point to consider.
clusters (list of clusters) – List of clusters to search.

Returns

The index of the cluster which contains the point (x,y). If no cluster in the list contains the point, the value is -1.

Return type

int

lib5c.algorithms.clustering.util.get_vector(peak)

Gets an array representing a peak’s location.

Parameters: peak (peak) – The peak whose location is requested.
Returns: The peak’s location as an (x, y) ordered pair.
Return type: 1D numpy array of length 2

lib5c.algorithms.clustering.util.ident(peak1, peak2)

Checks whether two peaks are the same peak.

Parameters

peak1 (peak) – The first peak to compare.
peak2 (peak) – The second peak to compare.

Returns

True if the peaks are the same peak, False otherwise.

Return type

bool

lib5c.algorithms.clustering.util.identify_nearby_clusters(cluster, clusters)

Figures out which other clusters from a list of clusters are adjacent to a query cluster.

Parameters

cluster (cluster) – The query cluster to consider.
clusters (list of clusters) – The clusters to check for adjacency to the query cluster.

Returns

The indices of clusters within the list of clusters that were found to be adjacent to the query cluster.

Return type

list of int

Notes

The returned list may include duplicates. To remove them, use:

set(identify_nearby_clusters(cluster, clusters))

To pick one nearby cluster arbitrarily (here, the first one returned), use:

nearby_clusters = identify_nearby_clusters(cluster, clusters)
if nearby_clusters:
    nearby_clusters[0]

To see what clusters are near a single peak, use:

identify_nearby_clusters([peak], clusters)

If cluster is in clusters, the cluster will be reported as adjacent to itself. As an example of how to avoid this in cases where it is undesirable, use:

[i for i in identify_nearby_clusters(clusters[0], clusters) if i != 0]
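
The two clean-up steps from these notes (removing duplicates and dropping the query cluster's own index) can be combined in one small helper. The sketch below operates on a made-up list of indices standing in for a return value of identify_nearby_clusters; the helper name `unique_neighbors` is hypothetical.

```python
def unique_neighbors(indices, self_index=None):
    """Deduplicate adjacency indices and optionally drop the query's own index."""
    unique = set(indices)
    # discard() is a no-op when self_index is None or not present.
    unique.discard(self_index)
    return sorted(unique)


# Made-up example of a raw result for a query cluster stored at index 0.
raw = [2, 0, 2, 1, 0]
neighbors = unique_neighbors(raw, self_index=0)  # [1, 2]
```

Passing the query's index explicitly avoids relying on the query being the first cluster in the list.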

lib5c.algorithms.clustering.util.merge_clusters(clusters, merge_to_which)

Recursively merges clusters together from smallest to largest according to a specified merge function.

Parameters

clusters (list of clusters) – The clusters to be merged. All elements will be removed from this list when this function is called.
merge_to_which (function) – Function that takes in a list of clusters and returns the index of the cluster the first cluster in the list should be merged into. If the first cluster in the list should not be merged, this function should return -1.

Returns

The list of merged clusters.

Return type

list of clusters
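
To make the merge_to_which contract more concrete, here is a sketch of one possible merge function: it merges the first cluster into the cluster with the nearest centroid, provided that centroid lies within a distance threshold. The peak structure (dicts with 'x' and 'y' keys), the unweighted centroid, the threshold, and the names `centroid` and `make_merge_to_nearest` are all assumptions made for illustration. The returned index is taken to refer to a position in the list passed to the function.

```python
import math


def centroid(cluster):
    """Unweighted mean position of the peaks in a cluster."""
    n = len(cluster)
    return (sum(p['x'] for p in cluster) / n, sum(p['y'] for p in cluster) / n)


def make_merge_to_nearest(max_distance):
    """Build a merge function with a fixed distance threshold."""
    def merge_to_which(clusters):
        qx, qy = centroid(clusters[0])
        best_index, best_distance = -1, max_distance
        # Skip index 0: a cluster is never merged into itself.
        for i in range(1, len(clusters)):
            cx, cy = centroid(clusters[i])
            distance = math.hypot(cx - qx, cy - qy)
            if distance <= best_distance:
                best_index, best_distance = i, distance
        return best_index  # -1 means "do not merge"
    return merge_to_which


small = [{'x': 0, 'y': 0, 'value': 1.0}]
near = [{'x': 1, 'y': 0, 'value': 1.0}, {'x': 1, 'y': 1, 'value': 1.0}]
far = [{'x': 10, 'y': 10, 'value': 1.0}]

merge_to_which = make_merge_to_nearest(max_distance=2.0)
target = merge_to_which([small, near, far])
no_target = merge_to_which([small, far])
```

A function built this way could be passed as the merge_to_which argument. Since merge_clusters empties the list it is given, pass a copy if the original clusters are still needed afterwards.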

lib5c.algorithms.clustering.util.peaks_to_array_index(peaks, shape)

Convert a sparse list of peaks to a dense boolean array.

Parameters

peaks (list of peaks) – The peaks to convert.
shape (tuple of int) – The shape of the resulting array.

Returns

The dense boolean array.

Return type

np.ndarray
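
The sparse-to-dense conversion, and its inverse as performed by array_index_to_peaks, can be sketched with NumPy as follows. The sketch assumes peaks are dicts with 'x' and 'y' keys indexing the first and second array axes. The names `peaks_to_mask` and `mask_to_peaks` are hypothetical, and the placeholder 'value' of 0 is an assumption, since a boolean array cannot carry peak values.

```python
import numpy as np


def peaks_to_mask(peaks, shape):
    """Dense boolean array that is True at each peak's (x, y) position."""
    mask = np.zeros(shape, dtype=bool)
    for p in peaks:
        mask[p['x'], p['y']] = True
    return mask


def mask_to_peaks(mask):
    """Sparse list of peaks, one per True entry, in row-major order."""
    # Assumption: values are not recoverable from a mask, so use 0.
    return [{'x': int(i), 'y': int(j), 'value': 0}
            for i, j in np.argwhere(mask)]


peaks = [{'x': 2, 'y': 0, 'value': 0}, {'x': 0, 'y': 1, 'value': 0}]
mask = peaks_to_mask(peaks, (3, 3))
recovered = mask_to_peaks(mask)
```

The round trip preserves the set of positions but not the original ordering, because np.argwhere returns indices in row-major order.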

lib5c.algorithms.clustering.util.reshape_cluster_array_to_dict(cluster_array, ignored_values=None)

Reshapes an array of cluster IDs into a nested dict structure.

Parameters

cluster_array (np.ndarray) – Array whose entries are cluster IDs. Entries found in ignored_values are skipped.
ignored_values (set, optional) – Set of values in cluster_array that should not be treated as cluster IDs. Defaults to {'', 'n.s.', 'NA', 'NaN', np.nan}.

Returns

The outer dict's keys are cluster IDs and its values are lists of the points belonging to each cluster. Each point is a dict with the following structure:

{
    'x': int,
    'y': int,
    'value': 0
}

Return type

Dict[Any, List[Dict[str, Any]]]

Notes

To rectangularize the returned data structure against a full list of cluster IDs, so that every known ID has an entry even when its cluster is empty, use something like:

cluster_dict = reshape_cluster_array_to_dict(cluster_array)
for cluster_id in all_cluster_ids:
    if cluster_id not in cluster_dict:
        cluster_dict[cluster_id] = []

Examples

>>> import numpy as np
>>> cluster_array = np.array([['', 'cow'], ['cow', 'grass']])
>>> reshape_cluster_array_to_dict(cluster_array) == \
...     {'grass': [{'x': 1, 'y': 1, 'value': 0}],
...      'cow': [{'x': 0, 'y': 1, 'value': 0},
...              {'x': 1, 'y': 0, 'value': 0}]}
True
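
For readers who want the reshaping logic spelled out, the following minimal sketch reproduces the example above. It is not the library's implementation: the name `cluster_array_to_dict` is hypothetical, and the mapping of 'x' to the row index and 'y' to the column index is inferred from the example output.

```python
import numpy as np

# Assumed default, mirroring the documented one; NaN is handled separately
# below because NaN never compares equal to itself.
DEFAULT_IGNORED = {'', 'n.s.', 'NA', 'NaN'}


def cluster_array_to_dict(cluster_array, ignored_values=None):
    """Group array positions by the cluster ID stored at each position."""
    if ignored_values is None:
        ignored_values = DEFAULT_IGNORED
    result = {}
    for (i, j), entry in np.ndenumerate(cluster_array):
        # Convert NumPy scalars (e.g. np.str_) to plain Python objects.
        key = entry.item() if hasattr(entry, 'item') else entry
        if isinstance(key, float) and np.isnan(key):
            continue
        if key in ignored_values:
            continue
        result.setdefault(key, []).append({'x': i, 'y': j, 'value': 0})
    return result


cluster_array = np.array([['', 'cow'], ['cow', 'grass']])
cluster_dict = cluster_array_to_dict(cluster_array)
only_grass = cluster_array_to_dict(cluster_array, ignored_values={'', 'cow'})
```

Passing a custom ignored_values set, as in the last line, replaces the default set instead of extending it in this sketch.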