lib5c.util.sorting module¶

Module providing utility functions related to sorting data.

lib5c.util.sorting.rankdata_plus(a, method='average')[source]¶

Assign ranks to data, dealing with ties appropriately.

Slight modification of scipy.stats.rankdata() that returns the sorter in addition to the ranks; this allows the sort information to be re-used without having to sort again.

Ranks begin at 1. The method argument controls how ranks are assigned to equal values. See 1 for further discussion of ranking methods.

Parameters

a (array_like) – The array of values to be ranked. The array is first flattened.
method (str, optional) –
The method used to assign ranks to tied elements. The options are ‘average’, ‘min’, ‘max’, ‘dense’ and ‘ordinal’.

’average’:
The average of the ranks that would have been assigned to all the tied values is assigned to each value.

’min’:
The minimum of the ranks that would have been assigned to all the tied values is assigned to each value. (This is also referred to as “competition” ranking.)

’max’:
The maximum of the ranks that would have been assigned to all the tied values is assigned to each value.

’dense’:
Like ‘min’, but the rank of the next highest element is assigned the rank immediately after those assigned to the tied elements.

’ordinal’:
All values are given a distinct rank, corresponding to the order that the values occur in a.

The default is ‘average’.

Returns

ranks, sorter – Ranks is an array of length equal to the size of a, containing rank scores. Sorter is the argsort result from the initial sorting.

Return type

ndarray, ndarray

References

1: “Ranking”, http://en.wikipedia.org/wiki/Ranking

Examples

>>> from lib5c.util.sorting import rankdata_plus
>>> np.array_equal(rankdata_plus([0, 2, 3, 2])[1],  # the sorter
...                np.array([0, 1, 3, 2]))
True
>>> np.array_equal(rankdata_plus([0, 2, 3, 2])[0],  # the ranks
...                np.array([1., 2.5, 4., 2.5]))
True
>>> np.array_equal(rankdata_plus([0, 2, 3, 2], 'min')[0],  # the ranks
...                np.array([1, 2, 4, 2]))
True
>>> np.array_equal(rankdata_plus([0, 2, 3, 2], 'max')[0],  # the ranks
...                np.array([1, 3, 4, 3]))
True
>>> np.array_equal(rankdata_plus([0, 2, 3, 2], 'dense')[0],  # the ranks
...                np.array([1, 2, 3, 2]))
True
>>> np.array_equal(rankdata_plus([0, 2, 3, 2], 'ordinal')[0],  # the ranks
...                np.array([1, 2, 4, 3]))
True