lib5c.util.stratification module
Module containing utility functions for stratifying data.
lib5c.util.stratification.conservative_qcut(array, num_quantiles, add_zero=True, pad_right_endpoint=False)
Similar to pd.qcut(), but designed for stratifying zero-inflated quantities. All zeros are put into the first stratum, and the remaining (nonzero) data is then stratified with pd.qcut().
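The stratification described above can be sketched as follows. This is a minimal illustration, not the lib5c source. It assumes the input is non-negative (so 0 is the smallest possible value) and that bins are interpreted as left-closed. The name `conservative_qcut_sketch` and its extra `pad` parameter (default 1e-6) are hypothetical.

```python
import numpy as np
import pandas as pd


def conservative_qcut_sketch(array, num_quantiles, add_zero=True,
                             pad_right_endpoint=False, pad=1e-6):
    """Hypothetical sketch of zero-aware quantile binning (not lib5c's code).

    Assumes `array` is non-negative, so 0 is its smallest possible value.
    """
    array = np.asarray(array, dtype=float)
    # Drop the zeros so they cannot pile several quantile edges onto 0.
    nonzero = array[array != 0]
    # retbins=True makes pd.qcut() also return the quantile edges.
    _, edges = pd.qcut(nonzero, num_quantiles, retbins=True)
    edges = np.asarray(edges, dtype=float)
    if add_zero:
        # Prepend 0 so that, with left-closed bins, the zeros fill [0, min(nonzero)).
        edges = np.concatenate([[0.0], edges])
    if pad_right_endpoint:
        # Nudge the last edge up so the maximum survives a right-open last bin.
        edges[-1] += pad
    return edges


data = np.array([0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8])
edges = conservative_qcut_sketch(data, 2)
# edges -> [0.0, 1.0, 4.5, 8.0]: one zero bin plus 2 quantile bins of the nonzeros
```

The sketch removes the zeros before computing quantiles. On heavily zero-inflated data, calling pd.qcut() on the raw values would place several quantile edges at 0, and pd.qcut() raises an error on duplicate edges by default.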
- Parameters
array (np.ndarray) – The data to stratify.
num_quantiles (int) – How many strata to generate.
add_zero (bool) – Pass True to include the zeros in the final stratification in their own bin (increasing the number of bins returned by 1). Pass False to exclude zeros from the stratification.
pad_right_endpoint (bool) – Pass True if the right endpoint of your last bin will be interpreted as open. The right endpoint is then extended by a small amount so that the highest value is not excluded from the stratification.
- Returns
The binning scheme, as an array of n+1 bin endpoints, where n is the number of bins. This is the format expected by pd.cut() or plt.hist().
- Return type
np.ndarray
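The returned endpoints can be passed directly to pd.cut() or to np.histogram(), which plt.hist() uses to compute its bins. The sketch below uses a hand-written endpoint array of the documented form, not actual output of conservative_qcut(). It shows why pad_right_endpoint matters when bins are right-open, as with pd.cut(..., right=False).

```python
import numpy as np
import pandas as pd

# Hypothetical endpoints in the documented format: n + 1 edges for n bins.
# The first bin [0, 1) is meant to hold the zeros.
edges = np.array([0.0, 1.0, 4.5, 8.0])
data = np.array([0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8])

# right=False makes every bin left-closed and right-open: [a, b).
# The maximum (8) equals the last edge, so it falls outside every bin
# and its label comes back as NaN.
labels = pd.cut(data, edges, right=False, labels=False)

# Padding the last edge by a small amount keeps the maximum in the last bin.
padded = edges.copy()
padded[-1] += 1e-6
labels_padded = pd.cut(data, padded, right=False, labels=False)

# np.histogram() closes its last bin on the right, so the unpadded
# edges already count every value: [0, 1) -> 3, [1, 4.5) -> 4, [4.5, 8] -> 4.
counts, _ = np.histogram(data, bins=edges)
```

With the default right=True, pd.cut() instead builds (a, b] bins, so values equal to the first edge (here the zeros) are dropped unless include_lowest=True is also passed.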