lib5c.util.stratification module
Module containing utility functions for stratifying data.
lib5c.util.stratification.conservative_qcut(array, num_quantiles, add_zero=True, pad_right_endpoint=False)
Similar to pd.qcut(), but designed for stratifying zero-inflated quantities. All zeros are put into the first stratum, and the remaining data is then split into quantiles as with pd.qcut().
Parameters:
- array (np.ndarray) – The data to stratify.
- num_quantiles (int) – How many strata to generate.
- add_zero (bool) – Pass True to include the zeros in the final stratification in their own bin (increasing the number of bins returned by 1). Pass False to exclude zeros from the stratification.
- pad_right_endpoint (bool) – If the right endpoint of your last bin will be interpreted as open, pass True to extend that endpoint by a small amount so that the highest value is not excluded from the stratification. Defaults to False.
Returns: The binning scheme, as an array of n+1 bin endpoints describing n bins. This is the format expected by pd.cut() or plt.hist().
Return type: np.ndarray
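The idea described above can be illustrated with a minimal sketch. This is not the lib5c implementation: the function name zero_inflated_bins_sketch is hypothetical, the data is assumed to be non-negative, and np.quantile() stands in for the quantile computation that pd.qcut() performs. The size of the padding applied to the right endpoint is also an assumption.

```python
import numpy as np


def zero_inflated_bins_sketch(array, num_quantiles, add_zero=True,
                              pad_right_endpoint=False):
    """Hypothetical illustration only; NOT the lib5c implementation."""
    array = np.asarray(array, dtype=float)

    # Assumption: the data is non-negative, so "non-zero" means "> 0".
    nonzero = array[array > 0]

    # Quantile edges computed from the non-zero values only, so the
    # zeros cannot collapse several low quantiles onto the same edge.
    edges = np.quantile(nonzero, np.linspace(0, 1, num_quantiles + 1))

    if add_zero:
        # Prepend 0 so the zeros get a bin of their own: [0, min nonzero).
        edges = np.concatenate([[0.0], edges])

    if pad_right_endpoint:
        # Assumption: nudge the last edge up by the smallest representable
        # step so a right-open last bin still contains the maximum.
        edges[-1] = np.nextafter(edges[-1], np.inf)

    return edges


# Four zeros plus the values 1..8, split into 4 quantiles of the non-zeros.
data = np.array([0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8])
edges = zero_inflated_bins_sketch(data, 4)
counts, _ = np.histogram(data, bins=edges)
print(edges)   # 6 endpoints: the zero bin plus 4 quantile bins
print(counts)  # the zeros occupy the first bin on their own
```

Here the endpoints are passed to np.histogram(), which treats every bin except the last as right-open, so the four zeros land in the first bin and the non-zero values are spread evenly over the remaining four.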