lib5c.util.distributions module

Module containing utility functions for parametrizing statistical distributions.

lib5c.util.distributions.call_pvalues(obs, exp, var, dist_gen, log=False)

Call right-tail p-values for obs against a theoretical distribution whose family is specified by dist_gen and whose first two moments (mean and variance) are specified by exp and var, respectively.

Parameters
  • obs (float) – The observed value to call a p-value for.

  • exp, var (float) – The first two moments (mean and variance) of the null distribution.

  • dist_gen (scipy.stats.rv_generic or str) – The null distribution to parameterize. If a string is passed, it is replaced with getattr(scipy.stats, dist_gen).

  • log (bool) – Pass True to attempt to convert exp and var to log-scale.

Returns

The right-tail p-value.

Return type

float

Notes

This function is array-safe.
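
As a rough illustration of what a right-tail p-value against a moment-specified null looks like, the sketch below handles only the normal family, where the mean and variance map directly onto scipy's loc and scale. The helper name right_tail_pvalue_norm is hypothetical and is not part of lib5c; how call_pvalues handles other families internally is not shown here.

```python
import numpy as np
from scipy import stats


def right_tail_pvalue_norm(obs, exp, var):
    """Hypothetical helper: P(X >= obs) for X ~ Normal(mean=exp, variance=var)."""
    obs = np.asarray(obs, dtype=float)
    exp = np.asarray(exp, dtype=float)
    var = np.asarray(var, dtype=float)
    # For scipy.stats.norm, loc is the mean and scale is the standard deviation.
    # sf is the survival function, i.e. 1 - cdf, which is the right-tail area.
    return stats.norm.sf(obs, loc=exp, scale=np.sqrt(var))


# A string family name can be resolved the way the dist_gen description states.
dist_gen = getattr(stats, 'norm')

# Inputs broadcast, so an array of observations yields an array of p-values.
pvalues = right_tail_pvalue_norm([4.0, 6.0, 8.0], exp=4.0, var=4.0)
```

An observation equal to the expected value gives a p-value of 0.5 under this symmetric null, and larger observations give smaller p-values.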

lib5c.util.distributions.convert_parameters(mu, sigma_2, dist_gen, log=False)

Obtain correct scipy.stats parameterizations for selected one- and two-parameter distributions given a desired mean and variance.

Parameters
  • mu (float) – The mean of the desired distribution.

  • sigma_2 (float) – The variance of the desired distribution.

  • dist_gen (scipy.stats.rv_generic) – The target distribution.

  • log (bool) – Pass True to attempt to convert mu and sigma_2 to log-scale.

Returns

The appropriate scipy.stats parameters.

Return type

tuple of float
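
The kind of moment matching described above can be sketched for a few scipy.stats families using their standard mean and variance formulas. The helper moment_matched is hypothetical and returns a frozen distribution rather than a parameter tuple; it is a minimal sketch under those assumptions, not the lib5c implementation, and it simply raises an error where the requested moments are not attainable.

```python
import numpy as np
from scipy import stats


def moment_matched(dist_gen, mu, sigma_2):
    """Hypothetical helper: freeze dist_gen with mean mu and variance sigma_2."""
    name = dist_gen.name
    if name == 'poisson':
        # One parameter: the variance always equals the mean, so sigma_2 is ignored.
        return dist_gen(mu)
    if name == 'nbinom':
        # scipy's nbinom(n, p) has mean n(1-p)/p and variance n(1-p)/p**2,
        # which can only represent overdispersed counts (sigma_2 > mu).
        if sigma_2 <= mu:
            raise ValueError('nbinom requires variance greater than mean')
        p = mu / sigma_2
        n = mu ** 2 / (sigma_2 - mu)
        return dist_gen(n, p)
    if name == 'norm':
        # loc is the mean, scale is the standard deviation.
        return dist_gen(loc=mu, scale=np.sqrt(sigma_2))
    if name == 'logistic':
        # The logistic variance is scale**2 * pi**2 / 3.
        return dist_gen(loc=mu, scale=np.sqrt(3.0 * sigma_2) / np.pi)
    raise NotImplementedError('no moment matching sketched for %s' % name)


# Check that the frozen distributions report the requested moments.
for gen, mean, variance in [(stats.nbinom, 3.0, 4.0),
                            (stats.norm, 4.0, 3.0),
                            (stats.logistic, 4.0, 3.0)]:
    m, v = moment_matched(gen, mean, variance).stats(moments='mv')
    print('%s: mean %.2f, variance %.2f' % (gen.name, m, v))
```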

lib5c.util.distributions.freeze_distribution(dist_gen, mu, sigma_2, log=False)

Create a frozen distribution of a given type, given a mean and variance.

Parameters
  • dist_gen (scipy.stats.rv_generic) – The distribution to use.

  • mu (float) – The desired mean.

  • sigma_2 (float) – The desired variance.

  • log (bool) – Pass True to attempt to convert mu and sigma_2 to log-scale.

Returns

A frozen distribution of the specified type with specified mean and variance.

Return type

scipy.stats.rv_frozen

Notes

This function does not perform any fitting, because it assumes that the first two moments directly and uniquely identify the desired distribution.

Examples

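>>> import numpy as np
>>> from scipy import stats
>>> from lib5c.util.distributions import freeze_distribution
>>> frozen_dist = freeze_distribution(stats.poisson, 4.0, 4.0)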
>>> print('%s distribution with mean %.2f and variance %.2f'
...       % ((frozen_dist.dist.name,) + frozen_dist.stats(moments='mv')))
poisson distribution with mean 4.00 and variance 4.00
>>> frozen_dist = freeze_distribution(stats.nbinom, 4.0, 3.0)
>>> print('%s distribution with mean %.2f and variance %.2f'
...       % ((frozen_dist.dist.name,) + frozen_dist.stats(moments='mv')))
nbinom distribution with mean 4.00 and variance 4.00
>>> frozen_dist = freeze_distribution(stats.nbinom, 3.0, 4.0)
>>> print('%s distribution with mean %.2f and variance %.2f'
...       % ((frozen_dist.dist.name,) + frozen_dist.stats(moments='mv')))
nbinom distribution with mean 3.00 and variance 4.00
>>> frozen_dist = freeze_distribution(stats.norm, 4.0, 3.0)
>>> print('%s distribution with mean %.2f and variance %.2f'
...       % ((frozen_dist.dist.name,) + frozen_dist.stats(moments='mv')))
norm distribution with mean 4.00 and variance 3.00
>>> frozen_dist = freeze_distribution(stats.logistic, 4.0, 3.0)
>>> print('%s distribution with mean %.2f and variance %.2f'
...       % ((frozen_dist.dist.name,) + frozen_dist.stats(moments='mv')))
logistic distribution with mean 4.00 and variance 3.00
>>> mu = 2  # mean of a normal random variable X
>>> sigma_2 = 16  # variance of X
>>> scale = np.exp(mu)  # parameter conversion for scipy.stats.lognorm
>>> s = np.sqrt(sigma_2)  # ditto
>>> y = stats.lognorm(s=s, scale=scale)  # a lognormal RV: exp(X) = Y
>>> m, v = y.stats(moments='mv')  # mean and variance of Y
>>> m, v
(array(22026.4657948...), array(4.31123106e+15))
>>> frozen_dist = freeze_distribution(stats.logistic, m, v, log=True)
>>> print('%s distribution with mean %.2f and variance %.2f'
...       % ((frozen_dist.dist.name,) + frozen_dist.stats(moments='mv')))
logistic distribution with mean 2.00 and variance 16.00

lib5c.util.distributions.log_parameters(m, v)

Attempts to guess appropriate log-scale mean and variance parameters given non-log scale estimators of these quantities, under the assumptions of a lognormal model.

Based on https://en.wikipedia.org/wiki/Log-normal_distribution#Notation

Parameters
  • m (float) – The non-log scale mean.

  • v (float) – The non-log scale variance.

Returns

The first float is the log-scale mean, the second is the log-scale variance.

Return type

float, float

Notes

This function is array-safe.

Examples

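>>> import numpy as np
>>> from scipy import stats
>>> from lib5c.util.distributions import log_parameters
>>> mu = 2  # mean of a normal random variable X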
>>> sigma_2 = 16  # variance of X
>>> scale = np.exp(mu)  # parameter conversion for scipy.stats.lognorm
>>> s = np.sqrt(sigma_2)  # ditto
>>> y = stats.lognorm(s=s, scale=scale)  # a lognormal RV: exp(X) = Y
>>> m, v = y.stats(moments='mv')  # mean and variance of Y
>>> m, v
(array(22026.4657948...), array(4.31123106e+15))
>>> log_parameters(m, v)  # recover moments of X from moments of Y
(2.0, 16.0)
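
For reference, the standard lognormal identities behind this kind of conversion are sigma_2 = ln(1 + v / m^2) and mu = ln(m) - sigma_2 / 2. The sketch below applies them directly; the function name lognormal_log_moments is hypothetical, and this is an illustration of the formulas rather than the lib5c source.

```python
import numpy as np


def lognormal_log_moments(m, v):
    """Hypothetical helper: log-scale mean and variance from non-log moments.

    Assumes Y = exp(X) with X normal, E[Y] = m and Var[Y] = v.
    """
    m = np.asarray(m, dtype=float)
    v = np.asarray(v, dtype=float)
    # log1p(x) computes log(1 + x) accurately when v / m**2 is small.
    sigma_2 = np.log1p(v / m ** 2)
    mu = np.log(m) - sigma_2 / 2.0
    return mu, sigma_2


# Forward direction: moments of Y = exp(X) for X with mean 2 and variance 16.
m = np.exp(2.0 + 16.0 / 2.0)
v = (np.exp(16.0) - 1.0) * np.exp(2.0 * 2.0 + 16.0)

# Inverse direction: recover the moments of X, up to floating-point error.
mu, sigma_2 = lognormal_log_moments(m, v)
```

Because the computation uses only elementwise NumPy operations, it also accepts arrays of means and variances.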