lib5c.util.lowess module
Module for performing lowess fitting. Consists mostly of a convenience wrapper around statsmodels.nonparametric.smoothers_lowess.lowess().
-
lib5c.util.lowess.constant_fit(x, y, logx=False, logy=False, agg='median')
Same signature as lowess_fit() and group_fit(), but instead of fitting y against x, simply applies an aggregating function to y.
- Parameters
x (Any) – Ignored, present only for signature parity with other fitters.
y (np.ndarray) – The y values to fit.
logx (Any) – Ignored, present only for signature parity with other fitters.
logy (bool) – Pass True to perform the fit on the scale of log(y).
agg ({'median', 'mean', 'lowess'}) – The function to use to aggregate y-values.
- Returns
The returned function takes in x values, ignores them completely, and returns the constant estimated y value on the original y scale (regardless of what is passed for logy).
- Return type
function
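The behavior described above can be illustrated with a minimal sketch. It is not lib5c's implementation: constant_fit_sketch is a hypothetical name, the x, logx, and 'lowess' options are omitted for brevity, and the sketch assumes that with logy=True the aggregate is taken over log(y) and then exponentiated back.

```python
import math
import statistics


def constant_fit_sketch(y, logy=False, agg="median"):
    """Hypothetical stand-in for a constant fitter; not lib5c's code."""
    aggregators = {"median": statistics.median, "mean": statistics.fmean}
    values = [math.log(v) for v in y] if logy else list(y)
    constant = aggregators[agg](values)
    if logy:
        constant = math.exp(constant)  # back to the original y scale

    def fit(x):
        # x is ignored apart from its length.
        return [constant for _ in x]

    return fit


# The mean of log(y) for 1, 10, 100 is log(10), so the constant is about 10.
geometric = constant_fit_sketch([1.0, 10.0, 100.0], logy=True, agg="mean")
print(geometric([5, 50, 500]))
```

Aggregating on the log scale with agg="mean" amounts to a geometric mean, which is why the result differs from the arithmetic mean of the same values.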
-
lib5c.util.lowess.group_fit(x, y, logx=False, logy=False, agg='median', left_boundary=None, right_boundary=None, n_windows=100, window_width=0.2)
Simpler alternative to lowess fitting using a sliding-window aggregate (the median by default).
- Parameters
x, y – The x and y values to fit, respectively.
logx, logy – Pass True to perform the fit on the scale of log(x) and/or log(y), respectively.
agg ({'median', 'mean', 'lowess'}) – The function to use to aggregate within groups.
left_boundary, right_boundary – Allows specifying boundaries for the fit, in the original x space. If a float is passed, the returned fit will return the farthest left or farthest right lowess-estimated y_hat (from the original fitting set) for all points which are left or right of the specified left or right boundary point, respectively. Pass None to use linear extrapolation for these points instead.
n_windows (int) – The number of windows to use (spaced uniformly across the range of x).
window_width (float) – The width of each window, defined as a fraction of its x-value.
- Returns
The returned function takes in x values on the original x scale and returns estimated y values on the original y scale (regardless of what is passed for logx and logy). It still returns sane estimates for y even at points not in the original fitting set by performing linear interpolation in the space the fit was performed in.
- Return type
function
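The windowing step can be sketched as follows. This is an illustration only, not lib5c's implementation: sliding_window_points is a hypothetical name, and the sketch assumes that a window centered at c spans c * (1 ± window_width / 2), which is one possible reading of "a fraction of its x-value". Log scaling, interpolation, and boundary handling are left out.

```python
import statistics


def sliding_window_points(x, y, n_windows=100, window_width=0.2,
                          agg=statistics.median):
    """Return (centers, aggregated y) for windows spaced uniformly over x.

    Hypothetical sketch; requires n_windows >= 2.
    """
    lo, hi = min(x), max(x)
    step = (hi - lo) / (n_windows - 1)
    centers, values = [], []
    for i in range(n_windows):
        center = lo + i * step
        half = abs(center) * window_width / 2  # window grows with its x-value
        members = [yi for xi, yi in zip(x, y)
                   if center - half <= xi <= center + half]
        if members:  # skip windows that contain no points
            centers.append(center)
            values.append(agg(members))
    return centers, values


xs = [float(i) for i in range(1, 101)]
ys = [2.0 * v for v in xs]
centers, values = sliding_window_points(xs, ys, n_windows=10)
print(list(zip(centers, values))[:3])
```

The (centers, values) pairs play the role that the lowess-estimated points play in lowess_fit(): a fitted curve can then be obtained by interpolating between them.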
-
lib5c.util.lowess.lowess_agg(y, it=3)
Performs an aggregation operation equivalent to lowess. Should behave like an outlier-resistant mean.
- Parameters
y (np.ndarray) – The values to aggregate.
it (int) – The number of residual-based reweightings to perform.
- Returns
The lowess-implemented outlier-resistant mean.
- Return type
float
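To give a feel for what "residual-based reweighting" does to a mean, here is a hedged sketch. It is not lib5c's implementation: robust_mean_sketch is a hypothetical name, and the bisquare weights scaled by six times the median absolute residual are borrowed from the robustness iterations of Cleveland's lowess as an assumption about the intended behavior.

```python
import statistics


def robust_mean_sketch(y, it=3):
    """Bisquare-reweighted mean; a hypothetical stand-in, not lib5c's code."""
    estimate = statistics.fmean(y)  # it=0 reduces to the plain mean
    for _ in range(it):
        residuals = [v - estimate for v in y]
        scale = statistics.median(abs(r) for r in residuals)
        if scale == 0:
            break  # at least half of the residuals are already zero
        cutoff = 6 * scale
        # Bisquare weights: near 1 for small residuals, 0 beyond the cutoff.
        weights = [(1 - (r / cutoff) ** 2) ** 2 if abs(r) < cutoff else 0.0
                   for r in residuals]
        estimate = sum(w * v for w, v in zip(weights, y)) / sum(weights)
    return estimate


data = [1.0, 1.1, 0.9, 1.0, 100.0]
print(statistics.fmean(data), robust_mean_sketch(data))
```

With the single outlier at 100.0, the plain mean is pulled far from the bulk of the data, while the reweighted estimate settles close to 1.0 after the outlier's weight drops to zero.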
-
lib5c.util.lowess.lowess_fit(x, y, logx=False, logy=False, left_boundary=None, right_boundary=None, frac=0.3, delta=0.01)
Opinionated convenience wrapper for lowess smoothing.
- Parameters
x, y – The x and y values to fit, respectively.
logx, logy – Pass True to perform the fit on the scale of log(x) and/or log(y), respectively.
left_boundary, right_boundary – Allows specifying boundaries for the fit, in the original x space. If a float is passed, the returned fit will return the farthest left or farthest right lowess-estimated y_hat (from the original fitting set) for all points which are left or right of the specified left or right boundary point, respectively. Pass None to use linear extrapolation for these points instead.
frac (float) – The lowess smoothing fraction to use.
delta (float) – Distance (on the scale of x or log(x)) within which to use linear interpolation when constructing the initial fit, expressed as a fraction of the range of x or log(x).
- Returns
The returned function takes in x values on the original x scale and returns estimated y values on the original y scale (regardless of what is passed for logx and logy). It still returns sane estimates for y even at points not in the original fitting set by performing linear interpolation in the space the fit was performed in.
- Return type
function
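The relationship between the fit space and the original scales can be sketched as below. This is not lib5c's implementation: make_fit_function is a hypothetical helper that assumes the fitted points are already available in the space the fit was performed in, and it only illustrates how a query is moved into that space, interpolated linearly, and moved back.

```python
import bisect
import math


def make_fit_function(xs_fit, ys_fit, logx=False, logy=False):
    """Wrap fitted points as a callable (hypothetical helper, not lib5c's code).

    xs_fit and ys_fit are given in the space the fit was performed in;
    xs_fit must be strictly increasing and hold at least two values.
    """

    def fit(x):
        out = []
        for v in x:
            t = math.log(v) if logx else v  # move the query into fit space
            # Index of the right end of the segment used for t, clamped so
            # that queries outside the fitted range reuse an end segment.
            i = min(max(bisect.bisect_right(xs_fit, t), 1), len(xs_fit) - 1)
            x0, x1 = xs_fit[i - 1], xs_fit[i]
            y0, y1 = ys_fit[i - 1], ys_fit[i]
            y_t = y0 + (y1 - y0) * (t - x0) / (x1 - x0)
            out.append(math.exp(y_t) if logy else y_t)  # back to the y scale
        return out

    return fit


# Points lying on y = x**2, expressed in log-log space.
xs_fit = [math.log(v) for v in (1.0, 10.0, 100.0)]
ys_fit = [2.0 * t for t in xs_fit]
power_fit = make_fit_function(xs_fit, ys_fit, logx=True, logy=True)
print(power_fit([5.0, 50.0]))  # approximately [25.0, 2500.0]
```

Because the interpolation happens in log-log space, a power law is reproduced exactly between the fitted points, which would not be the case if the interpolation were done on the original scales.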
Notes
No filtering of input values is performed; clients are expected to handle this if desired. NaN values should not break the function, but x points with zero values passed when logx is True are expected to break the function.
The default value of the delta parameter is set to be non-zero, matching the behavior of lowess smoothing in R and improving performance.
Linear interpolation between x-values in the original fitting set is used to provide a familiar functional interface to the fitted function.
Boundary conditions on the fitted function are exposed via left_boundary and right_boundary, mostly as a convenience for points where x == 0 when fitting was performed on the scale of log(x).
When left_boundary or right_boundary is None (this is the default), the fitted function will be linearly extrapolated for points beyond the lowest and highest x-values in x.
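The boundary behavior can be illustrated with a small wrapper. This is a sketch under stated assumptions, not lib5c's implementation: with_boundaries and toy_fit are hypothetical names, and the wrapper assumes that points beyond a boundary never reach the underlying fit, so that x == 0 is safe even when the fit works on log(x).

```python
import math


def with_boundaries(fit, x_fit, left_boundary=None, right_boundary=None):
    """Clamp `fit` beyond the given boundaries (hypothetical helper)."""
    # Edge estimates taken at the extremes of the original fitting set.
    y_left = fit([min(x_fit)])[0]
    y_right = fit([max(x_fit)])[0]

    def bounded(x):
        out = []
        for v in x:
            if left_boundary is not None and v < left_boundary:
                out.append(y_left)
            elif right_boundary is not None and v > right_boundary:
                out.append(y_right)
            else:
                out.append(fit([v])[0])  # no boundary applies: defer to fit
        return out

    return bounded


def toy_fit(x):
    # Stand-in for a fit performed in log-log space; raises on x == 0.
    return [math.exp(2.0 * math.log(v)) for v in x]


bounded = with_boundaries(toy_fit, x_fit=[1.0, 10.0, 100.0], left_boundary=1.0)
print(bounded([0.0, 1.0, 10.0]))
```

Here the query at x == 0 receives the estimate from the leftmost fitted point instead of triggering a log(0) error, while points to the right are passed through unchanged because right_boundary is None.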