lib5c.algorithms.spline_normalization module
Module for fitting b-splines to 5C counts data as a method of bias correction.
-
class lib5c.algorithms.spline_normalization.DiscreteBivariateEmpiricalSurface(xs, ys, zs)
Bases: object
-
lib5c.algorithms.spline_normalization.fit_spline(counts_list, primermap, bias_factor, knots=10, asymmetric=False)
Fits a 2-D cubic B-spline surface to the counts data as a function of the specified upstream and downstream bias factors.
- Parameters
counts_list (List[Dict[str, np.ndarray]]) – The counts data to fit the splines with.
primermap (Dict[str, List[Dict[str, Any]]]) – The primermap describing the loci. The bias_factor must be a key of the inner dict.
bias_factor (str) – The bias factor to fit the model with.
knots (Optional[int]) – The number of knots to use for the spline. If the bias factor is discrete, pass 0 to use an empirical discrete surface instead of a spline.
log (Optional[bool]) – Pass True to fit the spline to logged data.
asymmetric (Optional[bool]) – Pass True to iterate over only the upper triangular entries of the counts matrices. The default is False, which iterates over the whole counts matrices.
- Returns
The first element of the tuple is the spline surface fit to the data. The second element contains the values of the spline surface evaluated at each point in the original counts dict. The third element contains the bias-corrected counts dicts.
- Return type
Tuple[LSQBivariateSpline, Dict[str, np.ndarray], List[Dict[str, np.ndarray]]]
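The kind of surface fit described above can be sketched directly with scipy.interpolate.LSQBivariateSpline, the type named in the return value. This is a minimal illustration on synthetic data, not lib5c's implementation: the region name, the 'length' values, the flatten helper, the quantile-based knot placement, and the subtraction used as a correction are all assumptions made for the example. Rows are treated as the upstream fragment and columns as the downstream fragment, which is also an assumption.

```python
import numpy as np
from scipy.interpolate import LSQBivariateSpline

rng = np.random.default_rng(0)
n = 40

# Hypothetical inputs shaped like the documented types: a primermap with a
# 'length' key per fragment and a counts dict holding one square matrix.
primermap = {"region_a": [{"length": float(v)} for v in rng.uniform(100, 1000, n)]}
bias = np.array([frag["length"] for frag in primermap["region_a"]])

# Synthetic, symmetric log-scale counts that depend smoothly on both bias values.
up, down = np.meshgrid(bias, bias, indexing="ij")
log_matrix = 1e-3 * (up + down) + rng.normal(scale=0.05, size=(n, n))
counts = {"region_a": (log_matrix + log_matrix.T) / 2}


def flatten(matrix, bias, asymmetric=False):
    """Return (upstream bias, downstream bias, value) triples as flat arrays.

    asymmetric=True keeps only the upper-triangular entries; the default
    walks the whole matrix, mirroring the documented flag.
    """
    if asymmetric:
        i, j = np.triu_indices(len(bias))
    else:
        i, j = np.indices(matrix.shape).reshape(2, -1)
    return bias[i], bias[j], matrix[i, j]


x, y, z = flatten(counts["region_a"], bias)

# Assumption: interior knots are placed at evenly spaced quantiles of the bias.
num_knots = 4
interior = np.quantile(bias, np.linspace(0, 1, num_knots + 2)[1:-1])

# kx=3, ky=3 gives a cubic spline in both directions.
spline = LSQBivariateSpline(x, y, z, interior, interior, kx=3, ky=3)

# Evaluate the surface at every matrix entry.
fitted = spline.ev(x, y).reshape(n, n)

# Assumption: on the log scale, one plausible correction subtracts the
# fitted surface and restores the overall mean.
corrected = counts["region_a"] - fitted + counts["region_a"].mean()
```

The three objects spline, fitted, and corrected loosely correspond to the three elements of the documented return tuple, for a single region and a single bias factor.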
-
lib5c.algorithms.spline_normalization.iterative_spline_normalization(counts_list, exp_list, primermap, bias_list, max_iter=100, eps=0.0001, knots=10, log=True, asymmetric=False)
Convenience function for iteratively applying a set of spline normalization steps to a set of counts dicts.
- Parameters
counts_list (List[Dict[str, np.ndarray]]) – A list of observed counts dicts to normalize.
exp_list (List[Dict[str, np.ndarray]]) – A list of expected counts dicts corresponding to the counts dicts in counts_list.
primermap (Dict[str, List[Dict[str, Any]]]) – Primermap or pixelmap describing the loci in this region.
bias_list (List[str]) – A list of bias factors to remove from the counts. These strings must match metadata keys in primermap. That is to say, if bias_list is ['length'] then we expect primermap[region][i]['length'] to be a number representing the length of the i-th fragment in the region specified by region. If multiple bias factors are specified, the algorithm will iteratively remove all of them from the data.
max_iter (int) – The maximum number of iterations when iterating between bias factors.
eps (float) – Convergence is declared when the relative change in all models drops below this value.
knots (Union[int, List[int]]) – Specifies the number of knots to put into the splines. Pass a single int to use the same number of knots in each model. Pass a list of ints of length equal to the length of bias_list to use knots[i] knots for the bias factor named bias_list[i]. If a bias factor is discrete, pass 0 for its knot number to use an empirical discrete surface instead of a spline.
log (bool) – Pass True to fit the splines to log-scale data, reducing the effects of outliers.
asymmetric (bool) – Pass True to construct models using only the upper-triangular elements of the counts matrices, which can lead to asymmetric models. By default, the algorithm iterates over all elements of the counts matrices, enforcing symmetry in the bias models but incurring some redundancy in the actual counts information.
- Returns
The first element of the tuple is a dict mapping the bias factors specified in bias_list to BivariateSpline instances. The second element in the tuple is a dict mapping the bias factors specified in bias_list to counts dicts containing the evaluations of the spline fit to that bias factor at each point in the list of input counts dicts. The third element of the tuple is the normalized list of counts.
- Return type
Tuple[Dict[str, scipy.interpolate.BivariateSpline], List[Dict[str, np.ndarray]], List[Dict[str, np.ndarray]]]
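The idea of cycling between several bias factors until the models stop changing can be illustrated with a small backfitting loop. This sketch is not lib5c's algorithm: it uses only discrete factors, with a table of per-level means standing in for the empirical discrete surface selected by a knot number of 0. The helper names discrete_surface and backfit, the factor names, the relative-change measure, and the synthetic data are all hypothetical.

```python
import numpy as np


def discrete_surface(levels, values):
    """Average values[i, j] over each (levels[i], levels[j]) pair and
    evaluate that table back at every matrix entry."""
    _, idx = np.unique(levels, return_inverse=True)
    k = idx.max() + 1
    sums = np.zeros((k, k))
    hits = np.zeros((k, k))
    np.add.at(sums, (idx[:, None], idx[None, :]), values)
    np.add.at(hits, (idx[:, None], idx[None, :]), 1)
    table = sums / hits
    return table[idx[:, None], idx[None, :]]


def backfit(log_counts, factors, max_iter=100, eps=1e-4):
    """Refit each factor's surface on the data minus the other factors'
    current fits, until the largest relative change falls below eps."""
    fits = {name: np.zeros_like(log_counts) for name in factors}
    for iteration in range(1, max_iter + 1):
        max_change = 0.0
        for name, levels in factors.items():
            others = sum(fits[o] for o in factors if o != name)
            new_fit = discrete_surface(levels, log_counts - others)
            denom = np.linalg.norm(new_fit) or 1.0
            change = np.linalg.norm(new_fit - fits[name]) / denom
            max_change = max(max_change, change)
            fits[name] = new_fit
        if max_change < eps:
            break
    # Remove every fitted surface and restore the overall mean.
    corrected = log_counts - sum(fits.values()) + log_counts.mean()
    return fits, corrected, iteration


# Synthetic example: two hypothetical discrete bias factors per fragment.
rng = np.random.default_rng(1)
n = 30
factors = {
    "gc_bin": rng.integers(0, 3, size=n),
    "length_bin": rng.integers(0, 4, size=n),
}
gc_effect = np.array([0.0, 1.0, 2.0])[factors["gc_bin"]]
len_effect = np.array([0.0, 0.5, 1.0, 1.5])[factors["length_bin"]]
per_fragment = gc_effect + len_effect
log_counts = (per_fragment[:, None] + per_fragment[None, :]
              + rng.normal(scale=0.05, size=(n, n)))

fits, corrected, n_iter = backfit(log_counts, factors)
```

Here fits plays the role of the per-factor evaluations and corrected the role of the normalized counts, for one matrix rather than a list of counts dicts.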