lib5c.algorithms.spline_normalization module

Module for fitting B-splines to 5C counts data as a method of bias correction.

class lib5c.algorithms.spline_normalization.DiscreteBivariateEmpiricalSurface(xs, ys, zs)[source]

Bases: object

ev(x, y)[source]
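
The class above carries no description here, but the knots parameter of fit_spline refers to "an empirical discrete surface" used for discrete bias factors. As a rough illustration only, the sketch below assumes such a surface stores the mean of zs at each distinct (x, y) pair and looks it up in ev(). The class name EmpiricalSurfaceSketch and the averaging rule are assumptions, not lib5c's confirmed behavior.

```python
from collections import defaultdict


class EmpiricalSurfaceSketch:
    """Hypothetical stand-in: mean of zs at each distinct (x, y) pair."""

    def __init__(self, xs, ys, zs):
        totals = defaultdict(float)
        counts = defaultdict(int)
        for x, y, z in zip(xs, ys, zs):
            totals[(x, y)] += z
            counts[(x, y)] += 1
        # Assumed rule: the surface value is the per-pair average.
        self.means = {key: totals[key] / counts[key] for key in totals}

    def ev(self, x, y):
        """Evaluate the surface at one (x, y) pair of discrete levels."""
        return self.means[(x, y)]


surface = EmpiricalSurfaceSketch(xs=[0, 0, 1], ys=[1, 1, 0], zs=[2.0, 4.0, 5.0])
value = surface.ev(0, 1)  # mean of the two observations at (0, 1)
```

A lookup table of this kind needs no knots, which is consistent with passing 0 for a discrete factor.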

lib5c.algorithms.spline_normalization.fit_spline(counts_list, primermap, bias_factor, knots=10, asymmetric=False)[source]

Fits a 2-D cubic B-spline surface to the counts data as a function of the specified upstream and downstream bias factors.

Parameters
  • counts_list (List[Dict[str, np.ndarray]]) – The counts data to fit the splines with.

  • primermap (Dict[str, List[Dict[str, Any]]]) – The primermap describing the loci. The bias_factor must be a key of the inner dict.

  • bias_factor (str) – The bias factor to fit the model with.

  • knots (Optional[int]) – The number of knots to use for the spline. If the bias factor is discrete, pass 0 to use an empirical discrete surface instead of a spline.

  • log (Optional[bool]) – Pass True to fit the spline to logged data.

  • asymmetric (Optional[bool]) – Pass True to iterate over only the upper triangular entries of the counts matrices. The default is False, which iterates over the whole counts matrices.

Returns

The first element of the tuple is the spline surface fit to the data. The second element contains the values of the spline surface evaluated at each point in the original counts dict. The third element contains the bias-corrected counts dicts.

Return type

Tuple[LSQBivariateSpline, Dict[str, np.ndarray], List[Dict[str, np.ndarray]]]
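
To make the idea of fitting a surface over upstream and downstream bias factors concrete, here is a minimal sketch that calls scipy's LSQBivariateSpline directly instead of fit_spline. The toy primermap, the region name region_a, the synthetic counts, the tertile knot placement, the row/column reading of upstream/downstream, and the division used for correction are all assumptions for illustration; lib5c's own choices may differ.

```python
import numpy as np
from scipy.interpolate import LSQBivariateSpline

rng = np.random.default_rng(0)

# Hypothetical toy primermap: one region, 30 fragments, one bias factor.
n = 30
primermap = {"region_a": [{"length": float(v)} for v in rng.uniform(100, 1000, n)]}
factor = np.array([frag["length"] for frag in primermap["region_a"]])

# Assumed convention: upstream factor along rows, downstream along columns.
up, down = np.meshgrid(factor, factor, indexing="ij")

# Synthetic counts whose log depends smoothly on both factors.
log_counts = 2.0 + 0.002 * up + 0.002 * down
counts = np.exp(log_counts)

# Interior knots at the tertiles of the factor (an assumed placement).
interior = np.quantile(factor, [1 / 3, 2 / 3])

# Least-squares fit on the log scale; kx=ky=3 (cubic) is the scipy default.
spline = LSQBivariateSpline(
    up.ravel(), down.ravel(), np.log(counts).ravel(), interior, interior
)

# Evaluate the surface at every entry, then divide it out (assumed correction).
fitted = spline.ev(up.ravel(), down.ravel()).reshape(n, n)
corrected = counts / np.exp(fitted)
```

In this toy case the log counts are linear in both factors, so a cubic surface can reproduce them and the corrected matrix is close to 1 everywhere.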

lib5c.algorithms.spline_normalization.iterative_spline_normalization(counts_list, exp_list, primermap, bias_list, max_iter=100, eps=0.0001, knots=10, log=True, asymmetric=False)[source]

Convenience function for iteratively applying a set of spline normalization steps to a set of counts dicts.

Parameters
  • counts_list (List[Dict[str, np.ndarray]]) – A list of observed counts dicts to normalize.

  • exp_list (List[Dict[str, np.ndarray]]) – A list of expected counts dicts corresponding to the counts dicts in counts_list.

  • primermap (Dict[str, List[Dict[str, Any]]]) – Primermap or pixelmap describing the loci in this region.

  • bias_list (List[str]) – A list of bias factors to remove from the counts. These strings must match metadata keys in primermap. For example, if bias_list is ['length'], then primermap[region][i]['length'] is expected to be a number representing the length of the i-th fragment in the region specified by region. If multiple bias factors are specified, the algorithm will iteratively remove all of them from the data.

  • max_iter (int) – The maximum number of iterations when iterating between bias factors.

  • eps (float) – When the relative change in all models drops below this value, convergence is declared.

  • knots (Union[int, List[int]]) – Specifies the number of knots to put into the splines. Pass a single int to use the same number of knots in each model. Pass a list of ints of length equal to the length of bias_list to use knots[i] knots for the bias factor named bias_list[i]. If a bias factor is discrete, pass 0 for its knot number to use an empirical discrete surface instead of a spline.

  • log (bool) – Pass True to fit the splines to log-scale data, reducing the effects of outliers.

  • asymmetric (bool) – Pass True to construct models using only the upper-triangular elements of the counts matrices, which can lead to asymmetric models. By default, the algorithm iterates over all elements of the counts matrices, enforcing symmetry in the bias models but incurring some redundancy in the actual counts information.

Returns

The first element of the tuple is a dict mapping the bias factors specified in bias_list to BivariateSpline instances. The second element in the tuple is a dict mapping the bias factors specified in bias_list to counts dicts containing the evaluations of the spline fit to that bias factor at each point in the list of input counts dicts. The third element of the tuple is the normalized list of counts.

Return type

Tuple[Dict[str, scipy.interpolate.BivariateSpline], List[Dict[str, np.ndarray]], List[Dict[str, np.ndarray]]]
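
The interplay of bias_list, max_iter, and eps can be pictured as a backfitting loop: each factor's model is refit to whatever the other models leave unexplained, until no model changes by more than eps in relative terms. The sketch below is an assumption-laden illustration, not lib5c's code. It uses two made-up discrete factors (a and b), a per-level-pair mean in place of a spline, a norm-based relative change, and subtraction on the log scale as the correction.

```python
import numpy as np


def fit_discrete_surface(up, down, resid):
    """Per-(up, down) group mean of resid, evaluated at every entry."""
    fitted = np.zeros_like(resid)
    for u in np.unique(up):
        for d in np.unique(down):
            mask = (up == u) & (down == d)
            if mask.any():
                fitted[mask] = resid[mask].mean()
    return fitted


rng = np.random.default_rng(0)
n = 40
# Hypothetical discrete bias factors, one level per fragment.
levels = {"a": rng.integers(0, 3, n), "b": rng.integers(0, 2, n)}
grids = {k: np.meshgrid(v, v, indexing="ij") for k, v in levels.items()}

# Synthetic log-scale data: expected value + two bias surfaces + small noise.
idx = np.arange(n)
log_exp = -np.log1p(np.abs(idx[:, None] - idx[None, :]))
true = {"a": rng.normal(0, 0.5, (3, 3)), "b": rng.normal(0, 0.5, (2, 2))}
log_obs = log_exp + rng.normal(0, 0.01, (n, n))
for k in levels:
    log_obs = log_obs + true[k][grids[k][0], grids[k][1]]

models = {k: np.zeros((n, n)) for k in levels}
max_iter, eps = 100, 1e-4
for n_iter in range(1, max_iter + 1):
    max_change = 0.0
    for k in levels:
        # Refit this factor to what the other models leave unexplained.
        others = sum(models[j] for j in levels if j != k)
        new = fit_discrete_surface(*grids[k], log_obs - log_exp - others)
        denom = max(np.linalg.norm(models[k]), 1e-12)
        max_change = max(max_change, np.linalg.norm(new - models[k]) / denom)
        models[k] = new
    if max_change < eps:  # every model has stabilized
        break

# Remove all fitted bias surfaces from the observed log counts.
normalized = log_obs - sum(models.values())
```

Cycling through the factors matters because they can be correlated: removing one in isolation may absorb part of another's effect, and repeated passes let the models settle on a consistent split.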