lib5c.algorithms.expected module

Module for computing expected models for 5C interaction data.

lib5c.algorithms.expected.empirical_binned(regional_counts, log_transform=True)[source]

Make a regional one-dimensional bin-level expected model by taking an average of the interaction values at each distance.

Parameters
  • regional_counts (np.ndarray) – The observed counts matrix for this region.

  • log_transform (bool) – Pass True to take the geometric mean instead of the arithmetic mean, which is equivalent to averaging log-transformed counts.

Returns

The one-dimensional expected model. The i-th element of the list corresponds to the expected value for interactions between loci separated by i bins.

Return type

List[float]
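The per-distance averaging described for empirical_binned can be sketched in plain Python. This is an illustration of the idea only, not lib5c's implementation: the function name empirical_binned_sketch is hypothetical, nested lists stand in for np.ndarray, and skipping non-finite and non-positive values is an assumption.

```python
import math


def empirical_binned_sketch(matrix, log_transform=True):
    """Average the values on each diagonal of a square matrix (illustrative only)."""
    n = len(matrix)
    expected = []
    for d in range(n):
        # Entries on the d-th diagonal are interactions between loci d bins apart.
        values = [matrix[i][i + d] for i in range(n - d)
                  if math.isfinite(matrix[i][i + d])]
        if log_transform:
            # Geometric mean: average the logs, then exponentiate.
            # Assumption: non-positive values are skipped, since their log is undefined.
            values = [v for v in values if v > 0]
            mean = (math.exp(sum(math.log(v) for v in values) / len(values))
                    if values else float("nan"))
        else:
            mean = sum(values) / len(values) if values else float("nan")
        expected.append(mean)
    return expected


counts = [
    [8.0, 4.0, 1.0],
    [4.0, 8.0, 1.0],
    [1.0, 1.0, 8.0],
]
arithmetic = empirical_binned_sketch(counts, log_transform=False)
geometric = empirical_binned_sketch(counts, log_transform=True)
```

At a separation of one bin the two observed values are 4 and 1, so the arithmetic mean is 2.5 while the geometric mean is 2.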

lib5c.algorithms.expected.force_monotonic(distance_expected)[source]

Force a one-dimensional distance expected to be monotonic.

Parameters

distance_expected (Union[List[float], Dict[int, float]]) – The one-dimensional expected model to force to monotonicity. If the model describes bin-level data, this should be a list of floats, where the i-th element of the list corresponds to the expected value for interactions between loci separated by i bins. If the model describes fragment-level data, this should be a dict mapping interaction distances in units of base pairs to the expected value at that distance.

Returns

The forced-monotonic version of the input one-dimensional expected model.

Return type

Union[List[float], Dict[int, float]]
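One way to force monotonicity on either representation is a running minimum over increasing distance. The sketch below assumes that "monotonic" means "never increasing with distance" and that clamping is done with a running minimum; both are assumptions for illustration, and force_monotonic_sketch is a hypothetical name, not the library function.

```python
def force_monotonic_sketch(distance_expected):
    """Clamp each value to the running minimum so the model never rises with distance."""
    if isinstance(distance_expected, dict):
        # Fragment-level model: walk the distances in increasing order.
        result = {}
        running_min = float("inf")
        for distance in sorted(distance_expected):
            running_min = min(running_min, distance_expected[distance])
            result[distance] = running_min
        return result
    # Bin-level model: the list index is already the distance in bins.
    result = []
    running_min = float("inf")
    for value in distance_expected:
        running_min = min(running_min, value)
        result.append(running_min)
    return result


bin_model = force_monotonic_sketch([10.0, 6.0, 7.0, 3.0, 4.0])
frag_model = force_monotonic_sketch({4000: 5.0, 1000: 9.0, 8000: 6.0})
```

The return type mirrors the input type, matching the Union[List[float], Dict[int, float]] contract documented above.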

lib5c.algorithms.expected.get_distance_expected(obs_matrix, regional_primermap=None, level='bin', powerlaw=False, regression=False, degree=1, lowess_smooth=False, lowess_frac=0.8, log_transform='auto', exclude_near_diagonal=False)[source]

Convenience function for computing a regional one-dimensional expected model from a matrix of observed counts, with properties that can be customized by kwargs.

Parameters
  • obs_matrix (np.ndarray) – The matrix of observed counts to model.

  • regional_primermap (Optional[List[Dict[str, Any]]]) – The primermap for this region. Required if obs_matrix is fragment-level.

  • level ({'bin', 'fragment'}) – The level of obs_matrix.

  • powerlaw (bool) – Whether or not to fit a discrete power law distribution to the data.

  • regression (bool) – Whether or not to use a polynomial regression model.

  • degree (int) – The degree of the regression model to use.

  • lowess_smooth (bool) – Whether or not to use lowess smoothing to compute the model.

  • lowess_frac (float) – The lowess smoothing fraction parameter.

  • log_transform ({'counts', 'both', 'none', 'auto'}) –

    What to transform into log space.
    • counts: log-transform only the counts but not the distances. This results in semi-log models, which don’t work on fragment-level data yet.

    • both: log-transform both the counts and the distances, resulting in log-log models.

    • none: don’t log anything.

    • auto: automatically pick a reasonable choice based on the other kwargs.

  • exclude_near_diagonal (bool) – If regression or lowess_smooth is True, set this kwarg to True to ignore the first third of the distance scales when fitting the model.

Returns

The one-dimensional expected model. For bin-level data, this is a list of floats, where the i-th element of the list corresponds to the expected value for interactions between loci separated by i bins. For fragment-level data, this is a dict mapping interaction distances in units of base pairs to the appropriate expected values.

Return type

Union[List[float], Dict[int, float]]
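The log_transform options of get_distance_expected differ only in which axis is moved into log space before a model is fitted. The following sketch makes that concrete. The helper name transform_for_fit, the pseudocount argument, and the use of log(distance + 1) to avoid log(0) are all assumptions made for illustration; how 'auto' is resolved is not described in this documentation, so the sketch does not attempt it.

```python
import math


def transform_for_fit(distances, counts, log_transform="both", pseudocount=1):
    """Return (x, y) points in the space implied by log_transform (illustrative only)."""
    if log_transform not in ("counts", "both", "none"):
        raise ValueError("this sketch does not resolve 'auto'")
    xs = list(distances)
    ys = list(counts)
    if log_transform in ("counts", "both"):
        # Semi-log and log-log models both log the counts.
        ys = [math.log(y + pseudocount) for y in ys]
    if log_transform == "both":
        # Log-log models additionally log the distances.
        # Assumption: shift by 1 so that a distance of 0 stays finite.
        xs = [math.log(x + 1) for x in xs]
    return xs, ys


semi_x, semi_y = transform_for_fit([0, 1, 2], [9.0, 4.0, 0.0], "counts")
loglog_x, loglog_y = transform_for_fit([0, 1, 2], [9.0, 4.0, 0.0], "both")
plain_x, plain_y = transform_for_fit([0, 1, 2], [9.0, 4.0, 0.0], "none")
```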

lib5c.algorithms.expected.get_global_distance_expected(counts, primermap=None, level='bin', powerlaw=False, regression=False, degree=1, lowess_smooth=False, lowess_frac=0.8, log_transform='auto', exclude_near_diagonal=False)[source]

Convenience function for computing a global one-dimensional expected model from a dict of observed counts, with properties that can be customized by kwargs.

Parameters
  • counts (Dict[str, np.ndarray]) – The dict of observed counts to model.

  • primermap (Optional[Dict[str, List[Dict[str, Any]]]]) – A primermap corresponding to counts.

  • level ({'bin', 'fragment'}) – The level of counts.

  • powerlaw (bool) – Whether or not to fit a discrete power law distribution to the data.

  • regression (bool) – Whether or not to use a polynomial regression model.

  • degree (int) – The degree of the regression model to use.

  • lowess_smooth (bool) – Whether or not to use lowess smoothing to compute the model.

  • lowess_frac (float) – The lowess smoothing fraction parameter.

  • log_transform ({'counts', 'both', 'none', 'auto'}) –

    What to transform into log space.
    • counts: log-transform only the counts but not the distances. This results in semi-log models, which don’t work on fragment-level data yet.

    • both: log-transform both the counts and the distances, resulting in log-log models.

    • none: don’t log anything.

    • auto: automatically pick a reasonable choice based on the other kwargs.

  • exclude_near_diagonal (bool) – If regression or lowess_smooth is True, set this kwarg to True to ignore the first third of the distance scales when fitting the model.

Returns

The one-dimensional expected model. For bin-level data, this is a list of floats, where the i-th element of the list corresponds to the expected value for interactions between loci separated by i bins. For fragment-level data, this is a dict mapping interaction distances in units of base pairs to the appropriate expected values.

Return type

Union[List[float], Dict[int, float]]

lib5c.algorithms.expected.global_empirical_binned(counts, log_transform=True)[source]

Make a global one-dimensional bin-level expected model by taking an average of the interaction values at each distance.

Parameters
  • counts (Dict[str, np.ndarray]) – The observed counts dict to fit the model to.

  • log_transform (bool) – Pass True to take the geometric mean instead of the arithmetic mean, which is equivalent to averaging log-transformed counts.

Returns

The one-dimensional expected model. The i-th element of the list corresponds to the expected value for interactions between loci separated by i bins. The length of this list will match the size of the largest region in the input counts dict.

Return type

List[float]
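The global variant pools values from every region before averaging, which is why the result is as long as the largest region. A minimal sketch of that pooling follows, assuming all counts are finite and strictly positive; global_empirical_binned_sketch is a hypothetical name and nested lists stand in for np.ndarray.

```python
import math


def global_empirical_binned_sketch(counts, log_transform=True):
    """Pool each diagonal across all regions, then average (illustrative only)."""
    max_size = max(len(matrix) for matrix in counts.values())
    expected = []
    for d in range(max_size):
        pooled = []
        for matrix in counts.values():
            n = len(matrix)
            # Regions with fewer than d + 1 bins contribute nothing at distance d.
            pooled.extend(matrix[i][i + d] for i in range(n - d))
        if log_transform:
            # Geometric mean across every region's values at this distance.
            expected.append(math.exp(sum(math.log(v) for v in pooled) / len(pooled)))
        else:
            expected.append(sum(pooled) / len(pooled))
    return expected


counts = {
    "region_a": [[4.0, 2.0],
                 [2.0, 4.0]],
    "region_b": [[4.0, 8.0, 1.0],
                 [8.0, 4.0, 2.0],
                 [1.0, 2.0, 4.0]],
}
arithmetic_model = global_empirical_binned_sketch(counts, log_transform=False)
geometric_model = global_empirical_binned_sketch(counts, log_transform=True)
```

Only region_b has a separation of two bins, so the last element of the model is determined by that region alone.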

lib5c.algorithms.expected.global_lowess_binned(counts, frac=0.8, exclude_near_diagonal=False)[source]

Make a global one-dimensional bin-level expected model by performing lowess regression in unlogged space, excluding the first third of the distance scales and only using the empirical arithmetic means there instead.

Parameters
  • counts (Dict[str, np.ndarray]) – The observed counts dict to fit the model to.

  • frac (float) – The lowess smoothing fraction parameter to use.

  • exclude_near_diagonal (bool) – Pass True to ignore the first third of the distance scales when fitting the model.

Returns

The one-dimensional expected model. The i-th element of the list corresponds to the expected value for interactions between loci separated by i bins. The length of this list will match the size of the largest region in the input counts dict.

Return type

List[float]

lib5c.algorithms.expected.global_lowess_binned_log_counts(counts, pseudocount=1, frac=0.8, exclude_near_diagonal=False)[source]

Make a global one-dimensional bin-level expected model by performing lowess regression in log-counts space, excluding the first third of the distance scales and only using the empirical geometric means there instead.

Parameters
  • counts (Dict[str, np.ndarray]) – The observed counts dict to fit the model to.

  • pseudocount (int) – The pseudocount to add to the counts before logging.

  • frac (float) – The lowess smoothing fraction parameter to use.

  • exclude_near_diagonal (bool) – Pass True to ignore the first third of the distance scales when fitting the model.

Returns

The one-dimensional expected model. The i-th element of the list corresponds to the expected value for interactions between loci separated by i bins. The length of this list will match the size of the largest region in the input counts dict.

Return type

List[float]

lib5c.algorithms.expected.global_lowess_log_log_binned(counts, pseudocount=1, frac=0.8, exclude_near_diagonal=False)[source]

Make a global one-dimensional bin-level expected model by performing lowess regression in log-log space.

Parameters
  • counts (Dict[str, np.ndarray]) – The observed counts dict to fit the model to.

  • pseudocount (int) – The pseudocount to add to the counts before logging.

  • frac (float) – The lowess smoothing fraction parameter to use.

  • exclude_near_diagonal (bool) – Pass True to ignore the first third of the distance scales when fitting the model.

Returns

The one-dimensional expected model. The i-th element of the list corresponds to the expected value for interactions between loci separated by i bins. The length of this list will match the size of the largest region in the input counts dict.

Return type

List[float]

lib5c.algorithms.expected.global_lowess_log_log_fragment(counts, distances, pseudocount=1, frac=0.8)[source]

Make a global one-dimensional fragment-level expected model by performing lowess regression in log-log space.

Parameters
  • counts (Dict[str, np.ndarray]) – The observed counts dict to fit the model to.

  • distances (Dict[str, np.ndarray]) – A dict of pairwise distance matrices describing the genomic distances between the elements of the matrices in counts. The keys and array dimensions should match the keys and array dimensions of counts.

  • pseudocount (int) – The pseudocount to add to the counts before logging.

  • frac (float) – The lowess smoothing fraction parameter to use.

Returns

A mapping from interaction distances in units of base pairs to the expected value at that distance.

Return type

Dict[int, float]

lib5c.algorithms.expected.global_poly_log_log_binned(counts, degree=1, pseudocount=1, exclude_near_diagonal=False)[source]

Make a global one-dimensional bin-level expected model by fitting a polynomial in log-log space.

Parameters
  • counts (Dict[str, np.ndarray]) – The observed counts dict to fit the model to.

  • degree (int) – The degree of the polynomial to fit.

  • pseudocount (int) – The pseudocount to add to the counts before logging.

  • exclude_near_diagonal (bool) – Pass True to ignore the first third of the distance scales when fitting the model.

Returns

The one-dimensional expected model. The i-th element of the list corresponds to the expected value for interactions between loci separated by i bins. The length of this list will match the size of the largest region in the input counts dict.

Return type

List[float]

lib5c.algorithms.expected.global_poly_log_log_fragment(counts, distances, degree=1, pseudocount=1)[source]

Make a global one-dimensional fragment-level expected model by fitting a polynomial in log-log space.

Parameters
  • counts (Dict[str, np.ndarray]) – The observed counts dict to fit the model to.

  • distances (Dict[str, np.ndarray]) – A dict of pairwise distance matrices describing the genomic distances between the elements of the matrices in counts. The keys and array dimensions should match the keys and array dimensions of counts.

  • degree (int) – The degree of the polynomial to fit.

  • pseudocount (int) – The pseudocount to add to the counts before logging.

Returns

A mapping from interaction distances in units of base pairs to the expected value at that distance.

Return type

Dict[int, float]

lib5c.algorithms.expected.global_powerlaw_binned(counts, exclude_near_diagonal=False)[source]

Make a global one-dimensional bin-level expected model by fitting a polynomial in log-log space.

Parameters
  • counts (Dict[str, np.ndarray]) – The observed counts dict to fit the model to.

  • exclude_near_diagonal (bool) – Pass True to ignore the first third of the distance scales when fitting the model.

Returns

The one-dimensional expected model. The i-th element of the list corresponds to the expected value for interactions between loci separated by i bins. The length of this list will match the size of the largest region in the input counts dict.

Return type

List[float]

lib5c.algorithms.expected.interpolate_expected(expected_matrix, regional_primermap, distance)[source]

Interpolate the value of an expected model (represented as a matrix) at an arbitrary distance scale.

Parameters
  • expected_matrix (np.ndarray) – The expected matrix to use as a source for interpolation.

  • regional_primermap (List[Dict[str, Any]]) – The primermap for this region.

  • distance (int) – The interaction distance at which to estimate the expected value, in base pairs.

Returns

The interpolated expected value, or -1 if distance is outside of the range of the expected model.

Return type

float
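The contract of interpolate_expected (an estimate inside the modeled range, -1 outside it) can be illustrated on the dict form of a one-dimensional model. The sketch below assumes linear interpolation between the two nearest known distances, which this documentation does not confirm, and it takes a Dict[int, float] rather than the expected matrix and primermap that the real function accepts. The name interpolate_expected_sketch is hypothetical.

```python
from bisect import bisect_left


def interpolate_expected_sketch(distance_expected, distance):
    """Linearly interpolate a distance -> expected mapping (illustrative only)."""
    known = sorted(distance_expected)
    # Outside the range covered by the model: signal with -1, as documented above.
    if not known or distance < known[0] or distance > known[-1]:
        return -1
    idx = bisect_left(known, distance)
    if known[idx] == distance:
        # Exact hit on a modeled distance; no interpolation needed.
        return distance_expected[known[idx]]
    lo, hi = known[idx - 1], known[idx]
    weight = (distance - lo) / (hi - lo)
    return distance_expected[lo] + weight * (distance_expected[hi] - distance_expected[lo])


model = {1000: 10.0, 3000: 6.0}
midway = interpolate_expected_sketch(model, 2000)
exact = interpolate_expected_sketch(model, 3000)
too_close = interpolate_expected_sketch(model, 500)
```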

lib5c.algorithms.expected.lowess_binned(regional_counts, frac=0.8, exclude_near_diagonal=False)[source]

Make a regional one-dimensional bin-level expected model by performing lowess regression in unlogged space, excluding the first third of the region and only using the empirical geometric means there instead.

Parameters
  • regional_counts (np.ndarray) – The observed counts matrix for this region.

  • frac (float) – The lowess smoothing fraction parameter to use.

  • exclude_near_diagonal (bool) – Pass True to ignore the first third of the distance scales when fitting the model.

Returns

The one-dimensional expected model. The i-th element of the list corresponds to the expected value for interactions between loci separated by i bins.

Return type

List[float]

lib5c.algorithms.expected.lowess_binned_log_counts(regional_counts, pseudocount=1, frac=0.8, exclude_near_diagonal=False)[source]

Make a regional one-dimensional bin-level expected model by performing lowess regression in log-counts space, excluding the first third of the region and only using the empirical geometric means there instead.

Parameters
  • regional_counts (np.ndarray) – The observed counts matrix for this region.

  • pseudocount (int) – The pseudocount to add to the counts before logging.

  • frac (float) – The lowess smoothing fraction parameter to use.

  • exclude_near_diagonal (bool) – Pass True to ignore the first third of the distance scales when fitting the model.

Returns

The one-dimensional expected model. The i-th element of the list corresponds to the expected value for interactions between loci separated by i bins.

Return type

List[float]

lib5c.algorithms.expected.lowess_log_log_binned(regional_counts, pseudocount=1, frac=0.8, exclude_near_diagonal=False)[source]

Make a regional one-dimensional bin-level expected model by performing lowess regression in log-log space.

Parameters
  • regional_counts (np.ndarray) – The observed counts matrix for this region.

  • pseudocount (int) – The pseudocount to add to the counts before logging.

  • frac (float) – The lowess smoothing fraction parameter to use.

  • exclude_near_diagonal (bool) – Pass True to ignore the first third of the distance scales when fitting the model.

Returns

The one-dimensional expected model. The i-th element of the list corresponds to the expected value for interactions between loci separated by i bins.

Return type

List[float]

lib5c.algorithms.expected.lowess_log_log_fragment(regional_counts, distances, pseudocount=1, frac=0.8)[source]

Make a regional one-dimensional fragment-level expected model by performing lowess regression in log-log space.

Parameters
  • regional_counts (np.ndarray) – The observed counts matrix for this region.

  • distances (np.ndarray) – The pairwise distance matrix for all fragments in this region in units of base pairs.

  • pseudocount (int) – The pseudocount to add to the counts before logging.

  • frac (float) – The lowess smoothing fraction parameter to use.

Returns

A mapping from interaction distances in units of base pairs to the expected value at that distance.

Return type

Dict[int, float]

lib5c.algorithms.expected.make_distance_matrix(regional_primermap)[source]

Construct a pairwise distance matrix for the fragments in a region from the primermap describing those fragments.

Parameters

regional_primermap (List[Dict[str, Any]]) – The primermap for this region.

Returns

The pairwise distance matrix for all fragments in this region in units of base pairs.

Return type

np.ndarray
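A pairwise distance matrix of the kind make_distance_matrix returns can be sketched as follows. The sketch assumes that each primermap entry carries integer 'start' and 'end' keys and that distance is measured between fragment midpoints; neither detail is stated in this documentation. make_distance_matrix_sketch is a hypothetical name, and nested lists stand in for np.ndarray.

```python
def make_distance_matrix_sketch(regional_primermap):
    """Pairwise midpoint-to-midpoint distances in base pairs (illustrative only)."""
    # Assumption: every fragment dict has integer 'start' and 'end' coordinates.
    midpoints = [(fragment["start"] + fragment["end"]) // 2
                 for fragment in regional_primermap]
    return [[abs(a - b) for b in midpoints] for a in midpoints]


primermap = [
    {"start": 0, "end": 1000},
    {"start": 2000, "end": 3000},
    {"start": 5000, "end": 6000},
]
distance_matrix = make_distance_matrix_sketch(primermap)
```

The result is symmetric with zeros on the diagonal, since each fragment is at distance 0 from itself.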

lib5c.algorithms.expected.make_expected_dict_from_matrix(expected_matrix, distance_matrix)[source]

Convert an expected matrix into a dict representation of the one-dimensional expected model it embodies.

Parameters
  • expected_matrix (np.ndarray) – The expected matrix.

  • distance_matrix (np.ndarray) – The pairwise distance matrix for the fragments in this region.

Returns

A mapping from interaction distances in units of base pairs to the expected value at that distance.

Return type

Dict[int, float]
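Converting an expected matrix to its dict form amounts to reading off one expected value per distinct distance. A minimal sketch, using nested lists in place of np.ndarray and the hypothetical name make_expected_dict_from_matrix_sketch; how the library resolves pairs that share a distance but differ in value is not documented here, so the last-one-wins behavior below is an assumption.

```python
def make_expected_dict_from_matrix_sketch(expected_matrix, distance_matrix):
    """Collect distance -> expected value from two aligned matrices (illustrative only)."""
    expected = {}
    n = len(expected_matrix)
    for i in range(n):
        for j in range(i, n):
            # Assumption: pairs at the same distance hold the same expected value,
            # so overwriting an existing key is harmless.
            expected[int(distance_matrix[i][j])] = expected_matrix[i][j]
    return expected


expected_matrix = [[9.0, 3.0],
                   [3.0, 9.0]]
distance_matrix = [[0, 2000],
                   [2000, 0]]
expected_dict = make_expected_dict_from_matrix_sketch(expected_matrix, distance_matrix)
```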

lib5c.algorithms.expected.make_expected_matrix(obs_matrix, regional_primermap=None, level='bin', powerlaw=False, regression=False, degree=1, lowess_smooth=False, lowess_frac=0.8, log_transform='auto', monotonic=False, donut=False, w=15, p=5, donut_frac=0.2, min_exp=0.1, log_donut=False, max_donut_ll=False, distance_expected=None, exclude_near_diagonal=False)[source]

Convenience function for computing a complete expected matrix given a matrix of observed counts that can be customized with a variety of kwargs.

Parameters
  • obs_matrix (np.ndarray) – The matrix of observed counts to make an expected matrix for.

  • regional_primermap (Optional[List[Dict[str, Any]]]) – The primermap for this region. Required if obs_matrix is fragment-level.

  • level ({'bin', 'fragment'}) – The level of obs_matrix.

  • powerlaw (bool) – Whether or not to fit a discrete power law distribution to the data.

  • regression (bool) – Whether or not to use a polynomial regression model.

  • degree (int) – The degree of the regression model to use.

  • lowess_smooth (bool) – Whether or not to use lowess smoothing to compute the model.

  • lowess_frac (float) – The lowess smoothing fraction parameter.

  • log_transform ({'counts', 'both', 'none', 'auto'}) –

    What to transform into log space.
    • counts: log-transform only the counts but not the distances. This results in semi-log models, which don’t work on fragment-level data yet.

    • both: log-transform both the counts and the distances, resulting in log-log models.

    • none: don’t log anything.

    • auto: automatically pick a reasonable choice based on the other kwargs.

  • monotonic (bool) – Pass True to force the one-dimensional expected model to be monotonic.

  • donut (bool) – Pass True to apply donut-filter local correction to the expected model. Not implemented for fragment-level input data.

  • w (int) – The outer width of the donut when using donut correction. Should be an odd integer.

  • p (int) – The inner width of the donut when using donut correction. Should be an odd integer.

  • donut_frac (float) – If the fraction of possible elements in the donut that lie within the region and have non-infinite values is lower than this fraction, then the donut-corrected value at that point will be NaN.

  • min_exp (float) – If the sum of the 1-D expected matrix under the donut or lower left footprint for a particular pixel is less than this value, set the output at this pixel to NaN to avoid numerical instability related to division by small numbers.

  • log_donut (bool) – Pass True to perform donut correction in log-counts space.

  • max_donut_ll (bool) – If donut is True, pass True here too to make the donut correction use the maximum of the “donut” and “lower-left” regions.

  • distance_expected (Optional[Union[List[float], Dict[int, float]]]) – Pass a one-dimensional expected model to use it instead of computing a new one from scratch according to the other kwargs.

  • exclude_near_diagonal (bool) – If regression or lowess_smooth is True, set this kwarg to True to ignore the first third of the distance scales when fitting the model.

Returns

The first element of the tuple is the expected matrix. The second element of the tuple is the one-dimensional expected model, which will be a list of expected values if level was ‘bin’ or a dict mapping integer distances to expected values if level was ‘fragment’. The third element will be the pairwise distance matrix if level was ‘fragment’, but will simply be None if level was ‘bin’.

Return type

Tuple[np.ndarray, Union[List[float], Dict[int, float]], Optional[np.ndarray]]

lib5c.algorithms.expected.make_expected_matrix_from_dict(distance_expected, distance_matrix)[source]

Converts a fragment-level one-dimensional expected model into an expected matrix.

Parameters
  • distance_expected (Dict[int, float]) – A mapping from interaction distances in units of base pairs to the expected value at that distance.

  • distance_matrix (np.ndarray) – The pairwise distance matrix for the fragments in this region.

Returns

The expected matrix.

Return type

np.ndarray
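Going from the fragment-level dict to a matrix is a lookup of each pairwise distance in the model. The sketch below uses nested lists in place of np.ndarray and assumes every distance in the matrix appears as a key in the model; make_expected_matrix_from_dict_sketch is a hypothetical name.

```python
def make_expected_matrix_from_dict_sketch(distance_expected, distance_matrix):
    """Look up the expected value for every pairwise distance (illustrative only)."""
    # Assumption: every distance in the matrix is present as a key in the model.
    return [[distance_expected[int(distance)] for distance in row]
            for row in distance_matrix]


distance_expected = {0: 9.0, 2000: 3.0, 5000: 1.0, 3000: 2.0}
distance_matrix = [[0, 2000, 5000],
                   [2000, 0, 3000],
                   [5000, 3000, 0]]
expected_matrix = make_expected_matrix_from_dict_sketch(distance_expected, distance_matrix)
```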

lib5c.algorithms.expected.make_expected_matrix_from_list(distance_expected)[source]

Converts a bin-level one-dimensional expected model into an expected matrix.

Parameters

distance_expected (List[float]) – The one-dimensional distance expected model to make a matrix out of.

Returns

The expected matrix.

Return type

np.ndarray
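For bin-level models the distance between bins i and j is simply |i - j| bins, so the expected matrix has one constant value per diagonal. A minimal sketch, with nested lists in place of np.ndarray and the hypothetical name make_expected_matrix_from_list_sketch:

```python
def make_expected_matrix_from_list_sketch(distance_expected):
    """Fill entry (i, j) with the expected value at |i - j| bins (illustrative only)."""
    n = len(distance_expected)
    return [[distance_expected[abs(i - j)] for j in range(n)] for i in range(n)]


expected_matrix = make_expected_matrix_from_list_sketch([5.0, 2.0, 1.0])
```

The resulting matrix is symmetric, with the zero-distance value on the main diagonal.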

lib5c.algorithms.expected.make_poly_log_log_binned_expected_matrix(obs_matrix, exclude_near_diagonal=False)[source]

Convenience function for quickly making an expected matrix for a bin-level observed counts matrix based on a simple power law relationship.

Parameters
  • obs_matrix (np.ndarray) – The matrix of observed counts.

  • exclude_near_diagonal (bool) – Pass True to ignore the first third of the distance scales when fitting the model.

Returns

The expected matrix.

Return type

np.ndarray

lib5c.algorithms.expected.make_poly_log_log_fragment_expected_matrix(obs_matrix, regional_primermap)[source]

Convenience function for quickly making an expected matrix for a fragment-level observed counts matrix based on a simple power law relationship.

Parameters
  • obs_matrix (np.ndarray) – The matrix of observed counts.

  • regional_primermap (List[Dict[str, Any]]) – Primermap describing the loci in the region represented by obs_matrix. Necessary to figure out distances between elements in the contact matrix.

Returns

The expected matrix.

Return type

np.ndarray

lib5c.algorithms.expected.make_powerlaw_binned_expected_matrix(obs_matrix, exclude_near_diagonal=False)[source]

Convenience function for quickly making an expected matrix for a bin-level observed counts matrix based on a simple power law relationship.

Parameters
  • obs_matrix (np.ndarray) – The matrix of observed counts.

  • exclude_near_diagonal (bool) – Pass True to ignore the first third of the distance scales when fitting the model.

Returns

The expected matrix.

Return type

np.ndarray

lib5c.algorithms.expected.poly_log_log_binned(regional_counts, degree=1, pseudocount=1, exclude_near_diagonal=False)[source]

Make a regional one-dimensional bin-level expected model by fitting a polynomial in log-log space.

Parameters
  • regional_counts (np.ndarray) – The observed counts matrix for this region.

  • degree (int) – The degree of the polynomial to fit.

  • pseudocount (int) – The pseudocount to add to the counts before logging.

  • exclude_near_diagonal (bool) – Pass True to ignore the first third of the distance scales when fitting the model.

Returns

The one-dimensional expected model. The i-th element of the list corresponds to the expected value for interactions between loci separated by i bins.

Return type

List[float]
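A degree-1 fit in log-log space, with the optional exclusion of the first third of the distance scales, can be sketched with ordinary least squares. This is not lib5c's implementation. The sketch fixes the degree at 1, uses log(d + 1) as the distance coordinate so that d = 0 stays finite, and subtracts the pseudocount again after exponentiating; all three choices are assumptions, and poly_log_log_binned_sketch is a hypothetical name.

```python
import math


def poly_log_log_binned_sketch(regional_counts, pseudocount=1,
                               exclude_near_diagonal=False):
    """Fit a straight line to log(count) vs. log(distance) (illustrative only)."""
    n = len(regional_counts)
    # Optionally skip the first third of the distance scales when fitting.
    first_distance = n // 3 if exclude_near_diagonal else 0
    xs, ys = [], []
    for d in range(first_distance, n):
        for i in range(n - d):
            xs.append(math.log(d + 1))  # assumption: shift so d = 0 is finite
            ys.append(math.log(regional_counts[i][i + d] + pseudocount))
    # Closed-form least squares for y = slope * x + intercept.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    # Evaluate the fit at every distance, including the ones excluded from fitting.
    return [math.exp(slope * math.log(d + 1) + intercept) - pseudocount
            for d in range(n)]


# Synthetic data that follows an exact power law: count + 1 = 100 / (d + 1).
size = 6
counts = [[100.0 / (abs(i - j) + 1) - 1 for j in range(size)] for i in range(size)]
full_fit = poly_log_log_binned_sketch(counts)
far_fit = poly_log_log_binned_sketch(counts, exclude_near_diagonal=True)
```

Because the synthetic data lie exactly on a power law, both fits recover the same model; on real data, excluding the near-diagonal distances generally changes the fitted slope.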

lib5c.algorithms.expected.poly_log_log_fragment(regional_counts, distances, degree=1, pseudocount=1)[source]

Make a regional one-dimensional fragment-level expected model by fitting a polynomial in log-log space.

Parameters
  • regional_counts (np.ndarray) – The observed counts matrix for this region.

  • distances (np.ndarray) – The pairwise distance matrix for all fragments in this region in units of base pairs.

  • degree (int) – The degree of the polynomial to fit.

  • pseudocount (int) – The pseudocount to add to the counts before logging.

Returns

A mapping from interaction distances in units of base pairs to the expected value at that distance.

Return type

Dict[int, float]

lib5c.algorithms.expected.powerlaw_binned(regional_counts, exclude_near_diagonal=False)[source]

Make a regional one-dimensional bin-level expected model by fitting a polynomial in log-log space.

Parameters
  • regional_counts (np.ndarray) – The observed counts matrix for this region.

  • exclude_near_diagonal (bool) – Pass True to ignore the first third of the distance scales when fitting the model.

Returns

The one-dimensional expected model. The i-th element of the list corresponds to the expected value for interactions between loci separated by i bins.

Return type

List[float]