lib5c.algorithms.expected module
Module for computing expected models for 5C interaction data.
-
lib5c.algorithms.expected.empirical_binned(regional_counts, log_transform=True)
Make a regional one-dimensional bin-level expected model by taking an average of the interaction values at each distance.
- Parameters
regional_counts (np.ndarray) – The observed counts matrix for this region.
log_transform (bool) – Pass True to take the geometric mean instead of the arithmetic mean; this is equivalent to averaging the log-transformed counts and exponentiating the result.
- Returns
The one-dimensional expected model. The i-th element of the list corresponds to the expected value for interactions between loci separated by i bins.
- Return type
List[float]
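The averaging can be illustrated with a minimal NumPy sketch. The helper `empirical_binned_sketch` is hypothetical and is not the lib5c source. It assumes a square, symmetric counts matrix and skips NaN entries. When computing the geometric mean, it also assumes all counts are positive.

```python
import numpy as np


def empirical_binned_sketch(regional_counts, log_transform=True):
    """Average each diagonal of a square counts matrix (hypothetical helper)."""
    counts = np.asarray(regional_counts, dtype=float)
    expected = []
    for d in range(counts.shape[0]):
        # The diagonal at offset d holds all pixels whose loci are d bins apart.
        values = np.diagonal(counts, offset=d)
        values = values[np.isfinite(values)]
        if values.size == 0:
            expected.append(float('nan'))
        elif log_transform:
            # Geometric mean: exponentiate the mean of the logs.
            expected.append(float(np.exp(np.mean(np.log(values)))))
        else:
            expected.append(float(np.mean(values)))
    return expected


obs = np.array([[4.0, 2.0, 1.0],
                [2.0, 16.0, 8.0],
                [1.0, 8.0, 4.0]])
print(empirical_binned_sketch(obs, log_transform=False))  # [8.0, 5.0, 1.0]
print(empirical_binned_sketch(obs))  # geometric mean at each distance
```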
-
lib5c.algorithms.expected.force_monotonic(distance_expected)
Force a one-dimensional distance expected to be monotonic.
- Parameters
distance_expected (Union[List[float], Dict[int, float]]) – The one-dimensional expected model to force to monotonicity. If the model describes bin-level data, this should be a list of floats, where the i-th element of the list corresponds to the expected value for interactions between loci separated by i bins. If the model describes fragment-level data, this should be a dict mapping interaction distances in units of base pairs to the expected value at that distance.
- Returns
The forced-monotonic version of the input one-dimensional expected model.
- Return type
Union[List[float], Dict[int, float]]
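A running minimum is one simple way to enforce monotonicity: it keeps the expected value from ever increasing with distance. The sketch below makes two assumptions. First, "monotonic" means non-increasing, since interaction frequency typically decays with distance. Second, the input contains no NaN values. `force_monotonic_sketch` is a hypothetical helper, and lib5c may apply a different rule.

```python
def force_monotonic_sketch(distance_expected):
    """Make a 1-D expected model non-increasing with distance.

    Hypothetical sketch: each value is capped at the smallest value seen at
    any shorter distance (a running minimum). Accepts a list indexed by bin
    distance or a dict keyed by distance in base pairs. Assumes no NaNs.
    """
    if isinstance(distance_expected, dict):
        result = {}
        current = float('inf')
        for distance in sorted(distance_expected):
            current = min(current, distance_expected[distance])
            result[distance] = current
        return result
    result = []
    current = float('inf')
    for value in distance_expected:
        current = min(current, value)
        result.append(current)
    return result


print(force_monotonic_sketch([10.0, 6.0, 7.0, 3.0, 4.0]))
# [10.0, 6.0, 6.0, 3.0, 3.0]
```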
-
lib5c.algorithms.expected.get_distance_expected(obs_matrix, regional_primermap=None, level='bin', powerlaw=False, regression=False, degree=1, lowess_smooth=False, lowess_frac=0.8, log_transform='auto', exclude_near_diagonal=False)
Convenience function for computing a regional one-dimensional expected model from a matrix of observed counts, with properties that can be customized by kwargs.
- Parameters
obs_matrix (np.ndarray) – The matrix of observed counts to model.
regional_primermap (Optional[List[Dict[str, Any]]]) – The primermap for this region. Required if obs_matrix is fragment-level.
level ({'bin', 'fragment'}) – The level of obs_matrix.
powerlaw (bool) – Whether or not to fit a discrete power law distribution to the data.
regression (bool) – Whether or not to use a polynomial regression model.
degree (int) – The degree of the regression model to use.
lowess_smooth (bool) – Whether or not to use lowess smoothing to compute the model.
lowess_frac (float) – The lowess smoothing fraction parameter.
log_transform ({'counts', 'both', 'none', 'auto'}) –
- What to transform into log space.
counts: log-transform only the counts but not the distances. This results in semi-log models, which don’t work on fragment-level data yet.
both: log-transform both the counts and the distances, resulting in log-log models.
none: don’t log anything.
auto: automatically pick a reasonable choice based on the other kwargs.
exclude_near_diagonal (bool) – If regression or lowess_smooth is True, set this kwarg to True to ignore the first third of the distance scales when fitting the model.
- Returns
The one-dimensional expected model. For bin-level data, this is a list of floats, where the i-th element of the list corresponds to the expected value for interactions between loci separated by i bins. For fragment-level data, this is a dict mapping interaction distances in units of base pairs to the appropriate expected values.
- Return type
Union[List[float], Dict[int, float]]
-
lib5c.algorithms.expected.get_global_distance_expected(counts, primermap=None, level='bin', powerlaw=False, regression=False, degree=1, lowess_smooth=False, lowess_frac=0.8, log_transform='auto', exclude_near_diagonal=False)
Convenience function for computing a global one-dimensional expected model from a dict of observed counts, with properties that can be customized by kwargs.
- Parameters
counts (Dict[str, np.ndarray]) – The dict of observed counts to model.
primermap (Optional[Dict[str, List[Dict[str, Any]]]]) – A primermap corresponding to counts.
level ({'bin', 'fragment'}) – The level of counts.
powerlaw (bool) – Whether or not to fit a discrete power law distribution to the data.
regression (bool) – Whether or not to use a polynomial regression model.
degree (int) – The degree of the regression model to use.
lowess_smooth (bool) – Whether or not to use lowess smoothing to compute the model.
lowess_frac (float) – The lowess smoothing fraction parameter.
log_transform ({'counts', 'both', 'none', 'auto'}) –
- What to transform into log space.
counts: log-transform only the counts but not the distances. This results in semi-log models, which don’t work on fragment-level data yet.
both: log-transform both the counts and the distances, resulting in log-log models.
none: don’t log anything.
auto: automatically pick a reasonable choice based on the other kwargs.
exclude_near_diagonal (bool) – If regression or lowess_smooth is True, set this kwarg to True to ignore the first third of the distance scales when fitting the model.
- Returns
The one-dimensional expected model. For bin-level data, this is a list of floats, where the i-th element of the list corresponds to the expected value for interactions between loci separated by i bins. For fragment-level data, this is a dict mapping interaction distances in units of base pairs to the appropriate expected values.
- Return type
Union[List[float], Dict[int, float]]
-
lib5c.algorithms.expected.global_empirical_binned(counts, log_transform=True)
Make a global one-dimensional bin-level expected model by taking an average of the interaction values at each distance.
- Parameters
counts (Dict[str, np.ndarray]) – The observed counts dict to fit the model to.
log_transform (bool) – Pass True to take the geometric mean instead of the arithmetic mean; this is equivalent to averaging the log-transformed counts and exponentiating the result.
- Returns
The one-dimensional expected model. The i-th element of the list corresponds to the expected value for interactions between loci separated by i bins. The length of this list will match the size of the largest region in the input counts dict.
- Return type
List[float]
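The sketch below shows one way to build a global model. It assumes that every pixel at a given bin distance is pooled across all regions before averaging; lib5c may instead combine per-region averages. The output length equals the size of the largest region, as documented above. `global_empirical_binned_sketch` and the region names are hypothetical.

```python
import numpy as np


def global_empirical_binned_sketch(counts, log_transform=True):
    """Pool pixels at each bin distance across regions, then average."""
    max_size = max(matrix.shape[0] for matrix in counts.values())
    pooled = [[] for _ in range(max_size)]
    for matrix in counts.values():
        matrix = np.asarray(matrix, dtype=float)
        for d in range(matrix.shape[0]):
            values = np.diagonal(matrix, offset=d)
            pooled[d].extend(values[np.isfinite(values)])  # skip NaNs
    expected = []
    for values in pooled:
        values = np.asarray(values, dtype=float)
        if values.size == 0:
            expected.append(float('nan'))
        elif log_transform:
            expected.append(float(np.exp(np.mean(np.log(values)))))
        else:
            expected.append(float(np.mean(values)))
    return expected


counts = {
    'region_a': np.array([[2.0, 4.0], [4.0, 6.0]]),
    'region_b': np.array([[4.0, 2.0, 1.0], [2.0, 4.0, 2.0], [1.0, 2.0, 4.0]]),
}
print(global_empirical_binned_sketch(counts, log_transform=False))
# [4.0, 2.6666666666666665, 1.0]
```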
-
lib5c.algorithms.expected.global_lowess_binned(counts, frac=0.8, exclude_near_diagonal=False)
Make a global one-dimensional bin-level expected model by performing lowess regression in unlogged space. If exclude_near_diagonal is True, the first third of the distance scales is excluded from the fit and the empirical arithmetic means are used there instead.
- Parameters
counts (Dict[str, np.ndarray]) – The observed counts dict to fit the model to.
frac (float) – The lowess smoothing fraction parameter to use.
exclude_near_diagonal (bool) – Pass True to ignore the first third of the distance scales when fitting the model.
- Returns
The one-dimensional expected model. The i-th element of the list corresponds to the expected value for interactions between loci separated by i bins. The length of this list will match the size of the largest region in the input counts dict.
- Return type
List[float]
-
lib5c.algorithms.expected.global_lowess_binned_log_counts(counts, pseudocount=1, frac=0.8, exclude_near_diagonal=False)
Make a global one-dimensional bin-level expected model by performing lowess regression in log-counts space. If exclude_near_diagonal is True, the first third of the distance scales is excluded from the fit and the empirical geometric means are used there instead.
- Parameters
counts (Dict[str, np.ndarray]) – The observed counts dict to fit the model to.
pseudocount (int) – The pseudocount to add to the counts before logging.
frac (float) – The lowess smoothing fraction parameter to use.
exclude_near_diagonal (bool) – Pass True to ignore the first third of the distance scales when fitting the model.
- Returns
The one-dimensional expected model. The i-th element of the list corresponds to the expected value for interactions between loci separated by i bins. The length of this list will match the size of the largest region in the input counts dict.
- Return type
List[float]
-
lib5c.algorithms.expected.global_lowess_log_log_binned(counts, pseudocount=1, frac=0.8, exclude_near_diagonal=False)
Make a global one-dimensional bin-level expected model by performing lowess regression in log-log space.
- Parameters
counts (Dict[str, np.ndarray]) – The observed counts dict to fit the model to.
pseudocount (int) – The pseudocount to add to the counts before logging.
frac (float) – The lowess smoothing fraction parameter to use.
exclude_near_diagonal (bool) – Pass True to ignore the first third of the distance scales when fitting the model.
- Returns
The one-dimensional expected model. The i-th element of the list corresponds to the expected value for interactions between loci separated by i bins. The length of this list will match the size of the largest region in the input counts dict.
- Return type
List[float]
-
lib5c.algorithms.expected.global_lowess_log_log_fragment(counts, distances, pseudocount=1, frac=0.8)
Make a global one-dimensional fragment-level expected model by performing lowess regression in log-log space.
- Parameters
counts (Dict[str, np.ndarray]) – The observed counts dict to fit the model to.
distances (Dict[str, np.ndarray]) – A dict of pairwise distance matrices describing the genomic distances between the elements of the matrices in counts. The keys and array dimensions should match the keys and array dimensions of counts.
pseudocount (int) – The pseudocount to add to the counts before logging.
frac (float) – The lowess smoothing fraction parameter to use.
- Returns
A mapping from interaction distances in units of base pairs to the expected value at that distance.
- Return type
Dict[int, float]
-
lib5c.algorithms.expected.global_poly_log_log_binned(counts, degree=1, pseudocount=1, exclude_near_diagonal=False)
Make a global one-dimensional bin-level expected model by fitting a polynomial in log-log space.
- Parameters
counts (Dict[str, np.ndarray]) – The observed counts dict to fit the model to.
degree (int) – The degree of the polynomial to fit.
pseudocount (int) – The pseudocount to add to the counts before logging.
exclude_near_diagonal (bool) – Pass True to ignore the first third of the distance scales when fitting the model.
- Returns
The one-dimensional expected model. The i-th element of the list corresponds to the expected value for interactions between loci separated by i bins. The length of this list will match the size of the largest region in the input counts dict.
- Return type
List[float]
-
lib5c.algorithms.expected.global_poly_log_log_fragment(counts, distances, degree=1, pseudocount=1)
Make a global one-dimensional fragment-level expected model by fitting a polynomial in log-log space.
- Parameters
counts (Dict[str, np.ndarray]) – The observed counts dict to fit the model to.
distances (Dict[str, np.ndarray]) – A dict of pairwise distance matrices describing the genomic distances between the elements of the matrices in counts. The keys and array dimensions should match the keys and array dimensions of counts.
degree (int) – The degree of the polynomial to fit.
pseudocount (int) – The pseudocount to add to the counts before logging.
- Returns
A mapping from interaction distances in units of base pairs to the expected value at that distance.
- Return type
Dict[int, float]
-
lib5c.algorithms.expected.global_powerlaw_binned(counts, exclude_near_diagonal=False)
Make a global one-dimensional bin-level expected model by fitting a discrete power law distribution to the data.
- Parameters
counts (Dict[str, np.ndarray]) – The observed counts dict to fit the model to.
exclude_near_diagonal (bool) – Pass True to ignore the first third of the distance scales when fitting the model.
- Returns
The one-dimensional expected model. The i-th element of the list corresponds to the expected value for interactions between loci separated by i bins. The length of this list will match the size of the largest region in the input counts dict.
- Return type
List[float]
-
lib5c.algorithms.expected.interpolate_expected(expected_matrix, regional_primermap, distance)
Interpolate the value of an expected model (represented as a matrix) at an arbitrary distance scale.
- Parameters
expected_matrix (np.ndarray) – The expected matrix to use as a source for interpolation.
regional_primermap (List[Dict[str, Any]]) – The primermap for this region.
distance (int) – The interaction distance at which to estimate the expected value, in base pairs.
- Returns
The interpolated expected value, or -1 if distance is outside of the range of the expected model.
- Return type
float
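The sketch below uses plain linear interpolation over the distinct distances in the upper triangle of the expected matrix. To stay self-contained, it takes a precomputed distance matrix in base pairs instead of a primermap. `interpolate_expected_sketch` is hypothetical and not the lib5c implementation.

```python
import numpy as np


def interpolate_expected_sketch(expected_matrix, distance_matrix, distance):
    """Linearly interpolate an expected matrix at an arbitrary distance (bp).

    Hypothetical helper: takes a distance matrix instead of a primermap.
    Returns -1 when `distance` is outside the modeled range.
    """
    rows, cols = np.triu_indices_from(expected_matrix)
    distances = np.asarray(distance_matrix, dtype=float)[rows, cols]
    values = np.asarray(expected_matrix, dtype=float)[rows, cols]
    # Collapse pixels that share a distance into one averaged point so the
    # x-coordinates passed to np.interp are strictly increasing.
    unique_d, inverse = np.unique(distances, return_inverse=True)
    mean_values = np.bincount(inverse, weights=values) / np.bincount(inverse)
    if distance < unique_d[0] or distance > unique_d[-1]:
        return -1
    return float(np.interp(distance, unique_d, mean_values))


expected = np.array([[10.0, 6.0, 3.0],
                     [6.0, 10.0, 4.0],
                     [3.0, 4.0, 10.0]])
distances = np.array([[0, 1000, 3000],
                      [1000, 0, 2000],
                      [3000, 2000, 0]])
print(interpolate_expected_sketch(expected, distances, 1500))  # halfway between 6 and 4
print(interpolate_expected_sketch(expected, distances, 5000))  # -1
```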
-
lib5c.algorithms.expected.lowess_binned(regional_counts, frac=0.8, exclude_near_diagonal=False)
Make a regional one-dimensional bin-level expected model by performing lowess regression in unlogged space. If exclude_near_diagonal is True, the first third of the distance scales is excluded from the fit and the empirical arithmetic means are used there instead.
- Parameters
regional_counts (np.ndarray) – The observed counts matrix for this region.
frac (float) – The lowess smoothing fraction parameter to use.
exclude_near_diagonal (bool) – Pass True to ignore the first third of the distance scales when fitting the model.
- Returns
The one-dimensional expected model. The i-th element of the list corresponds to the expected value for interactions between loci separated by i bins.
- Return type
List[float]
-
lib5c.algorithms.expected.lowess_binned_log_counts(regional_counts, pseudocount=1, frac=0.8, exclude_near_diagonal=False)
Make a regional one-dimensional bin-level expected model by performing lowess regression in log-counts space. If exclude_near_diagonal is True, the first third of the distance scales is excluded from the fit and the empirical geometric means are used there instead.
- Parameters
regional_counts (np.ndarray) – The observed counts matrix for this region.
pseudocount (int) – The pseudocount to add to the counts before logging.
frac (float) – The lowess smoothing fraction parameter to use.
exclude_near_diagonal (bool) – Pass True to ignore the first third of the distance scales when fitting the model.
- Returns
The one-dimensional expected model. The i-th element of the list corresponds to the expected value for interactions between loci separated by i bins.
- Return type
List[float]
-
lib5c.algorithms.expected.lowess_log_log_binned(regional_counts, pseudocount=1, frac=0.8, exclude_near_diagonal=False)
Make a regional one-dimensional bin-level expected model by performing lowess regression in log-log space.
- Parameters
regional_counts (np.ndarray) – The observed counts matrix for this region.
pseudocount (int) – The pseudocount to add to the counts before logging.
frac (float) – The lowess smoothing fraction parameter to use.
exclude_near_diagonal (bool) – Pass True to ignore the first third of the distance scales when fitting the model.
- Returns
The one-dimensional expected model. The i-th element of the list corresponds to the expected value for interactions between loci separated by i bins.
- Return type
List[float]
-
lib5c.algorithms.expected.lowess_log_log_fragment(regional_counts, distances, pseudocount=1, frac=0.8)
Make a regional one-dimensional fragment-level expected model by performing lowess regression in log-log space.
- Parameters
regional_counts (np.ndarray) – The observed counts matrix for this region.
distances (np.ndarray) – The pairwise distance matrix for all fragments in this region in units of base pairs.
pseudocount (int) – The pseudocount to add to the counts before logging.
frac (float) – The lowess smoothing fraction parameter to use.
- Returns
A mapping from interaction distances in units of base pairs to the expected value at that distance.
- Return type
Dict[int, float]
-
lib5c.algorithms.expected.make_distance_matrix(regional_primermap)
Construct a pairwise distance matrix for the fragments in a region from the primermap describing those fragments.
- Parameters
regional_primermap (List[Dict[str, Any]]) – The primermap for this region.
- Returns
The pairwise distance matrix for all fragments in this region in units of base pairs.
- Return type
np.ndarray
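The following sketch computes such a matrix under two illustrative assumptions. First, each primermap entry is a dict with integer `'start'` and `'end'` keys. Second, distances are measured between fragment midpoints. Neither the midpoint convention nor `make_distance_matrix_sketch` should be read as the lib5c definition.

```python
import numpy as np


def make_distance_matrix_sketch(regional_primermap):
    """Pairwise distances (bp) between fragment midpoints (hypothetical)."""
    midpoints = np.array(
        [(fragment['start'] + fragment['end']) / 2.0 for fragment in regional_primermap]
    )
    # Broadcasting a column against a row gives every pairwise difference.
    return np.abs(midpoints[:, None] - midpoints[None, :])


primermap = [
    {'start': 0, 'end': 1000},
    {'start': 1000, 'end': 2000},
    {'start': 3000, 'end': 4000},
]
print(make_distance_matrix_sketch(primermap))
```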
-
lib5c.algorithms.expected.make_expected_dict_from_matrix(expected_matrix, distance_matrix)
Convert an expected matrix into a dict representation of the one-dimensional expected model it embodies.
- Parameters
expected_matrix (np.ndarray) – The expected matrix.
distance_matrix (np.ndarray) – The pairwise distance matrix for the fragments in this region.
- Returns
A mapping from interaction distances in units of base pairs to the expected value at that distance.
- Return type
Dict[int, float]
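A minimal sketch of this conversion follows. It assumes the expected matrix is symmetric, so it reads only the upper triangle. If two pixels share a distance but hold different values, the last one visited wins in this sketch. `make_expected_dict_from_matrix_sketch` is hypothetical.

```python
import numpy as np


def make_expected_dict_from_matrix_sketch(expected_matrix, distance_matrix):
    """Collect {distance_bp: expected_value} pairs from the upper triangle."""
    expected_matrix = np.asarray(expected_matrix, dtype=float)
    distance_matrix = np.asarray(distance_matrix)
    result = {}
    for i, j in zip(*np.triu_indices_from(expected_matrix)):
        value = expected_matrix[i, j]
        if np.isfinite(value):  # skip NaN/inf pixels
            result[int(distance_matrix[i, j])] = float(value)
    return result


expected = np.array([[10.0, 6.0, 3.0],
                     [6.0, 10.0, 4.0],
                     [3.0, 4.0, 10.0]])
distances = np.array([[0, 1000, 3000],
                      [1000, 0, 2000],
                      [3000, 2000, 0]])
print(make_expected_dict_from_matrix_sketch(expected, distances))
# {0: 10.0, 1000: 6.0, 3000: 3.0, 2000: 4.0}
```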
-
lib5c.algorithms.expected.make_expected_matrix(obs_matrix, regional_primermap=None, level='bin', powerlaw=False, regression=False, degree=1, lowess_smooth=False, lowess_frac=0.8, log_transform='auto', monotonic=False, donut=False, w=15, p=5, donut_frac=0.2, min_exp=0.1, log_donut=False, max_donut_ll=False, distance_expected=None, exclude_near_diagonal=False)
Convenience function for computing a complete expected matrix given a matrix of observed counts that can be customized with a variety of kwargs.
- Parameters
obs_matrix (np.ndarray) – The matrix of observed counts to make an expected matrix for.
regional_primermap (Optional[List[Dict[str, Any]]]) – The primermap for this region. Required if obs_matrix is fragment-level.
level ({'bin', 'fragment'}) – The level of obs_matrix.
powerlaw (bool) – Whether or not to fit a discrete power law distribution to the data.
regression (bool) – Whether or not to use a polynomial regression model.
degree (int) – The degree of the regression model to use.
lowess_smooth (bool) – Whether or not to use lowess smoothing to compute the model.
lowess_frac (float) – The lowess smoothing fraction parameter.
log_transform ({'counts', 'both', 'none', 'auto'}) –
- What to transform into log space.
counts: log-transform only the counts but not the distances. This results in semi-log models, which don’t work on fragment-level data yet.
both: log-transform both the counts and the distances, resulting in log-log models.
none: don’t log anything.
auto: automatically pick a reasonable choice based on the other kwargs.
monotonic (bool) – Pass True to force the one-dimensional expected model to be monotonic.
donut (bool) – Pass True to apply donut-filter local correction to the expected model. Not implemented for fragment-level input data.
w (int) – The outer width of the donut when using donut correction. Should be an odd integer.
p (int) – The inner width of the donut when using donut correction. Should be an odd integer.
donut_frac (float) – If the fraction of possible elements in the donut that lie within the region and have non-infinite values is lower than this fraction, then the donut-corrected value at that point will be NaN.
min_exp (float) – If the sum of the 1-D expected matrix under the donut or lower-left footprint for a particular pixel is less than this value, the output at this pixel is set to NaN to avoid numerical instability from dividing by small numbers.
log_donut (bool) – Pass True to perform donut correction in log-counts space.
max_donut_ll (bool) – If donut is True, pass True here too to make the donut correction use the maximum of the “donut” and “lower-left” regions.
distance_expected (Optional[Union[List[float], Dict[int, float]]]) – Pass a one-dimensional expected model to use it instead of computing a new one from scratch according to the other kwargs.
exclude_near_diagonal (bool) – If regression or lowess_smooth is True, set this kwarg to True to ignore the first third of the distance scales when fitting the model.
- Returns
Tuple[np.ndarray, Union[List[float], Dict[int, float]], Optional[np.ndarray]] – The first element of the tuple is the expected matrix. The second element of the tuple is the one-dimensional expected model, which will be a list of expected values if level was ‘bin’ or a dict mapping integer distances to expected values if level was ‘fragment’. The third element will be the pairwise distance matrix if level was ‘fragment’, but will simply be None if level was ‘bin’.
-
lib5c.algorithms.expected.make_expected_matrix_from_dict(distance_expected, distance_matrix)
Converts a fragment-level one-dimensional expected model into an expected matrix.
- Parameters
distance_expected (Dict[int, float]) – A mapping from interaction distances in units of base pairs to the expected value at that distance.
distance_matrix (np.ndarray) – The pairwise distance matrix for the fragments in this region.
- Returns
The expected matrix.
- Return type
np.ndarray
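Going the other way only requires a lookup for each pixel. In this hypothetical sketch, distances are cast to `int` so they match the dict keys. Any distance that is missing from the model produces NaN.

```python
import numpy as np


def make_expected_matrix_from_dict_sketch(distance_expected, distance_matrix):
    """Fill each pixel with the model value at that pixel's distance."""
    distance_matrix = np.asarray(distance_matrix)
    expected = np.full(distance_matrix.shape, np.nan)
    for index, distance in np.ndenumerate(distance_matrix):
        # Unmodeled distances stay NaN.
        expected[index] = distance_expected.get(int(distance), np.nan)
    return expected


distances = np.array([[0, 1000, 3000],
                      [1000, 0, 2000],
                      [3000, 2000, 0]])
model = {0: 10.0, 1000: 6.0, 2000: 4.0, 3000: 3.0}
print(make_expected_matrix_from_dict_sketch(model, distances))
```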
-
lib5c.algorithms.expected.make_expected_matrix_from_list(distance_expected)
Converts a bin-level one-dimensional expected model into an expected matrix.
- Parameters
distance_expected (List[float]) – The one-dimensional distance expected model to make a matrix out of.
- Returns
The expected matrix.
- Return type
np.ndarray
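At the bin level, the expected matrix is a symmetric Toeplitz matrix: entry (i, j) is the model value at distance |i - j|. Here is a minimal NumPy sketch; the helper name is hypothetical.

```python
import numpy as np


def make_expected_matrix_from_list_sketch(distance_expected):
    """Build a symmetric Toeplitz matrix with entry [i, j] = model[|i - j|]."""
    model = np.asarray(distance_expected, dtype=float)
    index = np.arange(len(model))
    # |i - j| for every pixel, used to index into the 1-D model.
    return model[np.abs(index[:, None] - index[None, :])]


print(make_expected_matrix_from_list_sketch([8.0, 5.0, 1.0]))
```

The model may be longer than the region, as happens with a global model. In that case, slicing the result to the region size yields a matrix of the right shape.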
-
lib5c.algorithms.expected.make_poly_log_log_binned_expected_matrix(obs_matrix, exclude_near_diagonal=False)
Convenience function for quickly making an expected matrix for a bin-level observed counts matrix based on a simple power law relationship.
- Parameters
obs_matrix (np.ndarray) – The matrix of observed counts.
exclude_near_diagonal (bool) – Pass True to ignore the first third of the distance scales when fitting the model.
- Returns
The expected matrix.
- Return type
np.ndarray
-
lib5c.algorithms.expected.make_poly_log_log_fragment_expected_matrix(obs_matrix, regional_primermap)
Convenience function for quickly making an expected matrix for a fragment-level observed counts matrix based on a simple power law relationship.
- Parameters
obs_matrix (np.ndarray) – The matrix of observed counts.
regional_primermap (List[Dict[str, Any]]) – Primermap describing the loci in the region represented by obs_matrix. Necessary to figure out distances between elements in the contact matrix.
- Returns
The expected matrix.
- Return type
np.ndarray
-
lib5c.algorithms.expected.make_powerlaw_binned_expected_matrix(obs_matrix, exclude_near_diagonal=False)
Convenience function for quickly making an expected matrix for a bin-level observed counts matrix based on a simple power law relationship.
- Parameters
obs_matrix (np.ndarray) – The matrix of observed counts.
exclude_near_diagonal (bool) – Pass True to ignore the first third of the distance scales when fitting the model.
- Returns
The expected matrix.
- Return type
np.ndarray
-
lib5c.algorithms.expected.poly_log_log_binned(regional_counts, degree=1, pseudocount=1, exclude_near_diagonal=False)
Make a regional one-dimensional bin-level expected model by fitting a polynomial in log-log space.
- Parameters
regional_counts (np.ndarray) – The observed counts matrix for this region.
degree (int) – The degree of the polynomial to fit.
pseudocount (int) – The pseudocount to add to the counts before logging.
exclude_near_diagonal (bool) – Pass True to ignore the first third of the distance scales when fitting the model.
- Returns
The one-dimensional expected model. The i-th element of the list corresponds to the expected value for interactions between loci separated by i bins.
- Return type
List[float]
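The sketch below fits a polynomial in log-log space under several illustrative assumptions, none of which come from the lib5c implementation:
- every finite pixel in the upper triangle is used as a data point;
- bin distance d is mapped to log(d + 1), so the diagonal (d = 0) stays finite;
- the pseudocount is subtracted again after exponentiating.

With degree 1, the fit is a straight line in log-log space, i.e. a power law. `poly_log_log_binned_sketch` is hypothetical.

```python
import numpy as np


def poly_log_log_binned_sketch(regional_counts, degree=1, pseudocount=1):
    """Fit log(count + pseudocount) as a polynomial in log(distance + 1)."""
    counts = np.asarray(regional_counts, dtype=float)
    n = counts.shape[0]
    rows, cols = np.triu_indices(n)
    distances = cols - rows  # bin distance of each upper-triangle pixel
    values = counts[rows, cols]
    keep = np.isfinite(values)
    x = np.log(distances[keep] + 1.0)
    y = np.log(values[keep] + pseudocount)
    coeffs = np.polyfit(x, y, degree)
    # Evaluate the fit at every bin distance and undo both transforms.
    fitted = np.polyval(coeffs, np.log(np.arange(n) + 1.0))
    return [float(v) for v in np.exp(fitted) - pseudocount]


# Synthetic counts that follow an exact power law in (d + 1).
idx = np.arange(4)
d = np.abs(idx[:, None] - idx[None, :])
obs = 100.0 / (d + 1) - 1
print(poly_log_log_binned_sketch(obs))
```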
-
lib5c.algorithms.expected.poly_log_log_fragment(regional_counts, distances, degree=1, pseudocount=1)
Make a regional one-dimensional fragment-level expected model by fitting a polynomial in log-log space.
- Parameters
regional_counts (np.ndarray) – The observed counts matrix for this region.
distances (np.ndarray) – The pairwise distance matrix for all fragments in this region in units of base pairs.
degree (int) – The degree of the polynomial to fit.
pseudocount (int) – The pseudocount to add to the counts before logging.
- Returns
A mapping from interaction distances in units of base pairs to the expected value at that distance.
- Return type
Dict[int, float]
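A fragment-level variant of the same idea is sketched below. It skips zero-distance (diagonal) pixels, since log(0) is undefined, and keys the result by each distinct positive distance in base pairs. Both choices are assumptions, and `poly_log_log_fragment_sketch` is hypothetical.

```python
import numpy as np


def poly_log_log_fragment_sketch(regional_counts, distances, degree=1, pseudocount=1):
    """Fit log(count + pseudocount) as a polynomial in log(distance in bp)."""
    counts = np.asarray(regional_counts, dtype=float)
    distances = np.asarray(distances, dtype=float)
    rows, cols = np.triu_indices_from(counts, k=1)  # strictly above the diagonal
    d = distances[rows, cols]
    values = counts[rows, cols]
    keep = np.isfinite(values) & (d > 0)
    coeffs = np.polyfit(np.log(d[keep]), np.log(values[keep] + pseudocount), degree)
    unique_d = np.unique(d[keep])
    fitted = np.exp(np.polyval(coeffs, np.log(unique_d))) - pseudocount
    return {int(k): float(v) for k, v in zip(unique_d, fitted)}


distances = np.array([[0, 1000, 3000],
                      [1000, 0, 2000],
                      [3000, 2000, 0]])
obs = np.zeros((3, 3))
off_diagonal = distances > 0
# Synthetic counts following an exact power law in distance.
obs[off_diagonal] = 1e6 / distances[off_diagonal] - 1
print(poly_log_log_fragment_sketch(obs, distances))
```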
-
lib5c.algorithms.expected.powerlaw_binned(regional_counts, exclude_near_diagonal=False)
Make a regional one-dimensional bin-level expected model by fitting a discrete power law distribution to the data.
- Parameters
regional_counts (np.ndarray) – The observed counts matrix for this region.
exclude_near_diagonal (bool) – Pass True to ignore the first third of the distance scales when fitting the model.
- Returns
The one-dimensional expected model. The i-th element of the list corresponds to the expected value for interactions between loci separated by i bins.
- Return type
List[float]