lib5c.algorithms.filtering.filter_functions module
Module providing utilities for defining and constructing filter functions.
-
lib5c.algorithms.filtering.filter_functions.amean_gaussian(sigma=1000.0, norm_ord=1, check_threshold=0.2)
Constructs a filter function that uses the arithmetic mean with Gaussian weights as the aggregating function and a p-norm as the norm function.
- Parameters
sigma (float) – The standard deviation to use for the Gaussian when assigning weights.
norm_ord (int) – The order of the p-norm to use to convert (x-dist, y-dist) vectors to scalar distances.
check_threshold (float) – If less than this fraction of the values in a neighborhood are positive, the filter function will return NaN.
- Returns
The constructed filter function. This function takes in a “neighborhood” and returns the filtered value given that neighborhood. A neighborhood is represented as a list of “nearby points” where each nearby point is represented as a dict of the following form:
{ 'value': float, 'x_dist': int, 'y_dist': int }
where ‘value’ is the value at the point and ‘x_dist’ and ‘y_dist’ are its distances from the center of the neighborhood along the x- and y-axis, respectively, in base pairs.
- Return type
Callable[[List[Dict[str, Any]]], float]
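As a rough illustration of what such a filter function computes, the sketch below reimplements the idea using only the standard library. It is not lib5c's implementation: the helper names p_norm, gaussian_weight, and amean_gaussian_sketch are hypothetical, the Gaussian weight is assumed to be the unnormalized density exp(-d^2 / (2 * sigma^2)), and the positive-fraction check is left out for brevity.

```python
import math


def p_norm(x_dist, y_dist, order=1):
    """Collapse an (x-dist, y-dist) vector to a scalar distance with a p-norm."""
    return (abs(x_dist) ** order + abs(y_dist) ** order) ** (1.0 / order)


def gaussian_weight(distance, sigma=1000.0):
    """Unnormalized Gaussian weight (assumed form) for a distance in base pairs."""
    return math.exp(-(distance ** 2) / (2.0 * sigma ** 2))


def amean_gaussian_sketch(neighborhood, sigma=1000.0, norm_ord=1):
    """Gaussian-weighted arithmetic mean of the values in a neighborhood."""
    values = [p['value'] for p in neighborhood]
    weights = [
        gaussian_weight(p_norm(p['x_dist'], p['y_dist'], norm_ord), sigma)
        for p in neighborhood
    ]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# A center point plus two points one sigma away along each axis.
neighborhood = [
    {'value': 2.0, 'x_dist': 0, 'y_dist': 0},
    {'value': 4.0, 'x_dist': 1000, 'y_dist': 0},
    {'value': 4.0, 'x_dist': 0, 'y_dist': 1000},
]
filtered = amean_gaussian_sketch(neighborhood)
```

Because the center point has the largest weight, the filtered value lands between 2.0 and 4.0 but closer to 2.0 than an unweighted mean would.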
-
lib5c.algorithms.filtering.filter_functions.amean_inverse(bin_width=4000, norm_ord=1, check_threshold=0.2)
Constructs a filter function that uses the arithmetic mean with “inverse” weights as the aggregating function and a p-norm as the norm function.
- Parameters
bin_width (int) – The bin width in base pairs.
norm_ord (int) – The order of the p-norm to use to convert (x-dist, y-dist) vectors to scalar distances.
check_threshold (float) – If less than this fraction of the values in a neighborhood are positive, the filter function will return NaN.
- Returns
The constructed filter function. This function takes in a “neighborhood” and returns the filtered value given that neighborhood. A neighborhood is represented as a list of “nearby points” where each nearby point is represented as a dict of the following form:
{ 'value': float, 'x_dist': int, 'y_dist': int }
where ‘value’ is the value at the point and ‘x_dist’ and ‘y_dist’ are its distances from the center of the neighborhood along the x- and y-axis, respectively, in base pairs.
- Return type
Callable[[List[Dict[str, Any]]], float]
-
lib5c.algorithms.filtering.filter_functions.arithmetic_mean(check_threshold=0.2)
Constructs a filter function that uses the unweighted arithmetic mean as the aggregating function.
- Parameters
check_threshold (float) – If less than this fraction of the values in a neighborhood are positive, the filter function will return NaN.
- Returns
The constructed filter function. This function takes in a “neighborhood” and returns the filtered value given that neighborhood. A neighborhood is represented as a list of “nearby points” where each nearby point is represented as a dict of the following form:
{ 'value': float, 'x_dist': int, 'y_dist': int }
where ‘value’ is the value at the point and ‘x_dist’ and ‘y_dist’ are its distances from the center of the neighborhood along the x- and y-axis, respectively, in base pairs.
- Return type
Callable[[List[Dict[str, Any]]], float]
-
lib5c.algorithms.filtering.filter_functions.check_neighboorhood_nonnan(neighborhood, threshold)
Checks whether a neighborhood clears a specified non-NaN fraction threshold.
- Parameters
neighborhood (List[Dict[str, Any]]) –
A list of “nearby points” where each nearby point is represented as a dict of the following form:
{ 'value': float, 'x_dist': int, 'y_dist': int }
where ‘value’ is the value at the point and ‘x_dist’ and ‘y_dist’ are its distances from the center of the neighborhood along the x- and y-axis, respectively, in base pairs.
threshold (float) – If less than this fraction of the values in the neighborhood are non-infinite, the neighborhood fails the check.
- Returns
True if this neighborhood clears the threshold, otherwise False.
- Return type
bool
-
lib5c.algorithms.filtering.filter_functions.check_neighboorhood_positive(neighborhood, threshold)
Checks whether a neighborhood clears a specified positive fraction threshold.
- Parameters
neighborhood (List[Dict[str, Any]]) –
A list of “nearby points” where each nearby point is represented as a dict of the following form:
{ 'value': float, 'x_dist': int, 'y_dist': int }
where ‘value’ is the value at the point and ‘x_dist’ and ‘y_dist’ are its distances from the center of the neighborhood along the x- and y-axis, respectively, in base pairs.
threshold (float) – If less than this fraction of the values in the neighborhood are positive, the neighborhood fails the check.
- Returns
True if this neighborhood clears the threshold, otherwise False.
- Return type
bool
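The positive-fraction check described above can be sketched as follows. This is a minimal stand-in written for illustration, not the library's code: the name check_positive_fraction is hypothetical, and treating an empty neighborhood as a failure is an assumption.

```python
def check_positive_fraction(neighborhood, threshold):
    """Return True if at least `threshold` of the values are positive."""
    if not neighborhood:
        # Assumption: an empty neighborhood fails the check.
        return False
    # NaN compares False against 0, so NaN values never count as positive.
    n_positive = sum(1 for p in neighborhood if p['value'] > 0)
    return n_positive / len(neighborhood) >= threshold


neighborhood = [
    {'value': 3.0, 'x_dist': 0, 'y_dist': 0},
    {'value': 0.0, 'x_dist': 4000, 'y_dist': 0},
    {'value': 0.0, 'x_dist': 0, 'y_dist': 4000},
    {'value': float('nan'), 'x_dist': 4000, 'y_dist': 4000},
]
passes_loose = check_positive_fraction(neighborhood, 0.2)
passes_strict = check_positive_fraction(neighborhood, 0.5)
```

Here one of four values is positive, so the neighborhood clears a threshold of 0.2 but not a threshold of 0.5.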
-
lib5c.algorithms.filtering.filter_functions.geometric_mean(check_threshold=0.2)
Constructs a filter function that uses the unweighted geometric mean as the aggregating function.
- Parameters
check_threshold (float) – If less than this fraction of the values in a neighborhood are positive, the filter function will return NaN.
- Returns
The constructed filter function. This function takes in a “neighborhood” and returns the filtered value given that neighborhood. A neighborhood is represented as a list of “nearby points” where each nearby point is represented as a dict of the following form:
{ 'value': float, 'x_dist': int, 'y_dist': int }
where ‘value’ is the value at the point and ‘x_dist’ and ‘y_dist’ are its distances from the center of the neighborhood along the x- and y-axis, respectively, in base pairs.
- Return type
Callable[[List[Dict[str, Any]]], float]
-
lib5c.algorithms.filtering.filter_functions.gmean_gaussian(sigma=1000.0, norm_ord=1, check_threshold=0.2)
Constructs a filter function that uses the geometric mean with Gaussian weights as the aggregating function and a p-norm as the norm function.
- Parameters
sigma (float) – The standard deviation to use for the Gaussian when assigning weights.
norm_ord (int) – The order of the p-norm to use to convert (x-dist, y-dist) vectors to scalar distances.
check_threshold (float) – If less than this fraction of the values in a neighborhood are positive, the filter function will return NaN.
- Returns
The constructed filter function. This function takes in a “neighborhood” and returns the filtered value given that neighborhood. A neighborhood is represented as a list of “nearby points” where each nearby point is represented as a dict of the following form:
{ 'value': float, 'x_dist': int, 'y_dist': int }
where ‘value’ is the value at the point and ‘x_dist’ and ‘y_dist’ are its distances from the center of the neighborhood along the x- and y-axis, respectively, in base pairs.
- Return type
Callable[[List[Dict[str, Any]]], float]
-
lib5c.algorithms.filtering.filter_functions.gmean_inverse(bin_width=4000, norm_ord=1, check_threshold=0.2)
Constructs a filter function that uses the geometric mean with “inverse” weights as the aggregating function and a p-norm as the norm function.
- Parameters
bin_width (int) – The bin width in base pairs.
norm_ord (int) – The order of the p-norm to use to convert (x-dist, y-dist) vectors to scalar distances.
check_threshold (float) – If less than this fraction of the values in a neighborhood are positive, the filter function will return NaN.
- Returns
The constructed filter function. This function takes in a “neighborhood” and returns the filtered value given that neighborhood. A neighborhood is represented as a list of “nearby points” where each nearby point is represented as a dict of the following form:
{ 'value': float, 'x_dist': int, 'y_dist': int }
where ‘value’ is the value at the point and ‘x_dist’ and ‘y_dist’ are its distances from the center of the neighborhood along the x- and y-axis, respectively, in base pairs.
- Return type
Callable[[List[Dict[str, Any]]], float]
-
lib5c.algorithms.filtering.filter_functions.inverse_weighting_function(distance, bin_width=None)
The “inverse” weighting function used in Yaffe and Tanay 2011.
- Parameters
distance (float) – The distance to compute a weight for, in base pairs.
bin_width (Optional[int]) – The bin width in base pairs. Used to make results equivalent to Yaffe and Tanay 2011 by scaling distance to units of bins. Pass None to leave the distance in units of base pairs.
- Returns
A weight appropriate for this distance.
- Return type
float
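This page does not state the weighting formula itself. The sketch below therefore assumes a weight of the form 1 / (1 + d), where d is the distance in bins when bin_width is given and in base pairs otherwise. Both that formula and the name inverse_weight_sketch are assumptions made for illustration; consult the library source for the actual definition.

```python
def inverse_weight_sketch(distance, bin_width=None):
    """Assumed inverse weight: 1 / (1 + d), with d optionally rescaled to bins."""
    if bin_width is not None:
        # Rescale from base pairs to units of bins.
        distance = distance / bin_width
    return 1.0 / (1.0 + distance)


weight_at_center = inverse_weight_sketch(0, bin_width=4000)
weight_two_bins_away = inverse_weight_sketch(8000, bin_width=4000)
```

Under this assumed form the weight is 1 at the center of the neighborhood and decays monotonically with distance, and rescaling by bin_width keeps the decay rate independent of the binning resolution.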
-
lib5c.algorithms.filtering.filter_functions.make_filter_function(function='gmean', threshold=0.0, norm_order=1, bin_width=4000, sigma=12000.0, inverse=False, gaussian=False)
Convenience function for quickly constructing filtering functions with desired properties.
- Parameters
function ({'sum', 'median', 'amean', 'gmean'}) – The aggregation function to use. This is the operation that will be applied to all points in the neighborhood, after weighting their values if appropriate.
threshold (float) – If less than this fraction of the values in a neighborhood are non-infinite, the filter function will return nan for that neighborhood.
norm_order (int) – The order of p-norm to use when computing distances.
bin_width (int) – The width of each bin in base pairs. This value is used to scale certain weights.
sigma (float) – The value to use for the standard deviation of the Gaussian when using Gaussian weights.
inverse (bool) – Pass True to use “inverse” weights as in Yaffe and Tanay 2011.
gaussian (bool) – Pass True to use Gaussian weights with standard deviation sigma.
- Returns
The constructed filter function. This function takes in a “neighborhood” and returns the filtered value given that neighborhood. A neighborhood is represented as a list of “nearby points” where each nearby point is represented as a dict of the following form:
{ 'value': float, 'x_dist': int, 'y_dist': int }
where ‘value’ is the value at the point and ‘x_dist’ and ‘y_dist’ are its distances from the center of the neighborhood along the x- and y-axis, respectively, in base pairs.
- Return type
Callable[[List[Dict[str, Any]]], float]
-
lib5c.algorithms.filtering.filter_functions.median(check_threshold=0.2)
Constructs a filter function that uses the median as the aggregating function.
- Parameters
check_threshold (float) – If less than this fraction of the values in a neighborhood are positive, the filter function will return NaN.
- Returns
The constructed filter function. This function takes in a “neighborhood” and returns the filtered value given that neighborhood. A neighborhood is represented as a list of “nearby points” where each nearby point is represented as a dict of the following form:
{ 'value': float, 'x_dist': int, 'y_dist': int }
where ‘value’ is the value at the point and ‘x_dist’ and ‘y_dist’ are its distances from the center of the neighborhood along the x- and y-axis, respectively, in base pairs.
- Return type
Callable[[List[Dict[str, Any]]], float]
-
lib5c.algorithms.filtering.filter_functions.norm_filter_function(weighted_function, norm_function, weighted_kwargs=None, norm_kwargs=None, pseudocount=0, check_function=None, check_threshold=None)
Constructs a filter function that passes the value and some distance norm (as specified by norm_function) for each point in the neighborhood to a special aggregation function capable of performing weighted aggregation based on these distances.
- Parameters
weighted_function (Callable[[List[Dict[str, float]]], float]) –
A special aggregation function that takes in a list of points represented as dicts with the following structure:
{ 'value': float, 'dist': float }
where ‘value’ is the interaction value at that point and ‘dist’ is its scalar distance from the center of the neighborhood. This function should then return a float representing the aggregate value of the neighborhood, weighted using the distances.
norm_function (Callable[[Tuple[int]], float]) – A function that takes in a tuple of ints representing the x- and y-axis distances of a point from the center of the neighborhood and returns a scalar value representing the distance.
weighted_kwargs (Optional[Dict[str, Any]]) – Kwargs to be passed to weighted_function.
norm_kwargs (Optional[Dict[str, Any]]) – Kwargs to be passed to norm_function.
pseudocount (float) – A pseudocount to be added to the values before applying the aggregation function. Useful if the aggregation function has catastrophic behavior when one input value is zero.
check_function (Optional[Callable[[List[Dict[str, Any]], float], bool]]) – A function that takes in a neighborhood and a threshold value and performs some sort of test on the neighborhood, returning False if the filter function should return NaN for the neighborhood because it fails some critical condition.
check_threshold (float) – The threshold to pass as the second arg to check_function.
- Returns
The constructed filter function. This function takes in a “neighborhood” and returns the filtered value given that neighborhood. A neighborhood is represented as a list of “nearby points” where each nearby point is represented as a dict of the following form:
{ 'value': float, 'x_dist': int, 'y_dist': int }
where ‘value’ is the value at the point and ‘x_dist’ and ‘y_dist’ are its distances from the center of the neighborhood along the x- and y-axis, respectively, in base pairs.
- Return type
Callable[[List[Dict[str, Any]]], float]
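The data flow described above (check the neighborhood, collapse each point's axis distances to a scalar, then hand value/dist pairs to the weighted aggregator) can be sketched with a closure. This is an illustration under stated assumptions rather than the library's code: norm_filter_sketch, manhattan, and nearest_value are hypothetical names, and the weighted_kwargs and norm_kwargs pass-through is omitted.

```python
import math


def norm_filter_sketch(weighted_function, norm_function, pseudocount=0,
                       check_function=None, check_threshold=None):
    """Build a filter function that feeds {'value', 'dist'} points to an aggregator."""
    def filter_function(neighborhood):
        # Return NaN early if the neighborhood fails the optional check.
        if check_function is not None and \
                not check_function(neighborhood, check_threshold):
            return float('nan')
        points = [
            {'value': p['value'] + pseudocount,
             'dist': norm_function((p['x_dist'], p['y_dist']))}
            for p in neighborhood
        ]
        return weighted_function(points)
    return filter_function


def manhattan(dists):
    """1-norm of an (x-dist, y-dist) tuple."""
    return float(sum(abs(d) for d in dists))


def nearest_value(points):
    """Toy aggregator: return the value of the point with the smallest distance."""
    return min(points, key=lambda p: p['dist'])['value']


neighborhood = [
    {'value': 5.0, 'x_dist': 4000, 'y_dist': 0},
    {'value': 7.0, 'x_dist': 0, 'y_dist': 0},
]
nearest_filter = norm_filter_sketch(nearest_value, manhattan)
result = nearest_filter(neighborhood)
```

The toy aggregator only picks the closest point, which keeps the example small; a realistic aggregator would convert each 'dist' to a weight and compute a weighted mean.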
-
lib5c.algorithms.filtering.filter_functions.simple_sum(check_threshold=0.2)
Constructs a filter function that uses a simple sum as the aggregating function.
- Parameters
check_threshold (float) – If less than this fraction of the values in a neighborhood are positive, the filter function will return NaN.
- Returns
The constructed filter function. This function takes in a “neighborhood” and returns the filtered value given that neighborhood. A neighborhood is represented as a list of “nearby points” where each nearby point is represented as a dict of the following form:
{ 'value': float, 'x_dist': int, 'y_dist': int }
where ‘value’ is the value at the point and ‘x_dist’ and ‘y_dist’ are its distances from the center of the neighborhood along the x- and y-axis, respectively, in base pairs.
- Return type
Callable[[List[Dict[str, Any]]], float]
-
lib5c.algorithms.filtering.filter_functions.value_filter_function(function, function_kwargs=None, pseudocount=0, check_function=None, check_threshold=None)
Constructs a filter function that passes the values in the neighborhood to an aggregation function.
- Parameters
function (Callable[[Sequence[float]], float]) – The aggregation function to use on the values in each neighborhood.
function_kwargs (Optional[Dict[str, Any]]) – Kwargs to be passed to function.
pseudocount (float) – A pseudocount to be added to the values before applying the aggregation function. Useful if the aggregation function has catastrophic behavior when one input value is zero.
check_function (Optional[Callable[[List[Dict[str, Any]], float], bool]]) – A function that takes in a neighborhood and a threshold value and performs some sort of test on the neighborhood, returning False if the filter function should return NaN for the neighborhood because it fails some critical condition.
check_threshold (float) – The threshold to pass as the second arg to check_function.
- Returns
The constructed filter function. This function takes in a “neighborhood” and returns the filtered value given that neighborhood. A neighborhood is represented as a list of “nearby points” where each nearby point is represented as a dict of the following form:
{ 'value': float, 'x_dist': int, 'y_dist': int }
where ‘value’ is the value at the point and ‘x_dist’ and ‘y_dist’ are its distances from the center of the neighborhood along the x- and y-axis, respectively, in base pairs.
- Return type
Callable[[List[Dict[str, Any]]], float]
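A minimal sketch of the unweighted case is shown here: the distances are ignored and only the values reach the aggregation function. The name value_filter_sketch is hypothetical and the body is an assumption about how such a wrapper could be written, not the library's implementation.

```python
import statistics


def value_filter_sketch(function, function_kwargs=None, pseudocount=0,
                        check_function=None, check_threshold=None):
    """Build a filter function that aggregates only the neighborhood's values."""
    kwargs = function_kwargs or {}

    def filter_function(neighborhood):
        # Return NaN early if the neighborhood fails the optional check.
        if check_function is not None and \
                not check_function(neighborhood, check_threshold):
            return float('nan')
        values = [p['value'] + pseudocount for p in neighborhood]
        return function(values, **kwargs)
    return filter_function


neighborhood = [
    {'value': 1.0, 'x_dist': 0, 'y_dist': 0},
    {'value': 3.0, 'x_dist': 4000, 'y_dist': 0},
    {'value': 10.0, 'x_dist': 0, 'y_dist': 4000},
]
median_filter = value_filter_sketch(statistics.median)
sum_filter = value_filter_sketch(sum, pseudocount=1)
median_value = median_filter(neighborhood)
sum_value = sum_filter(neighborhood)
```

Any callable that maps a sequence of floats to a float can be plugged in, which is why the median and the sum are handled by the same wrapper in this sketch.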
-
lib5c.algorithms.filtering.filter_functions.weighted_amean(values, weights)
Weighted version of the arithmetic mean.
- Parameters
values (Sequence[float]) – The values to aggregate.
weights (Sequence[float]) – The weights for each value.
- Returns
The weighted arithmetic mean of the values given the weights.
- Return type
float
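For reference, the standard definition of a weighted arithmetic mean is sum(w_i * v_i) / sum(w_i). The sketch below implements that textbook formula with a hypothetical name, weighted_amean_sketch; it is assumed, not confirmed by this page, that the library normalizes by the sum of the weights in the same way.

```python
def weighted_amean_sketch(values, weights):
    """Textbook weighted arithmetic mean: sum(w * v) / sum(w)."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


mean = weighted_amean_sketch([1.0, 3.0], [1.0, 3.0])
```

With values 1 and 3 weighted 1 and 3, the result is (1 + 9) / 4 = 2.5, which is pulled toward the more heavily weighted value.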
-
lib5c.algorithms.filtering.filter_functions.weighted_gmean(values, weights)
Weighted version of the geometric mean.
- Parameters
values (Sequence[float]) – The values to aggregate.
weights (Sequence[float]) – The weights for each value.
- Returns
The weighted geometric mean of the values given the weights.
- Return type
float
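The standard weighted geometric mean is exp(sum(w_i * ln(v_i)) / sum(w_i)). The sketch below implements that textbook formula under the hypothetical name weighted_gmean_sketch; whether the library computes it in exactly this way is an assumption. It requires strictly positive values, which is one reason a pseudocount or a positive-fraction check is useful upstream.

```python
import math


def weighted_gmean_sketch(values, weights):
    """Textbook weighted geometric mean; all values must be strictly positive."""
    log_sum = sum(w * math.log(v) for v, w in zip(values, weights))
    return math.exp(log_sum / sum(weights))


equal_weights = weighted_gmean_sketch([2.0, 8.0], [1.0, 1.0])
skewed_weights = weighted_gmean_sketch([2.0, 8.0], [3.0, 1.0])
```

With equal weights the result is the ordinary geometric mean, sqrt(2 * 8) = 4; shifting weight toward the value 2 pulls the result below 4.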
-
lib5c.algorithms.filtering.filter_functions.weighted_values_distances_function(weighting_function, aggregating_function, weighting_kwargs=None, aggregating_kwargs=None, cache=True)
Constructs a weighted aggregation function appropriate for use with norm_filter_function().
- Parameters
weighting_function (Callable[[float], float]) – A function that takes in a distance and returns a weight.
aggregating_function (Callable[[Sequence[float], Sequence[float]], float]) – A special aggregating function that takes in the values and the weights as parallel vectors and returns the aggregated value.
weighting_kwargs (Optional[Dict[str, Any]]) – Kwargs to be passed to weighting_function.
aggregating_kwargs (Optional[Dict[str, Any]]) – Kwargs to be passed to aggregating_function.
cache (bool) – Pass True to make the returned function use a cache to avoid recomputing expensive weighting function calls.
- Returns
A special aggregation function that takes in a list of points represented as dicts with the following structure:
{ 'value': float, 'dist': float }
where ‘value’ is the interaction value at that point and ‘dist’ is its scalar distance from the center of the neighborhood. This function returns a float representing the aggregate value of the neighborhood, weighted using the distances.
- Return type
Callable[[List[Dict[str, float]]], float]
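One way such a constructor could be written is sketched below, with a plain dict used as the weight cache. All names here (weighted_values_distances_sketch, counting_weight, weighted_mean) are hypothetical, the kwargs pass-through is omitted, and the toy weighting function is an arbitrary choice made only to show the caching behavior.

```python
def weighted_values_distances_sketch(weighting_function, aggregating_function,
                                     cache=True):
    """Build an aggregator over {'value', 'dist'} points, caching weights by distance."""
    weight_cache = {}

    def weight_for(dist):
        if not cache:
            return weighting_function(dist)
        if dist not in weight_cache:
            weight_cache[dist] = weighting_function(dist)
        return weight_cache[dist]

    def weighted_function(points):
        values = [p['value'] for p in points]
        weights = [weight_for(p['dist']) for p in points]
        return aggregating_function(values, weights)
    return weighted_function


calls = []


def counting_weight(dist):
    """Toy weighting function that records each distance it is asked about."""
    calls.append(dist)
    return 1.0 / (1.0 + dist)


def weighted_mean(values, weights):
    """Weighted arithmetic mean over parallel value and weight sequences."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


aggregate = weighted_values_distances_sketch(counting_weight, weighted_mean)
result = aggregate([
    {'value': 2.0, 'dist': 0.0},
    {'value': 4.0, 'dist': 1.0},
    {'value': 6.0, 'dist': 1.0},
])
```

Two of the three points share a distance, so with the cache enabled the weighting function is evaluated only twice. Since the same small set of distances recurs for every neighborhood in a regularly binned matrix, caching by distance can save most of the weighting calls.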