Binning and smoothing
To reduce spatial noise in 5C data, we can treat the 5C contact frequencies as a 2-D signal that can be smoothed with various filtering functions. Depending on the original coordinates of the contact matrix and the coordinates we choose to evaluate the filtered signal at, this process can be referred to as “binning” or “smoothing”.
Theoretical overview
We will create “filtering functions” to pass over the contact matrices. So that the same filtering functions can be applied to data at any level (fragment-level or binned), we require that they compute values on “neighborhoods” of points, where a neighborhood is defined by spatial proximity to the point at which we want to evaluate the filtered signal.
Command-line interfaces
Command-line interfaces for binning and smoothing countsfiles are provided directly in lib5c.
Binning
If we have a fragment-level countsfile called fragment_level.counts, a primer bedfile called primers.bed, and a binned bedfile called bins.bed, we can run
$ lib5c bin -w 20000 -p primers.bed -b bins.bed fragment_level.counts binned.counts
to bin the counts using a 20 kb window width.
For a complete list of command-line flags for the lib5c bin subcommand, we can run
$ lib5c bin -h
Smoothing
If we have a countsfile called unsmoothed.counts and a bedfile called loci.bed, we can run
$ lib5c smooth -w 20000 -p loci.bed -b bins.bed unsmoothed.counts smoothed.counts
to smooth the counts using a 20 kb window width.
For a complete list of command-line flags for the lib5c smooth subcommand, we can run
$ lib5c smooth -h
Exposed functionality
The algorithms that make up the filtering can be found in the lib5c.algorithms.filtering subpackage.
Core API
The core API for filtering is in the form of three convenience functions:
lib5c.algorithms.filtering.bin_bin_filtering.bin_bin_filter()
lib5c.algorithms.filtering.fragment_fragment_filtering.fragment_fragment_filter()
lib5c.algorithms.filtering.fragment_bin_filtering.fragment_bin_filter()
Each of these functions takes in a counts matrix and returns a filtered counts matrix. In addition to the matrix, each requires three inputs: a filter function (described in detail below), information about the loci involved in the filtering, and a distance threshold that defines the neighborhood of points passed to the filter function.
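To make the roles of these inputs concrete, here is a minimal pure-Python sketch of bin-to-bin filtering. It is not lib5c's implementation: the function name sketch_bin_bin_filter, the use of bin midpoints as the locus information, the square neighborhood (a point is included when both its x- and y-distance are within the threshold), and the use of absolute distances are all assumptions made for illustration. The neighborhood passed to the filter function uses the list-of-dicts representation described under "Filter function API".

```python
from typing import Any, Callable, Dict, List

Neighborhood = List[Dict[str, Any]]


def mean_filter(neighborhood: Neighborhood) -> float:
    """Toy filter function: unweighted mean of the neighborhood values."""
    if not neighborhood:
        return float('nan')
    return sum(point['value'] for point in neighborhood) / len(neighborhood)


def sketch_bin_bin_filter(
    matrix: List[List[float]],
    midpoints: List[int],
    filter_function: Callable[[Neighborhood], float],
    threshold: int,
) -> List[List[float]]:
    """Hypothetical stand-in for a bin-bin filter (NOT lib5c's code).

    matrix[i][j] is the contact frequency between bins i and j, and
    midpoints[i] is the genomic midpoint of bin i in base pairs (assumed).
    """
    n = len(midpoints)
    filtered = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            neighborhood = []
            for k in range(n):
                x_dist = abs(midpoints[k] - midpoints[i])
                if x_dist > threshold:
                    continue
                for m in range(n):
                    y_dist = abs(midpoints[m] - midpoints[j])
                    if y_dist > threshold:
                        continue
                    neighborhood.append(
                        {'value': matrix[k][m], 'x_dist': x_dist, 'y_dist': y_dist}
                    )
            # The filter function reduces the neighborhood to one scalar.
            filtered[i][j] = filter_function(neighborhood)
    return filtered


# Three 4 kb bins with midpoints at 2 kb, 6 kb, and 10 kb.
midpoints = [2000, 6000, 10000]
matrix = [
    [9.0, 3.0, 1.0],
    [3.0, 9.0, 3.0],
    [1.0, 3.0, 9.0],
]
smoothed = sketch_bin_bin_filter(matrix, midpoints, mean_filter, threshold=4000)
```

With a 4 kb threshold, the corner entry smoothed[0][0] averages the four values in the top-left 2x2 block, while the central entry averages the whole matrix. The quadruple loop is written for readability rather than speed.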
Filter function API
Filter functions must take in a representation of a “neighborhood” around a point and return a scalar value representing the evaluation of the filtered signal at that point. A neighborhood is represented as a list of “nearby points” where each nearby point is represented as a dict of the following form:
{
    'value': float,
    'x_dist': int,
    'y_dist': int
}
where ‘value’ is the value at the point and ‘x_dist’ and ‘y_dist’ are its distances from the center of the neighborhood along the x- and y-axis, respectively, in base pairs.
More formally, the Python type annotation for a filter function is:
Callable[[List[Dict[str, Any]]], float]
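As an illustration of this signature, the following sketch defines a hand-written filter function that returns a Gaussian-weighted mean of the neighborhood. The name gaussian_weighted_mean, the default sigma of 8000 bp, and the choice to skip NaN values are assumptions made for this example and are not taken from lib5c.

```python
import math
from typing import Any, Dict, List


def gaussian_weighted_mean(
    neighborhood: List[Dict[str, Any]], sigma: float = 8000.0
) -> float:
    """Hypothetical filter function: Gaussian-weighted mean of the values.

    Each nearby point is weighted by exp(-d^2 / (2 * sigma^2)), where d is
    its Euclidean distance from the center of the neighborhood in base pairs.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for point in neighborhood:
        value = point['value']
        if math.isnan(value):
            continue  # assumption: missing values do not contribute
        squared_dist = point['x_dist'] ** 2 + point['y_dist'] ** 2
        weight = math.exp(-squared_dist / (2.0 * sigma ** 2))
        total_weight += weight
        weighted_sum += weight * value
    if total_weight == 0.0:
        return float('nan')  # empty neighborhood or all values missing
    return weighted_sum / total_weight


example_neighborhood = [
    {'value': 2.0, 'x_dist': 4000, 'y_dist': 0},
    {'value': 4.0, 'x_dist': 0, 'y_dist': 4000},
]
result = gaussian_weighted_mean(example_neighborhood)
```

Because sigma has a default value, the function can still be called with the neighborhood as its only argument, so it fits the Callable type above. The two example points are equidistant from the center, so they receive equal weight.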
The lib5c.algorithms.filtering.filter_functions module provides a framework for the construction of filter functions with desired properties. The key function exposed there is lib5c.algorithms.filtering.filter_functions.make_filter_function(), which constructs a filter function according to the specification in the kwargs and returns it.
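The general pattern of a factory that builds and returns a filter function can be sketched with a closure. The sketch below does not reproduce make_filter_function() or its kwargs: sketch_make_filter and its parameters statistic and max_dist are hypothetical names chosen only to show the pattern.

```python
import statistics
from typing import Any, Callable, Dict, List, Optional

FilterFunction = Callable[[List[Dict[str, Any]]], float]


def sketch_make_filter(
    statistic: str = 'mean', max_dist: Optional[int] = None
) -> FilterFunction:
    """Hypothetical factory (NOT lib5c's make_filter_function).

    statistic: 'mean' or 'median', the reduction applied to the values.
    max_dist: if given, ignore points farther than this along either axis.
    """
    reducers = {'mean': statistics.mean, 'median': statistics.median}
    if statistic not in reducers:
        raise ValueError('unknown statistic: %s' % statistic)
    reduce_values = reducers[statistic]

    def filter_function(neighborhood: List[Dict[str, Any]]) -> float:
        values = [
            point['value']
            for point in neighborhood
            if max_dist is None
            or (abs(point['x_dist']) <= max_dist
                and abs(point['y_dist']) <= max_dist)
        ]
        if not values:
            return float('nan')
        return float(reduce_values(values))

    return filter_function


points = [
    {'value': 1.0, 'x_dist': 0, 'y_dist': 0},
    {'value': 5.0, 'x_dist': 4000, 'y_dist': 0},
    {'value': 100.0, 'x_dist': 8000, 'y_dist': 0},
]
median_filter = sketch_make_filter('median')
local_mean_filter = sketch_make_filter('mean', max_dist=4000)
median_value = median_filter(points)
local_mean_value = local_mean_filter(points)
```

The configuration is captured once when the factory is called, so the returned function keeps the single-argument signature that filter functions are required to have.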