Binning and smoothing

To reduce spatial noise in 5C data, we can treat the 5C contact frequencies as a 2-D signal that can be smoothed with various filtering functions. Depending on the original coordinates of the contact matrix and the coordinates at which we choose to evaluate the filtered signal, this process is referred to as either “binning” or “smoothing”.

Theoretical overview

We will create “filtering functions” to pass over the contact matrices. So that the same filtering functions can be applied to data at any level (for example, fragment-level or binned), we require that they compute values on “neighborhoods” of points, where a neighborhood is defined by spatial proximity to the point at which we want to evaluate the filtered signal.
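As a rough illustration of the neighborhood idea, the sketch below filters a 1-D signal rather than a 2-D contact matrix. The function name `evaluate_filter`, the `half_width` parameter, and the toy data are assumptions made for this example and are not part of lib5c. Under one natural reading of the terminology, evaluating the filter at the original coordinates corresponds to smoothing, while evaluating it at a new, coarser set of coordinates corresponds to binning.

```python
from statistics import mean


def evaluate_filter(positions, values, targets, half_width, filter_function):
    """Apply filter_function to the values found near each target coordinate.

    Hypothetical helper for illustration only; not a lib5c function.
    """
    result = []
    for target in targets:
        # The neighborhood is every source point within half_width of target.
        neighborhood = [
            value
            for position, value in zip(positions, values)
            if abs(position - target) <= half_width
        ]
        # An empty neighborhood has no defined filtered value.
        result.append(filter_function(neighborhood) if neighborhood else float('nan'))
    return result


# Toy 1-D signal sampled at four coordinates.
positions = [0, 10, 20, 30]
values = [1.0, 3.0, 5.0, 7.0]

# Same coordinates in and out: the signal is smoothed.
smoothed = evaluate_filter(positions, values, positions, 10, mean)

# New, coarser coordinates out: the signal is binned.
binned = evaluate_filter(positions, values, [5, 25], 10, mean)
```

Because the filter only ever sees a neighborhood, the same `filter_function` serves both cases; only the choice of target coordinates changes.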

Command-line interfaces

Command-line interfaces for binning and smoothing countsfiles are provided directly in lib5c.

Binning

If we have a fragment-level countsfile called fragment_level.counts, a primer bedfile called primers.bed, and a binned bedfile called bins.bed, we can run

$ lib5c bin -w 20000 -p primers.bed -b bins.bed fragment_level.counts binned.counts

to bin the counts using a 20 kb window width.

For a complete list of command-line flags for the lib5c bin subcommand, we can run

$ lib5c bin -h

Smoothing

If we have a countsfile called unsmoothed.counts and a bedfile called loci.bed, we can run

$ lib5c smooth -w 20000 -p loci.bed -b bins.bed unsmoothed.counts smoothed.counts

to smooth the counts using a 20 kb window width.

For a complete list of command-line flags for the lib5c smooth subcommand, we can run

$ lib5c smooth -h

Exposed functionality

The filtering algorithms can be found in the lib5c.algorithms.filtering subpackage.

Core API

The core API for filtering is in the form of three convenience functions:

These functions take in a counts matrix and return a filtered counts matrix. In addition to the counts matrix, they take a filter function (described in detail below), information about the loci involved in the filtering, and a distance threshold that defines the neighborhood of points passed to the filter function.
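To make the roles of these four inputs concrete, here is a minimal sketch of a driver that evaluates a filter function at every entry of a square counts matrix. It is not lib5c's implementation: the name `filter_matrix`, the use of locus midpoints to measure distance, the assumption that each locus is a dict with 'start' and 'end' keys, and the packaging of each nearby point as a dict holding its value and its x and y distances are all choices made for this example.

```python
import math


def filter_matrix(counts, loci, filter_function, threshold):
    """Return a filtered copy of a square counts matrix (list of lists).

    Hypothetical helper for illustration only; not a lib5c function.
    """
    # Assumption: distances are measured between locus midpoints, in bp.
    mids = [(locus['start'] + locus['end']) // 2 for locus in loci]
    n = len(mids)
    filtered = [[math.nan] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Collect every finite point within threshold along both axes.
            neighborhood = [
                {
                    'value': float(counts[a][b]),
                    'x_dist': abs(mids[a] - mids[i]),
                    'y_dist': abs(mids[b] - mids[j]),
                }
                for a in range(n)
                for b in range(n)
                if abs(mids[a] - mids[i]) <= threshold
                and abs(mids[b] - mids[j]) <= threshold
                and math.isfinite(counts[a][b])
            ]
            if neighborhood:
                filtered[i][j] = filter_function(neighborhood)
    return filtered


def mean_of_values(neighborhood):
    """Simplest possible filter function: the unweighted mean."""
    return sum(point['value'] for point in neighborhood) / len(neighborhood)


loci = [
    {'start': 0, 'end': 10},
    {'start': 10, 'end': 20},
    {'start': 20, 'end': 30},
]
counts = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 5.0],
    [3.0, 5.0, 6.0],
]
smoothed = filter_matrix(counts, loci, mean_of_values, threshold=10)
```

This brute-force version scans the whole matrix for every output entry, which is fine for a toy example but would be slow on real 5C regions.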

Filter function API

Filter functions must take in a representation of a “neighborhood” around a point and return a scalar value representing the evaluation of the filtered signal at that point. A neighborhood is represented as a list of “nearby points” where each nearby point is represented as a dict of the following form:

{
    'value': float,
    'x_dist': int,
    'y_dist': int
}

where ‘value’ is the value at the point and ‘x_dist’ and ‘y_dist’ are its distances from the center of the neighborhood along the x- and y-axis, respectively, in base pairs.

More formally, the Python type annotation for a filter function is:

Callable[[List[Dict[str, Any]]], float]
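As an example of what satisfies this signature, the sketch below defines two hand-written filter functions. Both names, the choice to skip non-finite values, and the inverse-distance weighting scheme (with distances rescaled to kb) are assumptions made for illustration; they are not filter functions shipped with lib5c.

```python
import math
from typing import Any, Dict, List

Neighborhood = List[Dict[str, Any]]


def arithmetic_mean_filter(neighborhood: Neighborhood) -> float:
    """Unweighted mean of the finite values in the neighborhood."""
    values = [p['value'] for p in neighborhood if math.isfinite(p['value'])]
    return sum(values) / len(values) if values else math.nan


def inverse_distance_filter(neighborhood: Neighborhood) -> float:
    """Weighted mean in which closer points count more.

    Each weight is 1 / (1 + d), where d is the sum of the x and y
    distances expressed in kb (an arbitrary choice for this sketch).
    """
    numerator = 0.0
    denominator = 0.0
    for p in neighborhood:
        if not math.isfinite(p['value']):
            continue
        weight = 1.0 / (1.0 + (p['x_dist'] + p['y_dist']) / 1000.0)
        numerator += weight * p['value']
        denominator += weight
    return numerator / denominator if denominator > 0 else math.nan


example_neighborhood = [
    {'value': 4.0, 'x_dist': 0, 'y_dist': 0},
    {'value': 2.0, 'x_dist': 1000, 'y_dist': 0},
    {'value': math.nan, 'x_dist': 0, 'y_dist': 2000},
]
plain = arithmetic_mean_filter(example_neighborhood)
weighted = inverse_distance_filter(example_neighborhood)
```

The weighted result sits closer to the central value than the plain mean does, because the point 1 kb away receives half the weight of the point at the center.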

The lib5c.algorithms.filtering.filter_functions module provides a framework for the construction of filter functions with desired properties. The key function exposed there is

lib5c.algorithms.filtering.filter_functions.make_filter_function()

which constructs and returns a filter function according to the specification given in its keyword arguments.
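The general pattern of building a filter function from keyword arguments can be sketched with a closure. The factory below is a hypothetical stand-in written for this example: its name `make_gaussian_filter` and its `sigma` and `ignore_nan` parameters are assumptions, and it does not reproduce the keyword arguments or behavior of lib5c's make_filter_function().

```python
import math


def make_gaussian_filter(sigma=10000.0, ignore_nan=True):
    """Build a Gaussian-weighted mean filter function.

    Hypothetical factory for illustration only; not part of lib5c.
    sigma is the standard deviation of the Gaussian weights, in bp.
    """

    def gaussian_filter(neighborhood):
        numerator = 0.0
        denominator = 0.0
        for point in neighborhood:
            if math.isnan(point['value']):
                if ignore_nan:
                    continue
                # Without ignore_nan, one missing value poisons the result.
                return math.nan
            squared_dist = point['x_dist'] ** 2 + point['y_dist'] ** 2
            weight = math.exp(-squared_dist / (2.0 * sigma ** 2))
            numerator += weight * point['value']
            denominator += weight
        return numerator / denominator if denominator > 0 else math.nan

    # The returned closure remembers sigma and ignore_nan.
    return gaussian_filter


gaussian_1kb = make_gaussian_filter(sigma=1000.0)
center_only = gaussian_1kb([{'value': 5.0, 'x_dist': 0, 'y_dist': 0}])
equidistant = gaussian_1kb([
    {'value': 2.0, 'x_dist': 1000, 'y_dist': 0},
    {'value': 6.0, 'x_dist': 0, 'y_dist': 1000},
])
```

A factory of this kind keeps the filter function itself down to a single argument, the neighborhood, which is what the filter function API requires.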