Trimming

An important early preprocessing step is the removal of low-quality primers from the dataset.

Command-line interface

Primer trimming can be accomplished on the command line by running

$ lib5c trim

For complete details on the usage of this command, see the output of

$ lib5c trim -h

Exposed functionality

The algorithms which make up the primer trimming framework can be found in the lib5c.algorithms.trimming subpackage.

The core API is exposed in the following convenience functions: trim_primers(), trim_counts(), and wipe_counts().

The functions wipe_counts() and trim_counts() also have convenience wrappers that apply them over a counts superdict (a dict of counts dicts whose first-level keys are replicate names): wipe_counts_superdict() and trim_counts_superdict().

Workflow

The general workflow is to trim primers first (based on the quality of the counts matrices in the dataset), and then either trim or wipe those counts matrices:

from lib5c.algorithms.trimming import trim_primers, trim_counts_superdict

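# identify low-quality primers; counts_superdict itself is not modified
trimmed_primermap, trimmed_indices = trim_primers(primermap, counts_superdict)
# remove the corresponding rows and columns from every replicate's matrices
trimmed_counts_superdict = trim_counts_superdict(counts_superdict, trimmed_indices)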

The call to trim_primers() does not modify the counts_superdict, leaving the caller to decide whether to trim or wipe the counts afterwards.
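As a rough illustration of the data shapes involved, the sketch below builds a toy counts superdict and applies a per-replicate function over it, which is the pattern the superdict wrappers are described as following. It uses only NumPy. The names drop_indices and apply_over_superdict are hypothetical and are not part of lib5c, and the layout (a counts dict mapping region names to square matrices, and removed indices grouped by region) is an assumption made for this example.

```python
import numpy as np


def drop_indices(counts, removed):
    """Return a new counts dict with the given rows and columns removed.

    counts:  dict mapping region name -> square 2D array (assumed layout)
    removed: dict mapping region name -> set of indices to drop (assumed layout)
    """
    result = {}
    for region, matrix in counts.items():
        keep = [i for i in range(matrix.shape[0])
                if i not in removed.get(region, set())]
        # np.ix_ selects the same indices along both axes
        result[region] = matrix[np.ix_(keep, keep)]
    return result


def apply_over_superdict(fn, superdict, *args, **kwargs):
    """Apply fn to every counts dict in a superdict, keyed by replicate name."""
    return {rep: fn(counts, *args, **kwargs)
            for rep, counts in superdict.items()}


# Toy superdict: two replicates, one region with four primers.
counts_superdict = {
    'rep1': {'regionA': np.arange(16, dtype=float).reshape(4, 4)},
    'rep2': {'regionA': np.ones((4, 4))},
}
removed = {'regionA': {1}}

trimmed = apply_over_superdict(drop_indices, counts_superdict, removed)
print(trimmed['rep1']['regionA'].shape)           # (3, 3)
print(counts_superdict['rep1']['regionA'].shape)  # (4, 4), input untouched
```

Returning new dicts instead of mutating the input mirrors the non-modifying behavior described above for trim_primers(); whether the real wrappers copy their input is not stated here.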

Trimming versus wiping

trim_counts() removes rows and columns from the matrices in the counts dict, so that the dimensions of these matrices match the lengths of the values of trimmed_primermap. This is the recommended way to handle the removal of low-quality fragments.

wipe_counts() does not change the dimensions of any matrix; instead, it overwrites the entries at the removed indices with the value given by its wipe_value kwarg. This can be useful when removing low-quality regions from already-binned data, for example:

from lib5c.algorithms.trimming import trim_primers, wipe_counts_superdict

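# identify low-quality indices; the trimmed pixelmap is discarded
_, trimmed_indices = trim_primers(pixelmap, counts_superdict)
# overwrite the flagged rows and columns instead of removing them
wiped_counts_superdict = wipe_counts_superdict(counts_superdict, trimmed_indices)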

Notice that we discard the trimmed pixelmap returned by trim_primers(): wiping leaves the matrix dimensions unchanged, so the trimmed pixelmap’s dimensions would not match any of the counts dicts.
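To make the difference between trimming and wiping concrete, here is a small NumPy-only sketch on a single matrix. It does not call lib5c. It assumes that the removed indices are the ones flagged as low quality, and the wipe value of NaN is an illustrative choice, not necessarily the library's default.

```python
import numpy as np

matrix = np.arange(1.0, 17.0).reshape(4, 4)
removed = {1, 3}  # indices flagged as low quality (assumed meaning)

# Trimming: drop the flagged rows and columns, shrinking the matrix.
keep = [i for i in range(matrix.shape[0]) if i not in removed]
trimmed = matrix[np.ix_(keep, keep)]

# Wiping: keep the shape, overwrite the flagged rows and columns.
wipe_value = np.nan  # illustrative choice for this sketch
wiped = matrix.copy()
idx = sorted(removed)
wiped[idx, :] = wipe_value
wiped[:, idx] = wipe_value

print(trimmed.shape)  # (2, 2)
print(wiped.shape)    # (4, 4)
```

After trimming, index positions shift, so any accompanying map of primers or bins has to be trimmed as well. After wiping, the original map still lines up with the matrix.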

Trimming options

There are two ways to assess the quality of a primer: its total cis contact count (its row sum in the counts matrix) or the fraction of its possible interactions that are nonzero. trim_primers() applies a threshold to each metric through its two kwargs, min_sum and min_frac, respectively.
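The two metrics can be sketched on a single counts matrix as follows. This is a simplified NumPy illustration, not lib5c's implementation: flag_low_quality is a hypothetical helper, its default thresholds are made up, it treats every entry in a row as a possible interaction, and it flags a primer when either metric falls strictly below its threshold. All of these are assumptions.

```python
import numpy as np


def flag_low_quality(matrix, min_sum=100.0, min_frac=0.5):
    """Return the set of row indices failing either quality threshold.

    Hypothetical helper: the default thresholds, the strict '<' comparisons,
    and counting every entry in the row as a possible interaction are
    assumptions made for this sketch.
    """
    row_sums = matrix.sum(axis=1)                # total contact count per primer
    nonzero_frac = (matrix != 0).mean(axis=1)    # fraction of nonzero entries
    failing = (row_sums < min_sum) | (nonzero_frac < min_frac)
    return set(np.flatnonzero(failing).tolist())


counts = np.array([
    [0, 50, 60, 0],
    [50, 0, 0, 0],
    [60, 0, 0, 70],
    [0, 0, 70, 0],
], dtype=float)

# Row sums are 110, 50, 130, 70; nonzero fractions are 0.5, 0.25, 0.5, 0.25.
print(flag_low_quality(counts))  # {1, 3}
```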