Trimming¶
An important early preprocessing step is the removal of low-quality primers from the dataset.
Command-line interface¶
Primer trimming can be accomplished on the command line by running
$ lib5c trim
For complete details on the usage of this command, see the output of
$ lib5c trim -h
Exposed functionality¶
The algorithms which make up the primer trimming framework can be found in the lib5c.algorithms.trimming subpackage.
The core API is exposed in the following convenience functions: trim_primers(), trim_counts(), and wipe_counts().
The functions wipe_counts() and trim_counts() also have convenience wrappers which apply them over a counts superdict (a dict of counts dicts, whose first-level keys are replicate names). These wrappers are wipe_counts_superdict() and trim_counts_superdict().
Workflow¶
The general workflow is to trim primers first (based on the quality of the counts matrices in the dataset), and then either trim or wipe those counts matrices:
from lib5c.algorithms.trimming import trim_primers, trim_counts_superdict
trimmed_primermap, trimmed_indices = trim_primers(primermap, counts_superdict)
trimmed_counts_superdict = trim_counts_superdict(counts_superdict, trimmed_indices)
The call to trim_primers() does not modify the counts_superdict, leaving the client to decide what to do next.
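To make the superdict wrappers used in this workflow more concrete, here is a minimal pure-Python sketch of the pattern they follow. It is not lib5c's implementation: `drop_indices()` and `over_superdict()` are hypothetical helpers, nested lists stand in for the counts matrices, and the layout of the removed indices (a dict mapping region names to sets of indices) is an assumption made for illustration.

```python
def drop_indices(counts, removed_indices):
    """Hypothetical per-replicate step: drop rows/columns from each region."""
    trimmed = {}
    for region, matrix in counts.items():
        removed = removed_indices.get(region, set())
        keep = [i for i in range(len(matrix)) if i not in removed]
        trimmed[region] = [[matrix[i][j] for j in keep] for i in keep]
    return trimmed


def over_superdict(per_counts_fn, counts_superdict, removed_indices):
    """Apply a per-counts-dict function to every replicate in a superdict."""
    return {
        rep: per_counts_fn(counts, removed_indices)
        for rep, counts in counts_superdict.items()
    }


# Toy superdict: first-level keys are replicate names; the second-level
# keys are assumed here to be region names.
counts_superdict = {
    "rep1": {"regionA": [[0, 5, 2], [5, 0, 0], [2, 0, 0]]},
    "rep2": {"regionA": [[0, 4, 1], [4, 0, 0], [1, 0, 0]]},
}
removed_indices = {"regionA": {2}}  # assumed layout: region -> set of indices

trimmed_superdict = over_superdict(drop_indices, counts_superdict, removed_indices)
```

The sketch builds new dicts and lists rather than editing the input, so the original superdict remains available if a different treatment is chosen later.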
Trimming versus wiping¶
trim_counts() removes rows and columns from the matrices in the counts dict, so that the dimensions of each matrix match the length of the corresponding value of trimmed_primermap. This is the recommended way to remove low-quality fragments.
wipe_counts() does not change the dimensions of any matrix; instead, it overwrites the entries at the removed indices with the value given by its wipe_value kwarg. This can be useful when removing low-quality regions from already-binned data, for example:
from lib5c.algorithms.trimming import trim_primers, wipe_counts_superdict
_, trimmed_indices = trim_primers(pixelmap, counts_superdict)
wiped_counts_superdict = wipe_counts_superdict(counts_superdict, trimmed_indices)
Notice that we discard the trimmed_pixelmap returned by the first function call, because its dimensions do not match those of the wiped counts matrices, which keep their original size.
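The difference between the two treatments can be seen on a single small matrix. The following is an illustrative sketch only: `trim_matrix()` and `wipe_matrix()` are hypothetical helpers rather than lib5c functions, nested lists stand in for the counts matrices, and the use of NaN as the wipe value is an assumption chosen for the example.

```python
import math


def trim_matrix(matrix, removed):
    """Drop the rows and columns listed in `removed`; the matrix shrinks."""
    keep = [i for i in range(len(matrix)) if i not in removed]
    return [[matrix[i][j] for j in keep] for i in keep]


def wipe_matrix(matrix, removed, wipe_value=math.nan):
    """Overwrite removed rows and columns with `wipe_value`; size is unchanged."""
    n = len(matrix)
    return [
        [wipe_value if (i in removed or j in removed) else matrix[i][j]
         for j in range(n)]
        for i in range(n)
    ]


matrix = [
    [0, 5, 2],
    [5, 0, 1],
    [2, 1, 0],
]
removed = {1}  # index of the low-quality primer or bin

trimmed = trim_matrix(matrix, removed)  # 2x2: row and column 1 are gone
wiped = wipe_matrix(matrix, removed)    # 3x3: row and column 1 hold NaN
```

Because the wiped matrix keeps its original shape, it can still be indexed with the untrimmed pixelmap, which is why the binned-data example above keeps the original map.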
Trimming options¶
There are two ways to assess the quality of a primer: its total cis contact count (its row sum in the counts matrix), or the fraction of its possible interactions which are nonzero. These two quality metrics are thresholded by two kwargs of trim_primers(): min_sum and min_frac, respectively.
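As a rough illustration of the two metrics, the sketch below computes them for each row of one toy matrix and flags the rows that fall below either threshold. It is not the lib5c implementation: `primer_quality()` and `low_quality_indices()` are hypothetical helpers, the threshold values are arbitrary, and several details are assumptions (a primer is flagged if it fails either threshold, comparisons are strict, and the fraction is taken over all entries in the row).

```python
def primer_quality(matrix):
    """Return a (row_sum, nonzero_fraction) pair for each row of a square matrix."""
    n = len(matrix)
    return [
        (sum(row), sum(1 for value in row if value != 0) / n)
        for row in matrix
    ]


def low_quality_indices(matrix, min_sum, min_frac):
    """Indices of rows failing either threshold (assumed semantics)."""
    return {
        i
        for i, (row_sum, frac) in enumerate(primer_quality(matrix))
        if row_sum < min_sum or frac < min_frac
    }


matrix = [
    [0, 60, 50],
    [60, 0, 0],
    [50, 0, 0],
]

# Row sums are 110, 60, 50; nonzero fractions are 2/3, 1/3, 1/3.
flagged = low_quality_indices(matrix, min_sum=100, min_frac=0.5)
```

With these toy thresholds, rows 1 and 2 are flagged and row 0 is kept; lowering both thresholds to zero flags nothing.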