lib5c.tools.helpers module
-
lib5c.tools.helpers.infer_level_mapping(rep_names, triggers)
Infers a mapping from replicate names to level names (i.e., classes or conditions) using a simple trigger substring approach.
A replicate is assigned to a level if the level’s trigger substring is a substring of the replicate name.
- Parameters
rep_names (list of str) – The replicate names to assign levels to.
triggers (dict or list of str) – Pass a dict mapping trigger substrings to level names, or pass a list of level names to use the level names as their own trigger substrings.
- Returns
A mapping from rep_names to level names.
- Return type
dict
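The trigger-substring rule described above can be illustrated with a minimal sketch. This is not the library's implementation: the name sketch_infer_level_mapping is hypothetical, and the choice to let the first matching trigger win is an assumption, since the documentation does not say how overlapping triggers are resolved.

```python
def sketch_infer_level_mapping(rep_names, triggers):
    """Hypothetical illustration of the trigger-substring idea."""
    # A list of level names means each level name acts as its own trigger.
    if not isinstance(triggers, dict):
        triggers = {level: level for level in triggers}
    mapping = {}
    for rep in rep_names:
        for trigger, level in triggers.items():
            if trigger in rep:
                mapping[rep] = level
                break  # assumption: the first matching trigger wins
    return mapping


reps = ["ES_Rep1", "ES_Rep2", "NPC_Rep1"]
# Level names double as triggers.
by_list = sketch_infer_level_mapping(reps, ["ES", "NPC"])
# Trigger substrings map to differently named levels.
by_dict = sketch_infer_level_mapping(reps, {"ES": "pluripotent", "NPC": "neural"})
```

In this sketch a replicate whose name contains no trigger is simply left out of the result; whether the real function does the same is not stated here.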
-
lib5c.tools.helpers.infer_replicate_names(infiles, as_dict=False, pattern=None)
Infers replicate names given a list of filenames.
- Parameters
infiles (list of str) – The filenames to consider.
as_dict (bool) – Pass True to make this function return a dict mapping the infiles to their inferred replicate names.
pattern (str, optional) – If the infiles are glob-based matches to a pattern containing one wildcard, pass that pattern to use it to reconstruct the replicate names.
- Returns
If as_dict is False (the default), this is just the list of inferred rep names, in the same order as infiles. If as_dict is True, this is a dict mapping the original infiles to their inferred replicate names.
- Return type
list of str or dict
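One plausible way to reconstruct replicate names from a single-wildcard pattern is to strip the literal text on either side of the wildcard from each filename. The sketch below assumes exactly one "*" in the pattern and that every infile really matches it; the function name is hypothetical, and the library may infer names differently, especially when no pattern is given.

```python
def sketch_names_from_pattern(infiles, pattern, as_dict=False):
    """Hypothetical illustration: recover the text matched by the wildcard."""
    # Assumes exactly one '*' in the pattern; split() then yields two parts.
    prefix, suffix = pattern.split("*")
    # Drop the literal prefix and suffix, keeping what the wildcard matched.
    names = [f[len(prefix):len(f) - len(suffix)] for f in infiles]
    return dict(zip(infiles, names)) if as_dict else names


files = ["counts/ES_Rep1_raw.counts", "counts/NPC_Rep1_raw.counts"]
names = sketch_names_from_pattern(files, "counts/*_raw.counts")
names_by_file = sketch_names_from_pattern(files, "counts/*_raw.counts", as_dict=True)
```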
-
lib5c.tools.helpers.resolve_expected_models(expected_model_string, observed_counts, primermap, level=None)
Convenience helper for resolving expected models.
- Parameters
expected_model_string (str) – If None, we expect to estimate fresh expected models from observed_counts. If a path to a specific countsfile, we expect that it contains the expected model to be used for all the observed counts. If a glob-expandable path, we expect that each file matching the pattern is to be used for one of the observed counts (assuming they too are in glob order).
observed_counts (list of dict of np.ndarray) – Each element in the list is one replicate, represented as a counts dict of observed values.
primermap (primermap) – The primermap to use for parsing files, etc.
level ({'bin', 'fragment'}) – The level to use if a fresh expected model must be estimated.
- Returns
The resolved expected models.
- Return type
list of dict of np.ndarray
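The three cases for expected_model_string can be sketched as a small dispatch that decides, per replicate, where the expected model would come from. This is an illustration only: the function name is hypothetical, it returns paths (or None for "estimate fresh") rather than loaded models, and sorting the glob matches is an assumption standing in for "glob order".

```python
import glob


def sketch_expected_model_sources(expected_model_string, n_replicates):
    """Hypothetical illustration: one source per replicate (None or a path)."""
    if expected_model_string is None:
        # Case 1: nothing supplied, so each replicate gets a fresh estimate.
        return [None] * n_replicates
    if any(ch in expected_model_string for ch in "*?["):
        # Case 3: a glob pattern, one matching file per replicate.
        matches = sorted(glob.glob(expected_model_string))  # assumed ordering
        if len(matches) != n_replicates:
            raise ValueError("expected one expected-model file per replicate")
        return matches
    # Case 2: a single countsfile shared by every replicate.
    return [expected_model_string] * n_replicates


fresh = sketch_expected_model_sources(None, 2)
shared = sketch_expected_model_sources("expected.counts", 3)
```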
-
lib5c.tools.helpers.resolve_level(primermap, level='auto')
Resolves the level of some input data.
- Parameters
primermap (primermap) – Primermap to try to resolve the level of.
level (str) – If you already know the level, you can pass it as a string here.
- Returns
The resolved level.
- Return type
str
Notes
This function operates on a “three in a row” heuristic: if the first three bins in the primermap are all of the same size, then we guess that it’s bin-level data.
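The heuristic in the note can be sketched in a few lines. The sketch assumes a primermap shaped as a dict from region name to a list of dicts with 'start' and 'end' keys, looks only at the first region, and returns 'fragment' whenever the three sizes differ; all three choices, and the function name, are assumptions made for illustration.

```python
def sketch_resolve_level(primermap, level="auto"):
    """Hypothetical illustration of the "three in a row" heuristic."""
    if level != "auto":
        return level  # the caller already knows the level
    # Assumed layout: {region_name: [{'start': int, 'end': int, ...}, ...]}
    first_region = next(iter(primermap.values()))
    sizes = [entry["end"] - entry["start"] for entry in first_region[:3]]
    # Three equally sized leading entries suggest fixed-width bins.
    if len(sizes) == 3 and len(set(sizes)) == 1:
        return "bin"
    return "fragment"


binned = {"Sox2": [{"start": 0, "end": 4000},
                   {"start": 4000, "end": 8000},
                   {"start": 8000, "end": 12000}]}
fragments = {"Sox2": [{"start": 0, "end": 3120},
                      {"start": 3120, "end": 9875},
                      {"start": 9875, "end": 11002}]}
guess_binned = sketch_resolve_level(binned)
guess_fragments = sketch_resolve_level(fragments)
```

Because only three entries are inspected, fragment-level data whose first three fragments happen to share a size would be misclassified, which is why passing level explicitly is useful when it is known.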
-
lib5c.tools.helpers.resolve_parallel(parser, args, subcommand='', key_arg='infile', root_command='lib5c')
Parallelizes a command via bsub if it is available.
- Parameters
parser (argparse.ArgumentParser) – The parser used to parse the args for the root command.
args (argparse.Namespace) – The args parsed by the parser.
subcommand (str) – The particular subcommand of the root command being invoked.
key_arg (str) – The argument to parallelize over.
root_command (str) – The string used to invoke the root command.
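As a rough sketch of the idea, the snippet below builds one command per input file and prefixes each with bsub only when bsub is found on the PATH. Everything here is an assumption for illustration: the function name, the argument layout of the generated commands, and the decision to return the commands rather than run them. The real function works from the parsed argparse arguments instead.

```python
import shutil


def sketch_parallel_commands(infiles, subcommand="", root_command="lib5c",
                             bsub_available=None):
    """Hypothetical illustration: one command per file, via bsub if present."""
    if bsub_available is None:
        # shutil.which returns None when the executable is not on the PATH.
        bsub_available = shutil.which("bsub") is not None
    commands = []
    for infile in infiles:
        # Assumed layout: root command, optional subcommand, then the file.
        command = [root_command] + ([subcommand] if subcommand else []) + [infile]
        commands.append(["bsub"] + command if bsub_available else command)
    return commands


# "some-subcommand" is a placeholder, not a real lib5c subcommand.
local = sketch_parallel_commands(["a.counts", "b.counts"], "some-subcommand",
                                 bsub_available=False)
queued = sketch_parallel_commands(["a.counts"], "some-subcommand",
                                  bsub_available=True)
```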
-
lib5c.tools.helpers.resolve_primerfile(infile, primerfile=None)
Searches for a primerfile next to an infile.
- Parameters
infile (str or list of str) – The infile(s) to look next to.
primerfile (str, optional) – If you already know where the primerfile is, pass it here to skip the search.
- Returns
The primerfile.
- Return type
str
-
lib5c.tools.helpers.split_self_regionally(regions, script='lib5c', hang=False)
Allows a command line script that accepts a --region flag to split itself into a separate command run for each region.
- Parameters
regions (list of str) – The regions to split into.
script (str) – The name of the script to invoke.
hang (bool) – Pass True to cause the original executing process to hang until all the bsub jobs spawned by this function complete. This does nothing if bsub is not available.
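The splitting step can be pictured as rewriting the original argument list once per region. The sketch below is a hypothetical illustration, not the library's code: it drops any region flag already present and appends a single --region value for each region, without submitting or waiting on any jobs.

```python
def sketch_split_by_region(argv, regions):
    """Hypothetical illustration: one argument list per region."""
    base = []
    skip_next = False
    for token in argv:
        if skip_next:
            skip_next = False  # this token was the old --region value
            continue
        if token == "--region":
            skip_next = True
            continue
        if token.startswith("--region="):
            continue
        base.append(token)
    # Each derived command targets exactly one region.
    return [base + ["--region", region] for region in regions]


# "some-subcommand" and "in.counts" are placeholders for illustration.
original = ["lib5c", "some-subcommand", "in.counts"]
per_region = sketch_split_by_region(original, ["Sox2", "Klf4"])
```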