lib5c.tools.helpers module

lib5c.tools.helpers.infer_level_mapping(rep_names, triggers)[source]

Infers a mapping from replicate names to level names (i.e., classes or conditions) using a simple trigger substring approach.

A replicate is assigned to a level if the level’s trigger substring is a substring of the replicate name.

Parameters
  • rep_names (list of str) – The replicate names to assign levels to.

  • triggers (dict or list of str) – Pass a dict mapping trigger substrings to level names, or pass a list of level names to use the level names as their own trigger substrings.

Returns

A mapping from rep_names to level names.

Return type

dict
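The trigger-substring rule described above can be sketched in a few lines. This is an illustrative reimplementation, not the library's code: the name `sketch_infer_level_mapping` is hypothetical, and the "first matching trigger wins" tie-break is an assumption the documentation does not state.

```python
def sketch_infer_level_mapping(rep_names, triggers):
    """Hypothetical sketch of trigger-substring level assignment."""
    # A list of level names is treated as a dict mapping each name to itself.
    if not isinstance(triggers, dict):
        triggers = {level: level for level in triggers}
    mapping = {}
    for rep in rep_names:
        for trigger, level in triggers.items():
            if trigger in rep:
                mapping[rep] = level
                break  # assumption: the first matching trigger wins
    return mapping


reps = ["ES_rep1", "ES_rep2", "NPC_rep1"]
print(sketch_infer_level_mapping(reps, ["ES", "NPC"]))
# {'ES_rep1': 'ES', 'ES_rep2': 'ES', 'NPC_rep1': 'NPC'}
```

In this sketch, replicates that match no trigger are simply left out of the result; the real function may handle that case differently.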

lib5c.tools.helpers.infer_replicate_names(infiles, as_dict=False, pattern=None)[source]

Infers replicate names given a list of filenames.

Parameters
  • infiles (list of str) – The filenames to consider.

  • as_dict (bool) – Pass True to make this function return a dict mapping the infiles to their inferred replicate names.

  • pattern (str, optional) – If the infiles are glob-based matches to a pattern containing one wildcard, pass that pattern to use it to reconstruct the replicate names.

Returns

If as_dict is False (the default), this is just the list of inferred rep names, in the same order as infiles. If as_dict is True, this is a dict mapping the original infiles to their inferred replicate names.

Return type

list of str or dict
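One plausible way to reconstruct replicate names from a single-wildcard pattern is to strip the pattern's fixed prefix and suffix from each filename. The sketch below assumes exactly that; the function name is hypothetical, and the library's behavior when no pattern is passed is not covered here.

```python
def sketch_rep_names_from_pattern(infiles, pattern, as_dict=False):
    """Hypothetical sketch: recover the text matched by the single '*'."""
    if pattern.count("*") != 1:
        raise ValueError("this sketch assumes exactly one wildcard")
    prefix, suffix = pattern.split("*")
    # Whatever lies between the fixed prefix and suffix is the replicate name.
    names = [f[len(prefix):len(f) - len(suffix)] for f in infiles]
    return dict(zip(infiles, names)) if as_dict else names


files = ["counts/ES_rep1.counts", "counts/NPC_rep1.counts"]
print(sketch_rep_names_from_pattern(files, "counts/*.counts"))
# ['ES_rep1', 'NPC_rep1']
```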

lib5c.tools.helpers.resolve_expected_models(expected_model_string, observed_counts, primermap, level=None)[source]

Convenience helper for resolving expected models.

Parameters
  • expected_model_string (str) – If None, we expect to estimate fresh expected models from observed_counts. If a path to a specific countsfile, we expect that it contains the expected model to be used for all the observed counts. If a glob-expandable path, we expect that each file matching the pattern is to be used for one of the observed counts (assuming they too are in glob order).

  • observed_counts (list of dict of np.ndarray) – Each element in the list is one replicate, represented as a counts dict of observed values.

  • primermap (primermap) – The primermap to use for parsing files, etc.

  • level ({'bin', 'fragment'}) – The level to use if a fresh expected model must be estimated.

Returns

The resolved expected models.

Return type

list of dict of np.ndarray
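The three cases for expected_model_string can be pictured as a small dispatch on the string itself. The sketch below only resolves which file goes with which replicate; it does not parse countsfiles or estimate models. The function name, the wildcard test, and the length check are assumptions made for illustration.

```python
import glob


def sketch_resolve_expected_paths(expected_model_string, n_replicates):
    """Hypothetical sketch: return None (estimate fresh models) or one
    expected-model path per replicate."""
    if expected_model_string is None:
        return None  # caller estimates fresh models from observed_counts
    if any(ch in expected_model_string for ch in "*?["):
        # Glob case: one file per replicate, paired in sorted glob order.
        paths = sorted(glob.glob(expected_model_string))
        if len(paths) != n_replicates:
            raise ValueError("number of expected files != number of replicates")
        return paths
    # Single-file case: the same expected model is reused for every replicate.
    return [expected_model_string] * n_replicates
```

Sorting the glob matches makes the pairing deterministic, which matters because the documented contract relies on the observed counts being in glob order too.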

lib5c.tools.helpers.resolve_level(primermap, level='auto')[source]

Resolves the level of some input data.

Parameters
  • primermap (primermap) – Primermap to try to resolve the level of.

  • level (str) – If you already know the level, you can pass it as a string here.

Returns

The resolved level.

Return type

str

Notes

This function operates on a “three in a row” heuristic: if the first three bins in the primermap are all of the same size, then we guess that it’s bin-level data.
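A minimal sketch of the “three in a row” heuristic follows. It assumes the primermap is a dict mapping region names to lists of dicts with 'start' and 'end' keys and that only the first region is inspected; both are assumptions for illustration, as is the function name.

```python
def sketch_resolve_level(primermap, level="auto"):
    """Hypothetical sketch of the 'three in a row' heuristic."""
    if level != "auto":
        return level  # the caller already knows the level
    first_region = next(iter(primermap.values()))
    first_three = first_region[:3]
    sizes = {elem["end"] - elem["start"] for elem in first_three}
    # Three equally sized elements in a row suggest fixed-width bins.
    if len(first_three) == 3 and len(sizes) == 1:
        return "bin"
    return "fragment"


bins = {"Sox2": [{"start": i * 4000, "end": (i + 1) * 4000} for i in range(5)]}
print(sketch_resolve_level(bins))  # bin
```

The heuristic can misfire on fragment-level data whose first three fragments happen to be the same length, which is why passing level explicitly is the safer option when it is known.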

lib5c.tools.helpers.resolve_parallel(parser, args, subcommand='', key_arg='infile', root_command='lib5c')[source]

Parallelizes a command via bsub if it is available.

Parameters
  • parser (argparse.ArgumentParser) – The parser used to parse the args for the root command.

  • args (argparse.Namespace) – The args parsed by the parser.

  • subcommand (str) – The particular subcommand of the root command being invoked.

  • key_arg (str) – The argument to parallelize over.

  • root_command (str) – The string used to invoke the root command.

lib5c.tools.helpers.resolve_primerfile(infile, primerfile=None)[source]

Searches for a primerfile next to an infile.

Parameters
  • infile (str or list of str) – The infile(s) to look next to.

  • primerfile (str, optional) – If you already know where the primerfile is, pass it here to skip the search.

Returns

The primerfile.

Return type

str
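The documentation does not say how the search is performed. One way such a search could work is sketched below, under the assumptions that primerfiles carry a .bed extension and that exactly one candidate must sit in the infile's directory. Neither detail, nor the function name, is confirmed by the text above.

```python
import glob
import os


def sketch_resolve_primerfile(infile, primerfile=None):
    """Hypothetical sketch: look for a lone .bed file next to the infile."""
    if primerfile is not None:
        return primerfile  # the caller already knows; skip the search
    if isinstance(infile, (list, tuple)):
        infile = infile[0]  # assumption: search next to the first infile
    search_dir = os.path.dirname(infile) or "."
    # Assumption: primerfiles are recognized by a .bed extension.
    candidates = sorted(glob.glob(os.path.join(search_dir, "*.bed")))
    if len(candidates) != 1:
        raise ValueError(
            "expected exactly one candidate primerfile in %s, found %d"
            % (search_dir, len(candidates))
        )
    return candidates[0]
```

Raising on zero or multiple candidates keeps the sketch from silently picking the wrong file; the real function may resolve ambiguity differently.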

lib5c.tools.helpers.split_self_regionally(regions, script='lib5c', hang=False)[source]

Allows a command-line script that accepts a --region flag to split itself into a separate command run for each region.

Parameters
  • regions (list of str) – The regions to split into.

  • script (str) – The name of the script to invoke.

  • hang (bool) – Pass True to cause the original executing process to hang until all the bsub jobs spawned by this function complete. This does nothing if bsub is not available.
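The splitting step can be pictured as building one command line per region. The sketch below only constructs the commands; it does not submit them, and it says nothing about how the library actually re-invokes itself or talks to bsub. The function name and the choice to append --region at the end of the original arguments are assumptions.

```python
import shutil


def sketch_region_commands(regions, argv, script="lib5c"):
    """Hypothetical sketch: one argument list per region.

    argv is the original argument list without the program name.
    """
    return [[script] + list(argv) + ["--region", region] for region in regions]


commands = sketch_region_commands(["Sox2", "Klf4"], ["plot", "in.counts"])
for cmd in commands:
    print(" ".join(cmd))

# Whether the jobs could go to a scheduler can be checked with the standard
# library; how they would be submitted is outside the scope of this sketch.
bsub_available = shutil.which("bsub") is not None
```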