lib5c.contrib.luigi.tasks module

Provides luigi Task subclasses that wrap the lib5c command line functions.

class lib5c.contrib.luigi.tasks.BinTask(*args, **kwargs)[source]

Bases: lib5c.contrib.luigi.tasks.FilteringTask

Task class for binning fragment-level countsfiles into binned countsfiles.

Wraps the lib5c bin command.

Input/output specification:
  • self.input()[0]: the bin .bed file

  • self.input()[1]: the primer .bed file

  • self.input()[2]: the input fragment-level countsfile

  • self.output(): the resulting countsfile of binned observed values

heatmap = <luigi.parameter.BoolParameter object>
heatmap_outdir = <luigi.parameter.Parameter object>
run()
class lib5c.contrib.luigi.tasks.CmdTask(*args, **kwargs)[source]

Bases: luigi.task.Task

Parent luigi Task class for Tasks whose run() method executes a specific command on the command line.

Subclasses must implement _construct_cmd_string(), which should return the command to run as a string.

If the bsub Python package is installed, the command is executed through the bsub scheduling system, and the caller waits for the job corresponding to the task to complete.

If the bsub Python package is not installed, the command is executed directly via subprocess.

run()[source]

Generic run() implementation for command line Tasks.
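The pattern described above can be sketched without luigi or bsub. This is a simplified stand-in, not the library's implementation: CmdTaskSketch and EchoTask are hypothetical names, the bsub branch is omitted, and only the plain subprocess path is shown.

```python
import shlex
import subprocess
import sys


class CmdTaskSketch:
    """Hypothetical stand-in for a command-running parent class (not lib5c code)."""

    def _construct_cmd_string(self):
        # Subclasses return the full command line as a single string.
        raise NotImplementedError

    def run(self):
        cmd = self._construct_cmd_string()
        # Assumption: only the plain subprocess path is sketched here; a real
        # implementation may hand the command to a job scheduler instead.
        result = subprocess.run(
            shlex.split(cmd), check=True, capture_output=True, text=True
        )
        return result.stdout


class EchoTask(CmdTaskSketch):
    """Hypothetical subclass whose command just prints a number."""

    def _construct_cmd_string(self):
        # sys.executable keeps the example independent of what is on PATH.
        return f'{shlex.quote(sys.executable)} -c "print(42)"'


output = EchoTask().run()
print(output.strip())
```

The subclass only describes the command; everything about how it is executed stays in the parent class, which is what makes a scheduler-backed run() possible without touching the subclasses.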

class lib5c.contrib.luigi.tasks.CrossVarianceTask(*args, **kwargs)[source]

Bases: lib5c.contrib.luigi.tasks.VarianceTask

Task class for computing variance estimates using the cross-replicate variance method.

Wraps the lib5c variance command, called with -s/--source cross_rep.

Input/output specification:
  • self.input()[0]: the primer or bin .bed file

  • self.input()[1]: the input expected countsfile

  • self.input()[2:]: the input observed countsfiles for each replicate

  • self.output(): the resulting countsfile of variance estimates

This class defines a conditions Parameter, which should be used to ensure that the input observed countsfiles passed in self.input()[2:] all belong to the same condition. The class itself does not implement this check; it is left to the code that assembles the pipeline.
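Because the same-condition check is not implemented by the class, pipeline code has to perform it. The sketch below shows one way this might look. It rests on two assumptions that are not confirmed by this page: that conditions is a comma-separated string of condition names, and that each replicate name contains exactly one condition name as a substring. The helper reps_for_condition and the replicate names are hypothetical.

```python
def reps_for_condition(rep_names, condition, conditions):
    """Hypothetical helper: pick the replicates belonging to one condition.

    Assumes `conditions` is a comma-separated string of condition names and
    that each replicate name contains exactly one of them as a substring.
    """
    known = [c.strip() for c in conditions.split(",") if c.strip()]
    if condition not in known:
        raise ValueError(f"unknown condition: {condition!r}")
    selected = []
    for rep in rep_names:
        matches = [c for c in known if c in rep]
        if len(matches) != 1:
            raise ValueError(f"cannot assign {rep!r} to exactly one condition")
        if matches[0] == condition:
            selected.append(rep)
    return selected


reps = ["ES_Rep1", "ES_Rep2", "NPC_Rep1", "NPC_Rep2"]
es_reps = reps_for_condition(reps, "ES", "ES,NPC")
print(es_reps)
```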

conditions = <luigi.parameter.Parameter object>
source = <luigi.parameter.Parameter object>
class lib5c.contrib.luigi.tasks.DetermineBinsTask(*args, **kwargs)[source]

Bases: lib5c.contrib.luigi.tasks.CmdTask

Task class for determining bin locations.

Wraps the lib5c determine-bins command.

Input/output specification:
  • self.input(): the input primer .bed file

  • self.output(): the resulting bin .bed file

bin_width = <luigi.parameter.IntParameter object>
class lib5c.contrib.luigi.tasks.DistributionTask(*args, **kwargs)[source]

Bases: lib5c.contrib.luigi.tasks.CmdTask

dist = <luigi.parameter.Parameter object>
log = <luigi.parameter.BoolParameter object>
mode = <luigi.parameter.Parameter object>
class lib5c.contrib.luigi.tasks.DivideTask(*args, **kwargs)[source]

Bases: lib5c.contrib.luigi.tasks.CmdTask

Task class for dividing one countsfile by another.

Wraps the lib5c divide command.

Input/output specification:
  • self.input()[0]: the primer or bin .bed file

  • self.input()[1]: the dividend (countsfile to divide)

  • self.input()[2]: the divisor (countsfile to divide by)

  • self.output(): the quotient (countsfile resulting from the division)

heatmap = <luigi.parameter.BoolParameter object>
heatmap_outdir = <luigi.parameter.Parameter object>
run()
class lib5c.contrib.luigi.tasks.ExpectedTask(*args, **kwargs)[source]

Bases: lib5c.contrib.luigi.tasks.CmdTask

Task class for computing expected models.

Wraps the lib5c expected command.

Input/output specification:
  • self.input()[0]: the primer or bin .bed file

  • self.input()[1]: the input observed countsfile

  • self.output(): the resulting countsfile of expected values

degree = <luigi.parameter.IntParameter object>
donut = <luigi.parameter.BoolParameter object>
donut_frac = <luigi.parameter.FloatParameter object>
exclude_near_diagonal = <luigi.parameter.BoolParameter object>
global_expected = <luigi.parameter.BoolParameter object>
heatmap = <luigi.parameter.BoolParameter object>
heatmap_outdir = <luigi.parameter.Parameter object>
log_donut = <luigi.parameter.BoolParameter object>
log_transform = <luigi.parameter.Parameter object>
lowess = <luigi.parameter.BoolParameter object>
lowess_frac = <luigi.parameter.FloatParameter object>
max_with_lower_left = <luigi.parameter.BoolParameter object>
min_exp = <luigi.parameter.FloatParameter object>
monotonic = <luigi.parameter.BoolParameter object>
p = <luigi.parameter.IntParameter object>
plot_outfile = <luigi.parameter.Parameter object>
plot_outfile_hexbin = <luigi.parameter.BoolParameter object>
plot_outfile_kde = <luigi.parameter.BoolParameter object>
powerlaw = <luigi.parameter.BoolParameter object>
regression = <luigi.parameter.BoolParameter object>
run()
w = <luigi.parameter.IntParameter object>
class lib5c.contrib.luigi.tasks.ExpressTask(*args, **kwargs)[source]

Bases: lib5c.contrib.luigi.tasks.CmdTask

Task class for applying Express bias correction to countsfiles.

Wraps the lib5c express command.

Input/output specification:
  • self.input()[0]: the primer or bin .bed file

  • self.input()[1]: the input countsfile

  • self.output(): the resulting Express-normalized countsfile

bias = <luigi.parameter.BoolParameter object>
heatmap = <luigi.parameter.BoolParameter object>
heatmap_outdir = <luigi.parameter.Parameter object>
run()
class lib5c.contrib.luigi.tasks.FilteringTask(*args, **kwargs)[source]

Bases: lib5c.contrib.luigi.tasks.CmdTask

Parent Task class for Tasks related to binning and smoothing.

inverse_weights = <luigi.parameter.BoolParameter object>
threshold = <luigi.parameter.FloatParameter object>
window_function = <luigi.parameter.Parameter object>
window_width = <luigi.parameter.IntParameter object>
wipe_unsmoothable_columns = <luigi.parameter.BoolParameter object>
class lib5c.contrib.luigi.tasks.IcedTask(*args, **kwargs)[source]

Bases: lib5c.contrib.luigi.tasks.CmdTask

Task class for applying ICED bias correction to countsfiles.

Wraps the lib5c iced command.

Input/output specification:
  • self.input()[0]: the primer or bin .bed file

  • self.input()[1]: the input countsfile

  • self.output(): the resulting ICED-normalized countsfile

bias = <luigi.parameter.BoolParameter object>
heatmap = <luigi.parameter.BoolParameter object>
heatmap_outdir = <luigi.parameter.Parameter object>
imputation_size = <luigi.parameter.IntParameter object>
run()
class lib5c.contrib.luigi.tasks.InteractionScoreTask(*args, **kwargs)[source]

Bases: lib5c.contrib.luigi.tasks.CmdTask

Task class for converting p-values to interaction scores.

Wraps the lib5c interaction-score command.

Input/output specification:
  • self.input()[0]: the primer or bin .bed file

  • self.input()[1]: the input countsfile of p-values

  • self.output(): the resulting countsfile of interaction scores

heatmap = <luigi.parameter.BoolParameter object>
heatmap_outdir = <luigi.parameter.Parameter object>
run()
class lib5c.contrib.luigi.tasks.KnightRuizTask(*args, **kwargs)[source]

Bases: lib5c.contrib.luigi.tasks.CmdTask

Task class for applying KR bias correction to countsfiles.

Wraps the lib5c kr command.

Input/output specification:
  • self.input()[0]: the primer or bin .bed file

  • self.input()[1]: the input countsfile

  • self.output(): the resulting KR-normalized countsfile

bias = <luigi.parameter.BoolParameter object>
heatmap = <luigi.parameter.BoolParameter object>
heatmap_outdir = <luigi.parameter.Parameter object>
imputation_size = <luigi.parameter.IntParameter object>
run()
class lib5c.contrib.luigi.tasks.LegacyPvaluesOneTask(*args, **kwargs)[source]

Bases: lib5c.contrib.luigi.tasks.DistributionTask

bias = <luigi.parameter.BoolParameter object>
heatmap = <luigi.parameter.BoolParameter object>
heatmap_outdir = <luigi.parameter.Parameter object>
run()
class lib5c.contrib.luigi.tasks.LegacyPvaluesTwoTask(*args, **kwargs)[source]

Bases: lib5c.contrib.luigi.tasks.CmdTask

bias = <luigi.parameter.BoolParameter object>
dist = <luigi.parameter.Parameter object>
distance_tolerance = <luigi.parameter.IntParameter object>
fractional_tolerance = <luigi.parameter.FloatParameter object>
grouping = <luigi.parameter.Parameter object>
heatmap = <luigi.parameter.BoolParameter object>
heatmap_outdir = <luigi.parameter.Parameter object>
log = <luigi.parameter.BoolParameter object>
mode = <luigi.parameter.Parameter object>
run()
class lib5c.contrib.luigi.tasks.LegacyVisualizeFitTask(*args, **kwargs)[source]

Bases: lib5c.contrib.luigi.tasks.DistributionTask, lib5c.contrib.luigi.tasks.RegionalTaskMixin

distance_scale = <luigi.parameter.IntParameter object>
expected_value = <luigi.parameter.FloatParameter object>
tolerance = <luigi.parameter.FloatParameter object>
class lib5c.contrib.luigi.tasks.LegacyVisualizeVarianceTask(*args, **kwargs)[source]

Bases: lib5c.contrib.luigi.tasks.DistributionTask, lib5c.contrib.luigi.tasks.RegionalTaskMixin

class lib5c.contrib.luigi.tasks.LogTask(*args, **kwargs)[source]

Bases: lib5c.contrib.luigi.tasks.CmdTask

Task class for log-transforming a countsfile or reversing a log transform (unlogging).

Wraps the lib5c log command.

Input/output specification:
  • self.input()[0]: the primer or bin .bed file

  • self.input()[1]: the input countsfile (to be logged or unlogged)

  • self.output(): the resulting countsfile (after logging or unlogging)
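A log transform with a pseudocount, and its inverse, can be sketched as follows. The exact formulas used by the wrapped command are not stated on this page, so the ones below are assumptions; log_counts and unlog_counts are hypothetical helpers that operate on plain lists rather than countsfiles.

```python
import math


def log_counts(values, base=math.e, pseudocount=1.0):
    # Assumed forward transform: log_base(value + pseudocount).
    return [math.log(v + pseudocount, base) for v in values]


def unlog_counts(values, base=math.e, pseudocount=1.0):
    # Assumed inverse transform: base ** value - pseudocount.
    return [base ** v - pseudocount for v in values]


counts = [0.0, 1.0, 3.0, 7.0]
logged = log_counts(counts, base=2, pseudocount=1.0)
restored = unlog_counts(logged, base=2, pseudocount=1.0)
print(logged)
print(restored)
```

The pseudocount keeps zero counts finite under the logarithm, and the inverse has to subtract the same pseudocount to recover the original values.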

log_base = <luigi.parameter.Parameter object>
pseudocount = <luigi.parameter.FloatParameter object>
unlog = <luigi.parameter.BoolParameter object>
class lib5c.contrib.luigi.tasks.OutliersTask(*args, **kwargs)[source]

Bases: lib5c.contrib.luigi.tasks.CmdTask

Task class for applying high outlier removal to countsfiles.

Wraps the lib5c outliers command.

Input/output specification:
  • self.input()[0]: the primer or bin .bed file

  • self.input()[1]: the input countsfile

  • self.output(): the resulting outlier-filtered countsfile

fold_threshold = <luigi.parameter.FloatParameter object>
heatmap = <luigi.parameter.BoolParameter object>
heatmap_outdir = <luigi.parameter.Parameter object>
overwrite_value = <luigi.parameter.Parameter object>
run()
window_size = <luigi.parameter.IntParameter object>
class lib5c.contrib.luigi.tasks.PvalueTask(*args, **kwargs)[source]

Bases: lib5c.contrib.luigi.tasks.CmdTask

Task class for calling p-values.

Wraps the lib5c pvalues command.

Input/output specification:
  • self.input()[0]: the primer or bin .bed file

  • self.input()[1]: the input observed countsfile

  • self.input()[2]: the input expected countsfile

  • self.input()[3]: the input variance countsfile

  • self.output(): the resulting countsfile of p-values

distribution = <luigi.parameter.Parameter object>
heatmap = <luigi.parameter.BoolParameter object>
heatmap_outdir = <luigi.parameter.Parameter object>
log = <luigi.parameter.BoolParameter object>
run()
vst = <luigi.parameter.BoolParameter object>
class lib5c.contrib.luigi.tasks.QnormTask(*args, **kwargs)[source]

Bases: lib5c.contrib.luigi.tasks.CmdTask

Task class for applying quantile normalization to countsfiles.

Wraps the lib5c qnorm command.

Input/output specification:
  • self.input()[0]: the primer or bin .bed file

  • self.input()[1:]: the input countsfiles

  • self.output(): not specified explicitly, see below

Strictly speaking, this class should specify a list of outputs, one for each input countsfile. In practice, specifying these outputs is left to the code that assembles the pipeline. The lib5c qnorm command writes its output files to disk, naming them from the outfile_pattern and the file names of the input countsfiles.
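Pipeline code that needs to declare the outputs itself has to predict the file names the command will write. The sketch below illustrates the idea under an assumption that is not confirmed by this page: that outfile_pattern contains a single %s placeholder which is replaced by the base name of each input file with its extension removed. The helper expected_qnorm_outputs and the paths are hypothetical.

```python
import os


def expected_qnorm_outputs(outfile_pattern, infiles):
    """Hypothetical helper predicting one output path per input countsfile.

    Assumes the pattern holds a single %s placeholder that is replaced by the
    input file's base name with its extension removed.
    """
    outputs = []
    for infile in infiles:
        rep_name = os.path.splitext(os.path.basename(infile))[0]
        outputs.append(outfile_pattern.replace("%s", rep_name))
    return outputs


outputs = expected_qnorm_outputs(
    "qnorm/%s_qnorm.counts",
    ["raw/ES_Rep1.counts", "raw/NPC_Rep1.counts"],
)
print(outputs)
```

A downstream Task could then wrap each predicted path in a luigi.LocalTarget in its own output() method.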

averaging = <luigi.parameter.BoolParameter object>
condition_on = <luigi.parameter.Parameter object>
heatmap = <luigi.parameter.BoolParameter object>
heatmap_outdir = <luigi.parameter.Parameter object>
outfile_pattern = <luigi.parameter.Parameter object>
reference = <luigi.parameter.Parameter object>
regional = <luigi.parameter.BoolParameter object>
run()
class lib5c.contrib.luigi.tasks.QvaluesTask(*args, **kwargs)[source]

Bases: lib5c.contrib.luigi.tasks.CmdTask

Task class for converting p-values to q-values.

Wraps the lib5c qvalues command.

Input/output specification:
  • self.input()[0]: the primer or bin .bed file

  • self.input()[1]: the input countsfile of p-values

  • self.output(): the resulting countsfile of q-values
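For readers unfamiliar with the conversion, the sketch below computes Benjamini-Hochberg adjusted p-values on a flat list. The values accepted by the method parameter are not documented on this page, so Benjamini-Hochberg is shown only as a familiar example of a p-value to q-value conversion, not as what the wrapped command does. The function bh_qvalues is hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


qvals = bh_qvalues([0.01, 0.04, 0.03, 0.5])
print(qvals)
```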

heatmap = <luigi.parameter.BoolParameter object>
heatmap_outdir = <luigi.parameter.Parameter object>
method = <luigi.parameter.Parameter object>
run()
class lib5c.contrib.luigi.tasks.RegionalTaskMixin[source]

Bases: object

Mixin class for Tasks that write a separate output file per region.

region = <luigi.parameter.Parameter object>
class lib5c.contrib.luigi.tasks.SmoothTask(*args, **kwargs)[source]

Bases: lib5c.contrib.luigi.tasks.FilteringTask

Task class for smoothing countsfiles.

Wraps the lib5c smooth command.

Input/output specification:
  • self.input()[0]: the primer or bin .bed file

  • self.input()[1]: the input observed countsfile

  • self.output(): the resulting countsfile of smooth observed values

heatmap = <luigi.parameter.BoolParameter object>
heatmap_outdir = <luigi.parameter.Parameter object>
run()
class lib5c.contrib.luigi.tasks.SplineTask(*args, **kwargs)[source]

Bases: lib5c.contrib.luigi.tasks.CmdTask

Task class for applying explicit spline bias correction to countsfiles.

Wraps the lib5c spline command.

Input/output specification:
  • self.input()[0]: the primer or bin .bed file

  • self.input()[1]: the input countsfile

  • self.output(): the resulting spline-normalized countsfile

bias_factors = <luigi.parameter.ListParameter object>
heatmap = <luigi.parameter.BoolParameter object>
heatmap_outdir = <luigi.parameter.Parameter object>
knots = <luigi.parameter.ListParameter object>
model_outfile = <luigi.parameter.Parameter object>
run()
class lib5c.contrib.luigi.tasks.SubtractTask(*args, **kwargs)[source]

Bases: lib5c.contrib.luigi.tasks.CmdTask

Task class for subtracting one countsfile from another.

Wraps the lib5c subtract command.

Input/output specification:
  • self.input()[0]: the primer or bin .bed file

  • self.input()[1]: the minuend (countsfile to subtract from)

  • self.input()[2]: the subtrahend (countsfile to subtract)

  • self.output(): the difference (countsfile resulting from the subtraction)

heatmap = <luigi.parameter.BoolParameter object>
heatmap_outdir = <luigi.parameter.Parameter object>
run()
class lib5c.contrib.luigi.tasks.ThresholdTask(*args, **kwargs)[source]

Bases: lib5c.contrib.luigi.tasks.CmdTask

Task class for thresholding p-value countsfiles to call loops.

Wraps the lib5c threshold command.

Input/output specification:
  • self.input()[0]: the primer or bin .bed file

  • self.input()[1:]: the input countsfiles of p-values

  • self.output()[0]: the output countsfile of called loops

  • self.output()[1]: the output text file summarizing the loop calls

  • self.output()[2]: the output .csv file containing the complete analysis results

background_threshold = <luigi.parameter.FloatParameter object>
bh_fdr = <luigi.parameter.BoolParameter object>
concordant = <luigi.parameter.BoolParameter object>
conditions = <luigi.parameter.Parameter object>
dataset_outfile = <luigi.parameter.Parameter object>
distance_threshold = <luigi.parameter.IntParameter object>
heatmap = <luigi.parameter.BoolParameter object>
heatmap_outdir = <luigi.parameter.Parameter object>
kappa_confusion_outfile = <luigi.parameter.Parameter object>
run()
significance_threshold = <luigi.parameter.FloatParameter object>
size_threshold = <luigi.parameter.IntParameter object>
two_tail = <luigi.parameter.BoolParameter object>
class lib5c.contrib.luigi.tasks.VarianceTask(*args, **kwargs)[source]

Bases: lib5c.contrib.luigi.tasks.CmdTask

Task class for computing variance estimates.

Wraps the lib5c variance command.

Input/output specification:
  • self.input()[0]: the primer or bin .bed file

  • self.input()[1]: the input observed countsfile

  • self.input()[2]: the input expected countsfile

  • self.output(): the resulting countsfile of variance estimates

agg_fn = <luigi.parameter.Parameter object>
fitter = <luigi.parameter.Parameter object>
logx = <luigi.parameter.BoolParameter object>
logy = <luigi.parameter.BoolParameter object>
min_disp = <luigi.parameter.Parameter object>
min_dist = <luigi.parameter.IntParameter object>
min_obs = <luigi.parameter.FloatParameter object>
model = <luigi.parameter.Parameter object>
regional = <luigi.parameter.BoolParameter object>
source = <luigi.parameter.Parameter object>
x_unit = <luigi.parameter.Parameter object>
y_unit = <luigi.parameter.Parameter object>
lib5c.contrib.luigi.tasks.add_visualization_hooks(f, pvalue=False, obs_over_exp=False, tetris=False)[source]

Decorator intended to wrap the run() method of luigi Task subclasses so that the result of the Task is visualized automatically after it completes.

Parameters
  • f (function) – The function to add visualization hooks to. Intended to be the run() method of luigi Task subclasses.

  • pvalue (bool) – Pass True to denote that the visualized heatmaps should be drawn using the p-value colorscale.

  • obs_over_exp (bool) – Pass True to denote that the visualized heatmaps should be drawn using the obs_over_exp colorscale.

  • tetris (bool) – Pass True to denote that the visualized heatmaps should be drawn as tetris heatmaps.

Returns

The hooked function.

Return type

function
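The general shape of such a run() hook can be sketched in plain Python. This is an illustration of the wrapping technique only: add_hooks_sketch, its after argument, and DemoTask are hypothetical, and the check of a heatmap attribute before firing the hook is an assumption rather than documented behavior.

```python
import functools


def add_hooks_sketch(f, after):
    """Hypothetical decorator: call `after(task)` once f(task) has finished."""

    @functools.wraps(f)
    def wrapped(self, *args, **kwargs):
        result = f(self, *args, **kwargs)
        # Assumption: the hook only fires when the task asked for a heatmap.
        if getattr(self, "heatmap", False):
            after(self)
        return result

    return wrapped


drawn = []


class DemoTask:
    heatmap = True

    def run(self):
        return "done"

    # Wrap run() at class-definition time, as a decorator would.
    run = add_hooks_sketch(run, after=lambda task: drawn.append(type(task).__name__))


status = DemoTask().run()
print(status, drawn)
```

Using functools.wraps keeps the name and docstring of the original run() on the hooked function.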

lib5c.contrib.luigi.tasks.get_all_lines(filename)[source]

Utility function for reading all lines from a file on disk.

Parameters

filename (str) – The file to read from.

Returns

The contents of the file.

Return type

str
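Since the documented return type is a single str, a function with this contract might look like the sketch below. The name read_whole_file is hypothetical, and the body is a guess at equivalent behavior rather than the library's code.

```python
import os
import tempfile


def read_whole_file(filename):
    """Hypothetical equivalent: return the file's entire contents as one str."""
    with open(filename, "r") as handle:
        return handle.read()


# Write a small temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("line one\nline two\n")
    path = tmp.name

contents = read_whole_file(path)
os.remove(path)
print(repr(contents))
```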

lib5c.contrib.luigi.tasks.parallelize_reps(task_class, reps, **kwargs)[source]

Parallelizes any Task class whose constructor accepts a rep kwarg across a list of reps by creating a new WrapperTask.

Parameters
  • task_class (luigi.Task subclass) – The Task to parallelize.

  • reps (list of str) – List of reps to parallelize over.

  • kwargs (kwargs) – Additional kwargs to pass through to the Task class.

Returns

A WrapperTask which simply requires the original task_class to be run for every rep in reps.

Return type

luigi.WrapperTask subclass
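The idea of building a wrapper class on the fly can be sketched without luigi. WrapperSketch stands in for luigi.WrapperTask, and parallelize_reps_sketch and DemoTask are hypothetical; this shows the class-factory technique, not the library's implementation.

```python
class WrapperSketch:
    """Hypothetical stand-in for a wrapper task with no work of its own."""

    def requires(self):
        return []


def parallelize_reps_sketch(task_class, reps, **kwargs):
    """Build a wrapper class that requires task_class once per rep."""

    class ParallelizedTask(WrapperSketch):
        def requires(self):
            # One task instance per replicate; extra kwargs pass through.
            return [task_class(rep=rep, **kwargs) for rep in reps]

    return ParallelizedTask


class DemoTask:
    """Hypothetical task whose constructor accepts a rep kwarg."""

    def __init__(self, rep, bin_width=4000):
        self.rep = rep
        self.bin_width = bin_width


wrapper_class = parallelize_reps_sketch(DemoTask, ["Rep1", "Rep2"], bin_width=8000)
required = wrapper_class().requires()
print([(t.rep, t.bin_width) for t in required])
```

Because the wrapper does no work itself, a scheduler is free to run the required tasks concurrently.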

lib5c.contrib.luigi.tasks.parallelize_reps_regions(task_class, reps, regions, **kwargs)[source]

Parallelizes any Task class whose constructor accepts rep and region kwargs across lists of reps and regions by creating a new WrapperTask.

Parameters
  • task_class (luigi.Task subclass) – The Task to parallelize.

  • reps (list of str) – List of reps to parallelize over.

  • regions (list of str) – List of regions to parallelize over.

  • kwargs (kwargs) – Additional kwargs to pass through to the Task class.

Returns

A WrapperTask which simply requires the original task_class to be run for every rep in reps and every region in regions.

Return type

luigi.WrapperTask subclass
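Requiring a task for every rep and every region amounts to iterating over the Cartesian product of the two lists. The sketch below enumerates the keyword arguments each task instance would receive; rep_region_kwargs and the example values are hypothetical.

```python
import itertools


def rep_region_kwargs(reps, regions, **kwargs):
    """Hypothetical helper: one kwargs dict per (rep, region) combination."""
    return [
        dict(kwargs, rep=rep, region=region)
        for rep, region in itertools.product(reps, regions)
    ]


jobs = rep_region_kwargs(["Rep1", "Rep2"], ["Sox2", "Klf4"], bin_width=4000)
for job in jobs:
    print(job)
```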

lib5c.contrib.luigi.tasks.visualizable(pvalue=False, obs_over_exp=False, tetris=False)[source]

Class decorator factory for luigi Task subclasses that lets a Task visualize itself automatically after completion by:

  1. adding heatmap and heatmap_outdir parameters to the Task, and

  2. decorating the Task’s run() method with add_visualization_hooks().

Parameters
  • pvalue (bool) – Pass True to denote that the visualized heatmaps should be drawn using the p-value colorscale.

  • obs_over_exp (bool) – Pass True to denote that the visualized heatmaps should be drawn using the obs_over_exp colorscale.

  • tetris (bool) – Pass True to denote that the visualized heatmaps should be drawn as tetris heatmaps.

Returns

The class decorator.

Return type

function
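The two steps performed by a class decorator factory of this kind can be sketched in plain Python. In this illustration visualizable_sketch, its colorscale argument, and DemoTask are hypothetical; plain class attributes stand in for luigi parameters, and appending to a list stands in for drawing a heatmap.

```python
import functools


def visualizable_sketch(colorscale="default"):
    """Hypothetical class decorator factory (not the lib5c implementation)."""

    def decorator(cls):
        original_run = cls.run

        @functools.wraps(original_run)
        def run(self, *args, **kwargs):
            result = original_run(self, *args, **kwargs)
            if self.heatmap:
                # Stand-in for drawing: just record what would be drawn.
                self.drawn.append((self.heatmap_outdir, colorscale))
            return result

        # Step 1: add the two parameters (plain class attributes here).
        cls.heatmap = False
        cls.heatmap_outdir = "heatmaps"
        # Step 2: replace run() with the hooked version.
        cls.run = run
        return cls

    return decorator


@visualizable_sketch(colorscale="pvalue")
class DemoTask:
    def __init__(self):
        self.drawn = []

    def run(self):
        return "done"


task = DemoTask()
task.heatmap = True
task.run()
print(task.drawn)
```

Calling the factory first and decorating second is what allows per-class options, such as the colorscale, to be fixed at class-definition time.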