lib5c.util.scales module

Module for determining various scales for visualization purposes.

lib5c.util.scales.compute_regional_obs_over_exp_scale(counts_superdict, region)

Computes a typical scale for visualizing observed over expected counts for a specified region.

Parameters
  • counts_superdict (dict of dicts of 2d numpy arrays) – The keys of the outer dict are replicate names. The values are dicts corresponding to counts dicts (see lib5c.parsers.counts.load_counts()), whose keys are region names and whose values are the arrays of counts values for that region. These arrays are square and symmetric.

  • region (str) – The region for which the scale should be computed.

Returns

The first element of this list is the minimum value of the computed scale. The second element of this list is the maximum value of the computed scale.

Return type

list of float

Notes

The returned scale is centered on the mean of the observed over expected counts for the selected region across all replicates. The maximum is this mean plus 2.5 times the mean of the per-replicate standard deviations for the selected region, and the minimum is this mean minus the same amount.
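The calculation described in the note above can be sketched with plain NumPy. This is an illustrative reimplementation, not the library's code: the function name sketch_obs_over_exp_scale, the n_std argument, and the choice to average per-replicate means while skipping non-finite entries are all assumptions made for the example.

```python
import numpy as np


def sketch_obs_over_exp_scale(counts_superdict, region, n_std=2.5):
    """Hypothetical sketch: mean +/- n_std * (mean of per-replicate stds)."""
    means, stds = [], []
    for rep in counts_superdict:
        values = np.asarray(counts_superdict[rep][region], dtype=float)
        # Assumption: non-finite entries (NaN, inf) are ignored.
        finite = values[np.isfinite(values)]
        means.append(finite.mean())
        stds.append(finite.std())
    center = float(np.mean(means))
    spread = n_std * float(np.mean(stds))
    return [center - spread, center + spread]


# Toy input: two replicates, one region, square symmetric matrices.
example = {
    'rep1': {'regionA': np.array([[1.0, 2.0], [2.0, 1.0]])},
    'rep2': {'regionA': np.array([[2.0, 4.0], [4.0, 2.0]])},
}
scale = sketch_obs_over_exp_scale(example, 'regionA')
```

For the toy input, the per-replicate means are 1.5 and 3.0 and the standard deviations are 0.5 and 1.0, so the sketch yields a scale of roughly 0.375 to 4.125.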

lib5c.util.scales.compute_regional_obs_scale(counts_superdict, region, top_percentile=98)

Computes a typical scale for visualizing observed counts for a specified region.

Parameters
  • counts_superdict (dict of dicts of 2d numpy arrays) – The keys of the outer dict are replicate names. The values are dicts corresponding to counts dicts (see lib5c.parsers.counts.load_counts()), whose keys are region names and whose values are the arrays of counts values for that region. These arrays are square and symmetric.

  • region (str) – The region for which the scale should be computed.

  • top_percentile (int) – The upper percentile to use when determining the maximum of the scale.

Returns

The first element of this list is the minimum value of the computed scale. The second element of this list is the maximum value of the computed scale.

Return type

list of float

Notes

The returned scale ranges from 0 to the average, across the replicates, of the top_percentile-th percentile of the counts for the selected region (the 98th percentile by default).
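A minimal sketch of this zero-to-percentile scale, assuming the percentile is taken over the finite entries of each replicate's matrix for the region. The name sketch_obs_scale is hypothetical and the code is not taken from the library.

```python
import numpy as np


def sketch_obs_scale(counts_superdict, region, top_percentile=98):
    """Hypothetical sketch: [0, mean of per-replicate percentiles]."""
    per_replicate = []
    for rep in counts_superdict:
        values = np.asarray(counts_superdict[rep][region], dtype=float)
        # Assumption: non-finite entries (NaN, inf) are ignored.
        finite = values[np.isfinite(values)]
        per_replicate.append(np.percentile(finite, top_percentile))
    return [0.0, float(np.mean(per_replicate))]


# Toy input: two replicates, one region, square symmetric matrices.
example_counts = {
    'rep1': {'regionA': np.array([[0.0, 10.0], [10.0, 0.0]])},
    'rep2': {'regionA': np.array([[0.0, 20.0], [20.0, 0.0]])},
}
full_scale = sketch_obs_scale(example_counts, 'regionA', top_percentile=100)
median_scale = sketch_obs_scale(example_counts, 'regionA', top_percentile=50)
```

With top_percentile=100 the maximum is the average of the per-replicate maxima (15.0 here); lowering the percentile makes the scale less sensitive to a few very bright pixels.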

lib5c.util.scales.compute_track_scales(tracks, pixelmap, conditions=(), filename_generator=<function <lambda>>)

Computes regional zero-to-max scales for visualizing ChIP-seq tracks.

Parameters
  • tracks (list of str) – List of string identifiers for the tracks to compute scales for. The identifiers will be used to find the appropriate BED files on the disk according to filename_generator.

  • pixelmap (pixelmap) – The pixelmap to use when identifying the region names and boundaries. See lib5c.parsers.bed.get_pixelmap().

  • conditions (list of str) – List of string identifiers for the conditions. If this kwarg is passed, the tracks will be grouped by condition, and tracks that differ only in the condition identifier will be assigned the same scale. If this list is empty (as it is by default), no such grouping of tracks is performed.

  • filename_generator (function str -> str) – When passed any string from tracks, this function should return a string reference to the BED file on the disk containing the BED features for that track.
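As an illustration of the filename_generator contract, the function below maps a track identifier to a BED file path. The function name, the beds/ directory, and the naming pattern are hypothetical; substitute whatever layout your files actually use.

```python
def bed_filename_for_track(track):
    """Hypothetical mapping from a track identifier to its BED file path."""
    # Assumption: each track lives at beds/<track>.bed.
    return 'beds/%s.bed' % track


example_path = bed_filename_for_track('CTCF_ES')
```

A function like this would be passed as the filename_generator keyword argument.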

Returns

The keys of the outer dict are the elements of tracks. The values are dicts whose keys are region names and whose values are the maximum values within that region for that track.

Return type

dict of dict of numeric

Notes

Since this function computes a zero-to-max scale, the minimum of the scale is always implicitly taken to be zero.

When conditions is not empty, the tracks are grouped by removing all instances of all condition identifiers from the track identifiers and then comparing them for equality. When the sequence of characters defined by a condition identifier appears elsewhere in the track name in a location not intended to identify the condition of the track, this may lead to failure to correctly group tracks by condition.
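The grouping rule described above can be sketched as follows. This is an illustration of the idea, not the library's implementation; the function name group_tracks_by_condition and the example track and condition identifiers are made up.

```python
from collections import defaultdict


def group_tracks_by_condition(tracks, conditions):
    """Hypothetical sketch: group tracks that match once condition
    identifiers are removed from their names."""
    groups = defaultdict(list)
    for track in tracks:
        key = track
        for condition in conditions:
            # Removes every occurrence, including any that were not
            # meant to mark the condition.
            key = key.replace(condition, '')
        groups[key].append(track)
    return dict(groups)


example_tracks = ['CTCF_ES', 'CTCF_NPC', 'H3K27ac_ES', 'H3K27ac_NPC']
groups = group_tracks_by_condition(example_tracks, ['ES', 'NPC'])
```

Here the two CTCF tracks share the key 'CTCF_' and the two H3K27ac tracks share the key 'H3K27ac_', so each pair would receive a common scale. Because the removal is a plain substring replacement, identifiers should be chosen so that no condition string occurs elsewhere in a track name.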

lib5c.util.scales.main()