lib5c.algorithms.determine_bins module

Module for computing sets of evenly-spaced bins for tiling 5C regions.

lib5c.algorithms.determine_bins.default_bin_namer(bin_index, region_name=None)

Names a bin given its index and, optionally, the name of the region.

Parameters:

  • bin_index (int) – The index of this bin, within the region if appropriate.
  • region_name (Optional[str]) – The name of the region this bin is in.

Returns:

The name for this bin.

Return type:

str

>>> default_bin_namer(3)
'BIN_003'
>>> default_bin_namer(123, region_name='Sox2')
'Sox2_BIN_123'
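
Judging from the bin names in the determine_regional_bins examples ('BIN_000', 'Sox2_BIN_000'), the default naming scheme appears to be 'BIN_' plus a three-digit zero-padded index, prefixed by the region name when one is given. The following is a minimal sketch of that pattern under that assumption; sketch_bin_namer is a hypothetical name, not part of lib5c, and the behavior for indices of 1000 or more is not confirmed by the source.

```python
from typing import Optional


def sketch_bin_namer(bin_index: int, region_name: Optional[str] = None) -> str:
    """Hypothetical re-creation of the naming pattern seen in the examples.

    Assumption: names are 'BIN_' plus the index zero-padded to three digits,
    prefixed by '<region_name>_' when a region name is supplied.
    """
    name = 'BIN_%03d' % bin_index
    if region_name is not None:
        name = '%s_%s' % (region_name, name)
    return name


print(sketch_bin_namer(3))                        # BIN_003
print(sketch_bin_namer(123, region_name='Sox2'))  # Sox2_BIN_123
```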

lib5c.algorithms.determine_bins.determine_regional_bins(regional_primermap, bin_width, region_name=None, bin_namer=<function default_bin_namer>, bin_namer_kwargs=None, region_span='mid-to-mid', bin_number='n')

Determines a set of bins of a specified width that will tile a set of primers within a region.

Parameters:

  • regional_primermap (List[Dict[str, Any]]) –

    An ordered list of fragments in this region. The elements of the list are dicts (representing fragments) with at least the following structure:

        'chrom': str,
        'start': int,
        'end': int

    See lib5c.parsers.primers.get_primermap().

  • bin_width (int) – The width of the bins, in bp.
  • region_name (Optional[str]) – The name of the region. If provided, it is also passed to bin_namer as a keyword argument and stored under the 'region' key of each returned bin.
  • bin_namer (Callable[[int, ...], str]) – A function mapping a bin index to a bin name, used to name the resulting bins. If region_name is provided, it is passed to this function as a keyword argument.
  • bin_namer_kwargs (Optional[Dict[Any, Any]]) – Additional keyword arguments to pass to bin_namer.
  • region_span (Optional[str]) – Whether the region is taken to span from the midpoint of the first fragment to the midpoint of the last fragment (‘mid-to-mid’) or from the start of the first fragment to the end of the last fragment (‘start-to-end’).
  • bin_number (Optional[str]) – How many bins to fit in the region, where n is the largest number of full bins that fit within the region span. Use ‘n’ to reproduce traditional pipeline output, at the risk of leaving some fragment midpoints outside the range covered by the bins. Use ‘n+1’ for a more conservative strategy that is guaranteed not to leave any fragment midpoints outside the bins when region_span is ‘mid-to-mid’.
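
The bin_namer contract described above (bin index passed positionally, region_name and bin_namer_kwargs passed as keyword arguments) can be illustrated with a custom namer. This is a sketch under assumptions: padded_namer and name_bins are hypothetical helpers written for illustration, and name_bins only mirrors the documented calling convention; it is not lib5c's internal code.

```python
from typing import Any, Callable, Dict, List, Optional


def padded_namer(bin_index: int, region_name: Optional[str] = None,
                 prefix: str = 'BIN', digits: int = 3) -> str:
    """Hypothetical custom bin_namer: index is positional, options are kwargs."""
    name = '%s_%0*d' % (prefix, digits, bin_index)
    return name if region_name is None else '%s_%s' % (region_name, name)


def name_bins(n_bins: int,
              bin_namer: Callable[..., str],
              region_name: Optional[str] = None,
              bin_namer_kwargs: Optional[Dict[str, Any]] = None) -> List[str]:
    """Hypothetical helper mirroring the documented calling convention."""
    kwargs = dict(bin_namer_kwargs or {})
    # region_name is forwarded as a keyword argument only when it was given
    if region_name is not None:
        kwargs['region_name'] = region_name
    return [bin_namer(i, **kwargs) for i in range(n_bins)]


print(name_bins(2, padded_namer, region_name='Sox2',
                bin_namer_kwargs={'digits': 5}))
# ['Sox2_BIN_00000', 'Sox2_BIN_00001']
```

With the real function, such a namer would be supplied through the documented parameters, e.g. determine_regional_bins(regional_primermap, 4000, region_name='Sox2', bin_namer=padded_namer, bin_namer_kwargs={'digits': 5}).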

Returns:

An ordered list of bins tiling the region. The elements of the list are dicts (representing bins) with the following structure:

    'name': str,
    'chrom': str,
    'start': int,
    'end': int,
    'index': int,
    'region': str (present only if region_name was passed)

Return type:

List[Dict[str, Any]]
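
The tiling arithmetic implied by the examples below can be sketched as follows: compute the region span from region_span, take n as the number of full bins that fit, add one for 'n+1', and center the bins on the middle of the region. This is a reconstruction inferred from the documented outputs, not lib5c's actual implementation; sketch_regional_bins is a hypothetical name, and two details are assumptions: at least one bin is always emitted (the single-fragment example yields one bin even though its mid-to-mid span is zero), and the tiling start is rounded to the nearest integer.

```python
from typing import Any, Dict, List, Optional


def sketch_regional_bins(primermap: List[Dict[str, Any]], bin_width: int,
                         region_name: Optional[str] = None,
                         region_span: str = 'mid-to-mid',
                         bin_number: str = 'n') -> List[Dict[str, Any]]:
    """Hypothetical re-implementation inferred from the documented examples."""
    first, last = primermap[0], primermap[-1]
    # Region boundaries according to region_span
    if region_span == 'mid-to-mid':
        region_start = (first['start'] + first['end']) / 2
        region_end = (last['start'] + last['end']) / 2
    elif region_span == 'start-to-end':
        region_start, region_end = first['start'], last['end']
    else:
        raise ValueError('unknown region_span: %r' % region_span)

    # n = largest number of full bins that fit in the region span
    n = int((region_end - region_start) // bin_width)
    if bin_number == 'n':
        n_bins = n
    elif bin_number == 'n+1':
        n_bins = n + 1
    else:
        raise ValueError('unknown bin_number: %r' % bin_number)
    # Assumption: always emit at least one bin (see the single-fragment example)
    n_bins = max(n_bins, 1)

    # Center the tiling on the middle of the region (rounding is an assumption)
    center = (region_start + region_end) / 2
    tiling_start = int(round(center - n_bins * bin_width / 2))

    bins = []
    for i in range(n_bins):
        name = 'BIN_%03d' % i
        if region_name is not None:
            name = '%s_%s' % (region_name, name)
        bin_ = {'name': name, 'chrom': first['chrom'],
                'start': tiling_start + i * bin_width,
                'end': tiling_start + (i + 1) * bin_width,
                'index': i}
        if region_name is not None:
            bin_['region'] = region_name
        bins.append(bin_)
    return bins


primermap = [{'chrom': 'chr1', 'start': 2000, 'end': 4000},
             {'chrom': 'chr1', 'start': 9500, 'end': 10500}]
print([(b['start'], b['end'])
       for b in sketch_regional_bins(primermap, 3000, bin_number='n+1')])
# [(2000, 5000), (5000, 8000), (8000, 11000)]
```

Centering the tiling on the region midpoint spreads any leftover span evenly over both ends, which is consistent with every example below.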


>>> # single fragment results in single bin centered on the fragment
>>> regional_primermap = [{'chrom': 'chr1', 'start': 2000, 'end': 4000}]
>>> (determine_regional_bins(regional_primermap, 4000,
...                          region_name='Sox2') ==
... [{'name': 'Sox2_BIN_000', 'chrom': 'chr1', 'start': 1000, 'end': 5000,
...   'index': 0, 'region': 'Sox2'}])
True
>>> # examples for region_span='mid-to-mid'
>>> regional_primermap = [{'chrom': 'chr1', 'start': 2000, 'end': 4000},
...                       {'chrom': 'chr1', 'start': 9500, 'end': 10500}]
>>> (determine_regional_bins(regional_primermap, 5000) ==
... [{'name': 'BIN_000', 'chrom': 'chr1', 'start': 4000, 'end': 9000,
...   'index': 0}])
True
>>> (determine_regional_bins(regional_primermap, 3000) ==
... [{'name': 'BIN_000', 'chrom': 'chr1', 'start': 3500, 'end': 6500,
...   'index': 0},
...  {'name': 'BIN_001', 'chrom': 'chr1', 'start': 6500, 'end': 9500,
...   'index': 1}])
True
>>> (determine_regional_bins(regional_primermap, 3000,
...                          bin_number='n+1') ==
... [{'name': 'BIN_000', 'chrom': 'chr1', 'start': 2000, 'end': 5000,
...   'index': 0},
...  {'name': 'BIN_001', 'chrom': 'chr1', 'start': 5000, 'end': 8000,
...   'index': 1},
...  {'name': 'BIN_002', 'chrom': 'chr1', 'start': 8000, 'end': 11000,
...   'index': 2}])
True
>>> # examples for region_span='start-to-end'
>>> regional_primermap = [{'chrom': 'chr1', 'start': 2000, 'end': 4000},
...                       {'chrom': 'chr1', 'start': 9000, 'end': 10000}]
>>> (determine_regional_bins(regional_primermap, 5000,
...                          region_span='start-to-end') ==
... [{'name': 'BIN_000', 'chrom': 'chr1', 'start': 3500, 'end': 8500,
...   'index': 0}])
True
>>> (determine_regional_bins(regional_primermap, 3000,
...                          region_span='start-to-end') ==
... [{'name': 'BIN_000', 'chrom': 'chr1', 'start': 3000, 'end': 6000,
...   'index': 0},
...  {'name': 'BIN_001', 'chrom': 'chr1', 'start': 6000, 'end': 9000,
...   'index': 1}])
True
>>> (determine_regional_bins(regional_primermap, 3000,
...                          region_span='start-to-end',
...                          bin_number='n+1') ==
... [{'name': 'BIN_000', 'chrom': 'chr1', 'start': 1500, 'end': 4500,
...   'index': 0},
...  {'name': 'BIN_001', 'chrom': 'chr1', 'start': 4500, 'end': 7500,
...   'index': 1},
...  {'name': 'BIN_002', 'chrom': 'chr1', 'start': 7500, 'end': 10500,
...   'index': 2}])
True
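
The bin_number trade-off can be checked against the 3000 bp mid-to-mid examples above: with 'n', the bins cover 3500 to 9500, so both fragment midpoints (3000 and 10000) fall outside them, whereas 'n+1' covers 2000 to 11000. The helper uncovered_midpoints below is hypothetical; it treats the bins as one closed interval from the first bin's start to the last bin's end (an assumption about boundary handling) and uses only the 'start' and 'end' values copied from those examples.

```python
from typing import Any, Dict, List


def uncovered_midpoints(primermap: List[Dict[str, Any]],
                        bins: List[Dict[str, Any]]) -> List[float]:
    """Hypothetical check: fragment midpoints outside [first start, last end]."""
    lo, hi = bins[0]['start'], bins[-1]['end']
    mids = [(f['start'] + f['end']) / 2 for f in primermap]
    return [m for m in mids if not lo <= m <= hi]


primermap = [{'chrom': 'chr1', 'start': 2000, 'end': 4000},
             {'chrom': 'chr1', 'start': 9500, 'end': 10500}]
# Bin boundaries copied from the 3000 bp 'mid-to-mid' examples above
bins_n = [{'start': 3500, 'end': 6500}, {'start': 6500, 'end': 9500}]
bins_n_plus_1 = [{'start': 2000, 'end': 5000}, {'start': 5000, 'end': 8000},
                 {'start': 8000, 'end': 11000}]
print(uncovered_midpoints(primermap, bins_n))         # [3000.0, 10000.0]
print(uncovered_midpoints(primermap, bins_n_plus_1))  # []
```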