Parallelization across regions
Given that the most common data structure is a counts dict (whose keys are the region names in our dataset), we often want to call a function for each region in this dictionary:
>>> result = {region: fn(counts[region]) for region in counts}
This pattern may become even more complicated if fn() returns a tuple, for
example. Furthermore, it is clear that the overall operation is “embarrassingly
parallel” with respect to the regions being processed. In order to simplify our
code, reduce redundancy, and gain the benefits of parallel execution, we
introduce a new decorator, @parallelize_regions, which can be found in the
subpackage lib5c.util.parallelization. This decorator allows you to write fn()
just once, as if it processes only one matrix, and then call it with either one
matrix or an entire counts dict, as is convenient. For example, we can write
from lib5c.util.parallelization import parallelize_regions
@parallelize_regions
def fn(matrix):
    return matrix + 1
and then call this function via
result_counts = fn(counts)
or alternatively,
result_matrix = fn(counts['Sox2'])
as is convenient for us.
Mechanism and caveats
The following sections dig into the mechanics behind the @parallelize_regions
decorator and highlight some important features and caveats.
First positional argument dependence
The @parallelize_regions decorator works by first checking to see if the first
argument passed to the decorated function is a dict. If it is not, the decorator
does nothing, and the function is executed as normal. If it is a dict, the
execution of the function is parallelized across the keys of that dict. This
means that if the non-parallelized version of fn() expects a dict as its first
positional argument, you will not be able to use the same name for both the
parallel and non-parallel versions of the function. To work around this, you
can define
from lib5c.util.parallelization import parallelize_regions
def fn(somedict):
    return somedict
fn_parallel = parallelize_regions(fn)
and then call fn(somedict) when you want the non-parallelized version and
fn_parallel(doubledict) when you want the parallelization. Here doubledict is a
dict (typically keyed by region name) whose values are the dicts that fn()
expects.
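The dispatch rule described above can be illustrated with a minimal sketch. The decorator below, parallelize_regions_sketch, is a hypothetical stand-in written for this example and is not the lib5c implementation: it runs the per-key calls serially and ignores per-region arguments, so it only shows how the first positional argument decides which path is taken. The helper count_keys and the sample dicts are likewise made up.

```python
import functools


def parallelize_regions_sketch(fn):
    """Hypothetical stand-in: dispatch on the type of the first argument."""
    @functools.wraps(fn)
    def wrapper(first, *args, **kwargs):
        if not isinstance(first, dict):
            # Not a dict: execute the function as normal.
            return fn(first, *args, **kwargs)
        # A dict: call the function once per key (serially in this sketch).
        return {key: fn(first[key], *args, **kwargs) for key in first}
    return wrapper


def count_keys(somedict):
    """A function whose single-input form already expects a dict."""
    return len(somedict)


# Keep the undecorated name and give the wrapped version a second name.
count_keys_parallel = parallelize_regions_sketch(count_keys)

somedict = {'a': 1, 'b': 2}
doubledict = {'Sox2': {'a': 1, 'b': 2}, 'Klf4': {'a': 1}}

print(count_keys(somedict))             # one call on a single dict
print(count_keys_parallel(doubledict))  # one call per outer key
```

Note that count_keys_parallel(somedict) would not behave like count_keys(somedict): the wrapper would treat 'a' and 'b' as regions and call count_keys on the integers 1 and 2, which is why two separate names are needed.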
Per-region args and kwargs
By default, @parallelize_regions will simply copy all the other args and kwargs
to each region’s invocation of fn(). In other words, when you call
fn(counts, arg_1, arg_2), the following will be executed:
fn(counts['region_1'], arg_1, arg_2)
fn(counts['region_2'], arg_1, arg_2)
...
However, if any arg or kwarg is a dict which has the same keys as the first
positional argument (or, if the arg is a nested dict, if its second level has
these same keys), the arg will be replaced with each region’s entry in that
dict. In other words, if we call fn(counts, primermap), where primermap is a
dict whose keys match counts, the following will be executed:
fn(counts['region_1'], primermap['region_1'])
fn(counts['region_2'], primermap['region_2'])
...
This substitution is performed on an arg-by-arg basis, so you can use any mixture of normal and “regional dictionary” arguments when calling the function.
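The substitution rule can be sketched as a small helper that decides, for one argument and one region, what value that region's call receives. The function resolve_arg and the sample data are hypothetical and are not taken from lib5c. In particular, the handling of the nested case (keeping the outer keys and selecting the region from each inner dict) is an assumption about what "each region's entry" means for a nested dict.

```python
def resolve_arg(arg, region, region_keys):
    """Return the value one region's call should receive for ``arg``."""
    if isinstance(arg, dict) and arg:
        if set(arg) == region_keys:
            # Regional dict: pass only this region's entry.
            return arg[region]
        if all(isinstance(v, dict) and set(v) == region_keys
               for v in arg.values()):
            # Nested dict whose second level is keyed by region (assumed
            # behavior): keep the outer keys, select this region inside each.
            return {outer: inner[region] for outer, inner in arg.items()}
    # Anything else is copied unchanged to every region.
    return arg


counts = {'region_1': 10, 'region_2': 20}
primermap = {'region_1': ['p1'], 'region_2': ['p2', 'p3']}
replicates = {'rep1': {'region_1': 1, 'region_2': 2},
              'rep2': {'region_1': 3, 'region_2': 4}}
keys = set(counts)

print(resolve_arg(primermap, 'region_2', keys))   # regional dict
print(resolve_arg(0.5, 'region_2', keys))         # ordinary argument
print(resolve_arg(replicates, 'region_1', keys))  # nested dict
```

Because the helper is applied to each argument independently, a single call can mix ordinary values with regional dictionaries.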
Automatic result unpacking
Let’s say fn() returns a tuple, for example:
from lib5c.util.parallelization import parallelize_regions
@parallelize_regions
def fn(matrix):
    return matrix + 1, matrix - 1
When we call fn() on a single matrix, we expect to see
bigger_matrix, smaller_matrix = fn(matrix)
The same thing will work when calling fn() on a counts dict:
bigger_counts_dict, smaller_counts_dict = fn(counts)
In this case bigger_counts_dict and smaller_counts_dict will each be dicts
whose keys match the keys of counts.
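One way to picture the unpacking step is as a transpose: a dict of per-region tuples becomes a tuple of per-region dicts. The helper unpack_results below is a hypothetical sketch of that idea, not the code used inside @parallelize_regions, and it uses plain integers in place of matrices.

```python
def unpack_results(results):
    """Turn {region: (a, b, ...)} into ({region: a}, {region: b}, ...)."""
    if not results:
        return results
    first = next(iter(results.values()))
    if not isinstance(first, tuple):
        # Single return value per region: nothing to unpack.
        return results
    return tuple(
        {region: value[i] for region, value in results.items()}
        for i in range(len(first))
    )


# Per-region results as fn() would return them for each region.
per_region = {'Sox2': (11, 9), 'Klf4': (21, 19)}

bigger_counts_dict, smaller_counts_dict = unpack_results(per_region)
print(bigger_counts_dict)
print(smaller_counts_dict)
```

Each returned dict has the same keys as the input, so the caller can unpack the result exactly as it would for a single matrix.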
Fallback to series execution
If an error is encountered during the parallel processing, the decorator will attempt to re-run the same job in series, in hopes that this will result in a more readable stack trace.
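The fallback idea can be sketched with the standard library. The example below uses a thread pool from concurrent.futures purely as a stand-in; this document does not say which pooling mechanism lib5c uses, and run_regions and fail_on_negative are hypothetical names introduced for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_regions(fn, counts):
    """Apply fn to every region in parallel, retrying serially on error."""
    regions = list(counts)
    try:
        with ThreadPoolExecutor() as pool:
            values = list(pool.map(fn, (counts[r] for r in regions)))
    except Exception:
        # The parallel run failed: repeat the job in series so that any
        # error is raised directly from fn() in the calling thread.
        values = [fn(counts[r]) for r in regions]
    return dict(zip(regions, values))


def fail_on_negative(value):
    if value < 0:
        raise ValueError('negative value: %r' % value)
    return value + 1


print(run_regions(fail_on_negative, {'Sox2': 1, 'Klf4': 2}))

try:
    run_regions(fail_on_negative, {'Sox2': 1, 'Klf4': -1})
except ValueError as exc:
    print('serial rerun raised:', exc)
```

The serial rerun repeats work that may already have succeeded in parallel, which trades some extra time for an error raised from an ordinary, single-threaded call stack.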
Signature preservation
@parallelize_regions is itself decorated by the @pretty_decorator
meta-decorator, which can be found in lib5c.util.pretty_decorator. This allows
the signature of the decorated function to be preserved through the decoration
process.
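To show what signature preservation means in practice, the sketch below uses functools.wraps from the standard library. This is an analogous technique chosen for illustration; how @pretty_decorator achieves the same effect is not described here, and passthrough is a hypothetical decorator.

```python
import functools
import inspect


def passthrough(fn):
    """Hypothetical decorator that forwards every call unchanged."""
    @functools.wraps(fn)  # copies __name__, __doc__ and sets __wrapped__
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapper


@passthrough
def fn(matrix, scale=1.0):
    """Scale a matrix."""
    return matrix * scale


# inspect.signature follows __wrapped__, so the original parameters are
# reported instead of the wrapper's (*args, **kwargs).
print(inspect.signature(fn))
print(fn.__name__, '-', fn.__doc__)
```

Without functools.wraps, the decorated function would report the name wrapper and the signature (*args, **kwargs), which makes help() output and generated API documentation much less useful.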