lib5c.parsers.hic module

Module for parsing Hi-C data from the Rao et al. 2014 paper.

lib5c.parsers.hic.load_range_from_contact_matrix(contact_matrix_file, grange, region_name='', norm_file=None)

Parses a chunk of contact information out of a Hi-C contact matrix file.

The Hi-C contact matrix file format parsed by this function is the one used by the contact matrices uploaded to GEO for the Rao et al. 2014 paper. The Juicer tools dump command produces the same format.
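
To make the file format concrete, here is a minimal sketch of reading such a file into a dense matrix. It is not lib5c's implementation. It assumes the sparse layout is three whitespace-separated columns per line (start coordinate of bin i, start coordinate of bin j, count), that only one triangle of the symmetric matrix is stored, and that the bin resolution is known to the caller. The helper name read_sparse_range and its resolution argument are hypothetical.

```python
import io

import numpy as np


def read_sparse_range(handle, start, end, resolution):
    """Build a dense symmetric counts matrix for [start, end) from sparse triples.

    Assumes `start` is a multiple of `resolution` (hypothetical helper).
    """
    first_bin = start // resolution
    n_bins = -(-(end - start) // resolution)  # ceiling division
    counts = np.zeros((n_bins, n_bins))
    for line in handle:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        # Convert genomic bin start coordinates to indices within the range.
        i = int(fields[0]) // resolution - first_bin
        j = int(fields[1]) // resolution - first_bin
        if 0 <= i < n_bins and 0 <= j < n_bins:
            # Mirror each entry so the dense matrix is symmetric.
            counts[i, j] = counts[j, i] = float(fields[2])
    return counts


# Toy data at an assumed 5 kb resolution: bin_i_start, bin_j_start, count.
toy_file = io.StringIO(
    "0\t0\t12.0\n"
    "0\t5000\t4.0\n"
    "5000\t10000\t7.0\n"
    "10000\t20000\t1.0\n"
)
matrix = read_sparse_range(toy_file, start=0, end=15000, resolution=5000)
```

The last toy line refers to a bin outside the requested range, so it is skipped. Streaming line by line avoids holding a whole-chromosome matrix in memory when only a small range is needed.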

Parameters
  • contact_matrix_file (str) – String reference to the Hi-C contact matrix file to parse.

  • grange (Dict[str, Any]) –

    The genomic range to extract contact information for. This should be specified as a dict with the following structure:

    {
        'chrom': str,
        'start': int,
        'end': int
    }
    

  • region_name (Optional[str]) – Name for this genomic region. If passed, it will be used to name the bins in the returned pixelmap.

  • norm_file (Optional[str]) – String reference to a file containing a Hi-C bias vector corresponding to the contact_matrix_file. If passed, the data will be normalized using this vector.
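
As an illustration of what normalizing with a bias vector can mean, the sketch below divides each count by the product of the bias factors of its two bins. This is a common convention for Hi-C bias vectors and is assumed here; whether this function applies exactly this formula is not stated above. The helper name normalize_counts and the toy values are hypothetical.

```python
import numpy as np


def normalize_counts(counts, bias):
    """Divide each entry counts[i, j] by bias[i] * bias[j]."""
    bias = np.asarray(bias, dtype=float)
    # np.outer(bias, bias)[i, j] equals bias[i] * bias[j].
    return counts / np.outer(bias, bias)


raw = np.array([[8.0, 2.0], [2.0, 4.0]])
bias = [2.0, 1.0]  # hypothetical bias factors for two bins
normalized = normalize_counts(raw, bias)
```

Under this convention, a bin whose bias factor is NaN yields NaN for its entire row and column, so downstream code should be prepared to handle missing values.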

Returns

The first element of the tuple is the extracted counts matrix for the requested genomic range. The second element of the tuple is a pixelmap generated for this region describing the specific bins that were extracted.

Return type

Tuple[np.ndarray, List[Dict[str, Any]]]
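
A call might look like the following sketch, which follows the signature and return type documented above. The coordinates, the region name, and the helper names load_example_region and describe are hypothetical. The stand-in values at the bottom only mimic the documented return types so the snippet runs without a data file, and the 'name' key in the stand-in bins is an assumption rather than a documented field.

```python
import numpy as np


def load_example_region(matrix_path, norm_path=None):
    """Load one hypothetical region and return its counts matrix and pixelmap."""
    # Imported lazily so this sketch can be read without lib5c installed.
    from lib5c.parsers.hic import load_range_from_contact_matrix

    grange = {'chrom': 'chr3', 'start': 34000000, 'end': 35000000}
    counts, pixelmap = load_range_from_contact_matrix(
        matrix_path, grange, region_name='example', norm_file=norm_path)
    return counts, pixelmap


def describe(counts, pixelmap):
    """Summarize a (counts, pixelmap) pair as a short string."""
    return '%d bins, matrix shape %s' % (len(pixelmap), counts.shape)


# Stand-in values with the documented return types, used in place of a real call.
stand_in_counts = np.zeros((2, 2))
stand_in_pixelmap = [{'name': 'bin_0'}, {'name': 'bin_1'}]  # hypothetical keys
summary = describe(stand_in_counts, stand_in_pixelmap)
```

With real data, passing the result of load_example_region to describe gives a quick check that the number of bins in the pixelmap matches the dimensions of the counts matrix.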