Generic API

hictkpy.is_cooler(path: str | PathLike) bool

Test whether path points to a cooler file.

hictkpy.is_mcool_file(path: str | PathLike) bool

Test whether path points to a .mcool file.

hictkpy.is_scool_file(path: str | PathLike) bool

Test whether path points to a .scool file.

hictkpy.is_hic(path: str | PathLike) bool

Test whether path points to a .hic file.
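The four predicates above can be combined into a small format dispatcher. The following is a minimal sketch, not part of hictkpy: `detect_format` and its `htk` parameter are hypothetical names introduced for illustration, and the order in which the predicates are tried is an assumption.

```python
def detect_format(path, htk=None):
    """Return "hic", "mcool", "scool", or "cool" for `path`, or None if nothing matches.

    `htk` is a hypothetical injection point: any object exposing the four
    predicates documented above. When omitted, hictkpy itself is imported.
    """
    if htk is None:
        import hictkpy as htk  # lazy import, so the helper can be tried with a stub

    # Assumed order: predicates are tried in turn and the first match wins.
    checks = (
        ("hic", htk.is_hic),
        ("mcool", htk.is_mcool_file),
        ("scool", htk.is_scool_file),
        ("cool", htk.is_cooler),
    )
    for name, predicate in checks:
        if predicate(path):
            return name
    return None
```

Calling `detect_format("file.mcool")` (a placeholder path) would then pick between `hictkpy.MultiResFile` and `hictkpy.File` before opening anything.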

class hictkpy.MultiResFile(*args, **kwargs)

Class representing a file handle to a .hic or .mcool file.

__init__(self, path: str | PathLike) None

Open a multi-resolution Cooler file (.mcool) or .hic file.

__getitem__(self, arg: int, /) File

Open the Cooler or .hic file corresponding to the resolution given as input.

__enter__(self) MultiResFile

__exit__(
self,
exc_type: object | None = None,
exc_value: object | None = None,
traceback: object | None = None,
) None

attributes(self) dict

Get file attributes as a dictionary.

chromosomes(self, include_ALL: bool = False) dict[str, int]

Get the chromosome sizes as a dictionary mapping names to sizes.

close(self) None

Manually close the file handle.

is_hic(self) bool

Test whether the file is in .hic format.

is_mcool(self) bool

Test whether the file is in .mcool format.

path(self) Path

Get the file path.

resolutions(
self,
) numpy.ndarray[dtype=int64, shape=(*), order='C']

Get the list of available resolutions.
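As a hedged illustration of how `resolutions()` and `__getitem__` fit together, the sketch below walks over every resolution in an already opened multi-resolution handle and records the number of bins at each one. `bins_per_resolution` is a hypothetical helper, not a hictkpy function; it relies only on the methods documented in this section.

```python
def bins_per_resolution(mrf):
    """Map each resolution of an open multi-resolution handle to its bin count."""
    counts = {}
    for res in mrf.resolutions():
        res = int(res)  # resolutions() yields NumPy integers; __getitem__ takes an int
        with mrf[res] as f:  # File objects implement __enter__/__exit__
            counts[res] = f.nbins()
    return counts


# Possible usage ("file.mcool" is a placeholder path):
#
#   import hictkpy
#   with hictkpy.MultiResFile("file.mcool") as mrf:
#       print(bins_per_resolution(mrf))
```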

class hictkpy.File(*args, **kwargs)

Class representing a file handle to a .cool or .hic file.

__init__(
self,
path: str | PathLike,
resolution: int | None = None,
matrix_type: str = 'observed',
matrix_unit: str = 'BP',
) None

Construct a file object for a .hic, .cool, or .mcool file given the file path and resolution. The resolution is ignored when opening single-resolution Cooler files.

__enter__(self) File

__exit__(
self,
exc_type: object | None = None,
exc_value: object | None = None,
traceback: object | None = None,
) None

attributes(self) dict

Get file attributes as a dictionary.

avail_normalizations(self) list[str]

Get the list of available normalizations.

bins(self) BinTable

Get table of bins.

chromosomes(self, include_ALL: bool = False) dict[str, int]

Get chromosome sizes as a dictionary mapping names to sizes.

close(self) None

Manually close the file handle.

fetch(
self,
range1: str | None = None,
range2: str | None = None,
normalization: str | None = None,
count_type: type | str = 'int32',
join: bool = False,
query_type: str = 'UCSC',
diagonal_band_width: int | None = None,
) PixelSelector

Fetch interactions overlapping a region of interest.

has_normalization(self, normalization: str) bool

Check whether a given normalization is available.
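The check above pairs naturally with `fetch()`. The sketch below requests normalized interactions only when the normalization exists and falls back to raw counts otherwise. `fetch_normalized` is a hypothetical helper; the `"float64"` count type is an assumption made by analogy with the documented default `'int32'`.

```python
def fetch_normalized(f, region, normalization):
    """Fetch `region` with `normalization` when available, raw counts otherwise."""
    if f.has_normalization(normalization):
        # Normalized counts are fractional, so a floating-point count type is
        # requested. The string "float64" is assumed to be accepted, by analogy
        # with the default 'int32'.
        return f.fetch(region, normalization=normalization, count_type="float64")
    # Fall back to raw counts with the default count type.
    return f.fetch(region)
```

Names returned by `avail_normalizations()` are the values to pass as `normalization`.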

is_cooler(self) bool

Test whether file is in .cool format.

is_hic(self) bool

Test whether file is in .hic format.

nbins(self) int

Get the total number of bins.

nchroms(self, include_ALL: bool = False) int

Get the total number of chromosomes.

path(self) Path

Return the file path.

resolution(self) int

Get the bin size in bp.

uri(self) str

Return the file URI.

weights(
self,
name: str,
divisive: bool = True,
) numpy.ndarray[dtype=float64, shape=(*), order='C'] | None
weights(
self,
names: Sequence[str],
divisive: bool = True,
) DataFrame

Overloaded function.

  1. weights(self, name: str, divisive: bool = True) -> numpy.ndarray[dtype=float64, shape=(*), order='C'] | None

Fetch the balancing weights for the given normalization method.

  2. weights(self, names: collections.abc.Sequence[str], divisive: bool = True) -> pandas.DataFrame

Fetch the balancing weights for the given normalization methods. Weights are returned as a pandas.DataFrame.

class hictkpy.PixelSelector(*args, **kwargs)

Class representing pixels overlapping with the given genomic intervals.

coord1(self) tuple[str, int, int] | None

Get query coordinates for the first dimension. Returns None when query spans the entire genome.

coord2(self) tuple[str, int, int] | None

Get query coordinates for the second dimension. Returns None when query spans the entire genome.

dtype(self) type

Get the dtype for the pixel count.

to_arrow(
self,
query_span: str = 'upper_triangle',
) Table

Retrieve interactions as a pyarrow.Table.

to_coo(
self,
query_span: str = 'upper_triangle',
low_memory: bool = False,
) coo_matrix

Retrieve interactions as a SciPy COO matrix. When low_memory=True, the heuristic used to minimize the number of memory allocations is turned off, and a two-pass algorithm that allocates a matrix with the exact shape is used instead.

to_csr(
self,
query_span: str = 'upper_triangle',
low_memory: bool = False,
) csr_matrix

Retrieve interactions as a SciPy CSR matrix. When low_memory=True, the heuristic used to minimize the number of memory allocations is turned off, and a two-pass algorithm that allocates a matrix with the exact shape is used instead.

to_df(
self,
query_span: str = 'upper_triangle',
) DataFrame

Alias for to_pandas().

to_numpy(
self,
query_span: str = 'full',
) numpy.ndarray[shape=(*, *), order='C']

Retrieve interactions as a numpy 2D matrix.

to_pandas(
self,
query_span: str = 'upper_triangle',
) DataFrame

Retrieve interactions as a pandas DataFrame.

size(self, upper_triangular: bool = True) int

Get the number of pixels overlapping with the given query.

Statistics

hictkpy.PixelSelector exposes several methods to compute or estimate statistics efficiently.

The main features of these methods are:

  • All statistics are computed by traversing the data only once and without caching interactions.

  • All methods can be tweaked to include or exclude non-finite values.

  • All functions are implemented using short-circuiting to detect scenarios where the required statistics can be computed without traversing all pixels.

The following statistics are guaranteed to be exact:

  • nnz

  • sum

  • min

  • max

  • mean

The rest of the supported statistics (currently variance, skewness, and kurtosis) are estimated and are thus not guaranteed to be exact. However, in practice, the estimation is usually very accurate (relative error < 1.0e-6).

You can instruct hictkpy to compute exact statistics by passing exact=True to hictkpy.PixelSelector.describe() and related methods. Note that for large queries this results in slower computations and higher memory usage.
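The idea of computing statistics in a single traversal, without caching interactions, can be illustrated with a streaming accumulator. The following is a minimal sketch based on Welford's online algorithm. It is a generic technique chosen for illustration and not necessarily how hictkpy computes its statistics; `one_pass_stats` is a hypothetical name, and the sample (n - 1) variance is an assumption.

```python
def one_pass_stats(values):
    """Compute count, sum, min, max, mean, and sample variance in a single pass."""
    n = 0
    total = 0
    lo = hi = None
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean

    for x in values:
        n += 1
        total += x
        lo = x if lo is None or x < lo else lo
        hi = x if hi is None or x > hi else hi
        # Welford update: no value is stored after it has been processed.
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)

    return {
        "count": n,
        "sum": total,
        "min": lo,
        "max": hi,
        "mean": mean if n > 0 else None,
        # The sample variance is undefined for fewer than two values.
        "variance": m2 / (n - 1) if n > 1 else None,
    }
```

Because the accumulator keeps only a handful of scalars, memory usage does not grow with the number of values consumed.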

describe(
self,
metrics: Sequence[str] = ['nnz', 'sum', 'min', 'max', 'mean', 'variance', 'skewness', 'kurtosis'],
keep_nans: bool = False,
keep_infs: bool = False,
keep_zeros: bool = False,
exact: bool = False,
) dict

Compute one or more descriptive metrics in the most efficient way possible. Known metrics: nnz, sum, min, max, mean, variance, skewness, kurtosis.

When a metric cannot be computed (e.g. because metrics=["variance"], but the selector overlaps with a single pixel), the value for that metric is set to None. When keep_infs or keep_nans is set to True and keep_zeros=True, nan and/or inf values are treated as zeros.

By default, metrics are estimated by doing a single pass through the data. The estimates are stable and usually very accurate. However, if you require exact values, you can specify exact=True.
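Since `describe()` reports a metric that cannot be computed as None, callers may want to separate usable values from missing ones. The sketch below shows one way to do that; `summarize` is a hypothetical helper that only assumes the returned dictionary is keyed by metric name, as described above.

```python
def summarize(sel, metrics=("nnz", "sum", "mean", "variance")):
    """Request `metrics` from a selector and list those that came back as None."""
    stats = sel.describe(metrics=list(metrics))
    # Metrics that could not be computed are reported as None.
    missing = sorted(name for name, value in stats.items() if value is None)
    return stats, missing
```

Requesting only the metrics that are needed gives `describe()` the chance to skip work, in line with the short-circuiting behavior mentioned earlier in this section.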

kurtosis(
self,
keep_nans: bool = False,
keep_infs: bool = False,
keep_zeros: bool = False,
exact: bool = False,
) float | None

Get the kurtosis of the number of interactions for the current pixel selection. See documentation for describe() for more details.

max(
self,
keep_nans: bool = False,
keep_infs: bool = False,
keep_zeros: bool = False,
) int | float | None

Get the maximum number of interactions for the current pixel selection. See documentation for describe() for more details.

mean(
self,
keep_nans: bool = False,
keep_infs: bool = False,
keep_zeros: bool = False,
) float | None

Get the average number of interactions for the current pixel selection. See documentation for describe() for more details.

min(
self,
keep_nans: bool = False,
keep_infs: bool = False,
keep_zeros: bool = False,
) int | float | None

Get the minimum number of interactions for the current pixel selection. See documentation for describe() for more details.

nnz(self, keep_nans: bool = False, keep_infs: bool = False) int

Get the number of non-zero entries for the current pixel selection. See documentation for describe() for more details.

skewness(
self,
keep_nans: bool = False,
keep_infs: bool = False,
keep_zeros: bool = False,
exact: bool = False,
) float | None

Get the skewness of the number of interactions for the current pixel selection. See documentation for describe() for more details.

sum(
self,
keep_nans: bool = False,
keep_infs: bool = False,
) int | float

Get the total number of interactions for the current pixel selection. See documentation for describe() for more details.

variance(
self,
keep_nans: bool = False,
keep_infs: bool = False,
keep_zeros: bool = False,
exact: bool = False,
) float | None

Get the variance of the number of interactions for the current pixel selection. See documentation for describe() for more details.

Iteration

__iter__(self) hictkpy.PixelIterator

Implement iter(self). The resulting iterator yields objects of type hictkpy.Pixel.

In [1]: import hictkpy as htk

In [2]: f = htk.File("file.cool")

In [3]: sel = f.fetch("chr2L:10,000,000-20,000,000")

In [4]: for i, pixel in enumerate(sel):
   ...:     print(pixel.bin1_id, pixel.bin2_id, pixel.count)
   ...:     if i > 10:
   ...:         break
   ...:
1000 1000 6759
1000 1001 3241
1000 1002 760
1000 1003 454
1000 1004 289
1000 1005 674
1000 1006 354
1000 1007 124
1000 1008 130
1000 1009 105
1000 1010 99
1000 1011 120
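The `enumerate`/`break` pattern in the loop above can also be written with `itertools.islice` from the standard library, which makes the number of pixels consumed explicit. This is a minimal sketch; `first_pixels` is a hypothetical helper name, and it works with any iterable, including a selector.

```python
from itertools import islice


def first_pixels(sel, n=12):
    """Return the first `n` items yielded by iterating over `sel`."""
    # islice stops after n items, so the rest of the selection is never read.
    return list(islice(sel, n))
```

The default of 12 matches the loop above, which prints the pixels for i = 0 through 11 before breaking.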

It is also possible to iterate over pixels together with their genomic coordinates by specifying join=True when calling hictkpy.File.fetch():

In [5]: sel = f.fetch("chr2L:10,000,000-20,000,000", join=True)

In [6]: for i, pixel in enumerate(sel):
   ...:     print(
   ...:         pixel.chrom1, pixel.start1, pixel.end1,
   ...:         pixel.chrom2, pixel.start2, pixel.end2,
   ...:         pixel.count
   ...:     )
   ...:     if i > 10:
   ...:         break
   ...:
chr2L 10000000 10010000 chr2L 10000000 10010000 6759
chr2L 10000000 10010000 chr2L 10010000 10020000 3241
chr2L 10000000 10010000 chr2L 10020000 10030000 760
chr2L 10000000 10010000 chr2L 10030000 10040000 454
chr2L 10000000 10010000 chr2L 10040000 10050000 289
chr2L 10000000 10010000 chr2L 10050000 10060000 674
chr2L 10000000 10010000 chr2L 10060000 10070000 354
chr2L 10000000 10010000 chr2L 10070000 10080000 124
chr2L 10000000 10010000 chr2L 10080000 10090000 130
chr2L 10000000 10010000 chr2L 10090000 10100000 105
chr2L 10000000 10010000 chr2L 10100000 10110000 99
chr2L 10000000 10010000 chr2L 10110000 10120000 120

class hictkpy.Bin

Class representing a genomic Bin (i.e., a BED interval).

property id

Get the bin ID.

property rel_id

Get the relative bin ID (i.e., the ID that uniquely identifies a bin within a chromosome).

property chrom

Get the name of the chromosome to which the Bin refers.

property start

Get the Bin start position.

property end

Get the Bin end position.

class hictkpy.BinTable(*args, **kwargs)

Class representing a table of genomic bins.

__init__(self, chroms: dict[str, int], resolution: int) None
__init__(self, bins: DataFrame) None

Overloaded function.

  1. __init__(self, chroms: dict[str, int], resolution: int) -> None

Construct a table of bins given a dictionary mapping chromosomes to their sizes and a resolution.

  2. __init__(self, bins: pandas.DataFrame) -> None

Construct a table of bins from a pandas.DataFrame with columns ["chrom", "start", "end"].

chromosomes(self, include_ALL: bool = False) dict[str, int]

Get the chromosome sizes as a dictionary mapping names to sizes.

get(self, bin_id: int) Bin
get(self, bin_ids: Sequence[int]) DataFrame
get(self, chrom: str, pos: int) Bin
get(
self,
chroms: Sequence[str],
pos: Sequence[int],
) DataFrame

Overloaded function.

  1. get(self, bin_id: int) -> hictkpy.Bin

Get the genomic coordinate given a bin ID.

  2. get(self, bin_ids: collections.abc.Sequence[int]) -> pandas.DataFrame

Get the genomic coordinates given a sequence of bin IDs. Genomic coordinates are returned as a pandas.DataFrame with columns ["chrom", "start", "end"].

  3. get(self, chrom: str, pos: int) -> hictkpy.Bin

Get the bin overlapping the given genomic coordinate.

  4. get(self, chroms: collections.abc.Sequence[str], pos: collections.abc.Sequence[int]) -> pandas.DataFrame

Get the bins overlapping the given genomic coordinates. Bins are returned as a pandas.DataFrame with columns ["chrom", "start", "end"].

get_id(self, chrom: str, pos: int) int

Get the ID of the bin overlapping the given genomic coordinate.
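For a table with a fixed bin size, the mapping from a genomic coordinate to a bin ID can be modeled with simple arithmetic. The sketch below is a conceptual model, not hictkpy's implementation. It assumes that bin IDs are assigned consecutively in chromosome order and that each chromosome starts a new bin. Both function names are hypothetical.

```python
def make_bin_offsets(chroms, resolution):
    """Return the ID of the first bin of each chromosome, in insertion order."""
    offsets = {}
    next_id = 0
    for name, size in chroms.items():
        offsets[name] = next_id
        # Ceiling division: the last bin of a chromosome may be shorter.
        next_id += -(-size // resolution)
    return offsets


def bin_id(offsets, resolution, chrom, pos):
    """Return the ID of the bin overlapping `chrom:pos` under the fixed-size model."""
    return offsets[chrom] + pos // resolution
```

Under these assumptions, a 10 kb table whose first chromosome is chr2L places position 10,000,000 in bin 1000, which agrees with the first bin ID printed in the iteration example in the PixelSelector section.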

get_ids(
self,
chroms: Sequence[str],
pos: Sequence[int],
) numpy.ndarray[dtype=int64, shape=(*)]

Get the IDs of the bins overlapping the given genomic coordinates.

merge(self, df: DataFrame) DataFrame

Merge genomic coordinates corresponding to the given bin identifiers. Bin identifiers should be provided as a pandas.DataFrame with columns "bin1_id" and "bin2_id". Genomic coordinates are returned as a pandas.DataFrame containing the same data as the DataFrame given as input, plus columns ["chrom1", "start1", "end1", "chrom2", "start2", "end2"].

resolution(self) int

Get the bin size for the bin table. Returns 0 if the bin table has a variable bin size.

to_arrow(
self,
range: str | None = None,
query_type: str = 'UCSC',
) Table

Return the bins in the BinTable as a pyarrow.Table. The optional “range” parameter can be used to only fetch a subset of the bins in the BinTable.

to_df(
self,
range: str | None = None,
query_type: str = 'UCSC',
) DataFrame

Alias for to_pandas().

to_pandas(
self,
range: str | None = None,
query_type: str = 'UCSC',
) DataFrame

Return the bins in the BinTable as a pandas.DataFrame. The optional “range” parameter can be used to only fetch a subset of the bins in the BinTable.

type(self) str

Get the type of table underlying the BinTable object (i.e. fixed or variable).

__iter__(self) hictkpy.BinTableIterator

Implement iter(self). The resulting iterator yields objects of type hictkpy.Bin.

class hictkpy.Pixel(*args, **kwargs)

Class modeling a Pixel in COO or BG2 format.

property bin1_id

Get the ID of bin1.

property bin2_id

Get the ID of bin2.

property count

Get the number of interactions.

The following properties are only available when pixels are in BG2 format.

property bin1

Get bin1.

property bin2

Get bin2.

property chrom1

Get the chromosome associated with bin1.

property start1

Get the start position associated with bin1.

property end1

Get the end position associated with bin1.

property chrom2

Get the chromosome associated with bin2.

property start2

Get the start position associated with bin2.

property end2

Get the end position associated with bin2.