lib5c.algorithms.pca module

lib5c.algorithms.pca.compute_pca(matrix, scaled=True, logged=False, kernel=None, kernel_kwargs=None, variant='pca', pf=1)

Performs PCA on a matrix.

Parameters:
  • matrix (np.ndarray) – The design matrix, whose rows are observations (replicates) and whose columns are features (interaction values at each position).
  • scaled (bool) – Pass True to scale the features to unit variance.
  • logged (bool) – Pass True to log the features before PCA.
  • kernel (Optional[str]) – Pass a kernel accepted by sklearn.decomposition.KernelPCA() to perform KPCA.
  • kernel_kwargs (Optional[Dict[str, Any]]) – Kwargs to use for the kernel.
  • variant ({'pca', 'ica', 'fa', 'mds'}) – Select which variant of PCA to use.
  • pf (int) – Specify an integer number of pure polynomial features to use in the PCA.
Returns:

A tuple of three elements. The first element is the matrix of PCA-projected replicates. The second element is the PVE for each component, or None if the selected PCA method doesn’t provide a PVE estimate. The third element is a matrix of the principal component vectors, or None if the selected PCA method doesn’t provide a set of principal component vectors.

Return type:

Tuple[np.ndarray]
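
The three return values described above can be illustrated with a minimal NumPy-only sketch of plain PCA. This is not lib5c's implementation: the function name pca_sketch, the log pseudocount of 1, the handling of zero-variance features, and the reading of PVE as a proportion of variance explained are all assumptions made for illustration. It only covers the scaled and logged options, not kernel, variant, or pf.

```python
import numpy as np


def pca_sketch(matrix, scaled=True, logged=False):
    """Illustrative PCA via SVD (hypothetical helper, not lib5c's code)."""
    x = np.asarray(matrix, dtype=float)
    if logged:
        # Assumption: a pseudocount of 1 keeps zero counts finite.
        x = np.log(x + 1.0)
    # Center each feature (column) across the observations (rows).
    x = x - x.mean(axis=0)
    if scaled:
        std = x.std(axis=0)
        std[std == 0] = 1.0  # assumption: leave constant features unscaled
        x = x / std
    # Rows of vt are the principal component vectors.
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    projected = u * s                 # observations in component space
    pve = s ** 2 / np.sum(s ** 2)     # fraction of variance per component
    return projected, pve, vt


# Toy design matrix: 4 replicates (rows) by 50 features (columns).
rng = np.random.default_rng(0)
design = rng.poisson(20, size=(4, 50)).astype(float)
proj, pve, components = pca_sketch(design, scaled=True, logged=True)
```

Here proj has one row per replicate, pve has one entry per component, and components has one row per component and one column per feature, which mirrors the layout of the tuple described in the Returns section.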

lib5c.algorithms.pca.compute_pca_from_counts_superdict(counts_superdict, rep_order=None, **kwargs)

Convenience function for performing PCA on a counts superdict data structure.

Parameters:
  • counts_superdict (Dict[str, Dict[str, np.ndarray]]) – The counts superdict structure to compute PCA on.
  • rep_order (Optional[List[str]]) – The order in which the replicates in counts_superdict should be considered when filling in the rows of the design matrix.
  • kwargs (Dict[str, Any]) – Additional kwargs to be passed to compute_pca().
Returns:

A tuple of three elements. The first element is the matrix of PCA-projected replicates. The second element is the PVE for each component, or None if the selected PCA method doesn’t provide a PVE estimate. The third element is a matrix of the principal component vectors, or None if the selected PCA method doesn’t provide a set of principal component vectors.

Return type:

Tuple[np.ndarray]
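
To make the role of rep_order concrete, the following sketch shows one way a counts superdict could be flattened into a design matrix whose rows follow rep_order. It is a guess at the general idea, not lib5c's code: the helper name superdict_to_design_matrix, the use of the upper triangle of each region's matrix, the sorted region order, and the sorted default for rep_order are all assumptions.

```python
import numpy as np


def superdict_to_design_matrix(counts_superdict, rep_order=None):
    """Flatten a counts superdict into a replicates-by-features matrix.

    Hypothetical helper for illustration only.
    """
    if rep_order is None:
        # Assumption: fall back to sorted replicate names.
        rep_order = sorted(counts_superdict)
    # Assumption: every replicate has the same regions; fix their order.
    regions = sorted(counts_superdict[rep_order[0]])
    rows = []
    for rep in rep_order:
        parts = []
        for region in regions:
            counts = np.asarray(counts_superdict[rep][region], dtype=float)
            # Assumption: matrices are symmetric, so keep the upper triangle.
            upper = np.triu_indices(counts.shape[0])
            parts.append(counts[upper])
        rows.append(np.concatenate(parts))
    return np.vstack(rows)


# Toy superdict: replicate name -> region name -> square counts matrix.
counts_superdict = {
    'rep_a': {'r1': np.arange(9.0).reshape(3, 3), 'r2': np.ones((2, 2))},
    'rep_b': {'r1': 2 * np.arange(9.0).reshape(3, 3), 'r2': np.zeros((2, 2))},
}
design = superdict_to_design_matrix(
    counts_superdict, rep_order=['rep_b', 'rep_a'])
```

With rep_order=['rep_b', 'rep_a'], row 0 of design holds the features of rep_b and row 1 those of rep_a, so the rows of the projected matrix returned by a subsequent PCA would follow the same order.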