lib5c.algorithms.pca module

lib5c.algorithms.pca.compute_pca(matrix, scaled=True, logged=False, kernel=None, kernel_kwargs=None, variant='pca', pf=1)

Performs PCA on a matrix.

Parameters
  • matrix (np.ndarray) – The design matrix, whose rows are observations (replicates) and whose columns are features (interaction values at each position).

  • scaled (bool) – Pass True to scale the features to unit variance.

  • logged (bool) – Pass True to log the features before PCA.

  • kernel (Optional[str]) – Pass a kernel accepted by sklearn.decomposition.KernelPCA() to perform KPCA.

  • kernel_kwargs (Optional[Dict[str, Any]]) – Kwargs to use for the kernel.

  • variant ({'pca', 'ica', 'fa', 'mds'}) – Select which variant of PCA to use.

  • pf (int) – Specify an integer number of pure polynomial features to use in the PCA.

Returns

A tuple of three elements. The first element is the matrix of PCA-projected replicates. The second element is the PVE for each component, or None if the selected PCA method doesn’t provide a PVE estimate. The third element is a matrix of the principal component vectors, or None if the selected PCA method doesn’t provide a set of principal component vectors.

Return type

Tuple[np.ndarray, Optional[np.ndarray], Optional[np.ndarray]]
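
The following is a minimal sketch of the kind of computation described above, written with NumPy only so that it runs without lib5c installed. It is not the library's implementation: the function name pca_sketch, the use of an SVD, and the log(x + 1) transform used for logged=True are assumptions made for illustration. It covers only the plain 'pca' case and ignores kernel, variant, and pf.

```python
import numpy as np


def pca_sketch(matrix, scaled=True, logged=False):
    """Illustrative PCA via SVD; a stand-in, not lib5c's compute_pca()."""
    X = np.asarray(matrix, dtype=float)
    if logged:
        # Assumption: log(x + 1); the library's exact transform may differ.
        X = np.log1p(X)
    # Center each feature (column) at zero.
    X = X - X.mean(axis=0)
    if scaled:
        std = X.std(axis=0)
        # Leave constant features unscaled to avoid dividing by zero.
        std[std == 0] = 1.0
        X = X / std
    # Rows of vt are the principal component vectors.
    u, s, vt = np.linalg.svd(X, full_matrices=False)
    # Coordinates of each replicate (row) in component space.
    projected = u * s
    # Fraction of total variance carried by each component.
    total = np.sum(s ** 2)
    pve = s ** 2 / total if total > 0 else np.zeros_like(s)
    return projected, pve, vt


# Toy design matrix: 4 replicates (rows) x 10 interaction values (columns).
rng = np.random.default_rng(0)
design = rng.poisson(lam=20, size=(4, 10))
proj, pve, components = pca_sketch(design)
print(proj.shape, pve.shape, components.shape)
```

The three returned arrays mirror the three elements of the return value documented above: projected replicates, per-component PVE, and component vectors.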

lib5c.algorithms.pca.compute_pca_from_counts_superdict(counts_superdict, rep_order=None, **kwargs)

Convenience function for performing PCA on a counts superdict data structure.

Parameters
  • counts_superdict (Dict[str, Dict[str, np.ndarray]]) – The counts superdict structure to compute PCA on.

  • rep_order (Optional[List[str]]) – The order in which the replicates in counts_superdict should be considered when filling in the rows of the design matrix.

  • kwargs (Dict[str, Any]) – Additional kwargs to be passed to compute_pca().

Returns

A tuple of three elements. The first element is the matrix of PCA-projected replicates. The second element is the PVE for each component, or None if the selected PCA method doesn’t provide a PVE estimate. The third element is a matrix of the principal component vectors, or None if the selected PCA method doesn’t provide a set of principal component vectors.

Return type

Tuple[np.ndarray, Optional[np.ndarray], Optional[np.ndarray]]
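
To make the role of rep_order concrete, here is a hedged sketch of how a counts superdict might be turned into a design matrix with one row per replicate. The helper superdict_to_design_matrix is hypothetical and not part of lib5c. It assumes the outer keys are replicate names, the inner keys are region names, and each row is built by concatenating the fully flattened region matrices in sorted region order; the library may flatten differently (for example, by using only part of each matrix or by dropping non-finite values).

```python
import numpy as np


def superdict_to_design_matrix(counts_superdict, rep_order=None):
    """Hypothetical helper: one row per replicate, one column per value."""
    if rep_order is None:
        # Assumption: fall back to sorted replicate names.
        rep_order = sorted(counts_superdict)
    # Fix the region order so columns line up across replicates.
    regions = sorted(counts_superdict[rep_order[0]])
    rows = []
    for rep in rep_order:
        flat = [
            np.asarray(counts_superdict[rep][region], dtype=float).ravel()
            for region in regions
        ]
        rows.append(np.concatenate(flat))
    return np.vstack(rows)


# Toy superdict: 2 replicates, each with a 2x2 and a 1x1 region matrix.
toy = {
    'rep1': {'regionA': np.array([[1, 2], [2, 3]]), 'regionB': np.array([[5]])},
    'rep2': {'regionA': np.array([[4, 0], [0, 1]]), 'regionB': np.array([[7]])},
}
design = superdict_to_design_matrix(toy, rep_order=['rep2', 'rep1'])
print(design.shape)
```

Because rep_order lists 'rep2' first, the first row of the resulting matrix holds the values from 'rep2'. The rows of the projected matrix returned by a subsequent PCA would follow the same order.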