In [1]:
import SCOTCH
import pandas as pd 
import anndata as ad
import scanpy as sc
import torch 

import matplotlib.pyplot as plt

Running SCOTCH on Simulated Data

This notebook is an example of using SCOTCH to factor an example matrix with a generated block structure. The matrix is stored in the text file "test/A.txt" and contains 500 rows and 1000 columns. It can naturally be factorized into 3 row clusters and 9 column clusters. In the code block below, we load the A matrix into a pandas DataFrame and visualize it as a heatmap.

In [2]:
A = pd.read_csv("test/A.txt", sep = '\t', header=None)
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot()
ax.imshow(A, cmap='viridis', interpolation='nearest')
ax.set_axis_off()
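For readers who do not have "test/A.txt" at hand, a matrix with a comparable block structure can be simulated. The sketch below is only an illustration: the helper name `make_block_matrix`, the uniform block means, the Poisson noise, and the equally sized contiguous clusters are all assumptions made here, and they are not a description of how A.txt was generated.

```python
import numpy as np

def make_block_matrix(n_rows=500, n_cols=1000, k1=3, k2=9, seed=0):
    """Simulate a non-negative matrix with k1 row blocks and k2 column blocks."""
    rng = np.random.default_rng(seed)
    # Assign rows and columns to equally sized, contiguous clusters.
    row_labels = np.arange(n_rows) * k1 // n_rows
    col_labels = np.arange(n_cols) * k2 // n_cols
    # One mean per (row cluster, column cluster) block; the range is arbitrary.
    block_means = rng.uniform(1.0, 20.0, size=(k1, k2))
    mean = block_means[row_labels][:, col_labels]
    # Poisson noise keeps every entry non-negative, as NMTF requires.
    return rng.poisson(mean).astype(float), row_labels, col_labels

A_sim, row_labels, col_labels = make_block_matrix()
print(A_sim.shape)  # (500, 1000)
```

The returned labels give a ground truth against which cluster assignments from a factorization could be compared.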

SCOTCH interfaces with the anndata structure. In the next code block we convert A into an anndata object and initialize a scotch object. For the first factorization example, we use $k1 = 3$ and $k2 = 9$, the expected numbers of row and column clusters respectively. $max\_l\_u$ and $max\_l\_v$ correspond to the orthogonal regularization parameters. In general, it is recommended to use regularization with SCOTCH. Here we run it without regularization to demonstrate the poor clustering performance and to estimate the error associated with NMTF without regularization.

An explanation of the remaining parameters is below:

  • var_lambda: SCOTCH allows for a ramping of the regularization parameters using a sigmoid scheduler. In general this is not needed to determine good clusters and embeddings. By setting this to False, the max_l_u and max_l_v values are used for all iterations of the update.

  • device: SCOTCH updates are GPU enabled for reasonably sized feature matrices. If using a GPU this should point to the name of the GPU device.

  • init_style: SCOTCH supports multiple NM(T)F initializations. Here we initialize all elements of $U$, $V$, and $S$ randomly ("random"). Another supported value of this parameter is "nnsvd", for non-negative SVD initialization.

  • draw_intermediate_graph: This parameter enables visualization of the factors and estimates during the SCOTCH update. Each full update of $U$, $S$, and $V$ is captured as a frame, and the frames can be stitched into a GIF. Enabling this parameter slows down the update, especially for large expression matrices.
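
To make the var_lambda option more concrete, here is a minimal sketch of what a sigmoid ramp of a regularization weight could look like next to a constant one. The function names, the midpoint at half of max_iter, and the `steepness` value are hypothetical choices for illustration; the schedule SCOTCH actually uses is not shown in this notebook.

```python
import math

def sigmoid_lambda(iteration, max_iter, max_l, steepness=10.0):
    """Ramp a regularization weight from near 0 towards max_l along a sigmoid.

    Hypothetical schedule: midpoint and steepness are illustrative assumptions.
    """
    # Map the iteration to [-0.5, 0.5] so the ramp is centred on max_iter / 2.
    x = iteration / max_iter - 0.5
    return max_l / (1.0 + math.exp(-steepness * x))

def constant_lambda(iteration, max_iter, max_l):
    """Behaviour described for var_lambda=False: max_l at every iteration."""
    return max_l

# Sample the ramp at a few iterations for max_l = 0.15 and max_iter = 200.
schedule = [sigmoid_lambda(t, 200, 0.15) for t in range(0, 201, 50)]
print([round(v, 4) for v in schedule])
```

With a ramp like this, early iterations are dominated by the reconstruction term and the orthogonality pressure only builds up later.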

After initializing SCOTCH, we add the data from the adata object to scotch for factorization.

In [3]:
adata = ad.AnnData(A)
scotch = SCOTCH.SCOTCH(k1 = 3, k2 = 9, max_l_u = 0.0, max_l_v = 0.0, term_tol = 1e-10, var_lambda= False, device = "cpu", init_style="random", draw_intermediate_graph=True)
scotch.add_data_from_adata(adata)
/opt/anaconda3/envs/Pytorch/lib/python3.9/site-packages/anndata/_core/aligned_df.py:67: ImplicitModificationWarning: Transforming to str index.
  warnings.warn("Transforming to str index.", ImplicitModificationWarning)
/opt/anaconda3/envs/Pytorch/lib/python3.9/site-packages/anndata/_core/aligned_df.py:67: ImplicitModificationWarning: Transforming to str index.
  warnings.warn("Transforming to str index.", ImplicitModificationWarning)

This block runs SCOTCH. If the verbose SCOTCH input parameter is enabled, the following information is printed for each iteration:

  • Iter - The current iteration that is being run.
  • Iter time - The time that it took to complete the iteration. Iteration 1 generally takes longer than the other iterations due to some initialization at the beginning of SCOTCH.
  • Total time - The current run time.
  • Objective - An estimate of the objective.
  • Relative Delta Objective - The relative change in the objective from the previous iteration. When this value is less than term_tol and greater than zero, the algorithm will stop.
  • Reconstruction error - The component of the objective related to $|| X - USV^T ||_F^2$. This value is comparable across different random initializations or different regularization parameters.
In [4]:
scotch.fit()
Initializing NMTF factors
Beginning NMTF
Iter: 1	Iter Time: 0.241	Total Time: 0.241	Objective: 8.195e+06	Relative Delta Objective: 2.086e-01	Reconstruction Error: 8.195e+06
Iter: 2	Iter Time: 0.061	Total Time: 0.302	Objective: 8.021e+06	Relative Delta Objective: 2.124e-02	Reconstruction Error: 8.021e+06
Iter: 3	Iter Time: 0.058	Total Time: 0.360	Objective: 7.722e+06	Relative Delta Objective: 3.729e-02	Reconstruction Error: 7.722e+06
Iter: 4	Iter Time: 0.051	Total Time: 0.411	Objective: 7.339e+06	Relative Delta Objective: 4.962e-02	Reconstruction Error: 7.339e+06
Iter: 5	Iter Time: 0.056	Total Time: 0.467	Objective: 7.143e+06	Relative Delta Objective: 2.665e-02	Reconstruction Error: 7.143e+06
Iter: 6	Iter Time: 0.055	Total Time: 0.522	Objective: 6.968e+06	Relative Delta Objective: 2.456e-02	Reconstruction Error: 6.968e+06
Iter: 7	Iter Time: 0.057	Total Time: 0.579	Objective: 6.861e+06	Relative Delta Objective: 1.528e-02	Reconstruction Error: 6.861e+06
Iter: 8	Iter Time: 0.136	Total Time: 0.715	Objective: 6.824e+06	Relative Delta Objective: 5.451e-03	Reconstruction Error: 6.824e+06
Iter: 9	Iter Time: 0.054	Total Time: 0.768	Objective: 6.812e+06	Relative Delta Objective: 1.791e-03	Reconstruction Error: 6.812e+06
Iter: 10	Iter Time: 0.048	Total Time: 0.817	Objective: 6.807e+06	Relative Delta Objective: 6.477e-04	Reconstruction Error: 6.807e+06
Iter: 11	Iter Time: 0.046	Total Time: 0.863	Objective: 6.806e+06	Relative Delta Objective: 2.322e-04	Reconstruction Error: 6.806e+06
Iter: 12	Iter Time: 0.044	Total Time: 0.907	Objective: 6.805e+06	Relative Delta Objective: 9.426e-05	Reconstruction Error: 6.805e+06
Iter: 13	Iter Time: 0.054	Total Time: 0.961	Objective: 6.805e+06	Relative Delta Objective: 4.372e-05	Reconstruction Error: 6.805e+06
Iter: 14	Iter Time: 0.047	Total Time: 1.008	Objective: 6.804e+06	Relative Delta Objective: 2.271e-05	Reconstruction Error: 6.804e+06
Iter: 15	Iter Time: 0.041	Total Time: 1.049	Objective: 6.804e+06	Relative Delta Objective: 1.279e-05	Reconstruction Error: 6.804e+06
Iter: 16	Iter Time: 0.044	Total Time: 1.092	Objective: 6.804e+06	Relative Delta Objective: 7.495e-06	Reconstruction Error: 6.804e+06
Iter: 17	Iter Time: 0.047	Total Time: 1.140	Objective: 6.804e+06	Relative Delta Objective: 4.556e-06	Reconstruction Error: 6.804e+06
Iter: 18	Iter Time: 0.043	Total Time: 1.183	Objective: 6.804e+06	Relative Delta Objective: 2.719e-06	Reconstruction Error: 6.804e+06
Iter: 19	Iter Time: 0.042	Total Time: 1.224	Objective: 6.804e+06	Relative Delta Objective: 1.690e-06	Reconstruction Error: 6.804e+06
Iter: 20	Iter Time: 0.046	Total Time: 1.271	Objective: 6.804e+06	Relative Delta Objective: 1.029e-06	Reconstruction Error: 6.804e+06
Iter: 21	Iter Time: 0.044	Total Time: 1.314	Objective: 6.804e+06	Relative Delta Objective: 6.614e-07	Reconstruction Error: 6.804e+06
Iter: 22	Iter Time: 0.133	Total Time: 1.447	Objective: 6.804e+06	Relative Delta Objective: 3.674e-07	Reconstruction Error: 6.804e+06
Iter: 23	Iter Time: 0.043	Total Time: 1.491	Objective: 6.804e+06	Relative Delta Objective: 2.205e-07	Reconstruction Error: 6.804e+06
Iter: 24	Iter Time: 0.043	Total Time: 1.534	Objective: 6.804e+06	Relative Delta Objective: 1.470e-07	Reconstruction Error: 6.804e+06
Iter: 25	Iter Time: 0.044	Total Time: 1.578	Objective: 6.804e+06	Relative Delta Objective: 1.470e-07	Reconstruction Error: 6.804e+06
Iter: 26	Iter Time: 0.042	Total Time: 1.620	Objective: 6.804e+06	Relative Delta Objective: 0.000e+00	Reconstruction Error: 6.804e+06
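
The quantities in this log can be reproduced on a toy problem. The sketch below runs unregularized NMTF with the standard multiplicative updates for the Frobenius objective and tracks the reconstruction error and its relative change. It is not SCOTCH's implementation: the function name `nmtf_sketch`, the `eps` guard, and the exact form of the stopping test are assumptions. The relative change is taken as (previous - current) / previous, which is consistent with the printed values above: (8.195e+06 - 8.021e+06) / 8.195e+06 is about 2.12e-02, the value shown for iteration 2.

```python
import numpy as np

def nmtf_sketch(X, k1, k2, max_iter=200, term_tol=1e-10, seed=0, eps=1e-12):
    """Unregularized NMTF, X ~ U S V^T, via standard multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Random non-negative initialization of all three factors.
    U, S, V = rng.random((n, k1)), rng.random((k1, k2)), rng.random((m, k2))
    history = [np.linalg.norm(X - U @ S @ V.T) ** 2]
    for _ in range(max_iter):
        # Each factor is rescaled by the ratio of the negative to the
        # positive part of the gradient; eps avoids division by zero.
        U *= (X @ V @ S.T) / (U @ (S @ (V.T @ V) @ S.T) + eps)
        V *= (X.T @ U @ S) / (V @ (S.T @ (U.T @ U) @ S) + eps)
        S *= (U.T @ X @ V) / ((U.T @ U) @ S @ (V.T @ V) + eps)
        obj = np.linalg.norm(X - U @ S @ V.T) ** 2
        # Relative change with respect to the previous objective value.
        rel_delta = (history[-1] - obj) / history[-1]
        history.append(obj)
        # Assumed stopping test: a non-negative change smaller than term_tol.
        if 0 <= rel_delta < term_tol:
            break
    return U, S, V, history

# Toy matrix with an exact rank-(3, 5) tri-factor structure.
rng = np.random.default_rng(1)
X = rng.random((60, 3)) @ rng.random((3, 5)) @ rng.random((5, 80))
U, S, V, history = nmtf_sketch(X, k1=3, k2=5)
print(f"initial error {history[0]:.3e}, final error {history[-1]:.3e}")
```

Without an orthogonality term nothing in these updates discourages several columns of U or V from describing the same block, which is the behaviour discussed for the unregularized fit.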

If draw_intermediate_graph is set to True, you can write a GIF with the following function once the fit has run. It is saved to the SCOTCH output directory under the given file name.

In [6]:
scotch.write_gif('NMTF_no_reg.gif')
writing gif to ./NMTF_no_reg.gif

NMTF without ortho-regularization

In this context, without regularization, the SCOTCH fit is roughly equivalent to an NMF fit. Each of the three U factors is related to a single V factor (this is indicated by the three bright values of the S matrix). Each of these three main factors of V represents the mean profile of the corresponding cell block. The remaining factors of V (3-8) contribute little to the estimate. Below we repeat the SCOTCH run with regularization. The value of lambda can range over $[0, 1]$, but in practice we find that values between $0.05$ and $0.20$ produce orthogonal vector representations of U and V without drastically increasing the reconstruction error.

In [8]:
scotch = SCOTCH.SCOTCH(k1 = 3, k2 = 9, max_l_u = 0.15, max_l_v = 0.15, term_tol = 1e-10, var_lambda= False, device = "cpu", init_style="random", max_iter=200)
scotch.draw_intermediate_graph = True
scotch.add_data_from_adata(adata)
scotch.fit()
Initializing NMTF factors
Beginning NMTF
Iter: 1	Iter Time: 0.108	Total Time: 0.108	Objective: 8.179e+06	Relative Delta Objective: 2.102e-01	Reconstruction Error: 8.179e+06
Iter: 2	Iter Time: 0.074	Total Time: 0.182	Objective: 7.971e+06	Relative Delta Objective: 2.539e-02	Reconstruction Error: 7.971e+06
Iter: 3	Iter Time: 0.046	Total Time: 0.228	Objective: 7.528e+06	Relative Delta Objective: 5.562e-02	Reconstruction Error: 7.528e+06
Iter: 4	Iter Time: 0.044	Total Time: 0.272	Objective: 7.331e+06	Relative Delta Objective: 2.615e-02	Reconstruction Error: 7.331e+06
Iter: 5	Iter Time: 0.045	Total Time: 0.316	Objective: 7.228e+06	Relative Delta Objective: 1.402e-02	Reconstruction Error: 7.228e+06
Iter: 6	Iter Time: 0.210	Total Time: 0.527	Objective: 7.122e+06	Relative Delta Objective: 1.468e-02	Reconstruction Error: 7.122e+06
Iter: 7	Iter Time: 0.044	Total Time: 0.571	Objective: 6.987e+06	Relative Delta Objective: 1.901e-02	Reconstruction Error: 6.987e+06
Iter: 8	Iter Time: 0.050	Total Time: 0.621	Objective: 6.889e+06	Relative Delta Objective: 1.402e-02	Reconstruction Error: 6.889e+06
Iter: 9	Iter Time: 0.053	Total Time: 0.674	Objective: 6.861e+06	Relative Delta Objective: 4.105e-03	Reconstruction Error: 6.861e+06
Iter: 10	Iter Time: 0.054	Total Time: 0.728	Objective: 6.855e+06	Relative Delta Objective: 7.902e-04	Reconstruction Error: 6.855e+06
Iter: 11	Iter Time: 0.062	Total Time: 0.790	Objective: 6.856e+06	Relative Delta Objective: -1.282e-04	Reconstruction Error: 6.856e+06
Iter: 12	Iter Time: 0.051	Total Time: 0.841	Objective: 6.858e+06	Relative Delta Objective: -2.930e-04	Reconstruction Error: 6.858e+06
Iter: 13	Iter Time: 0.050	Total Time: 0.891	Objective: 6.859e+06	Relative Delta Objective: -1.691e-04	Reconstruction Error: 6.859e+06
Iter: 14	Iter Time: 0.049	Total Time: 0.940	Objective: 6.860e+06	Relative Delta Objective: -6.014e-05	Reconstruction Error: 6.860e+06
Iter: 15	Iter Time: 0.048	Total Time: 0.988	Objective: 6.860e+06	Relative Delta Objective: -6.232e-05	Reconstruction Error: 6.860e+06
Iter: 16	Iter Time: 0.052	Total Time: 1.040	Objective: 6.861e+06	Relative Delta Objective: -1.389e-04	Reconstruction Error: 6.861e+06
Iter: 17	Iter Time: 0.042	Total Time: 1.082	Objective: 6.861e+06	Relative Delta Objective: -1.100e-05	Reconstruction Error: 6.861e+06
Iter: 18	Iter Time: 0.043	Total Time: 1.125	Objective: 6.861e+06	Relative Delta Objective: -1.377e-05	Reconstruction Error: 6.861e+06
Iter: 19	Iter Time: 0.051	Total Time: 1.176	Objective: 6.861e+06	Relative Delta Objective: 9.036e-06	Reconstruction Error: 6.861e+06
Iter: 20	Iter Time: 0.135	Total Time: 1.311	Objective: 6.861e+06	Relative Delta Objective: 5.750e-05	Reconstruction Error: 6.861e+06
Iter: 21	Iter Time: 0.040	Total Time: 1.351	Objective: 6.860e+06	Relative Delta Objective: 8.345e-05	Reconstruction Error: 6.860e+06
Iter: 22	Iter Time: 0.058	Total Time: 1.409	Objective: 6.859e+06	Relative Delta Objective: 1.003e-04	Reconstruction Error: 6.859e+06
Iter: 23	Iter Time: 0.056	Total Time: 1.465	Objective: 6.859e+06	Relative Delta Objective: 2.872e-05	Reconstruction Error: 6.859e+06
Iter: 24	Iter Time: 0.065	Total Time: 1.529	Objective: 6.859e+06	Relative Delta Objective: -3.572e-06	Reconstruction Error: 6.859e+06
Iter: 25	Iter Time: 0.059	Total Time: 1.589	Objective: 6.859e+06	Relative Delta Objective: -1.174e-05	Reconstruction Error: 6.859e+06
Iter: 26	Iter Time: 0.041	Total Time: 1.630	Objective: 6.859e+06	Relative Delta Objective: -2.435e-05	Reconstruction Error: 6.859e+06
Iter: 27	Iter Time: 0.043	Total Time: 1.673	Objective: 6.860e+06	Relative Delta Objective: -6.896e-05	Reconstruction Error: 6.860e+06
Iter: 28	Iter Time: 0.043	Total Time: 1.716	Objective: 6.860e+06	Relative Delta Objective: -7.471e-05	Reconstruction Error: 6.860e+06
Iter: 29	Iter Time: 0.053	Total Time: 1.769	Objective: 6.861e+06	Relative Delta Objective: -9.132e-05	Reconstruction Error: 6.861e+06
Iter: 30	Iter Time: 0.042	Total Time: 1.812	Objective: 6.862e+06	Relative Delta Objective: -7.572e-05	Reconstruction Error: 6.862e+06
Iter: 31	Iter Time: 0.039	Total Time: 1.851	Objective: 6.862e+06	Relative Delta Objective: -6.281e-05	Reconstruction Error: 6.862e+06
Iter: 32	Iter Time: 0.042	Total Time: 1.893	Objective: 6.862e+06	Relative Delta Objective: -5.326e-05	Reconstruction Error: 6.862e+06
Iter: 33	Iter Time: 0.043	Total Time: 1.936	Objective: 6.863e+06	Relative Delta Objective: -5.727e-05	Reconstruction Error: 6.863e+06
Iter: 34	Iter Time: 0.045	Total Time: 1.981	Objective: 6.863e+06	Relative Delta Objective: -3.096e-05	Reconstruction Error: 6.863e+06
Iter: 35	Iter Time: 0.045	Total Time: 2.026	Objective: 6.863e+06	Relative Delta Objective: -2.477e-06	Reconstruction Error: 6.863e+06
Iter: 36	Iter Time: 0.041	Total Time: 2.068	Objective: 6.863e+06	Relative Delta Objective: -3.278e-06	Reconstruction Error: 6.863e+06
Iter: 37	Iter Time: 0.143	Total Time: 2.211	Objective: 6.863e+06	Relative Delta Objective: 1.683e-05	Reconstruction Error: 6.863e+06
Iter: 38	Iter Time: 0.043	Total Time: 2.253	Objective: 6.863e+06	Relative Delta Objective: 3.825e-05	Reconstruction Error: 6.863e+06
Iter: 39	Iter Time: 0.043	Total Time: 2.296	Objective: 6.862e+06	Relative Delta Objective: 5.923e-05	Reconstruction Error: 6.862e+06
Iter: 40	Iter Time: 0.044	Total Time: 2.340	Objective: 6.862e+06	Relative Delta Objective: 4.794e-05	Reconstruction Error: 6.862e+06
Iter: 41	Iter Time: 0.043	Total Time: 2.383	Objective: 6.862e+06	Relative Delta Objective: 4.241e-05	Reconstruction Error: 6.862e+06
Iter: 42	Iter Time: 0.043	Total Time: 2.426	Objective: 6.862e+06	Relative Delta Objective: -3.133e-06	Reconstruction Error: 6.862e+06
Iter: 43	Iter Time: 0.061	Total Time: 2.486	Objective: 6.862e+06	Relative Delta Objective: -4.154e-05	Reconstruction Error: 6.862e+06
Iter: 44	Iter Time: 0.056	Total Time: 2.543	Objective: 6.862e+06	Relative Delta Objective: -6.062e-05	Reconstruction Error: 6.862e+06
Iter: 45	Iter Time: 0.056	Total Time: 2.599	Objective: 6.863e+06	Relative Delta Objective: -7.935e-05	Reconstruction Error: 6.863e+06
Iter: 46	Iter Time: 0.068	Total Time: 2.667	Objective: 6.863e+06	Relative Delta Objective: -7.045e-05	Reconstruction Error: 6.863e+06
Iter: 47	Iter Time: 0.056	Total Time: 2.723	Objective: 6.864e+06	Relative Delta Objective: -7.059e-05	Reconstruction Error: 6.864e+06
Iter: 48	Iter Time: 0.053	Total Time: 2.775	Objective: 6.864e+06	Relative Delta Objective: -4.910e-05	Reconstruction Error: 6.864e+06
Iter: 49	Iter Time: 0.061	Total Time: 2.837	Objective: 6.864e+06	Relative Delta Objective: -4.167e-05	Reconstruction Error: 6.864e+06
Iter: 50	Iter Time: 0.055	Total Time: 2.891	Objective: 6.865e+06	Relative Delta Objective: -2.739e-05	Reconstruction Error: 6.865e+06
Iter: 51	Iter Time: 0.056	Total Time: 2.948	Objective: 6.865e+06	Relative Delta Objective: -4.326e-05	Reconstruction Error: 6.865e+06
Iter: 52	Iter Time: 0.041	Total Time: 2.989	Objective: 6.865e+06	Relative Delta Objective: -3.853e-05	Reconstruction Error: 6.865e+06
Iter: 53	Iter Time: 0.047	Total Time: 3.035	Objective: 6.866e+06	Relative Delta Objective: -4.049e-05	Reconstruction Error: 6.866e+06
Iter: 54	Iter Time: 0.040	Total Time: 3.076	Objective: 6.866e+06	Relative Delta Objective: -4.311e-05	Reconstruction Error: 6.866e+06
Iter: 55	Iter Time: 0.050	Total Time: 3.126	Objective: 6.866e+06	Relative Delta Objective: -5.302e-05	Reconstruction Error: 6.866e+06
Iter: 56	Iter Time: 0.045	Total Time: 3.170	Objective: 6.867e+06	Relative Delta Objective: -5.229e-05	Reconstruction Error: 6.867e+06
Iter: 57	Iter Time: 0.042	Total Time: 3.212	Objective: 6.867e+06	Relative Delta Objective: -5.330e-05	Reconstruction Error: 6.867e+06
Iter: 58	Iter Time: 0.043	Total Time: 3.255	Objective: 6.867e+06	Relative Delta Objective: -5.410e-05	Reconstruction Error: 6.867e+06
Iter: 59	Iter Time: 0.167	Total Time: 3.422	Objective: 6.868e+06	Relative Delta Objective: -5.359e-05	Reconstruction Error: 6.868e+06
Iter: 60	Iter Time: 0.043	Total Time: 3.465	Objective: 6.868e+06	Relative Delta Objective: -4.740e-05	Reconstruction Error: 6.868e+06
Iter: 61	Iter Time: 0.041	Total Time: 3.506	Objective: 6.868e+06	Relative Delta Objective: -4.186e-05	Reconstruction Error: 6.868e+06
Iter: 62	Iter Time: 0.040	Total Time: 3.546	Objective: 6.869e+06	Relative Delta Objective: -3.756e-05	Reconstruction Error: 6.869e+06
Iter: 63	Iter Time: 0.042	Total Time: 3.588	Objective: 6.869e+06	Relative Delta Objective: -3.290e-05	Reconstruction Error: 6.869e+06
Iter: 64	Iter Time: 0.048	Total Time: 3.636	Objective: 6.869e+06	Relative Delta Objective: -2.977e-05	Reconstruction Error: 6.869e+06
Iter: 65	Iter Time: 0.042	Total Time: 3.678	Objective: 6.869e+06	Relative Delta Objective: -2.875e-05	Reconstruction Error: 6.869e+06
Iter: 66	Iter Time: 0.041	Total Time: 3.719	Objective: 6.869e+06	Relative Delta Objective: -2.744e-05	Reconstruction Error: 6.869e+06
Iter: 67	Iter Time: 0.041	Total Time: 3.760	Objective: 6.869e+06	Relative Delta Objective: -2.213e-05	Reconstruction Error: 6.869e+06
Iter: 68	Iter Time: 0.040	Total Time: 3.800	Objective: 6.870e+06	Relative Delta Objective: -1.885e-05	Reconstruction Error: 6.870e+06
Iter: 69	Iter Time: 0.042	Total Time: 3.842	Objective: 6.870e+06	Relative Delta Objective: -1.892e-05	Reconstruction Error: 6.870e+06
Iter: 70	Iter Time: 0.051	Total Time: 3.893	Objective: 6.870e+06	Relative Delta Objective: -1.885e-05	Reconstruction Error: 6.870e+06
Iter: 71	Iter Time: 0.042	Total Time: 3.935	Objective: 6.870e+06	Relative Delta Objective: -1.958e-05	Reconstruction Error: 6.870e+06
Iter: 72	Iter Time: 0.041	Total Time: 3.976	Objective: 6.870e+06	Relative Delta Objective: -1.870e-05	Reconstruction Error: 6.870e+06
Iter: 73	Iter Time: 0.042	Total Time: 4.017	Objective: 6.870e+06	Relative Delta Objective: -1.870e-05	Reconstruction Error: 6.870e+06
Iter: 74	Iter Time: 0.043	Total Time: 4.061	Objective: 6.870e+06	Relative Delta Objective: -1.718e-05	Reconstruction Error: 6.870e+06
Iter: 75	Iter Time: 0.047	Total Time: 4.108	Objective: 6.870e+06	Relative Delta Objective: -1.150e-05	Reconstruction Error: 6.870e+06
Iter: 76	Iter Time: 0.040	Total Time: 4.148	Objective: 6.871e+06	Relative Delta Objective: -7.496e-06	Reconstruction Error: 6.871e+06
Iter: 77	Iter Time: 0.049	Total Time: 4.197	Objective: 6.871e+06	Relative Delta Objective: 9.461e-07	Reconstruction Error: 6.871e+06
Iter: 78	Iter Time: 0.042	Total Time: 4.239	Objective: 6.870e+06	Relative Delta Objective: 6.477e-06	Reconstruction Error: 6.870e+06
Iter: 79	Iter Time: 0.043	Total Time: 4.282	Objective: 6.870e+06	Relative Delta Objective: 1.390e-05	Reconstruction Error: 6.870e+06
Iter: 80	Iter Time: 0.042	Total Time: 4.324	Objective: 6.870e+06	Relative Delta Objective: 2.016e-05	Reconstruction Error: 6.870e+06
Iter: 81	Iter Time: 0.042	Total Time: 4.367	Objective: 6.870e+06	Relative Delta Objective: 2.875e-05	Reconstruction Error: 6.870e+06
Iter: 82	Iter Time: 0.042	Total Time: 4.409	Objective: 6.870e+06	Relative Delta Objective: 4.309e-05	Reconstruction Error: 6.870e+06
Iter: 83	Iter Time: 0.047	Total Time: 4.456	Objective: 6.869e+06	Relative Delta Objective: 5.022e-05	Reconstruction Error: 6.869e+06
Iter: 84	Iter Time: 0.041	Total Time: 4.497	Objective: 6.869e+06	Relative Delta Objective: 6.456e-05	Reconstruction Error: 6.869e+06
Iter: 85	Iter Time: 0.045	Total Time: 4.542	Objective: 6.869e+06	Relative Delta Objective: 6.333e-05	Reconstruction Error: 6.869e+06
Iter: 86	Iter Time: 0.187	Total Time: 4.729	Objective: 6.868e+06	Relative Delta Objective: 6.173e-05	Reconstruction Error: 6.868e+06
Iter: 87	Iter Time: 0.049	Total Time: 4.778	Objective: 6.868e+06	Relative Delta Objective: 5.780e-05	Reconstruction Error: 6.868e+06
Iter: 88	Iter Time: 0.044	Total Time: 4.822	Objective: 6.867e+06	Relative Delta Objective: 4.849e-05	Reconstruction Error: 6.867e+06
Iter: 89	Iter Time: 0.042	Total Time: 4.864	Objective: 6.867e+06	Relative Delta Objective: 4.165e-05	Reconstruction Error: 6.867e+06
Iter: 90	Iter Time: 0.042	Total Time: 4.907	Objective: 6.867e+06	Relative Delta Objective: 3.357e-05	Reconstruction Error: 6.867e+06
Iter: 91	Iter Time: 0.041	Total Time: 4.948	Objective: 6.867e+06	Relative Delta Objective: 2.774e-05	Reconstruction Error: 6.867e+06
Iter: 92	Iter Time: 0.046	Total Time: 4.993	Objective: 6.867e+06	Relative Delta Objective: 2.184e-05	Reconstruction Error: 6.867e+06
Iter: 93	Iter Time: 0.044	Total Time: 5.037	Objective: 6.866e+06	Relative Delta Objective: 1.544e-05	Reconstruction Error: 6.866e+06
Iter: 94	Iter Time: 0.044	Total Time: 5.081	Objective: 6.866e+06	Relative Delta Objective: 1.085e-05	Reconstruction Error: 6.866e+06
Iter: 95	Iter Time: 0.044	Total Time: 5.125	Objective: 6.866e+06	Relative Delta Objective: 7.646e-06	Reconstruction Error: 6.866e+06
Iter: 96	Iter Time: 0.040	Total Time: 5.165	Objective: 6.866e+06	Relative Delta Objective: 5.898e-06	Reconstruction Error: 6.866e+06
Iter: 97	Iter Time: 0.044	Total Time: 5.209	Objective: 6.866e+06	Relative Delta Objective: 4.588e-06	Reconstruction Error: 6.866e+06
Iter: 98	Iter Time: 0.042	Total Time: 5.252	Objective: 6.866e+06	Relative Delta Objective: 3.423e-06	Reconstruction Error: 6.866e+06
Iter: 99	Iter Time: 0.044	Total Time: 5.295	Objective: 6.866e+06	Relative Delta Objective: 2.330e-06	Reconstruction Error: 6.866e+06
Iter: 100	Iter Time: 0.043	Total Time: 5.339	Objective: 6.866e+06	Relative Delta Objective: 2.622e-06	Reconstruction Error: 6.866e+06
Iter: 101	Iter Time: 0.040	Total Time: 5.378	Objective: 6.866e+06	Relative Delta Objective: 2.112e-06	Reconstruction Error: 6.866e+06
Iter: 102	Iter Time: 0.046	Total Time: 5.425	Objective: 6.866e+06	Relative Delta Objective: 1.748e-06	Reconstruction Error: 6.866e+06
Iter: 103	Iter Time: 0.052	Total Time: 5.477	Objective: 6.866e+06	Relative Delta Objective: 1.238e-06	Reconstruction Error: 6.866e+06
Iter: 104	Iter Time: 0.040	Total Time: 5.517	Objective: 6.866e+06	Relative Delta Objective: 8.739e-07	Reconstruction Error: 6.866e+06
Iter: 105	Iter Time: 0.042	Total Time: 5.559	Objective: 6.866e+06	Relative Delta Objective: 4.369e-07	Reconstruction Error: 6.866e+06
Iter: 106	Iter Time: 0.042	Total Time: 5.600	Objective: 6.866e+06	Relative Delta Objective: 2.913e-07	Reconstruction Error: 6.866e+06
Iter: 107	Iter Time: 0.043	Total Time: 5.643	Objective: 6.866e+06	Relative Delta Objective: 1.456e-07	Reconstruction Error: 6.866e+06
Iter: 108	Iter Time: 0.047	Total Time: 5.690	Objective: 6.866e+06	Relative Delta Objective: 7.282e-08	Reconstruction Error: 6.866e+06
Iter: 109	Iter Time: 0.042	Total Time: 5.732	Objective: 6.866e+06	Relative Delta Objective: 0.000e+00	Reconstruction Error: 6.866e+06
In [10]:
scotch.write_gif('NMTF_lU_lV_0.15.gif')
writing gif to ./NMTF_lU_lV_0.15.gif

NMTF with Reg 0.15

This factorization produces a much richer representation in S, which indicates the relationship between column clusters and row clusters. The function below plots the factors sorted by the U and V factors with maximum contributions. In the sorted representation below we can see that most factors of U and V are orthogonal. Notably, some of the factors in V, e.g. 3, 4, and 5, largely reflect a similar trend in the matrix. Specifically, they capture columns which are represented in all clusters.

In [11]:
fig = scotch.visualize_factors_sorted()
display(fig)
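
The visual impression that most factors are orthogonal can also be summarized with a number. The sketch below computes the mean absolute cosine similarity between distinct columns of a factor matrix; the helper name `offdiag_cosine` is hypothetical and the toy matrices stand in for U or V, since this notebook does not show how the fitted factors are exposed on the scotch object.

```python
import numpy as np

def offdiag_cosine(F, eps=1e-12):
    """Mean absolute cosine similarity between distinct columns of F.

    0 means the columns are mutually orthogonal; 1 means they are all parallel.
    """
    # Normalize each column to unit length, then form the Gram matrix.
    Fn = F / (np.linalg.norm(F, axis=0, keepdims=True) + eps)
    G = Fn.T @ Fn
    # Keep only the off-diagonal entries (pairs of different columns).
    off = G[~np.eye(G.shape[0], dtype=bool)]
    return float(np.abs(off).mean())

# Toy factors: one-hot memberships are orthogonal, shared profiles are not.
one_hot = np.eye(3)[np.repeat([0, 1, 2], 4)]   # shape (12, 3)
shared = one_hot + 1.0                         # every column overlaps the others
print(offdiag_cosine(one_hot), offdiag_cosine(shared))
```

A value near zero for both U and V would support reading each row and column as belonging to a single factor.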

The orthogonal representation naturally lends itself to assigning row and column features to clusters. Each row $i$ of U and V (note the representation above shows $V^T$) is assigned to the cluster given by $argmax(U[i, :])$ and $argmax(V[i, :])$ respectively. The clustering is implemented in the assign_cluster function below. After assigning clusters, the SCOTCH information can be added directly back to the adata object using the add_scotch_embeddings_to_adata() function.

In [12]:
scotch.assign_cluster()
In [13]:
scotch.add_scotch_embeddings_to_adata(adata, 'NMTF_lU_lV_0.15')
Out[13]:
AnnData object with n_obs × n_vars = 500 × 1000
    obs: 'NMTF_lU_lV_0.15_cell_clusters'
    var: 'NMTF_lU_lV_0.15_gene_clusters'
    uns: 'NMTF_lU_lV_0.15_S_matrix', 'NMTF_lU_lV_0.15_reconstruction_error', 'NMTF_lU_lV_0.15_error'
    obsm: 'NMTF_lU_lV_0.15_cell_embedding', 'NMTF_lU_lV_0.15_P_embedding'
    varm: 'NMTF_lU_lV_0.15_gene_embedding', 'NMTF_lU_lV_0.15_Q_embedding'
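
The argmax rule described above can be illustrated on small hand-written matrices. The values of `U` and `V` below are toy stand-ins chosen for this sketch, not output of the fit.

```python
import numpy as np

# Toy stand-ins for the factors; in the notebook these come from the SCOTCH fit.
U = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.0, 0.3, 0.8],
              [0.6, 0.3, 0.1]])
V = np.array([[0.1, 0.9],
              [0.8, 0.2],
              [0.4, 0.6]])

# Each row is assigned to the factor (column) with the largest loading.
row_clusters = U.argmax(axis=1)
col_clusters = V.argmax(axis=1)
print(row_clusters)  # [0 1 2 0]
print(col_clusters)  # [1 0 1]
```

The closer the rows are to one-hot, the less ambiguous this assignment becomes, which is why the orthogonal representation suits it.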

We can visualize the different U and V components using UMAP below. The first UMAP is generated using PCA. The second and third UMAPs are generated using the SCOTCH embeddings: first using $U$, and then using $U*S = P$. The orthogonality of the $U$ matrix may lead to poor representations in UMAP space due to instability in the kNN graph. Generally this can be improved by increasing the number of neighbors. The $P$ embedding generally produces a better representation in UMAP space because it is less affected by the orthogonality constraint. The fourth UMAP demonstrates the relationships captured in the column space.

In [14]:
sc.pp.pca(adata)
sc.pp.neighbors(adata, use_rep= "X_pca")
sc.tl.umap(adata)
sc.pl.umap(adata, color = "NMTF_lU_lV_0.15_cell_clusters")
In [15]:
sc.pp.pca(adata)
sc.pp.neighbors(adata, use_rep= "NMTF_lU_lV_0.15_cell_embedding", n_neighbors= 100)
sc.tl.umap(adata)
sc.pl.umap(adata, color = "NMTF_lU_lV_0.15_cell_clusters")
In [16]:
sc.pp.pca(adata)
sc.pp.neighbors(adata, use_rep= "NMTF_lU_lV_0.15_P_embedding", n_neighbors= 100)
sc.tl.umap(adata)
sc.pl.umap(adata, color = "NMTF_lU_lV_0.15_cell_clusters")
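
As a small illustration of the relation $U*S = P$ used for the third UMAP, the sketch below builds a nearly one-hot toy `U`, a random `S`, and their product. All values are made up for the example; it only shows the shapes involved and that rows of `P` sit close to the row of `S` belonging to their cluster.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, k1, k2 = 6, 3, 9
# Nearly one-hot U, the kind of matrix an orthogonality penalty favours.
U = np.eye(k1)[np.arange(n_rows) % k1] + 0.01 * rng.random((n_rows, k1))
S = rng.random((k1, k2))

# P has one row per row of U but lives in the k2-dimensional column-factor space.
P = U @ S
print(U.shape, P.shape)  # (6, 3) (6, 9)
```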
In [17]:
adata_genes = adata.T
sc.pp.pca(adata_genes)
sc.pp.neighbors(adata_genes, use_rep= "NMTF_lU_lV_0.15_Q_embedding", n_neighbors= 300)
sc.tl.umap(adata_genes)
sc.pl.umap(adata_genes, color = "NMTF_lU_lV_0.15_gene_clusters")
/opt/anaconda3/envs/Pytorch/lib/python3.9/site-packages/sklearn/manifold/_spectral_embedding.py:274: UserWarning: Graph is not fully connected, spectral embedding may not work as expected.
  warnings.warn(

SCOTCH provides a cluster visualization technique in which the clusters are ordered and alternating clusters are assigned a black or grey barcode.

In [18]:
scotch.visualize_clusters()
Out[18]:
In [19]:
scotch.visualize_clusters_sorted()
Out[19]:
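
The ordering and barcode idea can be sketched independently of the plotting code. The example below sorts toy cluster labels so that each cluster is contiguous and gives consecutive clusters alternating colours. It assumes the labels are consecutive integers starting at 0 and is not how visualize_clusters is implemented.

```python
import numpy as np

labels = np.array([2, 0, 1, 0, 2, 1, 1, 0])   # toy cluster assignments

# Stable sort so that members of each cluster become contiguous.
order = np.argsort(labels, kind="stable")
sorted_labels = labels[order]

# Alternate two colours between consecutive clusters to mark the boundaries
# (relies on labels being consecutive integers 0, 1, 2, ...).
barcode = np.where(sorted_labels % 2 == 0, "black", "grey")
print(order)
print(barcode.tolist())
```

Applying `order` to the rows (or columns) of the data matrix would then group each cluster into one band next to its barcode segment.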

The SCOTCH update is susceptible to poor initialization, which can result in multiple factors representing similar contributions in V. This can cause some clustering errors. One way around this is to use NMTF to generate a lower-dimensional embedding, cluster the embedding, and use the result as an initialization which accurately captures the relationship between factors. This is performed using the recluster_V function below.

In [20]:
scotch.recluster_V()
scotch.assign_cluster()
Iter: 1	Iter Time: 0.243	Total Time: 0.243	Objective: 6.858e+06	Relative Delta Objective: 7.597e-04	Reconstruction Error: 6.858e+06
Iter: 2	Iter Time: 0.061	Total Time: 0.304	Objective: 6.858e+06	Relative Delta Objective: -6.657e-05	Reconstruction Error: 6.858e+06
Iter: 3	Iter Time: 0.057	Total Time: 0.360	Objective: 6.859e+06	Relative Delta Objective: -1.027e-04	Reconstruction Error: 6.859e+06
Iter: 4	Iter Time: 0.047	Total Time: 0.407	Objective: 6.859e+06	Relative Delta Objective: -5.154e-05	Reconstruction Error: 6.859e+06
Iter: 5	Iter Time: 0.059	Total Time: 0.466	Objective: 6.859e+06	Relative Delta Objective: -3.601e-05	Reconstruction Error: 6.859e+06
Iter: 6	Iter Time: 0.069	Total Time: 0.535	Objective: 6.860e+06	Relative Delta Objective: -3.550e-05	Reconstruction Error: 6.860e+06
Iter: 7	Iter Time: 0.069	Total Time: 0.604	Objective: 6.860e+06	Relative Delta Objective: -4.162e-05	Reconstruction Error: 6.860e+06
Iter: 8	Iter Time: 0.073	Total Time: 0.676	Objective: 6.860e+06	Relative Delta Objective: -4.942e-05	Reconstruction Error: 6.860e+06
Iter: 9	Iter Time: 0.072	Total Time: 0.749	Objective: 6.861e+06	Relative Delta Objective: -6.049e-05	Reconstruction Error: 6.861e+06
Iter: 10	Iter Time: 0.060	Total Time: 0.809	Objective: 6.861e+06	Relative Delta Objective: -6.770e-05	Reconstruction Error: 6.861e+06
Iter: 11	Iter Time: 0.089	Total Time: 0.897	Objective: 6.862e+06	Relative Delta Objective: -6.457e-05	Reconstruction Error: 6.862e+06
Iter: 12	Iter Time: 0.057	Total Time: 0.955	Objective: 6.862e+06	Relative Delta Objective: -5.560e-05	Reconstruction Error: 6.862e+06
Iter: 13	Iter Time: 0.052	Total Time: 1.006	Objective: 6.862e+06	Relative Delta Objective: -4.037e-05	Reconstruction Error: 6.862e+06
Iter: 14	Iter Time: 0.045	Total Time: 1.051	Objective: 6.863e+06	Relative Delta Objective: -2.485e-05	Reconstruction Error: 6.863e+06
Iter: 15	Iter Time: 0.073	Total Time: 1.124	Objective: 6.863e+06	Relative Delta Objective: -8.015e-06	Reconstruction Error: 6.863e+06
Iter: 16	Iter Time: 0.047	Total Time: 1.171	Objective: 6.862e+06	Relative Delta Objective: 1.209e-05	Reconstruction Error: 6.862e+06
Iter: 17	Iter Time: 0.169	Total Time: 1.340	Objective: 6.862e+06	Relative Delta Objective: 4.736e-06	Reconstruction Error: 6.862e+06
Iter: 18	Iter Time: 0.050	Total Time: 1.390	Objective: 6.862e+06	Relative Delta Objective: 2.689e-05	Reconstruction Error: 6.862e+06
Iter: 19	Iter Time: 0.043	Total Time: 1.433	Objective: 6.862e+06	Relative Delta Objective: 2.324e-05	Reconstruction Error: 6.862e+06
Iter: 20	Iter Time: 0.042	Total Time: 1.475	Objective: 6.862e+06	Relative Delta Objective: 3.629e-05	Reconstruction Error: 6.862e+06
Iter: 21	Iter Time: 0.046	Total Time: 1.521	Objective: 6.862e+06	Relative Delta Objective: 2.784e-05	Reconstruction Error: 6.862e+06
Iter: 22	Iter Time: 0.055	Total Time: 1.577	Objective: 6.861e+06	Relative Delta Objective: 2.536e-05	Reconstruction Error: 6.861e+06
Iter: 23	Iter Time: 0.046	Total Time: 1.623	Objective: 6.861e+06	Relative Delta Objective: 1.137e-05	Reconstruction Error: 6.861e+06
Iter: 24	Iter Time: 0.050	Total Time: 1.673	Objective: 6.861e+06	Relative Delta Objective: 1.385e-06	Reconstruction Error: 6.861e+06
Iter: 25	Iter Time: 0.049	Total Time: 1.722	Objective: 6.861e+06	Relative Delta Objective: -6.558e-07	Reconstruction Error: 6.861e+06
Iter: 26	Iter Time: 0.049	Total Time: 1.771	Objective: 6.861e+06	Relative Delta Objective: 1.166e-06	Reconstruction Error: 6.861e+06
Iter: 27	Iter Time: 0.057	Total Time: 1.828	Objective: 6.861e+06	Relative Delta Objective: 3.352e-06	Reconstruction Error: 6.861e+06
Iter: 28	Iter Time: 0.044	Total Time: 1.872	Objective: 6.861e+06	Relative Delta Objective: 5.538e-06	Reconstruction Error: 6.861e+06
Iter: 29	Iter Time: 0.045	Total Time: 1.917	Objective: 6.861e+06	Relative Delta Objective: 3.279e-06	Reconstruction Error: 6.861e+06
Iter: 30	Iter Time: 0.064	Total Time: 1.981	Objective: 6.861e+06	Relative Delta Objective: 3.206e-06	Reconstruction Error: 6.861e+06
Iter: 31	Iter Time: 0.055	Total Time: 2.035	Objective: 6.861e+06	Relative Delta Objective: 2.478e-06	Reconstruction Error: 6.861e+06
Iter: 32	Iter Time: 0.062	Total Time: 2.098	Objective: 6.861e+06	Relative Delta Objective: 4.810e-06	Reconstruction Error: 6.861e+06
Iter: 33	Iter Time: 0.049	Total Time: 2.146	Objective: 6.861e+06	Relative Delta Objective: 3.644e-06	Reconstruction Error: 6.861e+06
Iter: 34	Iter Time: 0.047	Total Time: 2.194	Objective: 6.861e+06	Relative Delta Objective: 2.915e-06	Reconstruction Error: 6.861e+06
Iter: 35	Iter Time: 0.065	Total Time: 2.259	Objective: 6.861e+06	Relative Delta Objective: 5.538e-06	Reconstruction Error: 6.861e+06
Iter: 36	Iter Time: 0.057	Total Time: 2.316	Objective: 6.861e+06	Relative Delta Objective: 3.935e-06	Reconstruction Error: 6.861e+06
Iter: 37	Iter Time: 0.069	Total Time: 2.385	Objective: 6.861e+06	Relative Delta Objective: 2.623e-06	Reconstruction Error: 6.861e+06
Iter: 38	Iter Time: 0.180	Total Time: 2.565	Objective: 6.861e+06	Relative Delta Objective: 6.340e-06	Reconstruction Error: 6.861e+06
Iter: 39	Iter Time: 0.047	Total Time: 2.611	Objective: 6.861e+06	Relative Delta Objective: 5.101e-06	Reconstruction Error: 6.861e+06
Iter: 40	Iter Time: 0.045	Total Time: 2.656	Objective: 6.861e+06	Relative Delta Objective: 4.883e-06	Reconstruction Error: 6.861e+06
Iter: 41	Iter Time: 0.042	Total Time: 2.698	Objective: 6.861e+06	Relative Delta Objective: 1.822e-06	Reconstruction Error: 6.861e+06
Iter: 42	Iter Time: 0.051	Total Time: 2.750	Objective: 6.861e+06	Relative Delta Objective: 3.571e-06	Reconstruction Error: 6.861e+06
Iter: 43	Iter Time: 0.056	Total Time: 2.806	Objective: 6.861e+06	Relative Delta Objective: 2.478e-06	Reconstruction Error: 6.861e+06
Iter: 44	Iter Time: 0.052	Total Time: 2.858	Objective: 6.861e+06	Relative Delta Objective: 1.822e-06	Reconstruction Error: 6.861e+06
Iter: 45	Iter Time: 0.046	Total Time: 2.904	Objective: 6.861e+06	Relative Delta Objective: 1.530e-06	Reconstruction Error: 6.861e+06
Iter: 46	Iter Time: 0.043	Total Time: 2.947	Objective: 6.861e+06	Relative Delta Objective: 1.312e-06	Reconstruction Error: 6.861e+06
Iter: 47	Iter Time: 0.054	Total Time: 3.000	Objective: 6.861e+06	Relative Delta Objective: 1.166e-06	Reconstruction Error: 6.861e+06
Iter: 48	Iter Time: 0.057	Total Time: 3.058	Objective: 6.861e+06	Relative Delta Objective: 1.020e-06	Reconstruction Error: 6.861e+06
Iter: 49	Iter Time: 0.049	Total Time: 3.106	Objective: 6.861e+06	Relative Delta Objective: 1.020e-06	Reconstruction Error: 6.861e+06
Iter: 50	Iter Time: 0.045	Total Time: 3.151	Objective: 6.861e+06	Relative Delta Objective: 8.745e-07	Reconstruction Error: 6.861e+06
Iter: 51	Iter Time: 0.052	Total Time: 3.203	Objective: 6.861e+06	Relative Delta Objective: 3.134e-06	Reconstruction Error: 6.861e+06
Iter: 52	Iter Time: 0.044	Total Time: 3.246	Objective: 6.861e+06	Relative Delta Objective: 1.239e-06	Reconstruction Error: 6.861e+06
Iter: 53	Iter Time: 0.052	Total Time: 3.298	Objective: 6.861e+06	Relative Delta Objective: -2.186e-07	Reconstruction Error: 6.861e+06
Iter: 54	Iter Time: 0.046	Total Time: 3.344	Objective: 6.861e+06	Relative Delta Objective: 2.186e-07	Reconstruction Error: 6.861e+06
Iter: 55	Iter Time: 0.041	Total Time: 3.386	Objective: 6.861e+06	Relative Delta Objective: 2.915e-07	Reconstruction Error: 6.861e+06
Iter: 56	Iter Time: 0.048	Total Time: 3.433	Objective: 6.861e+06	Relative Delta Objective: 2.915e-07	Reconstruction Error: 6.861e+06
Iter: 57	Iter Time: 0.062	Total Time: 3.495	Objective: 6.861e+06	Relative Delta Objective: 2.186e-07	Reconstruction Error: 6.861e+06
Iter: 58	Iter Time: 0.065	Total Time: 3.560	Objective: 6.861e+06	Relative Delta Objective: 1.458e-07	Reconstruction Error: 6.861e+06
Iter: 59	Iter Time: 0.059	Total Time: 3.619	Objective: 6.861e+06	Relative Delta Objective: 1.458e-07	Reconstruction Error: 6.861e+06
Iter: 60	Iter Time: 0.054	Total Time: 3.673	Objective: 6.861e+06	Relative Delta Objective: 7.288e-08	Reconstruction Error: 6.861e+06
Iter: 61	Iter Time: 0.064	Total Time: 3.737	Objective: 6.861e+06	Relative Delta Objective: 7.288e-08	Reconstruction Error: 6.861e+06
Iter: 62	Iter Time: 0.050	Total Time: 3.787	Objective: 6.861e+06	Relative Delta Objective: 0.000e+00	Reconstruction Error: 6.861e+06
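The log above reports a relative change in the objective at each iteration, and the run ends at iteration 62, where that change is exactly zero. The following is a minimal sketch of how such a stopping check could work. It assumes the relative delta is defined as (previous - current) / previous and uses a hypothetical tolerance `tol`; the function names are illustrative and SCOTCH's actual criterion may differ.

```python
def relative_delta(previous, current):
    """Relative decrease of the objective between two consecutive iterations."""
    return (previous - current) / previous


def first_converged_iteration(objectives, tol=1e-8):
    """Return the first index whose |relative delta| is below tol, or None."""
    for i in range(1, len(objectives)):
        if abs(relative_delta(objectives[i - 1], objectives[i])) < tol:
            return i
    return None


# Made-up objective values for illustration, not taken from the log above.
toy_objectives = [100.0, 50.0, 40.0, 40.0]
stop_at = first_converged_iteration(toy_objectives)
```

Taking the absolute value means that small negative deltas, like those at iterations 25 and 53 in the log, are treated the same way as small positive ones.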
In [21]:
scotch.visualize_factors()
Out[21]:
No description has been provided for this image

After running this function, the factor elements better capture the distinct natural column clusters.
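One common way to read clusters off a non-negative factor matrix is to assign each item to the factor with the largest loading. The sketch below illustrates this on a toy column factor. It assumes the factor has shape (number of columns, number of clusters); `assign_clusters` and `V_toy` are hypothetical names, and this is not necessarily how SCOTCH assigns clusters internally.

```python
import numpy as np


def assign_clusters(factor):
    """Assign each row of a non-negative factor matrix to its largest-loading factor."""
    return np.argmax(factor, axis=1)


# Toy column factor: 4 columns of the data matrix, 3 candidate clusters.
V_toy = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.3, 0.6],
    [0.8, 0.0, 0.2],
])
column_clusters = assign_clusters(V_toy)
```

When each row of the factor is dominated by a single entry, as in this toy matrix, the argmax assignment is unambiguous; near-ties between entries indicate columns that the factorization does not separate cleanly.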

In [22]:
scotch.visualize_clusters()
Out[22]:
No description has been provided for this image
In [23]:
scotch.visualize_clusters_sorted()
Out[23]:
No description has been provided for this image
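A sorted cluster view is typically produced by reordering the rows and columns of the matrix so that members of the same cluster sit next to each other, which makes any block structure visible. The sketch below shows that reordering with NumPy. The function `sort_by_clusters` and the toy inputs are hypothetical and are not part of the SCOTCH API.

```python
import numpy as np


def sort_by_clusters(matrix, row_clusters, col_clusters):
    """Reorder rows and columns so that items in the same cluster are adjacent."""
    # A stable sort keeps the original order within each cluster.
    row_order = np.argsort(row_clusters, kind="stable")
    col_order = np.argsort(col_clusters, kind="stable")
    return matrix[np.ix_(row_order, col_order)]


# Toy 3 x 4 matrix with made-up cluster labels.
A_toy = np.arange(12).reshape(3, 4)
sorted_toy = sort_by_clusters(
    A_toy,
    row_clusters=np.array([1, 0, 1]),
    col_clusters=np.array([2, 0, 1, 0]),
)
```

The reordered matrix can then be passed to `ax.imshow` in the same way as the original heatmap of `A`.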