Utility Functions

convertTextToPT.convert_text_to_pt(args)

Converts a delimited text file to a PyTorch tensor (.pt) file.

This function reads a delimited file using pandas, converts the data to a NumPy array, then to a PyTorch tensor, and finally saves it as a .pt file. The output file will have the same name as the input file, but with a .pt extension.

Parameters:

args (argparse.Namespace) --

Parsed command line arguments with the following attributes:

  • in_file (str): Path to the input delimited text file.

  • delimiter (str, optional): Delimiter used in the text file. Default is tab ('\t').

  • header (int or None, optional): Number of header lines before data. Default is None.

  • dtype (numpy.dtype, optional): Data type for the PyTorch tensor. Default is np.float32.

Return type:

None
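As a rough illustration of the attributes and naming rule described above, the following stdlib-only sketch bundles the documented attributes into an argparse.Namespace and predicts the output path. The helper names build_convert_args and expected_pt_path are hypothetical and not part of the package. The sketch also assumes that "same name with a .pt extension" means the final extension is replaced (matrix.txt becomes matrix.pt) rather than appended.

```python
import argparse
from pathlib import Path


def build_convert_args(in_file, dtype, delimiter="\t", header=None):
    """Hypothetical helper: bundle the documented attributes into a Namespace."""
    return argparse.Namespace(
        in_file=in_file,
        delimiter=delimiter,  # documented default: tab
        header=header,        # documented default: None
        dtype=dtype,          # documented default is np.float32; supplied by the caller
    )


def expected_pt_path(in_file):
    """Assumed naming rule: swap the final extension for .pt."""
    return str(Path(in_file).with_suffix(".pt"))
```

With NumPy installed, a call might then look like convert_text_to_pt(build_convert_args("matrix.txt", dtype=np.float32)), assuming the function is importable from the convertTextToPT module as the signature above suggests.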

runNMTF.runNMTF(args)

Runs Non-negative Matrix Tri-Factorization (NMTF) on an input dataset and saves the results.

This function initializes the NMTF model using the provided arguments, loads the input data (either from a PyTorch .pt file or a tab-delimited text file), fits the model to the data, and saves the output to the specified directory.

Parameters:

args (argparse.Namespace) --

Parsed command line arguments with the following attributes:

  • in_file (str): Path to the input file (tab-delimited matrix or .pt file).

  • k1 (int, optional): Dimension of the row factors. Default is -999.

  • k2 (int, optional): Dimension of the column factors. Default is -999.

  • lU (float, optional): Orthogonal regularization for the U factor. Default is 0.

  • lV (float, optional): Orthogonal regularization for the V factor. Default is 0.

  • aU (float, optional): Sparsity (L1) regularization for the U factor. Default is 0.

  • aV (float, optional): Sparsity (L1) regularization for the V factor. Default is 0.

  • verbose (bool, optional): If True, print progress to the terminal. Default is False.

  • seed (int, optional): Random seed for reproducibility. Default is 1010.

  • max_iter (int, optional): Maximum number of iterations. Default is 100.

  • term_tol (float, optional): Termination tolerance for relative error change. Default is 1e-25.

  • out_dir (str, optional): Directory for saving output files. Default is '.'.

  • save_clust (bool, optional): Save cluster assignments for each iteration. Default is False.

  • kill_factors (bool, optional): If True, kill unused factors. Default is False.

  • track_objective (bool, optional): Track objective function values during training. Default is False.

  • save_USV (bool, optional): Save factorization components (U, S, V) at each iteration. Default is False.

  • device (str, optional): Compute device for PyTorch (e.g., 'cuda:0', 'cuda:1', or 'cpu'). Default is 'cuda:0'.

  • legacy (bool, optional): Use legacy update method for factorization. Default is False.

Return type:

None
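The attribute list above can be mirrored by a small argparse parser. This is only a sketch: the flag spellings (for example --in_file or --save_USV) are assumed to match the attribute names and may differ from the package's actual command line interface, and build_nmtf_parser is a hypothetical helper. The defaults are copied from the list above.

```python
import argparse


def build_nmtf_parser():
    """Hypothetical parser whose defaults mirror the documented attributes."""
    p = argparse.ArgumentParser(description="Sketch of an NMTF argument parser")
    p.add_argument("--in_file", required=True)       # tab-delimited matrix or .pt file
    p.add_argument("--k1", type=int, default=-999)   # dimension of the row factors
    p.add_argument("--k2", type=int, default=-999)   # dimension of the column factors
    # Orthogonal (lU, lV) and sparsity (aU, aV) regularization weights.
    for name in ("lU", "lV", "aU", "aV"):
        p.add_argument(f"--{name}", type=float, default=0.0)
    p.add_argument("--seed", type=int, default=1010)
    p.add_argument("--max_iter", type=int, default=100)
    p.add_argument("--term_tol", type=float, default=1e-25)
    p.add_argument("--out_dir", default=".")
    p.add_argument("--device", default="cuda:0")
    # Boolean switches, all False unless given on the command line.
    for flag in ("verbose", "save_clust", "kill_factors",
                 "track_objective", "save_USV", "legacy"):
        p.add_argument(f"--{flag}", action="store_true")
    return p


# Example: run on the CPU with 5 row factors and 3 column factors.
args = build_nmtf_parser().parse_args(
    ["--in_file", "matrix.pt", "--k1", "5", "--k2", "3", "--device", "cpu"]
)
```

The resulting Namespace carries every attribute listed above, so it could be handed to runNMTF(args) directly, provided the function reads no attributes beyond those documented here.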