Utility Functions
- convertTextToPT.convert_text_to_pt(args)
Converts a delimited text file to a PyTorch tensor (.pt) file.
This function reads a delimited file using pandas, converts the data to a NumPy array, then to a PyTorch tensor, and finally saves it as a .pt file. The output file will have the same name as the input file, but with a .pt extension.
- Parameters:
args (argparse.Namespace) --
Parsed command line arguments with the following attributes:
in_file (str): Path to the input delimited text file.
delimiter (str, optional): Delimiter used in the text file. Default is tab ('\t').
header (int or None, optional): Number of header lines before data. Default is None.
dtype (numpy.dtype, optional): Data type for the PyTorch tensor. Default is np.float32.
- Return type:
None
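The pipeline described above (pandas, then NumPy, then PyTorch, then a .pt file) can be sketched as follows. This is a minimal illustration and not the package's actual implementation: `text_to_pt_sketch` is a hypothetical name, it takes plain keyword arguments instead of an `argparse.Namespace`, it assumes `header` is passed straight through to `pandas.read_csv`, and it assumes the output name is formed by swapping the input file's extension for `.pt`.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import torch


def text_to_pt_sketch(in_file, delimiter="\t", header=None, dtype=np.float32):
    """Hypothetical stand-in for the documented steps, not the package's code."""
    # Read the delimited file with pandas (header handling is an assumption).
    frame = pd.read_csv(in_file, sep=delimiter, header=header)
    # DataFrame -> NumPy array of the requested dtype -> PyTorch tensor.
    array = frame.to_numpy(dtype=dtype)
    tensor = torch.from_numpy(array)
    # Assumed naming rule: same path as the input, extension replaced by .pt.
    out_file = os.path.splitext(in_file)[0] + ".pt"
    torch.save(tensor, out_file)
    return out_file


# Usage on a throwaway 2x3 tab-delimited matrix.
with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, "matrix.txt")
    with open(in_path, "w") as handle:
        handle.write("1\t2\t3\n4\t5\t6\n")
    out_path = text_to_pt_sketch(in_path)
    out_name = os.path.basename(out_path)
    loaded = torch.load(out_path)

print(out_name, tuple(loaded.shape), loaded.dtype)
```

Saving the tensor once in .pt form avoids re-parsing the text file on every run, which is presumably why runNMTF accepts either format.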
- runNMTF.runNMTF(args)
Runs Non-negative Matrix Tri-Factorization (NMTF) on an input dataset and saves the results.
This function initializes the NMTF model using the provided arguments, loads the input data (either from a PyTorch .pt file or a tab-delimited text file), fits the model to the data, and saves the output to the specified directory.
- Parameters:
args (argparse.Namespace) --
Parsed command line arguments with the following attributes:
in_file (str): Path to the input file (tab-delimited matrix or .pt file).
k1 (int, optional): Dimension of the row factors. Default is -999.
k2 (int, optional): Dimension of the column factors. Default is -999.
lU (float, optional): Orthogonal regularization for the U factor. Default is 0.
lV (float, optional): Orthogonal regularization for the V factor. Default is 0.
aU (float, optional): Sparsity (L1) regularization for the U factor. Default is 0.
aV (float, optional): Sparsity (L1) regularization for the V factor. Default is 0.
verbose (bool, optional): If True, print progress to the terminal. Default is False.
seed (int, optional): Random seed for reproducibility. Default is 1010.
max_iter (int, optional): Maximum number of iterations. Default is 100.
term_tol (float, optional): Termination tolerance for relative error change. Default is 1e-25.
out_dir (str, optional): Directory for saving output files. Default is '.'.
save_clust (bool, optional): Save cluster assignments for each iteration. Default is False.
kill_factors (bool, optional): If True, kill (remove) unused factors. Default is False.
track_objective (bool, optional): Track objective function values during training. Default is False.
save_USV (bool, optional): Save factorization components (U, S, V) at each iteration. Default is False.
device (str, optional): Compute device for PyTorch (e.g., 'cuda:0', 'cuda:1', or 'cpu'). Default is 'cuda:0'.
legacy (bool, optional): Use legacy update method for factorization. Default is False.
- Return type:
None
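Because runNMTF takes a single `argparse.Namespace`, calling it from Python rather than the command line means building that namespace by hand. The sketch below fills one with the defaults listed above and lets selected attributes be overridden. `make_nmtf_args` is a hypothetical helper introduced only for illustration; it is not part of the package, and the sketch does not call runNMTF itself.

```python
from argparse import Namespace


def make_nmtf_args(in_file, **overrides):
    """Hypothetical helper: a Namespace holding the documented defaults."""
    args = Namespace(
        in_file=in_file,
        k1=-999,
        k2=-999,
        lU=0.0,
        lV=0.0,
        aU=0.0,
        aV=0.0,
        verbose=False,
        seed=1010,
        max_iter=100,
        term_tol=1e-25,
        out_dir=".",
        save_clust=False,
        kill_factors=False,
        track_objective=False,
        save_USV=False,
        device="cuda:0",
        legacy=False,
    )
    # Reject names that are not documented attributes, to catch typos early.
    for name, value in overrides.items():
        if not hasattr(args, name):
            raise AttributeError(f"unknown NMTF argument: {name}")
        setattr(args, name, value)
    return args


# Example: 5 row factors, 3 column factors, run on the CPU.
args = make_nmtf_args("matrix.pt", k1=5, k2=3, device="cpu", verbose=True)
print(args.k1, args.k2, args.device, args.max_iter)
# The namespace would then be handed to the documented entry point: runNMTF(args)
```

Overriding `device` matters on machines without a GPU, since the documented default is 'cuda:0'.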