GeneCover Functions Reference

GeneCover(num_marker, corr_mat, w, m=3, interval=0, lambdaMax=0.3, lambdaMin=0.05, timeLimit=600, output=0, solver='Gurobi')[source]

Selects marker genes based on gene-gene correlation using combinatorial optimization or a greedy heuristic.

Parameters:
  • num_marker (int) – Desired number of markers to select.

  • corr_mat (np.ndarray) – Gene-gene correlation matrix.

  • interval (int) – Allowed deviation from num_marker. The final number of markers may vary within this range.

  • w (np.ndarray) – An array of weights for the genes. Higher weights indicate higher cost for selection.

  • lambdaMax (float) – Maximum threshold for acceptable gene-gene correlation.

  • lambdaMin (float) – Minimum threshold for acceptable gene-gene correlation.

  • timeLimit (float) – Time limit (in seconds) for the optimization.

  • output (int) – Whether to print the optimization process. Set to 1 to enable.

  • solver (str) – The solver to use for the optimization. Options are “Gurobi”, “SCIP”, and “Greedy”.

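  • greedy (bool) – Whether to use a greedy algorithm for set cover instead of the Gurobi solver. Default: False. This parameter does not appear in the signature shown above, where the greedy heuristic is selected with solver=“Greedy”.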

Returns:

Indices of the selected marker genes.

Return type:

List[int]
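
The interplay of num_marker, interval, lambdaMin, and lambdaMax can be illustrated with a small NumPy sketch. It rests on assumptions that the reference above does not state: that gene j "covers" gene i when their correlation exceeds a threshold, and that the threshold is bisected between lambdaMin and lambdaMax until the cover size falls within interval of num_marker. The names select_markers_sketch, _greedy_cover, and max_iter are hypothetical, and a textbook greedy cover stands in for the real solvers.

```python
import numpy as np

def _greedy_cover(Z, w):
    """Textbook greedy weighted set cover; a stand-in for a real solver."""
    uncovered = Z.any(axis=1)
    chosen = []
    while uncovered.any():
        gain = Z[uncovered].sum(axis=0)  # newly covered genes per candidate
        ratio = np.where(gain > 0, w / np.maximum(gain, 1), np.inf)
        j = int(np.argmin(ratio))
        chosen.append(j)
        uncovered &= ~Z[:, j]
    return chosen

def select_markers_sketch(num_marker, corr_mat, w, interval=0,
                          lambdaMin=0.05, lambdaMax=0.3, max_iter=30):
    corr_mat = np.asarray(corr_mat, dtype=float)
    w = np.asarray(w, dtype=float)
    lo, hi = lambdaMin, lambdaMax
    best = None
    for _ in range(max_iter):
        lam = 0.5 * (lo + hi)
        # Assumption: coverage is defined by thresholding the correlation.
        Z = corr_mat > lam
        markers = _greedy_cover(Z, w)
        if best is None or abs(len(markers) - num_marker) < abs(len(best) - num_marker):
            best = markers
        if abs(len(markers) - num_marker) <= interval:
            break
        if len(markers) > num_marker:
            hi = lam  # too many markers: lower the threshold
        else:
            lo = lam  # too few markers: raise the threshold
    return best

# Toy data: two blocks of three genes, 0.8 within a block, 0.1 across blocks.
corr = np.full((6, 6), 0.1)
corr[:3, :3] = 0.8
corr[3:, 3:] = 0.8
np.fill_diagonal(corr, 1.0)
markers = select_markers_sketch(2, corr, np.ones(6))
```

In this toy case one marker is picked from each block. A higher threshold makes the coverage matrix sparser, so more markers are needed, which is why the sketch lowers the threshold when the cover is too large.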

Iterative_GeneCover(incremental_sizes, corr_mat, w, m=3, lambdaMin=0.05, lambdaMax=0.3, timeLimit=600, output=0, solver='Gurobi')[source]

Performs iterative marker gene selection using the GeneCover algorithm.

Parameters:
  • corr_mat (np.ndarray) – Gene-gene correlation matrix of shape (d, d).

  • incremental_sizes (List[int]) – A list indicating the number of markers to select at each iteration.

  • w (np.ndarray) – An array of weights for each gene. Higher weights indicate higher cost for selection.

  • lambdaMax (float) – Maximum threshold for gene-gene correlation.

  • lambdaMin (float) – Minimum threshold for gene-gene correlation.

  • timeLimit (float) – Time limit (in seconds) for the optimization.

  • output (int) – Whether to print the optimization process. Set to 1 to enable.

  • solver (str) – The solver to use for the optimization. Options are “Gurobi”, “SCIP”, and “Greedy”.

Returns:

A list where each element is a list of indices of the selected marker genes at the corresponding iteration.

Return type:

List[List[int]]
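
Because the return value is nested, a common follow-up step is to merge the per-iteration selections into growing marker panels. The helper below is a hypothetical post-processing sketch, not part of the library, and the index values are made up for illustration.

```python
def cumulative_panels(markers_per_iteration):
    """Return, for each iteration, all markers selected up to and including it."""
    panels = []
    ordered = []   # markers in order of first selection
    seen = set()   # guards against repeats across iterations
    for markers in markers_per_iteration:
        for gene in markers:
            if gene not in seen:
                seen.add(gene)
                ordered.append(gene)
        panels.append(list(ordered))
    return panels

# Hypothetical output of three iterations with incremental_sizes=[2, 1, 2].
example = [[4, 1], [7], [2, 9]]
panels = cumulative_panels(example)
```

The last element of panels is the full panel across all iterations.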

covering(Z, minSize=1, alpha=0.05, weights=1.0, output=None, callBack=None, poolSolutions=None, poolSearchMode=None, poolGap=None, timeLimit=None, LogToConsole=1, restart=None)[source]

Solves the minimum-weight set covering problem using the Gurobi solver.

Parameters:
  • Z (np.ndarray) – A binary matrix of shape (N, d), where N is the number of samples and d is the number of genes.

  • minSize (int) – The minimum number of genes to select.

  • alpha (float) – The minimum fraction of samples that must be covered.

  • weights (np.ndarray) – A 1D array of weights for each gene. Higher weights indicate higher cost for selection.

  • output (int) – Enables or disables solver output. Set to 1 to print optimization details, 0 to suppress.

  • callBack (Callable) – A callback function to be invoked during optimization.

  • poolSolutions (int) – Number of solutions to store in the solution pool. See: https://www.gurobi.com/documentation/current/refman/poolsolutions.html

  • poolSearchMode (int) – Mode for exploring the MIP search tree. See: https://www.gurobi.com/documentation/current/refman/poolsearchmode.html

  • poolGap (float) – Relative MIP optimality gap for accepting solutions into the pool. See: https://www.gurobi.com/documentation/current/refman/poolgap.html

  • timeLimit (float) – Time limit (in seconds) for the optimization run.

  • LogToConsole (int) – Whether to print the optimization log. Set to 1 to enable.

  • restart (gurobipy.Model) – A Gurobi model instance to restart the optimization from.

Returns:

Indices of the selected genes.

Return type:

List[int]

covering_scip(Z, minSize=1, weights=1.0, timeLimit=None, output=1)[source]

Solves the minimum-weight set covering problem using the SCIP solver.

Parameters:
  • Z (np.ndarray) – A binary matrix of shape (N, d), where Z[i, j] == 1 indicates that set j covers element i.

  • minSize (int, optional) – Minimum number of sets required to cover each element. Defaults to 1.

  • weights (str|float|Sequence[float], optional) – Weights for each set. If ‘prob’, uses 1 - 0.01 * mean coverage per column; if a scalar, uses the same weight for all sets; otherwise, an array of length d giving each set’s weight. Defaults to 1.0.

  • timeLimit (float, optional) – Time limit in seconds for the SCIP solver. Defaults to None (no limit).

  • output (int, optional) – 1 to enable solver output, 0 to suppress it. Defaults to 1.

Returns:

List of selected set indices (column indices of Z).

Return type:

List[int]
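
The weights options and the minSize constraint can be made concrete with two small helpers. Both names are hypothetical and neither calls SCIP. The ‘prob’ branch reads "mean coverage per column" as the column mean of Z, which is an assumption about the wording above.

```python
import numpy as np

def resolve_weights_sketch(Z, weights=1.0):
    """Expand the documented weights options into one weight per set."""
    Z = np.asarray(Z)
    d = Z.shape[1]
    if isinstance(weights, str):
        if weights != "prob":
            raise ValueError("the only string option is 'prob'")
        # Assumption: mean coverage per column is the column mean of Z.
        return 1.0 - 0.01 * Z.mean(axis=0)
    if np.isscalar(weights):
        return np.full(d, float(weights))
    w = np.asarray(weights, dtype=float)
    if w.shape != (d,):
        raise ValueError("weights must have length d")
    return w

def covers_each_element(Z, selected, minSize=1):
    """Check that every row of Z is covered by at least minSize selected sets."""
    Z = np.asarray(Z)
    counts = Z[:, list(selected)].sum(axis=1)
    return bool((counts >= minSize).all())

Z = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1]])
w_prob = resolve_weights_sketch(Z, "prob")
ok_once = covers_each_element(Z, [0, 1], minSize=1)
ok_twice = covers_each_element(Z, [0, 1], minSize=2)
```

A check such as covers_each_element can be applied to the list returned by a solver to confirm that it satisfies the requested minSize.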

gene_gene_correlation(X, method='spearman')[source]

Compute the gene-gene correlation matrix from the gene expression matrix X.

Parameters:
  • X (np.ndarray or List[np.ndarray]) – A matrix of shape (N, d), where N is the number of cells/spots and d is the number of genes. Alternatively, a list of such matrices (from different batches/samples) with consistent gene dimensions.

  • method (str) – Method used to compute correlation. Must be either ‘spearman’ or ‘pearson’.

Returns:

A gene-gene correlation matrix of shape (d, d).

Return type:

np.ndarray

greedy_weighted_set_cover(Z, w)[source]

Greedy heuristic for the weighted set cover problem.

Parameters:
  • Z (np.ndarray) – A binary matrix of shape (n_elements, m_sets), where Z[i, j] == 1 indicates that set j covers element i.

  • w (np.ndarray) – A 1D array of length m_sets representing the weight of each set.

Returns:

Indices of the selected sets (column indices of Z) that form a cover.

Return type:

List[int]
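
The classic greedy heuristic for weighted set cover repeatedly picks the set with the lowest weight per newly covered element. The sketch below implements that textbook rule with the same Z and w conventions; whether the library uses exactly this selection rule or tie-breaking is an assumption, and greedy_cover_sketch is a hypothetical name.

```python
import numpy as np

def greedy_cover_sketch(Z, w):
    Z = np.asarray(Z, dtype=bool)
    w = np.asarray(w, dtype=float)
    # Elements that no set covers are ignored so the loop always terminates.
    uncovered = Z.any(axis=1)
    selected = []
    while uncovered.any():
        gain = Z[uncovered].sum(axis=0)  # newly covered elements per set
        ratio = np.full(w.shape, np.inf)
        np.divide(w, gain, out=ratio, where=gain > 0)
        j = int(np.argmin(ratio))        # cheapest cost per new element
        selected.append(j)
        uncovered &= ~Z[:, j]
    return selected

# Set 0 covers elements {0, 1}, set 1 covers {2, 3}, set 2 covers all four.
Z = np.array([[1, 0, 1],
              [1, 0, 1],
              [0, 1, 1],
              [0, 1, 1]])
uniform = greedy_cover_sketch(Z, [1, 1, 1])
costly_big_set = greedy_cover_sketch(Z, [1, 1, 5])
```

With uniform weights the single large set wins; once it becomes expensive, the two smaller sets are chosen instead. The greedy rule carries no optimality guarantee, which is the trade-off against the exact solvers.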