GeneCover Functions Reference

GeneCover(num_marker, corr_mat, w, m=3, interval=0, lambdaMax=0.3, lambdaMin=0.05, timeLimit=600, output=0, solver='Gurobi')

Selects marker genes based on gene-gene correlation using combinatorial optimization or a greedy heuristic.

Parameters:

num_marker (int) – Desired number of markers to select.
corr_mat (np.ndarray) – Gene-gene correlation matrix of shape (d, d).
interval (int) – Allowed deviation from num_marker: the final number of markers may differ from num_marker by up to interval.
w (np.ndarray) – An array of weights for the genes. Higher weights indicate higher cost for selection.
lambdaMax (float) – Maximum threshold for acceptable gene-gene correlation.
lambdaMin (float) – Minimum threshold for acceptable gene-gene correlation.
timeLimit (float) – Time limit (in seconds) for the optimization.
output (int) – Whether to print the optimization progress. Set to 1 to enable.
solver (str) – The solver to use for the optimization. Options are “Gurobi”, “SCIP”, and “Greedy” (a greedy set-cover heuristic instead of an exact solver). Default: “Gurobi”.

Returns:

Indices of the selected marker genes.

Return type:

List[int]
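The num_marker, interval, lambdaMin and lambdaMax parameters suggest a search over the correlation threshold until the marker count lands within interval of num_marker. The sketch below assumes that search is a bisection and that a stricter (larger) threshold yields more markers, because each marker then covers fewer genes. The names search_threshold and select are hypothetical: select stands in for "build the coverage matrix at threshold lam and solve the set cover". This is an illustration under those assumptions, not GeneCover's implementation.

```python
from typing import Callable, List


def search_threshold(
    select: Callable[[float], List[int]],
    num_marker: int,
    interval: int = 0,
    lambdaMin: float = 0.05,
    lambdaMax: float = 0.3,
    max_iter: int = 30,
) -> List[int]:
    """Hypothetical sketch: bisect the threshold until the marker count is
    within `interval` of `num_marker`.

    Assumes len(select(lam)) grows with lam.
    """
    lo, hi = lambdaMin, lambdaMax
    best: List[int] = []
    best_gap = float("inf")
    for _ in range(max_iter):
        lam = (lo + hi) / 2
        markers = select(lam)
        gap = abs(len(markers) - num_marker)
        if gap < best_gap:  # remember the closest selection seen so far
            best, best_gap = markers, gap
        if gap <= interval:
            return markers
        if len(markers) > num_marker:
            hi = lam  # too many markers: lower the threshold
        else:
            lo = lam  # too few markers: raise the threshold
    return best  # target unreachable in [lambdaMin, lambdaMax]: closest found


# Toy selector whose marker count is floor(100 * lam).
def toy(lam: float) -> List[int]:
    return list(range(int(lam * 100)))


print(len(search_threshold(toy, num_marker=20)))  # 20
```

If the target cannot be reached inside [lambdaMin, lambdaMax], the sketch falls back to the closest selection it evaluated rather than failing.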

Iterative_GeneCover(incremental_sizes, corr_mat, w, m=3, lambdaMin=0.05, lambdaMax=0.3, timeLimit=600, output=0, solver='Gurobi')

Performs iterative marker gene selection using the GeneCover algorithm.

Parameters:

corr_mat (np.ndarray) – Gene-gene correlation matrix of shape (d, d).
incremental_sizes (List[int]) – A list indicating the number of markers to select at each iteration.
w (np.ndarray) – An array of weights for each gene. Higher weights indicate higher cost for selection.
lambdaMax (float) – Maximum threshold for gene-gene correlation.
lambdaMin (float) – Minimum threshold for gene-gene correlation.
timeLimit (float) – Time limit (in seconds) for the optimization.
output (int) – Whether to print the optimization process. Set to 1 to enable.
solver (str) – The solver to use for the optimization. Options are “Gurobi”, “SCIP”, and “Greedy”.

Returns:

A list where each element is a list of indices of the selected marker genes at the corresponding iteration.

Return type:

List[List[int]]
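One plausible reading of the iterative procedure is that each round re-runs the selection on the genes not chosen in earlier rounds, restricting corr_mat and w to those genes. The sketch below assumes exactly that. The names iterative_select and select_fn are hypothetical, and select_fn stands in for a single GeneCover call that returns indices local to the submatrix it receives.

```python
from typing import Callable, List

import numpy as np


def iterative_select(
    incremental_sizes: List[int],
    corr_mat: np.ndarray,
    w: np.ndarray,
    select_fn: Callable[[int, np.ndarray, np.ndarray], List[int]],
) -> List[List[int]]:
    """Hypothetical sketch: one selection round per entry of incremental_sizes."""
    remaining = np.arange(corr_mat.shape[0])  # global indices still available
    rounds: List[List[int]] = []
    for size in incremental_sizes:
        # Restrict the problem to genes not selected in earlier rounds.
        sub_corr = corr_mat[np.ix_(remaining, remaining)]
        local = select_fn(size, sub_corr, w[remaining])
        chosen = [int(remaining[i]) for i in local]  # map back to global indices
        rounds.append(chosen)
        remaining = np.setdiff1d(remaining, chosen)
    return rounds


# Toy selector: take the `size` cheapest genes (ignores correlation entirely).
def cheapest(size: int, corr: np.ndarray, w: np.ndarray) -> List[int]:
    return list(np.argsort(w, kind="stable")[:size])


w_demo = np.array([5.0, 1.0, 4.0, 2.0, 3.0, 0.0])
print(iterative_select([2, 3], np.eye(6), w_demo, cheapest))  # [[5, 1], [3, 4, 2]]
```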

covering(Z, minSize=1, alpha=0.05, weights=1.0, output=None, callBack=None, poolSolutions=None, poolSearchMode=None, poolGap=None, timeLimit=None, LogToConsole=1, restart=None)

Solves the minimum-weight set cover problem using the Gurobi solver.

Parameters:

Z (np.ndarray) – A binary matrix of shape (N, d), where N is the number of samples and d is the number of genes.
minSize (int) – The minimum number of genes to select.
alpha (float) – The minimum fraction of samples that must be covered.
weights (float or np.ndarray) – Weight of each gene: a 1D array with one entry per gene, or a scalar applied to every gene. Higher weights indicate higher cost for selection.
output (int) – Enables or disables solver output. Set to 1 to print optimization details, 0 to suppress.
callBack (Callable) – A callback function to be invoked during optimization.
poolSolutions (int) – Number of solutions to store in the solution pool. See: https://www.gurobi.com/documentation/current/refman/poolsolutions.html
poolSearchMode (int) – Mode for exploring the MIP search tree. See: https://www.gurobi.com/documentation/current/refman/poolsearchmode.html
poolGap (float) – Relative MIP optimality gap for accepting solutions into the pool. See: https://www.gurobi.com/documentation/current/refman/poolgap.html
timeLimit (float) – Time limit (in seconds) for the optimization run.
LogToConsole (int) – Whether to print the optimization log. Set to 1 to enable.
restart (gurobipy.Model) – A Gurobi model instance to restart the optimization from.

Returns:

Indices of the selected genes.

Return type:

List[int]
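For intuition about what the exact solvers return, the brute-force sketch below finds a minimum-weight cover on tiny instances by enumerating every subset of columns. It is exponential in d and purely illustrative, not how covering works internally. exact_weighted_cover and min_cover are hypothetical names; min_cover follows the per-element reading of minSize given under covering_scip, and the sketch ignores alpha by requiring every row to be covered.

```python
from itertools import combinations
from typing import List, Optional

import numpy as np


def exact_weighted_cover(Z, w, min_cover: int = 1) -> Optional[List[int]]:
    """Cheapest set of columns covering every row at least `min_cover`
    times, or None if infeasible (brute force, tiny inputs only)."""
    Z = np.asarray(Z, dtype=int)
    w = np.asarray(w, dtype=float)
    d = Z.shape[1]
    best, best_cost = None, np.inf
    for k in range(1, d + 1):
        for cols in combinations(range(d), k):
            cols = list(cols)
            # Feasible if every row is hit by at least `min_cover` chosen columns.
            if np.all(Z[:, cols].sum(axis=1) >= min_cover):
                cost = w[cols].sum()
                if cost < best_cost:
                    best, best_cost = cols, cost
    return best


Z = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 1]])
print(exact_weighted_cover(Z, [1, 3, 1]))               # [0, 2]
print(exact_weighted_cover(Z, [1, 3, 1], min_cover=2))  # [0, 1, 2]
```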

covering_scip(Z, minSize=1, weights=1.0, timeLimit=None, output=1)

Solves the minimum-weight set cover problem using the SCIP solver.

Parameters:

Z (np.ndarray) – A binary matrix of shape (N, d), where Z[i, j] == 1 indicates that set j covers element i.
minSize (int, optional) – Minimum number of sets required to cover each element. Defaults to 1.
weights (str|float|Sequence[float], optional) – Weights for each set. Defaults to 1.0.
  - If ‘prob’, each set's weight is 1 - 0.01 * (mean of that set's column of Z).
  - If a scalar, the same weight is used for all sets.
  - Otherwise, an array of length d giving each set's weight.
timeLimit (float, optional) – Time limit in seconds for the SCIP solver. Defaults to None (no limit).
output (int, optional) – 1 to enable solver output, 0 to suppress it. Defaults to 1.

Returns:

List of selected set indices (column indices of Z).

Return type:

List[int]
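The three accepted forms of the weights argument can be resolved into a length-d weight vector as sketched below. This follows the documented rule only; resolve_weights is a hypothetical name, and the validation errors are an assumption rather than documented behavior.

```python
from typing import Sequence, Union

import numpy as np


def resolve_weights(
    Z: np.ndarray, weights: Union[str, float, Sequence[float]] = 1.0
) -> np.ndarray:
    """Hypothetical helper: turn `weights` into one weight per column of Z."""
    Z = np.asarray(Z, dtype=float)
    d = Z.shape[1]
    if isinstance(weights, str):  # check strings first: np.isscalar("prob") is True
        if weights != "prob":
            raise ValueError(f"unknown weights option: {weights!r}")
        # 'prob': 1 - 0.01 * (fraction of rows each column covers)
        return 1.0 - 0.01 * Z.mean(axis=0)
    if np.isscalar(weights):
        return np.full(d, float(weights))  # same weight for every set
    w = np.asarray(weights, dtype=float)
    if w.shape != (d,):
        raise ValueError(f"expected {d} weights, got shape {w.shape}")
    return w


Z = np.array([[1, 0], [1, 1], [0, 1], [1, 0]])
print(resolve_weights(Z, "prob"))  # column means 0.75 and 0.5 -> 0.9925 and 0.995
print(resolve_weights(Z, 2.0))     # [2. 2.]
```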

gene_gene_correlation(X, method='spearman')

Computes the gene-gene correlation matrix from the gene expression matrix X.

Parameters:

X (np.ndarray or List[np.ndarray]) – A matrix of shape (N, d), where N is the number of cells/spots and d is the number of genes. Alternatively, a list of such matrices (from different batches/samples) with consistent gene dimensions.
method (str) – Method used to compute correlation. Must be either ‘spearman’ or ‘pearson’.

Returns:

A gene-gene correlation matrix of shape (d, d).

Return type:

np.ndarray
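Spearman correlation is the Pearson correlation of each gene's ranks across cells, with tied values given their average rank. A minimal NumPy sketch for a single (N, d) matrix follows. average_ranks and gene_corr_sketch are hypothetical names; how gene_gene_correlation combines a list of batch matrices is not described above, so the sketch does not handle lists.

```python
import numpy as np


def average_ranks(x: np.ndarray) -> np.ndarray:
    """Rank a 1-D array from 1..n, giving tied values their mean rank."""
    order = np.argsort(x, kind="mergesort")
    ordinal = np.empty(len(x), dtype=float)
    ordinal[order] = np.arange(1, len(x) + 1)
    # Average the ordinal ranks within each group of equal values.
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    inverse = inverse.ravel()
    return (np.bincount(inverse, weights=ordinal) / counts)[inverse]


def gene_corr_sketch(X: np.ndarray, method: str = "spearman") -> np.ndarray:
    """Hypothetical sketch: (d, d) correlation between the columns (genes) of X."""
    X = np.asarray(X, dtype=float)
    if method == "spearman":
        # Rank each gene across cells, then correlate the ranks.
        X = np.column_stack([average_ranks(X[:, j]) for j in range(X.shape[1])])
    elif method != "pearson":
        raise ValueError("method must be 'spearman' or 'pearson'")
    return np.corrcoef(X, rowvar=False)


x = np.arange(1.0, 5.0)
X = np.column_stack([x, x**3, -x])
print(gene_corr_sketch(X)[0, 1])             # 1.0 (monotone relationship)
print(gene_corr_sketch(X, "pearson")[0, 1])  # about 0.951 (not linear)
```

The example shows why rank correlation is the default here: x and x**3 move together perfectly in order, which Spearman scores as 1 while Pearson does not.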

greedy_weighted_set_cover(Z, w)

Greedy heuristic for the weighted set cover problem.

Parameters:

Z (np.ndarray) – A binary matrix of shape (n_elements, m_sets), where Z[i, j] == 1 indicates that set j covers element i.
w (np.ndarray) – A 1D array of length m_sets representing the weight of each set.

Returns:

Indices of the selected sets (column indices of Z) that form a cover.

Return type:

List[int]
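The classic greedy rule for weighted set cover repeatedly picks the set with the lowest weight per newly covered element, and its total cost is within a factor of H(n) ≤ ln n + 1 of optimal for n elements. Whether greedy_weighted_set_cover uses exactly this rule, including its tie-breaking and its handling of elements that no set covers, is an assumption; greedy_cover_sketch is a hypothetical name.

```python
import numpy as np


def greedy_cover_sketch(Z, w):
    """Hypothetical sketch of greedy weighted set cover (sets are columns of Z)."""
    Z = np.asarray(Z, dtype=bool)
    w = np.asarray(w, dtype=float)
    # Only elements covered by at least one set can ever be covered.
    uncovered = Z.any(axis=1)
    selected = []
    while uncovered.any():
        gain = Z[uncovered].sum(axis=0)  # newly covered elements per set
        # Weight per newly covered element; sets adding nothing are never chosen.
        ratio = np.where(gain > 0, w / np.maximum(gain, 1), np.inf)
        j = int(np.argmin(ratio))  # ties go to the lowest column index
        selected.append(j)
        uncovered &= ~Z[:, j]
    return selected


Z = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])
print(greedy_cover_sketch(Z, [1, 1, 1]))  # [2, 0]
print(greedy_cover_sketch(Z, [1, 1, 5]))  # [0, 1, 2]
```

The second call shows the heuristic's cost: it pays 7, while the optimal cover (columns 2 and 0) costs 6, since element 3 can only be covered by column 2.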