scSHARP Usage¶

scSHARP¶

class scSHARP.sc_sharp.scSHARP(data_path, tools, marker_path, preds_path=None, neighbors=2, config='2_40.txt', ncells='all', anndata_layer=None, anndata_use_raw=False)¶

Class for prediction, analysis, and visualization of cell type based on DGE matrix

scSHARP object manages I/O directories, running of component tools, as well as prediction and analysis using scSHARP model.

Attributes:¶

data_path: path to DGE matrix csv
preds_path: path to component tool output file, csv format
tools: list of component tool string names
marker_path: path to marker gene txt file
neighbors: number of neighbors used for tool consensus; default value is 2
config: config file for the
ncells: number of cells from dataset to use for model prediction
pre_processed: boolean. True when dataset has been preprocessed

component_correlation()¶: Returns correlation values and heatmap between tool columns

expression_plots(n=5, genes=None)¶

Generates violin plots of gene expression.

Parameters¶

n: int: number of highly attributed genes to show
genes: list: list of genes to show

Returns¶

Plot

get_component_preds(factorized=False)¶: Returns component predictions if available

heat_map(out_dir=None, n=5)¶: Displays heat map based on model interpretation

Parameters¶

att_df: attribute dataframe generated from scSHARP.run_interpretation()
out_dir: optional output directory to save heatmap as pdf (default: None)
n: number of most expressed genes per cell type to display

Returns¶

ax: matplotlib ax object for heatmap

knn_consensus(k=5)¶: Returns kNN consensus predictions for cells without a confident label, based on the votes of the k nearest confidently labeled cells

load_model(file_path)¶: Load model as serialized object at specified path

model_eval(config, batch_size, neighbors, dropout, random_inits, training_epochs=150)¶: Evaluates a model for a single hyperparameter configuration

prepare_data(thresh=0.51, normalize=True, scale=True, targetsum=10000.0, run_pca=True, comps=500, cell_fil=0, gene_fil=0)¶: Prepares dataset for training and prediction

run_interpretation()¶

Runs gradient-based model interpretation

Note¶

Interpretation requires a trained model. Model is trained by scSHARP.run_prediction()

Returns¶

int_df: The interpretation dataframe, with rows corresponding to genes and columns corresponding to cell types. Values indicate the model’s gradient of cell type with respect to the corresponding input gene, after taking the absolute value and scaling by cell type

run_prediction(training_epochs=150, thresh=0.51, batch_size=40, seed=8)¶

Trains GCN model on consensus labels and returns predictions

Parameters¶

training_epochs: number of epochs the model will be trained for. In each epoch the model calculates predictions for the entire training dataset, adjusting model weights one or more times (default: 150)
thresh: voting threshold for component tools (default: 0.51)
batch_size: number of training examples passed through model before calculating gradients (default: 40)
seed: random seed (default: 8)

Returns¶

Tuple of:
final_preds: predictions on dataset after final training epoch
train_nodes: confident labels used for training
test_nodes: confident labels used for evaluation (masked labels)
keep_cells: cells used in training process, determined during data preprocessing
conf_scores: model confidence values for each prediction

run_tools(out_path, ref_path, ref_label_path)¶

Uses subprocess to run component tools in R.

Parameters¶

out_path: str: Output path
ref_path: str: Path to reference DGE
ref_label_path: str: Path to labels for reference data set

Returns¶

bool: True if successful, False if not

save_model(file_path)¶: Save model as serialized object at specified path

unfactorize_preds()¶: function that maps preds back to cell types

Utilities¶

scSHARP.utilities.encode_predictions(df)¶: encodes predictions for each cell with 1 for each prediction
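The encoding step can be pictured as counting, for each cell, how many tools voted for each cell type. The following is a minimal pure-Python sketch of that idea, not the library's implementation: the function name `encode_votes`, the list-of-lists input, and the use of -1 for a missing prediction are all assumptions made for illustration.

```python
def encode_votes(rows, n_types):
    """Count, per cell, how many tools voted for each cell type.

    rows: one list per cell holding one integer label per tool
          (assumption: -1 means the tool made no prediction).
    n_types: number of distinct cell types.
    """
    encoded = []
    for votes in rows:
        counts = [0] * n_types
        for label in votes:
            if label >= 0:  # skip missing predictions
                counts[label] += 1
        encoded.append(counts)
    return encoded


# Two cells, three tools: cell 0 gets two votes for type 0 and one for type 1.
votes = [[0, 0, 1], [2, -1, 2]]
print(encode_votes(votes, n_types=3))
```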

scSHARP.utilities.factorize_df(df, all_cells)¶: factorizes all columns in pandas df

scSHARP.utilities.filter_scores(scores, thresh=0.5)¶: filters out score columns with NAs > threshold

scSHARP.utilities.get_consensus_labels(encoded_y, necessary_vote)¶: Gets the consensus vote of multiple prediction tools. If necessary_vote is < 1, it is taken as a threshold percentage that the vote must be >= to
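A threshold vote of this kind might look like the sketch below. It is an illustration under stated assumptions rather than the library's code: the name `consensus_labels` is hypothetical, a fractional threshold is interpreted relative to the votes actually cast for a cell, and cells with a tie or too few votes are marked -1.

```python
def consensus_labels(encoded, necessary_vote):
    """Pick a consensus label per cell from per-type vote counts.

    encoded: one list of vote counts per cell (index = cell type).
    necessary_vote: if < 1, a fraction of the votes cast; otherwise
                    an absolute number of votes (assumed reading).
    Returns a list of labels, with -1 where no consensus is reached.
    """
    labels = []
    for counts in encoded:
        total = sum(counts)
        best = max(range(len(counts)), key=counts.__getitem__)
        top = counts[best]
        needed = necessary_vote * total if necessary_vote < 1 else necessary_vote
        tied = counts.count(top) > 1  # assumption: ties give no consensus
        if total > 0 and top >= needed and not tied:
            labels.append(best)
        else:
            labels.append(-1)
    return labels


encoded = [[2, 1, 0], [1, 1, 1], [0, 0, 2]]
print(consensus_labels(encoded, 0.51))
```

With a threshold of 0.51 the second cell, where every type received one vote, stays unlabeled.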

scSHARP.utilities.get_max_consensus(votes)¶: Gets max consensus

scSHARP.utilities.knn_consensus(counts, preds, n_neighbors, converge=False, one_epoch=False)¶: Do kNN consensus, iterate until x% do not change

scSHARP.utilities.knn_consensus_batch(counts, preds, n_neighbors, converge=False, one_epoch=False, batch_size=1000, keep_conf=False)¶: Do kNN consensus, iterate until x% do not change
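To make the idea of kNN consensus concrete, here is a single-pass sketch in pure Python. It is not the library's algorithm: the name `knn_fill` is hypothetical, distances are plain Euclidean, unlabeled cells are assumed to carry the label -1, and the repeated iteration described above is left out.

```python
import math
from collections import Counter


def knn_fill(points, labels, k):
    """Give each unlabeled point (-1) the majority label of its
    k nearest labeled points. Labeled points are left unchanged."""
    labeled = [i for i, lab in enumerate(labels) if lab != -1]
    filled = list(labels)
    for i, lab in enumerate(labels):
        if lab != -1:
            continue
        # Sort labeled points by Euclidean distance and keep the k closest.
        nearest = sorted(labeled, key=lambda j: math.dist(points[i], points[j]))[:k]
        if nearest:  # stay unlabeled when there is nothing to vote with
            votes = Counter(labels[j] for j in nearest)
            filled[i] = votes.most_common(1)[0][0]
    return filled


points = [(0, 0), (0, 1), (5, 5), (5, 6), (0.2, 0.5), (5.1, 5.4)]
labels = [0, 0, 1, 1, -1, -1]
print(knn_fill(points, labels, k=2))
```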

scSHARP.utilities.load_model(file_path, target_types)¶: loads model from json format

scSHARP.utilities.mask_labels(labels, masking_pct)¶

masks labels for training

Randomly masks a specified portion of the labels, replacing their values with -1

Parameters¶

labels: list of labels
masking_pct: float value for proportion of masked rows

Returns¶

Tuple of:
labels: original list of labels
masked_labels: copy of original labels, with masking applied
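The masking behaviour described above can be sketched in a few lines of standard-library Python. The function name `mask_labels_sketch` and its `seed` parameter are hypothetical additions for reproducibility; the library function may choose the masked rows differently.

```python
import random


def mask_labels_sketch(labels, masking_pct, seed=0):
    """Return (labels, masked_labels), where a random masking_pct
    share of the entries in masked_labels is replaced by -1."""
    rng = random.Random(seed)  # seed is an illustrative assumption
    n_masked = int(len(labels) * masking_pct)
    masked_idx = set(rng.sample(range(len(labels)), n_masked))
    masked = [-1 if i in masked_idx else lab for i, lab in enumerate(labels)]
    return labels, masked


labels = [0, 1, 2, 1, 0, 2, 1, 0, 2, 1]
original, masked = mask_labels_sketch(labels, 0.5)
print(masked.count(-1))  # number of masked entries
```

The unmasked positions keep their original value, so the masked copy can be used for training while the original list is kept for evaluation.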

scSHARP.utilities.pred_accuracy(preds, real)¶: returns accuracy of predictions
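Accuracy here presumably means the share of predictions that match the real labels. A minimal sketch under that assumption, with the hypothetical name `accuracy`:

```python
def accuracy(preds, real):
    """Fraction of positions where the prediction equals the real label."""
    if len(preds) != len(real):
        raise ValueError("preds and real must have the same length")
    if not preds:
        return 0.0  # assumption: empty input counts as zero accuracy
    correct = sum(p == r for p, r in zip(preds, real))
    return correct / len(preds)


print(accuracy([0, 1, 2, 1], [0, 1, 1, 1]))
```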

scSHARP.utilities.preprocess(data, normalize=True, scale=False, targetsum=10000.0, run_pca=True, comps=500, cell_fil=0, gene_fil=0)¶

Preprocesses raw counts DGE matrix

The default parameter values assume a filtered but not normalized DGE counts matrix, with rows representing cells and columns representing genes

Parameters¶

normalize: bool: row norm and lognorm
scale: bool: scale by gene to mean 0 and std 1
targetsum: float: row norm then multiply by target sum
run_pca: bool: Whether or not to run PCA
comps: int: how many components to use for PCA
cell_fil: int: Filter param. Minimum number of cells that must contain a given gene for the gene to be included
gene_fil: int: Filter param. Minimum number of genes that a given cell must contain for the cell to be included

Returns¶

preprocessed dataset as an nD-array
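The `normalize` and `targetsum` parameters describe a row normalization followed by a log transform. The sketch below illustrates that step alone in pure Python. It is an assumption-laden illustration: the name `normalize_counts` is hypothetical, and `log1p` is used as the log transform because it is a common choice for count data, which the source does not confirm.

```python
import math


def normalize_counts(counts, targetsum=10000.0):
    """Scale each row (cell) to sum to targetsum, then apply log(1 + x).

    counts: list of rows, one per cell, with one count per gene.
    Rows that sum to zero are returned as all zeros.
    """
    out = []
    for row in counts:
        total = sum(row)
        scale = targetsum / total if total > 0 else 0.0
        out.append([math.log1p(v * scale) for v in row])
    return out


# Small example with targetsum=4 so the scaled values are easy to follow.
print(normalize_counts([[1, 3], [0, 0]], targetsum=4.0))
```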

scSHARP.utilities.read_marker_file(file_path)¶

parses marker file

Returns¶

Tuple of:
markers: list of marker genes
marker_names: list of string gene names

scSHARP.utilities.weighted_encode(df, encoded_y, tool_weights)¶: More advanced consensus method

df: cells x tools
tool_weights: cell_types x tools

Interpret¶

scSHARP.interpret.interpret_model(model, X, predictions, genes, batch_size, device, batches=None)¶: Performs PCA interpretation on model

Parameters¶

model:
X:
predictions:
genes: gene names for output dataframe labels
batch_size: size of batch for deeplift interpretation
device: torch device for running deeplift computations
batches: number of batches to run dataset on deeplift. If None, run deeplift on entire dataset (default: None)
