
stamp

Main module.

STAMP

__init__(adata, n_topics=20, n_layers=1, hidden_size=128, layer=None, dropout=0.0, train_size=1, rank=None, categorical_covariate_keys=None, continous_covariate_keys=None, time_covariate_keys=None, enc_distribution='mvn', gene_likelihood='nb', mode='sign', verbose=False)

Initialize the model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| adata | _type_ | AnnData object containing the data. | required |
| n_topics | int | Number of topics to model. Defaults to 20. | 20 |
| n_layers | int | Number of layers used for SGC (simplified graph convolution). Defaults to 1. | 1 |
| hidden_size | int | Number of nodes in the hidden layer of the encoder. Defaults to 128. | 128 |
| layer | _type_ | Layer in which the counts data are stored. If None, adata.X is used. | None |
| dropout | float | Dropout used for the encoder. Defaults to 0.0. | 0.0 |
| categorical_covariate_keys | _type_ | Categorical batch keys. | None |
| continous_covariate_keys | _type_ | Continuous batch keys. | None |
| verbose | bool | Print out information on the model. Defaults to False. | False |
| batch_size | int | Batch size. Defaults to 1024. | required |
| enc_distribution | str | Encoder distribution. The available choice is the multivariate normal ("mvn"). Defaults to "mvn". | 'mvn' |
| mode | str | Whether to use "sign" or "sgc" (simplified graph convolutions). Defaults to "sign". | 'sign' |
| beta | float | Beta as in beta-VAE. Defaults to 1. | required |
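
A minimal construction sketch is shown below; the import path, the input file name, and the covariate key are assumptions and should be adjusted to your installation and dataset.

```python
# Minimal sketch: construct a STAMP model on an AnnData object.
# The import path, file name, and covariate key below are assumptions.
import scanpy as sc
from stamp import STAMP  # adjust to your installation's import path

adata = sc.read_h5ad("spatial_counts.h5ad")  # hypothetical input file

model = STAMP(
    adata,
    n_topics=20,                            # number of topics to learn
    n_layers=1,                             # propagation depth for SGC/SIGN
    hidden_size=128,                        # encoder hidden-layer width
    layer="counts",                         # counts layer; None falls back to adata.X
    categorical_covariate_keys=["sample"],  # hypothetical batch key in adata.obs
    gene_likelihood="nb",
    mode="sign",
)
```
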
train(max_epochs=800, min_epochs=100, learning_rate=0.01, betas=(0.9, 0.999), not_cov_epochs=5, device='cuda:0', batch_size=256, sampler='R', weight_decay=0, iterations_to_anneal=1, min_kl=1, max_kl=1, early_stop=True, patience=20, shuffle=True, num_particles=1)

Train the model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| max_epochs | int | Maximum number of epochs to run. Defaults to 800. | 800 |
| learning_rate | float | Learning rate of the AdamW optimizer. Defaults to 0.01. | 0.01 |
| device | str | Which device to run the model on. Use "cpu" to run on the CPU and "cuda:0" to run on the GPU. Defaults to "cuda:0". | 'cuda:0' |
| weight_decay | float | Weight decay of the AdamW optimizer. Defaults to 0. | 0 |
| early_stop | bool | Whether to stop early when training plateaus. Defaults to True. | True |
| patience | int | Number of epochs to wait before stopping when training plateaus. Defaults to 20. | 20 |
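
A training call could then look like the following; the values shown simply restate the documented defaults, and "cpu" can be substituted for "cuda:0" on machines without a GPU.

```python
# Train the model constructed in the sketch above.
model.train(
    max_epochs=800,
    min_epochs=100,
    learning_rate=0.01,   # AdamW learning rate
    weight_decay=0,       # AdamW weight decay
    device="cuda:0",      # use "cpu" if no GPU is available
    batch_size=256,
    early_stop=True,      # stop when training plateaus
    patience=20,          # epochs to wait before early stopping
)
```
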
get_metrics(topk=20, layer=None, TGC=True, pseudocount=0.1)

Compute evaluation metrics for the learned topics.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| topk | int | Number of top genes used to score the metrics. Defaults to 20. | 20 |
| layer | _type_ | Which layer to use to score the metrics. If None, adata.X is used. Defaults to None. | None |
| TGC | bool | Whether to calculate the topic-gene correlation. Defaults to True. | True |

Returns:

| Type | Description |
| --- | --- |
| _type_ | The computed metrics. |
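
For example (continuing the sketch above; the structure of the returned object is not documented here, so the print is only illustrative):

```python
# Score the learned topics using their top 20 genes.
metrics = model.get_metrics(topk=20, TGC=True)
print(metrics)  # inspect the returned object interactively
```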

get_cell_by_topic(adata=None, batch_size=None, device=None)

Get latent topics after training.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| device | str | Which device to use. Defaults to None. | None |

Returns:

| Type | Description |
| --- | --- |
| _type_ | A DataFrame of cell-by-topic proportions in which each row sums to one. |
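
A usage sketch, assuming the returned DataFrame is indexed like adata.obs so it can be joined back onto the AnnData object:

```python
# Retrieve per-cell topic proportions and attach them to adata.obs.
cell_topics = model.get_cell_by_topic(device="cpu")
print(cell_topics.sum(axis=1).head())    # each row sums to one
adata.obs = adata.obs.join(cell_topics)  # assumes the cell index matches adata.obs_names
```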

get_feature_by_topic(device='cpu', return_softmax=False, transpose=False, pseudocount=0.1)

Get the gene modules (feature-by-topic scores).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| device | str | Which device to use. Defaults to "cpu". | 'cpu' |
| num_samples | int | Number of samples to use for the calculation. Defaults to 1000. | required |
| pct | float | Deprecated. Defaults to 0.5. | required |
| return_softmax | bool | Deprecated. Defaults to False. | False |

Returns:

| Type | Description |
| --- | --- |
| _type_ | The gene modules. |
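
A usage sketch, assuming the method returns a pandas DataFrame of features (genes) by topics; the transpose argument in the signature can flip the orientation if needed:

```python
# Retrieve gene-by-topic scores and list the top genes of the first topic.
feature_topics = model.get_feature_by_topic(device="cpu")
first_topic = feature_topics.columns[0]
print(feature_topics[first_topic].nlargest(20))  # top 20 genes for that topic
```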