SpatialClustering_SpatialGenomicsData
This tutorial demonstrates how to identify spatial domains in spatial genomics data using Pysodb and SpaceFlow.
The reference papers can be found at https://www.nature.com/articles/s41467-022-31739-w and https://www.nature.com/articles/s41586-021-04217-4.
Import packages and set configurations
[1]:
# Suppress all warnings raised after this point so they do not clutter the notebook output.
import warnings
warnings.filterwarnings("ignore")
[2]:
# Scanpy is a package for single-cell RNA sequencing analysis.
import scanpy as sc
[3]:
# Import the SpaceFlow module from the SpaceFlow package
from SpaceFlow import SpaceFlow
[4]:
# Import the palettable package, which provides ready-made color palettes
import palettable
# Define one diverging colormap (cmp_pspace) for continuous values and two qualitative color lists (cmp_domain, cmp_ct) for categorical labels.
cmp_pspace = palettable.cartocolors.diverging.TealRose_7.mpl_colormap
cmp_domain = palettable.cartocolors.qualitative.Pastel_10.mpl_colors
cmp_ct = palettable.cartocolors.qualitative.Safe_10.mpl_colors
If the error “ModuleNotFoundError: No module named ‘palettable’” occurs, activate the conda virtual environment in the terminal and then run “pip install palettable”. The same approach works for any other missing package: replace ‘palettable’ with the name of the package to install.
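Before running the remaining cells, it can help to check which packages are importable in the active environment. The sketch below uses only the standard library; the helper name `find_missing` and the package list are illustrative assumptions, not part of Pysodb or SpaceFlow.

```python
import importlib.util

def find_missing(packages):
    """Return the names in `packages` that cannot be imported in this environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]

# Top-level import names used in this tutorial (illustrative list).
required = ["scanpy", "SpaceFlow", "palettable", "pysodb"]

for name in find_missing(required):
    # The name used for installation can differ from the import name.
    print(f"Missing module '{name}'; install it in the active environment")
```

If nothing is printed, every listed module was found and the import cells should run without a ModuleNotFoundError.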
Streamline development of loading spatial data with Pysodb
[5]:
# Import the pysodb package.
# Pysodb is the Python interface to SODB (Spatial Omics DataBase), a database of curated spatial omics datasets.
# It lets users load SODB experiments directly into Python as AnnData objects.
import pysodb
[6]:
# Initialize the sodb object
sodb = pysodb.SODB()
[7]:
# Define the dataset name and the experiment name
dataset_name = 'zhao2022spatial'
experiment_name = 'mouse_cerebellum_1_dna_200114_14'
# Load a specific experiment.
# load_experiment takes two arguments: the name of the dataset and the name of the experiment to load.
# Valid values for both arguments are listed at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name, experiment_name)
load experiment[mouse_cerebellum_1_dna_200114_14] in dataset[zhao2022spatial]
Perform spatial clustering of spatial genomics data with SpaceFlow
[8]:
# Create SpaceFlow Object
sf = SpaceFlow.SpaceFlow(
    count_matrix=adata.X,
    spatial_locs=adata.obsm['spatial'],
    sample_names=adata.obs_names,
    gene_names=adata.var_names
)
If the error “The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()” occurs, edit the __init__ method in the “SpaceFlow.py” file of the SpaceFlow package: replace “elif count_matrix and spatial_locs:” with “elif count_matrix is not None and spatial_locs is not None:”, and change “if gene_names:” and “if sample_names:” to “if gene_names is not None:” and “if sample_names is not None:”, respectively. After these changes, each condition evaluates to a single boolean value instead of truth-testing a whole array.
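The error comes from evaluating a NumPy array in a boolean context. The following minimal sketch reproduces it outside SpaceFlow and shows why the “is not None” comparison avoids it; the arrays are stand-ins with illustrative shapes, and the snippet does not reproduce SpaceFlow's actual code.

```python
import numpy as np

count_matrix = np.zeros((3, 2))   # stand-in for adata.X
spatial_locs = np.zeros((3, 2))   # stand-in for adata.obsm['spatial']

# Truth-testing a multi-element array is ambiguous, so NumPy raises ValueError.
try:
    if count_matrix and spatial_locs:
        pass
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# An identity check against None always yields a single boolean.
both_given = count_matrix is not None and spatial_locs is not None
print(both_given)
```

The identity check asks only whether an argument was supplied, which is the intent of the original condition, and it never inspects the array's contents.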
[9]:
# Preprocess data
sf.preprocessing_data()
When the count or expression matrix of the AnnData object (adata) is extremely sparse, or when it contains very few features (as is often the case with spatial proteomics data), it may be preferable to skip data preprocessing, because over-processing in these cases can cause errors or degrade performance in downstream tasks. To skip preprocessing, modify the preprocessing_data function in the “SpaceFlow.py” file of the SpaceFlow package by commenting out the calls to sc.pp.normalize_total(), sc.pp.log1p(), and sc.pp.highly_variable_genes().
If the error “You can drop duplicate edges by setting the ‘duplicates’ kwarg” occurs, modify the preprocessing_data function in “SpaceFlow.py” from the SpaceFlow package as follows: (1) remove target_sum=1e4 from the sc.pp.normalize_total() call; (2) change the flavor argument to ‘seurat’ in the sc.pp.highly_variable_genes() call; (3) save the file and rerun the analysis.
If the error “module ‘networkx’ has no attribute ‘to_scipy_sparse_matrix’” occurs, activate the virtual environment in the terminal and then downgrade NetworkX with the command “pip install networkx==2.8”. Running the command inside the activated environment ensures that the downgraded version is installed where SpaceFlow will import it.
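The function to_scipy_sparse_matrix was removed in NetworkX 3.0, so whether the downgrade is needed depends on the installed major version. A small standard-library check is sketched below, assuming the version string starts with a numeric major component; the helper name `lacks_to_scipy_sparse_matrix` is hypothetical.

```python
from importlib import metadata

def lacks_to_scipy_sparse_matrix(version):
    """Return True if this NetworkX version string is 3.0 or later."""
    major = int(version.split(".")[0])
    return major >= 3

try:
    installed = metadata.version("networkx")
except metadata.PackageNotFoundError:
    installed = None

if installed is None:
    print("networkx is not installed in this environment")
elif lacks_to_scipy_sparse_matrix(installed):
    print(f"networkx {installed} found; consider pinning networkx==2.8")
else:
    print(f"networkx {installed} still provides to_scipy_sparse_matrix")
```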
[10]:
# Train a deep graph network model
embedding = sf.train(
    spatial_regularization_strength=0.1,
    z_dim=50,
    lr=1e-3,
    epochs=1000,
    max_patience=50,
    min_stop=100,
    random_seed=42,
    gpu=0,
    regularization_acceleration=True,
    edge_subset_sz=1000000
)
Epoch 2/1000, Loss: 1.4685497283935547
Epoch 12/1000, Loss: 1.4596877098083496
Epoch 22/1000, Loss: 1.4531407356262207
Epoch 32/1000, Loss: 1.4495148658752441
Epoch 42/1000, Loss: 1.4438341856002808
Epoch 52/1000, Loss: 1.4338985681533813
Epoch 62/1000, Loss: 1.413347601890564
Epoch 72/1000, Loss: 1.3782589435577393
Epoch 82/1000, Loss: 1.3242926597595215
Epoch 92/1000, Loss: 1.2605986595153809
Epoch 102/1000, Loss: 1.2157366275787354
Epoch 112/1000, Loss: 1.182152271270752
Epoch 122/1000, Loss: 1.1559337377548218
Epoch 132/1000, Loss: 1.1408302783966064
Epoch 142/1000, Loss: 1.1258575916290283
Epoch 152/1000, Loss: 1.100624918937683
Epoch 162/1000, Loss: 1.0962754487991333
Epoch 172/1000, Loss: 1.0729572772979736
Epoch 182/1000, Loss: 1.0589983463287354
Epoch 192/1000, Loss: 1.059975266456604
Epoch 202/1000, Loss: 1.0544735193252563
Epoch 212/1000, Loss: 1.0473089218139648
Epoch 222/1000, Loss: 1.048572063446045
Epoch 232/1000, Loss: 1.0420039892196655
Epoch 242/1000, Loss: 1.0365196466445923
Epoch 252/1000, Loss: 1.025299072265625
Epoch 262/1000, Loss: 1.0289864540100098
Epoch 272/1000, Loss: 1.0161466598510742
Epoch 282/1000, Loss: 1.0162009000778198
Epoch 292/1000, Loss: 1.0055137872695923
Epoch 302/1000, Loss: 0.9996150732040405
Epoch 312/1000, Loss: 0.984171986579895
Epoch 322/1000, Loss: 0.9808189868927002
Epoch 332/1000, Loss: 0.9775058627128601
Epoch 342/1000, Loss: 0.9651961326599121
Epoch 352/1000, Loss: 0.953350305557251
Epoch 362/1000, Loss: 0.9430199265480042
Epoch 372/1000, Loss: 0.9377722144126892
Epoch 382/1000, Loss: 0.9295996427536011
Epoch 392/1000, Loss: 0.9214633703231812
Epoch 402/1000, Loss: 0.9171395301818848
Epoch 412/1000, Loss: 0.9016039967536926
Epoch 422/1000, Loss: 0.881766140460968
Epoch 432/1000, Loss: 0.8796168565750122
Epoch 442/1000, Loss: 0.8762509822845459
Epoch 452/1000, Loss: 0.8636870980262756
Epoch 462/1000, Loss: 0.860791802406311
Epoch 472/1000, Loss: 0.8389173746109009
Epoch 482/1000, Loss: 0.8420767784118652
Epoch 492/1000, Loss: 0.830423891544342
Epoch 502/1000, Loss: 0.8228366374969482
Epoch 512/1000, Loss: 0.8186403512954712
Epoch 522/1000, Loss: 0.8139156103134155
Epoch 532/1000, Loss: 0.7973632216453552
Epoch 542/1000, Loss: 0.8030251264572144
Epoch 552/1000, Loss: 0.7787685990333557
Epoch 562/1000, Loss: 0.7840163707733154
Epoch 572/1000, Loss: 0.7958685159683228
Epoch 582/1000, Loss: 0.7602176666259766
Epoch 592/1000, Loss: 0.7602773308753967
Epoch 602/1000, Loss: 0.7595182657241821
Epoch 612/1000, Loss: 0.7394869327545166
Epoch 622/1000, Loss: 0.7507429122924805
Epoch 632/1000, Loss: 0.7416149377822876
Epoch 642/1000, Loss: 0.7401753067970276
Epoch 652/1000, Loss: 0.7358796000480652
Epoch 662/1000, Loss: 0.7172746062278748
Epoch 672/1000, Loss: 0.7284640073776245
Epoch 682/1000, Loss: 0.7057000398635864
Epoch 692/1000, Loss: 0.7158027291297913
Epoch 702/1000, Loss: 0.7134007215499878
Epoch 712/1000, Loss: 0.7041587829589844
Epoch 722/1000, Loss: 0.6904285550117493
Epoch 732/1000, Loss: 0.6939340829849243
Epoch 742/1000, Loss: 0.6793048977851868
Epoch 752/1000, Loss: 0.6911025047302246
Epoch 762/1000, Loss: 0.6908837556838989
Epoch 772/1000, Loss: 0.6817575693130493
Epoch 782/1000, Loss: 0.6719300746917725
Epoch 792/1000, Loss: 0.6639779806137085
Epoch 802/1000, Loss: 0.6610084176063538
Epoch 812/1000, Loss: 0.6704531311988831
Epoch 822/1000, Loss: 0.645722508430481
Epoch 832/1000, Loss: 0.6501308679580688
Epoch 842/1000, Loss: 0.6539092659950256
Epoch 852/1000, Loss: 0.6520346403121948
Epoch 862/1000, Loss: 0.6397005319595337
Epoch 872/1000, Loss: 0.6325197815895081
Epoch 882/1000, Loss: 0.6182982921600342
Epoch 892/1000, Loss: 0.6249979138374329
Epoch 902/1000, Loss: 0.6265581250190735
Epoch 912/1000, Loss: 0.6115624308586121
Epoch 922/1000, Loss: 0.6079530715942383
Epoch 932/1000, Loss: 0.6105392575263977
Epoch 942/1000, Loss: 0.6221534013748169
Epoch 952/1000, Loss: 0.6090223789215088
Epoch 962/1000, Loss: 0.613954484462738
Epoch 972/1000, Loss: 0.592529296875
Epoch 982/1000, Loss: 0.6049216389656067
Epoch 992/1000, Loss: 0.5998843908309937
Training complete!
Embedding is saved at ./embedding.tsv
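The names max_patience and min_stop in the training call suggest patience-based early stopping. The sketch below shows how such parameters are commonly used in a training loop. It is a generic illustration under that assumption, not SpaceFlow's actual implementation, and the function name `epochs_run` and the toy losses are hypothetical.

```python
def epochs_run(losses, max_patience, min_stop):
    """Return how many epochs run before patience-based early stopping triggers."""
    best = float("inf")
    patience = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best, patience = loss, 0   # an improvement resets the counter
        else:
            patience += 1              # no improvement this epoch
        # Stop only once the minimum number of epochs has passed.
        if epoch > min_stop and patience > max_patience:
            return epoch
    return len(losses)

# The loss improves for 5 epochs and then plateaus for 10.
toy_losses = [1.0, 0.9, 0.8, 0.7, 0.6] + [0.6] * 10
print(epochs_run(toy_losses, max_patience=3, min_stop=2))
```

Under this scheme a larger max_patience tolerates longer plateaus, while min_stop guarantees a minimum amount of training before stopping can occur.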
[11]:
# Save the embeddings of the trained SpaceFlow model to adata.obsm['SpaceFlow'].
adata.obsm['SpaceFlow'] = embedding
[12]:
# Compute the nearest-neighbor graph on the 'SpaceFlow' representation, then compute the UMAP embedding.
sc.pp.neighbors(adata, use_rep='SpaceFlow')
sc.tl.umap(adata)
[23]:
# Perform Leiden clustering on the neighbor graph; the labels are stored in adata.obs['leiden'].
sc.tl.leiden(adata, resolution=0.3)
[24]:
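# Plot the UMAP embedding colored by Leiden cluster.
# Note: color_map only affects continuous variables; categorical labels such as 'leiden' are colored by the palette argument instead.
sc.pl.umap(adata, color='leiden', color_map=cmp_pspace)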
[25]:
# Plot the cells at their spatial coordinates, colored by Leiden cluster.
# show=False returns the Axes object so that both axes can be set to the same scale.
ax = sc.pl.embedding(adata, basis='spatial', color='leiden', show=False, color_map=cmp_pspace)
ax.axis('equal')
[25]:
(70.69998570611773, 6261.998413379076, -230.5226537216829, 6311.13445831407)