Protocols - Streamline development of spatial omics analysis tools using SODB!

_images/PYSODB.png

SOView

Installation

This tutorial demonstrates how to install Pysodb alongside SOView (a spatial omics visualization tool), in the same conda environment.

Reference tutorials can be found at https://soview-doc.readthedocs.io/en/latest/index.html and https://github.com/TencentAILabHealthcare/pysodb.

Installing software and tools

1. Install Visual Studio Code, Conda and Jupyter Notebook in advance.

Reference tutorials describing how to install Visual Studio Code, Conda and Jupyter Notebook can be found at https://code.visualstudio.com/Docs/setup/setup-overview, https://code.visualstudio.com/docs/python/environments#_activating-an-environment-in-the-terminal and https://code.visualstudio.com/docs/datascience/data-science-tutorial, respectively.

2. Launch Visual Studio Code and open a terminal window.

Henceforth, various packages and modules will be installed via the command line.

Installing SOView

3. Select an installation path and change to it
[ ]:
cd <path>
4. Create a conda environment
[ ]:
conda create -n <environment_name> python=3.8
5. Activate the conda environment

Run the following command in the terminal to activate the conda environment:

[ ]:
conda activate <environment_name>
6. Clone SOView code
[ ]:
git clone https://github.com/yuanzhiyuan/SOView

If cloning via git fails, download the code from https://github.com/yuanzhiyuan/SOView, place it in the folder created above, and extract it; a scripted alternative is sketched below.
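
If a scripted fallback is preferred, the sketch below downloads and extracts the repository archive with Python's standard library; the branch name in the URL and the name of the extracted folder are assumptions that may need adjusting.

[ ]:
# Hypothetical fallback when git clone is unavailable: download the repository
# archive and extract it into the current folder. The extracted directory is
# typically named SOView-<branch>; rename it to SOView if needed.
import io
import urllib.request
import zipfile

url = "https://github.com/yuanzhiyuan/SOView/archive/refs/heads/master.zip"  # branch name assumed
with urllib.request.urlopen(url) as resp:
    zipfile.ZipFile(io.BytesIO(resp.read())).extractall(".")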

7. Open the SOView directory
[ ]:
cd SOView
8. Install the SOView package from source code
[ ]:
pip install .
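
As an optional sanity check (not part of the original steps), confirm in the activated environment that the freshly installed package imports cleanly.

[ ]:
# A successful import confirms that "pip install ." placed SOView on the Python path.
import SOView
print('SOView imported from', SOView.__file__)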

Installing Pysodb

Keep the conda environment active

9. Clone Pysodb code
[ ]:
git clone https://github.com/TencentAILabHealthcare/pysodb.git

If cloning via git fails, download the code from https://github.com/TencentAILabHealthcare/pysodb, place it in the folder created above, and extract it.

10. Open the Pysodb directory
[ ]:
cd pysodb
11. Install the Pysodb package from source code
[ ]:
python setup.py install

If the error “urllib3 2.0.0a3 is installed but urllib3<1.27,>=1.21.1 is required by {‘requests’}” appears, run the following commands in order:

[ ]:
pip install 'urllib3>=1.21.1,<1.27'
python setup.py install
[1]:
print('finish!')
finish!

10x

This tutorial demonstrates how to use SOView (a spatial omics visualization tool) to quickly visualize data fetched by Pysodb, helping users assess the tissue organization of a dataset without time-consuming preprocessing, clustering, etc.

load data using Pysodb

[1]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB.
# SODB (Spatial Omics DataBase, https://gene.ai.tencent.com/SpatialOmics/) is an online repository of curated spatial omics datasets.
# This package allows users to fetch SODB experiments directly into AnnData objects using Python.
import pysodb
[2]:
# Initialize the sodb object
sodb = pysodb.SODB()
[3]:
# Define names of the dataset_name and experiment_name
dataset_name = '10x'
experiment_name = 'V1_Mouse_Brain_Sagittal_Posterior_filtered_feature_bc_matrix'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name,experiment_name)
download experiment[V1_Mouse_Brain_Sagittal_Posterior_filtered_feature_bc_matrix] in dataset[10x]
100%|██████████| 185M/185M [01:36<00:00, 2.01MB/s]
load experiment[V1_Mouse_Brain_Sagittal_Posterior_filtered_feature_bc_matrix] in dataset[10x]
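
Optionally, before plotting, the returned AnnData object can be inspected; this quick check is not part of the original notebook and uses only standard AnnData attributes.

[ ]:
# Print a summary of the loaded experiment and list the stored embeddings.
# SOView plots an embedding from adata.obsm (here 'X_umap') over the spatial coordinates.
print(adata)
print(list(adata.obsm.keys()))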

plot SOView

[4]:
# Import SOView package
import SOView
/home/linsenlin/anaconda3/envs/SOView/lib/python3.8/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
scanpy==1.9.1 anndata==0.8.0 umap==0.5.2 numpy==1.22.4 scipy==1.7.3 pandas==1.5.3 scikit-learn==1.0.2 statsmodels==0.13.5 python-igraph==0.10.4 pynndescent==0.5.8
squidpy==1.1.2
[5]:
# Visualize the data fetched by Pysodb using SOView
SOView.SOViewer_plot(
    adata = adata, # the data to plot
    #save = f'figures/{dataset_name}_{experiment_name}', # save the result to specified path or don't save (None)
    embedding_use='X_umap', # which embedding to be used for plot
    dot_size=30, # the marker size of the plot
    marker = 'h' # marker style
)

generating color coding...
1.0 0.0
/home/linsenlin/anaconda3/envs/SOView/lib/python3.8/site-packages/skimage/color/colorconv.py:1109: UserWarning: Color data out of range: Z < 0 in 335 pixels
  return xyz2rgb(lab2xyz(lab, illuminant, observer))
_images/SOView_10x_8_2.png
1.0 0.0
_images/SOView_10x_8_4.png

Biancalani2021Deep

This tutorial demonstrates how to use SOView (a spatial omics visualization tool) to quickly visualize data fetched by Pysodb, helping users assess the tissue organization of a dataset without time-consuming preprocessing, clustering, etc.

load data using Pysodb

[1]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB.
# SODB (Spatial Omics DataBase, https://gene.ai.tencent.com/SpatialOmics/) is an online repository of curated spatial omics datasets.
# This package allows users to fetch SODB experiments directly into AnnData objects using Python.
import pysodb
[2]:
# Initialize the sodb object
sodb = pysodb.SODB()
[3]:
# Define names of the dataset_name and experiment_name
dataset_name = 'Biancalani2021Deep'
experiment_name = 'visium_fluo_crop'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name,experiment_name)
download experiment[visium_fluo_crop] in dataset[Biancalani2021Deep]
100%|██████████| 66.1M/66.1M [00:34<00:00, 2.00MB/s]
load experiment[visium_fluo_crop] in dataset[Biancalani2021Deep]

plot SOView

[4]:
# Import SOView package
import SOView
/home/linsenlin/anaconda3/envs/SOView/lib/python3.8/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
scanpy==1.9.1 anndata==0.8.0 umap==0.5.2 numpy==1.22.4 scipy==1.7.3 pandas==1.5.3 scikit-learn==1.0.2 statsmodels==0.13.5 python-igraph==0.10.4 pynndescent==0.5.8
squidpy==1.1.2
[5]:
# Visualize the data fetched by Pysodb using SOView
SOView.SOViewer_plot(
    adata = adata, # the data to plot
    #save = f'figures/{dataset_name}_{experiment_name}', # save the result to specified path or don't save (None)
    embedding_use='X_umap', # which embedding to be used for plot
    dot_size=80, # the marker size of the plot
    marker = 'h' # marker style
)

generating color coding...
1.0 0.0
/home/linsenlin/anaconda3/envs/SOView/lib/python3.8/site-packages/skimage/color/colorconv.py:1109: UserWarning: Color data out of range: Z < 0 in 21 pixels
  return xyz2rgb(lab2xyz(lab, illuminant, observer))
_images/SOView_Biancalani2021Deep_8_2.png
1.0 0.0
_images/SOView_Biancalani2021Deep_8_4.png

Dataset5_MS_process

This tutorial demonstrates how to use SOView (a spatial omics visualization tool) to quickly visualize data fetched by Pysodb, helping users assess the tissue organization of a dataset without time-consuming preprocessing, clustering, etc.

load data using Pysodb

[1]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB.
# SODB (Spatial Omics DataBase, https://gene.ai.tencent.com/SpatialOmics/) is an online repository of curated spatial omics datasets.
# This package allows users to fetch SODB experiments directly into AnnData objects using Python.
import pysodb
[2]:
# Initialize the sodb object
sodb = pysodb.SODB()
[3]:
# Define names of the dataset_name and experiment_name
dataset_name = 'Dataset5_MS_process'
experiment_name = 'Dataset5'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name,experiment_name)
load experiment[Dataset5] in dataset[Dataset5_MS_process]

plot SOView

[4]:
# Import SOView package
import SOView
/home/linsenlin/anaconda3/envs/SOView/lib/python3.8/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
scanpy==1.9.1 anndata==0.8.0 umap==0.5.2 numpy==1.22.4 scipy==1.7.3 pandas==1.5.3 scikit-learn==1.0.2 statsmodels==0.13.5 python-igraph==0.10.4 pynndescent==0.5.8
squidpy==1.1.2
[5]:
# Visualize the data fetched by Pysodb using SOView
SOView.SOViewer_plot(
    adata = adata, # the data to plot
    #save = f'figures/{dataset_name}_{experiment_name}', # save the result to specified path or don't save (None)
    embedding_use='X_umap', # which embedding to be used for plot
    dot_size=3, # the marker size of the plot
    marker = 'o' # marker style
)
generating color coding...
1.0 0.0
/home/linsenlin/anaconda3/envs/SOView/lib/python3.8/site-packages/skimage/color/colorconv.py:1109: UserWarning: Color data out of range: Z < 0 in 143 pixels
  return xyz2rgb(lab2xyz(lab, illuminant, observer))
_images/SOView_Dataset5_MS_process_8_2.png
1.0 0.0
_images/SOView_Dataset5_MS_process_8_4.png

Fu2021Unsupervised

This tutorial demonstrates how to use SOView (a spatial omics visualization tool) to quickly visualize data fetched by Pysodb, helping users assess the tissue organization of a dataset without time-consuming preprocessing, clustering, etc.

load data using Pysodb

[1]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB.
# SODB (Spatial Omics DataBase, https://gene.ai.tencent.com/SpatialOmics/) is an online repository of curated spatial omics datasets.
# This package allows users to fetch SODB experiments directly into AnnData objects using Python.
import pysodb
[2]:
# Initialize the sodb object
sodb = pysodb.SODB()
[3]:
# Define names of the dataset_name and experiment_name
dataset_name = 'Fu2021Unsupervised'
experiment_name = 'StereoSeq_MOB'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name,experiment_name)
download experiment[StereoSeq_MOB] in dataset[Fu2021Unsupervised]
100%|██████████| 83.3M/83.3M [00:43<00:00, 2.01MB/s]
load experiment[StereoSeq_MOB] in dataset[Fu2021Unsupervised]

plot SOView

[4]:
# Import SOView package
import SOView
/home/linsenlin/anaconda3/envs/SOView/lib/python3.8/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
scanpy==1.9.1 anndata==0.8.0 umap==0.5.2 numpy==1.22.4 scipy==1.7.3 pandas==1.5.3 scikit-learn==1.0.2 statsmodels==0.13.5 python-igraph==0.10.4 pynndescent==0.5.8
squidpy==1.1.2
[5]:
# Visualize the data fetched by Pysodb using SOView
SOView.SOViewer_plot(
    adata = adata, # the data to plot
    #save = f'figures/{dataset_name}_{experiment_name}', # save the result to specified path or don't save (None)
    embedding_use='X_umap', # which embedding to be used for plot
    dot_size=1, # the marker size of the plot
    marker = 'o' # marker style
)
generating color coding...
1.0 0.0
/home/linsenlin/anaconda3/envs/SOView/lib/python3.8/site-packages/skimage/color/colorconv.py:1109: UserWarning: Color data out of range: Z < 0 in 766 pixels
  return xyz2rgb(lab2xyz(lab, illuminant, observer))
_images/SOView_Fu2021Unsupervised_8_2.png
1.0 0.0
_images/SOView_Fu2021Unsupervised_8_4.png

lohoff2021integration

This tutorial demonstrates how to use SOView (a spatial omics visualization tool) to quickly visualize data fetched by Pysodb, helping users assess the tissue organization of a dataset without time-consuming preprocessing, clustering, etc.

load data using Pysodb

[1]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB.
# SODB (Spatial Omics DataBase, https://gene.ai.tencent.com/SpatialOmics/) is an online repository of curated spatial omics datasets.
# This package allows users to fetch SODB experiments directly into AnnData objects using Python.
import pysodb
[2]:
# Initialize the sodb object
sodb = pysodb.SODB()
[3]:
# Define names of the dataset_name and experiment_name
dataset_name = 'lohoff2021integration'
experiment_name = 'lohoff2020highly_seqFISH_mouse_Gastrulation'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name,experiment_name)
download experiment[lohoff2020highly_seqFISH_mouse_Gastrulation] in dataset[lohoff2021integration]
100%|██████████| 57.4M/57.4M [00:30<00:00, 2.00MB/s]
load experiment[lohoff2020highly_seqFISH_mouse_Gastrulation] in dataset[lohoff2021integration]

plot SOView

[4]:
# Import SOView package
import SOView
/home/linsenlin/anaconda3/envs/SOView/lib/python3.8/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
scanpy==1.9.1 anndata==0.8.0 umap==0.5.2 numpy==1.22.4 scipy==1.7.3 pandas==1.5.3 scikit-learn==1.0.2 statsmodels==0.13.5 python-igraph==0.10.4 pynndescent==0.5.8
squidpy==1.1.2
[5]:
# Visualize the data fetched by Pysodb using SOView
SOView.SOViewer_plot(
    adata = adata, # the data to plot
    #save = f'figures/{dataset_name}_{experiment_name}', # save the result to specified path or don't save (None)
    embedding_use='X_umap', # which embedding to be used for plot
    dot_size=0.5, # the marker size of the plot
    marker = 'o' # marker style
)
generating color coding...
1.0 0.0
_images/SOView_lohoff2021integration_8_1.png
1.0 0.0
_images/SOView_lohoff2021integration_8_3.png

maynard2021trans(151507)

This tutorial demonstrates how to use SOView (a spatial omics visualization tool) to quickly visualize data fetched by Pysodb, helping users assess the tissue organization of a dataset without time-consuming preprocessing, clustering, etc.

load data using Pysodb

[1]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB.
# SODB (Spatial Omics DataBase, https://gene.ai.tencent.com/SpatialOmics/) is an online repository of curated spatial omics datasets.
# This package allows users to fetch SODB experiments directly into AnnData objects using Python.
import pysodb
[2]:
# Initialize the sodb object
sodb = pysodb.SODB()
[3]:
# Define names of the dataset_name and experiment_name
dataset_name = 'maynard2021trans'
experiment_name = '151507'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name,experiment_name)
download experiment[151507] in dataset[maynard2021trans]
100%|██████████| 116M/116M [01:03<00:00, 1.92MB/s]
load experiment[151507] in dataset[maynard2021trans]

plot SOView

[4]:
# Import SOView package
import SOView
/home/linsenlin/anaconda3/envs/SOView/lib/python3.8/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
scanpy==1.9.1 anndata==0.8.0 umap==0.5.2 numpy==1.22.4 scipy==1.7.3 pandas==1.5.3 scikit-learn==1.0.2 statsmodels==0.13.5 python-igraph==0.10.4 pynndescent==0.5.8
squidpy==1.1.2
[5]:
# Visualize the data fetched by Pysodb using SOView
SOView.SOViewer_plot(
    adata = adata, # the data to plot
    #save = f'figures/{dataset_name}_{experiment_name}', # save the result to specified path or don't save (None)
    embedding_use='X_umap', # which embedding to be used for plot
    dot_size=10, # the marker size of the plot
    marker = 'h' # marker style
)
generating color coding...
1.0 0.0
/home/linsenlin/anaconda3/envs/SOView/lib/python3.8/site-packages/skimage/color/colorconv.py:1109: UserWarning: Color data out of range: Z < 0 in 503 pixels
  return xyz2rgb(lab2xyz(lab, illuminant, observer))
_images/SOView_maynard2021trans%28151507%29_8_2.png
1.0 0.0
_images/SOView_maynard2021trans%28151507%29_8_4.png

maynard2021trans(151671)

This tutorial demonstrates how to use SOView (a spatial omics visualization tool) to quickly visualize data fetched by Pysodb, helping users assess the tissue organization of a dataset without time-consuming preprocessing, clustering, etc.

load data using Pysodb

[1]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB.
# SODB (Spatial Omics DataBase, https://gene.ai.tencent.com/SpatialOmics/) is an online repository of curated spatial omics datasets.
# This package allows users to fetch SODB experiments directly into AnnData objects using Python.
import pysodb
[2]:
# Initialize the sodb object
sodb = pysodb.SODB()
[3]:
# Define names of the dataset_name and experiment_name
dataset_name = 'maynard2021trans'
experiment_name = '151671'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name,experiment_name)
download experiment[151671] in dataset[maynard2021trans]
100%|██████████| 129M/129M [01:07<00:00, 2.00MB/s]
load experiment[151671] in dataset[maynard2021trans]

plot SOView

[4]:
# Import SOView package
import SOView
/home/linsenlin/anaconda3/envs/SOView/lib/python3.8/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
scanpy==1.9.1 anndata==0.8.0 umap==0.5.2 numpy==1.22.4 scipy==1.7.3 pandas==1.5.3 scikit-learn==1.0.2 statsmodels==0.13.5 python-igraph==0.10.4 pynndescent==0.5.8
squidpy==1.1.2
[5]:
# Visualize the data fetched by Pysodb using SOView
SOView.SOViewer_plot(
    adata = adata, # the data to plot
    #save = f'figures/{dataset_name}_{experiment_name}', # save the result to specified path or don't save (None)
    embedding_use='X_umap', # which embedding to be used for plot
    dot_size=10, # the marker size of the plot
    marker = 'h' # marker style
)

generating color coding...
1.0 0.0
/home/linsenlin/anaconda3/envs/SOView/lib/python3.8/site-packages/skimage/color/colorconv.py:1109: UserWarning: Color data out of range: Z < 0 in 87 pixels
  return xyz2rgb(lab2xyz(lab, illuminant, observer))
_images/SOView_maynard2021trans%28151671%29_8_2.png
1.0 0.0
_images/SOView_maynard2021trans%28151671%29_8_4.png

maynard2021trans(151673)

This tutorial demonstrates how to use SOView (a spatial omics visualization tool) to quickly visualize data fetched by Pysodb, helping users assess the tissue organization of a dataset without time-consuming preprocessing, clustering, etc.

load data using Pysodb

[1]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB.
# SODB (Spatial Omics DataBase, https://gene.ai.tencent.com/SpatialOmics/) is an online repository of curated spatial omics datasets.
# This package allows users to fetch SODB experiments directly into AnnData objects using Python.
import pysodb
[2]:
# Initialize the sodb object
sodb = pysodb.SODB()
[3]:
# Define names of the dataset_name and experiment_name
dataset_name = 'maynard2021trans'
experiment_name = '151673'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name,experiment_name)
download experiment[151673] in dataset[maynard2021trans]
100%|██████████| 131M/131M [01:08<00:00, 2.00MB/s]
load experiment[151673] in dataset[maynard2021trans]

plot SOView

[4]:
# Import SOView package
import SOView
/home/linsenlin/anaconda3/envs/SOView/lib/python3.8/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
scanpy==1.9.1 anndata==0.8.0 umap==0.5.2 numpy==1.22.4 scipy==1.7.3 pandas==1.5.3 scikit-learn==1.0.2 statsmodels==0.13.5 python-igraph==0.10.4 pynndescent==0.5.8
squidpy==1.1.2
[6]:
# Visualize the data fetched by Pysodb using SOView
SOView.SOViewer_plot(
    adata = adata, # the data to plot
    #save = f'figures/{dataset_name}_{experiment_name}', # save the result to specified path or don't save (None)
    embedding_use='X_umap', # which embedding to be used for plot
    dot_size=10, # the marker size of the plot
    marker = 'h' # marker style
)
generating color coding...
1.0 0.0
/home/linsenlin/anaconda3/envs/SOView/lib/python3.8/site-packages/skimage/color/colorconv.py:1109: UserWarning: Color data out of range: Z < 0 in 563 pixels
  return xyz2rgb(lab2xyz(lab, illuminant, observer))
_images/SOView_maynard2021trans%28151673%29_8_2.png
1.0 0.0
_images/SOView_maynard2021trans%28151673%29_8_4.png

Merfish_Visp

This tutorial demonstrates how to use SOView (a spatial omics visualization tool) to quickly visualize data fetched by Pysodb, helping users assess the tissue organization of a dataset without time-consuming preprocessing, clustering, etc.

load data using Pysodb

[1]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB.
# SODB (Spatial Omics DataBase, https://gene.ai.tencent.com/SpatialOmics/) is an online repository of curated spatial omics datasets.
# This package allows users to fetch SODB experiments directly into AnnData objects using Python.
import pysodb
[2]:
# Initialize the sodb object
sodb = pysodb.SODB()
[3]:
# Define names of the dataset_name and experiment_name
dataset_name = 'Merfish_Visp'
experiment_name = 'mouse_VISp'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name,experiment_name)
download experiment[mouse_VISp] in dataset[Merfish_Visp]
100%|██████████| 4.63M/4.63M [00:02<00:00, 1.97MB/s]
load experiment[mouse_VISp] in dataset[Merfish_Visp]

plot SOView

[4]:
# Import SOView package
import SOView
/home/linsenlin/anaconda3/envs/SOView/lib/python3.8/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
scanpy==1.9.1 anndata==0.8.0 umap==0.5.2 numpy==1.22.4 scipy==1.7.3 pandas==1.5.3 scikit-learn==1.0.2 statsmodels==0.13.5 python-igraph==0.10.4 pynndescent==0.5.8
squidpy==1.1.2
[5]:
# Visualize the data fetched by Pysodb using SOView
SOView.SOViewer_plot(
    adata = adata, # the data to plot
    #save = f'figures/{dataset_name}_{experiment_name}', # save the result to specified path or don't save (None)
    embedding_use='X_umap', # which embedding to be used for plot
    dot_size=3, # the marker size of the plot
    marker = 'o' # marker style
)
generating color coding...
1.0 0.0
/home/linsenlin/anaconda3/envs/SOView/lib/python3.8/site-packages/skimage/color/colorconv.py:1109: UserWarning: Color data out of range: Z < 0 in 385 pixels
  return xyz2rgb(lab2xyz(lab, illuminant, observer))
_images/SOView_Merfish_Visp_8_2.png
1.0 0.0
_images/SOView_Merfish_Visp_8_4.png

moncada2020integrating

This tutorial demonstrates how to use SOView (a spatial omics visualization tool) to quickly visualize data fetched by Pysodb, helping users assess the tissue organization of a dataset without time-consuming preprocessing, clustering, etc.

load data using Pysodb

[1]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB.
# SODB (Spatial Omics DataBase, https://gene.ai.tencent.com/SpatialOmics/) is an online repository of curated spatial omics datasets.
# This package allows users to fetch SODB experiments directly into AnnData objects using Python.
import pysodb
[2]:
# Initialize the sodb object
sodb = pysodb.SODB()
[3]:
# Define names of the dataset_name and experiment_name
dataset_name = 'moncada2020integrating'
experiment_name = 'GSM3036911_spatial_transcriptomics'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name,experiment_name)
download experiment[GSM3036911_spatial_transcriptomics] in dataset[moncada2020integrating]
100%|██████████| 25.4M/25.4M [00:13<00:00, 2.00MB/s]
load experiment[GSM3036911_spatial_transcriptomics] in dataset[moncada2020integrating]

plot SOView

[4]:
# Import SOView package
import SOView
/home/linsenlin/anaconda3/envs/SOView/lib/python3.8/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
scanpy==1.9.1 anndata==0.8.0 umap==0.5.2 numpy==1.22.4 scipy==1.7.3 pandas==1.5.3 scikit-learn==1.0.2 statsmodels==0.13.5 python-igraph==0.10.4 pynndescent==0.5.8
squidpy==1.1.2
[5]:
# Visualize the data fetched by Pysodb using SOView
SOView.SOViewer_plot(
    adata = adata, # the data to plot
    #save = f'figures/{dataset_name}_{experiment_name}', # save the result to specified path or don't save (None)
    embedding_use='X_umap', # which embedding to be used for plot
    dot_size=50, # the marker size of the plot
    marker = 's' # marker style
)

generating color coding...
1.0 0.0
/home/linsenlin/anaconda3/envs/SOView/lib/python3.8/site-packages/skimage/color/colorconv.py:1109: UserWarning: Color data out of range: Z < 0 in 22 pixels
  return xyz2rgb(lab2xyz(lab, illuminant, observer))
_images/SOView_moncada2020integrating_8_2.png
1.0 0.0
_images/SOView_moncada2020integrating_8_4.png

stahl2016visualization

This tutorial demonstrates how to use SOView (a spatial omics visualization tool) to quickly visualize data fetched by Pysodb, helping users assess the tissue organization of a dataset without time-consuming preprocessing, clustering, etc.

load data using Pysodb

[1]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB.
# SODB (Spatial Omics DataBase, https://gene.ai.tencent.com/SpatialOmics/) is an online repository of curated spatial omics datasets.
# This package allows users to fetch SODB experiments directly into AnnData objects using Python.
import pysodb
[2]:
# Initialize the sodb object
sodb = pysodb.SODB()
[3]:
# Define names of the dataset_name and experiment_name
dataset_name = 'stahl2016visualization'
experiment_name = 'Rep4_MOB_trans'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name,experiment_name)
download experiment[Rep4_MOB_trans] in dataset[stahl2016visualization]
100%|██████████| 12.7M/12.7M [00:06<00:00, 2.00MB/s]
load experiment[Rep4_MOB_trans] in dataset[stahl2016visualization]

plot SOView

[4]:
# Import SOView package
import SOView
/home/linsenlin/anaconda3/envs/SOView/lib/python3.8/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
scanpy==1.9.1 anndata==0.8.0 umap==0.5.2 numpy==1.22.4 scipy==1.7.3 pandas==1.5.3 scikit-learn==1.0.2 statsmodels==0.13.5 python-igraph==0.10.4 pynndescent==0.5.8
squidpy==1.1.2
[5]:
# Visualize the data fetched by Pysodb using SOView
SOView.SOViewer_plot(
    adata = adata, # the data to plot
    #save = f'figures/{dataset_name}_{experiment_name}', # save the result to specified path or don't save (None)
    embedding_use='X_umap', # which embedding to be used for plot
    dot_size=70, # the marker size of the plot
    marker = 's' # marker style
)

generating color coding...
1.0 0.0
_images/SOView_stahl2016visualization_8_1.png
1.0 0.0
_images/SOView_stahl2016visualization_8_3.png

stickels2020highly

This tutorial demonstrates how to use SOView (a spatial omics visualization tool) to quickly visualize data fetched by Pysodb, helping users assess the tissue organization of a dataset without time-consuming preprocessing, clustering, etc.

load data using Pysodb

[1]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB.
# SODB (Spatial Omics DataBase, https://gene.ai.tencent.com/SpatialOmics/) is an online repository of curated spatial omics datasets.
# This package allows users to fetch SODB experiments directly into AnnData objects using Python.
import pysodb
[2]:
# Initialize the sodb object
sodb = pysodb.SODB()
[3]:
# Define names of the dataset_name and experiment_name
dataset_name = 'stickels2020highly'
experiment_name = 'stickels2021highly_SlideSeqV2_Mouse_Olfactory_bulb_Puck_200127_15'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name,experiment_name)
load experiment[stickels2021highly_SlideSeqV2_Mouse_Olfactory_bulb_Puck_200127_15] in dataset[stickels2020highly]

plot SOView

[4]:
# Import SOView package
import SOView
/home/linsenlin/anaconda3/envs/SOView/lib/python3.8/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
scanpy==1.9.1 anndata==0.8.0 umap==0.5.2 numpy==1.22.4 scipy==1.7.3 pandas==1.5.3 scikit-learn==1.0.2 statsmodels==0.13.5 python-igraph==0.10.4 pynndescent==0.5.8
squidpy==1.1.2
[5]:
# Visualize the data fetched by Pysodb using SOView
SOView.SOViewer_plot(
    adata = adata, # the data to plot
    #save = f'figures/{dataset_name}_{experiment_name}', # save the result to specified path or don't save (None)
    embedding_use='X_umap', # which embedding to be used for plot
    dot_size=1, # the marker size of the plot
    marker = 'o' # marker style
)
generating color coding...
1.0 0.0
/home/linsenlin/anaconda3/envs/SOView/lib/python3.8/site-packages/skimage/color/colorconv.py:1109: UserWarning: Color data out of range: Z < 0 in 7252 pixels
  return xyz2rgb(lab2xyz(lab, illuminant, observer))
_images/SOView_stickels2020highly_8_2.png
1.0 0.0
_images/SOView_stickels2020highly_8_4.png

Sun2021Integrating

This tutorial demonstrates how to use SOView (a spatial omics visualization tool) to quickly visualize data fetched by Pysodb, helping users assess the tissue organization of a dataset without time-consuming preprocessing, clustering, etc.

load data using Pysodb

[1]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB.
# SODB (Spatial Omics DataBase, https://gene.ai.tencent.com/SpatialOmics/) is an online repository of curated spatial omics datasets.
# This package allows users to fetch SODB experiments directly into AnnData objects using Python.
import pysodb
[2]:
# Initialize the sodb object
sodb = pysodb.SODB()
[3]:
# Define names of the dataset_name and experiment_name
dataset_name = 'Sun2021Integrating'
experiment_name = 'Slice_1'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name,experiment_name)
load experiment[Slice_1] in dataset[Sun2021Integrating]

plot SOView

[4]:
# Import SOView package
import SOView
/home/linsenlin/anaconda3/envs/SOView/lib/python3.8/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
scanpy==1.9.1 anndata==0.8.0 umap==0.5.2 numpy==1.22.4 scipy==1.7.3 pandas==1.5.3 scikit-learn==1.0.2 statsmodels==0.13.5 python-igraph==0.10.4 pynndescent==0.5.8
squidpy==1.1.2
[5]:
# Visualize the data fetched by Pysodb using SOView
SOView.SOViewer_plot(
    adata = adata, # the data to plot
    #save = f'figures/{dataset_name}_{experiment_name}', # save the result to specified path or don't save (None)
    embedding_use='X_umap', # which embedding to be used for plot
    dot_size=3, # the marker size of the plot
    marker = 'o' # marker style
)
generating color coding...
1.0 0.0
_images/SOView_Sun2021Integrating_8_1.png
1.0 0.0
_images/SOView_Sun2021Integrating_8_3.png

Wang2018Three_1k

This tutorial demonstrates how to use SOView (a spatial omics visualization tool) to quickly visualize data fetched by Pysodb, helping users assess the tissue organization of a dataset without time-consuming preprocessing, clustering, etc.

load data using Pysodb

[1]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB.
# SODB (Spatial Omics DataBase, https://gene.ai.tencent.com/SpatialOmics/) is an online repository of curated spatial omics datasets.
# This package allows users to fetch SODB experiments directly into AnnData objects using Python.
import pysodb
[2]:
# Initialize the sodb object
sodb = pysodb.SODB()
[3]:
# Define names of the dataset_name and experiment_name
dataset_name = 'Wang2018Three_1k'
experiment_name = 'mouse_brain_STARmap'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name,experiment_name)
download experiment[mouse_brain_STARmap] in dataset[Wang2018Three_1k]
100%|██████████| 3.27M/3.27M [00:01<00:00, 1.98MB/s]
load experiment[mouse_brain_STARmap] in dataset[Wang2018Three_1k]

plot SOView

[4]:
# Import SOView package
import SOView
/home/linsenlin/anaconda3/envs/SOView/lib/python3.8/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
scanpy==1.9.1 anndata==0.8.0 umap==0.5.2 numpy==1.22.4 scipy==1.7.3 pandas==1.5.3 scikit-learn==1.0.2 statsmodels==0.13.5 python-igraph==0.10.4 pynndescent==0.5.8
squidpy==1.1.2
[5]:
# Visualize the data fetched by Pysodb using SOView
SOView.SOViewer_plot(
    adata = adata, # the data to plot
    #save = f'figures/{dataset_name}_{experiment_name}', # save the result to specified path or don't save (None)
    embedding_use='X_umap', # which embedding to be used for plot
    dot_size=3, # the marker size of the plot
    marker = 'o' # marker style
)
generating color coding...
1.0 0.0
/home/linsenlin/anaconda3/envs/SOView/lib/python3.8/site-packages/skimage/color/colorconv.py:1109: UserWarning: Color data out of range: Z < 0 in 175 pixels
  return xyz2rgb(lab2xyz(lab, illuminant, observer))
_images/SOView_Wang2018Three_1k_8_2.png
1.0 0.0
_images/SOView_Wang2018Three_1k_8_4.png

Wang2018three

This tutorial demonstrates how to use SOView (a spatial omics visualization tool) to quickly visualize data fetched by Pysodb, helping users assess the tissue organization of a dataset without time-consuming preprocessing, clustering, etc.

load data using Pysodb

[1]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB.
# SODB (Spatial Omics DataBase, https://gene.ai.tencent.com/SpatialOmics/) is an online repository of curated spatial omics datasets.
# This package allows users to fetch SODB experiments directly into AnnData objects using Python.
import pysodb
[2]:
# Initialize the sodb object
sodb = pysodb.SODB()
[3]:
# Define names of the dataset_name and experiment_name
dataset_name = 'Wang2018three'
experiment_name = 'data_3D'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name,experiment_name)
download experiment[data_3D] in dataset[Wang2018three]
100%|██████████| 29.1M/29.1M [00:15<00:00, 2.00MB/s]
load experiment[data_3D] in dataset[Wang2018three]

plot SOView

[4]:
# Import SOView package
import SOView
/home/linsenlin/anaconda3/envs/SOView/lib/python3.8/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
scanpy==1.9.1 anndata==0.8.0 umap==0.5.2 numpy==1.22.4 scipy==1.7.3 pandas==1.5.3 scikit-learn==1.0.2 statsmodels==0.13.5 python-igraph==0.10.4 pynndescent==0.5.8
squidpy==1.1.2
[5]:
# Visualize the data fetched by Pysodb using SOView
SOView.SOViewer_plot(
    adata = adata, # the data to plot
    #save = f'figures/{dataset_name}_{experiment_name}', # save the result to specified path or don't save (None)
    embedding_use='X_umap', # which embedding to be used for plot
    dot_size=1, # the marker size of the plot
    marker = 'o' # marker style
)

generating color coding...
1.0 0.0
/home/linsenlin/anaconda3/envs/SOView/lib/python3.8/site-packages/skimage/color/colorconv.py:1109: UserWarning: Color data out of range: Z < 0 in 2459 pixels
  return xyz2rgb(lab2xyz(lab, illuminant, observer))
_images/SOView_Wang2018three_8_2.png
1.0 0.0
_images/SOView_Wang2018three_8_4.png

Spatially variable gene detection

Installation

This tutorial demonstrates how to install Pysodb alongside a method used for identifying spatially variable genes.

Using sepal as an example, Pysodb is installed into sepal's installation environment.

Reference tutorials can be found at https://github.com/almaan/sepal and https://github.com/TencentAILabHealthcare/pysodb.

Installing software and tools

1. Install Visual Studio Code, Conda and Jupyter Notebook in advance.

Reference tutorials describing how to install Visual Studio Code, Conda and Jupyter Notebook can be found at https://code.visualstudio.com/Docs/setup/setup-overview, https://code.visualstudio.com/docs/python/environments#_activating-an-environment-in-the-terminal and https://code.visualstudio.com/docs/datascience/data-science-tutorial, respectively.

2. Launch Visual Studio Code and open a terminal window.

Henceforth, various packages and modules will be installed via the command line.

Installing sepal

3. Select an installation path and change to it
[ ]:
cd <path>
4. Clone sepal code
[ ]:
git clone https://github.com/almaan/sepal.git

If cloning via git fails, download the code from https://github.com/almaan/sepal, place it in the folder created above, and extract it.

5. Open the sepal directory
[ ]:
cd sepal
6. Create a conda environment
[ ]:
conda create -n <environment_name> python=3.8
7. Activate the conda environment

Run the following command in the terminal to activate the conda environment:

[ ]:
conda activate <environment_name>
8. Install the sepal package from source code
[ ]:
chmod +x setup.py
[ ]:
./setup.py install

According to sepal’s installation instructions, the commands are “chmod +x setup.py” followed by “./setup.py install”. Since some devices do not run Linux-based systems, an alternative is to replace these commands with “python setup.py install”.

The recommended packages need to be installed if the analysis modules are used. To do this, simply run (in the same directory):

[ ]:
pip install -e ".[full]"

Installing Pysodb

Keep the conda environment active

9. Clone Pysodb code
[ ]:
git clone https://github.com/TencentAILabHealthcare/pysodb.git

If cloning via git fails, download the code from https://github.com/TencentAILabHealthcare/pysodb, place it in the folder created above, and extract it.

10. Open the Pysodb directory
[ ]:
cd pysodb
11. Install the Pysodb package from source code
[ ]:
python setup.py install

If the error “urllib3 2.0.0a3 is installed but urllib3<1.27,>=1.21.1 is required by {‘requests’}” appears, run the following commands in order:

[ ]:
pip install 'urllib3>=1.21.1,<1.27'
python setup.py install
[1]:
print('finish!')
finish!

Reproducibility with original data

This tutorial demonstrates spatially variable gene detection on ST mouse olfactory bulb data using Pysodb and Sepal.

A reference paper can be found at https://academic.oup.com/bioinformatics/article/37/17/2644/6168120.

This tutorial follows the tutorial at https://github.com/almaan/sepal/blob/master/examples/melanoma.ipynb; the data loading step has been modified to use Pysodb.

Import packages and set configurations

[1]:
# Import several Python packages commonly used in data analysis and visualization.
# numpy (imported as np) is a package for numerical computing with arrays
import numpy as np
# pandas (imported as pd) is a package for data manipulation and analysis
import pandas as pd
# matplotlib.pyplot (imported as plt) is a package for data visualization
import matplotlib.pyplot as plt
%load_ext autoreload
%autoreload 2
[2]:
# Import sepal package and its modules
import sepal
import sepal.datasets as d
import sepal.models as m
import sepal.utils as ut
import sepal.family as family
import sepal.enrich as fea

Streamline development of loading spatial data with Pysodb

[3]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB.
# SODB (Spatial Omics DataBase, https://gene.ai.tencent.com/SpatialOmics/) is an online repository of curated spatial omics datasets.
# This package allows users to fetch SODB experiments directly into AnnData objects using Python.
import pysodb
[4]:
# Initialization
sodb = pysodb.SODB()
[5]:
# Define names of the dataset_name and experiment_name
dataset_name = 'stahl2016visualization'
experiment_name = 'Rep4_MOB_trans'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name,experiment_name)
load experiment[Rep4_MOB_trans] in dataset[stahl2016visualization]
[6]:
# Save the AnnData object to an H5AD file format.
adata.write_h5ad('MOB_pysodb.h5ad')

Perform Sepal for spatially variable gene detection

[7]:
# Load in the raw data using a RawData class.
raw_data = d.RawData('MOB_pysodb.h5ad')
[8]:
# Filter genes observed in less than 5 spots and/or less than 10 total observations
raw_data.cnt = ut.filter_genes(raw_data.cnt,
                               min_expr=10,
                               min_occur=5)
[9]:
# Use the ST1K class (a subclass of CountData) to hold ST 1k array-based data
data = m.ST1K(raw_data,
              eps = 0.1)
[10]:
data.cnt.shape
[10]:
(264, 10869)
[11]:
# propagate is used to normalize the count data and then propagate it in time, to measure the diffusion times.
# Set scale=True to perform min-max scaling of the diffusion times
times = m.propagate(data,
                    normalize = True,
                    scale =True)
[INFO] : Using 128 workers
[INFO] : Saturated Spots : 199
100%|██████████| 10869/10869 [00:29<00:00, 372.19it/s]
[12]:
# Selects the top 20 and bottom 20 profiles based on their diffusion times
# Set the number of top and bottom profiles to be selected as 20
n_top = 20
# Computes the indices that would sort the times DataFrame in ascending order
sorted_indices = np.argsort(times.values.flatten())
# Reverses the order of the sorted indices to obtain a descending order
sorted_indices = sorted_indices[::-1]
# Retrieves the profile names corresponding to the sorted indices
sorted_profiles = times.index.values[sorted_indices]
# Select the top 20 profile names with the highest diffusion times
top_profiles = sorted_profiles[0:n_top]
# Selects the bottom 20 profile names with the lowest diffusion times
tail_profiles = sorted_profiles[-n_top:]
# Retrieves the top 20 profiles from the times DataFrame
times.loc[top_profiles,:]
[12]:
average
Rbfox1 1.000000
Gpsm1 0.832367
Prkca 0.806763
Penk 0.796135
Tyro3 0.786957
Rbfox3 0.763285
Pcp4 0.733333
Cacng3 0.733333
Omp 0.732850
Kcnh3 0.723188
Grin1 0.694686
Nrgn 0.671981
Agap2 0.658937
S100a5 0.654589
Tshz1 0.647343
Cpne4 0.644444
Map2k1 0.643961
Camk4 0.642512
Gria3 0.623188
Sez6 0.602899
[13]:
# Inspect the detection visually by using the plot_profiles function for the first 20 SVGs
# Define a custom pltargs dictionary with plot style options
pltargs = dict(s = 100,
                cmap = "magma",
                edgecolor = 'none',
                marker = 'o',
                )

# plot the profiles
fig,ax = ut.plot_profiles(cnt = data.cnt.loc[:,top_profiles],
                          crd = data.real_crd,
                          rank_values = times.loc[top_profiles,:].values.flatten(),
                          pltargs = pltargs,
                         )

_images/Spatially_variable_gene_detection_Reproducibility_with_original_data_17_0.png
[14]:
# Inspect the detection visually by using the plot_profiles function for the last 20 SVGs
# Define a custom pltargs dictionary with plot style options
pltargs = dict(s = 100,
                cmap = "magma",
                edgecolor = 'none',
                marker = 'o',
                )
# plot the profiles
fig,ax = ut.plot_profiles(cnt = data.cnt.loc[:,tail_profiles],
                          crd = data.real_crd,
                          rank_values = times.loc[tail_profiles,:].values.flatten(),
                          pltargs = pltargs,
                         )

_images/Spatially_variable_gene_detection_Reproducibility_with_original_data_18_0.png
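
As an optional follow-up that is not part of the original notebook, the ranked diffusion times can be written to a CSV file (the filename is arbitrary) so the SVG ranking can be reused without rerunning propagate.

[ ]:
# 'times' is a pandas DataFrame with a single 'average' column of min-max scaled
# diffusion times; sort it from most to least spatially variable and save it.
times.sort_values('average', ascending=False).to_csv('MOB_pysodb_sepal_times.csv')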

Application with new data

This tutorial demonstrates spatially variable gene detection on 10X Visium mouse brain data using Pysodb and Sepal.

A reference paper can be found at https://academic.oup.com/bioinformatics/article/37/17/2644/6168120.

This tutorial follows the tutorial at https://github.com/almaan/sepal/blob/master/examples/melanoma.ipynb; the data loading step has been modified to use Pysodb.

Import packages and set configurations

[1]:
# Import several Python packages commonly used in data analysis and visualization.
# numpy (imported as np) is a package for numerical computing with arrays
import numpy as np
# pandas (imported as pd) is a package for data manipulation and analysis
import pandas as pd
# matplotlib.pyplot (imported as plt) is a package for data visualization
import matplotlib.pyplot as plt
%load_ext autoreload
%autoreload 2
[2]:
# Import sepal package and its modules
import sepal
import sepal.datasets as d
import sepal.models as m
import sepal.utils as ut
import sepal.family as family
import sepal.enrich as fea

Streamline development of loading spatial data with Pysodb

[3]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB.
# SODB (Spatial Omics DataBase, https://gene.ai.tencent.com/SpatialOmics/) is an online repository of curated spatial omics datasets.
# This package allows users to fetch SODB experiments directly into AnnData objects using Python.
import pysodb
[4]:
# Initialization
sodb = pysodb.SODB()
[5]:
# Define names of the dataset_name and experiment_name
dataset_name = '10x'
experiment_name = 'V1_Mouse_Brain_Sagittal_Posterior_filtered_feature_bc_matrix'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name,experiment_name)
load experiment[V1_Mouse_Brain_Sagittal_Posterior_filtered_feature_bc_matrix] in dataset[10x]
[6]:
# Save the AnnData object to an H5AD file format.
adata.write_h5ad('Visium_pysodb.h5ad')

Perform Sepal for spatially variable gene detection

[7]:
# Load in the raw data using a RawData class.
raw_data = d.RawData('Visium_pysodb.h5ad')
[8]:
# Filter genes observed in less than 5 spots and/or less than 10 total observations
raw_data.cnt = ut.filter_genes(raw_data.cnt,
                               min_expr=10,
                               min_occur=5)
[9]:
# Use the VisiumData class (a subclass of CountData) to hold Visium array-based data
data = m.VisiumData(raw_data,
              eps = 0.1)
[10]:
# propagate is used to normalize the count data and then propagate it in time, to measure the diffusion times.
# Set scale=True to perform min-max scaling of the diffusion times
times = m.propagate(data,
                    normalize = True,
                    scale =True)
[INFO] : Using 128 workers
[INFO] : Saturated Spots : 3095
100%|██████████| 16278/16278 [01:09<00:00, 234.29it/s]
[11]:
# Selects the top 20 and bottom 20 profiles based on their diffusion times
# Set the number of top and bottom profiles to be selected as 20
n_top = 20
# Computes the indices that would sort the times DataFrame in ascending order
sorted_indices = np.argsort(times.values.flatten())
# Reverses the order of the sorted indices to obtain a descending order
sorted_indices = sorted_indices[::-1]
# Retrieves the profile names corresponding to the sorted indices
sorted_profiles = times.index.values[sorted_indices]
# Select the top 20 profile names with the highest diffusion times
top_profiles = sorted_profiles[0:n_top]
# Selects the bottom 20 profile names with the lowest diffusion times
tail_profiles = sorted_profiles[-n_top:]
# Retrieves the top 20 profiles from the times DataFrame
times.loc[top_profiles,:]
[11]:
average
Calb1 1.000000
Prkcg 0.960986
Gfap 0.946612
Apod 0.919918
Hpca 0.919918
Sst 0.915811
Car8 0.899384
Itpka 0.893224
Mgp 0.891170
Itpr1 0.887064
Dner 0.874743
Hbb-bt 0.872690
Gad1 0.862423
Igfbp2 0.856263
Vim 0.850103
Pcp4 0.841889
Nefm 0.835729
Gria1 0.833676
Fam107a 0.825462
Hba-a2 0.821355
[12]:
# Inspect the detection visually by using the plot_profiles function for the first 20 SVGs
# Define a custom pltargs dictionary with plot style options
pltargs = dict(s = 10,
                cmap = "magma",
                edgecolor = 'none',
                marker = 'H',
                )

# plot the profiles
fig,ax = ut.plot_profiles(cnt = data.cnt.loc[:,top_profiles],
                          crd = data.real_crd,
                          rank_values = times.loc[top_profiles,:].values.flatten(),
                          pltargs = pltargs,
                         )
_images/Spatially_variable_gene_detection_Application_with_new_data_16_0.png
[13]:
# Inspect the detection visually by using the plot_profiles function for the last 20 SVGs
# Define a custom pltargs dictionary with plot style options
pltargs = dict(s = 10,
                cmap = "magma",
                edgecolor = 'none',
                marker = 'H',
                )

# plot the profiles
fig,ax = ut.plot_profiles(cnt = data.cnt.loc[:,tail_profiles],
                          crd = data.real_crd,
                          rank_values = times.loc[tail_profiles,:].values.flatten(),
                          pltargs = pltargs,
                         )
_images/Spatially_variable_gene_detection_Application_with_new_data_17_0.png

Spatial clustering

Installation

This tutorial demonstrates how to install Pysodb alongside a method used for spatial clustering.

Using STAGATE as an example, Pysodb is installed into STAGATE's installation environment.

Reference tutorials can be found at https://github.com/QIFEIDKN/STAGATE_pyG and https://github.com/TencentAILabHealthcare/pysodb.

Installing software and tools

1. Install Visual Studio Code, Conda, Jupyter Notebook and CUDA in advance.

Reference tutorials describing how to install Visual Studio Code, Conda and Jupyter Notebook can be found at https://code.visualstudio.com/Docs/setup/setup-overview, https://code.visualstudio.com/docs/python/environments#_activating-an-environment-in-the-terminal and https://code.visualstudio.com/docs/datascience/data-science-tutorial, respectively.

The TensorFlow version of STAGATE is based on TensorFlow 1.15.0. Since TensorFlow 1 is no longer maintained and many algorithms are now developed on TensorFlow 2, this tutorial selects and demonstrates the installation of STAGATE_pyG.

Since STAGATE_pyG is built on the pyG (PyTorch Geometric) framework, it requires a GPU and the CUDA toolkit to be installed. A tutorial is available at https://developer.nvidia.com/cuda-downloads.

2. Launch Visual Studio Code and open a terminal window.

Henceforth, various packages and modules will be installed via the command line.

Installing STAGATE_pyG

3. Select an installation path and change to it
[ ]:
cd <path>
4. Clone STAGATE_pyG code
[ ]:
git clone https://github.com/QIFEIDKN/STAGATE_pyG.git

If cloning via git fails, download the code from https://github.com/QIFEIDKN/STAGATE_pyG, place it in the folder created above, and extract it.

5. Open the STAGATE_pyG directory
[ ]:
cd STAGATE_pyG
6. Create a conda environment
[ ]:
conda create -n <environment_name> python=3.8
7. Activate the conda environment

Run the following command in the terminal to activate the conda environment:

[ ]:
conda activate <environment_name>
8. Install torch with CUDA

e.g. pip install torch==1.13.0+cu117 -f https://download.pytorch.org/whl/cu117/torch_stable.html

[ ]:
pip install torch==<torch_version>+<cuda_version> -f https://download.pytorch.org/whl/<cuda_version>/torch_stable.html
9. Install PyTorch Geometric

Install PyTorch Geometric according to the versions of PyTorch, CUDA and your system. Select the package and copy the corresponding install command to run in your terminal: https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html

e.g. pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv torch_geometric -f https://data.pyg.org/whl/torch-1.13.0+cu117.html
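
To determine which wheel index to use, it can help to print the installed PyTorch and CUDA versions first; this small check is an addition to the original instructions.

[ ]:
# The torch version and CUDA build determine the "-f" URL for the PyTorch Geometric wheels.
import torch
print(torch.__version__)          # e.g. 1.13.0
print(torch.version.cuda)         # e.g. 11.7
print(torch.cuda.is_available())  # True when a GPU and a matching driver are visible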

10. Install other python packages
[ ]:
pip install -r requirement.txt

Installing the mclust library in the STAGATE_pyG conda environment

STAGATE uses the mclust package, an R library, to identify spatial domains.

The next steps show how to install the mclust package using R and rpy2 for Python in a conda environment.

Activate the environment by running “conda activate <environment_name>”.

11. Install R and rpy2 in the conda environment
[ ]:
conda install -c r rpy2

Run the following command in the terminal to set the R_HOME environment variable to the location of the R installation within the conda environment.

[ ]:
export R_HOME=/home/<user_name>/anaconda3/envs/<environment_name>/lib/R

Run the following command in the terminal to set the R_LIBS_USER environment variable to the directory where R will install packages within this conda environment.

[ ]:
export R_LIBS_USER=/home/<user_name>/anaconda3/envs/<environment_name>/lib/R/library

When configuring the installation paths for R_HOME and R_LIBS_USER, the standard Linux and Conda environment paths (such as .conda) may not be applicable. In such cases, locate the R installation and its library path within the conda virtual environment and substitute them accordingly; one way to find these paths is shown below.
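
A minimal sketch, assuming rpy2 is already installed in the active environment: ask rpy2 which R installation it detects and derive the paths from that (running “python -m rpy2.situation” in the terminal prints similar diagnostics).

[ ]:
# Report the R home detected by rpy2; export this value as R_HOME, and
# <R_HOME>/library is a typical candidate for R_LIBS_USER.
import rpy2.situation
print(rpy2.situation.get_r_home())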

12. Enter the Python interpreter to install the mclust library

Run “python” in the terminal to enter the Python interpreter, then install the mclust library from within it.

[ ]:
python

Run the following commands in sequence:

[ ]:
import rpy2.robjects.packages as r
utils = r.importr("utils")
package_name = "mclust"
utils.install_packages(package_name)

Then, when prompted, select a CRAN mirror from which to install mclust; a non-interactive alternative is sketched below.
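
If a non-interactive variant is preferred, the mirror can be chosen programmatically; this sketch follows the installation pattern from the rpy2 documentation (the mirror index 1 is an arbitrary choice).

[ ]:
# Choose a CRAN mirror up front, then install mclust without the interactive prompt.
import rpy2.robjects.packages as rpackages
from rpy2.robjects.vectors import StrVector

utils = rpackages.importr('utils')
utils.chooseCRANmirror(ind=1)
utils.install_packages(StrVector(['mclust']))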

Finally, exit the Python interpreter.

[ ]:
exit()

Installing Pysodb

Keep the conda environment active

13. Clone Pysodb code
[ ]:
git clone https://github.com/TencentAILabHealthcare/pysodb.git

If cloning via git fails, download the code from https://github.com/TencentAILabHealthcare/pysodb, place it in the folder created above, and extract it.

14. Open the Pysodb directory
[ ]:
cd pysodb
15. Install a Pysodb package from source code
[ ]:
python setup.py install

If “error: urllib3 2.0.0a3 is installed but urllib3<1.27,>=1.21.1 is required by {‘requests’}” appears, users can execute the following commands in order.

[ ]:
pip install 'urllib3>=1.21.1,<1.27'
python setup.py install
[1]:
print('finish!')
finish!

Reproducibility with original data

This tutorial demonstrates how to identify spatial domains on 10x Visium human dorsolateral prefrontal cortex data using Pysodb and STAGATE based on pyG (PyTorch Geometric) framework.

A reference paper can be found at https://www.nature.com/articles/s41467-022-29439-6.

This tutorial follows the tutorial at https://stagate.readthedocs.io/en/latest/T1_DLPFC.html, with the data-loading step replaced by Pysodb.

Import packages and set configurations

[1]:
# Use Python warnings module to filter and ignore any warnings that may occur in the program after this point.
import warnings
warnings.filterwarnings("ignore")
[2]:
# Import several Python packages commonly used in data analysis and visualization:
# pandas (imported as pd) is a package for data manipulation and analysis
import pandas as pd
# numpy (imported as np) is a package for numerical computing with arrays
import numpy as np
# scanpy (imported as sc) is a package for single-cell RNA sequencing analysis
import scanpy as sc
# matplotlib.pyplot (imported as plt) is a package for data visualization
import matplotlib.pyplot as plt
# os is a package for interacting with the operating system, such as reading or writing files
import os
[3]:
# Import the adjusted_rand_score function from the sklearn.metrics.cluster module of the scikit-learn library.
from sklearn.metrics.cluster import adjusted_rand_score
[4]:
# Import STAGATE_pyG package
import STAGATE_pyG

If users encounter the error “No module named ‘STAGATE_pyG’” when importing the STAGATE_pyG package, first ensure that the “STAGATE_pyG” folder is located in the current script’s directory; a fallback using sys.path is sketched below.
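The following sketch appends the directory that contains the cloned STAGATE_pyG folder to sys.path before importing (here <path> is a placeholder, not a value from the original tutorial):

[ ]:
import sys
sys.path.append('<path>')   # placeholder: directory containing the STAGATE_pyG folder
import STAGATE_pyG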

[5]:
# The location of R (used for the mclust clustering)
os.environ['R_HOME'] = '/home/<usr_name>/anaconda3/envs/<environment_name>/lib/R/'
os.environ['R_USER'] = '/home/<usr_name>/anaconda3/envs/<environment_name>/lib/R/rpy2'

When configuring the installation paths for R_HOME and R_LIBS_USER, the standard Linux and Conda environment paths (such as .conda) may not be applicable. In such cases, users need to locate the R installation path and library path within the Conda virtual environment and replace them accordingly. If these variables were already configured during the installation, the preceding commands do not need to be run again.

Streamline development of loading spatial data with Pysodb

[6]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB databases.
# SODB is a format used to store data in memory-mapped files for efficient access and querying.
# This package allows users to interact with SODB files using Python.
import pysodb
[7]:
# Initialization
sodb = pysodb.SODB()
[8]:
# Define names of the dataset_name and experiment_name
dataset_name = 'maynard2021trans'
experiment_name = '151676'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name,experiment_name)
load experiment[151676] in dataset[maynard2021trans]
[9]:
adata
[9]:
AnnData object with n_obs × n_vars = 3460 × 33538
    obs: 'in_tissue', 'array_row', 'array_col', 'Region', 'leiden'
    var: 'gene_ids', 'feature_types', 'genome', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
    uns: 'hvg', 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial', 'spatial_neighbors', 'umap'
    obsm: 'X_pca', 'X_umap', 'spatial'
    varm: 'PCs'
    obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances'

Data preprocessing

[10]:
# Ensure that the variable names (i.e., the column names in the .var DataFrame) are unique
adata.var_names_make_unique()
[11]:
# Normalization
sc.pp.highly_variable_genes(adata, flavor="seurat_v3", n_top_genes=3000)
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
WARNING: adata.X seems to be already log-transformed.

When encountering the error “Please install skmisc package via ‘pip install --user scikit-misc’”, users can follow the provided instructions: activate the virtual environment at the terminal, execute “pip install --user scikit-misc”, and then restart the kernel to ensure the package is properly installed and available for use.

Constructing the spatial network

[12]:
# Use "STAGATE_pyG.Cal_Spatial_Net" to calculate a spatial graph with a radius cutoff of 150.
STAGATE_pyG.Cal_Spatial_Net(adata, rad_cutoff=150)
# Use "STAGATE_pyG.Stats_Spatial_Net" to summarize cells and edges information.
STAGATE_pyG.Stats_Spatial_Net(adata)
------Calculating spatial graph...
The graph contains 20052 edges, 3460 cells.
5.7954 neighbors per cell on average.
_images/Spatial_clustering_Reproducibility_with_original_data_20_1.png

Running STAGATE

[13]:
# Train a STAGATE model
adata = STAGATE_pyG.train_STAGATE(adata)
Size of Input:  (3460, 3000)
100%|██████████| 1000/1000 [00:06<00:00, 145.57it/s]

Spatial Clustering

[14]:
# Calculate the nearest neighbors in the 'STAGATE' representation and compute the UMAP embedding.
sc.pp.neighbors(adata, use_rep='STAGATE')
sc.tl.umap(adata)
[15]:
# Use Mclust_R to cluster cells in the 'STAGATE' representation into 7 clusters.
adata = STAGATE_pyG.mclust_R(adata, used_obsm='STAGATE', num_cluster=7)
R[write to console]:                    __           __
   ____ ___  _____/ /_  _______/ /_
  / __ `__ \/ ___/ / / / / ___/ __/
 / / / / / / /__/ / /_/ (__  ) /_
/_/ /_/ /_/\___/_/\__,_/____/\__/   version 6.0.0
Type 'citation("mclust")' for citing this R package in publications.

fitting ...
  |======================================================================| 100%
[16]:
# Compute the adjusted Rand index (ARI) between the 'mclust' clusters and the 'Region' ground truth.
obs_df = adata.obs.dropna()
ARI = adjusted_rand_score(obs_df['mclust'], obs_df['Region'])
print('Adjusted rand index = %.2f' %ARI)
Adjusted rand index = 0.62
[17]:
# Plot the UMAP embedding colored by the mclust clusters and the Region ground truth.
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.umap(adata, color=["mclust", "Region"], title=['STAGATE (ARI=%.2f)'%ARI, "Region"])
_images/Spatial_clustering_Reproducibility_with_original_data_27_0.png
[18]:
# Visualize the spatial domains colored by the mclust clusters and the Region ground truth.
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.spatial(adata, color=["mclust", "Region"], title=['STAGATE (ARI=%.2f)'%ARI, "Region"])
_images/Spatial_clustering_Reproducibility_with_original_data_28_0.png

Spatial trajectory inference (PAGA)

[19]:
"""
used_adata = adata[adata.obs['Ground Truth']!='nan']
used_adata
"""
[19]:
"\nused_adata = adata[adata.obs['Ground Truth']!='nan']\nused_adata\n"
[20]:
# Exclude any cells with missing values in the 'Region' column of the observation metadata.
used_adata = adata[pd.notna(adata.obs['Region'])]
used_adata
[20]:
View of AnnData object with n_obs × n_vars = 3431 × 33538
    obs: 'in_tissue', 'array_row', 'array_col', 'Region', 'leiden', 'mclust'
    var: 'gene_ids', 'feature_types', 'genome', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_rank', 'variances', 'variances_norm'
    uns: 'hvg', 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial', 'spatial_neighbors', 'umap', 'Spatial_Net', 'mclust_colors', 'Region_colors'
    obsm: 'X_pca', 'X_umap', 'spatial', 'STAGATE'
    varm: 'PCs'
    obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances'
[21]:
# Use PAGA to infer differentiation trajectories.
sc.tl.paga(used_adata, groups='Region')
[22]:
# Compare partition-based graph abstraction (PAGA) results.
plt.rcParams["figure.figsize"] = (4,3)
sc.pl.paga_compare(used_adata, legend_fontsize=10, frameon=False, size=20,
                   title=experiment_name+'_STGATE', legend_fontoutline=2, show=False)
[22]:
[<Axes: xlabel='UMAP1', ylabel='UMAP2'>, <Axes: >]
_images/Spatial_clustering_Reproducibility_with_original_data_33_1.png

Application with new data

This tutorial demonstrates how to identify spatial domains and how to train STAGATE in batches on new EEL FISH mouse brain data, using Pysodb and STAGATE based on the pyG (PyTorch Geometric) framework.

Since the EEL FISH mouse brain data contain 127,591 cells, each with 440 genes, this tutorial shows how to download the data using Pysodb and applies a batch training strategy to handle large-scale data.

A tutorial of a batch training strategy can be found at https://stagate.readthedocs.io/en/latest/T8_Batch.html

Strategies for dividing subgraphs

Because we build the spatial network based on spatial location, our network can be directly divided into subgraphs in the following form.

图片1.png

The above picture is an example with num_batch_x=3 and num_batch_y=2. Specifically, the subgraphs are divided according to quantiles along the spatial coordinates, as sketched below.
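The sketch below illustrates that idea with numpy (it is not the STAGATE_pyG implementation itself); STAGATE_pyG.Batch_Data applies the same quantile splitting to both the x and y coordinates:

[ ]:
import numpy as np

def split_by_quantiles(coords, num_batches):
    # Split a 1D coordinate array into num_batches groups at its quantiles
    # and return one boolean mask per group.
    edges = np.quantile(coords, np.linspace(0, 1, num_batches + 1))
    edges[-1] = edges[-1] + 1e-6   # make the last interval include the maximum
    return [(coords >= edges[i]) & (coords < edges[i + 1]) for i in range(num_batches)]

# Example with hypothetical coordinates: three batches along x
x = np.random.rand(1000)
masks_x = split_by_quantiles(x, 3)
print([int(m.sum()) for m in masks_x])   # roughly equal-sized batches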

Import packages and set configurations

[1]:
# Import several Python packages commonly used in data analysis and visualization:
# pandas (imported as pd) is a package for data manipulation and analysis
import pandas as pd
# numpy (imported as np) is a package for numerical computing with arrays
import numpy as np
# scanpy (imported as sc) is a package for single-cell RNA sequencing analysis
import scanpy as sc
# matplotlib.pyplot (imported as plt) is a package for data visualization
import matplotlib.pyplot as plt
# tqdm is a package for creating progress bars in Python, which is useful for tracking the progress of long-running loops or operations.
from tqdm import tqdm
[2]:
# Import PyTorch
import torch
# Import torch.nn.functional module
import torch.nn.functional as F
/home/linsenlin/anaconda3/envs/stagate_sodb/lib/python3.8/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
[3]:
# Import STAGATE_pyG package
import STAGATE_pyG

If users encounter the error “No module named ‘STAGATE_pyG’” when importing the STAGATE_pyG package, first ensure that the “STAGATE_pyG” folder is located in the current script’s directory.

[4]:
# Set the PyTorch device to the available GPU, or fall back to the CPU
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

Streamline development of loading spatial data with Pysodb

[5]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB databases.
# SODB is a format used to store data in memory-mapped files for efficient access and querying.
# This package allows users to interact with SODB files using Python.
import pysodb

[6]:
# Initialization
sodb = pysodb.SODB()
[7]:
# Define names of the dataset_name and experiment_name
dataset_name = 'Borm2022Scalable'
experiment_name = 'mouse_brain'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name,experiment_name)
load experiment[mouse_brain] in dataset[Borm2022Scalable]
[8]:
adata
[8]:
AnnData object with n_obs × n_vars = 127591 × 440
    obs: 'Clusters', 'TotalMolecules', 'X', 'X_um', 'Y', 'Y_um', 'leiden'
    var: 'GeneTotal'
    uns: 'Age', 'Clusters_colors', 'Codebook', 'ColorDict', 'CreationDate', 'Cycles', 'Expansion', 'Expansion_um', 'Experiment', 'ExperimentDate', 'FOVoverlapPercentage', 'GenerationDate', 'Joke', 'LOOM_SPEC_VERSION', 'MaxHammingDist', 'Operator', 'Orientation', 'Probes', 'Protocol', 'Quality', 'RNAfile', 'Removal', 'Sample', 'Segmentation', 'Species', 'Stitching', 'StitchingChannel', 'Strain', 'System', 'Tissue', 'TotalMolecules', 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial_neighbors', 'umap'
    obsm: 'RGB', 'X_pca', 'X_umap', 'spatial', 'tSNE'
    varm: 'PCs'
    obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances'
[9]:
# Normalization
sc.pp.highly_variable_genes(adata, flavor="seurat_v3", n_top_genes=3000)
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
WARNING: adata.X seems to be already log-transformed.

When encountering the error “Please install skmisc package via ‘pip install --user scikit-misc’”, users can follow the provided instructions: activate the virtual environment at the terminal, execute “pip install --user scikit-misc”, and then restart the kernel to ensure the package is properly installed and available for use.

Dividing subgraphs

[10]:
# Assign spatial coordinates to adata.obs['X_coor'] and adata.obs['Y_coor']
adata.obs['X_coor'] = adata.obsm['spatial'][:,0]
adata.obs['Y_coor'] = adata.obsm['spatial'][:,1]
[11]:
# Grid setting
num_batch_x = 4
num_batch_y = 4
[12]:
# The adata is divided into multiple subgraphs based on the coordinate partitioning and consolidated into a batch list
Batch_list = STAGATE_pyG.Batch_Data(adata, num_batch_x=num_batch_x, num_batch_y=num_batch_y,
                                    spatial_key=['X_coor', 'Y_coor'], plot_Stats=True)
_images/Spatial_clustering_Application_with_new_data_22_0.png

Constructing the spatial network

[13]:
# Constructing the spatial network for each batch
for temp_adata in Batch_list:
    STAGATE_pyG.Cal_Spatial_Net(temp_adata, rad_cutoff=80)
    STAGATE_pyG.Stats_Spatial_Net(temp_adata)
------Calculating spatial graph...
The graph contains 495224 edges, 11834 cells.
41.8476 neighbors per cell on average.
------Calculating spatial graph...
/home/linsenlin/anaconda3/envs/stagate_sodb/lib/python3.8/site-packages/anndata/compat/_overloaded_dict.py:106: ImplicitModificationWarning: Trying to modify attribute `._uns` of view, initializing view as actual.
  self.data[key] = value
The graph contains 545572 edges, 11937 cells.
45.7043 neighbors per cell on average.
------Calculating spatial graph...
/home/linsenlin/anaconda3/envs/stagate_sodb/lib/python3.8/site-packages/anndata/compat/_overloaded_dict.py:106: ImplicitModificationWarning: Trying to modify attribute `._uns` of view, initializing view as actual.
  self.data[key] = value
The graph contains 193628 edges, 6004 cells.
32.2498 neighbors per cell on average.
------Calculating spatial graph...
/home/linsenlin/anaconda3/envs/stagate_sodb/lib/python3.8/site-packages/anndata/compat/_overloaded_dict.py:106: ImplicitModificationWarning: Trying to modify attribute `._uns` of view, initializing view as actual.
  self.data[key] = value
The graph contains 53426 edges, 2123 cells.
25.1653 neighbors per cell on average.
------Calculating spatial graph...
/home/linsenlin/anaconda3/envs/stagate_sodb/lib/python3.8/site-packages/anndata/compat/_overloaded_dict.py:106: ImplicitModificationWarning: Trying to modify attribute `._uns` of view, initializing view as actual.
  self.data[key] = value
The graph contains 305286 edges, 9155 cells.
33.3464 neighbors per cell on average.
------Calculating spatial graph...
/home/linsenlin/anaconda3/envs/stagate_sodb/lib/python3.8/site-packages/anndata/compat/_overloaded_dict.py:106: ImplicitModificationWarning: Trying to modify attribute `._uns` of view, initializing view as actual.
  self.data[key] = value
The graph contains 198074 edges, 5830 cells.
33.9750 neighbors per cell on average.
------Calculating spatial graph...
/home/linsenlin/anaconda3/envs/stagate_sodb/lib/python3.8/site-packages/anndata/compat/_overloaded_dict.py:106: ImplicitModificationWarning: Trying to modify attribute `._uns` of view, initializing view as actual.
  self.data[key] = value
The graph contains 287706 edges, 7038 cells.
40.8789 neighbors per cell on average.
------Calculating spatial graph...
/home/linsenlin/anaconda3/envs/stagate_sodb/lib/python3.8/site-packages/anndata/compat/_overloaded_dict.py:106: ImplicitModificationWarning: Trying to modify attribute `._uns` of view, initializing view as actual.
  self.data[key] = value
The graph contains 347420 edges, 9875 cells.
35.1818 neighbors per cell on average.
------Calculating spatial graph...
/home/linsenlin/anaconda3/envs/stagate_sodb/lib/python3.8/site-packages/anndata/compat/_overloaded_dict.py:106: ImplicitModificationWarning: Trying to modify attribute `._uns` of view, initializing view as actual.
  self.data[key] = value
The graph contains 169026 edges, 4434 cells.
38.1204 neighbors per cell on average.
------Calculating spatial graph...
/home/linsenlin/anaconda3/envs/stagate_sodb/lib/python3.8/site-packages/anndata/compat/_overloaded_dict.py:106: ImplicitModificationWarning: Trying to modify attribute `._uns` of view, initializing view as actual.
  self.data[key] = value
The graph contains 257098 edges, 6948 cells.
37.0032 neighbors per cell on average.
------Calculating spatial graph...
/home/linsenlin/anaconda3/envs/stagate_sodb/lib/python3.8/site-packages/anndata/compat/_overloaded_dict.py:106: ImplicitModificationWarning: Trying to modify attribute `._uns` of view, initializing view as actual.
  self.data[key] = value
The graph contains 348072 edges, 8222 cells.
42.3342 neighbors per cell on average.
------Calculating spatial graph...
/home/linsenlin/anaconda3/envs/stagate_sodb/lib/python3.8/site-packages/anndata/compat/_overloaded_dict.py:106: ImplicitModificationWarning: Trying to modify attribute `._uns` of view, initializing view as actual.
  self.data[key] = value
The graph contains 563444 edges, 12294 cells.
45.8308 neighbors per cell on average.
------Calculating spatial graph...
/home/linsenlin/anaconda3/envs/stagate_sodb/lib/python3.8/site-packages/anndata/compat/_overloaded_dict.py:106: ImplicitModificationWarning: Trying to modify attribute `._uns` of view, initializing view as actual.
  self.data[key] = value
The graph contains 258650 edges, 6475 cells.
39.9459 neighbors per cell on average.
------Calculating spatial graph...
/home/linsenlin/anaconda3/envs/stagate_sodb/lib/python3.8/site-packages/anndata/compat/_overloaded_dict.py:106: ImplicitModificationWarning: Trying to modify attribute `._uns` of view, initializing view as actual.
  self.data[key] = value
The graph contains 367994 edges, 7183 cells.
51.2312 neighbors per cell on average.
------Calculating spatial graph...
/home/linsenlin/anaconda3/envs/stagate_sodb/lib/python3.8/site-packages/anndata/compat/_overloaded_dict.py:106: ImplicitModificationWarning: Trying to modify attribute `._uns` of view, initializing view as actual.
  self.data[key] = value
The graph contains 831018 edges, 10635 cells.
78.1399 neighbors per cell on average.
------Calculating spatial graph...
/home/linsenlin/anaconda3/envs/stagate_sodb/lib/python3.8/site-packages/anndata/compat/_overloaded_dict.py:106: ImplicitModificationWarning: Trying to modify attribute `._uns` of view, initializing view as actual.
  self.data[key] = value
The graph contains 614454 edges, 7606 cells.
80.7854 neighbors per cell on average.
/home/linsenlin/anaconda3/envs/stagate_sodb/lib/python3.8/site-packages/anndata/compat/_overloaded_dict.py:106: ImplicitModificationWarning: Trying to modify attribute `._uns` of view, initializing view as actual.
  self.data[key] = value
_images/Spatial_clustering_Application_with_new_data_24_32.png
_images/Spatial_clustering_Application_with_new_data_24_33.png
_images/Spatial_clustering_Application_with_new_data_24_34.png
_images/Spatial_clustering_Application_with_new_data_24_35.png
_images/Spatial_clustering_Application_with_new_data_24_36.png
_images/Spatial_clustering_Application_with_new_data_24_37.png
_images/Spatial_clustering_Application_with_new_data_24_38.png
_images/Spatial_clustering_Application_with_new_data_24_39.png
_images/Spatial_clustering_Application_with_new_data_24_40.png
_images/Spatial_clustering_Application_with_new_data_24_41.png
_images/Spatial_clustering_Application_with_new_data_24_42.png
_images/Spatial_clustering_Application_with_new_data_24_43.png
_images/Spatial_clustering_Application_with_new_data_24_44.png
_images/Spatial_clustering_Application_with_new_data_24_45.png
_images/Spatial_clustering_Application_with_new_data_24_46.png
_images/Spatial_clustering_Application_with_new_data_24_47.png
[14]:
# Convert each subgraph in the Batch_list to a tensor used for training through the Transfer_pytorch_Data function, and collect them into a data list
data_list = [STAGATE_pyG.Transfer_pytorch_Data(adata) for adata in Batch_list]
for temp in data_list:
    temp.to(device)
[15]:
# Use "STAGATE_pyG.Cal_Spatial_Net" to calculate a spatial graph with a radius cutoff of 80.
STAGATE_pyG.Cal_Spatial_Net(adata, rad_cutoff=80)
------Calculating spatial graph...
The graph contains 6019678 edges, 127591 cells.
47.1795 neighbors per cell on average.
[16]:
# Convert the entire adata to a tensor used for evaluation
data = STAGATE_pyG.Transfer_pytorch_Data(adata)

DataLoader for batches

[17]:
# Create a PyTorch Geometric DataLoader object for loading and processing graph data
from torch_geometric.loader import DataLoader

# Create a PyTorch DataLoader object named "loader" that will iterate through the "data_list"
# batch_size=1 or 2
loader = DataLoader(data_list, batch_size=1, shuffle=True)

Running STAGATE

[18]:
# Hyper-parameters setting
num_epoch = 500
lr=0.001
weight_decay=1e-4
hidden_dims = [512, 30]
[19]:
# Train a STAGATE model
model = STAGATE_pyG.STAGATE(hidden_dims = [data_list[0].x.shape[1]]+hidden_dims).to(device)

# Initializes an Adam optimizer with learning rate "lr" and weight decay "weight_decay" that will be used to update the parameters of the PyTorch model.
optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
[20]:
# Train the model with mini-batch gradient descent using the Adam optimizer:
# iterate over the data loader "loader" for "num_epoch" epochs, compute the MSE loss
# between the reconstruction "out" and the input "batch.x", clip the gradients to
# avoid exploding gradients, and update the model parameters with the optimizer's step function.
for epoch in tqdm(range(1, num_epoch+1)):
    for batch in loader:
        model.train()
        optimizer.zero_grad()
        z, out = model(batch.x, batch.edge_index)
        loss = F.mse_loss(batch.x, out) #F.nll_loss(out[data.train_mask], data.y[data.train_mask])
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 5.)
        optimizer.step()
100%|██████████| 500/500 [03:07<00:00,  2.67it/s]
[21]:
# The total network
data.to(device)
[21]:
Data(x=[127591, 440], edge_index=[2, 6147269])
[22]:
# Set the model in evaluation mode
model.eval()
# Get the hidden representation and output of the model
z, out = model(data.x, data.edge_index)

# Transfer a tensor from GPU to CPU and convert it to a numpy array
# Then assign it to adata.obsm['STAGATE']
STAGATE_rep = z.to('cpu').detach().numpy()
adata.obsm['STAGATE'] = STAGATE_rep

Spatial Clustering

[23]:
# Calculate the nearest neighbors in the 'STAGATE' representation and compute the UMAP embedding.
sc.pp.neighbors(adata, use_rep='STAGATE')
sc.tl.umap(adata)
[24]:
# Use Mclust_R to cluster cells in the 'STAGATE' representation into 13 clusters.
adata = STAGATE_pyG.mclust_R(adata, used_obsm='STAGATE', num_cluster=13)
R[write to console]:                    __           __
   ____ ___  _____/ /_  _______/ /_
  / __ `__ \/ ___/ / / / / ___/ __/
 / / / / / / /__/ / /_/ (__  ) /_
/_/ /_/ /_/\___/_/\__,_/____/\__/   version 6.0.0
Type 'citation("mclust")' for citing this R package in publications.

fitting ...
  |======================================================================| 100%
[25]:
# Display a spatial embedding plot with cell clustering information
ax = sc.pl.embedding(adata,basis='spatial',color=['mclust',],show=False,)
ax.axis('equal')
/home/linsenlin/anaconda3/envs/stagate_sodb/lib/python3.8/site-packages/scanpy/plotting/_tools/scatterplots.py:392: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  cax = scatter(
[25]:
(-611.2128013610841, 15985.261156463623, 72.22414855957027, 7865.117953491211)
_images/Spatial_clustering_Application_with_new_data_39_2.png
[26]:
# Display a UMAP plot colored by the mclust cluster
sc.pl.umap(adata, color='mclust')
/home/linsenlin/anaconda3/envs/stagate_sodb/lib/python3.8/site-packages/scanpy/plotting/_tools/scatterplots.py:392: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  cax = scatter(
_images/Spatial_clustering_Application_with_new_data_40_1.png

Application with new data (3D)

This tutorial demonstrates spatial clustering on 3D mouse visual cortex data using Pysodb.

A reference paper can be found at https://www.science.org/doi/full/10.1126/science.aat5691.

Import packages and set configurations

[1]:
# Use the Python warnings module to filter and ignore any warnings that may occur in the program after this point.
import warnings
warnings.filterwarnings("ignore")
[2]:
# scanpy (imported as sc) is a package for single-cell RNA sequencing analysis
import scanpy as sc
# matplotlib.pyplot (imported as plt) is a package for data visualization
import matplotlib.pyplot as plt

[3]:
# Import STAGATE_pyG package
import STAGATE_pyG as STAGATE

If users encounter the error “No module named ‘STAGATE_pyG’” when importing the STAGATE_pyG package, first ensure that the “STAGATE_pyG” folder is located in the current script’s directory.

[4]:
# Import the palettable package
import palettable
# Define three color maps, a diverging color map (cmp_pspace), and two qualitative color maps (cmp_domain and cmp_ct)
cmp_pspace = palettable.cartocolors.diverging.TealRose_7.mpl_colormap
cmp_domain = palettable.cartocolors.qualitative.Pastel_10.mpl_colors
cmp_ct = palettable.cartocolors.qualitative.Safe_10.mpl_colors

When encountering the error “No module named ‘palettable’”, users need to first activate conda’s virtual environment at the terminal and then run the command “pip install palettable”. This approach can be applied to other packages as well, by replacing ‘palettable’ with the name of the desired package.

Streamline development of loading spatial data with Pysodb

[5]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB databases.
# SODB is a format used to store data in memory-mapped files for efficient access and querying.
# This package allows users to interact with SODB files using Python.
import pysodb
[6]:
# Initialization
sodb = pysodb.SODB()
[7]:
# Define names of the dataset_name and experiment_name
dataset_name = 'Wang2018three'
experiment_name = 'data_3D'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name,experiment_name)
load experiment[data_3D] in dataset[Wang2018three]

Data processing

[8]:
# Normalization
sc.pp.highly_variable_genes(adata, flavor="seurat_v3", n_top_genes=3000)
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)

When encountering the error “Please install skmisc package via ‘pip install --user scikit-misc’”, users can follow the provided instructions: activate the virtual environment at the terminal, execute “pip install --user scikit-misc”, and then restart the kernel to ensure the package is properly installed and available for use.

[9]:
# Stack the x, y and z coordinate columns into a 3D spatial coordinate array
adata.obsm['spatial_3D'] = adata.obs[['x','y','z']].values

Constructing the spatial network with diverse rad_cutoff

[10]:
# Define a list with different radius cutoff values
rad_cutoff_list = [15,20]
# Constructing the spatial network for each radius cutoff
for rad_cutoff in rad_cutoff_list:
    STAGATE.Cal_Spatial_Net(adata, rad_cutoff=rad_cutoff)
    STAGATE.Stats_Spatial_Net(adata)
    adata = STAGATE.train_STAGATE(adata)
    adata.obsm[f'STAGATE_rad{rad_cutoff}'] = adata.obsm['STAGATE'].copy()
# Save the adata object
adata.write_h5ad('3D_spatialnet.h5ad')
------Calculating spatial graph...
The graph contains 369952 edges, 32845 cells.
11.2636 neighbors per cell on average.
Size of Input:  (32845, 28)
100%|██████████| 1000/1000 [00:27<00:00, 35.91it/s]
------Calculating spatial graph...
The graph contains 649580 edges, 32845 cells.
19.7771 neighbors per cell on average.
Size of Input:  (32845, 28)
100%|██████████| 1000/1000 [00:43<00:00, 22.99it/s]
_images/Spatial_clustering_Application_with_new_data_%283D%29_18_4.png
_images/Spatial_clustering_Application_with_new_data_%283D%29_18_5.png

Clustering and UMAP

rad_cutoff=15
[11]:
# Calculate the nearest neighbors in the 'STAGATE_rad15' representation and compute the UMAP embedding.
sc.pp.neighbors(adata, use_rep='STAGATE_rad15')
sc.tl.umap(adata)
# Perform Leiden clustering on the 'STAGATE_rad15' representation and store the result under the key 'leiden_0.2'
sc.tl.leiden(adata,resolution=0.2,key_added='leiden_0.2')

When encountering the error “Please install the leiden algorithm: ‘conda install -c conda-forge leidenalg’ or ‘pip3 install leidenalg’”, users can follow the provided instructions: activate the virtual environment at the terminal and execute “pip3 install leidenalg”.

[12]:
# Plot a UMAP embedding ('STAGATE_rad15') colored by 'leiden_0.2'
sc.pl.umap(adata, color='leiden_0.2')
_images/Spatial_clustering_Application_with_new_data_%283D%29_23_0.png
[13]:
# Perform mclust clustering with 'STAGATE_rad15' representation, and set 'num_cluster' to 7
adata = STAGATE.mclust_R(adata, used_obsm='STAGATE_rad15', num_cluster=7)
R[write to console]:                    __           __
   ____ ___  _____/ /_  _______/ /_
  / __ `__ \/ ___/ / / / / ___/ __/
 / / / / / / /__/ / /_/ (__  ) /_
/_/ /_/ /_/\___/_/\__,_/____/\__/   version 6.0.0
Type 'citation("mclust")' for citing this R package in publications.

fitting ...
  |======================================================================| 100%
[14]:
# Plot a UMAP embedding ('STAGATE_rad15') colored by 'mclust'
sc.pl.umap(adata, color='mclust')
_images/Spatial_clustering_Application_with_new_data_%283D%29_25_0.png
[15]:
# Plot a spatial distribution ('STAGATE_rad15') based on 'spatial_3D' with 'mclust' clustering
ax = sc.pl.embedding(adata, basis='spatial_3D',color='mclust',show=False)
# Ensure a consistent aspect ratio along all three dimensions
ax.axis('equal')
[15]:
(-78.95, 1899.95, -64.7, 1600.7)
_images/Spatial_clustering_Application_with_new_data_%283D%29_26_1.png
rad_cutoff=20
[16]:
# Calculate the nearest neighbors in the 'STAGATE_rad20' representation and compute the UMAP embedding.
sc.pp.neighbors(adata, use_rep='STAGATE_rad20')
sc.tl.umap(adata)
# Perform Leiden clustering on the 'STAGATE_rad20' representation and store the result under the key 'leiden_0.2'
sc.tl.leiden(adata,resolution=0.2,key_added='leiden_0.2')
[17]:
# Plot a UMAP embedding ('STAGATE_rad20') colored by 'leiden_0.2'
sc.pl.umap(adata, color='leiden_0.2')
_images/Spatial_clustering_Application_with_new_data_%283D%29_29_0.png
[18]:
# Perform mclust clustering with 'STAGATE_rad20' representation, and set 'num_cluster' to 9
adata = STAGATE.mclust_R(adata, used_obsm='STAGATE_rad20', num_cluster=9)
fitting ...
  |======================================================================| 100%
[19]:
# Plot a UMAP embedding ('STAGATE_rad20') colored by 'mclust'
sc.pl.umap(adata, color='mclust',palette=cmp_domain)
#plt.savefig('../figures/spatialclustering/3D_umap.png',dpi=400,transparent=True,bbox_inches='tight')
_images/Spatial_clustering_Application_with_new_data_%283D%29_31_0.png
[20]:
# Plot a spatial distribution ('STAGATE_rad20') based on 'spatial_3D' with 'mclust' clustering
ax = sc.pl.embedding(adata, basis='spatial_3D',color='mclust',palette=cmp_domain,show=False)
# Ensure a consistent aspect ratio along all three dimensions
ax.axis('equal')
#plt.savefig('../figures/spatialclustering/3D_spatial.png',dpi=400,transparent=True,bbox_inches='tight')
[20]:
(-78.95, 1899.95, -64.7, 1600.7)
_images/Spatial_clustering_Application_with_new_data_%283D%29_32_1.png
[21]:
# Plot a spatial distribution ('STAGATE_rad20') based on 'spatial_3D' with 'mclust' clustering, projected in 3D space
ax = sc.pl.embedding(adata, basis='spatial_3D',color='mclust', projection='3d',palette=cmp_domain,show=False)
# Ensure a consistent aspect ratio along all three dimensions
ax.axis('equal')
[21]:
(-78.95000000000016,
 1899.9500000000003,
 -221.45000000000016,
 1757.4500000000003)
_images/Spatial_clustering_Application_with_new_data_%283D%29_33_1.png
[22]:
# Plot a spatial distribution ('STAGATE_rad20') based on 'spatial_3D' with 'mclust' clustering for enhanced visualization
sc.pl.embedding(adata, basis='spatial_3D',color='mclust', projection='3d',palette=cmp_domain,show=False)
[22]:
<Axes3D: title={'center': 'mclust'}, xlabel='spatial_3D1', ylabel='spatial_3D2', zlabel='spatial_3D3'>
_images/Spatial_clustering_Application_with_new_data_%283D%29_34_1.png
[23]:
# Save the result object
adata.write_h5ad('3D_mclust.h5ad')

Pseudo-spatiotemporal analysis

Installation

This tutorial demonstrates how to install Pysodb alongside a method used for pseudo-spatiotemporal analysis.

Using SpaceFlow as an example, Pysodb is installed in the same environment.

Reference tutorials can be found at https://github.com/hongleir/SpaceFlow and https://github.com/TencentAILabHealthcare/pysodb.

Installing softwares and tools

1. The first step is to install Visual Studio Code, Conda, Jupyter notebook, and CUDA in advance.

Reference tutorials present how to install Visual Studio Code, Conda, Jupyter notebook and CUDA, respectively. They can be found at https://code.visualstudio.com/Docs/setup/setup-overview, https://code.visualstudio.com/docs/python/environments#_activating-an-environment-in-the-terminal, https://code.visualstudio.com/docs/datascience/data-science-tutorial and https://developer.nvidia.com/cuda-downloads.

2. Launch Visual Studio Code and open a terminal window.

Henceforth, various packages or modules will be installed via the command line

Installation SpaceFlow

3. Create a conda environment
[ ]:
conda create -n <environment_name> python=3.8
4. Activate a conda environment

Run the following command on the terminal to activate the conda environment:

[ ]:
conda activate <environment_name>
5. Install torch with CUDA

eg. pip install torch==1.13.0+cu117 -f https://download.pytorch.org/whl/cu117/torch_stable.html

[ ]:
pip install torch==<torch_version>+<cuda_version> -f https://download.pytorch.org/whl/<cuda_version>/torch_stable.html
6. Install torch_scatter, torch_sparse and torch_cluster

Install torch_scatter, torch_sparse and torch_cluster according to your PyTorch version, CUDA version and operating system. Select the packages and copy the corresponding install command to run in your terminal: https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html

eg. pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-1.13.0+cu117.html

7. Install SpaceFlow package
[ ]:
pip install SpaceFlow
8. Install palettable package
[ ]:
pip install palettable

Installation Pysodb

Keep the conda environment active

9. Select the installation path and open it
[ ]:
cd <path>
10. Clone Pysodb code
[ ]:
git clone https://github.com/TencentAILabHealthcare/pysodb.git

If cloning the code fails through git, please download it at https://github.com/TencentAILabHealthcare/pysodb, upload it to the folder created above, and extract it.

11. Open the Pysodb directory
[ ]:
cd pysodb
12. Install a Pysodb package from source code
[ ]:
python setup.py install

If “error: urllib3 2.0.0a3 is installed but urllib3<1.27,>=1.21.1 is required by {‘requests’}” appears, users can execute the following commands in order.

[ ]:
pip install 'urllib3>=1.21.1,<1.27'
python setup.py install
[1]:
print('finish!')
finish!

Reproducibility with original data (seqFISH)

This tutorial demonstrates how to perform pseudo-spatiotemporal analysis on seqFISH mouse embryo data using Pysodb and SpaceFlow.

A reference paper can be found at https://www.nature.com/articles/s41467-022-31739-w.

This tutorial follows the tutorial at https://github.com/hongleir/SpaceFlow/blob/master/tutorials/seqfish_mouse_embryogenesis.ipynb, with the data-loading step replaced by Pysodb.

Import packages and set configurations

[1]:
# Use the Python warnings module to filter and ignore any warnings that may occur in the program after this point.
import warnings
warnings.filterwarnings("ignore")
[2]:
# Import several python packages commonly used in data analysis and visualization.
# numpy (imported as np) is a package for numerical computing with arrays.
import numpy as np
# scanpy (imported as sc) is a package for single-cell RNA sequencing analysis.
import scanpy as sc
# matplotlib.pyplot (imported as plt) is a package for data visualization.
#import matplotlib.pyplot as plt
[3]:
# from SpaceFlow package import SpaceFlow module
from SpaceFlow import SpaceFlow
[4]:
# Import the palettable package
import palettable
# Define a diverging color map (cmp_pspace) and two qualitative color palettes (cmp_domain and cmp_ct)
cmp_pspace = palettable.cartocolors.diverging.TealRose_7.mpl_colormap
cmp_domain = palettable.cartocolors.qualitative.Pastel_10.mpl_colors
cmp_ct = palettable.cartocolors.qualitative.Safe_10.mpl_colors

When encountering the error “No module named ‘palettable’”, users need to first activate conda’s virtual environment at the terminal and then run the command “pip install palettable”. This approach can be applied to other packages as well, by replacing ‘palettable’ with the name of the desired package.

Streamline development of loading spatial data with Pysodb

[5]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB databases.
# SODB is a format used to store data in memory-mapped files for efficient access and querying.
# This package allows users to interact with SODB files using Python.
import pysodb
[6]:
# Initialize the sodb object
sodb = pysodb.SODB()
[7]:
# Define names of the dataset_name and experiment_name
dataset_name = 'lohoff2021integration'
experiment_name = 'lohoff2020highly_seqFISH_mouse_Gastrulation'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
#%%time
adata = sodb.load_experiment(dataset_name,experiment_name)
load experiment[lohoff2020highly_seqFISH_mouse_Gastrulation] in dataset[lohoff2021integration]
[8]:
# Filter out genes detected in fewer than 3 cells
sc.pp.filter_genes(adata, min_cells=3)

Perform SpaceFlow for pseudo-spatiotemporal analysis

[9]:
# Create SpaceFlow Object
#%%time
sf = SpaceFlow.SpaceFlow(
    count_matrix=adata.X,
    spatial_locs=adata.obsm['spatial'],
    sample_names=adata.obs_names,
    gene_names=adata.var_names
)

When encountering the error “The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().”, edit the __init__ function in “SpaceFlow.py” from the SpaceFlow package: replace “elif count_matrix and spatial_locs:” with “elif count_matrix is not None and spatial_locs is not None:”, and change “if gene_names:” and “if sample_names:” to “if gene_names is not None:” and “if sample_names is not None:”, respectively. These modifications ensure that each if statement evaluates to a single boolean value.
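The snippet below is a small, standalone illustration (not part of SpaceFlow) of why the explicit None checks are needed: truth-testing a multi-element NumPy array raises exactly this ambiguity error, whereas an “is not None” comparison does not:

[ ]:
import numpy as np

count_matrix = np.zeros((2, 3))
print(count_matrix is not None)   # True: the explicit check used by the fix
try:
    bool(count_matrix)            # reproduces "The truth value of an array ... is ambiguous"
except ValueError as err:
    print(err)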

[10]:
# Preprocess data
#%%time
sf.preprocessing_data(n_top_genes=3000)

When encountering the error “You can drop duplicate edges by setting the ‘duplicates’ kwarg”, modify the preprocessing_data function in “SpaceFlow.py” from the SpaceFlow package: (1) remove target_sum=1e4 from sc.pp.normalize_total(); (2) change the flavor argument to ‘seurat’ in sc.pp.highly_variable_genes(); (3) save the file and rerun the analysis.

When encountering the error “module ‘networkx’ has no attribute ‘to_scipy_sparse_matrix’”, users should first activate the virtual environment at the terminal and then downgrade NetworkX with the following command: “pip install networkx==2.8”. This ensures that a compatible version of NetworkX is installed within the virtual environment.

[11]:
# Train a deep graph network model
#%%time
sf.train(
    spatial_regularization_strength=0.1,
    z_dim=50,
    lr=1e-3,
    epochs=1000,
    max_patience=50,
    min_stop=100,
    random_seed=42,
    gpu=0,
    regularization_acceleration=True,
    edge_subset_sz=1000000
)


Epoch 2/1000, Loss: 1.4433507919311523
Epoch 12/1000, Loss: 1.1957048177719116
Epoch 22/1000, Loss: 0.899651050567627
Epoch 32/1000, Loss: 0.5220355987548828
Epoch 42/1000, Loss: 0.25792792439460754
Epoch 52/1000, Loss: 0.15247754752635956
Epoch 62/1000, Loss: 0.11323283612728119
Epoch 72/1000, Loss: 0.09924617409706116
Epoch 82/1000, Loss: 0.08421587944030762
Epoch 92/1000, Loss: 0.08441664278507233
Epoch 102/1000, Loss: 0.07488397508859634
Epoch 112/1000, Loss: 0.07958334684371948
Epoch 122/1000, Loss: 0.07067480683326721
Epoch 132/1000, Loss: 0.07085031270980835
Epoch 142/1000, Loss: 0.06988634169101715
Epoch 152/1000, Loss: 0.06983151286840439
Epoch 162/1000, Loss: 0.07103468477725983
Epoch 172/1000, Loss: 0.06584274768829346
Epoch 182/1000, Loss: 0.06390106678009033
Epoch 192/1000, Loss: 0.06413406133651733
Epoch 202/1000, Loss: 0.06146261841058731
Epoch 212/1000, Loss: 0.06414571404457092
Epoch 222/1000, Loss: 0.06728968769311905
Epoch 232/1000, Loss: 0.0630531907081604
Epoch 242/1000, Loss: 0.05740426480770111
Epoch 252/1000, Loss: 0.05534536764025688
Epoch 262/1000, Loss: 0.06245315819978714
Epoch 272/1000, Loss: 0.06413358449935913
Epoch 282/1000, Loss: 0.058028869330883026
Epoch 292/1000, Loss: 0.05633276700973511
Epoch 302/1000, Loss: 0.05898240953683853
Epoch 312/1000, Loss: 0.056370943784713745
Epoch 322/1000, Loss: 0.05779092013835907
Training complete!
Embedding is saved at ./embedding.tsv
[11]:
array([[-0.5202136 , -0.17956705,  0.7582989 , ..., -0.1178573 ,
        -0.00734187,  2.4900703 ],
       [-0.46302137, -0.19299765,  1.0326352 , ..., -0.19934863,
        -0.06319354,  2.3966684 ],
       [-0.5225591 ,  0.6163565 ,  0.13151401, ..., -0.03753699,
        -0.00742718,  2.095408  ],
       ...,
       [-0.23238094,  0.99448997, -0.00804079, ..., -0.03581214,
         0.579624  ,  1.5177417 ],
       [-0.32503372, -0.07396829,  1.0361072 , ..., -0.13758945,
         0.98967546,  2.352452  ],
       [-0.21778385,  0.82494813,  0.03623015, ..., -0.06693349,
         0.8934428 ,  1.5172613 ]], dtype=float32)
[12]:
# Identify the spatiotemporal patterns through the pseudo-Spatiotemporal Map (pSM)
sf.pseudo_Spatiotemporal_Map(pSM_values_save_filepath="./pSM_values.tsv", n_neighbors=20, resolution=1.0)
Performing pseudo-Spatiotemporal Map
pseudo-Spatiotemporal Map(pSM) calculation complete, pSM values of cells or spots saved at ./pSM_values.tsv!
[13]:
# Create a new column called 'pspace' from pSM values of cells or spots.
adata.obs['pspace'] = sf.pSM_values
[14]:
# Visualize spatial coordinates in a scatterplot colored by 'pspace'
ax = sc.pl.embedding(adata,basis='spatial',color='pspace',show=False,color_map=cmp_pspace)
ax.axis('equal')
#plt.savefig('figures/seqFISH_pspace.png',bbox_inches='tight',transparent=True,dpi=400)
#plt.savefig('figures/seqFISH_pspace.pdf',bbox_inches='tight',transparent=True,dpi=400)
[14]:
(-2.802851333716996,
 2.7737768053548297,
 -3.8412926523589537,
 3.841292652358961)
_images/Pseudo-spatiotemporal_analysis_Reproducibility_with_original_data_%28seqFISH%29_21_1.png
[15]:
# Visualize spatial coordinates in a scatterplot colored by 'celltype_mapped_refined'
ax = sc.pl.embedding(adata,basis='spatial',color='celltype_mapped_refined',show=False)
ax.axis('equal')
#plt.savefig('figures/seqFISH_ct.png',bbox_inches='tight',transparent=True,dpi=400)
#plt.savefig('figures/seqFISH_ct.pdf',bbox_inches='tight',transparent=True,dpi=400)
[15]:
(-2.802851333716996,
 2.7737768053548297,
 -3.8412926523589537,
 3.841292652358961)
_images/Pseudo-spatiotemporal_analysis_Reproducibility_with_original_data_%28seqFISH%29_22_1.png

Reproducibility with original data (DLPFC)

This tutorial demonstrates how to perform pseudo-spatiotemporal analysis on 10X Visium human dorsolateral prefrontal cortex data using Pysodb and SpaceFlow.

A reference paper can be found at https://www.nature.com/articles/s41467-022-31739-w.

This tutorial follows the tutorial at https://github.com/hongleir/SpaceFlow/blob/master/tutorials/seqfish_mouse_embryogenesis.ipynb, with the data-loading step replaced by Pysodb.

Import packages and set configurations

[1]:
# Use the Python warnings module to filter and ignore any warnings that may occur in the program after this point.
import warnings
warnings.filterwarnings("ignore")
[2]:
# Import several python packages commonly used in data analysis and visualization.
# numpy (imported as np) is a package for numerical computing with arrays.
import numpy as np
# scanpy (imported as sc) is a package for single-cell RNA sequencing analysis.
import scanpy as sc
# matplotlib.pyplot (imported as plt) is a package for data visualization.
import matplotlib.pyplot as plt
[3]:
# from SpaceFlow package import SpaceFlow module
from SpaceFlow import SpaceFlow
[4]:
# Import the palettable package
import palettable
# Define a diverging color map (cmp_pspace) and two qualitative color palettes (cmp_domain and cmp_ct)
cmp_pspace = palettable.cartocolors.diverging.TealRose_7.mpl_colormap
cmp_domain = palettable.cartocolors.qualitative.Pastel_10.mpl_colors
cmp_ct = palettable.cartocolors.qualitative.Safe_10.mpl_colors

When encountering the error “No module named ‘palettable’”, users need to first activate conda’s virtual environment at the terminal and then run the command “pip install palettable”. This approach can be applied to other packages as well, by replacing ‘palettable’ with the name of the desired package.

Streamline development of loading spatial data with Pysodb

[5]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB databases.
# SODB is a format used to store data in memory-mapped files for efficient access and querying.
# This package allows users to interact with SODB files using Python.
import pysodb
[6]:
# Initialize the sodb object
sodb = pysodb.SODB()
[7]:
# Define names of the dataset_name and experiment_name
dataset_name = 'maynard2021trans'
experiment_name = '151671'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
#%%time
adata = sodb.load_experiment(dataset_name,experiment_name)
load experiment[151671] in dataset[maynard2021trans]

Perform SpaceFlow for pseudo-spatiotemporal analysis

[8]:
# Create SpaceFlow Object
#%%time
sf = SpaceFlow.SpaceFlow(
    count_matrix=adata.X,
    spatial_locs=adata.obsm['spatial'],
    sample_names=adata.obs_names,
    gene_names=adata.var_names
)

When encountering the error “The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().”, edit the __init__ function in “SpaceFlow.py” from the SpaceFlow package: replace “elif count_matrix and spatial_locs:” with “elif count_matrix is not None and spatial_locs is not None:”, and change “if gene_names:” and “if sample_names:” to “if gene_names is not None:” and “if sample_names is not None:”, respectively. These modifications ensure that each if statement evaluates to a single boolean value.

[9]:
adata
[9]:
AnnData object with n_obs × n_vars = 4110 × 33538
    obs: 'in_tissue', 'array_row', 'array_col', 'Region', 'leiden'
    var: 'gene_ids', 'feature_types', 'genome', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
    uns: 'hvg', 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial', 'spatial_neighbors', 'umap'
    obsm: 'X_pca', 'X_umap', 'spatial'
    varm: 'PCs'
    obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances'
[10]:
# Preprocess data
#%%time
sf.preprocessing_data(n_top_genes=3000)

When encountering the error “You can drop duplicate edges by setting the ‘duplicates’ kwarg”, modify the preprocessing_data function in “SpaceFlow.py” from the SpaceFlow package: (1) remove target_sum=1e4 from sc.pp.normalize_total(); (2) change the flavor argument to ‘seurat’ in sc.pp.highly_variable_genes(); (3) save the file and rerun the analysis.

When encountering the error “module ‘networkx’ has no attribute ‘to_scipy_sparse_matrix’”, users should first activate the virtual environment at the terminal and then downgrade NetworkX with the following command: “pip install networkx==2.8”. This ensures that a compatible version of NetworkX is installed within the virtual environment.

[11]:
# Train a deep graph network model
#%%time
sf.train(
    spatial_regularization_strength=0.1,
    z_dim=50,
    lr=1e-3,
    epochs=1000,
    max_patience=50,
    min_stop=100,
    random_seed=42,
    gpu=0,
    regularization_acceleration=True,
    edge_subset_sz=1000000
)
Epoch 2/1000, Loss: 1.6001609563827515
Epoch 12/1000, Loss: 1.4461232423782349
Epoch 22/1000, Loss: 1.4314894676208496
Epoch 32/1000, Loss: 1.4025373458862305
Epoch 42/1000, Loss: 1.3406403064727783
Epoch 52/1000, Loss: 1.202500820159912
Epoch 62/1000, Loss: 0.9297404289245605
Epoch 72/1000, Loss: 0.6375124454498291
Epoch 82/1000, Loss: 0.4029056429862976
Epoch 92/1000, Loss: 0.2597547173500061
Epoch 102/1000, Loss: 0.17297157645225525
Epoch 112/1000, Loss: 0.12740349769592285
Epoch 122/1000, Loss: 0.10466508567333221
Epoch 132/1000, Loss: 0.08336742222309113
Epoch 142/1000, Loss: 0.07964801788330078
Epoch 152/1000, Loss: 0.07678414136171341
Epoch 162/1000, Loss: 0.069277323782444
Epoch 172/1000, Loss: 0.0713764876127243
Epoch 182/1000, Loss: 0.06293175369501114
Epoch 192/1000, Loss: 0.06099879369139671
Epoch 202/1000, Loss: 0.061921969056129456
Epoch 212/1000, Loss: 0.05519292131066322
Epoch 222/1000, Loss: 0.06154436245560646
Epoch 232/1000, Loss: 0.05294334143400192
Epoch 242/1000, Loss: 0.047114331275224686
Epoch 252/1000, Loss: 0.051006168127059937
Epoch 262/1000, Loss: 0.04537639394402504
Epoch 272/1000, Loss: 0.05118394270539284
Epoch 282/1000, Loss: 0.050518788397312164
Epoch 292/1000, Loss: 0.045374199748039246
Epoch 302/1000, Loss: 0.04892214760184288
Epoch 312/1000, Loss: 0.04162988066673279
Epoch 322/1000, Loss: 0.04166189581155777
Epoch 332/1000, Loss: 0.043898724019527435
Epoch 342/1000, Loss: 0.04035808518528938
Epoch 352/1000, Loss: 0.04706918075680733
Epoch 362/1000, Loss: 0.04220199957489967
Epoch 372/1000, Loss: 0.03935319185256958
Epoch 382/1000, Loss: 0.050716839730739594
Epoch 392/1000, Loss: 0.0474902018904686
Epoch 402/1000, Loss: 0.041612617671489716
Epoch 412/1000, Loss: 0.0376250222325325
Epoch 422/1000, Loss: 0.03883547708392143
Epoch 432/1000, Loss: 0.03730320930480957
Epoch 442/1000, Loss: 0.03816107660531998
Epoch 452/1000, Loss: 0.03693684563040733
Epoch 462/1000, Loss: 0.03563809394836426
Epoch 472/1000, Loss: 0.038218624889850616
Epoch 482/1000, Loss: 0.042877957224845886
Epoch 492/1000, Loss: 0.035798318684101105
Epoch 502/1000, Loss: 0.045439526438713074
Epoch 512/1000, Loss: 0.04299004375934601
Epoch 522/1000, Loss: 0.0379679910838604
Epoch 532/1000, Loss: 0.03415396809577942
Epoch 542/1000, Loss: 0.03743215650320053
Epoch 552/1000, Loss: 0.039857201278209686
Epoch 562/1000, Loss: 0.03960778936743736
Epoch 572/1000, Loss: 0.04264502972364426
Epoch 582/1000, Loss: 0.03573727235198021
Epoch 592/1000, Loss: 0.03262363746762276
Epoch 602/1000, Loss: 0.0346861407160759
Epoch 612/1000, Loss: 0.03681035339832306
Epoch 622/1000, Loss: 0.048560068011283875
Epoch 632/1000, Loss: 0.03910374641418457
Epoch 642/1000, Loss: 0.03717295080423355
Epoch 652/1000, Loss: 0.032764457166194916
Epoch 662/1000, Loss: 0.035095784813165665
Epoch 672/1000, Loss: 0.030956480652093887
Epoch 682/1000, Loss: 0.033467892557382584
Epoch 692/1000, Loss: 0.033409349620342255
Epoch 702/1000, Loss: 0.030765127390623093
Epoch 712/1000, Loss: 0.030486250296235085
Epoch 722/1000, Loss: 0.03218876197934151
Epoch 732/1000, Loss: 0.03625836223363876
Epoch 742/1000, Loss: 0.031791478395462036
Epoch 752/1000, Loss: 0.03017842024564743
Epoch 762/1000, Loss: 0.02982253022491932
Epoch 772/1000, Loss: 0.02927223965525627
Epoch 782/1000, Loss: 0.036921072751283646
Epoch 792/1000, Loss: 0.036078497767448425
Epoch 802/1000, Loss: 0.031135806813836098
Epoch 812/1000, Loss: 0.031050482764840126
Epoch 822/1000, Loss: 0.046886757016181946
Epoch 832/1000, Loss: 0.029026398435235023
Epoch 842/1000, Loss: 0.03208329528570175
Epoch 852/1000, Loss: 0.029403438791632652
Training complete!
Embedding is saved at ./embedding.tsv
[11]:
array([[ 1.7401947e+00,  2.8369801e+00,  2.2377011e-01, ...,
         3.7823334e-01, -3.9555269e-01, -5.8662368e-04],
       [ 2.1318159e+00,  1.6539254e+00, -2.3631152e-02, ...,
         1.1321043e+00, -4.1971338e-01,  1.5606171e+00],
       [ 1.8066632e+00,  2.2979531e+00, -6.5132994e-03, ...,
         2.5315709e-02, -4.6509734e-01, -5.6492598e-03],
       ...,
       [ 1.7791069e+00,  2.6776686e+00, -2.0510532e-02, ...,
         7.8728390e-01, -3.9734748e-01,  1.3013610e+00],
       [ 1.5107570e+00,  2.0946822e+00,  1.1377124e+00, ...,
         8.1175212e-03, -5.2531648e-01,  9.9224053e-02],
       [ 1.4871329e+00,  1.9591911e+00, -3.1830516e-02, ...,
         8.9999894e-03, -3.4401137e-01, -1.1019838e-03]], dtype=float32)
[12]:
# Identify the spatiotemporal patterns through the pseudo-Spatiotemporal Map (pSM)
sf.pseudo_Spatiotemporal_Map(pSM_values_save_filepath="./pSM_values.tsv", n_neighbors=20, resolution=1.0)
Performing pseudo-Spatiotemporal Map
pseudo-Spatiotemporal Map(pSM) calculation complete, pSM values of cells or spots saved at ./pSM_values.tsv!
[13]:
# Create a new column called 'pspace' from pSM values of cells or spots.
adata.obs['pspace'] = sf.pSM_values
[14]:
# Visualize spatial coordinates in a scatterplot colored by pspace
ax = sc.pl.embedding(adata,basis='spatial',color='pspace',show=False,color_map=cmp_pspace)
ax.axis('equal')
#plt.savefig('figures/DLPFC_pspace.png',bbox_inches='tight',transparent=True,dpi=400)
#plt.savefig('figures/DLPFC_pspace.pdf',bbox_inches='tight',transparent=True,dpi=400)


[14]:
(2755.35, 12383.65, 2191.15, 12197.85)
_images/Pseudo-spatiotemporal_analysis_Reproducibility_with_original_data_%28DLPFC%29_21_1.png
[15]:
# Visualize spatial coordinates in a scatterplot colored by Region
ax = sc.pl.embedding(adata,basis='spatial',color='Region',show=False,palette=cmp_domain)
ax.axis('equal')
#plt.savefig('figures/seqFISH_ct.png',bbox_inches='tight',transparent=True,dpi=400)
#plt.savefig('figures/seqFISH_ct.pdf',bbox_inches='tight',transparent=True,dpi=400)


[15]:
(2755.35, 12383.65, 2191.15, 12197.85)
_images/Pseudo-spatiotemporal_analysis_Reproducibility_with_original_data_%28DLPFC%29_22_1.png

Application with new data

This tutorial demonstrates how to perform pseudo-spatiotemporal analysis on BaristaSeq mouse visual cortex data using Pysodb and SpaceFlow.

A reference paper can be found at https://www.nature.com/articles/s41467-022-31739-w.

This tutorial follows the tutorial at https://github.com/hongleir/SpaceFlow/blob/master/tutorials/seqfish_mouse_embryogenesis.ipynb, with the data-loading step replaced by Pysodb.

Import packages and set configurations

[1]:
# Use the Python warnings module to filter and ignore any warnings that may occur in the program after this point.
import warnings
warnings.filterwarnings("ignore")
[2]:
# Import several python packages commonly used in data analysis and visualization.
# numpy (imported as np) is a package for numerical computing with arrays.
import numpy as np
# scanpy (imported as sc) is a package for single-cell RNA sequencing analysis.
import scanpy as sc
# matplotlib.pyplot (imported as plt) is a package for data visualization.
import matplotlib.pyplot as plt
# seaborn (imported as sns) is a package for statistical data visualization, providing high-level interfaces for creating informative and attractive visualizations.
import seaborn as sns
[3]:
# from SpaceFlow package import SpaceFlow module
from SpaceFlow import SpaceFlow
[4]:
# Import the palettable package
import palettable
# Create a diverging colormap for pseudotime and two qualitative color lists for domain and cell-type visualizations.
cmp_pspace = palettable.cartocolors.diverging.TealRose_7.mpl_colormap
cmp_domain = palettable.cartocolors.qualitative.Pastel_10.mpl_colors
cmp_ct = palettable.cartocolors.qualitative.Safe_10.mpl_colors

When encountering the error “No module named ‘palettable’”, users need to activate the conda environment in the terminal first and then run “pip install palettable”. This approach can be applied to other packages as well, by replacing ‘palettable’ with the name of the desired package.

Streamline development of loading spatial data with Pysodb

[5]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB databases.
# SODB is a format used to store data in memory-mapped files for efficient access and querying.
# This package allows users to interact with SODB files using Python.
import pysodb
[6]:
# Initialize the sodb object
sodb = pysodb.SODB()
[7]:
# Define names of the dataset_name and experiment_name
dataset_name = 'Sun2021Integrating'
experiment_name = 'Slice_1'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
#%%time
adata = sodb.load_experiment(dataset_name,experiment_name)
load experiment[Slice_1] in dataset[Sun2021Integrating]
[8]:
# Remove cells belonging to the layers 'outside_VISp' and 'VISp'
adata = adata[adata.obs['layer']!='outside_VISp']
adata = adata[adata.obs['layer']!='VISp']
[9]:
# Filter out genes detected in fewer than 3 cells
sc.pp.filter_genes(adata, min_cells=3)

Perform SpaceFlow for pseudo-spatiotemporal analysis

[10]:
# Create SpaceFlow Object
#%%time
#sf = SpaceFlow.SpaceFlow(adata=adata)
sf = SpaceFlow.SpaceFlow(
    count_matrix=adata.X,
    spatial_locs=adata.obsm['spatial'],
    sample_names=adata.obs_names,
    gene_names=adata.var_names
)

When encountering the error “The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().”, the user is advised to modify the init function in the “SpaceFlow.py” file of the SpaceFlow package: replace “elif count_matrix and spatial_locs:” with “elif count_matrix is not None and spatial_locs is not None:”, and change “if gene_names:” and “if sample_names:” to “if gene_names is not None:” and “if sample_names is not None:”, respectively. These modifications ensure that each if statement evaluates to a single boolean value.
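
The snippet below is a minimal, stand-alone illustration of why the comparison against None is needed; the arrays only mimic the SpaceFlow arguments and are not part of the package.

[ ]:
import numpy as np

count_matrix = np.ones((3, 2))   # stand-in for adata.X
spatial_locs = np.ones((3, 2))   # stand-in for adata.obsm['spatial']

# "if count_matrix:" raises the ambiguity error for multi-element arrays,
# whereas comparing against None always yields a single boolean value.
if count_matrix is not None and spatial_locs is not None:
    print('both inputs provided')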

[11]:
# Preprocess data
#%%time
sf.preprocessing_data(n_top_genes=3000)

When encountering the error “You can drop duplicate edges by setting the ‘duplicates’ kwarg”, modify the preprocessing_data function in “SpaceFlow.py” from the SpaceFlow package by (1) removing target_sum=1e4 from sc.pp.normalize_total() and (2) changing the flavor argument to ‘seurat’ in sc.pp.highly_variable_genes(), then (3) save the file and rerun the analysis.
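
A rough sketch of the adjusted calls is shown below on a toy AnnData object (random counts for illustration only; inside SpaceFlow the same changes are applied to its internal adata).

[ ]:
import numpy as np
import scanpy as sc
import anndata as ad

# Toy AnnData standing in for SpaceFlow's input (random counts, illustration only).
adata_toy = ad.AnnData(np.random.poisson(1.0, size=(100, 200)).astype(np.float32))

sc.pp.normalize_total(adata_toy)       # target_sum=1e4 removed
sc.pp.log1p(adata_toy)                 # log-transform so the 'seurat' flavor works on log data
sc.pp.highly_variable_genes(adata_toy, flavor='seurat', n_top_genes=50)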

When encountering the error “module ‘networkx’ has no attribute ‘to_scipy_sparse_matrix’”, users should first activate the virtual environment in the terminal and then downgrade NetworkX with the command “pip install networkx==2.8”. This ensures that a compatible version of NetworkX is installed within the specified virtual environment.

[12]:
# Train a deep graph network model
#%%time
sf.train(
    spatial_regularization_strength=0.1,
    z_dim=50,
    lr=1e-3,
    epochs=1000,
    max_patience=50,
    min_stop=100,
    random_seed=42,
    gpu=0,
    regularization_acceleration=True,
    edge_subset_sz=1000000
)
Epoch 2/1000, Loss: 1.4427732229232788
Epoch 12/1000, Loss: 1.404854416847229
Epoch 22/1000, Loss: 1.355185866355896
Epoch 32/1000, Loss: 1.278242826461792
Epoch 42/1000, Loss: 1.1435610055923462
Epoch 52/1000, Loss: 0.9432040452957153
Epoch 62/1000, Loss: 0.714905321598053
Epoch 72/1000, Loss: 0.5822354555130005
Epoch 82/1000, Loss: 0.5121979713439941
Epoch 92/1000, Loss: 0.425785630941391
Epoch 102/1000, Loss: 0.37741926312446594
Epoch 112/1000, Loss: 0.3343759775161743
Epoch 122/1000, Loss: 0.3119865655899048
Epoch 132/1000, Loss: 0.26779788732528687
Epoch 142/1000, Loss: 0.22297403216362
Epoch 152/1000, Loss: 0.28504329919815063
Epoch 162/1000, Loss: 0.22740697860717773
Epoch 172/1000, Loss: 0.234877809882164
Epoch 182/1000, Loss: 0.1949552297592163
Epoch 192/1000, Loss: 0.20016708970069885
Epoch 202/1000, Loss: 0.2028239667415619
Epoch 212/1000, Loss: 0.173057422041893
Epoch 222/1000, Loss: 0.21973812580108643
Epoch 232/1000, Loss: 0.17184185981750488
Epoch 242/1000, Loss: 0.2074703425168991
Epoch 252/1000, Loss: 0.19310833513736725
Epoch 262/1000, Loss: 0.2128731906414032
Epoch 272/1000, Loss: 0.17560149729251862
Epoch 282/1000, Loss: 0.2080163210630417
Epoch 292/1000, Loss: 0.18244342505931854
Epoch 302/1000, Loss: 0.17130610346794128
Epoch 312/1000, Loss: 0.16225826740264893
Epoch 322/1000, Loss: 0.15506717562675476
Epoch 332/1000, Loss: 0.1312013417482376
Epoch 342/1000, Loss: 0.14738863706588745
Epoch 352/1000, Loss: 0.1790708303451538
Epoch 362/1000, Loss: 0.1254740208387375
Epoch 372/1000, Loss: 0.1862727850675583
Epoch 382/1000, Loss: 0.17113451659679413
Epoch 392/1000, Loss: 0.141239196062088
Epoch 402/1000, Loss: 0.11042129248380661
Epoch 412/1000, Loss: 0.1695185899734497
Epoch 422/1000, Loss: 0.11782366037368774
Epoch 432/1000, Loss: 0.14781805872917175
Epoch 442/1000, Loss: 0.17524565756320953
Epoch 452/1000, Loss: 0.13630664348602295
Epoch 462/1000, Loss: 0.15702477097511292
Epoch 472/1000, Loss: 0.11048941314220428
Epoch 482/1000, Loss: 0.12970110774040222
Epoch 492/1000, Loss: 0.15136636793613434
Epoch 502/1000, Loss: 0.10564137995243073
Epoch 512/1000, Loss: 0.14658908545970917
Epoch 522/1000, Loss: 0.11960890889167786
Epoch 532/1000, Loss: 0.13005761802196503
Epoch 542/1000, Loss: 0.11053520441055298
Epoch 552/1000, Loss: 0.11907373368740082
Epoch 562/1000, Loss: 0.1232338398694992
Epoch 572/1000, Loss: 0.11129128932952881
Epoch 582/1000, Loss: 0.10171715170145035
Epoch 592/1000, Loss: 0.09877597540616989
Epoch 602/1000, Loss: 0.12555311620235443
Epoch 612/1000, Loss: 0.10553622245788574
Epoch 622/1000, Loss: 0.13101576268672943
Epoch 632/1000, Loss: 0.1205601841211319
Training complete!
Embedding is saved at ./embedding.tsv
[12]:
array([[ 0.8899192 ,  0.01435345,  0.7866027 , ...,  0.5507234 ,
        -0.03758647, -0.08572218],
       [ 0.8003285 ,  0.00401396,  0.72124624, ...,  0.42994535,
        -0.04228881, -0.06209074],
       [ 0.8963664 ,  0.01202957,  0.7585074 , ...,  0.59778905,
        -0.03772109, -0.08880965],
       ...,
       [ 0.48202616,  0.01098088,  0.6471445 , ..., -0.00690297,
         0.73529345,  0.32098433],
       [ 0.5201945 ,  0.00505677,  0.6052369 , ..., -0.00182656,
         0.46181548,  0.19144082],
       [ 0.4544619 ,  0.00249925,  0.5253095 , ..., -0.00313081,
         0.6240246 ,  0.4014499 ]], dtype=float32)
[13]:
# Identify the spatiotemporal patterns through the pseudo-Spatiotemporal Map (pSM)
sf.pseudo_Spatiotemporal_Map(pSM_values_save_filepath="./pSM_values.tsv", n_neighbors=20, resolution=1.0)

Performing pseudo-Spatiotemporal Map
pseudo-Spatiotemporal Map(pSM) calculation complete, pSM values of cells or spots saved at ./pSM_values.tsv!
[14]:
# Create a new column called 'pspace' from pSM values
adata.obs['pspace'] = np.array(sf.pSM_values)
[15]:
# Create a UMAP projection from SpaceFlow's embedding
adata.obsm['embedding'] = sf.embedding
sc.pp.neighbors(adata, n_neighbors=20, use_rep='embedding')
sc.tl.umap(adata)
[16]:
# Since this dataset contains 'depth_um' in obs, the iroot is set to the cell with the smallest depth_um.
# For datasets without depth information, one can use the pSM values computed by pseudo_Spatiotemporal_Map instead.
# Here the iroot is set according to 'depth_um'.

# Select the root cell for trajectory inference based on its depth, by setting the index of the cell with the smallest 'depth_um' value as the root
adata.uns['iroot'] = np.argmin(adata.obs['depth_um'])
sc.tl.diffmap(adata)
sc.tl.dpt(adata)
[17]:
# Plot spatial embedding and UMAP embedding for diffusion pseudotime and layer(label), respectively

# si = 'Slice_1'

ax = sc.pl.embedding(adata,basis='spatial',color=['dpt_pseudotime'],show=False,color_map=cmp_pspace)
ax.axis('equal')
# plt.savefig(f'../figures/pspace/BaristaSeq_{si}_pspace.png',bbox_inches='tight',transparent=True,dpi=400)
# plt.savefig(f'../figures/pspace/BaristaSeq_{si}_pspace.pdf',bbox_inches='tight',transparent=True,dpi=400)

ax = sc.pl.embedding(adata,basis='spatial',color='layer',show=False,palette=cmp_domain)
ax.axis('equal')
# plt.savefig(f'../figures/pspace/BaristaSeq_{si}_domain.png',bbox_inches='tight',transparent=True,dpi=400)
# plt.savefig(f'../figures/pspace/BaristaSeq_{si}_domain.pdf',bbox_inches='tight',transparent=True,dpi=400)

fig, ax = plt.subplots()
fig.set_size_inches(4, 4)
sc.pl.embedding(adata,basis='X_umap',color='layer',show=False,palette=cmp_domain,ax=ax)
#ax.axis('equal')
# plt.savefig(f'../figures/pspace/BaristaSeq_{si}_UMAP_domain.png',bbox_inches='tight',transparent=True,dpi=400)
# plt.savefig(f'../figures/pspace/BaristaSeq_{si}_UMAP_domain.pdf',bbox_inches='tight',transparent=True,dpi=400)

fig, ax = plt.subplots()
fig.set_size_inches(4, 4)
sc.pl.embedding(adata,basis='X_umap',color=['dpt_pseudotime'],show=False,color_map=cmp_pspace,ax=ax)
#ax.axis('equal')
# plt.savefig(f'../figures/pspace/BaristaSeq_{si}_UMAP_pspace.png',bbox_inches='tight',transparent=True,dpi=400)
# plt.savefig(f'../figures/pspace/BaristaSeq_{si}_UMAP_pspace.pdf',bbox_inches='tight',transparent=True,dpi=400)
[17]:
<Axes: title={'center': 'dpt_pseudotime'}, xlabel='X_umap1', ylabel='X_umap2'>
_images/Pseudo-spatiotemporal_analysis_Application_with_new_data_24_1.png
_images/Pseudo-spatiotemporal_analysis_Application_with_new_data_24_2.png
_images/Pseudo-spatiotemporal_analysis_Application_with_new_data_24_3.png
_images/Pseudo-spatiotemporal_analysis_Application_with_new_data_24_4.png
[18]:
# Check whether the inferred pseudo-spatiotemporal value (dpt_pseudotime) increases across the layers of the cortex
adata_use = adata
fig,ax = plt.subplots(figsize=(4,4))
sns.violinplot(data=adata_use.obs,x='layer',y='dpt_pseudotime',palette = adata_use.uns['layer_colors'],ax=ax)
# plt.savefig(f'../figures/pspace/BaristaSeq_{si}_violin.png',bbox_inches='tight',transparent=True,dpi=400)
# plt.savefig(f'../figures/pspace/BaristaSeq_{si}_violin.pdf',bbox_inches='tight',transparent=True,dpi=400)
[18]:
<Axes: xlabel='layer', ylabel='dpt_pseudotime'>
_images/Pseudo-spatiotemporal_analysis_Application_with_new_data_25_1.png
[19]:
adata.obs
[19]:
Slice x y Dist to pia Dist to bottom Angle unused-1 unused-2 x_um y_um depth_um layer leiden pspace dpt_pseudotime
20022 1 12186.20 8617.70 1029.630 205.270 174.894 0 0 1218.620 861.770 1093.752193 VISp_VI 1 0.911279 0.903298
20023 1 12789.10 8690.82 1072.540 168.011 172.700 0 0 1278.910 869.082 1139.629730 VISp_VI 4 0.892462 0.882942
20024 1 11927.80 8715.20 1003.280 231.018 177.522 0 0 1192.780 871.520 1066.880345 VISp_VI 1 0.928142 0.921913
20025 1 12860.60 8729.82 1075.760 165.770 172.700 0 0 1286.060 872.982 1143.373530 VISp_VI 4 0.895546 0.886348
20027 1 12587.60 8760.70 1052.380 187.131 172.700 0 0 1258.760 876.070 1119.019231 VISp_VI 3 0.909229 0.901504
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
230184 1 3239.60 8952.13 307.763 971.248 178.930 0 0 323.960 895.213 333.538252 VISp_II/III 4 0.715771 0.520664
230186 1 2713.10 9006.56 261.052 1018.700 177.579 0 0 271.310 900.656 286.855966 VISp_II/III 2 0.703140 0.509154
230187 1 2193.10 9009.00 214.572 1063.490 172.257 0 0 219.310 900.900 243.621417 VISp_II/III 0 0.783840 0.606285
230188 1 2405.97 9015.50 234.015 1045.450 171.641 0 0 240.597 901.550 260.900559 VISp_II/III 0 0.727675 0.530846
230192 1 2963.35 9161.75 273.140 1005.830 178.946 0 0 296.335 916.175 298.913498 VISp_II/III 0 0.720848 0.534821

1525 rows × 15 columns

[20]:
adata_use.uns['layer_colors']
[20]:
['#66c5cc', '#f6cf71', '#f89c74', '#dcb0f2', '#87c55f', '#9eb9f3']
[21]:
# Check whether the inferred pseudo-spatiotemporal value (dpt_pseudotime) is correlated with cortical depth (depth_um)
adata_use = adata
g = sns.jointplot(x="depth_um", y="dpt_pseudotime", data=adata_use.obs,hue='layer',
                  palette=list(adata_use.uns['layer_colors']),
                  # kind="reg",
                  # truncate=False,
                  # xlim=(0, 60), ylim=(0, 12),
                  # color="m",
                  height=7)
g.ax_joint.legend_.remove()
# plt.savefig(f'../figures/pspace/BaristaSeq_{si}_jointplot.png',bbox_inches='tight',transparent=True,dpi=400)
# plt.savefig(f'../figures/pspace/BaristaSeq_{si}_jointplot.pdf',bbox_inches='tight',transparent=True,dpi=400)
_images/Pseudo-spatiotemporal_analysis_Application_with_new_data_28_0.png

Spatial data integration

Installation

This tutorial demonstrates how to install Pysodb alongside a method used for spatial data integration.

Using STAGATE as an example, install Pysodb in the same installation environment.

Reference tutorials can be found at https://github.com/QIFEIDKN/STAGATE_pyG and https://github.com/TencentAILabHealthcare/pysodb.

Installing softwares and tools

1. The first step is to install Visual Studio Code, Conda, Jupyter notebook and CUDA in advance.

Reference tutorials present how to install Visual Studio Code, Conda, Jupyter notebook and CUDA, respectively; they can be found at https://code.visualstudio.com/Docs/setup/setup-overview, https://code.visualstudio.com/docs/python/environments#_activating-an-environment-in-the-terminal, https://code.visualstudio.com/docs/datascience/data-science-tutorial and https://developer.nvidia.com/cuda-downloads.

The TensorFlow implementation of STAGATE is based on TensorFlow 1.15.0. Since TensorFlow 1 is no longer maintained and many algorithms are now developed on TensorFlow 2, STAGATE_pyG is selected and demonstrated for installation in this tutorial.

Since STAGATE_pyG is based on the pyG (PyTorch Geometric) framework, it requires a GPU and the CUDA toolkit. An installation tutorial is available at https://developer.nvidia.com/cuda-downloads.

2. Launch Visual Studio Code and open a terminal window.

Henceforth, various packages or modules will be installed via the command line

Installation STAGATE_pyG

3. Select the installation path and open it
[ ]:
cd <path>
4. Clone STAGATE_pyG code
[ ]:
git clone https://github.com/QIFEIDKN/STAGATE_pyG.git

If cloning the code fails through git, please download it at https://github.com/QIFEIDKN/STAGATE_pyG, upload it to the folder created above, and extract it.

5. Open the STAGATE_pyG directory
[ ]:
cd STAGATE_pyG
6. Create a conda environment
[ ]:
conda create -n <environment_name> python=3.8
7. Activate a conda environment

Run the following command on the terminal to activate the conda environment:

[ ]:
conda activate <environment_name>
8. Install torch with CUDA

e.g. pip install torch==1.13.0+cu117 -f https://download.pytorch.org/whl/cu117/torch_stable.html

[ ]:
pip install torch==<torch_version>+<cuda_version> -f https://download.pytorch.org/whl/<cuda_version>/torch_stable.html
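
To verify that the GPU build of PyTorch works in the environment (an optional check, not part of the original instructions), run the following in Python:

[ ]:
import torch
print(torch.__version__)          # e.g. 1.13.0+cu117
print(torch.cuda.is_available())  # should print True when CUDA is set up correctly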
9. Install PyTorch Geometric

Install PyTorch Geometric according to the versions of PyTorch and CUDA and your operating system. Select the packages and copy the run command from https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html to execute in your terminal.

e.g. pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv torch_geometric -f https://data.pyg.org/whl/torch-1.13.0+cu117.html

10. Install other python packages from requirement.txt
[ ]:
pip install -r requirement.txt

11. Install palettable package
[ ]:
pip install palettable
12. Install harmonypy package
[ ]:
pip install harmonypy

Installation Pysodb

Keep the conda environment active

13. Clone Pysodb code
[ ]:
git clone https://github.com/TencentAILabHealthcare/pysodb.git

If cloning the code fails through git, please download it at https://github.com/TencentAILabHealthcare/pysodb, upload it to the folder created above, and extract it.

14. Open the Pysodb directory
[ ]:
cd pysodb
15. Install a Pysodb package from source code
[ ]:
python setup.py install

If “error: urllib3 2.0.0a3 is installed but urllib3<1.27,>=1.21.1 is required by {‘requests’}” appears, users can execute the following commands in order.

[ ]:
pip install 'urllib3>=1.21.1,<1.27'
python setup.py install
[1]:
print('finish!')
finish!

Reproducibility with original data

This tutorial demonstrates how to perform spatial data integration on Stereo-seq and Slide-seqV2 mouse olfactory bulb data using Pysodb and STAGATE, based on the pyG (PyTorch Geometric) framework.

A reference paper can be found at https://www.nature.com/articles/s41467-022-29439-6.

This tutorial follows the tutorial at https://stagate.readthedocs.io/en/latest/AT2.html, with the data-loading step replaced by Pysodb.

Import packages and set configurations

[1]:
# Use the Python warnings module to filter and ignore any warnings that may occur in the program after this point.
import warnings
warnings.filterwarnings("ignore")
[2]:
# Import several Python packages commonly used in data analysis and visualization:
# pandas (imported as pd) is a package for data manipulation and analysis
import pandas as pd
# numpy (imported as np) is a package for numerical computing with arrays
import numpy as np
# scanpy (imported as sc) is a package for single-cell RNA sequencing analysis
import scanpy as sc
import scanpy.external as sce
# matplotlib.pyplot (imported as plt) is a package for data visualization
import matplotlib.pyplot as plt
# Anndata is a package for working with annotated data.
import anndata as ad
[3]:
# Import a STAGATE_pyG module
import STAGATE_pyG as STAGATE

If users encounter the error “No module named ‘STAGATE_pyG’” when trying to import the STAGATE_pyG package, first ensure that the “STAGATE_pyG” folder is located in the current script’s directory.
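
Alternatively, the path of the cloned repository can be added to sys.path before importing. This is only a sketch; the path below is a placeholder and it assumes the clone contains the STAGATE_pyG package folder.

[ ]:
import sys
# Placeholder path to the directory that contains the STAGATE_pyG package folder.
sys.path.append('/path/to/STAGATE_pyG')
import STAGATE_pyG as STAGATE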

[4]:
# Import the palettable package
import palettable
# Create two color lists: one for cluster visualizations and one for the sequencing technologies.
cmp_old = palettable.cartocolors.qualitative.Bold_10.mpl_colors
cmp_old_biotech = palettable.cartocolors.qualitative.Safe_4.mpl_colors

Streamline development of loading spatial data with Pysodb

[5]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB databases.
# SODB is a format used to store data in memory-mapped files for efficient access and querying.
# This package allows users to interact with SODB files using Python.
import pysodb
[6]:
# Initialization
sodb = pysodb.SODB()
load Slide-seqV2
[7]:
# Define names of dataset_name and experiment_name
dataset_name = 'stickels2020highly'
experiment_name = 'stickels2021highly_SlideSeqV2_Mouse_Olfactory_bulb_Puck_200127_15'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name,experiment_name)
load experiment[stickels2021highly_SlideSeqV2_Mouse_Olfactory_bulb_Puck_200127_15] in dataset[stickels2020highly]
[8]:
# The following steps can be skipped, as the barcode information has already been added to the 'stickels2020highly' adata obtained through pysodb.
# Downloaded from https://drive.google.com/drive/folders/10lhz5VY7YfvHrtV40MwaqLmWz56U9eBP?usp=sharing
#used_barcode = pd.read_csv('data/used_barcodes.txt', sep='\t', header=None)
#used_barcode = used_barcode[0]
[9]:
#adata = adata[used_barcode,]
[10]:
# Filter genes to retain only those present in at least 50 cells
sc.pp.filter_genes(adata, min_cells=50)
print('After filtering: ', adata.shape)
After filtering:  (20139, 11750)
[11]:
adata
[11]:
AnnData object with n_obs × n_vars = 20139 × 11750
    obs: 'leiden'
    var: 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'n_cells'
    uns: 'hvg', 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial_neighbors', 'umap'
    obsm: 'X_pca', 'X_umap', 'spatial'
    varm: 'PCs'
    obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances'
[12]:
# Create a dictionary named adata_list
adata_list = {}
[13]:
# Append a '_SlideSeqV2' suffix to the observation names and save the object in the dictionary under the key 'SlideSeqV2'.
adata.obs_names = [x+'_SlideSeqV2' for x in adata.obs_names]
adata_list['SlideSeqV2'] = adata.copy()
load Stereo-seq
[14]:
# Define names of another dataset_name and experiment_name
dataset_name = 'Fu2021Unsupervised'
experiment_name = 'StereoSeq_MOB'
# Load another specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name,experiment_name)
load experiment[StereoSeq_MOB] in dataset[Fu2021Unsupervised]
[15]:
# Filter out genes detected in fewer than 50 cells
sc.pp.filter_genes(adata, min_cells=50)
print('After filtering: ', adata.shape)
After filtering:  (19109, 14376)
[16]:
# Append a '_StereoSeq' suffix to the observation names and save the object in the dictionary under the key 'StereoSeq'
adata.obs_names = [x+'_StereoSeq' for x in adata.obs_names]
adata_list['StereoSeq'] = adata.copy()

Constructing the spatial network for each section

[17]:
# Use "STAGATE_pyG.Cal_Spatial_Net" to calculate a spatial graph with a radius cutoff of 50 for adata_list['SlideSeqV2']
STAGATE.Cal_Spatial_Net(adata_list['SlideSeqV2'], rad_cutoff=50)
# Use "STAGATE_pyG.Stats_Spatial_Net" to summarize cells and edges information for adata_list['SlideSeqV2']
STAGATE.Stats_Spatial_Net(adata_list['SlideSeqV2'])
------Calculating spatial graph...
The graph contains 228300 edges, 20139 cells.
11.3362 neighbors per cell on average.
_images/Spatial_data_integration_Reproducibility_with_original_data_24_1.png
[18]:
# Use "STAGATE_pyG.Cal_Spatial_Net" to calculate a spatial graph with a radius cutoff of 50 for adata_list['StereoSeq']
STAGATE.Cal_Spatial_Net(adata_list['StereoSeq'], rad_cutoff=50)
# Use "STAGATE_pyG.Stats_Spatial_Net" to summarize cells and edges information for adata_list['StereoSeq']
STAGATE.Stats_Spatial_Net(adata_list['StereoSeq'])
------Calculating spatial graph...
The graph contains 144318 edges, 19109 cells.
7.5524 neighbors per cell on average.
_images/Spatial_data_integration_Reproducibility_with_original_data_25_1.png
[19]:
adata_list['SlideSeqV2'].uns['Spatial_Net']
[19]:
Cell1 Cell2 Distance
0 AAAAAAACAAAAGG_SlideSeqV2 CTCCGGGCTCTTCA_SlideSeqV2 44.777226
1 AAAAAAACAAAAGG_SlideSeqV2 ATAAGTTGCCCCGT_SlideSeqV2 41.494698
2 AAAAAAACAAAAGG_SlideSeqV2 CCAGCAAAGCTACA_SlideSeqV2 29.429237
3 AAAAAAACAAAAGG_SlideSeqV2 CCTCCTTAACGTTA_SlideSeqV2 33.634060
4 AAAAAAACAAAAGG_SlideSeqV2 ACGTTCGCTCATAT_SlideSeqV2 15.307514
... ... ... ...
9 TTTTTTTTTTTTAT_SlideSeqV2 CTGACTTTAATCTA_SlideSeqV2 46.076567
10 TTTTTTTTTTTTAT_SlideSeqV2 CCTATAACAGCCTG_SlideSeqV2 30.802922
11 TTTTTTTTTTTTAT_SlideSeqV2 CTTGGGCATATAAG_SlideSeqV2 37.316216
12 TTTTTTTTTTTTAT_SlideSeqV2 CGGCAGGGATCCCT_SlideSeqV2 47.548291
13 TTTTTTTTTTTTAT_SlideSeqV2 TGGCAGGGATCCCT_SlideSeqV2 47.594643

228300 rows × 3 columns

[20]:
# Concatenate 'SlideSeqV2' and 'StereoSeq' into a single AnnData object named 'adata'
adata = sc.concat([adata_list['SlideSeqV2'], adata_list['StereoSeq']], keys=None)
[21]:
# Concatenate two 'Spatial_Net'
adata.uns['Spatial_Net'] = pd.concat([adata_list['SlideSeqV2'].uns['Spatial_Net'], adata_list['StereoSeq'].uns['Spatial_Net']])
[22]:
# Use "STAGATE_pyG.Stats_Spatial_Net" to summarize cells and edges information for whole adata
STAGATE.Stats_Spatial_Net(adata)
_images/Spatial_data_integration_Reproducibility_with_original_data_29_0.png
[23]:
# Normalization
sc.pp.highly_variable_genes(adata, flavor="seurat_v3", n_top_genes=3000)
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
[24]:
adata
[24]:
AnnData object with n_obs × n_vars = 39248 × 10782
    obs: 'leiden'
    var: 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm'
    uns: 'Spatial_Net', 'hvg', 'log1p'
    obsm: 'X_pca', 'X_umap', 'spatial'

Running STAGATE

[25]:
adata = STAGATE.train_STAGATE(adata, n_epochs=500, device='cpu')
Size of Input:  (39248, 3000)
100%|██████████| 500/500 [1:41:41<00:00, 12.20s/it]

Spatial Clustering

[26]:
# Create a new column 'Tech' by splitting each name and selecting the last element
adata.obs['Tech'] = [x.split('_')[-1] for x in adata.obs_names]
[27]:
# Calculate neighbors in the 'STAGATE' representation, apply UMAP, and perform Louvain clustering
sc.pp.neighbors(adata, use_rep='STAGATE')
sc.tl.umap(adata)
sc.tl.louvain(adata,resolution=0.3)

When encountering the error “No module named ‘igraph’”, users should activate the virtual environment in the terminal and execute “pip install igraph”.

When encountering the error “No module named ‘louvain’”, users should activate the virtual environment in the terminal and execute “pip install louvain”.
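
If installing the louvain package is not an option, Leiden clustering (used later in this documentation and also available through Scanpy) can be substituted, as sketched below; it requires the leidenalg package instead.

[ ]:
# Alternative (sketch): Leiden clustering in place of Louvain (requires leidenalg).
sc.tl.leiden(adata, resolution=0.3)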

[28]:
# Plot a UMAP projection
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.umap(adata, color='Tech', title='Unintegrated',show=False,palette=cmp_old_biotech)
#plt.savefig('figures/old_before_umap.png',dpi=400,transparent=True,bbox_inches='tight')
#plt.savefig('figures/old_before_umap.pdf',dpi=400,transparent=True,bbox_inches='tight')
[28]:
<Axes: title={'center': 'Unintegrated'}, xlabel='UMAP1', ylabel='UMAP2'>
_images/Spatial_data_integration_Reproducibility_with_original_data_38_1.png
[29]:
# Generate a plot of the UMAP embedding colored by louvain
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.umap(adata, color='louvain',show=False,palette=cmp_old)
#plt.savefig('figures/old_before_umap_leiden.png',dpi=400,transparent=True,bbox_inches='tight')
#plt.savefig('figures/old_before_umap_leiden.pdf',dpi=400,transparent=True,bbox_inches='tight')
[29]:
<Axes: title={'center': 'louvain'}, xlabel='UMAP1', ylabel='UMAP2'>
_images/Spatial_data_integration_Reproducibility_with_original_data_39_1.png
[30]:
# Display spatial distribution of cells colored by louvain clustering for two sequencing technologies ('StereoSeq' and 'SlideSeqV2')
fig, axs = plt.subplots(1, 2, figsize=(6, 3))
it=0
for temp_tech in ['StereoSeq', 'SlideSeqV2']:
    temp_adata = adata[adata.obs['Tech']==temp_tech, ]
    if it == 1:
        sc.pl.embedding(temp_adata, basis="spatial", color="louvain",s=6, ax=axs[it],
                        show=False, title=temp_tech)
    else:
        sc.pl.embedding(temp_adata, basis="spatial", color="louvain",s=6, ax=axs[it], legend_loc=None,
                        show=False, title=temp_tech)
    it+=1
#plt.savefig('figures/old_before_spatial_leiden0.3.png',dpi=400,transparent=True,bbox_inches='tight')
#plt.savefig('figures/old_before_spatial_leiden0.3.pdf',dpi=400,transparent=True,bbox_inches='tight')
_images/Spatial_data_integration_Reproducibility_with_original_data_40_0.png

Perform Harmony for spatial data integration

Harmony is an algorithm for integrating multiple high-dimensional datasets. Reference documentation is available at https://github.com/slowkow/harmonypy and https://pypi.org/project/harmonypy/.

[31]:
# Import harmonypy package
import harmonypy as hm
[32]:
# Use the STAGATE representation as the data matrix and adata.obs as the metadata for Harmony
data_mat = adata.obsm['STAGATE'].copy()
meta_data = adata.obs.copy()
[33]:
# Run harmony for STAGATE representation
ho = hm.run_harmony(data_mat, meta_data, ['Tech'])
2023-04-03 11:20:28,820 - harmonypy - INFO - Computing initial centroids with sklearn.KMeans...
2023-04-03 11:20:34,852 - harmonypy - INFO - sklearn.KMeans initialization complete.
2023-04-03 11:20:34,995 - harmonypy - INFO - Iteration 1 of 10
2023-04-03 11:20:42,857 - harmonypy - INFO - Iteration 2 of 10
2023-04-03 11:20:50,987 - harmonypy - INFO - Iteration 3 of 10
2023-04-03 11:20:58,029 - harmonypy - INFO - Converged after 3 iterations
[34]:
# Store the Harmony-adjusted embedding in a DataFrame with columns labelled by cell names.
res = pd.DataFrame(ho.Z_corr)
res.columns = adata.obs_names
[35]:
# Create a new AnnData object adata_Harmony from the transpose of the res matrix (cells × dimensions)
adata_Harmony = sc.AnnData(res.T)
[36]:
adata_Harmony.obsm['spatial'] = pd.DataFrame(adata.obsm['spatial'], index=adata.obs_names).loc[adata_Harmony.obs_names,].values
adata_Harmony.obs['Tech'] = adata.obs.loc[adata_Harmony.obs_names, 'Tech']
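
As an alternative to building a separate AnnData object, the Harmony-corrected embedding can also be stored back in the original adata and used directly for neighbors and UMAP. This is only a sketch; the obsm key name 'STAGATE_Harmony' is arbitrary and not part of the original tutorial.

[ ]:
# Sketch: keep the Harmony-corrected embedding in the existing AnnData object.
# ho.Z_corr has shape (n_dims, n_cells), so transpose it to cells x dims first.
adata.obsm['STAGATE_Harmony'] = np.asarray(ho.Z_corr).T
sc.pp.neighbors(adata, use_rep='STAGATE_Harmony')
sc.tl.umap(adata)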

Spatial Clustering after integration

[37]:
# Calculate neighbors, apply UMAP, and perform louvain clustering for the integrated data
sc.pp.neighbors(adata_Harmony)
sc.tl.umap(adata_Harmony)
sc.tl.louvain(adata_Harmony, resolution=0.3)
[38]:
# Plot a UMAP embedding colored by technology after integration
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.umap(adata_Harmony, color='Tech', title='STAGATE + Harmony',show=False,palette=cmp_old_biotech)
#plt.savefig('figures/old_after_umap.png',dpi=400,transparent=True,bbox_inches='tight')
#plt.savefig('figures/old_after_umap.pdf',dpi=400,transparent=True,bbox_inches='tight')
[38]:
<Axes: title={'center': 'STAGATE + Harmony'}, xlabel='UMAP1', ylabel='UMAP2'>
_images/Spatial_data_integration_Reproducibility_with_original_data_51_1.png
[39]:
# Plot the UMAP embedding colored by Louvain clustering after integration
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.umap(adata_Harmony, color='louvain',show=False,palette=cmp_old)
#plt.savefig('figures/old_after_umap_leiden0.4.png',dpi=400,transparent=True,bbox_inches='tight')
#plt.savefig('figures/old_after_umap_leiden0.4.pdf',dpi=400,transparent=True,bbox_inches='tight')
[39]:
<Axes: title={'center': 'louvain'}, xlabel='UMAP1', ylabel='UMAP2'>
_images/Spatial_data_integration_Reproducibility_with_original_data_52_1.png
[40]:
# Display spatial distribution of cells colored by louvain clustering for two sequencing technologies ('StereoSeq' and 'SlideSeqV2') after integration
fig, axs = plt.subplots(1, 2, figsize=(6, 3))
it=0
for temp_tech in ['StereoSeq', 'SlideSeqV2']:
    temp_adata = adata_Harmony[adata_Harmony.obs['Tech']==temp_tech, ]
    if it == 1:
        sc.pl.embedding(temp_adata, basis="spatial", color="louvain",s=6, ax=axs[it],
                        show=False, title=temp_tech)
    else:
        sc.pl.embedding(temp_adata, basis="spatial", color="louvain",s=6, ax=axs[it], legend_loc=None,
                        show=False, title=temp_tech)
    it+=1
#plt.savefig('figures/old_after_spatial_leiden0.4.png',dpi=400,transparent=True,bbox_inches='tight')
#plt.savefig('figures/old_after_spatial_leiden0.4.pdf',dpi=400,transparent=True,bbox_inches='tight')
_images/Spatial_data_integration_Reproducibility_with_original_data_53_0.png

Application with new data

This tutorial demonstrates how to perform spatial data integration on new MERFISH and STARmap mouse visual cortex data using Pysodb and STAGATE, based on the pyG (PyTorch Geometric) framework.

A reference paper can be found at https://www.nature.com/articles/s41467-022-29439-6.

This tutorial follows the tutorial at https://stagate.readthedocs.io/en/latest/AT2.html, with the data-loading step replaced by Pysodb.

Import packages and set configurations

[1]:
# Use the Python warnings module to filter and ignore any warnings that may occur in the program after this point
import warnings
warnings.filterwarnings("ignore")
[2]:
# Import several Python packages commonly used in data analysis and visualization:
# pandas (imported as pd) is a package for data manipulation and analysis
import pandas as pd
# numpy (imported as np) is a package for numerical computing with arrays
import numpy as np
# scanpy (imported as sc) is a package for single-cell RNA sequencing analysis
import scanpy as sc
# matplotlib.pyplot (imported as plt) is a package for data visualization
import matplotlib.pyplot as plt
# Seaborn is a package for statistical data visualization
import seaborn as sns
[3]:
# Import a STAGATE_pyG package
import STAGATE_pyG as STAGATE

If users encounter the error “No module named ‘STAGATE_pyG’” when trying to import the STAGATE_pyG package, first ensure that the “STAGATE_pyG” folder is located in the current script’s directory.

[4]:
# Import the palettable package
import palettable
# Create two color lists: one for cluster visualizations and one for the sequencing technologies.
cmp_new = palettable.cartocolors.qualitative.Pastel_10.mpl_colors
cmp_new_biotech = palettable.cartocolors.qualitative.Safe_4.mpl_colors

Streamline development of loading spatial data with Pysodb

[5]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB databases.
# SODB is a format used to store data in memory-mapped files for efficient access and querying.
# This package allows users to interact with SODB files using Python.
import pysodb
[6]:
# Initialization
sodb = pysodb.SODB()
[7]:
# Load new MERFISH and STARmap mouse visual cortex data

adata_merfish = sodb.load_experiment('Merfish_Visp','mouse_VISp')
adata_STARmap = sodb.load_experiment('Wang2018Three_1k','mouse_brain_STARmap')

adata_merfish.obs['Tech'] = 'MERFISH'
adata_STARmap.obs['Tech'] = 'STARmap'
load experiment[mouse_VISp] in dataset[Merfish_Visp]
load experiment[mouse_brain_STARmap] in dataset[Wang2018Three_1k]
[8]:
adata_list = {
    'MERFISH':adata_merfish,
    'STARmap':adata_STARmap
}

Constructing the spatial network for each section

[9]:
# Use "STAGATE_pyG.Cal_Spatial_Net" to calculate a spatial graph with a radius cutoff of 50 for adata_list['MERFISH']
STAGATE.Cal_Spatial_Net(adata_list['MERFISH'], rad_cutoff=50)
# Use "STAGATE_pyG.Stats_Spatial_Net" to summarize cells and edges information for adata_list['MERFISH']
STAGATE.Stats_Spatial_Net(adata_list['MERFISH'])
------Calculating spatial graph...
The graph contains 19162 edges, 2399 cells.
7.9875 neighbors per cell on average.
_images/Spatial_data_integration_Application_with_new_data_14_1.png
[10]:
# Use "STAGATE_pyG.Cal_Spatial_Net" to calculate a spatial graph with a radius cutoff of 50 for adata_list['STARmap']
STAGATE.Cal_Spatial_Net(adata_list['STARmap'], rad_cutoff=400)
# Use "STAGATE_pyG.Stats_Spatial_Net" to summarize cells and edges information for adata_list['STARmap']
STAGATE.Stats_Spatial_Net(adata_list['STARmap'])
------Calculating spatial graph...
The graph contains 6990 edges, 930 cells.
7.5161 neighbors per cell on average.
_images/Spatial_data_integration_Application_with_new_data_15_1.png
[11]:
# Concatenate 'MERFISH' and 'STARmap' into a single AnnData object named 'adata'
adata = sc.concat([adata_list['MERFISH'], adata_list['STARmap']], keys=None)
[12]:
# Concatenate two 'Spatial_Net'
adata.uns['Spatial_Net'] = pd.concat([adata_list['MERFISH'].uns['Spatial_Net'], adata_list['STARmap'].uns['Spatial_Net']])
[13]:
# Use "STAGATE_pyG.Stats_Spatial_Net" to summarize cells and edges information for whole adata
STAGATE.Stats_Spatial_Net(adata)
_images/Spatial_data_integration_Application_with_new_data_18_0.png
[14]:
# Normalization
sc.pp.highly_variable_genes(adata, flavor="seurat_v3", n_top_genes=3000)
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
[15]:
adata
[15]:
AnnData object with n_obs × n_vars = 3329 × 102
    obs: 'leiden', 'Tech'
    var: 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm'
    uns: 'Spatial_Net', 'hvg', 'log1p'
    obsm: 'X_pca', 'X_umap', 'spatial'

Running STAGATE

[16]:
adata = STAGATE.train_STAGATE(adata, n_epochs=500)
Size of Input:  (3329, 102)
100%|██████████| 500/500 [00:02<00:00, 201.42it/s]

Spatial Clustering

[17]:
# Calculate neighbors in the 'STAGATE' representation, apply UMAP, and perform Louvain clustering
sc.pp.neighbors(adata, use_rep='STAGATE')
sc.tl.umap(adata)
sc.tl.louvain(adata,resolution=0.5)

When encountering the error “No module named ‘igraph’”, users should activate the virtual environment in the terminal and execute “pip install igraph”.

When encountering the error “No module named ‘louvain’”, users should activate the virtual environment in the terminal and execute “pip install louvain”.

[18]:
# Plot a UMAP projection
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.umap(adata, color='Tech', title='Unintegrated',show=False,palette=cmp_new_biotech)
#plt.savefig('figures/new_before_umap.png',dpi=400,transparent=True,bbox_inches='tight')
#plt.savefig('figures/new_before_umap.pdf',dpi=400,transparent=True,bbox_inches='tight')
[18]:
<Axes: title={'center': 'Unintegrated'}, xlabel='UMAP1', ylabel='UMAP2'>
_images/Spatial_data_integration_Application_with_new_data_26_1.png
[19]:
# Generate a plot of the UMAP embedding colored by louvain
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.umap(adata, color='louvain',show=False,palette=cmp_new)
#plt.savefig('figures/new_before_umap_leiden.png',dpi=400,transparent=True,bbox_inches='tight')
#plt.savefig('figures/new_before_umap_leiden.pdf',dpi=400,transparent=True,bbox_inches='tight')
[19]:
<Axes: title={'center': 'louvain'}, xlabel='UMAP1', ylabel='UMAP2'>
_images/Spatial_data_integration_Application_with_new_data_27_1.png
[20]:
# Display spatial distribution of cells colored by louvain clustering for two sequencing technologies ('MERFISH' and 'STARmap')
fig, axs = plt.subplots(1, 2, figsize=(6, 3))
it=0
for temp_tech in ['MERFISH', 'STARmap']:
    temp_adata = adata[adata.obs['Tech']==temp_tech, ]
    if it == 1:
        sc.pl.embedding(temp_adata, basis="spatial", color="louvain",s=20, ax=axs[it],
                        show=False, title=temp_tech)
    else:
        sc.pl.embedding(temp_adata, basis="spatial", color="louvain",s=20, ax=axs[it], legend_loc=None,
                        show=False, title=temp_tech)
    it+=1
#plt.savefig('figures/new_before_spatial_leiden0.5.png',dpi=400,transparent=True,bbox_inches='tight')
#plt.savefig('figures/new_before_spatial_leiden0.5.pdf',dpi=400,transparent=True,bbox_inches='tight')
_images/Spatial_data_integration_Application_with_new_data_28_0.png

Perform Harmony for spatial data integration

Harmony is an algorithm for integrating multiple high-dimensional datasets. Reference documentation is available at https://github.com/slowkow/harmonypy and https://pypi.org/project/harmonypy/.

[21]:
# Import harmonypy package
import harmonypy as hm
[22]:
# Use the STAGATE representation as the data matrix and adata.obs as the metadata for Harmony
data_mat = adata.obsm['STAGATE'].copy()
meta_data = adata.obs.copy()
[23]:
# Run harmony for STAGATE representation
ho = hm.run_harmony(data_mat, meta_data, ['Tech'])
2023-04-03 11:47:20,623 - harmonypy - INFO - Computing initial centroids with sklearn.KMeans...
2023-04-03 11:47:21,926 - harmonypy - INFO - sklearn.KMeans initialization complete.
2023-04-03 11:47:21,942 - harmonypy - INFO - Iteration 1 of 10
2023-04-03 11:47:22,342 - harmonypy - INFO - Iteration 2 of 10
2023-04-03 11:47:22,743 - harmonypy - INFO - Iteration 3 of 10
2023-04-03 11:47:23,185 - harmonypy - INFO - Iteration 4 of 10
2023-04-03 11:47:23,580 - harmonypy - INFO - Converged after 4 iterations
[24]:
# Store the Harmony-adjusted embedding in a DataFrame with columns labelled by cell names.
res = pd.DataFrame(ho.Z_corr)
res.columns = adata.obs_names
[25]:
# Create a new AnnData object adata_Harmony using a transpose of the res matrix
adata_Harmony = sc.AnnData(res.T)
[26]:
adata_Harmony.obsm['spatial'] = pd.DataFrame(adata.obsm['spatial'], index=adata.obs_names).loc[adata_Harmony.obs_names,].values
adata_Harmony.obs['Tech'] = adata.obs.loc[adata_Harmony.obs_names, 'Tech']

Spatial Clustering after integration

[27]:
# Calculate neighbors, apply UMAP, and perform leiden clustering for the integrated data
sc.pp.neighbors(adata_Harmony)
sc.tl.umap(adata_Harmony)
sc.tl.leiden(adata_Harmony, resolution=0.3)

When encountering the error “Please install the leiden algorithm: ‘conda install -c conda-forge leidenalg’ or ‘pip3 install leidenalg’”, users can follow the provided instructions: activate the virtual environment and execute “pip3 install leidenalg”.

[28]:
# Plot a UMAP embedding colored by technology after integration
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.umap(adata_Harmony, color='Tech', title='STAGATE + Harmony',show=False,palette=cmp_new_biotech)
#plt.savefig('figures/new_after_umap.png',dpi=400,transparent=True,bbox_inches='tight')
#plt.savefig('figures/new_after_umap.pdf',dpi=400,transparent=True,bbox_inches='tight')
[28]:
<Axes: title={'center': 'STAGATE + Harmony'}, xlabel='UMAP1', ylabel='UMAP2'>
_images/Spatial_data_integration_Application_with_new_data_40_1.png
[29]:
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.umap(adata_Harmony, color='leiden',show=False,palette=cmp_new)
#plt.savefig('figures/new_after_umap_leiden0.3.png',dpi=400,transparent=True,bbox_inches='tight')
#plt.savefig('figures/new_after_umap_leiden0.3.pdf',dpi=400,transparent=True,bbox_inches='tight')
[29]:
<Axes: title={'center': 'leiden'}, xlabel='UMAP1', ylabel='UMAP2'>
_images/Spatial_data_integration_Application_with_new_data_41_1.png
[30]:
# Display spatial distribution of cells colored by leiden clustering for two sequencing technologies ('MERFISH' and 'STARmap') after integration
fig, axs = plt.subplots(1, 2, figsize=(6, 3))
it=0
for temp_tech in ['MERFISH', 'STARmap']:
    temp_adata = adata_Harmony[adata_Harmony.obs['Tech']==temp_tech, ]
    if it == 1:
        sc.pl.embedding(temp_adata, basis="spatial", color="leiden",s=20, ax=axs[it],
                        show=False, title=temp_tech)
    else:
        sc.pl.embedding(temp_adata, basis="spatial", color="leiden",s=20, ax=axs[it], legend_loc=None,
                        show=False, title=temp_tech)
    it+=1
#plt.savefig('figures/new_after_spatial_leiden0.3.png',dpi=400,transparent=True,bbox_inches='tight')
#plt.savefig('figures/new_after_spatial_leiden0.3.pdf',dpi=400,transparent=True,bbox_inches='tight')
_images/Spatial_data_integration_Application_with_new_data_42_0.png

Spatial data alignment

Installation

This tutorial demonstrates how to install Pysodb alongside a method used for spatial data alignment.

Taking PASTE as an example, install Pysodb in the same installation environment.

Reference tutorials can be found at https://github.com/raphael-group/paste and https://github.com/TencentAILabHealthcare/pysodb.

Installing softwares and tools

1. The first step is to install Visual Studio Code, Conda and Jupyter notebook in advance.

Reference tutorials present how to install Visual Studio Code, Conda and Jupyter notebook, respectively; they can be found at https://code.visualstudio.com/Docs/setup/setup-overview, https://code.visualstudio.com/docs/python/environments#_activating-an-environment-in-the-terminal and https://code.visualstudio.com/docs/datascience/data-science-tutorial.

2. Launch Visual Studio Code and open a terminal window.

Henceforth, various packages or modules will be installed via the command line

Installation PASTE

3. Select the installation path and open it
[ ]:
cd <path>
4. Clone PASTE code
[ ]:
git clone https://github.com/raphael-group/paste.git

If cloning the code fails through git, please download it at https://github.com/raphael-group/paste, upload it to the folder created above, and extract it.

5. Open the PASTE directory
[ ]:
cd paste
6. Create a conda environment
[ ]:
conda create -n <environment_name> python=3.8
7. Activate a conda environment

Run the following command on the terminal to activate the conda environment:

[ ]:
conda activate <environment_name>
8. Install paste package
[ ]:
pip install paste-bio
9. Install other packages required for paste
[ ]:
pip install -r requirements.txt

Installation Pysodb

Keep the conda environment active

10. Clone Pysodb code
[ ]:
git clone https://github.com/TencentAILabHealthcare/pysodb.git

If cloning the code fails through git, please download it at https://github.com/TencentAILabHealthcare/pysodb, upload it to the folder created above, and extract it.

11. Open the Pysodb directory
[ ]:
cd pysodb
12. Install a Pysodb package from source code
[ ]:
python setup.py install

If “error: urllib3 2.0.0a3 is installed but urllib3<1.27,>=1.21.1 is required by {‘requests’}” appears, users can execute the following commands in order.

[ ]:
pip install 'urllib3>=1.21.1,<1.27'
python setup.py install
[1]:
print('finish!')
finish!

Reproducibility with original data

This tutorial demonstrates spatial data alignment on 10X Visium DLPFC data using Pysodb and Paste.

A reference paper can be found at https://www.nature.com/articles/s41592-022-01459-6.

This tutorial follows the tutorial at https://github.com/raphael-group/paste_reproducibility/blob/main/notebooks/DLPFC_pairwise.ipynb, with the data-loading step replaced by Pysodb.

Import packages and set configurations

[1]:
# Imports various packages for data analysis and visualization.
# pandas: used for data manipulation and analysis.
import pandas as pd
# numpy: used for numerical computing, including mathematical operations on arrays and matrices.
import numpy as np
# seaborn: used for statistical data visualization, providing high-level interfaces for creating informative and attractive visualizations.
import seaborn as sns
# matplotlib: a comprehensive library for creating static, animated, and interactive visualizations in Python.
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from matplotlib import style
# time: provides time-related functions, such as measuring execution time and converting between time formats.
import time
# scanpy: a Python package for single-cell gene expression analysis, including preprocessing, clustering, and differential expression analysis.
import scanpy as sc
# networkx: a Python package for creating, manipulating, and studying complex networks.
import networkx as nx
style.use('seaborn-white')
/tmp/ipykernel_62466/185176416.py:20: MatplotlibDeprecationWarning: The seaborn styles shipped by Matplotlib are deprecated since 3.6, as they no longer correspond to the styles shipped by seaborn. However, they will remain available as 'seaborn-v0_8-<style>'. Alternatively, directly use the seaborn API instead.
  style.use('seaborn-white')
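
As the deprecation warning suggests, on Matplotlib 3.6 and later the same style is available under a new name, so the call can be updated as follows:

[ ]:
# Equivalent call for Matplotlib >= 3.6, where the seaborn styles were renamed.
from matplotlib import style
style.use('seaborn-v0_8-white')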
[2]:
# Import paste package
import paste as pst

Streamline development of loading spatial data with Pysodb

[3]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB databases.
# SODB is a format used to store data in memory-mapped files for efficient access and querying.
# This package allows users to interact with SODB files using Python.
import pysodb
[4]:
# Initialization
sodb = pysodb.SODB()
[5]:
# Get the list of datasets with a specific category
# Categories ["Spatial Transcriptomics", "Spatial Proteomics", "Spatial Metabolomics", "Spatial Genomics", "Spatial MultiOmics"]
sodb.list_dataset_by_category('Spatial Transcriptomics')
[5]:
['wang2021easi',
 'Vickovic2019high',
 'xia2022the',
 'guilliams2022spatial',
 'xia2019spatial',
 'Ratz2022Clonal',
 'Shi2022Spatial',
 'ortiz2020molecular',
 'Marshall2022High_mouse',
 'he2020integrating',
 'Konieczny2022Interleukin',
 'Joglekar2021A',
 'gracia2021genome',
 'hildebrandt2021spatial',
 'stahl2016visualization',
 'Pascual2021Dietary',
 'kvastad2021the',
 'backdahl2021spatial',
 'Barkley2022Cancer',
 'maniatis2019spatiotemporal',
 'Fang2022Conservation',
 'Misra2021Characterizing',
 'wei2022single',
 'Dixon2022Spatially',
 'Zhang2023Amolecularly_rawcount',
 'Booeshaghi2021Isoform',
 'Vickovic2019high_update',
 'Alon2021Expansion',
 'wang2022high',
 'Kadur2022Human',
 'Wang2018Three_1k',
 'Sun2022Excitatory',
 'chen2021dissecting',
 'thrane2018spatially',
 'lohoff2021integration',
 'ji2020multimodal',
 'moncada2020integrating',
 'Marshall2022High_human',
 'Melo2021Integrating',
 'zhang2021spatially',
 'Garcia2021Mapping',
 'codeluppi2018spatial',
 'Wu2022spatial',
 'Biermann2022Dissecting',
 'hunter2021spatially',
 'liu2022spatiotemporal',
 'DARTFISH',
 'Juntaro2022MEK',
 'Navarro2020Spatial',
 'Sanchez2021A',
 'Buzzi2022Spatial',
 'Wang2018three',
 'Fu2021Unsupervised',
 'chen2020spatial',
 'Gouin2021An',
 'carlberg2019exploring',
 'chen2021decoding',
 'fawkner2021spatiotemporal',
 'parigi2022the',
 'stickels2020highly',
 'Allen2022Molecular_aging',
 'mantri2021spatiotemporal',
 'eng2019transcriptome',
 'asp2017spatial',
 'Zeng2023Integrative',
 'Merfish_Visp',
 'Tower2021Spatial',
 'Lebrigand2022The',
 'Visium_Allen',
 'chen2022spatiotemporal_compre_20',
 'rodriques2019slide',
 'Borm2022Scalable',
 'maynard2021trans',
 'chen2022spatiotemporal',
 'Dhainaut2022Spatial',
 'seqFISH_VISp',
 'bergenstrahle2021super',
 'moffitt2018molecular',
 'Sun2021Integrating',
 '10x',
 'berglund2018spatial',
 'asp2019a',
 'Kleshchevnikov2022Cell2location',
 'Shah2016InSitu',
 'scispace',
 'Allen2022Molecular_lps']
[6]:
# Load all experiments of the 'maynard2021trans' dataset
adata_list = sodb.load_dataset('maynard2021trans')
load experiment[151508] in dataset[maynard2021trans]
load experiment[151671] in dataset[maynard2021trans]
load experiment[151507] in dataset[maynard2021trans]
load experiment[151674] in dataset[maynard2021trans]
load experiment[151670] in dataset[maynard2021trans]
load experiment[151669] in dataset[maynard2021trans]
load experiment[151676] in dataset[maynard2021trans]
load experiment[151675] in dataset[maynard2021trans]
load experiment[151509] in dataset[maynard2021trans]
load experiment[151673] in dataset[maynard2021trans]
load experiment[151672] in dataset[maynard2021trans]
load experiment[151510] in dataset[maynard2021trans]
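
If only a single section is needed, the same data can also be fetched one experiment at a time with load_experiment (section '151673' is used here purely as an example):

[ ]:
# Sketch: load one DLPFC section instead of the whole dataset.
adata_151673 = sodb.load_experiment('maynard2021trans', '151673')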
[7]:
adata_list
[7]:
{'151508': AnnData object with n_obs × n_vars = 4384 × 33538
     obs: 'in_tissue', 'array_row', 'array_col', 'Region', 'leiden'
     var: 'gene_ids', 'feature_types', 'genome', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
     uns: 'Region_colors', 'hvg', 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial', 'spatial_neighbors', 'umap'
     obsm: 'X_pca', 'X_umap', 'spatial'
     varm: 'PCs'
     obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances',
 '151671': AnnData object with n_obs × n_vars = 4110 × 33538
     obs: 'in_tissue', 'array_row', 'array_col', 'Region', 'leiden'
     var: 'gene_ids', 'feature_types', 'genome', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
     uns: 'hvg', 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial', 'spatial_neighbors', 'umap'
     obsm: 'X_pca', 'X_umap', 'spatial'
     varm: 'PCs'
     obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances',
 '151507': AnnData object with n_obs × n_vars = 4226 × 33538
     obs: 'in_tissue', 'array_row', 'array_col', 'Region', 'leiden'
     var: 'gene_ids', 'feature_types', 'genome', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
     uns: 'hvg', 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial', 'spatial_neighbors', 'umap'
     obsm: 'X_pca', 'X_umap', 'spatial'
     varm: 'PCs'
     obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances',
 '151674': AnnData object with n_obs × n_vars = 3673 × 33538
     obs: 'in_tissue', 'array_row', 'array_col', 'Region', 'leiden'
     var: 'gene_ids', 'feature_types', 'genome', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
     uns: 'hvg', 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial', 'spatial_neighbors', 'umap'
     obsm: 'X_pca', 'X_umap', 'spatial'
     varm: 'PCs'
     obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances',
 '151670': AnnData object with n_obs × n_vars = 3498 × 33538
     obs: 'in_tissue', 'array_row', 'array_col', 'Region', 'leiden'
     var: 'gene_ids', 'feature_types', 'genome', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
     uns: 'hvg', 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial', 'spatial_neighbors', 'umap'
     obsm: 'X_pca', 'X_umap', 'spatial'
     varm: 'PCs'
     obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances',
 '151669': AnnData object with n_obs × n_vars = 3661 × 33538
     obs: 'in_tissue', 'array_row', 'array_col', 'Region', 'leiden'
     var: 'gene_ids', 'feature_types', 'genome', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
     uns: 'hvg', 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial', 'spatial_neighbors', 'umap'
     obsm: 'X_pca', 'X_umap', 'spatial'
     varm: 'PCs'
     obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances',
 '151676': AnnData object with n_obs × n_vars = 3460 × 33538
     obs: 'in_tissue', 'array_row', 'array_col', 'Region', 'leiden'
     var: 'gene_ids', 'feature_types', 'genome', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
     uns: 'hvg', 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial', 'spatial_neighbors', 'umap'
     obsm: 'X_pca', 'X_umap', 'spatial'
     varm: 'PCs'
     obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances',
 '151675': AnnData object with n_obs × n_vars = 3592 × 33538
     obs: 'in_tissue', 'array_row', 'array_col', 'Region', 'leiden'
     var: 'gene_ids', 'feature_types', 'genome', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
     uns: 'hvg', 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial', 'spatial_neighbors', 'umap'
     obsm: 'X_pca', 'X_umap', 'spatial'
     varm: 'PCs'
     obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances',
 '151509': AnnData object with n_obs × n_vars = 4789 × 33538
     obs: 'in_tissue', 'array_row', 'array_col', 'Region', 'leiden'
     var: 'gene_ids', 'feature_types', 'genome', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
     uns: 'hvg', 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial', 'spatial_neighbors', 'umap'
     obsm: 'X_pca', 'X_umap', 'spatial'
     varm: 'PCs'
     obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances',
 '151673': AnnData object with n_obs × n_vars = 3639 × 33538
     obs: 'in_tissue', 'array_row', 'array_col', 'Region', 'leiden'
     var: 'gene_ids', 'feature_types', 'genome', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
     uns: 'hvg', 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial', 'spatial_neighbors', 'umap'
     obsm: 'X_pca', 'X_umap', 'spatial'
     varm: 'PCs'
     obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances',
 '151672': AnnData object with n_obs × n_vars = 4015 × 33538
     obs: 'in_tissue', 'array_row', 'array_col', 'Region', 'leiden'
     var: 'gene_ids', 'feature_types', 'genome', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
     uns: 'hvg', 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial', 'spatial_neighbors', 'umap'
     obsm: 'X_pca', 'X_umap', 'spatial'
     varm: 'PCs'
     obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances',
 '151510': AnnData object with n_obs × n_vars = 4634 × 33538
     obs: 'in_tissue', 'array_row', 'array_col', 'Region', 'leiden'
     var: 'gene_ids', 'feature_types', 'genome', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
     uns: 'hvg', 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial', 'spatial_neighbors', 'umap'
     obsm: 'X_pca', 'X_umap', 'spatial'
     varm: 'PCs'
     obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances'}

Preparation

[8]:
# Define a list containing 12 samples
sample_list = ["151507", "151508", "151509","151510", "151669", "151670","151671", "151672", "151673","151674", "151675", "151676"]
[9]:
# Create a new dictionary called "adatas" by removing observations with missing values in the "Region" column from each dataset in the original dictionary "adata_list".
adatas = {}
for key in adata_list.keys():
    a = adata_list[key]
    a = a[np.logical_not(a.obs['Region'].isna())]
    adatas[key] = a
[10]:
# Define groups of samples based on IDs
sample_groups = [["151507", "151508", "151509","151510"],[ "151669", "151670","151671", "151672"],[ "151673","151674", "151675", "151676"]]
# Create a list called layer_groups where each sub-list contains the AnnData objects for the samples in the corresponding group of sample_groups
layer_groups = [[adatas[sample_groups[j][i]] for i in range(len(sample_groups[j]))] for j in range(len(sample_groups))]
# Create a dictionary that maps layers to various colors from the default Seaborn color palette.
layer_to_color_map = {'Layer{0}'.format(i+1):sns.color_palette()[i] for i in range(6)}
layer_to_color_map['WM'] = sns.color_palette()[6]
[11]:
layer_groups
[11]:
[[View of AnnData object with n_obs × n_vars = 4221 × 33538
      obs: 'in_tissue', 'array_row', 'array_col', 'Region', 'leiden'
      var: 'gene_ids', 'feature_types', 'genome', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
      uns: 'hvg', 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial', 'spatial_neighbors', 'umap'
      obsm: 'X_pca', 'X_umap', 'spatial'
      varm: 'PCs'
      obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances',
  View of AnnData object with n_obs × n_vars = 4381 × 33538
      obs: 'in_tissue', 'array_row', 'array_col', 'Region', 'leiden'
      var: 'gene_ids', 'feature_types', 'genome', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
      uns: 'Region_colors', 'hvg', 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial', 'spatial_neighbors', 'umap'
      obsm: 'X_pca', 'X_umap', 'spatial'
      varm: 'PCs'
      obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances',
  View of AnnData object with n_obs × n_vars = 4788 × 33538
      obs: 'in_tissue', 'array_row', 'array_col', 'Region', 'leiden'
      var: 'gene_ids', 'feature_types', 'genome', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
      uns: 'hvg', 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial', 'spatial_neighbors', 'umap'
      obsm: 'X_pca', 'X_umap', 'spatial'
      varm: 'PCs'
      obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances',
  View of AnnData object with n_obs × n_vars = 4595 × 33538
      obs: 'in_tissue', 'array_row', 'array_col', 'Region', 'leiden'
      var: 'gene_ids', 'feature_types', 'genome', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
      uns: 'hvg', 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial', 'spatial_neighbors', 'umap'
      obsm: 'X_pca', 'X_umap', 'spatial'
      varm: 'PCs'
      obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances'],
 [View of AnnData object with n_obs × n_vars = 3636 × 33538
      obs: 'in_tissue', 'array_row', 'array_col', 'Region', 'leiden'
      var: 'gene_ids', 'feature_types', 'genome', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
      uns: 'hvg', 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial', 'spatial_neighbors', 'umap'
      obsm: 'X_pca', 'X_umap', 'spatial'
      varm: 'PCs'
      obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances',
  View of AnnData object with n_obs × n_vars = 3484 × 33538
      obs: 'in_tissue', 'array_row', 'array_col', 'Region', 'leiden'
      var: 'gene_ids', 'feature_types', 'genome', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
      uns: 'hvg', 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial', 'spatial_neighbors', 'umap'
      obsm: 'X_pca', 'X_umap', 'spatial'
      varm: 'PCs'
      obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances',
  View of AnnData object with n_obs × n_vars = 4093 × 33538
      obs: 'in_tissue', 'array_row', 'array_col', 'Region', 'leiden'
      var: 'gene_ids', 'feature_types', 'genome', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
      uns: 'hvg', 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial', 'spatial_neighbors', 'umap'
      obsm: 'X_pca', 'X_umap', 'spatial'
      varm: 'PCs'
      obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances',
  View of AnnData object with n_obs × n_vars = 3888 × 33538
      obs: 'in_tissue', 'array_row', 'array_col', 'Region', 'leiden'
      var: 'gene_ids', 'feature_types', 'genome', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
      uns: 'hvg', 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial', 'spatial_neighbors', 'umap'
      obsm: 'X_pca', 'X_umap', 'spatial'
      varm: 'PCs'
      obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances'],
 [View of AnnData object with n_obs × n_vars = 3611 × 33538
      obs: 'in_tissue', 'array_row', 'array_col', 'Region', 'leiden'
      var: 'gene_ids', 'feature_types', 'genome', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
      uns: 'hvg', 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial', 'spatial_neighbors', 'umap'
      obsm: 'X_pca', 'X_umap', 'spatial'
      varm: 'PCs'
      obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances',
  View of AnnData object with n_obs × n_vars = 3635 × 33538
      obs: 'in_tissue', 'array_row', 'array_col', 'Region', 'leiden'
      var: 'gene_ids', 'feature_types', 'genome', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
      uns: 'hvg', 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial', 'spatial_neighbors', 'umap'
      obsm: 'X_pca', 'X_umap', 'spatial'
      varm: 'PCs'
      obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances',
  View of AnnData object with n_obs × n_vars = 3566 × 33538
      obs: 'in_tissue', 'array_row', 'array_col', 'Region', 'leiden'
      var: 'gene_ids', 'feature_types', 'genome', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
      uns: 'hvg', 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial', 'spatial_neighbors', 'umap'
      obsm: 'X_pca', 'X_umap', 'spatial'
      varm: 'PCs'
      obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances',
  View of AnnData object with n_obs × n_vars = 3431 × 33538
      obs: 'in_tissue', 'array_row', 'array_col', 'Region', 'leiden'
      var: 'gene_ids', 'feature_types', 'genome', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
      uns: 'hvg', 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial', 'spatial_neighbors', 'umap'
      obsm: 'X_pca', 'X_umap', 'spatial'
      varm: 'PCs'
      obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances']]
[12]:
# Redefine a function called visualize_slices to generate a visualization of tissue regions for different samples and different slices
def visualize_slices(layer_groups, adatas, sample_list, slice_map, layer_to_color_map, key):
    n_rows = len(layer_groups)
    n_cols = len(layer_groups[0])

    plot, axs = plt.subplots(n_rows, n_cols, figsize=(15, 11.5))

    for j in range(n_rows):
        axs[j, 0].text(-0.1, 0.5, 'Sample ' + slice_map[j], fontsize=12, rotation='vertical', transform=axs[j, 0].transAxes, verticalalignment='center')
        for i in range(n_cols):
            adata = adatas[sample_list[j * n_cols + i]]
            colors = list(adata.obs[key].astype('str').map(layer_to_color_map))
            colors = [(r, g, b) for r, g, b in colors]

            axs[j, i].scatter(layer_groups[j][i].obsm['spatial'][:, 0], layer_groups[j][i].obsm['spatial'][:, 1], linewidth=0, s=20, marker=".",
                              color=colors
                              )
            axs[j, i].set_title('Slice ' + slice_map[i], size=12)
            axs[j, i].invert_yaxis()
            axs[j, i].axis('off')
            axs[j, i].axis('equal')

            if i<n_cols-1:
                s = '300$\mu$m' if i==1 else "10$\mu$m"
                delta = 0.05 if i==1 else 0
                axs[j,i].annotate('',xy=(1-delta, 0.5), xytext=(1.2+delta, 0.5),xycoords=axs[j,i].transAxes,textcoords=axs[j,i].transAxes,arrowprops=dict(arrowstyle='<->',lw=1))
                axs[j,0].text(1.1, 0.55, s,fontsize=9,transform = axs[j,i].transAxes,horizontalalignment='center')


        legend_handles = [mpatches.Patch(color=layer_to_color_map[adata.obs[key].cat.categories[i]], label=adata.obs[key].cat.categories[i]) for i in range(len(adata.obs[key].cat.categories))]
        axs[j, n_cols-1].legend(handles=legend_handles, fontsize=10, title='Cortex layer', title_fontsize=12, bbox_to_anchor=(1, 1))

    return plot

[13]:
# Define a dictionary for slice map
slice_map = {0: 'A', 1: 'B', 2: 'C', 3: 'D'}
# Visualize the different slices mapped by layer_groups
plot = visualize_slices(layer_groups, adatas, sample_list, slice_map, layer_to_color_map, key= 'Region')
plt.show()
_images/Spatial_data_alignment_Reproducibility_with_original_data_17_0.png

Running PASTE for alignment

[14]:
# Redefine the compute_pairwise_alignment function to compute the pairwise alignment between consecutive slices within each group and store the resulting mappings in a list called pis
def compute_pairwise_alignment(groups, alpha=0.1):
    pis = [[None for i in range(len(groups[j])-1)] for j in range(len(groups))]

    for j in range(len(groups)):
        for i in range(len(groups[j])-1):
            pi0 = pst.match_spots_using_spatial_heuristic(groups[j][i].obsm['spatial'], groups[j][i+1].obsm['spatial'], use_ot=True)
            start = time.time()
            pis[j][i] = pst.pairwise_align(groups[j][i], groups[j][i+1], alpha=alpha, G_init=pi0, norm=True, verbose=False)
            tt = time.time() - start
            print(j, i, 'time', tt)

    return pis
[15]:
pis = compute_pairwise_alignment(groups = layer_groups, alpha=0.1)
Using selected backend cpu. If you want to use gpu, set use_gpu = True.
0 0 time 202.1374397277832
Using selected backend cpu. If you want to use gpu, set use_gpu = True.
RESULT MIGHT BE INACURATE
Max number of iteration reached, currently 100000. Sometimes iterations go on in cycle even though the solution has been reached, to check if it's the case here have a look at the minimal reduced cost. If it is very close to machine precision, you might actually have the correct solution, if not try setting the maximum number of iterations a bit higher
/home/linsenlin/anaconda3/envs/alignment/lib/python3.8/site-packages/ot/lp/__init__.py:343: UserWarning: numItermax reached before optimality. Try to increase numItermax.
  result_code_string = check_result(result_code)
RESULT MIGHT BE INACURATE
Max number of iteration reached, currently 100000. Sometimes iterations go on in cycle even though the solution has been reached, to check if it's the case here have a look at the minimal reduced cost. If it is very close to machine precision, you might actually have the correct solution, if not try setting the maximum number of iterations a bit higher
RESULT MIGHT BE INACURATE
Max number of iteration reached, currently 100000. Sometimes iterations go on in cycle even though the solution has been reached, to check if it's the case here have a look at the minimal reduced cost. If it is very close to machine precision, you might actually have the correct solution, if not try setting the maximum number of iterations a bit higher
RESULT MIGHT BE INACURATE
Max number of iteration reached, currently 100000. Sometimes iterations go on in cycle even though the solution has been reached, to check if it's the case here have a look at the minimal reduced cost. If it is very close to machine precision, you might actually have the correct solution, if not try setting the maximum number of iterations a bit higher
0 1 time 31.971270322799683
Using selected backend cpu. If you want to use gpu, set use_gpu = True.
RESULT MIGHT BE INACURATE
Max number of iteration reached, currently 100000. Sometimes iterations go on in cycle even though the solution has been reached, to check if it's the case here have a look at the minimal reduced cost. If it is very close to machine precision, you might actually have the correct solution, if not try setting the maximum number of iterations a bit higher
/home/linsenlin/anaconda3/envs/alignment/lib/python3.8/site-packages/ot/lp/__init__.py:343: UserWarning: numItermax reached before optimality. Try to increase numItermax.
  result_code_string = check_result(result_code)
RESULT MIGHT BE INACURATE
Max number of iteration reached, currently 100000. Sometimes iterations go on in cycle even though the solution has been reached, to check if it's the case here have a look at the minimal reduced cost. If it is very close to machine precision, you might actually have the correct solution, if not try setting the maximum number of iterations a bit higher
0 2 time 18.63774585723877
Using selected backend cpu. If you want to use gpu, set use_gpu = True.
1 0 time 339.0843195915222
Using selected backend cpu. If you want to use gpu, set use_gpu = True.
RESULT MIGHT BE INACURATE
Max number of iteration reached, currently 100000. Sometimes iterations go on in cycle even though the solution has been reached, to check if it's the case here have a look at the minimal reduced cost. If it is very close to machine precision, you might actually have the correct solution, if not try setting the maximum number of iterations a bit higher
/home/linsenlin/anaconda3/envs/alignment/lib/python3.8/site-packages/ot/lp/__init__.py:343: UserWarning: numItermax reached before optimality. Try to increase numItermax.
  result_code_string = check_result(result_code)
RESULT MIGHT BE INACURATE
Max number of iteration reached, currently 100000. Sometimes iterations go on in cycle even though the solution has been reached, to check if it's the case here have a look at the minimal reduced cost. If it is very close to machine precision, you might actually have the correct solution, if not try setting the maximum number of iterations a bit higher
RESULT MIGHT BE INACURATE
Max number of iteration reached, currently 100000. Sometimes iterations go on in cycle even though the solution has been reached, to check if it's the case here have a look at the minimal reduced cost. If it is very close to machine precision, you might actually have the correct solution, if not try setting the maximum number of iterations a bit higher
1 1 time 16.74564242362976
Using selected backend cpu. If you want to use gpu, set use_gpu = True.
1 2 time 96.79306101799011
Using selected backend cpu. If you want to use gpu, set use_gpu = True.
2 0 time 98.5936119556427
Using selected backend cpu. If you want to use gpu, set use_gpu = True.
2 1 time 120.30212950706482
Using selected backend cpu. If you want to use gpu, set use_gpu = True.
2 2 time 139.2454333305359

When encountering the error “module ‘ot.gromov’ has no attribute ‘cg’”, users should first activate the conda environment in the terminal and then downgrade POT to version 0.8.2, as shown below.
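With the conda environment used for this tutorial activated, the downgrade command is:

[ ]:
pip install POT==0.8.2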

[16]:
# Align spatial coordinates of sequential pairwise slices
paste_layer_groups = [pst.stack_slices_pairwise(layer_groups[j], pis[j]) for j in range(len(layer_groups)) ]
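As an optional check (not part of the original notebook), an individual coupling returned by pairwise_align can be inspected. Each pis[j][i] is a matrix with one row per spot of slice i and one column per spot of slice i+1; assuming the default uniform marginals, every row should sum to roughly 1 divided by the number of spots in slice i.

[ ]:
# Optional sketch: inspect the first coupling matrix of the first sample group.
pi = pis[0][0]
print(pi.shape)                                  # (n_spots_slice_1, n_spots_slice_2)
print(pi.sum())                                  # total transported mass, close to 1
print(pi.sum(axis=1).max(), 1.0 / pi.shape[0])   # row sums vs. the uniform marginal 1/n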
[17]:
# Define a function to plot the spatial coordinates of cells in different groups
def plot_slices_overlap(groups, adatas, sample_list, layer_to_color_map,save=None,):
    marker_list = ['.','*','x','+']

    for j in range(len(groups)):
        plt.figure(figsize=(10,10))
        for i in range(len(groups[j])):
            adata = adatas[sample_list[j*4+i]]
            colors = list(adata.obs['Region'].astype('str').map(layer_to_color_map))
            plt.scatter(groups[j][i].obsm['spatial'][:,0],groups[j][i].obsm['spatial'][:,1],linewidth=1,s=80, marker=marker_list[i],color=colors,alpha=0.7)
        plt.legend(handles=[mpatches.Patch(color=layer_to_color_map[adata.obs['Region'].cat.categories[i]], label=adata.obs['Region'].cat.categories[i]) for i in range(len(adata.obs['Region'].cat.categories))],fontsize=10,title='Cortex layer',title_fontsize=15,bbox_to_anchor=(1, 1))
        plt.gca().invert_yaxis()
        plt.axis('off')
        if save is None:
            plt.show()
        else:
            plt.savefig(f'{save}_{j}.pdf',bbox_inches='tight',transparent=True)
[18]:
# Plot Stacking of Four slices without alignment
plot_slices_overlap(layer_groups, adatas, sample_list, layer_to_color_map)
_images/Spatial_data_alignment_Reproducibility_with_original_data_24_0.png
_images/Spatial_data_alignment_Reproducibility_with_original_data_24_1.png
_images/Spatial_data_alignment_Reproducibility_with_original_data_24_2.png
[19]:
# Plot Stacking of Four slices with PASTE alignment
plot_slices_overlap(paste_layer_groups, adatas, sample_list, layer_to_color_map)

_images/Spatial_data_alignment_Reproducibility_with_original_data_25_0.png
_images/Spatial_data_alignment_Reproducibility_with_original_data_25_1.png
_images/Spatial_data_alignment_Reproducibility_with_original_data_25_2.png
[20]:
# Redefine the generate_3d_spatial_data function to add a third spatial dimension: each slice j within a group is given a z-coordinate of j*100, and the slices of each group are then concatenated

def generate_3d_spatial_data(groups, dataset_name='dlpfc', save=False):
    spatial_3d_data = []

    for i in range(len(groups)):
        rsta_list = []

        for j in range(len(groups[i])):
            a = groups[i][j]
            spatial = a.obsm['spatial']
            spatial_z = np.ones(shape=(spatial.shape[0],1))*j*100
            spatial_3d = np.hstack([spatial,spatial_z])
            a.obsm['spatial_3d'] = spatial_3d
            rsta_list.append(a)

        a_concat = rsta_list[0].concatenate(rsta_list[1:])
        spatial_3d_data.append(a_concat)

        if save:
            a_concat.write_h5ad(f'{dataset_name}_sample{i}_3d.h5ad')

    return spatial_3d_data
[21]:
layer_groups_spatial_3d_data = generate_3d_spatial_data(layer_groups, dataset_name='dlpfc', save=True)
/home/linsenlin/anaconda3/envs/alignment/lib/python3.8/site-packages/anndata/_core/anndata.py:1785: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass `AnnData(X, dtype=X.dtype, ...)` to get the future behavour.
  [AnnData(sparse.csr_matrix(a.shape), obs=a.obs) for a in all_adatas],
(this warning is repeated once for each slice being concatenated)
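As an optional illustration (not part of the original notebook), the stacked coordinates written to obsm['spatial_3d'] can be inspected with a simple matplotlib 3D scatter. The sketch below assumes layer_groups_spatial_3d_data produced by the cell above and a matplotlib version that registers the '3d' projection automatically (3.2 or later).

[ ]:
# Optional sketch: visualize the 3D-stacked spatial coordinates of the first sample group.
import matplotlib.pyplot as plt

a3d = layer_groups_spatial_3d_data[0]      # concatenated AnnData of sample group A
xyz = a3d.obsm['spatial_3d']               # n_obs x 3 array; z = slice index * 100
fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(projection='3d')
ax.scatter(xyz[:, 0], xyz[:, 1], xyz[:, 2], s=2)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z (slice offset)')
plt.show()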
[22]:
paste_layer_groups_spatial_3d_data = generate_3d_spatial_data(paste_layer_groups, dataset_name='dlpfc', save=True)
/home/linsenlin/anaconda3/envs/alignment/lib/python3.8/site-packages/anndata/_core/anndata.py:1785: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass `AnnData(X, dtype=X.dtype, ...)` to get the future behavour.
  [AnnData(sparse.csr_matrix(a.shape), obs=a.obs) for a in all_adatas],
(this warning is repeated once for each slice being concatenated)

Application with new data

This tutorial demonstrates spatial data alignment on new BaristaSeq mouse data and STARmap mouse data using Pysodb and PASTE.

This tutorial follows the tutorial at https://github.com/raphael-group/paste_reproducibility/blob/main/notebooks/DLPFC_pairwise.ipynb, with the data loading step replaced by Pysodb.

Import packages and set configurations

[1]:
# Imports various packages for data analysis and visualization.
# pandas: used for data manipulation and analysis.
import pandas as pd
# numpy: used for numerical computing, including mathematical operations on arrays and matrices.
import numpy as np
# seaborn: used for statistical data visualization, providing high-level interfaces for creating informative and attractive visualizations.
import seaborn as sns
# matplotlib: a comprehensive library for creating static, animated, and interactive visualizations in Python.
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from matplotlib import style
# time: provides time-related functions, such as measuring execution time and converting between time formats.
import time
# scanpy: a Python package for single-cell gene expression analysis, including preprocessing, clustering, and differential expression analysis.
import scanpy as sc
# networkx: a Python package for creating, manipulating, and studying complex networks.
import networkx as nx
style.use('seaborn-white')
/tmp/ipykernel_72494/185176416.py:20: MatplotlibDeprecationWarning: The seaborn styles shipped by Matplotlib are deprecated since 3.6, as they no longer correspond to the styles shipped by seaborn. However, they will remain available as 'seaborn-v0_8-<style>'. Alternatively, directly use the seaborn API instead.
  style.use('seaborn-white')
[2]:
# Import paste package
import paste as pst

Streamline development of loading spatial data with Pysodb

[3]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB databases.
# SODB is a format used to store data in memory-mapped files for efficient access and querying.
# This package allows users to interact with SODB files using Python.
import pysodb
[4]:
# Initialization
sodb = pysodb.SODB()

[5]:
# Load the two datasets
adata_list_baristaseq = sodb.load_dataset('Sun2021Integrating')
adata_starmap = sodb.load_dataset('Dataset11_MS_raw')['Dataset11']

load experiment[Slice_1] in dataset[Sun2021Integrating]
load experiment[Slice_3] in dataset[Sun2021Integrating]
load experiment[Slice_2] in dataset[Sun2021Integrating]
load experiment[Dataset11] in dataset[Dataset11_MS_raw]
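The dataset and experiment names used above can be browsed at https://gene.ai.tencent.com/SpatialOmics/. If the installed pysodb version exposes the listing helpers described in its README (list_dataset is assumed here), they can also be queried programmatically:

[ ]:
# Optional sketch (assumes sodb.list_dataset() is available in the installed pysodb version).
dataset_names = sodb.list_dataset()   # names of all datasets hosted on SODB
print(len(dataset_names))
print(dataset_names[:5])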

Preparation

[6]:
# Create an adata_list_starmap dictionary and split adata_starmap into subsets based on "slice_id"
adata_list_starmap = {}
for si in adata_starmap.obs['slice_id'].cat.categories:
    a = adata_starmap[adata_starmap.obs['slice_id']==si]
    a.obs['layer'] = a.obs['gt'].astype('str')
    a.obs['layer'] = a.obs['layer'].astype('category')
    adata_list_starmap[a.obs['slice_id'][0]] = a
/tmp/ipykernel_72494/4197310715.py:5: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
  a.obs['layer'] = a.obs['gt'].astype('str')
/tmp/ipykernel_72494/4197310715.py:5: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
  a.obs['layer'] = a.obs['gt'].astype('str')
/tmp/ipykernel_72494/4197310715.py:5: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
  a.obs['layer'] = a.obs['gt'].astype('str')
[7]:
# Combine adata_list_baristaseq and adata_list_starmap
adata_list = adata_list_baristaseq
adata_list.update(adata_list_starmap)
[8]:
# Define a list containing the keys of adata_list
sample_list = ["Slice_1", "Slice_2", "Slice_3",'BZ5','BZ9','BZ14']
[9]:
# Define a function called rotate_translate that applies a random rotation and translation to a set of 2D coordinates
import numpy as np
import random

def rotate_translate(matrix):
    # Create rotation matrix
    theta = random.uniform(0, 2 * np.pi)
    rotation_matrix = np.array([[np.cos(theta), -np.sin(theta)],
                                [np.sin(theta), np.cos(theta)]])

    # Calculate translation bounds
    max_coords = np.max(matrix, axis=0)
    min_coords = np.min(matrix, axis=0)
    translation_bounds = 0.5 * (max_coords - min_coords)

    # Generate random translation vector within bounds
    translation_vector = np.array([random.uniform(-translation_bounds[0], translation_bounds[0]),
                                   random.uniform(-translation_bounds[1], translation_bounds[1])])

    # Apply rotation and translation
    new_matrix = np.dot(matrix, rotation_matrix) + translation_vector

    return new_matrix
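Because the transformation is rigid (an orthogonal rotation followed by a translation), pairwise distances between points are preserved. A quick optional sanity check (not part of the original notebook):

[ ]:
# Optional sketch: a rigid transform should leave pairwise distances unchanged.
pts = np.random.rand(5, 2) * 100
moved = rotate_translate(pts)
d_before = np.linalg.norm(pts[0] - pts[1])
d_after = np.linalg.norm(moved[0] - moved[1])
print(np.isclose(d_before, d_after))   # expected: True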
[10]:
# Remove observations labeled 'VISp' or 'outside_VISp' from each sub-dataset in adata_list
# Apply a transformation to 'spatial' of each sub-dataset via rotate_translate()
adatas = {}
for key in sample_list:
    a = adata_list[key]
    a = a[np.logical_not((a.obs['layer']=='VISp') | (a.obs['layer']=='outside_VISp'))]

    new_spatial = rotate_translate(a.obsm['spatial'])
    a.obsm['spatial'] = new_spatial
    adatas[key] = a
[11]:
# Define two lists, sample_groups and layer_groups, grouping the BaristaSeq slices and the STARmap slices
# Map each cortical layer label to a color using a dictionary
# The seaborn color palette is used to assign colors to each layer
sample_groups = [["Slice_1", "Slice_2", "Slice_3"],['BZ5','BZ9','BZ14',]]
layer_groups = [[adatas[sample_groups[j][i]] for i in range(len(sample_groups[j]))] for j in range(len(sample_groups))]
cmp = sns.color_palette()
layer_to_color_map = {
    'VISp_I':cmp[0],
    'VISp_II/III':cmp[1],
    'VISp_IV':cmp[2],
    'VISp_V':cmp[3],
    'VISp_VI':cmp[4],
    'VISp_wm':cmp[5],
    '1':cmp[6],
    '2':cmp[7],
    '3':cmp[8],
    '4':cmp[9],
}

[12]:
# Redefine a function called visualize_slices to generate a visualization of tissue regions for different samples and different slices
def visualize_slices(layer_groups, adatas, sample_list, slice_map, layer_to_color_map, key):
    n_rows = len(layer_groups)
    n_cols = len(layer_groups[0])

    plot, axs = plt.subplots(n_rows, n_cols, figsize=(15, 11.5))

    for j in range(n_rows):
        axs[j, 0].text(-0.1, 0.5, 'Sample ' + slice_map[j], fontsize=12, rotation='vertical', transform=axs[j, 0].transAxes, verticalalignment='center')
        for i in range(n_cols):
            adata = adatas[sample_list[j * n_cols + i]]
            colors = list(adata.obs[key].astype('str').map(layer_to_color_map))
            colors = [(r, g, b) for r, g, b in colors]

            axs[j, i].scatter(layer_groups[j][i].obsm['spatial'][:, 0], layer_groups[j][i].obsm['spatial'][:, 1], linewidth=0, s=20, marker=".",
                              color=colors
                              )
            axs[j, i].set_title('Slice ' + slice_map[i], size=12)
            axs[j, i].invert_yaxis()
            axs[j, i].axis('off')
            axs[j, i].axis('equal')
            """
            if i<n_cols-1:
                s = '300$\mu$m' if i==1 else "10$\mu$m"
                delta = 0.05 if i==1 else 0
                axs[j,i].annotate('',xy=(1-delta, 0.5), xytext=(1.2+delta, 0.5),xycoords=axs[j,i].transAxes,textcoords=axs[j,i].transAxes,arrowprops=dict(arrowstyle='<->',lw=1))
                axs[j,0].text(1.1, 0.55, s,fontsize=9,transform = axs[j,i].transAxes,horizontalalignment='center')
            """
        legend_handles = [mpatches.Patch(color=layer_to_color_map[adata.obs[key].cat.categories[i]], label=adata.obs[key].cat.categories[i]) for i in range(len(adata.obs[key].cat.categories))]
        axs[j, n_cols-1].legend(handles=legend_handles, fontsize=10, title='Cortex layer', title_fontsize=12, bbox_to_anchor=(1, 1))

    return plot
[13]:
# Define a dictionary for slice map
slice_map = {0: 'A', 1: 'B', 2: 'C'}
# Visualize the different slices mapped by layer_groups
plot = visualize_slices(layer_groups, adatas, sample_list, slice_map, layer_to_color_map, key= 'layer')
plt.show()
_images/Spatial_data_alignment_Application_with_new_data_17_0.png

Running PASTE for alignment

[14]:
# Redefine the compute_pairwise_alignment function to compute the pairwise alignment between consecutive slices within each group and store the resulting mappings in a list called pis
def compute_pairwise_alignment(groups, alpha=0.1):
    pis = [[None for i in range(len(groups[j])-1)] for j in range(len(groups))]

    for j in range(len(groups)):
        for i in range(len(groups[j])-1):
            pi0 = pst.match_spots_using_spatial_heuristic(groups[j][i].obsm['spatial'], groups[j][i+1].obsm['spatial'], use_ot=True)
            start = time.time()
            pis[j][i] = pst.pairwise_align(groups[j][i], groups[j][i+1], alpha=alpha, G_init=pi0, norm=True, verbose=False)
            tt = time.time() - start
            print(j, i, 'time', tt)

    return pis
[15]:
pis = compute_pairwise_alignment(groups = layer_groups, alpha=0.1)
Using selected backend cpu. If you want to use gpu, set use_gpu = True.
0 0 time 20.315631866455078
Using selected backend cpu. If you want to use gpu, set use_gpu = True.
0 1 time 26.11185622215271
Using selected backend cpu. If you want to use gpu, set use_gpu = True.
1 0 time 6.043396711349487
Using selected backend cpu. If you want to use gpu, set use_gpu = True.
1 1 time 10.50022840499878

When encountering the error “module ‘ot.gromov’ has no attribute ‘cg’”, users should first activate the conda environment in the terminal and then downgrade POT with the command “pip install POT==0.8.2”.

[16]:
# Align spatial coordinates of sequential pairwise slices
paste_layer_groups = [pst.stack_slices_pairwise(layer_groups[j], pis[j]) for j in range(len(layer_groups)) ]
[17]:
paste_layer_groups
[17]:
[[AnnData object with n_obs × n_vars = 1525 × 79
      obs: 'Slice', 'x', 'y', 'Dist to pia', 'Dist to bottom', 'Angle', 'unused-1', 'unused-2', 'x_um', 'y_um', 'depth_um', 'layer', 'leiden'
      uns: 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial_neighbors', 'umap'
      obsm: 'X_pca', 'X_umap', 'spatial'
      varm: 'PCs'
      obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances',
  AnnData object with n_obs × n_vars = 2042 × 79
      obs: 'Slice', 'x', 'y', 'Dist to pia', 'Dist to bottom', 'Angle', 'unused-1', 'unused-2', 'x_um', 'y_um', 'depth_um', 'layer', 'leiden'
      uns: 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial_neighbors', 'umap'
      obsm: 'X_pca', 'X_umap', 'spatial'
      varm: 'PCs'
      obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances',
  AnnData object with n_obs × n_vars = 1690 × 79
      obs: 'Slice', 'x', 'y', 'Dist to pia', 'Dist to bottom', 'Angle', 'unused-1', 'unused-2', 'x_um', 'y_um', 'depth_um', 'layer', 'leiden'
      uns: 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial_neighbors', 'umap'
      obsm: 'X_pca', 'X_umap', 'spatial'
      varm: 'PCs'
      obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances'],
 [AnnData object with n_obs × n_vars = 1049 × 166
      obs: 'ct', 'gt', 'slice_id', 'batch', 'layer'
      uns: 'moranI', 'spatial_neighbors'
      obsm: 'spatial'
      obsp: 'spatial_connectivities', 'spatial_distances',
  AnnData object with n_obs × n_vars = 1053 × 166
      obs: 'ct', 'gt', 'slice_id', 'batch', 'layer'
      uns: 'moranI', 'spatial_neighbors'
      obsm: 'spatial'
      obsp: 'spatial_connectivities', 'spatial_distances',
  AnnData object with n_obs × n_vars = 1088 × 166
      obs: 'ct', 'gt', 'slice_id', 'batch', 'layer'
      uns: 'moranI', 'spatial_neighbors'
      obsm: 'spatial'
      obsp: 'spatial_connectivities', 'spatial_distances']]
[18]:
# Define a function to plot the spatial coordinates of cells in different groups
def plot_slices_overlap(groups, adatas, sample_list, layer_to_color_map,save=None):
    marker_list = ['*','^','+']
    for j in range(len(groups)):
        plt.figure(figsize=(10,10))
        for i in range(len(groups[j])):
            adata = adatas[sample_list[j*3+i]]
            colors = list(adata.obs['layer'].astype('str').map(layer_to_color_map))
            plt.scatter(groups[j][i].obsm['spatial'][:,0],groups[j][i].obsm['spatial'][:,1],linewidth=1,s=80, marker=marker_list[i],color=colors,alpha=0.8)
        plt.legend(handles=[mpatches.Patch(color=layer_to_color_map[adata.obs['layer'].cat.categories[i]], label=adata.obs['layer'].cat.categories[i]) for i in range(len(adata.obs['layer'].cat.categories))],fontsize=10,title='Cortex layer',title_fontsize=15,bbox_to_anchor=(1, 1))
        plt.gca().invert_yaxis()
        plt.axis('off')
        plt.axis('equal')
        if save is None:
            plt.show()
        else:
            plt.savefig(f'{save}_{j}.pdf',bbox_inches='tight',transparent=True)
[19]:
# Plot Stacking of slices without alignment
plot_slices_overlap(layer_groups, adatas, sample_list, layer_to_color_map)

_images/Spatial_data_alignment_Application_with_new_data_25_0.png
_images/Spatial_data_alignment_Application_with_new_data_25_1.png
[20]:
# Plot Stacking of slices with PASTE alignment
plot_slices_overlap(paste_layer_groups, adatas, sample_list, layer_to_color_map)

_images/Spatial_data_alignment_Application_with_new_data_26_0.png
_images/Spatial_data_alignment_Application_with_new_data_26_1.png
[21]:
# Redefine the generate_3d_spatial_data function to add a third spatial dimension: each slice j within a group is given a z-coordinate of j*100, and the slices of each group are then concatenated

def generate_3d_spatial_data(groups, dataset_name='new', save=False):
    spatial_3d_data = []

    for i in range(len(groups)):
        rsta_list = []

        for j in range(len(groups[i])):
            a = groups[i][j]
            spatial = a.obsm['spatial']
            spatial_z = np.ones(shape=(spatial.shape[0],1))*j*100
            spatial_3d = np.hstack([spatial,spatial_z])
            a.obsm['spatial_3d'] = spatial_3d
            rsta_list.append(a)

        a_concat = rsta_list[0].concatenate(rsta_list[1:])
        spatial_3d_data.append(a_concat)

        if save:
            a_concat.write_h5ad(f'{dataset_name}_sample{i}_3d.h5ad')

    return spatial_3d_data
[22]:
layer_groups_spatial_3d_data = generate_3d_spatial_data(layer_groups, dataset_name='new', save=True)
/home/linsenlin/anaconda3/envs/alignment/lib/python3.8/site-packages/anndata/_core/anndata.py:1785: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass `AnnData(X, dtype=X.dtype, ...)` to get the future behavour.
  [AnnData(sparse.csr_matrix(a.shape), obs=a.obs) for a in all_adatas],
(this warning is repeated once for each slice being concatenated)
[23]:
paste_layer_groups_spatial_3d_data = generate_3d_spatial_data(paste_layer_groups, dataset_name='new', save=True)
/home/linsenlin/anaconda3/envs/alignment/lib/python3.8/site-packages/anndata/_core/anndata.py:1785: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass `AnnData(X, dtype=X.dtype, ...)` to get the future behavour.
  [AnnData(sparse.csr_matrix(a.shape), obs=a.obs) for a in all_adatas],
(this warning is repeated once for each slice being concatenated)

Spatial spot deconvolution

Installation

This tutorial demonstrates how to install Pysodb alongside a method used for spatial spot deconvolution.

Using Tangram as an example, Pysodb is installed into the same environment as the deconvolution tool.

Reference tutorials can be found at https://github.com/broadinstitute/Tangram and https://github.com/TencentAILabHealthcare/pysodb.

Installing softwares and tools

1. The first step is to install Visual Studio Code, Conda, Jupyter notebook and CUDA in advance.

Reference tutorials present how to install Visual Studio Code, Conda and Jupyter notebook, respectively. They can be found at https://code.visualstudio.com/Docs/setup/setup-overview, https://code.visualstudio.com/docs/python/environments#_activating-an-environment-in-the-terminal and https://code.visualstudio.com/docs/datascience/data-science-tutorial.

2. Launch Visual Studio Code and open a terminal window.

Henceforth, various packages or modules will be installed via the command line

Installation Tangram

3. Select the installation path and open it
[ ]:
cd <path>
4. Create a conda environment
[ ]:
conda create -n <environment_name> python=3.8
5. Activate a conda environment

Run the following command on the terminal to activate the conda environment:

[ ]:
conda activate <environment_name>
6. Install Tangram package
[ ]:
pip install tangram-sc
7. Install squidpy package
[ ]:
pip install squidpy

Installation Pysodb

Keep the conda environment active

8. Clone Pysodb code
[ ]:
git clone https://github.com/TencentAILabHealthcare/pysodb.git

If cloning the code fails through git, please download it at https://github.com/TencentAILabHealthcare/pysodb, upload it to the folder created above, and extract it.

9. Open the Pysodb directory
[ ]:
cd pysodb
10. Install a Pysodb package from source code
[ ]:
python setup.py install

If “error: urllib3 2.0.0a3 is installed but urllib3<1.27,>=1.21.1 is required by {‘requests’}” appears, users can execute the following commands in order.

[ ]:
pip install 'urllib3>=1.21.1,<1.27'
python setup.py install
[1]:
print('finish!')
finish!

Reproducibility with original data

This tutorial demonstrates deconvolution to map cell types of the mouse cortex from sc-RNA-seq data to Visium data using Pysodb and Tangram.

A reference paper can be found at https://www.nature.com/articles/s41592-021-01264-7.

This tutorial follows the tutorial at https://squidpy.readthedocs.io/en/stable/external_tutorials/tutorial_tangram.html, with the data loading step replaced by Pysodb.

The single cell data utilized in the tutorial can be accessed directly from Figshare at https://figshare.com/articles/dataset/Visium/22332667.

Import packages and set configurations

[1]:
# Import several Python packages, including:
# scanpy: a Python package for single-cell RNA sequencing analysis
import scanpy as sc
# squidpy: a Python package for spatial transcriptomics analysis
import squidpy as sq
# numpy: a Python package for scientific computing with arrays
import numpy as np
# pandas: a Python package for data manipulation and analysis
import pandas as pd

When encountering the error “No module named ‘squidpy’”, users should activate the conda environment in the terminal and execute “pip install squidpy”.

[2]:
# Import tangram for spatial deconvolution
import tangram as tg
[3]:
# Print the scanpy header (package versions) and the versions of the squidpy and tangram packages
sc.logging.print_header()
print(f"squidpy=={sq.__version__}")
print(f"tangram=={tg.__version__}")
scanpy==1.9.3 anndata==0.8.0 umap==0.5.3 numpy==1.22.4 scipy==1.9.1 pandas==1.5.3 scikit-learn==1.2.2 statsmodels==0.13.5 python-igraph==0.10.4 pynndescent==0.5.8
squidpy==1.2.3
tangram==1.0.4
[4]:
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import palettable

adjusted_qualitative_colors = [
    '#5e81ac', '#f47b56', '#7eaca9', '#e28b90', '#ab81bd', '#b68e7e', '#df8cc4', '#7f7f7f', '#bcbd22', '#17becf',
    '#aec7e8', '#ffbb78', '#98df8a', '#ff9896', '#c5b0d5', '#c49c94', '#f7b6d2', '#c7c7c7', '#dbdb8d', '#9edae5',
    '#393b79', '#5254a3', '#6b6ecf', '#9c9ede', '#637939', '#8ca252', '#b5cf6b', '#cedb9c', '#8c6d31', '#bd9e39'
]

# Create adjusted custom qualitative colormap
adjusted_qualitative_cmap = ListedColormap(adjusted_qualitative_colors)

# Example of using the custom colormap with Scanpy
# sc.pl.umap(adata, color='gene_name', cmap=adjusted_qualitative_cmap)
cmp_ct = palettable.cartocolors.qualitative.Safe_10.mpl_colors

When encountering the error “No module named ‘palettable’”, users should activate the conda environment in the terminal and execute “pip install palettable”.

Load a single cell dataset

[5]:
# Load the reference single cell dataset
# The input sc data has been normalized and log-transformed
adata_sc = sc.read_h5ad('data/Visium/sc_mouse_cortex.h5ad')
[6]:
# Print out the metadata of adata_sc
adata_sc
[6]:
AnnData object with n_obs × n_vars = 21697 × 36826
    obs: 'sample_name', 'organism', 'donor_sex', 'cell_class', 'cell_subclass', 'cell_cluster', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'n_counts'
    var: 'mt', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'n_cells', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm'
    uns: 'cell_class_colors', 'cell_subclass_colors', 'hvg', 'neighbors', 'pca', 'umap'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    obsp: 'connectivities', 'distances'
[7]:
# Visualize a UMAP projection colored by cell_subclass
fig,ax = plt.subplots(figsize=(4,4))
sc.pl.embedding(adata_sc,basis='umap',color=['cell_subclass'],ax=ax,show=False,palette=adjusted_qualitative_colors,s=5)
[7]:
<Axes: title={'center': 'cell_subclass'}, xlabel='UMAP1', ylabel='UMAP2'>
_images/Spatial_spot_deconvolution_Reproducibility_with_original_data_12_1.png

Streamline development of loading spatial data with Pysodb

[8]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB databases.
# SODB is a format used to store data in memory-mapped files for efficient access and querying.
# This package allows users to interact with SODB files using Python.
import pysodb
[9]:
# Initialization
sodb = pysodb.SODB()
[10]:
# Define the name of the dataset_name and experiment_name
dataset_name = 'Biancalani2021Deep'
experiment_name = 'visium_fluo_crop'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata_st = sodb.load_experiment(dataset_name,experiment_name)
load experiment[visium_fluo_crop] in dataset[Biancalani2021Deep]
[11]:
adata_st
[11]:
AnnData object with n_obs × n_vars = 704 × 16562
    obs: 'in_tissue', 'array_row', 'array_col', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_MT', 'log1p_total_counts_MT', 'pct_counts_MT', 'n_counts', 'leiden', 'cluster'
    var: 'gene_ids', 'feature_types', 'genome', 'MT', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'n_cells', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm', 'dispersions', 'dispersions_norm'
    uns: 'cluster_colors', 'hvg', 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial', 'spatial_neighbors', 'umap'
    obsm: 'X_pca', 'X_umap', 'spatial'
    varm: 'PCs'
    obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances'
[12]:
# Create a spatial scatter plot colored by cluster label
cmp_ct = palettable.cartocolors.qualitative.Pastel_10.mpl_colors
cmp_ct.append('gray')
cmp_ct_cmp = ListedColormap(cmp_ct)

ax = sq.pl.spatial_scatter(adata_st,color='cluster',size=1.2,palette=cmp_ct_cmp)
_images/Spatial_spot_deconvolution_Reproducibility_with_original_data_18_0.png

Preparation

[13]:
# Visualize the embedding based on 'spatial' with points colored by the 'cluster' label
sc.pl.embedding(adata_st,basis='spatial',color='cluster')
_images/Spatial_spot_deconvolution_Reproducibility_with_original_data_20_0.png
[14]:
# Select the subset of spots whose 'cluster' label is one of "Cortex_1" to "Cortex_4"
# and create a copy of the resulting subset

adata_st = adata_st[
    adata_st.obs.cluster.isin([f"Cortex_{i}" for i in np.arange(1, 5)])
].copy()
[15]:
# Visualize the embedding based on 'spatial' with points colored by the new 'cluster' label
sc.pl.embedding(adata_st,basis='spatial',color='cluster')
_images/Spatial_spot_deconvolution_Reproducibility_with_original_data_22_0.png
[16]:
cmp_ct = palettable.cartocolors.qualitative.Pastel_10.mpl_colors
cmp_ct_cmp = ListedColormap(cmp_ct)

ax = sq.pl.spatial_scatter(adata_st,color='cluster',size=1.2,palette=cmp_ct_cmp)
_images/Spatial_spot_deconvolution_Reproducibility_with_original_data_23_0.png
[17]:
# Perform differential gene expression analysis across 'cell_subclasses' in 'adata_sc'
sc.tl.rank_genes_groups(adata_sc, groupby="cell_subclass", use_raw=False)
WARNING: Default of the method has been changed to 't-test' from 't-test_overestim_var'
[18]:
# Create a pandas DataFrame called "markers_df" by extracting the top 100 differentially expressed genes per group from 'adata_sc'
markers_df = pd.DataFrame(adata_sc.uns["rank_genes_groups"]["names"]).iloc[0:100, :]
# Create a NumPy array called "genes_sc" by extracting the unique values from the "value" column of the melted "markers_df"
genes_sc = np.unique(markers_df.melt().value.values)
# Extract the gene names from "adata_st"
genes_st = adata_st.var_names.values
# Create a Python list called "genes"
# containing the intersection of the differentially expressed genes in "genes_sc" and the genes detected in "genes_st"
genes = list(set(genes_sc).intersection(set(genes_st)))
# The length of "genes"
len(genes)
[18]:
1281

Perform Tangram for alignment

[19]:
# Use Tangram to align the gene expression profiles of "adata_sc" and "adata_st" based on the shared set of genes in the intersection of "genes_sc" and "genes_st".
tg.pp_adatas(adata_sc, adata_st, genes=genes)
INFO:root:1280 training genes are saved in `uns``training_genes` of both single cell and spatial Anndatas.
INFO:root:14785 overlapped genes are saved in `uns``overlap_genes` of both single cell and spatial Anndatas.
INFO:root:uniform based density prior is calculated and saved in `obs``uniform_density` of the spatial Anndata.
INFO:root:rna count based density prior is calculated and saved in `obs``rna_count_based_density` of the spatial Anndata.
[20]:
# Use the map_cells_to_space function from tangram to map cells from "adata_sc" onto "adata_st".
# The mapping uses "cells" mode, which assigns each cell from adata_sc to a location within the spatial transcriptomics space based on its gene expression profile.
ad_map = tg.map_cells_to_space(
    adata_sc,
    adata_st,
    mode="cells",
    # target_count=adata_st.obs.cell_count.sum(),
    # density_prior=np.array(adata_st.obs.cell_count) / adata_st.obs.cell_count.sum(),
    num_epochs=1000,
    device="cpu",
)
INFO:root:Allocate tensors for mapping.
INFO:root:Begin training with 1280 genes and rna_count_based density_prior in cells mode...
INFO:root:Printing scores every 100 epochs.
Score: 0.613, KL reg: 0.001
Score: 0.733, KL reg: 0.000
Score: 0.736, KL reg: 0.000
Score: 0.737, KL reg: 0.000
Score: 0.737, KL reg: 0.000
Score: 0.737, KL reg: 0.000
Score: 0.737, KL reg: 0.000
Score: 0.737, KL reg: 0.000
Score: 0.738, KL reg: 0.000
Score: 0.738, KL reg: 0.000
INFO:root:Saving results..
[21]:
ad_map
[21]:
AnnData object with n_obs × n_vars = 21697 × 324
    obs: 'sample_name', 'organism', 'donor_sex', 'cell_class', 'cell_subclass', 'cell_cluster', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'n_counts'
    var: 'in_tissue', 'array_row', 'array_col', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_MT', 'log1p_total_counts_MT', 'pct_counts_MT', 'n_counts', 'leiden', 'cluster', 'uniform_density', 'rna_count_based_density'
    uns: 'train_genes_df', 'training_history'
[22]:
# Project "Cell_subclass" annotations from a single-cell RNA sequencing (scRNA-seq) dataset onto a spatial transcriptomics dataset, based on a previously computed cell-to-space mapping
tg.project_cell_annotations(ad_map, adata_st, annotation="cell_subclass")
INFO:root:spatial prediction dataframe is saved in `obsm` `tangram_ct_pred` of the spatial AnnData.
[23]:
# Transfer cell type predictions from the AnnData object's ‘obsm’ attribute (adata_st.obsm['tangram_ct_pred']) to its observation metadata (adata_st.obs)
for ct in adata_st.obsm['tangram_ct_pred'].columns:
    adata_st.obs[ct] = np.array(adata_st.obsm['tangram_ct_pred'][ct].values)
[24]:
# Print adata_st.obsm['tangram_ct_pred']
adata_st.obsm['tangram_ct_pred']
[24]:
Pvalb L4 Vip L2/3 IT Lamp5 NP Sst L5 IT Oligo L6 CT ... L5 PT Astro L6b Endo Peri Meis2 Macrophage CR VLMC SMC
AAATGGCATGTCTTGT-1 7.004985 1.548934 6.902157 0.001669 4.129408 4.114066 3.220240 3.820235 0.288011 8.503994 ... 7.850845 2.551129 0.000456 0.421885 0.000057 0.000061 1.297545 0.071324 0.129917 0.685547
AACAACTGGTAGTTGC-1 4.209501 0.000783 13.903717 0.192844 4.696000 3.499797 6.033508 9.985192 0.456206 2.967823 ... 7.369421 1.928740 0.928254 0.526186 0.107533 0.000130 0.547133 0.079866 0.000185 0.275435
AACAGGAAATCGAATA-1 4.682822 0.526977 7.370772 0.571306 6.074437 1.000437 7.754989 5.081327 0.396358 15.313167 ... 1.394540 2.027951 0.473381 0.544031 0.228793 1.153768 0.685693 0.000441 0.000177 0.262540
AACCCAGAGACGGAGA-1 8.718892 4.313211 4.679913 3.914560 8.065018 0.000336 7.402516 9.868730 0.476459 2.293070 ... 0.230093 2.908436 0.000443 0.586561 0.000053 0.475399 0.902312 0.000050 0.578500 0.581497
AACCGTTGTGTTTGCT-1 8.815555 5.802182 4.890625 1.156238 5.074965 0.487821 8.204679 14.393296 1.581857 0.000475 ... 2.226670 1.070311 1.225542 1.491899 0.074052 0.000590 0.215931 0.055336 0.000093 0.551992
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
TTGGATTGGGTACCAC-1 5.916564 2.063360 13.772404 1.952126 2.691196 2.123059 12.909450 9.272922 0.410706 0.005365 ... 6.397924 2.403446 0.000223 0.458006 0.038058 0.000083 0.227499 0.018278 0.252095 0.642331
TTGGCTCGCATGAGAC-1 2.514913 4.281378 4.322568 10.849935 8.974291 0.000461 13.464813 9.720862 0.139286 0.609628 ... 0.000646 0.838765 0.076363 0.416698 0.059241 0.025889 0.272974 0.000290 0.260999 0.389371
TTGTATCACACAGAAT-1 3.621018 0.001693 6.560889 1.507911 5.990241 4.790297 8.296551 8.652060 0.563544 5.002628 ... 3.863253 0.750675 1.071313 0.396808 0.058517 0.034917 0.193589 0.000341 0.272419 0.238980
TTGTGGCCCTGACAGT-1 9.345364 2.639463 9.780218 0.001793 0.349541 0.998307 4.595858 5.134462 0.792228 7.001610 ... 4.198365 2.206220 1.493556 0.653867 0.068874 0.000080 0.696597 0.000134 0.323000 0.212188
TTGTTAGCAAATTCGA-1 4.926483 22.468429 5.517707 0.864528 1.313571 0.000482 5.217501 12.868741 0.667749 0.740462 ... 0.435651 1.406598 0.000301 0.509059 0.103536 0.436183 0.113370 0.000399 0.191394 0.058175

324 rows × 23 columns

[25]:
# Create a spatial embedding plot that visualizes the distribution of diverse cell types
from palettable.colorbrewer.sequential import YlGnBu_9
to_plot_list = ['VLMC','Astro',"L2/3 IT", "L4", "L5 IT", "L5 PT", "L6 CT", "L6 IT", "L6b"]

for to_plot in to_plot_list:
    sq.pl.spatial_scatter(
        adata_st,

        color=to_plot,
        cmap=YlGnBu_9.mpl_colormap
    )
    # Sanitize the cell type name so it can double as a file name (see the sketch after the figures below)
    to_plot = to_plot.replace('/','_')
    to_plot = to_plot.replace(' ','_')
_images/Spatial_spot_deconvolution_Reproducibility_with_original_data_33_0.png
_images/Spatial_spot_deconvolution_Reproducibility_with_original_data_33_1.png
_images/Spatial_spot_deconvolution_Reproducibility_with_original_data_33_2.png
_images/Spatial_spot_deconvolution_Reproducibility_with_original_data_33_3.png
_images/Spatial_spot_deconvolution_Reproducibility_with_original_data_33_4.png
_images/Spatial_spot_deconvolution_Reproducibility_with_original_data_33_5.png
_images/Spatial_spot_deconvolution_Reproducibility_with_original_data_33_6.png
_images/Spatial_spot_deconvolution_Reproducibility_with_original_data_33_7.png
_images/Spatial_spot_deconvolution_Reproducibility_with_original_data_33_8.png
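The sanitized names computed at the end of the loop above are not used further in this tutorial; a minimal sketch of how they could be used to save each panel to disk is shown below. The "figures/" directory and PNG file names are assumptions for illustration only, not part of the original protocol.

[ ]:
# Hypothetical extension: save each Tangram cell-type map under a filesystem-safe name.
import os
import matplotlib.pyplot as plt

os.makedirs("figures", exist_ok=True)
for to_plot in to_plot_list:
    # Re-draw the spatial map for this cell type (same call as in the cell above)
    sq.pl.spatial_scatter(adata_st, color=to_plot, cmap=YlGnBu_9.mpl_colormap)
    # Replace '/' and ' ' so the name is safe to use in a file path
    safe_name = to_plot.replace('/', '_').replace(' ', '_')
    plt.savefig(f"figures/tangram_{safe_name}.png", dpi=150, bbox_inches="tight")
    plt.close()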

Application with new data

This tutorial demonstrates deconvolution on new spatial transcriptomics (ST) data from human pancreatic cancer using Pysodb and Tangram.

This tutorial follows the Squidpy Tangram tutorial at https://squidpy.readthedocs.io/en/stable/external_tutorials/tutorial_tangram.html, with the data-loading step modified to use Pysodb.

The single cell data utilized in the tutorial is directly available at https://figshare.com/articles/dataset/PDAC/22332574.

Import packages and set configurations

[1]:
# Import several Python packages, including:
# scanpy: a Python package for single-cell RNA sequencing analysis
import scanpy as sc
# squidpy: a Python package for spatial transcriptomics analysis
import squidpy as sq
# numpy: a Python package for scientific computing with arrays
import numpy as np
# pandas: a Python package for data manipulation and analysis
import pandas as pd
# anndata: a Python package for handling annotated data objects in genomics
import anndata as ad

When encountering the error "No module named 'squidpy'", users should activate the virtual environment in the terminal and execute "pip install squidpy".

[2]:
# Import tangram for spatial deconvolution
import tangram as tg
[3]:
# Print scanpy's version header and the versions of the squidpy and tangram packages
sc.logging.print_header()
print(f"squidpy=={sq.__version__}")
print(f"tangram=={tg.__version__}")
scanpy==1.9.3 anndata==0.8.0 umap==0.5.3 numpy==1.22.4 scipy==1.9.1 pandas==1.5.3 scikit-learn==1.2.2 statsmodels==0.13.5 python-igraph==0.10.4 pynndescent==0.5.8
squidpy==1.2.3
tangram==1.0.4
[4]:
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import palettable
# Define a list of color hex codes
adjusted_qualitative_colors = [
    '#5e81ac', '#f47b56', '#7eaca9', '#e28b90', '#ab81bd', '#b68e7e', '#df8cc4', '#7f7f7f', '#bcbd22', '#17becf',
    '#aec7e8', '#ffbb78', '#98df8a', '#ff9896', '#c5b0d5', '#c49c94', '#f7b6d2', '#c7c7c7', '#dbdb8d', '#9edae5',
    '#393b79', '#5254a3', '#6b6ecf', '#9c9ede', '#637939', '#8ca252', '#b5cf6b', '#cedb9c', '#8c6d31', '#bd9e39'
]

# Create adjusted custom qualitative colormap
adjusted_qualitative_cmap = ListedColormap(adjusted_qualitative_colors)

# Example of using the custom colormap with Scanpy
# sc.pl.umap(adata, color='gene_name', cmap=adjusted_qualitative_cmap)

# Create a list of 10 soft pastel colors
cmp_ct = palettable.cartocolors.qualitative.Pastel_10.mpl_colors

When encountering the error "No module named 'palettable'", users should activate the virtual environment in the terminal and execute "pip install palettable".

Load a single cell dataset

[5]:
# Read the single-cell expression matrix and metadata CSV files
# (sc_data.csv and sc_meta.csv from the figshare link above, placed under data/pdac/)
pd_sc = pd.read_csv('data/pdac/sc_data.csv')
pd_sc_meta = pd.read_csv('data/pdac/sc_meta.csv')
[6]:
# Set the index
pd_sc = pd_sc.set_index('Unnamed: 0')
pd_sc_meta = pd_sc_meta.set_index('Cell')
[7]:
# Extract gene names and cell identifiers
sc_genes = np.array(pd_sc.index)
sc_obs = np.array(pd_sc.columns)
# Transpose the expression data
sc_X = np.array(pd_sc.values).transpose()
[8]:
# Create an AnnData object
adata_sc = ad.AnnData(sc_X)
adata_sc.var_names = sc_genes
adata_sc.obs_names = sc_obs
# Add cell type information to the AnnData object
adata_sc.obs['CellType'] = pd_sc_meta['Cell_type'].values
[9]:
# Print out the metadata of adata_sc
adata_sc
[9]:
AnnData object with n_obs × n_vars = 1926 × 19104
    obs: 'CellType'
[10]:
# Preprocess scRNA-seq data by selecting highly variable genes, normalizing expression values per cell, and applying a log transformation
sc.pp.highly_variable_genes(adata_sc, flavor="seurat_v3", n_top_genes=3000)
sc.pp.normalize_total(adata_sc, target_sum=1e4)
sc.pp.log1p(adata_sc)
[11]:
# Perform dimensionality reduction, construct a neighborhood graph, and compute the UMAP embedding
sc.pp.pca(adata_sc)
sc.pp.neighbors(adata_sc)
sc.tl.umap(adata_sc)
[12]:
# Visualize the UMAP results
fig,ax = plt.subplots(figsize=(4,4))
sc.pl.embedding(adata_sc,basis='umap',color=['CellType'],ax=ax,show=False,palette=adjusted_qualitative_colors,s=20)
[12]:
<Axes: title={'center': 'CellType'}, xlabel='UMAP1', ylabel='UMAP2'>
_images/Spatial_spot_deconvolution_Application_with_new_data_17_1.png

Streamline development of loading spatial data with Pysodb

[13]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB databases.
# SODB is a format used to store data in memory-mapped files for efficient access and querying.
# This package allows users to interact with SODB files using Python.
import pysodb
[14]:
# Initialization
sodb = pysodb.SODB()
[15]:
# Define the name of the dataset_name and experiment_name
dataset_name = 'moncada2020integrating'
experiment_name = 'GSM3036911_spatial_transcriptomics'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata_st = sodb.load_experiment(dataset_name,experiment_name)
load experiment[GSM3036911_spatial_transcriptomics] in dataset[moncada2020integrating]
[16]:
# Preprocess data
sc.pp.highly_variable_genes(adata_st, flavor="seurat_v3", n_top_genes=3000)
sc.pp.normalize_total(adata_st, target_sum=1e4)
sc.pp.log1p(adata_st)
[17]:
# Dimensionality reduction and neighborhood graph construction
sc.pp.pca(adata_st)
sc.pp.neighbors(adata_st)
sc.tl.umap(adata_st)
# Cluster cells using the Leiden algorithm
sc.tl.leiden(adata_st,resolution=0.7)
[18]:
# Visualize the spatial embedding
ax = sc.pl.embedding(adata_st,basis='spatial',color=['leiden'],palette=cmp_ct,show=False)
ax.axis('equal')
[18]:
(0.5999999999999999, 31.4, 6.7, 35.3)
_images/Spatial_spot_deconvolution_Application_with_new_data_24_1.png

Preparation

[19]:
# Perform differential gene expression analysis across 'CellType' in 'adata_sc'
sc.tl.rank_genes_groups(adata_sc, groupby="CellType", use_raw=False)
WARNING: Default of the method has been changed to 't-test' from 't-test_overestim_var'
[20]:
# Extract the top 100 marker genes per cell type from single-cell dataset
markers_df = pd.DataFrame(adata_sc.uns["rank_genes_groups"]["names"]).iloc[0:100, :]
# Find unique marker genes
genes_sc = np.unique(markers_df.melt().value.values)
# Get gene names from spatial transcriptomics dataset
genes_st = adata_st.var_names.values
# Find the intersection of gene sets
genes = list(set(genes_sc).intersection(set(genes_st)))
# The length of "genes"
len(genes)
[20]:
1103

Perform Tangram for alignment

[21]:
# Use Tangram to align the gene expression profiles of "adata_sc" and "adata_st" based on the shared set of genes in the intersection of "genes_sc" and "genes_st".
tg.pp_adatas(adata_sc, adata_st, genes=genes)
INFO:root:1079 training genes are saved in `uns``training_genes` of both single cell and spatial Anndatas.
INFO:root:13775 overlapped genes are saved in `uns``overlap_genes` of both single cell and spatial Anndatas.
INFO:root:uniform based density prior is calculated and saved in `obs``uniform_density` of the spatial Anndata.
INFO:root:rna count based density prior is calculated and saved in `obs``rna_count_based_density` of the spatial Anndata.
[22]:
# Use the map_cells_to_space function from tangram to map cells from "adata_sc" onto "adata_st".
# The mapping uses "cells" mode, which assigns each cell from adata_sc to a location within the spatial transcriptomics space based on its gene expression profile.
ad_map = tg.map_cells_to_space(
    adata_sc,
    adata_st,
    mode="cells",
    # target_count=adata_st.obs.cell_count.sum(),
    # density_prior=np.array(adata_st.obs.cell_count) / adata_st.obs.cell_count.sum(),
    num_epochs=1000,
    device="cpu",
)
INFO:root:Allocate tensors for mapping.
INFO:root:Begin training with 1079 genes and rna_count_based density_prior in cells mode...
INFO:root:Printing scores every 100 epochs.
Score: 0.340, KL reg: 0.113
Score: 0.587, KL reg: 0.001
Score: 0.591, KL reg: 0.001
Score: 0.592, KL reg: 0.001
Score: 0.592, KL reg: 0.001
Score: 0.592, KL reg: 0.001
Score: 0.592, KL reg: 0.001
Score: 0.592, KL reg: 0.001
Score: 0.592, KL reg: 0.001
Score: 0.592, KL reg: 0.001
INFO:root:Saving results..
[23]:
# Project "CellType" annotations from a single-cell RNA sequencing (scRNA-seq) dataset onto a spatial transcriptomics dataset, based on a previously computed cell-to-space mapping
tg.project_cell_annotations(ad_map, adata_st, annotation="CellType")
INFO:root:spatial prediction dataframe is saved in `obsm` `tangram_ct_pred` of the spatial AnnData.
[24]:
# Create new columns in "adata_st.obs" that correspond to the values in "adata_st.obsm['tangram_ct_pred']"
for ct in adata_st.obsm['tangram_ct_pred'].columns:
    adata_st.obs[ct] = np.array(adata_st.obsm['tangram_ct_pred'][ct].values)
[25]:
# Print adata_st.obsm['tangram_ct_pred']
adata_st.obsm['tangram_ct_pred']
[25]:
Acinar cells Ductal Cancer clone A Cancer clone B mDCs Tuft cells pDCs Endocrine cells Endothelial cells Macrophages Mast cells T cells NK cells Monocytes RBCs Fibroblasts
spots
10x10 0.009497 6.672646 0.424408 0.368327 0.592680 0.037527 0.085090 0.006506 0.115381 0.300945 0.080646 0.312955 0.104370 0.022690 0.021423
10x13 0.004136 5.520368 0.069313 0.077433 0.118182 0.051383 0.015343 0.005291 0.016948 0.039191 0.044960 0.117301 0.099030 0.043749 0.002422
10x14 0.025058 4.375388 0.174041 0.484484 0.190605 0.016902 0.065869 0.004310 0.035297 0.112845 0.093955 0.062976 0.000027 0.035016 0.004687
10x15 0.042269 3.250308 0.125651 0.129064 0.094200 0.052903 0.071260 0.004727 0.039667 0.066190 0.031777 0.054321 0.117018 0.025751 0.004090
10x16 0.012410 3.084373 0.000099 0.031348 0.000151 0.083004 0.062752 0.004804 0.007365 0.032401 0.036338 0.237107 0.000112 0.003946 0.005928
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9x29 0.005264 4.249710 0.000088 0.401962 0.194918 0.018993 0.034039 0.006070 0.059100 0.033828 0.031062 0.126530 0.000050 0.024517 0.000026
9x30 0.003174 2.255114 0.023105 0.211828 0.038126 0.077600 0.024611 0.004768 0.011907 0.083275 0.065433 0.099359 0.047894 0.003502 0.009234
9x31 0.013383 0.674886 0.325171 0.213344 0.129682 0.025250 0.032006 0.005734 0.009255 0.294749 0.004396 0.155496 0.019770 0.002794 0.004159
9x32 0.039534 1.179861 0.000359 0.001055 0.040355 0.048031 0.000704 0.007145 0.009434 0.043061 0.027588 0.068613 0.017337 0.006709 0.001506
9x33 0.007216 0.775240 0.000538 0.077365 0.029202 0.055797 0.017878 0.005448 0.018875 0.016053 0.025088 0.044807 0.020676 0.019617 0.001025

428 rows × 15 columns

[26]:
adata_st
[26]:
AnnData object with n_obs × n_vars = 428 × 14576
    obs: 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_mito', 'log1p_total_counts_mito', 'pct_counts_mito', 'clusters', 'leiden', 'uniform_density', 'rna_count_based_density', 'Acinar cells', 'Ductal', 'Cancer clone A', 'Cancer clone B', 'mDCs', 'Tuft cells', 'pDCs', 'Endocrine cells', 'Endothelial cells', 'Macrophages', 'Mast cells', 'T cells   NK cells', 'Monocytes', 'RBCs', 'Fibroblasts'
    var: 'mito', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_rank', 'variances', 'variances_norm', 'n_cells', 'sparsity'
    uns: 'hvg', 'leiden', 'leiden_colors', 'moranI', 'neighbors', 'pca', 'rank_genes_groups', 'spatial_neighbors', 'umap', 'log1p', 'training_genes', 'overlap_genes'
    obsm: 'X_pca', 'X_umap', 'spatial', 'tangram_ct_pred'
    varm: 'PCs'
    layers: 'raw_count', 'raw_counts'
    obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances'
[27]:
from palettable.colorbrewer.sequential import YlGnBu_9
# to_plot_list = ['Acinar cells','Cancer clone A','Cancer clone B','Ductal']
to_plot_list = adata_sc.obs['CellType'].cat.categories
for to_plot in to_plot_list:
    ax = sc.pl.embedding(
        adata_st,
        basis='spatial',
        color=to_plot,
        show=False,
        color_map=YlGnBu_9.mpl_colormap
    )
    ax.axis('equal')
_images/Spatial_spot_deconvolution_Application_with_new_data_35_0.png
_images/Spatial_spot_deconvolution_Application_with_new_data_35_1.png
_images/Spatial_spot_deconvolution_Application_with_new_data_35_2.png
_images/Spatial_spot_deconvolution_Application_with_new_data_35_3.png
_images/Spatial_spot_deconvolution_Application_with_new_data_35_4.png
_images/Spatial_spot_deconvolution_Application_with_new_data_35_5.png
_images/Spatial_spot_deconvolution_Application_with_new_data_35_6.png
_images/Spatial_spot_deconvolution_Application_with_new_data_35_7.png
_images/Spatial_spot_deconvolution_Application_with_new_data_35_8.png
_images/Spatial_spot_deconvolution_Application_with_new_data_35_9.png
_images/Spatial_spot_deconvolution_Application_with_new_data_35_10.png
_images/Spatial_spot_deconvolution_Application_with_new_data_35_11.png
_images/Spatial_spot_deconvolution_Application_with_new_data_35_12.png
_images/Spatial_spot_deconvolution_Application_with_new_data_35_13.png
_images/Spatial_spot_deconvolution_Application_with_new_data_35_14.png

Generalizability to more spatial omics data

SpatiallyVariableGeneDetection_SpatialGenomicsData

This tutorial demonstrates spatially variable gene detection on spatial genomics data using Pysodb and Sepal.

The reference papers can be found at https://academic.oup.com/bioinformatics/article/37/17/2644/6168120 and https://www.nature.com/articles/s41586-021-04217-4.

Import packages and set configurations

[1]:
# Numpy is a package for numerical computing with arrays
import numpy as np
[2]:
# Import sepal package and its modules
import sepal.datasets as d
import sepal.models as m
import sepal.utils as ut

Streamline development of loading spatial data with Pysodb

[3]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB databases.
# SODB is a format used to store data in memory-mapped files for efficient access and querying.
# This package allows users to interact with SODB files using Python.
import pysodb
[4]:
# Initialization
sodb = pysodb.SODB()
[5]:
# Define names of the dataset_name and experiment_name
dataset_name = 'zhao2022spatial'
experiment_name = 'mouse_cerebellum_1_dna_200114_14'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name,experiment_name)
load experiment[mouse_cerebellum_1_dna_200114_14] in dataset[zhao2022spatial]
[6]:
adata
[6]:
AnnData object with n_obs × n_vars = 31382 × 2738
    obs: 'leiden'
    var: 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
    uns: 'hvg', 'leiden', 'leiden_colors', 'log1p', 'moranI', 'neighbors', 'pca', 'spatial_neighbors', 'umap'
    obsm: 'X_pca', 'X_umap', 'spatial'
    varm: 'PCs'
    obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances'
[7]:
# Save the AnnData object to an H5AD file format.
adata.write_h5ad('mouse_cerebellum_1_dna_200114_14.h5ad')

Perform Sepal to spatially variable gene detection for spatial genomics data

[8]:
# Load in the raw data using a RawData class.
raw_data = d.RawData('mouse_cerebellum_1_dna_200114_14.h5ad')
[9]:
raw_data
[9]:
RawData object
        > loaded from mouse_cerebellum_1_dna_200114_14.h5ad
        > using pixel coordinates
[10]:
# Use the UnstructuredData class (a subclass of the CountData class) to hold data from non-Visium or non-ST arrays.
data = m.UnstructuredData(raw_data,
                          eps = 0.1)
[11]:
# The propagate function normalizes the count data and then propagates it in time to measure the diffusion time.
# Set scale = True to perform min-max scaling of the diffusion times.
times = m.propagate(data,
                    normalize = True,
                    scale = True)
[INFO] : Using 128 workers
[INFO] : Saturated Spots : 30706
100%|██████████| 2738/2738 [00:35<00:00, 76.33it/s]
[12]:
# Selects the top 10 and bottom 10 profiles based on their diffusion times
# Set the number of top and bottom profiles to be selected as 10
n_top = 10
# Computes the indices that would sort the times DataFrame in ascending order
sorted_indices = np.argsort(times.values.flatten())
# Reverses the order of the sorted indices to obtain a descending order
sorted_indices = sorted_indices[::-1]
# Retrieves the profile names corresponding to the sorted indices
sorted_profiles = times.index.values[sorted_indices]
# Select the top 10 profile names with the highest diffusion times
top_profiles = sorted_profiles[0:n_top]
# Selects the bottom 10 profile names with the lowest diffusion times
tail_profiles = sorted_profiles[-n_top:]
# Retrieves the top 10 profiles from the times DataFrame
times.loc[top_profiles,:]
[12]:
average
chrM_1_16299 1.0
chr6_60000000_61000000 0.0
chr6_68000000_69000000 0.0
chr6_67000000_68000000 0.0
chr6_66000000_67000000 0.0
chr6_65000000_66000000 0.0
chr6_64000000_65000000 0.0
chr6_63000000_64000000 0.0
chr6_62000000_63000000 0.0
chr6_61000000_62000000 0.0
[13]:
# Inspect the detection visually by using the plot_profiles function for the first 10 SVGs
# Define a custom pltargs dictionary with plot style options
pltargs = dict(s = 5,
                cmap = "magma",
                edgecolor = 'none',
                marker = 'H',
                )

# plot the profiles
fig,ax = ut.plot_profiles(cnt = data.cnt.loc[:,top_profiles],
                          crd = data.real_crd,
                          rank_values = times.loc[top_profiles,:].values.flatten(),
                          pltargs = pltargs,
                          )
_images/Generalizability_to_more_spatial_omics_data_SpatiallyVariableGeneDetection_SpatialGenomicsData_17_0.png
[14]:
# Inspect the detection visually by using the plot_profiles function for the last 10 SVGs
# Define a custom pltargs dictionary with plot style options
pltargs = dict(s = 5,
                cmap = "magma",
                edgecolor = 'none',
                marker = 'H',
                )

# plot the profiles
fig,ax = ut.plot_profiles(cnt = data.cnt.loc[:,tail_profiles],
                          crd = data.real_crd,
                          rank_values = times.loc[tail_profiles,:].values.flatten(),
                          pltargs = pltargs,
                          )
_images/Generalizability_to_more_spatial_omics_data_SpatiallyVariableGeneDetection_SpatialGenomicsData_18_0.png

SpatiallyVariableGeneDetection_SpatialProteomicsData

This tutorial demonstrates spatially variable gene detection on spatial proteomics data using Pysodb and Sepal.

The reference papers can be found at https://academic.oup.com/bioinformatics/article/37/17/2644/6168120 and https://www.cell.com/fulltext/S0092-8674(18)31100-0.

Import packages and set configurations

[1]:
# Numpy is a package for numerical computing with arrays
import numpy as np
[2]:
# Import sepal package and its modules
import sepal.datasets as d
import sepal.models as m
import sepal.utils as ut

Streamline development of loading spatial data with Pysodb

[3]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB databases.
# SODB is a format used to store data in memory-mapped files for efficient access and querying.
# This package allows users to interact with SODB files using Python.
import pysodb
[4]:
# Initialization
sodb = pysodb.SODB()
[5]:
# Define names of the dataset_name and experiment_name
dataset_name = 'keren2018a'
experiment_name = 'p9'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name,experiment_name)
load experiment[p9] in dataset[keren2018a]
[6]:
# Save the AnnData object to an H5AD file format.
adata.write_h5ad('keren2018a_p9.h5ad')

Perform Sepal to spatially variable gene detection for spatial proteomics data

[7]:
# Load in the raw data using a RawData class.
raw_data = d.RawData('keren2018a_p9.h5ad')
[8]:
raw_data
[8]:
RawData object
        > loaded from keren2018a_p9.h5ad
        > using pixel coordinates
[9]:
# Use the UnstructuredData class (a subclass of the CountData class) to hold data from non-Visium or non-ST arrays.
data = m.UnstructuredData(raw_data,
                          eps = 0.1)
[10]:
# The propagate function normalizes the count data and then propagates it in time to measure the diffusion time.
# Set scale = True to perform min-max scaling of the diffusion times.
times = m.propagate(data,
                    normalize = True,
                    scale = True)
[INFO] : Using 128 workers
[INFO] : Saturated Spots : 5806
/home/linsenlin/PROTOCOLS_SODB/Spatially variable gene/sepal/sepal/utils.py:80: RuntimeWarning: invalid value encountered in log2
  return np.log2(x + c)
100%|██████████| 36/36 [00:00<00:00, 2841.88it/s]
[11]:
# Selects the top 10 and bottom 10 profiles based on their diffusion times
# Set the number of top and bottom profiles to be selected as 10
n_top = 10
# Computes the indices that would sort the times DataFrame in ascending order
sorted_indices = np.argsort(times.values.flatten())
# Reverses the order of the sorted indices to obtain a descending order
sorted_indices = sorted_indices[::-1]
# Retrieves the profile names corresponding to the sorted indices
sorted_profiles = times.index.values[sorted_indices]
# Select the top 10 profile names with the highest diffusion times
top_profiles = sorted_profiles[0:n_top]
# Selects the bottom 10 profile names with the lowest diffusion times
tail_profiles = sorted_profiles[-n_top:]
# Retrieves the top 10 profiles from the times DataFrame
times.loc[top_profiles,:]
[11]:
average
Vimentin 1.000000
Beta catenin 0.746855
CD45 0.716981
CD45RO 0.646226
H3K9ac 0.632075
CD16 0.566038
phospho-S6 0.531447
CD11b 0.523585
CD11c 0.503145
CD68 0.484277
[12]:
# Inspect the detection visually by using the plot_profiles function for the first 10 SVGs
# Define a custom pltargs dictionary with plot style options
pltargs = dict(s = 15,
                cmap = "magma",
                edgecolor = 'none',
                marker = 'H',
                )

# plot the profiles
fig,ax = ut.plot_profiles(cnt = data.cnt.loc[:,top_profiles],
                          crd = data.real_crd,
                          rank_values = times.loc[top_profiles,:].values.flatten(),
                          pltargs = pltargs,
                         )
_images/Generalizability_to_more_spatial_omics_data_SpatiallyVariableGeneDetection_SpatialProteomicsData_16_0.png
[13]:
# Inspect the detection visually by using the plot_profiles function for the last 10 SVGs
# Define a custom pltargs dictionary with plot style options
pltargs = dict(s = 15,
                cmap = "magma",
                edgecolor = 'none',
                marker = 'H',
                )

# plot the profiles
fig,ax = ut.plot_profiles(cnt = data.cnt.loc[:,tail_profiles],
                          crd = data.real_crd,
                          rank_values = times.loc[tail_profiles,:].values.flatten(),
                          pltargs = pltargs,
                         )
/home/linsenlin/PROTOCOLS_SODB/Spatially variable gene/sepal/sepal/utils.py:80: RuntimeWarning: invalid value encountered in log2
  return np.log2(x + c)
_images/Generalizability_to_more_spatial_omics_data_SpatiallyVariableGeneDetection_SpatialProteomicsData_17_1.png

SpatiallyVariableGeneDetection_SpatialMultiomicsData

This tutorial demonstrates spatially variable gene detection on spatial multi-omics data using Pysodb and Sepal.

The reference papers can be found at https://academic.oup.com/bioinformatics/article/37/17/2644/6168120 and https://www.cell.com/cell/fulltext/S0092-8674(20)31390-8.

Import packages and set configurations

[1]:
# Numpy is a package for numerical computing with arrays
import numpy as np
[2]:
# Import sepal package and its modules
import sepal.datasets as d
import sepal.models as m
import sepal.utils as ut

Streamline development of loading spatial data with Pysodb

[3]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB databases.
# SODB is a format used to store data in memory-mapped files for efficient access and querying.
# This package allows users to interact with SODB files using Python.
import pysodb
[4]:
# Initialization
sodb = pysodb.SODB()
[5]:
# Define names of the dataset_name and experiment_name
dataset_name = 'liu2020high'
experiment_name = 'E10_whole_gene_best'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name,experiment_name)
load experiment[E10_whole_gene_best] in dataset[liu2020high]
[6]:
# Save the AnnData object to an H5AD file format.
adata.write_h5ad('E10_whole_gene_best.h5ad')

Perform Sepal to spatially variable gene detection for spatial multi-omics data

[7]:
# Load in the raw data using a RawData class.
raw_data = d.RawData('E10_whole_gene_best.h5ad')
[8]:
raw_data
[8]:
RawData object
        > loaded from E10_whole_gene_best.h5ad
        > using pixel coordinates
[9]:
# Filter genes observed in less than 5 spots and/or less than 10 total observations
raw_data.cnt = ut.filter_genes(raw_data.cnt,
                               min_expr=10,
                               min_occur=5)
[10]:
# Use the UnstructuredData class (a subclass of the CountData class) to hold data from non-Visium or non-ST arrays.
data = m.UnstructuredData(raw_data,
                          eps = 0.1)

[11]:
# The propagate function normalizes the count data and then propagates it in time to measure the diffusion time.
# Set scale = True to perform min-max scaling of the diffusion times.
times = m.propagate(data,
                    normalize = True,
                    scale = True)
[INFO] : Using 128 workers
[INFO] : Saturated Spots : 819
100%|██████████| 15309/15309 [00:50<00:00, 304.95it/s]
[12]:
# Select the top 10 and bottom 10 profiles based on their diffusion times
# Set the number of top and bottom profiles to be selected as 10
n_top = 10
# Computes the indices that would sort the times DataFrame in ascending order
sorted_indices = np.argsort(times.values.flatten())
# Reverses the order of the sorted indices to obtain a descending order
sorted_indices = sorted_indices[::-1]
# Retrieves the profile names corresponding to the sorted indices
sorted_profiles = times.index.values[sorted_indices]
# Select the top 10 profile names with the highest diffusion times
top_profiles = sorted_profiles[0:n_top]
# Selects the bottom 10 profile names with the lowest diffusion times
tail_profiles = sorted_profiles[-n_top:]
# Retrieves the top 10 profiles from the times DataFrame
times.loc[top_profiles,:]
[12]:
average
Ttn 1.000000
Myl7 0.870286
Epha3 0.836571
Fabp7 0.719429
Sncg 0.693714
Adgrv1 0.671429
Gap43 0.654857
Myh7 0.649714
Onecut2 0.636000
Sox2 0.629714
[13]:
# Inspect the detection visually by using the plot_profiles function for the first 10 SVGs
# Define a custom pltargs dictionary with plot style options
pltargs = dict(s = 25,
               cmap = "magma",
               edgecolor = 'none',
               marker = 'H',
               )

# plot the profiles
fig,ax = ut.plot_profiles(cnt = data.cnt.loc[:,top_profiles],
                          crd = data.real_crd,
                          rank_values = times.loc[top_profiles,:].values.flatten(),
                          pltargs = pltargs,
                          )
_images/Generalizability_to_more_spatial_omics_data_SpatiallyVariableGeneDetection_SpatialMultiomicsData_17_0.png
[14]:
# Inspect the detection visually by using the plot_profiles function for the last 10 SVGs
# Define a custom pltargs dictionary with plot style options
pltargs = dict(s = 25,
               cmap = "magma",
               edgecolor = 'none',
               marker = 'H',
               )

# plot the profiles
fig,ax = ut.plot_profiles(cnt = data.cnt.loc[:,tail_profiles],
                          crd = data.real_crd,
                          rank_values = times.loc[tail_profiles,:].values.flatten(),
                          pltargs = pltargs,
                          )
_images/Generalizability_to_more_spatial_omics_data_SpatiallyVariableGeneDetection_SpatialMultiomicsData_18_0.png

SpatialClustering_SpatialGenomicsData

This tutorial demonstrates how to identify spatial domains on spatial genomics data using Pysodb and Spaceflow.

The reference papers can be found at https://www.nature.com/articles/s41467-022-31739-w and https://www.nature.com/articles/s41586-021-04217-4.

Import packages and set configurations

[1]:
# Use the Python warnings module to filter and ignore any warnings that may occur in the program after this point.
import warnings
warnings.filterwarnings("ignore")
[2]:
# Scanpy is a package for single-cell RNA sequencing analysis.
import scanpy as sc
[3]:
# from SpaceFlow package import SpaceFlow module
from SpaceFlow import SpaceFlow
[4]:
# Import the palettable package
import palettable
# Create a diverging colormap and two qualitative color lists for the visualizations below
cmp_pspace = palettable.cartocolors.diverging.TealRose_7.mpl_colormap
cmp_domain = palettable.cartocolors.qualitative.Pastel_10.mpl_colors
cmp_ct = palettable.cartocolors.qualitative.Safe_10.mpl_colors

When encountering the error "No module named 'palettable'", users need to first activate the conda virtual environment and then run "pip install palettable" in the terminal. The same approach can be applied to other packages by replacing 'palettable' with the name of the desired package.

Streamline development of loading spatial data with Pysodb

[5]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB databases.
# SODB is a format used to store data in memory-mapped files for efficient access and querying.
# This package allows users to interact with SODB files using Python.
import pysodb
[6]:
# Initialize the sodb object
sodb = pysodb.SODB()
[7]:
# Define names of the dataset_name and experiment_name
dataset_name = 'zhao2022spatial'
experiment_name = 'mouse_cerebellum_1_dna_200114_14'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name,experiment_name)
load experiment[mouse_cerebellum_1_dna_200114_14] in dataset[zhao2022spatial]

Perform SpaceFlow to spatial clustering for spatial genomics data

[8]:
# Create SpaceFlow Object
sf = SpaceFlow.SpaceFlow(
    count_matrix=adata.X,
    spatial_locs=adata.obsm['spatial'],
    sample_names=adata.obs_names,
    gene_names=adata.var_names
)

When encountering the error "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()", users are advised to modify the __init__ function in the "SpaceFlow.py" file of the SpaceFlow package: replace "elif count_matrix and spatial_locs:" with "elif count_matrix is not None and spatial_locs is not None:", and change "if gene_names:" and "if sample_names:" to "if gene_names is not None:" and "if sample_names is not None:", respectively. These modifications ensure that each if statement evaluates a single boolean value.
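For clarity, the sketch below shows the shape of the corrected conditions. It is not the actual SpaceFlow source; the argument names follow the SpaceFlow constructor used in this tutorial, and the function body is an assumption used only to illustrate the "is not None" checks.

[ ]:
# Minimal sketch of the suggested fix, assuming an __init__ that branches on its inputs.
import numpy as np

def _init_sketch(adata=None, count_matrix=None, spatial_locs=None,
                 sample_names=None, gene_names=None):
    if adata is not None:
        pass  # AnnData-based branch (unchanged)
    # Before: "elif count_matrix and spatial_locs:" is ambiguous for NumPy arrays.
    elif count_matrix is not None and spatial_locs is not None:
        # Before: "if gene_names:" and "if sample_names:" raise the same error for arrays.
        if gene_names is not None:
            gene_names = np.asarray(gene_names)
        if sample_names is not None:
            sample_names = np.asarray(sample_names)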

[9]:
# Preprocess data
sf.preprocessing_data()

When working with an AnnData object (adata) whose count or expression matrix is extremely sparse, or which has very few features (as is often the case with spatial proteomics data), it may be preferable to skip data preprocessing, because over-processing in these cases can lead to errors or diminished performance in downstream tasks. To skip preprocessing, users need to modify the preprocessing_data function in the "SpaceFlow.py" file of the SpaceFlow package, specifically by commenting out the sc.pp.normalize_total(), sc.pp.log1p(), and sc.pp.highly_variable_genes() calls.

When encountering the error "You can drop duplicate edges by setting the 'duplicates' kwarg", modify the preprocessing_data function in "SpaceFlow.py" from the SpaceFlow package by: (1) removing target_sum=1e4 from sc.pp.normalize_total(); (2) changing the flavor argument to 'seurat' in sc.pp.highly_variable_genes(); and (3) saving the file and rerunning the analysis.
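The sketch below illustrates the two adjustments described in the notes above (skipping preprocessing entirely for sparse or low-feature data, and the normalize_total / highly_variable_genes changes). It is a standalone stand-in, not the actual preprocessing_data implementation inside SpaceFlow.py, and the function signature is an assumption for illustration.

[ ]:
# Illustrative stand-in for a modified preprocessing_data; the real function lives in
# SpaceFlow.py and operates on the SpaceFlow object's own AnnData.
import scanpy as sc

def preprocessing_data_sketch(adata, n_top_genes=3000, skip=False):
    if skip:
        # For very sparse matrices or very few features (e.g. spatial proteomics data),
        # leave the data as-is instead of normalizing, log-transforming and selecting HVGs.
        return adata
    sc.pp.normalize_total(adata)  # target_sum=1e4 removed, as suggested above
    sc.pp.log1p(adata)
    sc.pp.highly_variable_genes(adata, flavor="seurat",  # flavor changed to "seurat"
                                n_top_genes=n_top_genes)
    return adata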

When encountering the error "module 'networkx' has no attribute 'to_scipy_sparse_matrix'", users should first activate the virtual environment in the terminal and then downgrade NetworkX with the command "pip install networkx==2.8". This ensures that a compatible version of NetworkX is installed in the virtual environment.

[10]:
# Train a deep graph network model
embedding = sf.train(
    spatial_regularization_strength=0.1,
    z_dim=50,
    lr=1e-3,
    epochs=1000,
    max_patience=50,
    min_stop=100,
    random_seed=42,
    gpu=0,
    regularization_acceleration=True,
    edge_subset_sz=1000000
)
Epoch 2/1000, Loss: 1.4685497283935547
Epoch 12/1000, Loss: 1.4596877098083496
Epoch 22/1000, Loss: 1.4531407356262207
Epoch 32/1000, Loss: 1.4495148658752441
Epoch 42/1000, Loss: 1.4438341856002808
Epoch 52/1000, Loss: 1.4338985681533813
Epoch 62/1000, Loss: 1.413347601890564
Epoch 72/1000, Loss: 1.3782589435577393
Epoch 82/1000, Loss: 1.3242926597595215
Epoch 92/1000, Loss: 1.2605986595153809
Epoch 102/1000, Loss: 1.2157366275787354
Epoch 112/1000, Loss: 1.182152271270752
Epoch 122/1000, Loss: 1.1559337377548218
Epoch 132/1000, Loss: 1.1408302783966064
Epoch 142/1000, Loss: 1.1258575916290283
Epoch 152/1000, Loss: 1.100624918937683
Epoch 162/1000, Loss: 1.0962754487991333
Epoch 172/1000, Loss: 1.0729572772979736
Epoch 182/1000, Loss: 1.0589983463287354
Epoch 192/1000, Loss: 1.059975266456604
Epoch 202/1000, Loss: 1.0544735193252563
Epoch 212/1000, Loss: 1.0473089218139648
Epoch 222/1000, Loss: 1.048572063446045
Epoch 232/1000, Loss: 1.0420039892196655
Epoch 242/1000, Loss: 1.0365196466445923
Epoch 252/1000, Loss: 1.025299072265625
Epoch 262/1000, Loss: 1.0289864540100098
Epoch 272/1000, Loss: 1.0161466598510742
Epoch 282/1000, Loss: 1.0162009000778198
Epoch 292/1000, Loss: 1.0055137872695923
Epoch 302/1000, Loss: 0.9996150732040405
Epoch 312/1000, Loss: 0.984171986579895
Epoch 322/1000, Loss: 0.9808189868927002
Epoch 332/1000, Loss: 0.9775058627128601
Epoch 342/1000, Loss: 0.9651961326599121
Epoch 352/1000, Loss: 0.953350305557251
Epoch 362/1000, Loss: 0.9430199265480042
Epoch 372/1000, Loss: 0.9377722144126892
Epoch 382/1000, Loss: 0.9295996427536011
Epoch 392/1000, Loss: 0.9214633703231812
Epoch 402/1000, Loss: 0.9171395301818848
Epoch 412/1000, Loss: 0.9016039967536926
Epoch 422/1000, Loss: 0.881766140460968
Epoch 432/1000, Loss: 0.8796168565750122
Epoch 442/1000, Loss: 0.8762509822845459
Epoch 452/1000, Loss: 0.8636870980262756
Epoch 462/1000, Loss: 0.860791802406311
Epoch 472/1000, Loss: 0.8389173746109009
Epoch 482/1000, Loss: 0.8420767784118652
Epoch 492/1000, Loss: 0.830423891544342
Epoch 502/1000, Loss: 0.8228366374969482
Epoch 512/1000, Loss: 0.8186403512954712
Epoch 522/1000, Loss: 0.8139156103134155
Epoch 532/1000, Loss: 0.7973632216453552
Epoch 542/1000, Loss: 0.8030251264572144
Epoch 552/1000, Loss: 0.7787685990333557
Epoch 562/1000, Loss: 0.7840163707733154
Epoch 572/1000, Loss: 0.7958685159683228
Epoch 582/1000, Loss: 0.7602176666259766
Epoch 592/1000, Loss: 0.7602773308753967
Epoch 602/1000, Loss: 0.7595182657241821
Epoch 612/1000, Loss: 0.7394869327545166
Epoch 622/1000, Loss: 0.7507429122924805
Epoch 632/1000, Loss: 0.7416149377822876
Epoch 642/1000, Loss: 0.7401753067970276
Epoch 652/1000, Loss: 0.7358796000480652
Epoch 662/1000, Loss: 0.7172746062278748
Epoch 672/1000, Loss: 0.7284640073776245
Epoch 682/1000, Loss: 0.7057000398635864
Epoch 692/1000, Loss: 0.7158027291297913
Epoch 702/1000, Loss: 0.7134007215499878
Epoch 712/1000, Loss: 0.7041587829589844
Epoch 722/1000, Loss: 0.6904285550117493
Epoch 732/1000, Loss: 0.6939340829849243
Epoch 742/1000, Loss: 0.6793048977851868
Epoch 752/1000, Loss: 0.6911025047302246
Epoch 762/1000, Loss: 0.6908837556838989
Epoch 772/1000, Loss: 0.6817575693130493
Epoch 782/1000, Loss: 0.6719300746917725
Epoch 792/1000, Loss: 0.6639779806137085
Epoch 802/1000, Loss: 0.6610084176063538
Epoch 812/1000, Loss: 0.6704531311988831
Epoch 822/1000, Loss: 0.645722508430481
Epoch 832/1000, Loss: 0.6501308679580688
Epoch 842/1000, Loss: 0.6539092659950256
Epoch 852/1000, Loss: 0.6520346403121948
Epoch 862/1000, Loss: 0.6397005319595337
Epoch 872/1000, Loss: 0.6325197815895081
Epoch 882/1000, Loss: 0.6182982921600342
Epoch 892/1000, Loss: 0.6249979138374329
Epoch 902/1000, Loss: 0.6265581250190735
Epoch 912/1000, Loss: 0.6115624308586121
Epoch 922/1000, Loss: 0.6079530715942383
Epoch 932/1000, Loss: 0.6105392575263977
Epoch 942/1000, Loss: 0.6221534013748169
Epoch 952/1000, Loss: 0.6090223789215088
Epoch 962/1000, Loss: 0.613954484462738
Epoch 972/1000, Loss: 0.592529296875
Epoch 982/1000, Loss: 0.6049216389656067
Epoch 992/1000, Loss: 0.5998843908309937
Training complete!
Embedding is saved at ./embedding.tsv
[11]:
# Save the embeddings of the trained SpaceFlow model to adata.obsm['SpaceFlow'].
adata.obsm['SpaceFlow'] = embedding
[12]:
# Calculate the nearest neighbors in the 'SpaceFlow' representation and compute the UMAP embedding.
sc.pp.neighbors(adata, use_rep= 'SpaceFlow')
sc.tl.umap(adata)
[23]:
# Perform a Leiden clustering.
sc.tl.leiden(adata, resolution= 0.3)
[24]:
# Plot a UMAP embedding.
sc.pl.umap(adata, color= 'leiden', color_map= cmp_pspace)
_images/Generalizability_to_more_spatial_omics_data_SpatialClustering_SpatialGenomicsData_21_0.png
[25]:
# Display a spatial embedding plot with clustering information.
ax = sc.pl.embedding(adata, basis= 'spatial', color= 'leiden', show=False, color_map=cmp_pspace)
ax.axis('equal')
[25]:
(70.69998570611773, 6261.998413379076, -230.5226537216829, 6311.13445831407)
_images/Generalizability_to_more_spatial_omics_data_SpatialClustering_SpatialGenomicsData_22_1.png

SpatialClustering_SpatialProteomicsData

This tutorial demonstrates how to identify spatial domains on spatial proteomics data using Pysodb and Spaceflow.

The reference papers can be found at https://www.nature.com/articles/s41467-022-31739-w and https://www.cell.com/fulltext/S0092-8674(18)31100-0.

Import packages and set configurations

[1]:
# Use the Python warnings module to filter and ignore any warnings that may occur in the program after this point.
import warnings
warnings.filterwarnings("ignore")
[2]:
# Scanpy is a package for single-cell RNA sequencing analysis.
import scanpy as sc
[3]:
# from SpaceFlow package import SpaceFlow module
from SpaceFlow import SpaceFlow
[4]:
# Import the palettable package
import palettable
# Create a diverging colormap and two qualitative color lists for the visualizations below
cmp_pspace = palettable.cartocolors.diverging.TealRose_7.mpl_colormap
cmp_domain = palettable.cartocolors.qualitative.Pastel_10.mpl_colors
cmp_ct = palettable.cartocolors.qualitative.Safe_10.mpl_colors

When encountering the error "No module named 'palettable'", users need to first activate the conda virtual environment and then run "pip install palettable" in the terminal. The same approach can be applied to other packages by replacing 'palettable' with the name of the desired package.

Streamline development of loading spatial data with Pysodb

[5]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB databases.
# SODB is a format used to store data in memory-mapped files for efficient access and querying.
# This package allows users to interact with SODB files using Python.
import pysodb
[6]:
# Initialize the sodb object
sodb = pysodb.SODB()
[7]:
# Define names of the dataset_name and experiment_name
dataset_name = 'keren2018a'
experiment_name = 'p4'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name,experiment_name)
load experiment[p4] in dataset[keren2018a]

Perform SpaceFlow to spatial clustering for spatial proteomics data

[8]:
# Create SpaceFlow Object
sf = SpaceFlow.SpaceFlow(
    count_matrix=adata.X,
    spatial_locs=adata.obsm['spatial'],
    sample_names=adata.obs_names,
    gene_names=adata.var_names
)

When encountering the error "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()", users are advised to modify the __init__ function in the "SpaceFlow.py" file of the SpaceFlow package: replace "elif count_matrix and spatial_locs:" with "elif count_matrix is not None and spatial_locs is not None:", and change "if gene_names:" and "if sample_names:" to "if gene_names is not None:" and "if sample_names is not None:", respectively. These modifications ensure that each if statement evaluates a single boolean value.

[9]:
# Preprocess data
sf.preprocessing_data()

When working with an AnnData object (adata) whose count or expression matrix is extremely sparse, or which has very few features (as is often the case with spatial proteomics data), it may be preferable to skip data preprocessing, because over-processing in these cases can lead to errors or diminished performance in downstream tasks. To skip preprocessing, users need to modify the preprocessing_data function in the "SpaceFlow.py" file of the SpaceFlow package, specifically by commenting out the sc.pp.normalize_total(), sc.pp.log1p(), and sc.pp.highly_variable_genes() calls.

When encountering the error "You can drop duplicate edges by setting the 'duplicates' kwarg", modify the preprocessing_data function in "SpaceFlow.py" from the SpaceFlow package by: (1) removing target_sum=1e4 from sc.pp.normalize_total(); (2) changing the flavor argument to 'seurat' in sc.pp.highly_variable_genes(); and (3) saving the file and rerunning the analysis.

When encountering the error "module 'networkx' has no attribute 'to_scipy_sparse_matrix'", users should first activate the virtual environment in the terminal and then downgrade NetworkX with the command "pip install networkx==2.8". This ensures that a compatible version of NetworkX is installed in the virtual environment.

[10]:
# Train a deep graph network model
embedding = sf.train(
    spatial_regularization_strength=0.1,
    z_dim=50,
    lr=1e-3,
    epochs=1000,
    max_patience=50,
    min_stop=100,
    random_seed=42,
    gpu=0,
    regularization_acceleration=True,
    edge_subset_sz=1000000
)
Epoch 2/1000, Loss: 1.407133936882019
Epoch 12/1000, Loss: 1.152597188949585
Epoch 22/1000, Loss: 0.8628092408180237
Epoch 32/1000, Loss: 0.6013652682304382
Epoch 42/1000, Loss: 0.44088295102119446
Epoch 52/1000, Loss: 0.3423008918762207
Epoch 62/1000, Loss: 0.27678847312927246
Epoch 72/1000, Loss: 0.243853360414505
Epoch 82/1000, Loss: 0.2291972041130066
Epoch 92/1000, Loss: 0.18289655447006226
Epoch 102/1000, Loss: 0.17134132981300354
Epoch 112/1000, Loss: 0.1678486168384552
Epoch 122/1000, Loss: 0.14209605753421783
Epoch 132/1000, Loss: 0.13936235010623932
Epoch 142/1000, Loss: 0.12762600183486938
Epoch 152/1000, Loss: 0.12430068105459213
Epoch 162/1000, Loss: 0.1275191307067871
Epoch 172/1000, Loss: 0.11915998160839081
Epoch 182/1000, Loss: 0.1285843551158905
Epoch 192/1000, Loss: 0.11072829365730286
Epoch 202/1000, Loss: 0.11861023306846619
Epoch 212/1000, Loss: 0.10268115997314453
Epoch 222/1000, Loss: 0.09695495665073395
Epoch 232/1000, Loss: 0.11048431694507599
Epoch 242/1000, Loss: 0.10876314342021942
Epoch 252/1000, Loss: 0.09246634691953659
Epoch 262/1000, Loss: 0.10014219582080841
Epoch 272/1000, Loss: 0.09144902974367142
Epoch 282/1000, Loss: 0.095871701836586
Epoch 292/1000, Loss: 0.09363307058811188
Epoch 302/1000, Loss: 0.0878312885761261
Epoch 312/1000, Loss: 0.09541191160678864
Epoch 322/1000, Loss: 0.08759468048810959
Epoch 332/1000, Loss: 0.08533327281475067
Epoch 342/1000, Loss: 0.08197614550590515
Epoch 352/1000, Loss: 0.07628679275512695
Epoch 362/1000, Loss: 0.08480516076087952
Epoch 372/1000, Loss: 0.08929148316383362
Epoch 382/1000, Loss: 0.07579313218593597
Epoch 392/1000, Loss: 0.07411038875579834
Epoch 402/1000, Loss: 0.08086317777633667
Epoch 412/1000, Loss: 0.07415079325437546
Epoch 422/1000, Loss: 0.07764127850532532
Epoch 432/1000, Loss: 0.07051204890012741
Epoch 442/1000, Loss: 0.07536381483078003
Epoch 452/1000, Loss: 0.08093470335006714
Epoch 462/1000, Loss: 0.07462986558675766
Epoch 472/1000, Loss: 0.07670363783836365
Epoch 482/1000, Loss: 0.07004383206367493
Epoch 492/1000, Loss: 0.07707950472831726
Epoch 502/1000, Loss: 0.07900045067071915
Epoch 512/1000, Loss: 0.08827245235443115
Epoch 522/1000, Loss: 0.06595028191804886
Epoch 532/1000, Loss: 0.07947414368391037
Epoch 542/1000, Loss: 0.06654597818851471
Epoch 552/1000, Loss: 0.08173666149377823
Epoch 562/1000, Loss: 0.06277875602245331
Epoch 572/1000, Loss: 0.08507055789232254
Training complete!
Embedding is saved at ./embedding.tsv
[11]:
# Save the embeddings of the trained SpaceFlow model to adata.obsm['SpaceFlow'].
adata.obsm['SpaceFlow'] = embedding
[12]:
# Calculate the nearest neighbors in the 'SpaceFlow' representation and compute the UMAP embedding.
sc.pp.neighbors(adata, use_rep='SpaceFlow')
sc.tl.umap(adata)
[13]:
# Perform a Leiden clustering.
sc.tl.leiden(adata, resolution=0.05)
[14]:
# Plot a UMAP embedding.
sc.pl.umap(adata, color= 'leiden', color_map= cmp_pspace)
_images/Generalizability_to_more_spatial_omics_data_SpatialClustering_SpatialProteomicsData_21_0.png
[15]:
# Display a spatial embedding plot with clustering information.
ax = sc.pl.embedding(adata, basis= 'spatial', color='leiden', show= False, color_map=cmp_pspace)
ax.axis('equal')
[15]:
(-68.28090909090909, 2115.899090909091, -67.68806853582555, 2115.9914953271027)
_images/Generalizability_to_more_spatial_omics_data_SpatialClustering_SpatialProteomicsData_22_1.png

SpatialClustering_SpatialMetabolomicsData

This tutorial demonstrates how to identify spatial domains on spatial metabolomics data using Pysodb and Spaceflow.

The reference paper and the associated project page can be found at https://www.nature.com/articles/s41467-022-31739-w and https://cordis.europa.eu/project/id/634402, respectively.

Import packages and set configurations

[1]:
# Use the Python warnings module to filter and ignore any warnings that may occur in the program after this point.
import warnings
warnings.filterwarnings("ignore")
[2]:
# Scanpy (imported as sc) is a package for single-cell RNA sequencing analysis.
import scanpy as sc
[3]:
# from SpaceFlow package import SpaceFlow module
from SpaceFlow import SpaceFlow
[4]:
# Import the palettable package
import palettable
# Create a diverging colormap and two qualitative color lists for the visualizations below
cmp_pspace = palettable.cartocolors.diverging.TealRose_7.mpl_colormap
cmp_domain = palettable.cartocolors.qualitative.Pastel_10.mpl_colors
cmp_ct = palettable.cartocolors.qualitative.Safe_10.mpl_colors

When encountering the error "No module named 'palettable'", users need to first activate the conda virtual environment and then run "pip install palettable" in the terminal. The same approach can be applied to other packages by replacing 'palettable' with the name of the desired package.

Streamline development of loading spatial data with Pysodb

[5]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB databases.
# SODB is a format used to store data in memory-mapped files for efficient access and querying.
# This package allows users to interact with SODB files using Python.
import pysodb
[6]:
# Initialize the sodb object
sodb = pysodb.SODB()
[7]:
# Define names of the dataset_name and experiment_name
dataset_name = 'MALDI_seed'
experiment_name = 'S655_WS22_320x200_15um_E110'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name,experiment_name)
load experiment[S655_WS22_320x200_15um_E110] in dataset[MALDI_seed]

Perform SpaceFlow for spatial clustering on spatial metabolomics data

[8]:
# Create SpaceFlow Object
sf = SpaceFlow.SpaceFlow(
    count_matrix=adata.X,
    spatial_locs=adata.obsm['spatial'],
    sample_names=adata.obs_names,
    gene_names=adata.var_names
)

When encountering the error “The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()”, users are advised to edit the init function in the “SpaceFlow.py” file of the SpaceFlow package: replace “elif count_matrix and spatial_locs:” with “elif count_matrix is not None and spatial_locs is not None:”, and change “if gene_names:” and “if sample_names:” to “if gene_names is not None:” and “if sample_names is not None:” respectively. These modifications ensure that each if statement evaluates a single boolean value instead of a whole array (see the illustrative sketch below).
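For illustration only, the sketch below shows the “is not None” pattern on a toy array; the variables here are hypothetical stand-ins and the code is not an excerpt of the SpaceFlow package.

[ ]:
import numpy as np

# Hypothetical example, not SpaceFlow code: a truthiness test on a numpy array
# (e.g. "if count_matrix:") raises the ambiguity error, while an explicit
# "is not None" check only tests whether an argument was supplied.
count_matrix = np.ones((5, 3))
spatial_locs = np.ones((5, 2))

# if count_matrix and spatial_locs:   # would raise: truth value ... is ambiguous
if count_matrix is not None and spatial_locs is not None:
    print("count matrix and spatial locations supplied")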

[9]:
# Preprocess data
sf.preprocessing_data()

When working with anndata (adata) whose count or expression matrix is extremely sparse, or which has very few features (as is often the case with spatial proteomics data), it may be preferable to skip data preprocessing, because over-processing in these cases can lead to errors or diminished performance in downstream tasks. To skip preprocessing, users need to modify the preprocessing_data function within the “SpaceFlow.py” file of the SpaceFlow package, specifically by commenting out the sc.pp.normalize_total(), sc.pp.log1p(), and sc.pp.highly_variable_genes() calls, as sketched below.
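A minimal sketch of such a stripped-down preprocessing step is given below; the function name is hypothetical and the packaged preprocessing_data function contains additional logic, so this only outlines which calls to disable.

[ ]:
import scanpy as sc

def minimal_preprocessing(adata):
    # Hypothetical stand-in for SpaceFlow's preprocessing_data: the three calls
    # below are the ones to comment out for very sparse or low-feature data.
    # sc.pp.normalize_total(adata)
    # sc.pp.log1p(adata)
    # sc.pp.highly_variable_genes(adata)
    return adata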

When encountering the error “You can drop duplicate edges by setting the ‘duplicates’ kwarg”, modify the preprocessing_data function in “SpaceFlow.py” from the SpaceFlow package by (1) removing target_sum=1e4 from sc.pp.normalize_total() and (2) changing the flavor argument of sc.pp.highly_variable_genes() to ‘seurat’; then save the file and rerun the analysis. The adjusted calls are sketched below.
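The two parameter changes amount to calling the Scanpy functions roughly as follows, applied here to the adata loaded above; this is a sketch of the intended calls, not a verbatim excerpt of “SpaceFlow.py”.

[ ]:
import scanpy as sc

# (1) Normalize without a fixed target sum.
sc.pp.normalize_total(adata)
# Log-transform before highly variable gene selection.
sc.pp.log1p(adata)
# (2) Select highly variable genes with the 'seurat' flavor.
sc.pp.highly_variable_genes(adata, flavor='seurat')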

When encountering the error “module ‘networkx’ has no attribute ‘to_scipy_sparse_matrix’”, users should first activate the virtual environment in the terminal and then downgrade NetworkX with the command “pip install networkx==2.8”. This ensures that a compatible version of NetworkX is installed within the specified virtual environment.

[10]:
# Train a deep graph network model
embedding = sf.train(
    spatial_regularization_strength=0.1,
    z_dim=50,
    lr=1e-3,
    epochs=1000,
    max_patience=50,
    min_stop=100,
    random_seed=42,
    gpu=0,
    regularization_acceleration=True,
    edge_subset_sz=1000000
)
Epoch 2/1000, Loss: 34.639442443847656
Epoch 12/1000, Loss: 34.61619186401367
Epoch 22/1000, Loss: 34.61435317993164
Epoch 32/1000, Loss: 34.614418029785156
Epoch 42/1000, Loss: 34.61484909057617
Epoch 52/1000, Loss: 34.6142692565918
Epoch 62/1000, Loss: 34.611209869384766
Epoch 72/1000, Loss: 34.61259841918945
Epoch 82/1000, Loss: 34.61189651489258
Epoch 92/1000, Loss: 34.61252212524414
Epoch 102/1000, Loss: 34.61070251464844
Epoch 112/1000, Loss: 34.60990524291992
Epoch 122/1000, Loss: 34.61140823364258
Epoch 132/1000, Loss: 34.6107063293457
Epoch 142/1000, Loss: 34.61091232299805
Epoch 152/1000, Loss: 34.609031677246094
Epoch 162/1000, Loss: 34.60935974121094
Epoch 172/1000, Loss: 34.60832595825195
Epoch 182/1000, Loss: 34.60786437988281
Epoch 192/1000, Loss: 34.60796356201172
Epoch 202/1000, Loss: 34.60761642456055
Epoch 212/1000, Loss: 34.60805130004883
Epoch 222/1000, Loss: 34.60633850097656
Epoch 232/1000, Loss: 34.60551071166992
Epoch 242/1000, Loss: 34.60599899291992
Epoch 252/1000, Loss: 34.606834411621094
Epoch 262/1000, Loss: 34.604793548583984
Epoch 272/1000, Loss: 34.604347229003906
Epoch 282/1000, Loss: 34.603111267089844
Epoch 292/1000, Loss: 34.60285568237305
Epoch 302/1000, Loss: 34.602264404296875
Epoch 312/1000, Loss: 34.601409912109375
Epoch 322/1000, Loss: 34.600948333740234
Epoch 332/1000, Loss: 34.60084915161133
Epoch 342/1000, Loss: 34.602237701416016
Epoch 352/1000, Loss: 34.59991455078125
Epoch 362/1000, Loss: 34.60015106201172
Epoch 372/1000, Loss: 34.59867858886719
Epoch 382/1000, Loss: 34.59828567504883
Epoch 392/1000, Loss: 34.59810256958008
Epoch 402/1000, Loss: 34.59739303588867
Epoch 412/1000, Loss: 34.59801483154297
Epoch 422/1000, Loss: 34.59721374511719
Epoch 432/1000, Loss: 34.597496032714844
Epoch 442/1000, Loss: 34.59624099731445
Epoch 452/1000, Loss: 34.595703125
Epoch 462/1000, Loss: 34.59564971923828
Epoch 472/1000, Loss: 34.595947265625
Epoch 482/1000, Loss: 34.59563446044922
Epoch 492/1000, Loss: 34.59565734863281
Epoch 502/1000, Loss: 34.59561538696289
Epoch 512/1000, Loss: 34.594581604003906
Epoch 522/1000, Loss: 34.59492492675781
Epoch 532/1000, Loss: 34.59452438354492
Epoch 542/1000, Loss: 34.59488296508789
Epoch 552/1000, Loss: 34.59366226196289
Epoch 562/1000, Loss: 34.59366989135742
Epoch 572/1000, Loss: 34.593658447265625
Epoch 582/1000, Loss: 34.593902587890625
Epoch 592/1000, Loss: 34.59398651123047
Epoch 602/1000, Loss: 34.59341812133789
Epoch 612/1000, Loss: 34.594024658203125
Epoch 622/1000, Loss: 34.5934944152832
Epoch 632/1000, Loss: 34.5926399230957
Epoch 642/1000, Loss: 34.5951042175293
Epoch 652/1000, Loss: 34.593170166015625
Epoch 662/1000, Loss: 34.59335708618164
Epoch 672/1000, Loss: 34.59296798706055
Epoch 682/1000, Loss: 34.59357452392578
Epoch 692/1000, Loss: 34.593448638916016
Epoch 702/1000, Loss: 34.5923957824707
Epoch 712/1000, Loss: 34.592681884765625
Epoch 722/1000, Loss: 34.59284210205078
Epoch 732/1000, Loss: 34.592811584472656
Epoch 742/1000, Loss: 34.59242630004883
Epoch 752/1000, Loss: 34.59189224243164
Epoch 762/1000, Loss: 34.59220504760742
Epoch 772/1000, Loss: 34.59148025512695
Epoch 782/1000, Loss: 34.59242248535156
Epoch 792/1000, Loss: 34.591590881347656
Epoch 802/1000, Loss: 34.59206008911133
Epoch 812/1000, Loss: 34.5922737121582
Epoch 822/1000, Loss: 34.59100341796875
Epoch 832/1000, Loss: 34.590911865234375
Epoch 842/1000, Loss: 34.59068298339844
Training complete!
Embedding is saved at ./embedding.tsv
[11]:
# Save the embeddings of the trained SpaceFlow model to adata.obsm['SpaceFlow'].
adata.obsm['SpaceFlow'] = embedding
[12]:
# Calculate the nearest neighbors in the 'SpaceFlow' representation and compute the UMAP embedding.
sc.pp.neighbors(adata, use_rep='SpaceFlow')
sc.tl.umap(adata)
[13]:
# Perform a Leiden clustering.
sc.tl.leiden(adata, resolution=0.04, key_added='leiden')
[14]:
# Plot a UMAP embedding.
sc.pl.umap(adata, color= 'leiden', color_map= cmp_pspace)
_images/Generalizability_to_more_spatial_omics_data_SpatialClustering_SpatialMetabolomicsData_21_0.png
[15]:
# Display a spatial embedding plot with clustering information.
ax = sc.pl.embedding(adata, basis= 'spatial', color='leiden', show= False, color_map=cmp_pspace)
ax.axis('equal')
[15]:
(-14.950000000000001, 335.95, -8.950000000000001, 209.95)
_images/Generalizability_to_more_spatial_omics_data_SpatialClustering_SpatialMetabolomicsData_22_1.png

SpatialDataIntegration_SpatialProteomicsData

This tutorial demonstrates how to perform spatial data integration on spatial proteomics data using Pysodb and STAGATE+Harmony.

The reference paper can be found at https://www.nature.com/articles/s41467-022-29439-6 (STAGATE), https://www.nature.com/articles/s41592-019-0619-0 (Harmony) and https://www.cell.com/fulltext/S0092-8674(18)31100-0 (spatial proteomics data).

Import packages and set configurations

[1]:
# Use the Python warnings module to filter and ignore any warnings that may occur in the program after this point.
import warnings
warnings.filterwarnings("ignore")
[2]:
# Import several Python packages commonly used in data analysis and visualization:
# pandas (imported as pd) is a package for data manipulation and analysis
import pandas as pd
# scanpy (imported as sc) is a package for single-cell RNA sequencing analysis
import scanpy as sc
# matplotlib.pyplot (imported as plt) is a package for data visualization
import matplotlib.pyplot as plt
[3]:
# Import a STAGATE_pyG module
import STAGATE_pyG as STAGATE

If users encounter the error “No module named ‘STAGATE_pyG’” when trying to import the STAGATE_pyG package, first ensure that the “STAGATE_pyG” folder is located in the current script’s directory.
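Alternatively, if the folder is kept elsewhere, its parent directory can be added to Python’s module search path before the import; the path below is a placeholder, not one used in this tutorial.

[ ]:
import sys

# Placeholder path: point this at the directory that contains the STAGATE_pyG folder.
sys.path.append('/path/to/parent_of_STAGATE_pyG')

import STAGATE_pyG as STAGATE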

[4]:
# Import the palettable package
import palettable
# Create two qualitative color lists: one for cluster labels and one for sample labels.
cmp_old = palettable.cartocolors.qualitative.Bold_10.mpl_colors
cmp_old_biotech = palettable.cartocolors.qualitative.Safe_4.mpl_colors

Streamline development of loading spatial data with Pysodb

[5]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB databases.
# SODB is a format used to store data in memory-mapped files for efficient access and querying.
# This package allows users to interact with SODB files using Python.
import pysodb
[6]:
# Initialization
sodb = pysodb.SODB()
[7]:
# Define names of dataset_name and experiment_name
dataset_name = 'keren2018a'
experiment_name = 'p4'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name,experiment_name)
load experiment[p4] in dataset[keren2018a]
[8]:
# Create a dictionary named adata_list
adata_list = {}
[9]:
# Append '_p4' to the observation names and save a copy in the dictionary under the key 'p4'.
adata.obs_names = [x+'_p4' for x in adata.obs_names]
adata_list['p4'] = adata.copy()
[10]:
# Define names of another dataset_name and experiment_name
dataset_name = 'keren2018a'
experiment_name = 'p9'
# Load another specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name,experiment_name)
load experiment[p9] in dataset[keren2018a]
[11]:
# Append '_p9' to the observation names and save a copy in the dictionary under the key 'p9'.
adata.obs_names = [x+'_p9' for x in adata.obs_names]
adata_list['p9'] = adata.copy()

Running STAGATE for training

[12]:
# Use "STAGATE_pyG.Cal_Spatial_Net" to calculate a spatial graph with a radius cutoff of 50 for adata_list['p4']
STAGATE.Cal_Spatial_Net(adata_list['p4'], rad_cutoff=50)
# Use "STAGATE_pyG.Stats_Spatial_Net" to summarize cells and edges information for adata_list['p4']
STAGATE.Stats_Spatial_Net(adata_list['p4'])
------Calculating spatial graph...
The graph contains 84154 edges, 6643 cells.
12.6681 neighbors per cell on average.
_images/Generalizability_to_more_spatial_omics_data_SpatialDataIntegration_SpatialProteomicsData_17_1.png
[13]:
# Use "STAGATE_pyG.Cal_Spatial_Net" to calculate a spatial graph with a radius cutoff of 50 for adata_list['p9']
STAGATE.Cal_Spatial_Net(adata_list['p9'], rad_cutoff=50)
# Use "STAGATE_pyG.Stats_Spatial_Net" to summarize cells and edges information for adata_list['p9']
STAGATE.Stats_Spatial_Net(adata_list['p9'])
------Calculating spatial graph...
The graph contains 76056 edges, 6139 cells.
12.3890 neighbors per cell on average.
_images/Generalizability_to_more_spatial_omics_data_SpatialDataIntegration_SpatialProteomicsData_18_1.png
[14]:
# Train the STAGATE model on each individual sample in the adata_list
for section_id in ['p4', 'p9']:
    adata_list[section_id] = STAGATE.train_STAGATE(adata_list[section_id],n_epochs=500)
Size of Input:  (6643, 36)
100%|██████████| 500/500 [00:03<00:00, 125.55it/s]
Size of Input:  (6139, 36)
100%|██████████| 500/500 [00:03<00:00, 162.64it/s]
[15]:
# Concatenate 'p4' and 'p9' into a new AnnData object named 'adata'
adata = sc.concat([adata_list['p4'], adata_list['p9']], keys=None)
[16]:
# Calculate neighbors in the 'STAGATE' representation, apply UMAP, and perform Leiden clustering
sc.pp.neighbors(adata, use_rep='STAGATE')
sc.tl.umap(adata)
sc.tl.leiden(adata,resolution=0.08)
[17]:
# Save UMAP and Leiden clustering results before integration
adata.obsm['UMAP_before'] = adata.obsm['X_umap']
adata.obs['leiden_before'] = adata.obs['leiden']
[18]:
# Delete the STAGATE embedding from each individual sample
del adata.obsm['STAGATE']
[19]:
# Concatenate the two 'Spatial_Net' tables from 'p4' and 'p9'
adata.uns['Spatial_Net'] = pd.concat([adata_list['p4'].uns['Spatial_Net'], adata_list['p9'].uns['Spatial_Net']])
[20]:
# Use "STAGATE_pyG.Stats_Spatial_Net" to summarize cells and edges information for whole adata
STAGATE.Stats_Spatial_Net(adata)
_images/Generalizability_to_more_spatial_omics_data_SpatialDataIntegration_SpatialProteomicsData_25_0.png
[21]:
# Train the STAGATE model on the whole samples
adata = STAGATE.train_STAGATE(adata, n_epochs=500)
Size of Input:  (12782, 36)
 11%|█         | 54/500 [00:00<00:05, 86.13it/s]100%|██████████| 500/500 [00:05<00:00, 85.97it/s]
[22]:
# Create a new column 'Sample' by splitting each name and selecting the last element
adata.obs['Sample'] = [x.split('_')[-1] for x in adata.obs_names]
[23]:
# Plot a UMAP projection across different samples before integration
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.embedding(adata, basis= 'UMAP_before', color='Sample', title='Unintegrated',show=False,palette=cmp_old_biotech)
[23]:
<Axes: title={'center': 'Unintegrated'}, xlabel='UMAP_before1', ylabel='UMAP_before2'>
_images/Generalizability_to_more_spatial_omics_data_SpatialDataIntegration_SpatialProteomicsData_28_1.png
[24]:
# Generate a plot of the UMAP embedding colored by leiden before integration
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.embedding(adata, basis= 'UMAP_before', color='leiden_before',show=False,palette=cmp_old)
[24]:
<Axes: title={'center': 'leiden_before'}, xlabel='UMAP_before1', ylabel='UMAP_before2'>
_images/Generalizability_to_more_spatial_omics_data_SpatialDataIntegration_SpatialProteomicsData_29_1.png
[25]:
# Display spatial distribution of cells colored by leiden clustering for two samples ('p4' and 'p9')
fig, axs = plt.subplots(1, 2, figsize=(6, 3))
it=0
for temp_tech in ['p4', 'p9']:
    temp_adata = adata[adata.obs['Sample']==temp_tech, ]
    if it == 1:
        ax = sc.pl.embedding(temp_adata, basis="spatial", color="leiden_before",s=6, ax=axs[it],
                        show=False, title=temp_tech)
        ax.axis('equal')
    else:
        ax = sc.pl.embedding(temp_adata, basis="spatial", color="leiden_before",s=6, ax=axs[it], legend_loc=None,
                        show=False, title=temp_tech)
        ax.axis('equal')
    it+=1
_images/Generalizability_to_more_spatial_omics_data_SpatialDataIntegration_SpatialProteomicsData_30_0.png

Perform Harmony for spatial data integration

Harmony is an algorithm for integrating multiple high-dimensional datasets. Reference implementations and documentation are available at https://github.com/slowkow/harmonypy and https://pypi.org/project/harmonypy/
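As an aside, harmonypy exposes the corrected embedding as ho.Z_corr with shape (dimensions x cells), so a more compact variant of cells [26] to [32] below would store the transposed matrix directly in adata.obsm instead of building a separate AnnData object. The sketch below assumes adata already carries obsm['STAGATE'] and obs['Sample'] as computed above; the key name 'STAGATE_Harmony' is an arbitrary choice.

[ ]:
import numpy as np
import scanpy as sc
import harmonypy as hm

# Run Harmony on the STAGATE embedding, correcting for the 'Sample' covariate.
ho = hm.run_harmony(adata.obsm['STAGATE'], adata.obs, ['Sample'])

# ho.Z_corr is (n_dims, n_cells); transpose so that rows align with adata.obs_names.
adata.obsm['STAGATE_Harmony'] = np.asarray(ho.Z_corr).T

# Downstream neighbors/UMAP can then use the corrected representation in place.
sc.pp.neighbors(adata, use_rep='STAGATE_Harmony')
sc.tl.umap(adata)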

[26]:
# Import harmonypy package
import harmonypy as hm
[27]:
# Use STAGATE representation to create 'meta_data' for harmony
data_mat = adata.obsm['STAGATE'].copy()
meta_data = adata.obs.copy()
[28]:
# Run harmony for STAGATE representation
ho = hm.run_harmony(data_mat, meta_data, ['Sample'])
2023-07-15 06:29:36,293 - harmonypy - INFO - Computing initial centroids with sklearn.KMeans...
2023-07-15 06:29:38,362 - harmonypy - INFO - sklearn.KMeans initialization complete.
2023-07-15 06:29:38,398 - harmonypy - INFO - Iteration 1 of 10
2023-07-15 06:29:40,274 - harmonypy - INFO - Iteration 2 of 10
2023-07-15 06:29:42,312 - harmonypy - INFO - Iteration 3 of 10
2023-07-15 06:29:44,386 - harmonypy - INFO - Iteration 4 of 10
2023-07-15 06:29:46,491 - harmonypy - INFO - Iteration 5 of 10
2023-07-15 06:29:48,491 - harmonypy - INFO - Iteration 6 of 10
2023-07-15 06:29:50,465 - harmonypy - INFO - Iteration 7 of 10
2023-07-15 06:29:52,434 - harmonypy - INFO - Iteration 8 of 10
2023-07-15 06:29:54,389 - harmonypy - INFO - Iteration 9 of 10
2023-07-15 06:29:56,365 - harmonypy - INFO - Iteration 10 of 10
2023-07-15 06:29:58,338 - harmonypy - INFO - Stopped before convergence
[29]:
# Put the Harmony-corrected embedding into a DataFrame whose columns are the cell names.
res = pd.DataFrame(ho.Z_corr)
res.columns = adata.obs_names
[30]:
# Create a new AnnData object adata_Harmony from the transpose of the res matrix
adata_Harmony = sc.AnnData(res.T)
[31]:
# Transfer the spatial coordinates and sample labels from adata to adata_Harmony, matched by cell name.
adata_Harmony.obsm['spatial'] = pd.DataFrame(adata.obsm['spatial'], index=adata.obs_names).loc[adata_Harmony.obs_names,].values
adata_Harmony.obs['Sample'] = adata.obs.loc[adata_Harmony.obs_names, 'Sample']
[32]:
# Calculate neighbors, apply UMAP, and perform Leiden clustering on the integrated data
sc.pp.neighbors(adata_Harmony)
sc.tl.umap(adata_Harmony)
sc.tl.leiden(adata_Harmony, resolution=0.08)
[33]:
# Save UMAP and Leiden clustering results after integration
adata.obsm['UMAP_after'] = adata_Harmony.obsm['X_umap']
adata.obs['leiden_after'] = adata_Harmony.obs['leiden']
[34]:
# Plot a UMAP projection across different samples after integration
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.embedding(adata, basis= 'UMAP_after', color='Sample', title='STAGATE + Harmony',show=False, palette=cmp_old_biotech)
[34]:
<Axes: title={'center': 'STAGATE + Harmony'}, xlabel='UMAP_after1', ylabel='UMAP_after2'>
_images/Generalizability_to_more_spatial_omics_data_SpatialDataIntegration_SpatialProteomicsData_41_1.png
[35]:
# Generate a plot of the UMAP embedding colored by leiden after integration
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.embedding(adata, basis= 'UMAP_after', color='leiden_after', show=False, palette=cmp_old)
[35]:
<Axes: title={'center': 'leiden_after'}, xlabel='UMAP_after1', ylabel='UMAP_after2'>
_images/Generalizability_to_more_spatial_omics_data_SpatialDataIntegration_SpatialProteomicsData_42_1.png
[36]:
# Display spatial distribution of cells colored by leiden clustering for two samples ('p4' and 'p9') after integration
fig, axs = plt.subplots(1, 2, figsize=(6, 3))
it=0
for temp_tech in ['p4', 'p9']:
    temp_adata = adata[adata.obs['Sample']==temp_tech, ]
    if it == 1:
        ax = sc.pl.embedding(temp_adata, basis="spatial", color="leiden_after",s=6, ax=axs[it],
                        show=False, title=temp_tech)
        ax.axis('equal')
    else:
        ax = sc.pl.embedding(temp_adata, basis="spatial", color="leiden_after",s=6, ax=axs[it], legend_loc=None,
                        show=False, title=temp_tech)
        ax.axis('equal')
    it+=1
_images/Generalizability_to_more_spatial_omics_data_SpatialDataIntegration_SpatialProteomicsData_43_0.png

SpatialDataIntegration_SpatialProteomicsData2

This tutorial demonstrates how to perform spatial data integration on a second spatial proteomics dataset (data2) using Pysodb and STAGATE+Harmony.

The reference paper can be found at https://www.nature.com/articles/s41467-022-29439-6 (STAGATE), https://www.nature.com/articles/s41592-019-0619-0 (Harmony) and https://www.science.org/doi/10.1126/science.aar7042 (spatial proteomics data2).

Import packages and set configurations

[1]:
# Use the Python warnings module to filter and ignore any warnings that may occur in the program after this point.
import warnings
warnings.filterwarnings("ignore")
[2]:
# Import several Python packages commonly used in data analysis and visualization:
# pandas (imported as pd) is a package for data manipulation and analysis
import pandas as pd
# scanpy (imported as sc) is a package for single-cell RNA sequencing analysis
import scanpy as sc
# matplotlib.pyplot (imported as plt) is a package for data visualization
import matplotlib.pyplot as plt
[3]:
# Import a STAGATE_pyG module
import STAGATE_pyG as STAGATE

If users encounter the error “No module named ‘STAGATE_pyG’” when trying to import the STAGATE_pyG package, first ensure that the “STAGATE_pyG” folder is located in the current script’s directory.

[4]:
# Import the palettable package
import palettable
# Create two qualitative color lists: one for cluster labels and one for sample labels.
cmp_old = palettable.cartocolors.qualitative.Bold_10.mpl_colors
cmp_old_biotech = palettable.cartocolors.qualitative.Safe_4.mpl_colors

Streamline development of loading spatial data with Pysodb

[5]:
# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB databases.
# SODB is a format used to store data in memory-mapped files for efficient access and querying.
# This package allows users to interact with SODB files using Python.
import pysodb
[6]:
# Initialization
sodb = pysodb.SODB()
[7]:
# Define a section_list with samples from different experiments
section_list = ['cell_129', 'cell_143', 'cell_140', 'cell_127']
[8]:
# Download experiments into adata_list using pysodb
dataset_name = 'gut2018multiplexed'
adata_list = {}
for section_id in section_list:
    temp_adata = sodb.load_experiment(dataset_name,section_id)
    temp_adata.var_names_make_unique()
    temp_adata.obs_names = [x+'_'+section_id for x in temp_adata.obs_names]

    adata_list[section_id] = temp_adata.copy()
load experiment[cell_129] in dataset[gut2018multiplexed]
load experiment[cell_143] in dataset[gut2018multiplexed]
load experiment[cell_140] in dataset[gut2018multiplexed]
load experiment[cell_127] in dataset[gut2018multiplexed]
[9]:
# Visualize the different experiments, colored by 'cluster'
fig, axs = plt.subplots(1, 4, figsize=(12, 3))
for it, section_id in enumerate(section_list):
    ax = sc.pl.embedding(adata_list[section_id], basis= 'spatial', ax=axs[it],
                  color=['cluster'], title=section_id, show=False)
    ax.axis('equal')
_images/Generalizability_to_more_spatial_omics_data_SpatialDataIntegration_SpatialProteomicsData2_13_0.png

Running STAGATE for training

[10]:
# Use "STAGATE_pyG.Cal_Spatial_Net" to calculate spatial graph for different samples
# And use "STAGATE_pyG.Stats_Spatial_Net" to summarize their cells and edges information
for section_id in section_list:
    STAGATE.Cal_Spatial_Net(adata_list[section_id], rad_cutoff=3)
    STAGATE.Stats_Spatial_Net(adata_list[section_id])
------Calculating spatial graph...
The graph contains 331176 edges, 15213 cells.
21.7693 neighbors per cell on average.
------Calculating spatial graph...
The graph contains 418616 edges, 19053 cells.
21.9711 neighbors per cell on average.
------Calculating spatial graph...
The graph contains 464924 edges, 21371 cells.
21.7549 neighbors per cell on average.
------Calculating spatial graph...
The graph contains 552110 edges, 25415 cells.
21.7238 neighbors per cell on average.
_images/Generalizability_to_more_spatial_omics_data_SpatialDataIntegration_SpatialProteomicsData2_15_1.png
_images/Generalizability_to_more_spatial_omics_data_SpatialDataIntegration_SpatialProteomicsData2_15_2.png
_images/Generalizability_to_more_spatial_omics_data_SpatialDataIntegration_SpatialProteomicsData2_15_3.png
_images/Generalizability_to_more_spatial_omics_data_SpatialDataIntegration_SpatialProteomicsData2_15_4.png
[11]:
# Train the STAGATE model on each individual sample in the adata_list
for section_id in section_list:
    adata_list[section_id] = STAGATE.train_STAGATE(adata_list[section_id],n_epochs= 1500)
Size of Input:  (15213, 43)
100%|██████████| 1500/1500 [00:34<00:00, 43.69it/s]
Size of Input:  (19053, 43)
100%|██████████| 1500/1500 [00:42<00:00, 35.49it/s]
Size of Input:  (21371, 43)
100%|██████████| 1500/1500 [00:47<00:00, 31.85it/s]
Size of Input:  (25415, 43)
100%|██████████| 1500/1500 [00:55<00:00, 26.92it/s]
[12]:
# Concatenate each individual sample in the adata_list into an AnnData object named 'adata_before'
adata_before = sc.concat([adata_list[x] for x in section_list], keys=None)
[13]:
# Calculate the nearest neighbors in the 'STAGATE' representation and compute the UMAP embedding.
sc.pp.neighbors(adata_before, use_rep='STAGATE')
sc.tl.umap(adata_before)
[14]:
# Use mclust_R to cluster cells in the 'STAGATE' representation into 10 clusters.
adata_before = STAGATE.mclust_R(adata_before, used_obsm='STAGATE', num_cluster=10)
adata_before.obs['mclust10_before'] = adata_before.obs['mclust']
R[write to console]:                    __           __
   ____ ___  _____/ /_  _______/ /_
  / __ `__ \/ ___/ / / / / ___/ __/
 / / / / / / /__/ / /_/ (__  ) /_
/_/ /_/ /_/\___/_/\__,_/____/\__/   version 6.0.0
Type 'citation("mclust")' for citing this R package in publications.

fitting ...
  |======================================================================| 100%
[15]:
# Concatenate each individual sample in the adata_list into another new AnnData object named 'adata'
adata = sc.concat([adata_list[x] for x in section_list], keys=None)
[16]:
# Save UMAP and mclust clustering results before integration
adata.obsm['UMAP_before'] = adata_before.obsm['X_umap']
adata.obs['mclust10_before'] = adata_before.obs['mclust10_before']
[17]:
# Delete the STAGATE embedding from each individual sample
del adata.obsm['STAGATE']
[18]:
# Concatenate all 'Spatial_Net'
adata.uns['Spatial_Net'] = pd.concat([adata_list[x].uns['Spatial_Net'] for x in section_list])
[19]:
# Use "STAGATE_pyG.Stats_Spatial_Net" to summarize cells and edges information for whole adata
STAGATE.Stats_Spatial_Net(adata)
_images/Generalizability_to_more_spatial_omics_data_SpatialDataIntegration_SpatialProteomicsData2_24_0.png
[20]:
# Train the STAGATE model on the whole samples
adata = STAGATE.train_STAGATE(adata, n_epochs= 1500)
Size of Input:  (81052, 43)
100%|██████████| 1500/1500 [02:57<00:00,  8.44it/s]
[21]:
# Create a new column 'Sample' by splitting each name and selecting the last two elements
adata.obs['Sample'] = [x.split('_')[-2] + '_' + x.split('_')[-1] for x in adata.obs_names]
[22]:
adata.obs['Sample']
[22]:
747877_cell_129    cell_129
708635_cell_129    cell_129
728396_cell_129    cell_129
784709_cell_129    cell_129
768447_cell_129    cell_129
                     ...
814036_cell_127    cell_127
728906_cell_127    cell_127
657777_cell_127    cell_127
702680_cell_127    cell_127
778308_cell_127    cell_127
Name: Sample, Length: 81052, dtype: object
[23]:
# Plot a UMAP projection before integration
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.embedding(adata, basis= 'UMAP_before', color='Sample', title='Unintegrated',show=False, palette=cmp_old_biotech)
[23]:
<Axes: title={'center': 'Unintegrated'}, xlabel='UMAP_before1', ylabel='UMAP_before2'>
_images/Generalizability_to_more_spatial_omics_data_SpatialDataIntegration_SpatialProteomicsData2_28_1.png
[24]:
# Generate a plot of the UMAP embedding colored by mclust before integration
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.embedding(adata, basis= 'UMAP_before', color='mclust10_before', show=False, palette=cmp_old)
[24]:
<Axes: title={'center': 'mclust10_before'}, xlabel='UMAP_before1', ylabel='UMAP_before2'>
_images/Generalizability_to_more_spatial_omics_data_SpatialDataIntegration_SpatialProteomicsData2_29_1.png
[25]:
# Display spatial distribution of cells colored by mclust clustering for four samples
fig, axs = plt.subplots(1, 4, figsize=(12, 3))
it=0
for section_id in section_list:
    ax = sc.pl.embedding(adata[adata.obs['Sample']==section_id], basis= 'spatial', ax=axs[it],
                      color=['mclust10_before'], title=section_id, show=False)
    ax.axis('equal')
    it+=1
_images/Generalizability_to_more_spatial_omics_data_SpatialDataIntegration_SpatialProteomicsData2_30_0.png

Perform Harmony for spatial data integration

Harmony is an algorithm for integrating multiple high-dimensional datasets. Reference implementations and documentation are available at https://github.com/slowkow/harmonypy and https://pypi.org/project/harmonypy/

[26]:
# Import harmonypy package
import harmonypy as hm
[27]:
# Use STAGATE representation to create 'meta_data' for harmony
data_mat = adata.obsm['STAGATE'].copy()
meta_data = adata.obs.copy()
[28]:
# Run harmony for STAGATE representation
ho = hm.run_harmony(data_mat, meta_data, ['Sample'])
2023-07-15 09:15:02,591 - harmonypy - INFO - Computing initial centroids with sklearn.KMeans...
2023-07-15 09:15:11,568 - harmonypy - INFO - sklearn.KMeans initialization complete.
2023-07-15 09:15:11,799 - harmonypy - INFO - Iteration 1 of 10
2023-07-15 09:15:28,517 - harmonypy - INFO - Iteration 2 of 10
2023-07-15 09:15:45,385 - harmonypy - INFO - Iteration 3 of 10
2023-07-15 09:16:02,717 - harmonypy - INFO - Iteration 4 of 10
2023-07-15 09:16:19,980 - harmonypy - INFO - Iteration 5 of 10
2023-07-15 09:16:31,743 - harmonypy - INFO - Iteration 6 of 10
2023-07-15 09:16:41,966 - harmonypy - INFO - Iteration 7 of 10
2023-07-15 09:16:48,974 - harmonypy - INFO - Iteration 8 of 10
2023-07-15 09:16:55,928 - harmonypy - INFO - Iteration 9 of 10
2023-07-15 09:17:04,016 - harmonypy - INFO - Converged after 9 iterations
[29]:
# Put the Harmony-corrected embedding into a DataFrame whose columns are the cell names.
res = pd.DataFrame(ho.Z_corr)
res.columns = adata.obs_names
[30]:
# Create a new AnnData object adata_Harmony from the transpose of the res matrix
adata_Harmony = sc.AnnData(res.T)
[31]:
# Transfer the spatial coordinates and sample labels from adata to adata_Harmony, matched by cell name.
adata_Harmony.obsm['spatial'] = pd.DataFrame(adata.obsm['spatial'], index=adata.obs_names).loc[adata_Harmony.obs_names,].values
adata_Harmony.obs['Sample'] = adata.obs.loc[adata_Harmony.obs_names, 'Sample']
[32]:
# Calculate the nearest neighbors in the Harmony-corrected representation and compute the UMAP embedding for the integrated data
sc.pp.neighbors(adata_Harmony)
sc.tl.umap(adata_Harmony)
[33]:
# Use mclust_R to cluster cells in the 'Harmony' representation into 4 clusters.
adata_Harmony.obsm['Harmony'] = adata_Harmony.X
adata_Harmony = STAGATE.mclust_R(adata_Harmony, used_obsm='Harmony', num_cluster=4)
adata_Harmony.obs['mclust4_after'] = adata_Harmony.obs['mclust']
fitting ...
  |======================================================================| 100%
[34]:
# Save UMAP and mclust clustering results after integration
adata.obsm['UMAP_after'] = adata_Harmony.obsm['X_umap']
adata.obs['mclust4_after'] = adata_Harmony.obs['mclust4_after']
[35]:
# Plot a UMAP projection across different samples after integration
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.embedding(adata, basis= 'UMAP_after', color='Sample', title='STAGATE + Harmony',show=False, palette=cmp_old_biotech)
[35]:
<Axes: title={'center': 'STAGATE + Harmony'}, xlabel='UMAP_after1', ylabel='UMAP_after2'>
_images/Generalizability_to_more_spatial_omics_data_SpatialDataIntegration_SpatialProteomicsData2_42_1.png
[36]:
# Generate a plot of the UMAP embedding colored by mclust after integration
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.embedding(adata, basis= 'UMAP_after', color='mclust4_after', show=False, palette=cmp_old)
[36]:
<Axes: title={'center': 'mclust4_after'}, xlabel='UMAP_after1', ylabel='UMAP_after2'>
_images/Generalizability_to_more_spatial_omics_data_SpatialDataIntegration_SpatialProteomicsData2_43_1.png
[37]:
# Display spatial distribution of cells colored by mclust clustering for four samples after integration
fig, axs = plt.subplots(1, 4, figsize=(12, 3))
it=0
for section_id in section_list:
    ax = sc.pl.embedding(adata[adata.obs['Sample']==section_id], basis= 'spatial', ax=axs[it],
                      color=['mclust4_after'], title=section_id, show=False)
    ax.axis('equal')
    it+=1
_images/Generalizability_to_more_spatial_omics_data_SpatialDataIntegration_SpatialProteomicsData2_44_0.png