SpatiallyVariableGeneDetection_SpatialProteomicsData
This tutorial demonstrates spatially variable gene detection on spatial proteomics data using Pysodb and Sepal.
Reference papers are available at https://academic.oup.com/bioinformatics/article/37/17/2644/6168120 and https://www.cell.com/fulltext/S0092-8674(18)31100-0.
Import packages and set configurations
[1]:
# Numpy is a package for numerical computing with arrays
import numpy as np
[2]:
# Import sepal package and its modules
import sepal.datasets as d
import sepal.models as m
import sepal.utils as ut
Streamline development of loading spatial data with Pysodb
[3]:
# Import pysodb package
# Pysodb is a Python package for loading datasets from SODB (Spatial Omics DataBase).
# Experiments are returned as AnnData objects, ready for downstream analysis in Python.
import pysodb
[4]:
# Initialization
sodb = pysodb.SODB()
[5]:
# Define the dataset name and the experiment name
dataset_name = 'keren2018a'
experiment_name = 'p9'
# Load a specific experiment
# load_experiment takes two arguments: the name of the dataset and the name of the experiment to load.
# Valid values for both arguments are listed at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name, experiment_name)
load experiment[p9] in dataset[keren2018a]
[6]:
# Save the AnnData object to an H5AD file format.
adata.write_h5ad('keren2018a_p9.h5ad')
Run Sepal to detect spatially variable genes in spatial proteomics data
[7]:
# Load in the raw data using a RawData class.
raw_data = d.RawData('keren2018a_p9.h5ad')
[8]:
raw_data
[8]:
RawData object
> loaded from keren2018a_p9.h5ad
> using pixel coordinates
[9]:
# UnstructuredData is a subclass of CountData that holds data from non-Visium and non-ST arrays.
data = m.UnstructuredData(raw_data,
                          eps = 0.1)
[10]:
# propagate normalizes the count data and then propagates it in time to measure the diffusion time of each profile.
# Set scale = True to apply min-max scaling to the diffusion times.
times = m.propagate(data,
                    normalize = True,
                    scale = True)
[INFO] : Using 128 workers
[INFO] : Saturated Spots : 5806
/home/linsenlin/PROTOCOLS_SODB/Spatially variable gene/sepal/sepal/utils.py:80: RuntimeWarning: invalid value encountered in log2
return np.log2(x + c)
100%|██████████| 36/36 [00:00<00:00, 2841.88it/s]
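To make the effect of `scale = True` concrete, here is a minimal sketch of min-max scaling in plain NumPy. It is not Sepal's own implementation; the helper `minmax_scale` and the raw values are hypothetical and only illustrate why the highest-ranked profile ends up with a score of exactly 1.

```python
import numpy as np

def minmax_scale(values):
    """Linearly rescale values so the minimum maps to 0 and the maximum to 1."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # All values are equal: return zeros instead of dividing by zero
        return np.zeros_like(values)
    return (values - lo) / (hi - lo)

# Hypothetical raw diffusion times for four profiles (illustrative values only)
raw_times = np.array([12.0, 30.0, 48.0, 21.0])
scaled = minmax_scale(raw_times)
print(scaled)
```

After scaling, the scores are relative to the profiles in this experiment, so they should not be compared directly across experiments.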
[11]:
# Select the top 10 and bottom 10 profiles based on their diffusion times
# Set the number of top and bottom profiles to select
n_top = 10
# Compute the indices that sort the diffusion times in ascending order
sorted_indices = np.argsort(times.values.flatten())
# Reverse the sorted indices to obtain descending order
sorted_indices = sorted_indices[::-1]
# Retrieve the profile names corresponding to the sorted indices
sorted_profiles = times.index.values[sorted_indices]
# Select the 10 profile names with the highest diffusion times
top_profiles = sorted_profiles[0:n_top]
# Select the 10 profile names with the lowest diffusion times
tail_profiles = sorted_profiles[-n_top:]
# Display the diffusion times of the top 10 profiles
times.loc[top_profiles,:]
[11]:
| | average |
|---|---|
| Vimentin | 1.000000 |
| Beta catenin | 0.746855 |
| CD45 | 0.716981 |
| CD45RO | 0.646226 |
| H3K9ac | 0.632075 |
| CD16 | 0.566038 |
| phospho-S6 | 0.531447 |
| CD11b | 0.523585 |
| CD11c | 0.503145 |
| CD68 | 0.484277 |
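The ranking logic in cell 11 can be checked on a small array. The sketch below reuses four of the scores from the table above, deliberately shuffled, and a smaller `n_top`; the variable names `names`, `scores`, `ranked`, `top` and `tail` are illustrative and not part of Sepal.

```python
import numpy as np

# Four scaled diffusion times taken from the table above, in shuffled order
names = np.array(["CD45", "Vimentin", "CD68", "Beta catenin"])
scores = np.array([0.716981, 1.000000, 0.484277, 0.746855])

n_top = 2
# argsort sorts ascending; reversing gives indices from highest to lowest score
order = np.argsort(scores)[::-1]
ranked = names[order]

top = ranked[:n_top]     # profiles with the highest diffusion times
tail = ranked[-n_top:]   # profiles with the lowest diffusion times
print(top, tail)
```

Assuming `times` is a pandas DataFrame with a single `average` column, as the table suggests, `times.sort_values("average", ascending=False)` should give the same ordering in one call.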
[12]:
# Inspect the detection visually by using the plot_profiles function on the top 10 profiles
# Define a custom pltargs dictionary with plot style options
pltargs = dict(s = 15,
               cmap = "magma",
               edgecolor = 'none',
               marker = 'H',
               )
# Plot the profiles
fig,ax = ut.plot_profiles(cnt = data.cnt.loc[:,top_profiles],
                          crd = data.real_crd,
                          rank_values = times.loc[top_profiles,:].values.flatten(),
                          pltargs = pltargs,
                          )
[13]:
# Inspect the detection visually by using the plot_profiles function on the bottom 10 profiles
# Define a custom pltargs dictionary with plot style options
pltargs = dict(s = 15,
               cmap = "magma",
               edgecolor = 'none',
               marker = 'H',
               )
# Plot the profiles
fig,ax = ut.plot_profiles(cnt = data.cnt.loc[:,tail_profiles],
                          crd = data.real_crd,
                          rank_values = times.loc[tail_profiles,:].values.flatten(),
                          pltargs = pltargs,
                          )
/home/linsenlin/PROTOCOLS_SODB/Spatially variable gene/sepal/sepal/utils.py:80: RuntimeWarning: invalid value encountered in log2
return np.log2(x + c)
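The `RuntimeWarning: invalid value encountered in log2` shown in the outputs is the warning NumPy emits when `log2` receives a negative argument, which yields NaN. One possible cause here, not verified against this dataset, is that some intensities are negative enough that `x + c` drops below zero. The sketch below reproduces the behaviour on toy values; the pseudocount `c = 2.0` is a hypothetical choice and not necessarily the value Sepal uses.

```python
import numpy as np

def log2_transform(x, c=2.0):
    # Same expression as in the warning message; the default c is a hypothetical value
    return np.log2(x + c)

# Toy intensities: the last one makes x + c negative
x = np.array([6.0, 0.0, -3.0])

# Silence the warning locally so the NaN entries can be counted explicitly
with np.errstate(invalid="ignore"):
    out = log2_transform(x)

n_invalid = int(np.isnan(out).sum())
print(out, n_invalid)
```

Counting NaN entries this way on the real count matrix is a quick check of how many values are affected before interpreting the plots.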