Application with new data

This tutorial demonstrates spatially variable gene detection on 10X Visium mouse brain data using Pysodb and Sepal.

A reference paper can be found at https://academic.oup.com/bioinformatics/article/37/17/2644/6168120.

This tutorial refers to the following tutorial at https://github.com/almaan/sepal/blob/master/examples/melanoma.ipynb. At the same time, the way of loadding data is modified by using Pysodb.

Import packages and set configurations

[1]:

# Import several Python packages commonly used in data analysis and visualization.
# numpy (imported as np) is a package for numerical computing with arrays
import numpy as np
# pandas (imported as pd) is a package for data manipulation and analysis
import pandas as pd
# matplotlib.pyplot (imported as plt) is a package for data visualization
import matplotlib.pyplot as plt
%load_ext autoreload
%autoreload 2

[2]:

# Import sepal package and its modules
import sepal
import sepal.datasets as d
import sepal.models as m
import sepal.utils as ut
import sepal.family as family
import sepal.enrich as fea

Streamline development of loading spatial data with Pysodb

[3]:

# Import pysodb package
# Pysodb is a Python package that provides a set of tools for working with SODB databases.
# SODB is a format used to store data in memory-mapped files for efficient access and querying.
# This package allows users to interact with SODB files using Python.
import pysodb

[4]:

# Initialization
sodb = pysodb.SODB()

[5]:

# Define names of the dataset_name and experiment_name
dataset_name = '10x'
experiment_name = 'V1_Mouse_Brain_Sagittal_Posterior_filtered_feature_bc_matrix'
# Load a specific experiment
# It takes two arguments: the name of the dataset and the name of the experiment to load.
# Two arguments are available at https://gene.ai.tencent.com/SpatialOmics/.
adata = sodb.load_experiment(dataset_name,experiment_name)

load experiment[V1_Mouse_Brain_Sagittal_Posterior_filtered_feature_bc_matrix] in dataset[10x]

[6]:

# Save the AnnData object to an H5AD file format.
adata.write_h5ad('Visium_pysodb.h5ad')

Perform Sepal for spatially variable gene detection

[7]:

# Load in the raw data using a RawData class.
raw_data = d.RawData('Visium_pysodb.h5ad')

[8]:

# Filter genes observed in less than 5 spots and/or less than 10 total observations
raw_data.cnt = ut.filter_genes(raw_data.cnt,
                               min_expr=10,
                               min_occur=5)

[9]:

# A subclass of CountData class to hold Visium array based data using a VisiumData class
data = m.VisiumData(raw_data,
              eps = 0.1)

[10]:

# A propagate class is employ to normalize count data and then propagate it in time, to measure the diffusion time.
# Set scale = True to perform
# Minmax scaling of the diffusion times
times = m.propagate(data,
                    normalize = True,
                    scale =True)

[INFO] : Using 128 workers
[INFO] : Saturated Spots : 3095

100%|██████████| 16278/16278 [01:09<00:00, 234.29it/s]

[11]:

# Selects the top 20 and bottom 20 profiles based on their diffusion times
# Set the number of top and bottom profiles to be selected as 20
n_top = 20
# Computes the indices that would sort the times DataFrame in ascending order
sorted_indices = np.argsort(times.values.flatten())
# Reverses the order of the sorted indices to obtain a descending order
sorted_indices = sorted_indices[::-1]
# Retrieves the profile names corresponding to the sorted indices
sorted_profiles = times.index.values[sorted_indices]
# Select the top 20 profile names with the highest diffusion times
top_profiles = sorted_profiles[0:n_top]
# Selects the bottom 20 profile names with the lowest diffusion times
tail_profiles = sorted_profiles[-n_top:]
# Retrieves the top 20 profiles from the times DataFrame
times.loc[top_profiles,:]

[11]:

	average
Calb1	1.000000
Prkcg	0.960986
Gfap	0.946612
Apod	0.919918
Hpca	0.919918
Sst	0.915811
Car8	0.899384
Itpka	0.893224
Mgp	0.891170
Itpr1	0.887064
Dner	0.874743
Hbb-bt	0.872690
Gad1	0.862423
Igfbp2	0.856263
Vim	0.850103
Pcp4	0.841889
Nefm	0.835729
Gria1	0.833676
Fam107a	0.825462
Hba-a2	0.821355

[12]:

# Inspect detecition visually by using the "plot_profiles function for first 20 SVG
# Define a custom pltargs dictionary with plot style options
pltargs = dict(s = 10,
                cmap = "magma",
                edgecolor = 'none',
                marker = 'H',
                )

# plot the profiles
fig,ax = ut.plot_profiles(cnt = data.cnt.loc[:,top_profiles],
                          crd = data.real_crd,
                          rank_values = times.loc[top_profiles,:].values.flatten(),
                          pltargs = pltargs,
                         )

../_images/Spatially_variable_gene_detection_Application_with_new_data_16_0.png

[13]:

# Inspect detecition visually by using the "plot_profiles function for last 20 SVG
# Define a custom pltargs dictionary with plot style options
pltargs = dict(s = 10,
                cmap = "magma",
                edgecolor = 'none',
                marker = 'H',
                )

# plot the profiles
fig,ax = ut.plot_profiles(cnt = data.cnt.loc[:,tail_profiles],
                          crd = data.real_crd,
                          rank_values = times.loc[tail_profiles,:].values.flatten(),
                          pltargs = pltargs,
                         )

../_images/Spatially_variable_gene_detection_Application_with_new_data_17_0.png