当前位置: 首页 > article >正文

cellphoneDB进行CCI以及可视化

除了cellchat,在单细胞转录组或者空间组的分析中,cellphoneDB也是一个常用的细胞通讯软件,这个数据库更注重配受体关系,对于有明确先验知识的配受体研究比较友好。

但值得注意的是,它的数据库只包括人的基因名称信息,所以在开始前,可以参考biomaRt进行同源基因转换-CSDN博客转换物种的同源基因名称

导入环境

import numpy as np
import pandas as pd
import scanpy as sc
import os
import sys
from scipy import sparse

from cellphonedb.src.core.methods import cpdb_statistical_analysis_method
import os
import anndata as ad
import pandas as pd
import ktplotspy as kpy
import matplotlib.pyplot as plt
%matplotlib inline

ktplotspy是利用cellphoneDB结果进行后续可视化的一个R包,安装可见GitHub - zktuong/ktplotspy: Python library for plotting Cellphonedb results. Ported from ktplots R package.

同源基因转换

adata = sc.read('/data/work/your_scRNA.h5ad')

gene_names = adata.var_names.tolist()
gene_names_df = pd.DataFrame(gene_names, columns=['gene_name'])
gene_names_df.to_csv('adata_gene_names.csv', index=False)

#biomart转换后得到新的csv

replacement_df = pd.read_csv('mouse_homologous_gene_conversion.csv')
replacement_df.columns = ['Gene.name', 'Gene.name.1','Chromosome.scaffold.name']
gene_mapping = dict(zip(replacement_df['Gene.name'], replacement_df['Gene.name.1']))

new_gene_names = [gene_mapping.get(gene, None) for gene in gene_names]

# 仅保留有对应人类基因名的基因
valid_indices = [i for i, gene in enumerate(new_gene_names) if gene is not None]
filtered_adata = adata[:, valid_indices]
filtered_adata.var_names = [new_gene_names[i] for i in valid_indices]

filtered_adata.obs.index = filtered_adata.obs.index.astype(str)
filtered_adata.var.index = filtered_adata.var.index.astype(str)
filtered_adata.write_h5ad('your_new_scRNA.h5ad')

保存替换完同源基因的anndata对象后可以重新读取,进行后续的分析,这里没有对应的同源基因被我过滤掉了(如果有更全面的分析需求可以注意一下这一点)

提取注释标签

#重新读取同源替换后的数据
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)

adata.shape#检查数据大小
adata.obs['annotation'].values.describe()#检查注释信息

adata.obs['barcode_sample']=list(adata.obs_names)
adata.obs['annotation']=adata.obs['annotation'].astype(str)
adata.obs['cell_type']=list(adata.obs['annotation'])
meta=adata.obs.loc[:,['barcode_sample','cell_type']]
meta.to_csv('bin50_meta.tsv', sep = '\t')
#检查对应
list(adata.obs.index).sort() == list(df_meta['barcode_sample']).sort()

提取后续分析需要的注释信息表,其中'barcode_sample'是细胞唯一标识符,'annotation'是我的注释信息存放的列,导出为bin50_meta.tsv

运行cellphoneDB

设置好数据库路径,注释信息表路径,标准化后的anndata对象路径以及输出路径。

我是直接将v5数据库下载使用,因为我的环境不能联网,在线下载可能会方便很多,网上有很多教程。直接下载的路径见GitHub - ventolab/cellphonedb-data

cpdb_file_path = '/data/work/CCI_cellphoneDB/cellphonedb.zip'
meta_file_path = '/data/work/CCI_cellphoneDB/bin50_meta.tsv'
counts_file_path = '/data/work/CCI_cellphoneDB/bin50_annotated_normlog.h5ad'
out_path = '/data/work/CCI_cellphoneDB/bin50'

cpdb_results = cpdb_statistical_analysis_method.call(
    cpdb_file_path = cpdb_file_path,                 # mandatory: CellphoneDB database zip file.
    meta_file_path = meta_file_path,                 # mandatory: tsv file defining barcodes to cell label.
    counts_file_path = counts_file_path,             # mandatory: normalized count matrix - a path to the counts file, or an in-memory AnnData object
    counts_data = 'hgnc_symbol',                     # defines the gene annotation in counts matrix.
 #   active_tfs_file_path = active_tf_path,           # optional: defines cell types and their active TFs.
 #  microenvs_file_path = microenvs_file_path,       # optional (default: None): defines cells per microenvironment.
    score_interactions = True,                       # optional: whether to score interactions or not. 
    iterations = 1000,                               # denotes the number of shufflings performed in the analysis.
    threshold = 0.05,                                 # defines the min % of cells expressing a gene for this to be employed in the analysis.
    threads = 5,                                     # number of threads to use in the analysis.
    debug_seed = 42,                                 # debug randome seed. To disable >=0.
    result_precision = 3,                            # Sets the rounding for the mean values in significan_means.
    pvalue = 0.1,                                   # P-value threshold to employ for significance.
    subsampling = False,                             # To enable subsampling the data (geometri sketching).
    subsampling_log = False,                         # (mandatory) enable subsampling log1p for non log-transformed data inputs.
    subsampling_num_pc = 100,                        # Number of componets to subsample via geometric skectching (dafault: 100).
    subsampling_num_cells = 1000,                    # Number of cells to subsample (integer) (default: 1/3 of the dataset).
    separator = '|',                                 # Sets the string to employ to separate cells in the results dataframes "cellA|CellB".
    debug = False,                                   # Saves all intermediate tables employed during the analysis in pkl format.
    output_path = out_path,                          # Path to save results.
    output_suffix = None                             # Replaces the timestamp in the output files by a user defined string in the  (default: None).
    )

运行日志以及输出的文件如下

读取结果可视化

相比cellchat,这个软件自带的可视化功能会少很多

自带可视化热图

kpy.plot_cpdb_heatmap(pvals = cpdb_results['pvalues'],
                      degs_analysis = False,
                      figsize = (5, 5),
                      title = "Sum of significant interactions")

R可视化点图

但是利用输出文件也可以在R环境下进行很多想要的可视化方式,首先导入R环境和包

library(ktplots)
library(Seurat)
library(tidyverse)
library(data.table)
library(dplyr)
library(readr)

导入需要的结果表和RDS格式数据

#cellphoneDB结果文件
mypvals <- read.table("/data/work/CCI_cellphoneDB/bin50/statistical_analysis_pvalues_12_26_2024_220659.txt",header = T,sep = "\t",stringsAsFactors = F,check.names = F)
mymeans <- read.table("/data/work/CCI_cellphoneDB/bin50/statistical_analysis_means_12_26_2024_220659.txt",header = T,sep = "\t",stringsAsFactors = F,check.names = F) 
mydecon <- read.table("/data/work/CCI_cellphoneDB/bin50/statistical_analysis_deconvoluted_12_26_2024_220659.txt",header = T,sep = "\t",stringsAsFactors = F,check.names = F)

#单细胞数据
scRNA = read_rds("your_scRNA.rds")

绘制全部点图

p <- plot_cpdb(cell_type1 = "", cell_type2 = "", scdata = scRNA, means = mymeans, pvals = mypvals,
               celltype_key = "annotation",
               highlight_col = "blue",
               keep_significant_only=T) +
  theme(axis.text = element_text(size = 5, color = 'black'))

ggsave("plot_cpdb_output.png", p, width = 50, height = 40, units = "in", limitsize = FALSE)

如果这里cell_type空白则默认绘制所有分群的点图,也可以设置一个或者两个绘制一对多和一对一的配受体点图,效果如下

分类展示基因家族

这里参数还有一个family,可以分类展示不同家族的配受体关系

gene_families <- c("chemokines", "th1", "th2", "th17", "treg", "costimulatory", "coinhibitory")

for(family in gene_families) {
  tryCatch({
    p <- plot_cpdb(cell_type1 = "PT-S1", cell_type2 = "", scdata = scRNA, means = mymeans, pvals = mypvals,
                   celltype_key = "annotation_res1_1",
                   gene_family = family, # 指定基因家族
                   highlight_col = "blue",
                   keep_significant_only=T) +
      theme(axis.text = element_text(size = 10, color = 'black'))
    
    ggsave(paste0("plot_cpdb_PTS1_", family, ".png"), p, width = 20, height = 30, units = "in", limitsize = FALSE)
  }, error = function(e) {
    message(paste("No significant genes found for", family, "and plotting will not proceed."))
  })
}

http://www.kler.cn/a/457677.html

相关文章:

  • ArcgisServer过了元旦忽然用不了了?许可过期
  • ESP32-S3遇见OpenAI:OpenAI官方发布ESP32嵌入式实时RTC SDK
  • Java高频面试之SE-08
  • Linux驱动开发学习准备(Linux内核源码添加到工程-Workspace)
  • 大模型系列——旋转位置编码和长度外推
  • 路径规划 | 基于极光PLO优化算法的三维路径规划Matlab程序
  • TCP网络编程(二)—— 服务器端的编写
  • Upload-labs 靶场(学习)
  • 【Linux】Socket编程-UDP构建自己的C++服务器
  • 3.微服务灰度发布落地实践(组件灰度增强)
  • AI 自动化编程的现状与局限
  • delete,drop,truncate的区别
  • ChatGPT与Postman协作完成接口测试(四)
  • sql注入杂谈(一)--union select
  • Mysql(MGR)和ProxySQL搭建部署-Kubernetes版本
  • 【机器学习篇】穿越数字迷雾:机器深度学习的智慧领航
  • 【Hackthebox 中英 Write-Up】Manipulating a CRUD API | 操控 CRUD API:一步步提取 Flag
  • 一个线程中总共3个串行任务,在另一个线程中展示任务进行的实施进度。
  • XXL-TOOL v1.3.2 发布 | Java工具类库
  • 【10】Selenium+Python UI自动化测试 邮件发送测试报告(某积载系统实例-04)
  • Mac 安装Mysql启动Mysql以及数据库的常规操作
  • Python 中常见的一些画图形式
  • driftingblues6_vh靶机
  • 开源 AI 智能名片商城小程序:个人 IP 运营赋能商业腾飞
  • 计算机网络:TCP/IP网络协议
  • 【代码随想录|完全背包问题】