当前位置: 首页 > article >正文

seurat -- 关于DE gene的讨论

实例

# 加载演示数据集
library(Seurat)
library(SeuratData)
pbmc <- LoadData("pbmc3k", type = "pbmc3k.final")
# list options for groups to perform differential expression on
levels(pbmc)

## [1] "Naive CD4 T"  "Memory CD4 T" "CD14+ Mono"   "B"            "CD8 T"       
## [6] "FCGR3A+ Mono" "NK"           "DC"           "Platelet"
# 默认使用非参秩和检验做差异性分析
# Find differentially expressed features between CD14+ and FCGR3A+ Monocytes
monocyte.de.markers <- FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono")
# view results
head(monocyte.de.markers)
# Find differentially expressed features between CD14+ Monocytes and all other cells, only
# search for positive markers
monocyte.de.markers <- FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = NULL, only.pos = TRUE)
# view results
head(monocyte.de.markers)
# 过滤掉一些细胞加速运算,比如一些gene只在cluster的少部分细胞里表达
# Pre-filter features that are detected at <50% frequency in either CD14+ Monocytes or FCGR3A+
# Monocytes
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", min.pct = 0.5))
# Pre-filter features that have less than a two-fold change between the average expression of
# CD14+ Monocytes vs FCGR3A+ Monocytes
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", logfc.threshold = log(2)))
# 参数检验的时候如果数据量不一样计算的误差会较大
# 这里直接把数据量差不多的gene过滤掉了,难道非参数检验不关注数据量的问题?
# Pre-filter features whose detection percentages across the two groups are similar (within
# 0.25)
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", min.diff.pct = 0.25))
# 还可以对大的cluster进行抽样获取最多200个细胞进行DE分析
# 个人感觉cluster内部其实也不均一,抽样会更不均一,假阳性会增加吧
# Subsample each group to a maximum of 200 cells. Can be very useful for large clusters, or
# computationally-intensive DE tests
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", max.cells.per.ident = 200))

Perform DE analysis using alternative tests

在这里插入图片描述
其中 MAST and DESeq2需要自己额外安装。
t-test是个人最不推荐使用的方法。

# Test for DE features using the MAST package
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", test.use = "MAST"))
# Test for DE features using the DESeq2 package. Throws an error if DESeq2 has not already
# been installed Note that the DESeq2 workflows can be computationally intensive for large
# datasets, but are incompatible with some feature pre-filtering options We therefore suggest
# initially limiting the number of cells used for testing
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", test.use = "DESeq2", max.cells.per.ident = 50))

具体参数

在这里插入图片描述

  • object
    An object

  • slot
    Slot to pull data from; note that if test.use is “negbinom”, “poisson”, or “DESeq2”, slot will be set to “counts”。这里就很有意思,不同的normalization方法处理后的数据放在了不同的data下面,具体是哪个?对结果影响怎样呢?

  • counts
    Count matrix if using scale.data for DE tests. This is used for computing pct.1 and pct.2 and for filtering features based on fraction expressing

  • cells.1
    Vector of cell names belonging to group 1

  • cells.2
    Vector of cell names belonging to group 2

  • features
    Genes to test. Default is to use all genes

  • logfc.threshold
    Limit testing to genes which show, on average, at least X-fold difference (log-scale) between the two groups of cells. Default is 0.25 Increasing logfc.threshold speeds up the function, but can miss weaker signals.

  • test.use
    Denotes which test to use. Available options are:

    • “wilcox” : Identifies differentially expressed genes between two groups of cells using a Wilcoxon Rank Sum test (default)

    • “bimod” : Likelihood-ratio test for single cell gene expression, (McDavid et al., Bioinformatics, 2013)

    • “roc” : Identifies ‘markers’ of gene expression using ROC analysis. For each gene, evaluates (using AUC) a classifier built on that gene alone, to classify between two groups of cells. An AUC value of 1 means that expression values for this gene alone can perfectly classify the two groupings (i.e. Each of the cells in cells.1 exhibit a higher level than each of the cells in cells.2). An AUC value of 0 also means there is perfect classification, but in the other direction. A value of 0.5 implies that the gene has no predictive power to classify the two groups. Returns a ‘predictive power’ (abs(AUC-0.5) * 2) ranked matrix of putative differentially expressed genes.

    • “t” : Identify differentially expressed genes between two groups of cells using the Student’s t-test.

    • “negbinom” : Identifies differentially expressed genes between two groups of cells using a negative binomial generalized linear model. Use only for UMI-based datasets

    • “poisson” : Identifies differentially expressed genes between two groups of cells using a poisson generalized linear model. Use only for UMI-based datasets

    • “LR” : Uses a logistic regression framework to determine differentially expressed genes. Constructs a logistic regression model predicting group membership based on each feature individually and compares this to a null model with a likelihood ratio test.

    • “MAST” : Identifies differentially expressed genes between two groups of cells using a hurdle model tailored to scRNA-seq data. Utilizes the MAST package to run the DE testing.

    • “DESeq2” : Identifies differentially expressed genes between two groups of cells based on a model using DESeq2 which uses a negative binomial distribution (Love et al, Genome Biology, 2014).This test does not support pre-filtering of genes based on average difference (or percent detection rate) between cell groups. However, genes may be pre-filtered based on their minimum detection rate (min.pct) across both cell groups. To use this method, please install DESeq2, using the instructions at https://bioconductor.org/packages/release/bioc/html/DESeq2.html

  • min.pct
    only test genes that are detected in a minimum fraction of min.pct cells in either of the two populations. Meant to speed up the function by not testing genes that are very infrequently expressed. Default is 0.1

  • min.diff.pct
    only test genes that show a minimum difference in the fraction of detection between the two groups. Set to -Inf by default。

  • verbose
    Print a progress bar once expression testing begins

  • only.pos
    Only return positive markers (FALSE by default)

  • max.cells.per.ident
    Down sample each identity class to a max number. Default is no downsampling. Not activated by default (set to Inf)

  • random.seed
    Random seed for downsampling

  • latent.vars
    Variables to test, used only when test.use is one of ‘LR’, ‘negbinom’, ‘poisson’, or ‘MAST’

  • min.cells.feature
    Minimum number of cells expressing the feature in at least one of the two groups, currently only used for poisson and negative binomial tests

  • min.cells.group
    Minimum number of cells in one of the groups

  • pseudocount.use
    Pseudocount to add to averaged expression values when calculating logFC. 1 by default.

  • fc.results
    data.frame from FoldChange

  • densify
    Convert the sparse matrix to a dense form before running the DE test. This can provide speedups but might require higher memory; default is FALSE

  • mean.fxn
    Function to use for fold change or average difference calculation. If NULL, the appropriate function will be chose according to the slot used

  • fc.name
    Name of the fold change, average difference, or custom function column in the output data.frame. If NULL, the fold change column will be named according to the logarithm base (eg, “avg_log2FC”), or if using the scale.data slot “avg_diff”.

  • base
    The base with respect to which logarithms are computed.

  • norm.method
    Normalization method for fold change calculation when slot is “data”

  • recorrect_umi
    Recalculate corrected UMI counts using minimum of the median UMIs when performing DE using multiple SCT objects; default is TRUE

  • ident.1
    Identity class to define markers for; pass an object of class phylo or ‘clustertree’ to find markers for a node in a cluster tree; passing ‘clustertree’ requires BuildClusterTree to have been run

  • ident.2
    A second identity class for comparison; if NULL, use all other cells for comparison; if an object of class phylo or ‘clustertree’ is passed to ident.1, must pass a node to find markers for

  • group.by
    Regroup cells into a different identity class prior to performing differential expression (see example)

  • subset.ident
    Subset a particular identity class prior to regrouping. Only relevant if group.by is set (see example)

  • assay
    Assay to use in differential expression testing

  • reduction
    Reduction to use in differential expression testing - will test for DE on cell embeddings

在这里插入图片描述


http://www.kler.cn/a/15879.html

相关文章:

  • MySQL初学之旅(3)约束
  • SpringBoot(5)-SpringSecurity
  • 数据分析-48-时间序列变点检测之在线实时数据的CPD
  • Linux---常用shell脚本
  • 计算机视觉 ---常见图像文件格式及其特点
  • Pytorch如何将嵌套的dict类型数据加载到GPU
  • vue性能优化之虚拟列表滚动
  • 第七章 单行函数
  • 荔枝派Zero(全志V3S)驱动开发之串口
  • 使用docker部署prometheus最新版本2.43.0
  • 你买票了吗?五一火车票发售量创历史新高,车票总发售2209万张票
  • python之面向对象练习题(七)
  • 芯片封装基本流程及失效分析处理方法
  • 通知所有员工所需的时间
  • 【Android -- 开源库】数据库 Realm 的基本使用
  • Mysql数据库的备份恢复
  • 赞!数字中国建设峰会上的金仓风采
  • ubuntu22安装redis7.0
  • 使用 ESP32 设计智能手表第 2 部分 - 环境光和心率传感器
  • 算法套路十四——动态规划之背包问题:01背包、完全背包及各种变形
  • linux_线程锁mutex(互斥量)_线程同步_死锁现象_pthread_mutex_lock函数_pthread_mutex_unlock函数_死锁现象
  • 操作系统之内存管理
  • 把字符串转换成整数
  • Python使用AI photo2cartoon制作属于你的漫画头像
  • Nautilus Chain 测试网第二阶段,推出忠诚度计划及广泛空投
  • 怎样解决高并发下的I/O瓶颈?