[Epidemiology] Melodi-Presto: a causal association tool
title: "[Epidemiology] Melodi-Presto: a causal association tool"
date: 2022-12-08
lastmod: 2022-12-08
draft: false
tags: ["epidemiology", "causal association tool"]
toc: true
autoCollapseToc: true
Background reading
Melodi-Presto: A fast and agile tool to explore semantic triples derived from biomedical literature[^1]
Triples: each semantic triple has the form subject–predicate–object.
SemMedDB: the large open knowledge base that supplies these triples.
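To make the triple structure concrete, here is a minimal Python sketch. The `Triple` type and the example values are illustrative assumptions, not a SemMedDB schema or a row taken from the database; the predicate name is one of those used later in this post.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A subject-predicate-object statement (illustrative type, not a SemMedDB schema)."""
    subject: str
    predicate: str
    object: str


# Hypothetical example, for illustration only
example = Triple(
    subject="diabetes",
    predicate="ASSOCIATED_WITH",
    object="chronic kidney disease",
)

# Read the triple as a sentence: subject, predicate, object
print(f"{example.subject} --{example.predicate}--> {example.object}")
```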
Access points
- 🚩 Online tool (Web Application)
- API
- Jupyter Notebooks
Download the result to a JSON file, then extract what you need from it:
curl -X POST 'https://melodi-presto.mrcieu.ac.uk/api/overlap/' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{ "x": [ "diabetes" ], "y": [ "coronary heart disease" ]}' > 1.json
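The same request can be sketched in Python with only the standard library. The endpoint URL, headers and payload keys are copied from the curl command; the function names `build_overlap_request` and `fetch_overlap` are hypothetical helpers introduced here, and the layout of the returned JSON is not described in this post, so the sketch saves the response without interpreting it.

```python
import json
import urllib.request

# Endpoint copied from the curl command
OVERLAP_URL = "https://melodi-presto.mrcieu.ac.uk/api/overlap/"


def build_overlap_request(x_terms, y_terms):
    """Build (but do not send) the POST request for the overlap endpoint."""
    # Strip stray whitespace around each term before sending it
    payload = {
        "x": [term.strip() for term in x_terms],
        "y": [term.strip() for term in y_terms],
    }
    return urllib.request.Request(
        OVERLAP_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def fetch_overlap(x_terms, y_terms, out_path="1.json"):
    """Send the request and save the raw JSON response to out_path."""
    request = build_overlap_request(x_terms, y_terms)
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    with open(out_path, "w", encoding="utf-8") as handle:
        json.dump(result, handle, ensure_ascii=False, indent=2)
    return result


# Usage (needs network access):
# data = fetch_overlap(["diabetes"], ["coronary heart disease"])
```

Keeping the request construction separate from the network call makes it possible to inspect the payload before anything is sent.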
Usage example
X: KRAS
Y: lung cancer
Open question: should the input terms be checked against MeSH first?
Reproducing the paper
doi: 10.1093/ije/dyab203[^2]
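{{< note >}}
1. Some of the content has changed.
2. The Object filter is narrowed down to "chronic".
3. The Predicate filter is left unrestricted at first.
4. The Subject filter removes CRP, although the paper includes it.
5. The OR calculation has been dropped?
6. Genes from the gtf file and names from the [UniProt protein name library](https://www.uniprot.org/uniprotkb?facets=model_organism%3A9606&query=reviewed%3Atrue) are removed.
7. Add a drug library?
{{< /note >}}
library(openxlsx)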
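# Read the exported results (first sheet)
df <- read.xlsx("chronic kidney disease.xlsx",
                sheet = 1,
                colNames = TRUE,
                check.names = FALSE)
str(df$Pval)
# Pval may have been read as text; convert it before filtering
df$Pval <- as.numeric(df$Pval)
# Keep triples with P value < 0.005
df <- subset(df, Pval < 0.005)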
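# Remove triples where the subject is a gene or protein
df$Subject <- tolower(df$Subject)
gene_idx <- stringr::str_which(df$Subject,
                               pattern = "gene|protein|receptor")
# Warning: this also drops CRP (C-reactive protein), which the paper keeps
df$Subject[gene_idx]
# Guard against an empty match: df[-integer(0), ] would drop every row
if (length(gene_idx) > 0) df <- df[-gene_idx, ]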
# "CAUSES" implies causality,
# "ASSOCIATED_WITH" implies association,
# and "COEXISTS_WITH" implies co-existence.
table(df$Predicate)
df <- subset(df, Predicate %in% c("CAUSES", "ASSOCIATED_WITH", "COEXISTS_WITH"))
# restricted to triples
# where the object contained either “kidney” or “renal”
table(df$Object)
dplyr::count(df,forcats::fct_lump_n(Object,n=10))
#
df$Object <- tolower(df$Object)
b=stringr::str_which(df$Object,
pattern = "kidney|renal")
df$Object[b]
df <- df[b,]
# Second removal pass: drop subjects that contain "|", "factor" or "peptide"
for (pat in c("\\|", "factor", "peptide")) {
  idx <- stringr::str_which(df$Subject, pattern = pat)
  # Show what is about to be removed
  print(df$Subject[idx])
  # Guard against an empty match: df[-integer(0), ] would drop every row
  if (length(idx) > 0) df <- df[-idx, ]
}
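# Retain only unique risk factors (subjects) to avoid duplicates:
# sort so that the row with the highest Count (then the lowest Pval) comes first,
# then keep the first row for each subject
df <- dplyr::arrange(df, dplyr::desc(Count), Pval)
df <- df[!duplicated(df$Subject), ]
table(df$Count)
# df <- subset(df, Count > 2)
write.xlsx(df, file = "筛选4.xlsx", colNames = TRUE)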
# Enrichment odds ratio, computed from four counts:
# (a) the number of these triples among those matched to the query,
# (b) the total number of triples matched to the query,
# (c) the total number of these triples in the database,
# (d) the total number of triples in the database.
# In Python (SciPy): stats.fisher_exact([[a, b-a], [c, d-c]])
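The 2×2 table in the last comment can be worked through in Python without SciPy. This is a minimal sketch, not the tool's own implementation: the function name `enrichment_test` and the example counts are hypothetical, and the one-sided (enrichment) p-value is an assumption, since `stats.fisher_exact` defaults to a two-sided test. With SciPy installed, `scipy.stats.fisher_exact(table, alternative="greater")` should return the same two values.

```python
from math import comb


def enrichment_test(a, b, c, d):
    """Odds ratio and one-sided Fisher exact p-value for the table [[a, b-a], [c, d-c]]."""
    top_left, top_right = a, b - a
    bottom_left, bottom_right = c, d - c
    if min(top_left, top_right, bottom_left, bottom_right) < 0:
        raise ValueError("need 0 <= a <= b and 0 <= c <= d")

    # Sample odds ratio (a / (b-a)) / (c / (d-c)), written without intermediate division
    numerator = top_left * bottom_right
    denominator = top_right * bottom_left
    if denominator > 0:
        odds_ratio = numerator / denominator
    else:
        odds_ratio = float("inf") if numerator > 0 else float("nan")

    # One-sided p-value: probability of a top-left cell >= a under the
    # hypergeometric distribution with the same row and column totals
    row1 = top_left + top_right
    col1 = top_left + bottom_left
    total = row1 + bottom_left + bottom_right
    upper = min(row1, col1)
    tail = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(top_left, upper + 1)
    )
    p_value = tail / comb(total, row1)
    return odds_ratio, p_value


# Hypothetical counts, for illustration only
odds_ratio, p_value = enrichment_test(a=3, b=10, c=5, d=100)
print(odds_ratio, p_value)
```

The exact integer arithmetic from `math.comb` avoids floating-point underflow in the hypergeometric terms, at the cost of speed when the database counts are very large.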
NHANES
Caveats: see the paper reproduction above.
[^1]: doi: 10.1093/bioinformatics/btaa726
[^2]: Trans-ethnic Mendelian-randomization study reveals causal relationships between cardiometabolic factors and chronic kidney disease