一 生信文章复现学习 PMID:36898287( 四 )


深入分析
此处深入分析主要是指和表型的联系
模块与表型相关性热图的构建
###通过计算表型与module的相关性,找出相关性高的gene module,推测可能是因为它们造成了表型差异 。moduleLabelsAutomatic = net$colorsmoduleColorsAutomatic = labels2colors(moduleLabelsAutomatic)moduleColorsWW = moduleColorsAutomaticMEs0 = moduleEigengenes(datExpr, moduleColorsWW)$eigengenes##表达矩阵不要错MEsWW = orderMEs(MEs0)modTraitCor = cor(MEsWW, samples, use = "p")##分组samples不要错colnames(MEsWW)modlues=MEsWWmodTraitP = corPvalueStudent(modTraitCor, nSamples)textMatrix = paste(signif(modTraitCor, 2), "\n(", signif(modTraitP, 1), ")", sep = "")dim(textMatrix) = dim(modTraitCor)##做相关性热图pdf("4Module-trait.pdf",width = 6, height = 6)labeledHeatmap(Matrix = modTraitCor, xLabels = colnames(samples), yLabels = names(MEsWW), cex.lab = 0.5,yColorWidth=0.01, xColorWidth = 0.03,ySymbols = colnames(modlues), colorLabels = FALSE, colors = blueWhiteRed(50), textMatrix = textMatrix, setStdMargins = FALSE, cex.text = 0.5, zlim = c(-1,1), main = paste("Module-trait relationships"))dev.off()
计算GS及MM
(这一部分原文没有)
基因和性状的相关性:geneGS
基因和模块的相关性:MM
选择我们感兴趣的性状(临床信息),这里是SS,也就是是否有NASH 。
MEs = orderMEs(MEs0)# Define variable weight containing the weight column of datTraitcluster = as.data.frame(samples$SS) # 我们选SSnames(cluster) = "cluster"# names (colors) of the modulesmodNames = substring(names(MEs), 3)# 基因和模块的相关性及P值geneModuleMembership = as.data.frame(cor(datExpr, MEs, use = "p"));MMPvalue = http://www.kingceram.com/post/as.data.frame(corPvalueStudent(as.matrix(geneModuleMembership), nSamples));names(geneModuleMembership) = paste("MM", modNames, sep="");names(MMPvalue) = paste("p.MM", modNames, sep="");# 基因和性状的相关性,这里是和样本亚型的相关性geneTraitSignificance = as.data.frame(cor(datExpr, cluster, use = "p"));GSPvalue = http://www.kingceram.com/post/as.data.frame(corPvalueStudent(as.matrix(geneTraitSignificance), nSamples));names(geneTraitSignificance) = paste("GS.", names(cluster), sep="");names(GSPvalue) = paste("p.GS.", names(cluster), sep="");
选择GS和MM都很高的分子
#画个图展示下每个基因的GS和MM的分布情况:module = "yellow"column = match(module, modNames);moduleGenes = moduleColors==module;sizeGrWindow(7, 7);par(mfrow = c(1,1));verboseScatterplot(abs(geneModuleMembership[moduleGenes, column]),abs(geneTraitSignificance[moduleGenes, 1]),xlab = paste("Module Membership in", module, "module"),ylab = "Gene significance for body weight",main = paste("Module membership vs. gene significance\n"),cex.main = 1.2, cex.lab = 1.2, cex.axis = 1.2, col = module)#接下来就是提取出GS和MM都很高的基因:tmp1 <- rownames(geneModuleMembership)[abs(geneModuleMembership[moduleGenes, column])>0.7]tmp2 <- rownames(geneTraitSignificance)[abs(geneTraitSignificance[moduleGenes, 1])>0.7] ##此处0.7是可以升高或降低的genes <- unique(intersect(tmp1,tmp2))length(genes)
【一生信文章复现学习 PMID:36898287】按照原文是采用相关系数最大的模块,最终采用模块,共1500左右基因,和原文有出入 。