一生信文章复现学习 PMID：36898287( 三 ) _基因

<-group[grep('HC|SS',group$group2),] group <- group[,-2]group <- subset(group,group$symbol !='GSM2385767') ##此处删除两个离群样本，不知道为啥报错，只能用subset一个一个删rownames(group) <- group[,1]##第一列生成列名，然后删除原第一列samples <- dplyr::select(group,'group2')samples$group2 <- ifelse(samples$group2=='SS',1,0)samples$group3 <- ifelse(samples$group2=='0',1,0)colnames(samples) <- c('SS','HC')
构建gene确定软阈值
软阈值一般要求R^2在0.9以上，最小也要在0.8以上。下面是挑选的代码，基本不用改。
#WGCNA分析的关键是找gene module 。先选择合适的阈值，通过构建网络找gene module，#找出来的gene module可信度如何？要做Preservation，去除not preserved module 。#这样找出的共表达的gene module就可以用于下一步分析了。#（1）选择构建网络的合适阈值#通过这步计算，找出scale free topology modle fit接近0.9的最小power（soft threshold），用于下一步构建网络。powers = c(c(1:10), seq(from = 12, to=20, by=2))sft = pickSoftThreshold(datExpr, powerVector = powers, verbose = 5)##做两个图看趋势，得出power=16（原文是16）pdf("1Threshold.pdf",width = 10, height = 5)par(mfrow = c(1,2))cex1 = 0.9plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit,signed R^2",type="n",main = paste("Scale independence")) +text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],labels=powers,cex=cex1,col="red")+abline(h=0.90,col="red")plot(sft$fitIndices[,1], sft$fitIndices[,5],xlab="Soft Threshold (power)",ylab="Mean Connectivity", type="n",main = paste("Mean connectivity")) +text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1,col="red")dev.off()
结果就是看这两个图，选择达到0.9以上的软阈值，右边的图是连接度，选择逐渐平稳的阈值。

文章插图
代码也会自动给出一个合适的软阈值：注意代码推荐的和你最终选择的不一定一致，根据原文，我们选择16
sft$powerEstimate
构建

###（2）构建网络，找出gene modulenet = blockwiseModules(datExpr, power = 16,TOMType = "unsigned", minModuleSize = 30,reassignThreshold = 0, mergeCutHeight = 0.25,numericLabels = TRUE, pamRespectsDendro = FALSE,saveTOMs = TRUE,saveTOMFileBase = "MyTOM",verbose = 3)##本代码运行完毕可找的划分的基因模块table(net$colors)##具体看各个模块的基因数目names(net)##可以查看net的项目

allowWGCNAThreads()#RStudio不支持enableWGCNAThreads()##一步法构建网络和识别模块 。计算时用到了BLAS，所以这一步是可以提速的，R自带的blas非常慢

可视化

####（3）gene module的可视化mergedColors = labels2colors(net$colors)pdf("2module.pdf",width = 10, height = 5)plotDendroAndColors(net$dendrograms[[1]], mergedColors[net$blockGenes[[1]]], "Module colors",dendroLabels = FALSE, hang = 0.03,addGuide = TRUE, guideHang = 0.05)dev.off()

和原文差别较大，希望指正。
导出模块基因

##（4）把gene module输出到文件#输出每个module内的基因，用于后续分析作图，随便你DIYmoduleColors = labels2colors(net$colors)color<-unique(moduleColors)for (iin 1:length(color)) {y=t(assign(paste(color[i],"expr",sep = "."),datExpr[moduleColors==color[i]]))write.csv(y,paste(color[i],"csv",sep = "."),quote = F)}

导出每一个模块的基因及其表达量，注意此处是删除了离群样本的表达量

一 生信文章复现学习 PMID：36898287( 三 )

一生信文章复现学习 PMID：36898287( 三 )