一生信文章复现学习 PMID：36898287( 二 ) _基因

本处limma包共得到409个DEGs和原文数量410很接近，继续加油，尝试WGCNA
WGCNA分析 WGCNA的目的是什么？
一言以蔽之，对划分不同基因，将相似表达的基因放在一起，探讨基因群和表型的关系
注意事项代码开始数据预处理过滤基因数据
首先是过滤低质量的基因，官方建议先自己过滤一下，通过均值、方差、绝对中位差都可以，不建议通过差异分析筛选基因。因为原文纳入了所有基因（>20000个基因）构建WGCNA网络，所以此处我也纳入全部基因，同样，也可以采用官网方法来先过滤基因。
library(WGCNA)#ex_ma是经过limma包标准化后的芯片数据load('D:\\BioR\\new\\NAFLD_Bio\\02WGCNA\\ex_ma.Rdata')datExpr0 <- as.data.frame(t(ex_ma))##转置数据###########start handling the data##1.看看有没有为空的值，需要剔除。这里下载的都是原作处理好的。gsg <- goodSamplesGenes(datExpr0,verbose = 3)gsg$allOK ##返回为true证明数据可以，如果false需要运行一段代码，网上可得
返回FALSE就用以下代码删除不符合条件的基因。
# 不OK就需要筛选if (!gsg$allOK){# Optionally, print the gene and sample names that were removed:if (sum(!gsg$goodGenes)>0) printFlush(paste("Removing genes:", paste(names(datExpr0)[!gsg$goodGenes], collapse = ", ")));if (sum(!gsg$goodSamples)>0) printFlush(paste("Removing samples:", paste(rownames(datExpr0)[!gsg$goodSamples], collapse = ", ")));# Remove the offending genes and samples from the data:datExpr0 = datExpr0[gsg$goodSamples, gsg$goodGenes]}gsg <- goodSamplesGenes(datExpr0)gsg$allOK
得到true就可以
过滤样本数据
datExpr1 <- datExpr0 #3防止出错，我们先建一个复制一个数据##2.判断是否有离群样本#（1）通过聚类分析，能看出是否有个别sample跟多数sample的距离较远，决定是否需要去除离群样本。sampleTree = hclust(dist(datExpr1), method = "average")par(cex = 0.7);par(mar = c(0,4,2,0))plot(sampleTree, main = "Sample clustering to detect outliers", sub="", xlab="", cex.lab = 1.5,cex.axis = 1.5, cex.main = 2)##本处有两个离群样本，需要剔除，根据Y轴提示选定hegiht为150可以剔除离群样本plot(sampleTree, main = "Sample clustering to detect outliers", sub="", xlab="", cex.lab = 1.5, cex.axis = 1.5, cex.main = 2) +#想用哪里切，就把“h = 150”和“cutHeight = 150”中的“150”换成你的cutoffabline(h = 150, col = "red") ##（2）运行下段代码，剔除离群样本2个，得到新的表达量datExprclust = cutreeStatic(sampleTree, cutHeight = 150, minSize = 10)keepSamples = (clust==1)datExpr = datExpr1[keepSamples, ]nGenes = ncol(datExpr)nSamples = nrow(datExpr)dim(datExpr)##（3）再次画图验证是否已成功剔除sampleTree = hclust(dist(datExpr), method = "average")par(cex = 0.7);par(mar = c(0,4,2,0))plot(sampleTree, main = "Sample clustering to detect outliers", sub="", xlab="", cex.lab = 1.5,cex.axis = 1.5, cex.main = 2)
这个地方第一次分类，发现有两个离群样本，因为远离其他样本，所以删掉，但是删掉后和原文作图不一致，不清楚原文作者是否删除，以及那两个样本是否属于离群样本。
就是准备好的转置后的表达矩阵了，接下来就用它进行后续的分析。
处理分组或表型信息
此处预先处理分组或表型等信息
#1.首先得到分组信息，分组信息直接拷贝的GEO信息，因为规律明显，容易使用R处理group <- fread("group.txt",data.table = F)library(stringr)##观察分组信息，发现‘_’为分隔，采用split函数分开字符串，并取第二个group$group2=trimws(str_split(group$group,'_',simplify = T)[,2])table(group$group2)##此处提取HC和SS的分组信息，grep和grepl都可以，grep是提取包含某一或某些字符的行##提取group2列里包含HC或者SS的行group


上一页
1
2
3
4
下一页
		  	









一毛硬币多重  一毛硬币多重是一百元 

炉石传说多久一个版本  炉石传说一年更新几个版本 

如何开发一个求职招聘小程序？详细步骤解析与教程 

南京最冷的天气是多少 南京一般几月份才开始冷 

智慧城市的抓手：平安城市是一种服务 

十  GEO生信数据挖掘肺结核数据-差异分析 

用 Python 开发一个【视频营销号】生成器 

WGCNA的使用 

如何让ChatGPT生成图片？ 

一年的打磨，MNN正式版发布！

一 生信文章复现学习 PMID：36898287( 二 )

一生信文章复现学习 PMID：36898287( 二 )