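The correlation coefficient is one way to measure how similar two samples are. If you have several samples and want to see how similar they are to one another, a good option is to draw a correlation plot with corrplot.
Here is one way to do it in R with the corrplot package: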
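install.packages("corrplot")  # only needed once
library(corrplot)
# merge.xls is read here as a plain-text table, despite the .xls extension
rna <- read.table("merge.xls", header = TRUE)
# keep the rows whose sum1 column is greater than 0
data <- subset(rna, sum1 > 0)
# Pearson correlation between the six sample columns (columns 2 to 7)
rnacor <- cor(data[, 2:7])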
# The col1 palette used here comes from a snippet in the official corrplot documentation; it works well, so feel free to copy it as-is
col1 <- colorRampPalette(c("#7F0000", "red", "#FF7F00", "yellow", "white", "cyan", "#007FFF", "blue", "#00007F"))
col2 <- colorRampPalette(c("#67001F", "#B2182B", "#D6604D", "#F4A582", "#FDDBC7", "#FFFFFF", "#D1E5F0", "#92C5DE", "#4393C3", "#2166AC", "#053061"))
col3 <- colorRampPalette(c("red", "white", "blue"))
col4 <- colorRampPalette(c("#7F0000", "red", "#FF7F00", "yellow", "#7FFF7F", "cyan", "#007FFF", "blue", "#00007F"))
wb <- c("white", "black")
par(ask = TRUE)
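The code above builds the correlation matrix and the palettes but stops before drawing anything. The following is a minimal sketch of how one of the palettes might be passed to `corrplot()`. It is not necessarily the exact call used for this post: the small matrix `expr` is a stand-in built from four rows and three sample columns of the data listed further down, and `method = "color"` is only one of the available drawing styles.

```r
# Stand-in for data[, 2:7]: four rows and three samples copied from the
# data table in this post (an assumption made to keep the sketch self-contained).
expr <- matrix(
  c(2.49995, 2.8954,  3.97646,
    270.098, 301.783, 343.476,
    39.2311, 46.4,    32.9699,
    58.1898, 62.7048, 86.8684),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("1S", "3S", "6S"))
)

# cor() computes Pearson correlations between columns by default.
rnacor <- cor(expr)

# One of the palettes defined above: red -> white -> blue.
col3 <- colorRampPalette(c("red", "white", "blue"))

# Draw the plot only if corrplot is installed.
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(rnacor, method = "color", col = col3(100))
}
```

Any of `col1` to `col4` can be swapped in the same way; the number passed to the palette function sets how many colour steps are generated.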
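One interesting observation concerns the Pearson correlation coefficient. If the filtering step removes only the rows that are 0 in every sample, the coefficients come out quite low; if it removes every row whose minimum value is 0 (that is, any row containing a zero), the coefficients come out very high. Pearson's coefficient is sensitive to these zeros, so the choice of filter matters and deserves attention in each project. I also tested replacing the zeros with a small value, and the coefficients stayed fairly low, which suggests that it is very small values in general, not only exact zeros, that affect the Pearson coefficient. Filtering out the rows that contain zeros gives high coefficients.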
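The effect of the two filters can be reproduced with a small sketch in base R. The data frame `toy` below is hypothetical and was built for illustration only (it does not come from merge.xls): it holds four rows where the two samples agree, two rows that are 0 in one sample but not the other, and three rows that are 0 in both. Real data may behave differently.

```r
# Hypothetical toy data for two samples, built for illustration only.
toy <- data.frame(
  a = c(10, 20, 30, 40,  0, 50, 0, 0, 0),
  b = c(11, 19, 31, 39, 45,  0, 0, 0, 0)
)

# Filter A: drop only the rows that are 0 in every sample.
keep_a <- toy[rowSums(toy) > 0, ]

# Filter B: drop every row whose minimum value is 0.
keep_b <- toy[apply(toy, 1, min) > 0, ]

# Variant of filter A: replace the remaining zeros with a small value.
keep_eps <- keep_a
keep_eps[keep_eps == 0] <- 0.01

# Pearson correlation between the two samples under each treatment.
cor_a   <- cor(keep_a$a, keep_a$b)
cor_b   <- cor(keep_b$a, keep_b$b)
cor_eps <- cor(keep_eps$a, keep_eps$b)

print(c(filter_a = cor_a, filter_b = cor_b, small_value = cor_eps))
```

In this toy case filter B gives a much higher coefficient than filter A, and replacing the zeros with 0.01 barely changes the filter A result, which matches the observation above.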
Data:
10S | 1S | 3S | 6S | 8S | 9S |
1.72953 | 2.49995 | 2.8954 | 3.97646 | 3.04071 | 1.72953 |
243.827 | 270.098 | 301.783 | 343.476 | 279.752 | 243.827 |
42.7525 | 39.2311 | 46.4 | 32.9699 | 44.5847 | 42.7525 |
0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0.14908 | 0 |
0.130414 | 0.067166 | 0.097597 | 0.033922 | 0.260782 | 0.130414 |
47.5932 | 58.1898 | 62.7048 | 86.8684 | 71.1578 | 47.5932 |
48.0345 | 59.2795 | 63.4637 | 88.6718 | 71.0701 | 48.0345 |
47.3283 | 58.3714 | 62.9383 | 87.0212 | 70.4561 | 47.3283 |
363.076 | 417.626 | 444.945 | 629.738 | 542.223 | 363.076 |
16.7139 | 18.9893 | 18.0218 | 19.559 | 19.8357 | 16.7139 |
0 | 0 | 0 | 0 | 0 | 0 |
18.9541 | 22.6626 | 23.7261 | 23.8286 | 23.4944 | 18.9541 |
171.233 | 150.81 | 164.34 | 174.828 | 199.335 | 171.233 |
2.72481 | 7.13037 | 5.63469 | 3.85679 | 5.39862 | 2.72481 |
0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 |
1.28784 | 1.21915 | 0.642922 | 1.17715 | 0.644234 | 1.28784 |