One of NetCoMi’s strengths is the ability to compare networks between
two groups. The netCompare()
function is used for this
task.
Network construction
The amgut data set is split by "SEASONAL_ALLERGIES"
leading to two subsets of samples (with and without seasonal allergies).
We ignore the “None” group.
data("amgut2.filt.phy")
# Split the phyloseq object into two groups
amgut_season_yes <- phyloseq::subset_samples(amgut2.filt.phy,
SEASONAL_ALLERGIES == "yes")
amgut_season_no <- phyloseq::subset_samples(amgut2.filt.phy,
SEASONAL_ALLERGIES == "no")
amgut_season_yes
#> phyloseq-class experiment-level object
#> otu_table() OTU Table: [ 138 taxa and 121 samples ]
#> sample_data() Sample Data: [ 121 samples by 166 sample variables ]
#> tax_table() Taxonomy Table: [ 138 taxa by 7 taxonomic ranks ]
amgut_season_no
#> phyloseq-class experiment-level object
#> otu_table() OTU Table: [ 138 taxa and 163 samples ]
#> sample_data() Sample Data: [ 163 samples by 166 sample variables ]
#> tax_table() Taxonomy Table: [ 138 taxa by 7 taxonomic ranks ]
The 50 nodes with highest variance are selected for network construction to get smaller networks.
We filter the 121 samples (sample size of the smaller group) with highest frequency to make the sample sizes equal and thus ensure comparability.
n_yes <- phyloseq::nsamples(amgut_season_yes)
# Network construction
net_season <- netConstruct(data = amgut_season_no,
data2 = amgut_season_yes,
filtTax = "highestVar",
filtTaxPar = list(highestVar = 50),
filtSamp = "highestFreq",
filtSampPar = list(highestFreq = n_yes),
measure = "spring",
measurePar = list(nlambda = 10,
rep.num = 10,
Rmethod = "approx"),
normMethod = "none",
zeroMethod = "none",
sparsMethod = "none",
dissFunc = "signed",
verbose = 2,
seed = 123456)
#> Checking input arguments ... Done.
#> Data filtering ...
#> 42 samples removed in data set 1.
#> 0 samples removed in data set 2.
#> 96 taxa removed in each data set.
#> 1 rows with zero sum removed in group 2.
#> 42 taxa and 121 samples remaining in group 1.
#> 42 taxa and 120 samples remaining in group 2.
#>
#> Calculate 'spring' associations ... Registered S3 method overwritten by 'dendextend':
#> method from
#> rev.hclust vegan
#> Registered S3 method overwritten by 'seriation':
#> method from
#> reorder.hclust vegan
#> Done.
#>
#> Calculate associations in group 2 ... Done.
Alternatively, a group vector could be passed to group
,
according to which the data set is split into two groups:
# Get count table
countMat <- phyloseq::otu_table(amgut2.filt.phy)
# netConstruct() expects samples in rows
countMat <- t(as(countMat, "matrix"))
group_vec <- phyloseq::get_variable(amgut2.filt.phy, "SEASONAL_ALLERGIES")
# Select the two groups of interest (level "none" is excluded)
sel <- which(group_vec %in% c("no", "yes"))
group_vec <- group_vec[sel]
countMat <- countMat[sel, ]
net_season <- netConstruct(countMat,
group = group_vec,
filtTax = "highestVar",
filtTaxPar = list(highestVar = 50),
filtSamp = "highestFreq",
filtSampPar = list(highestFreq = n_yes),
measure = "spring",
measurePar = list(nlambda=10,
rep.num=10,
Rmethod = "approx"),
normMethod = "none",
zeroMethod = "none",
sparsMethod = "none",
dissFunc = "signed",
verbose = 3,
seed = 123456)
Network analysis
The object returned by netConstruct()
containing both
networks is again passed to netAnalyze()
. Network
properties are computed for both networks simultaneously.
To demonstrate further functionalities of netAnalyze()
,
we play around with the available arguments, even if the chosen setting
might not be optimal.
-
centrLCC = FALSE
: Centralities are calculated for all nodes (not only for the largest connected component). -
avDissIgnoreInf = TRUE
: Nodes with an infinite dissimilarity are ignored when calculating the average dissimilarity. -
sPathNorm = FALSE
: Shortest paths are not normalized by average dissimilarity. -
hubPar = c("degree", "eigenvector")
: Hubs are nodes with highest degree and eigenvector centrality at the same time. -
lnormFit = TRUE
andhubQuant = 0.9
: A log-normal distribution is fitted to the centrality values to identify nodes with “highest” centrality values. Here, a node is identified as hub if for each of the three centrality measures, the node’s centrality value is above the 90% quantile of the fitted log-normal distribution. - The non-normalized centralities are used for all four measures.
Note! The arguments must be set carefully, depending on the research questions. NetCoMi’s default values are not generally preferable in all practical cases!
props_season <- netAnalyze(net_season,
centrLCC = FALSE,
avDissIgnoreInf = TRUE,
sPathNorm = FALSE,
clustMethod = "cluster_fast_greedy",
hubPar = c("degree", "eigenvector"),
hubQuant = 0.9,
lnormFit = TRUE,
normDeg = FALSE,
normBetw = FALSE,
normClose = FALSE,
normEigen = FALSE)
#> Warning: The `scale` argument of `eigen_centrality()` always as if TRUE as of igraph
#> 2.1.1.
#> ℹ Normalization is always performed
#> ℹ The deprecated feature was likely used in the NetCoMi package.
#> Please report the issue at <https://github.com/stefpeschel/NetCoMi/issues>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
summary(props_season)
#>
#> Component sizes
#> ```````````````
#> group '1':
#> size: 28 1
#> #: 1 14
#> group '2':
#> size: 31 8 1
#> #: 1 1 3
#> ______________________________
#> Global network properties
#> `````````````````````````
#> Largest connected component (LCC):
#> group '1' group '2'
#> Relative LCC size 0.66667 0.73810
#> Clustering coefficient 0.15161 0.27111
#> Modularity 0.62611 0.45823
#> Positive edge percentage 86.66667 100.00000
#> Edge density 0.07937 0.12473
#> Natural connectivity 0.04539 0.04362
#> Vertex connectivity 1.00000 1.00000
#> Edge connectivity 1.00000 1.00000
#> Average dissimilarity* 0.67251 0.68178
#> Average path length** 3.40008 1.86767
#>
#> Whole network:
#> group '1' group '2'
#> Number of components 15.00000 5.00000
#> Clustering coefficient 0.15161 0.29755
#> Modularity 0.62611 0.55684
#> Positive edge percentage 86.66667 100.00000
#> Edge density 0.03484 0.08130
#> Natural connectivity 0.02826 0.03111
#> -----
#> *: Dissimilarity = 1 - edge weight
#> **: Path length = Sum of dissimilarities along the path
#>
#> ______________________________
#> Clusters
#> - In the whole network
#> - Algorithm: cluster_fast_greedy
#> ````````````````````````````````
#> group '1':
#> name: 0 1 2 3 4 5
#> #: 14 7 6 5 4 6
#>
#> group '2':
#> name: 0 1 2 3 4 5
#> #: 3 5 14 4 8 8
#>
#> ______________________________
#> Hubs
#> - In alphabetical/numerical order
#> - Based on log-normal quantiles of centralities
#> ```````````````````````````````````````````````
#> group '1' group '2'
#> 307981 322235
#> 363302
#>
#> ______________________________
#> Centrality measures
#> - In decreasing order
#> - Computed for the complete network
#> ````````````````````````````````````
#> Degree (unnormalized):
#> group '1' group '2'
#> 307981 5 2
#> 9715 5 5
#> 364563 4 4
#> 259569 4 5
#> 322235 3 9
#> ______ ______
#> 322235 3 9
#> 363302 3 9
#> 158660 2 6
#> 188236 3 5
#> 259569 4 5
#>
#> Betweenness centrality (unnormalized):
#> group '1' group '2'
#> 307981 231 0
#> 331820 170 9
#> 158660 162 80
#> 188236 161 85
#> 322235 159 126
#> ______ ______
#> 322235 159 126
#> 363302 74 93
#> 188236 161 85
#> 158660 162 80
#> 326792 17 58
#>
#> Closeness centrality (unnormalized):
#> group '1' group '2'
#> 307981 18.17276 7.80251
#> 9715 15.8134 9.27254
#> 188236 15.7949 23.24055
#> 301645 15.30177 9.01509
#> 364563 14.73566 21.21352
#> ______ ______
#> 322235 13.50232 26.36749
#> 363302 12.30297 24.19703
#> 158660 13.07106 23.31577
#> 188236 15.7949 23.24055
#> 326792 14.61391 22.52157
#>
#> Eigenvector centrality (unnormalized):
#> group '1' group '2'
#> 307981 1 0.13142
#> 9715 0.83277 0.20513
#> 301645 0.78551 0.16298
#> 326792 0.50706 0.42082
#> 188236 0.48439 0.56626
#> ______ ______
#> 322235 0.03281 0.79487
#> 363302 0.06613 0.76293
#> 188236 0.48439 0.56626
#> 194648 0.00687 0.52039
#> 184983 0.172 0.49611
Visual network comparison
First, the layout is computed separately in both groups (qgraph’s “spring” layout in this case).
Node sizes are scaled according to the mclr-transformed data since
SPRING
uses the mclr transformation as normalization
method.
Node colors represent clusters. Note that by default, two clusters
have the same color in both groups if they have at least two nodes in
common (sameColThresh = 2
). Set sameClustCol
to FALSE
to get different cluster colors.
plot(props_season,
sameLayout = FALSE,
nodeColor = "cluster",
nodeSize = "mclr",
labelScale = FALSE,
cexNodes = 1.5,
cexLabels = 2.5,
cexHubLabels = 3,
cexTitle = 3.7,
groupNames = c("No seasonal allergies", "Seasonal allergies"),
hubBorderCol = "gray40")
legend("bottom", title = "estimated association:", legend = c("+","-"),
col = c("#009900","red"), inset = 0.02, cex = 4, lty = 1, lwd = 4,
bty = "n", horiz = TRUE)
Using different layouts leads to a “nice-looking” network plot for each group, however, it is difficult to identify group differences at first glance.
Thus, we now use the same layout in both groups. In the following, the layout is computed for group 1 (the left network) and taken over for group 2.
rmSingles
is set to "inboth"
because only
nodes that are unconnected in both groups can be removed if the same
layout is used.
plot(props_season,
sameLayout = TRUE,
layoutGroup = 1,
rmSingles = "inboth",
nodeSize = "mclr",
labelScale = FALSE,
cexNodes = 1.5,
cexLabels = 2.5,
cexHubLabels = 3,
cexTitle = 3.8,
groupNames = c("No seasonal allergies", "Seasonal allergies"),
hubBorderCol = "gray40")
legend("bottom", title = "estimated association:", legend = c("+","-"),
col = c("#009900","red"), inset = 0.02, cex = 4, lty = 1, lwd = 4,
bty = "n", horiz = TRUE)
In the above plot, we can see clear differences between the groups. The OTU “322235”, for instance, is more strongly connected in the “Seasonal allergies” group than in the group without seasonal allergies, which is why it is a hub on the right, but not on the left.
However, if the layout of one group is simply taken over to the
other, one of the networks (here the “seasonal allergies” group) is
usually not that nice-looking due to the long edges. Therefore, NetCoMi
(>= 1.0.2) offers a further option
(layoutGroup = "union"
), where a union of the two layouts
is used in both groups. In doing so, the nodes are placed as optimal as
possible equally for both networks.
The idea and R code for this functionality were provided by Christian L. Müller and Alice Sommer
plot(props_season,
sameLayout = TRUE,
repulsion = 0.95,
layoutGroup = "union",
rmSingles = "inboth",
nodeSize = "mclr",
labelScale = FALSE,
cexNodes = 1.5,
cexLabels = 2.5,
cexHubLabels = 3,
cexTitle = 3.8,
groupNames = c("No seasonal allergies", "Seasonal allergies"),
hubBorderCol = "gray40")
legend("bottom", title = "estimated association:", legend = c("+","-"),
col = c("#009900","red"), inset = 0.02, cex = 4, lty = 1, lwd = 4,
bty = "n", horiz = TRUE)
Quantitative network comparison
Since runtime is considerably increased if permutation tests are
performed, we set the permTest
parameter to
FALSE
. See the tutorial_createAssoPerm
file
for a network comparison including permutation tests.
Since permutation tests are still conducted for the Adjusted Rand Index, a seed should be set for reproducibility.
comp_season <- netCompare(props_season,
permTest = FALSE,
verbose = FALSE,
seed = 123456)
summary(comp_season,
groupNames = c("No allergies", "Allergies"),
showCentr = c("degree", "between", "closeness"),
numbNodes = 5)
#>
#> Comparison of Network Properties
#> ----------------------------------
#> CALL:
#> netCompare(x = props_season, permTest = FALSE, verbose = FALSE,
#> seed = 123456)
#>
#> ______________________________
#> Global network properties
#> `````````````````````````
#> Largest connected component (LCC):
#> No allergies Allergies difference
#> Relative LCC size 0.667 0.738 0.071
#> Clustering coefficient 0.152 0.271 0.120
#> Modularity 0.626 0.458 0.168
#> Positive edge percentage 86.667 100.000 13.333
#> Edge density 0.079 0.125 0.045
#> Natural connectivity 0.045 0.044 0.002
#> Vertex connectivity 1.000 1.000 0.000
#> Edge connectivity 1.000 1.000 0.000
#> Average dissimilarity* 0.673 0.682 0.009
#> Average path length** 3.400 1.868 1.532
#>
#> Whole network:
#> No allergies Allergies difference
#> Number of components 15.000 5.000 10.000
#> Clustering coefficient 0.152 0.298 0.146
#> Modularity 0.626 0.557 0.069
#> Positive edge percentage 86.667 100.000 13.333
#> Edge density 0.035 0.081 0.046
#> Natural connectivity 0.028 0.031 0.003
#> -----
#> *: Dissimilarity = 1 - edge weight
#> **: Path length = Sum of dissimilarities along the path
#>
#> ______________________________
#> Jaccard index (similarity betw. sets of most central nodes)
#> ```````````````````````````````````````````````````````````
#> Jacc P(<=Jacc) P(>=Jacc)
#> degree 0.556 0.957578 0.144846
#> betweenness centr. 0.333 0.650307 0.622822
#> closeness centr. 0.231 0.322424 0.861268
#> eigenvec. centr. 0.100 0.017593 * 0.996692
#> hub taxa 0.000 0.296296 1.000000
#> -----
#> Jaccard index in [0,1] (1 indicates perfect agreement)
#>
#> ______________________________
#> Adjusted Rand index (similarity betw. clusterings)
#> ``````````````````````````````````````````````````
#> wholeNet LCC
#> ARI 0.232 0.355
#> p-value 0.000 0.000
#> -----
#> ARI in [-1,1] with ARI=1: perfect agreement betw. clusterings
#> ARI=0: expected for two random clusterings
#> p-value: permutation test (n=1000) with null hypothesis ARI=0
#>
#> ______________________________
#> Graphlet Correlation Distance
#> `````````````````````````````
#> wholeNet LCC
#> GCD 1.577 1.863
#> -----
#> GCD >= 0 (GCD=0 indicates perfect agreement between GCMs)
#>
#> ______________________________
#> Centrality measures
#> - In decreasing order
#> - Computed for the whole network
#> ````````````````````````````````````
#> Degree (unnormalized):
#> No allergies Allergies abs.diff.
#> 322235 3 9 6
#> 363302 3 9 6
#> 469709 0 4 4
#> 158660 2 6 4
#> 223059 0 4 4
#>
#> Betweenness centrality (unnormalized):
#> No allergies Allergies abs.diff.
#> 307981 231 0 231
#> 331820 170 9 161
#> 259569 137 34 103
#> 158660 162 80 82
#> 184983 92 12 80
#>
#> Closeness centrality (unnormalized):
#> No allergies Allergies abs.diff.
#> 469709 0 21.203 21.203
#> 541301 0 20.942 20.942
#> 181016 0 19.498 19.498
#> 361496 0 19.349 19.349
#> 223059 0 19.261 19.261
#>
#> _________________________________________________________
#> Significance codes: ***: 0.001, **: 0.01, *: 0.05, .: 0.1