Title: | Graphical Models Estimation from Multiple Sources |
---|---|
Description: | Estimates networks of conditional dependencies (Gaussian graphical models) from multiple classes of data (similar but not identical, e.g. measurements taken on different equipment, in different locations or for various sub-types). The package can also generate simulated data and evaluate performance. Implementation of the method described in Angelini, De Canditiis and Plaksienko (2022) <doi:10.3390/math10213983>. |
Authors: | Anna Plaksienko [aut, cre] |
Maintainer: | Anna Plaksienko <[email protected]> |
License: | GPL-2 |
Version: | 2.0.2 |
Built: | 2025-02-15 04:40:08 UTC |
Source: | https://github.com/annaplaksienko/jewel |
The function takes a numerical vector of vertex degrees and constructs weights by the rule W_ij = 1 / sqrt(d_i * d_j);
the whole matrix is then normalized by its maximum.
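The weighting rule can be illustrated in a few lines of base R. This is only a sketch of the formula as stated above, not the package's internal code; the degree vector `d` is made up, and how the package treats the diagonal is not specified here.

```r
# Illustrative sketch only; not the package's internal code.
# d: hypothetical vector of prior vertex degrees (vertex 1 is a "hub").
d <- c(10, 1, 1, 2)

# W_ij = 1 / sqrt(d_i * d_j), computed for all pairs via an outer product.
W <- 1 / sqrt(outer(d, d))

# Normalize the whole matrix by its maximum so the largest weight equals 1.
W <- W / max(W)
```

Under this rule, pairs that involve a high-degree vertex receive smaller weights than pairs of low-degree vertices.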
constructWeights(d, K = NULL)
d |
either one numerical vector or a list of |
K |
number of classes (i.e. datasets, i.e. desired graphs). By default it is length(d).
In length(d) = 1, |
W - a list of K numeric matrices of size p by p.
{
  K <- 3
  p <- 50
  n <- 20
  data <- generateData_rewire(K = K, p = p, n = n, ncores = 1, verbose = FALSE)
  G_list_true <- data$Graphs
  true_degrees <- rowSums(G_list_true[[1]])
  cut <- sort(true_degrees, decreasing = TRUE)[ceiling(p * 0.03)]
  apriori_hubs <- ifelse(true_degrees >= cut, 10, 1)
  W <- constructWeights(apriori_hubs, K = K)
}
The function compares the adjacency matrices of the true and estimated simple graphs and calculates the number of true positives (correctly estimated edges), true negatives (correctly estimated absence of edges), false positives (edges present in the estimate but not in the true graph) and false negatives (true edges missing from the estimate).
evaluatePerformance(G, G_hat)
G |
true graph's adjacency matrix. |
G_hat |
estimated graph's adjacency matrix. Must have the same dimensions as |
performance - a numeric vector of length 4 with TP, TN, FP, FN.
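The four counts can be reproduced by hand on a toy example. The base R sketch below is not the package's implementation: the two 4-node graphs are hypothetical, and counting each undirected edge once (upper triangle only) is an assumption about the convention.

```r
# Hypothetical 4-node example; both are symmetric 0/1 adjacency matrices.
G <- matrix(0, 4, 4)
G[1, 2] <- G[2, 1] <- 1
G[2, 3] <- G[3, 2] <- 1

G_hat <- matrix(0, 4, 4)
G_hat[1, 2] <- G_hat[2, 1] <- 1
G_hat[3, 4] <- G_hat[4, 3] <- 1

# Count each undirected edge once by looking at the upper triangle only
# (an assumption; the package may use a different convention).
ut <- upper.tri(G)
counts <- c(
  TP = sum(G[ut] == 1 & G_hat[ut] == 1),
  TN = sum(G[ut] == 0 & G_hat[ut] == 0),
  FP = sum(G[ut] == 0 & G_hat[ut] == 1),
  FN = sum(G[ut] == 1 & G_hat[ut] == 0)
)
```

Here the edge 1-2 is a true positive, 3-4 a false positive, 2-3 a false negative, and the remaining three vertex pairs are true negatives.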
{
  K <- 3
  p <- 50
  n <- 20
  data <- generateData_rewire(K = K, p = p, n = n, ncores = 1, verbose = FALSE)
  G_common_true <- data$CommonGraph
  X <- data$Data
  res <- jewel(X, lambda1 = 0.25)
  G_common_est <- res$CommonG
  evaluatePerformance(G = G_common_true, G_hat = G_common_est)
}
The function first generates K scale-free graphs with p vertices. They have the same order and degree distribution and share most of the edges, but some edges may vary (the user can control how many).
The function then generates the corresponding precision and covariance matrices, all of size p by p (see the paper for the details of the procedure).
Then, for each l-th element of the vector n, it generates K data matrices, each of size n_l by p; i.e., for the same underlying graphs we can generate several sets of K datasets with different sample sizes.
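The last step, drawing several sets of K data matrices for different sample sizes, can be sketched in base R. This is an illustration under stated assumptions, not the package's procedure: the covariance matrices in `Sigma` are invented, zero-mean Gaussian sampling via a Cholesky factor is assumed, and the graph and precision-matrix construction described in the paper is skipped entirely.

```r
set.seed(1)
K <- 2
p <- 3
n <- c(5, 10)  # two sample sizes -> two sets of K data matrices

# Hypothetical covariance matrices, one per class; the package derives
# its own from the generated graphs.
Sigma <- lapply(seq_len(K), function(k) {
  S <- diag(p)
  S[1, 2] <- S[2, 1] <- 0.3
  S
})

# For each sample size n_l, draw K zero-mean Gaussian matrices of size
# n_l by p. If Z has i.i.d. N(0, 1) entries and R = chol(S), then the
# rows of Z %*% R have covariance S.
Data <- lapply(n, function(n_l) {
  lapply(Sigma, function(S) {
    matrix(rnorm(n_l * p), nrow = n_l, ncol = p) %*% chol(S)
  })
})

# Data[[l]][[k]] is the k-th data matrix for the l-th sample size.
dim(Data[[2]][[1]])
```

The nested list mirrors the description above: the outer level indexes sample sizes, the inner level indexes the K classes.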
generateData_rewire( K, p, n, power = 1, m = 1, perc = 0.05, int = NULL, ncores = NULL, makePlot = TRUE, verbose = TRUE )
K |
number of graphs/data matrices. |
p |
number of nodes in the true graphs. |
n |
a numerical vector of the sample sizes for each desired set of
|
power |
a number, power of preferential attachment for the Barabasi-Albert algorithm for the generation of the scale-free graph. Bigger number means more connected hubs. The default value is 1. |
m |
number of edges to add at each step of Barabasi-Albert algorithm for generation of the scale-free graph. The default value is 1. |
perc |
a number, tuning parameter for the difference between graphs.
Number of trials to perform in the rewiring procedure of the first graph is
|
int |
a vector of two numbers, |
ncores |
number of cores to use in parallel data generation.
If |
makePlot |
If makePlot = FALSE, plotting of the generated graphs is disabled. The default value is TRUE. |
verbose |
If verbose = FALSE, tracing information printing is disabled. The default value is TRUE. |
The following list is returned:
Graphs - a list of adjacency matrices of the K generated graphs.
CommonGraph - a matrix, the common part (intersection) of the K generated graphs.
Data - a list of lists: for each sample size in the input vector n, one obtains K data matrices, each of size n_l by p.
Sigma - a list of K covariance matrices of size p by p.
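The "common part (intersection)" of several graphs keeps an edge only if it is present in every graph. A minimal base R sketch with two hypothetical 3-node adjacency matrices (not the package's internal code):

```r
# Two hypothetical 3-node graphs given as symmetric 0/1 adjacency matrices.
G1 <- matrix(0, 3, 3)
G1[1, 2] <- G1[2, 1] <- 1
G1[2, 3] <- G1[3, 2] <- 1

G2 <- matrix(0, 3, 3)
G2[1, 2] <- G2[2, 1] <- 1
G2[1, 3] <- G2[3, 1] <- 1

G_list <- list(G1, G2)

# The element-wise product of 0/1 matrices is 1 only where every matrix is 1,
# so folding `*` over the list gives the intersection of the edge sets.
common <- Reduce(`*`, G_list)
```

Only the edge 1-2 survives, because it is the only edge shared by both graphs.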
data <- generateData_rewire(K = 3, p = 50, n = 20, ncores = 1, verbose = FALSE)
This function estimates Gaussian graphical models (i.e. networks of conditional dependencies, direct connections between variables) given multiple datasets. We assume that datasets contain measurements of the same variables collected under different conditions (different equipment, locations, even sub-types of disease).
jewel( X, lambda1, lambda2 = NULL, Theta = NULL, W = NULL, tol = 0.01, maxIter = 10000, stability = FALSE, stability_nsubsets = 25, stability_frac = 0.8, verbose = TRUE )
X |
a list of |
lambda1 |
a number, first regularization parameter (of the common penalty). |
lambda2 |
an optional number, second regularization parameter
(of the class-specific penalty). If NULL, set to |
Theta |
an optional list of |
W |
an optional list of |
tol |
an optional number, convergence threshold controlling the relative error between iterations. The default value is 0.01. |
maxIter |
an optional number, maximum allowed number of iterations. The default value is 10000. |
stability |
if stability = TRUE, a stability selection procedure is applied to reduce
the number of false positives. |
stability_nsubsets |
an optional number, how many times to subsample datasets and apply jewel for stability selection procedure. The default value is 25. |
stability_frac |
an optional number, in what proportion of the stability results on subsampled data an edge has to be present to be included into the final estimate. The default value is 0.8. |
verbose |
if verbose = FALSE, tracing information printing is disabled. The default value is TRUE. |
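The roles of stability_nsubsets and stability_frac can be illustrated with a short base R sketch of the thresholding step. It is not the package's implementation: the per-subsample graphs in `est_list` are random stand-ins rather than jewel estimates, and using "at least" (`>=`) for the threshold is an assumption.

```r
set.seed(1)
p <- 5
nsubsets <- 25   # plays the role of stability_nsubsets
frac <- 0.8      # plays the role of stability_frac

# Stand-in for the graphs estimated on each subsample: random symmetric
# 0/1 matrices. In practice these would come from repeated estimation.
est_list <- lapply(seq_len(nsubsets), function(i) {
  A <- matrix(0, p, p)
  A[upper.tri(A)] <- rbinom(p * (p - 1) / 2, size = 1, prob = 0.7)
  A + t(A)
})

# Fraction of subsamples in which each edge appears.
freq <- Reduce(`+`, est_list) / nsubsets

# Keep an edge only if it appears in at least `frac` of the subsamples.
G_stable <- (freq >= frac) * 1
```

Raising `frac` makes the final graph sparser, since an edge must then appear in more of the subsample estimates to be kept.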
The following list is returned:
CommonG - an adjacency matrix of the common estimated graph (intersection of the K estimated graphs).
G_list - a list of K adjacency matrices, one for each estimated graph.
Theta - a list of K estimated covariance matrices (when stability selection is disabled).
BIC - a number, the value of the Bayesian information criterion for the resulting graphs (when stability selection is disabled).
{
  K <- 3
  p <- 50
  n <- 20
  data <- generateData_rewire(K = K, p = p, n = n, ncores = 1, verbose = FALSE)
  G_list_true <- data$Graphs
  X <- data$Data
  true_degrees <- rowSums(G_list_true[[1]])
  cut <- sort(true_degrees, decreasing = TRUE)[ceiling(p * 0.03)]
  apriori_hubs <- ifelse(true_degrees >= cut, 10, 1)
  W <- constructWeights(apriori_hubs, K = K)
  res <- jewel(X, lambda1 = 0.25, W = W, verbose = FALSE)
}