1 Introduction

This tutorial summarizes the Joint Graphical Lasso (JGL) algorithm (Danaher et al., 2011), then walks through the steps needed to run the JGL package on a real-world metabolomics dataset.

1.1 Important Notes

Code throughout the document is “hidden”; click “CODE” in the upper right above any R output to expand it and view the code and (extensive) comments!

All code is available on my GitHub. Please see this link for Ariane Stark’s companion presentation which contains further elaboration on methods and results presented in the Danaher paper.

To run this tutorial on your computer, clone the entire repository, double-click jgl-tutorial.Rproj, open the .Rmd file, and run the code as is after making sure you have installed the appropriate packages in R (below).

You can alternatively download the .csv and .Rmd files here and run the script after installing appropriate packages (below, first code chunk).

# if you are missing any of the packages please install them before proceeding:

#install.packages("JGL")
#install.packages("igraph", "RColorBrewer", "pheatmap")
#install.packages("tidyverse", "here")

library(JGL)
library(igraph)
library(tidyverse)
library(RColorBrewer)
library(pheatmap)
#source(here::here("helper_functions.r")) # contains functions I made using code from Kate and Yukun and one function from the Koyejo lab GitHub for tuning. To make it easier for users I pasted the functions below in a code chunk.
theme_set(theme_bw()) #setting ggplot theme 
igraph_options(annotate.plot = TRUE) # setting default to annotate plot with number of vertices and edges

# read in metabolomics data from Raji Balasubramanian, which is on the github for this tutorial
#http://github.com/mljaniczek/jgl-tutorial
dat <- read.csv("hapo_metabolomics_2020.csv")
# this is a function I made which incorporates code from Kate Shutta and Yukun Li.
# input your prepared igraph graph (e.g. from graph_from_adjacency_matrix() applied to a JGL(...)$theta matrix) as well as any graphing parameters
plot_jgl <- function(
  thisGraph,
  multiplier = 15,
  vertex.size = 4,
  vertex.label.dist = 2,
  vertex.label.cex = 0.5,
  rescale = FALSE,
  vertex.color = NULL,
  vertex.label.color = NULL,
  main = NULL,
  ...) {
  # circular layout, shrunk slightly so the labels fit inside the plot region
  myLayout <- layout_in_circle(makePrettyGraphFromGraph(thisGraph)) * 0.8
  # place each vertex label on the radius pointing away from the circle center
  lab.locs <- radian.rescale(x = 1:length(V(thisGraph)), direction = -1, start = 0)
  plot(makePrettyGraphFromGraph(thisGraph, multiplier = multiplier),
       vertex.size = vertex.size,
       vertex.label.dist = vertex.label.dist,
       vertex.label.cex = vertex.label.cex,
       layout = myLayout,
       vertex.label.degree = lab.locs,
       vertex.color = vertex.color,
       vertex.label.color = vertex.label.color,
       rescale = rescale,
       main = main,
       ...)
}


# this is a function for making red-blue edges to match the edge weights (partial correlations)
# you may need to adjust the multiplier depending on how strong your edges are
# function from Kate and Yukun
makePrettyGraphFromGraph <- function(thisGraph, multiplier = 15, redblue = TRUE)
{
  if(redblue == TRUE) my_palette <- brewer.pal(n = 8, name = "RdBu")
  if(redblue == FALSE) my_palette <- brewer.pal(n = 8, name = "PRGn")
  my_palette <- my_palette[8:1] # reverse so [1] is the blue end and [8] the red end
  # edge width scales with the magnitude of the partial correlation
  E(thisGraph)$width = abs(E(thisGraph)$weight)*multiplier
  # blue for positive weights, red for negative (matching the legends used later)
  E(thisGraph)$color = ifelse(sign(E(thisGraph)$weight)>0,my_palette[1],my_palette[8])
  return(thisGraph)
}


# this is code from this stack exchange: 
# https://stackoverflow.com/questions/23209802/placing-vertex-label-outside-a-circular-layout-in-igraph

radian.rescale <- function(x, start=0, direction=1) {
  c.rotate <- function(x) (x + start) %% (2 * pi) * direction
  c.rotate(scales::rescale(x, c(0, 2 * pi), range(x)))
}

# this function is from the Koyejo Lab GitHub https://github.com/koyejo-lab/JointGraphicalLasso
# input is the list of matrices you would pass to the `JGL` function, plus the result of `JGL`
# output is an information criterion for the estimated graphs (a BIC-style score, given the log(n) penalty, though it is referred to as AIC later in this tutorial)
# use this function when tuning lambdas
vBIC <- function(X, est_graph, thr = 0.001){
  num <- length(X)
  BIC_acc <- 0
  for(i in 1:num){
    data_num <- dim(X[[i]])[1]  # n_k for this class
    sample_cov <- cov(X[[i]])   # empirical covariance matrix
    # trace term of the Gaussian log-likelihood
    tr_sum <- sum(diag(sample_cov %*% est_graph$theta[[i]]))
    log_det <- determinant(est_graph$theta[[i]], logarithm = TRUE)$modulus[1]
    # count of entries with magnitude above the threshold (includes the
    # diagonal and counts each symmetric pair twice; a rough edge-count proxy)
    E <- sum(abs(est_graph$theta[[i]]) >= thr)
    BIC_acc <- BIC_acc + (tr_sum - log_det) + (log(data_num)*E/data_num)
  }
  return(BIC_acc)
}

1.2 Acknowledgements

Thank you to Dr. Raji Balasubramanian for the metabolomics data, which was previously presented at ENAR 2021 with Dr. Denise Scholtens.

Thank you to Dr. Kate Hoff Shutta and Yukun Li who provided code for the beautiful circular network graphs.

Thank you to Koyejo lab for providing code online demonstrating tuning procedures.

2 Joint Graphical Lasso Overview

2.1 Background

  • We previously discussed details of the graphical lasso for estimating the precision matrix of Gaussian Graphical Models (GGMs), as in Yuan and Lin (2007) and Friedman et al. (2007).

  • We should be familiar with the convex optimization problem for the graphical lasso at this point, where \(\lambda\) is a tuning parameter and \(||\Theta||_1\) is the sum of the absolute values of the elements of \(\Theta\). The solution gives an estimate of \(\Sigma^{-1}\), the precision matrix:

\[\underset{\Theta}{\text{maximize}}\left\{\log\det\Theta - \text{tr}(S\Theta) - \lambda||\Theta||_1\right\}\]

  • Reminder that the graphical lasso can be used even when \(p \gg n\), and that when \(\lambda\) is large it forces the estimated precision matrix to be sparse (so few edges!).

  • Joint graphical lasso builds upon this by estimating multiple, related GGMs from data with observations belonging to distinct classes (for example, cancer vs normal tissue).

  • The idea is to leverage information across the classes while still letting there be class-specific edges. The level of resulting sparsity, and just how similar the resulting graphs are, can be modified by the use of two different penalty functions, and two tuning parameters.

  • There are two types of penalty functions we can use within JGL which will adjust how much information to leverage across the classes.

  • Additionally, the algorithm is fit with a fast alternating direction method of multipliers (ADMM).

  • The Danaher paper has almost 800 citations as of March 2022.

2.2 Notation

  • \(K\): number of classes (\(\geq 2\)), indexed by \(k\) = 1, …, \(K\)
  • \(\Sigma^{-1}_k\): True precision matrix for the kth class
  • \(Y^{(k)}\): \(n_k\) x \(p\) matrix consisting of \(n_k\) observations from the \(k\)th class on a set of \(p\) features which are common to all \(K\) datasets
  • \(S^{(k)}\): Empirical covariance matrix for \(Y^{(k)}\)
  • \(\Theta^{(k)}\): argument to convex optimization problem used for estimating \(\Sigma^{-1}_k\)
  • Index matrix arguments by using \(i\) = 1, …, \(p\) and \(j\) = 1, …, \(p\)
  • \(\lambda_1\) and \(\lambda_2\): non-negative tuning parameters used in penalty function

2.3 Major assumptions

  • We assume the observations within each class are iid.
  • Also assume \(\mu_k\), the mean for each class, is 0, i.e.:

\[Y^{(k)}_1, ..., Y^{(k)}_{n_k} \sim N(0, \Sigma_k)\]
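
To make the setup concrete, here is a minimal simulation sketch (my own illustration; names like sim_list are made up, not from the JGL package) of data satisfying these assumptions, in the list-of-matrices format the JGL package expects:

# illustrative simulation of K = 2 classes satisfying the JGL assumptions
# (iid rows within class, mean zero); using MASS::mvrnorm() without
# attaching MASS, since library(MASS) would mask dplyr::select()
set.seed(1)
p <- 10
Sigma <- diag(p)                  # start from the identity covariance
Sigma[1, 2] <- Sigma[2, 1] <- 0.5 # add one true edge between features 1 and 2
sim_list <- list(
  class1 = MASS::mvrnorm(n = 100, mu = rep(0, p), Sigma = Sigma),
  class2 = MASS::mvrnorm(n = 150, mu = rep(0, p), Sigma = Sigma)
)
# sim_list could be passed directly as JGL(Y = sim_list, ...)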

2.4 Optimization problem

  • Our goal is to estimate \(\Sigma^{-1}_1\), …, \(\Sigma^{-1}_K\) using a penalized log-likelihood approach.

      * Again, we want each class to have its own precision matrix, while still using information across the classes to estimate them. 
  • Seek {\(\hat{\Theta}\)} by solving:

\[\underset{\{\Theta\}}{\text{maximize}}\left(\sum^{K}_{k=1}n_k\left[\log\{\det(\Theta^{(k)})\}-\text{tr}(S^{(k)}\Theta^{(k)})\right] - P(\{\Theta\})\right)\]

  • A major innovation of this paper is the generalization of the optimization problem to multiple classes, together with the penalty function \(P(\{\Theta\})\), for which the authors provide two different versions.

2.5 Penalty functions

  • The general form for the penalty function is:

\[P(\{\Theta\}) = \lambda_1\sum^K_{k=1}\sum_{i \neq j}|\theta^{(k)}_{ij}| + \widetilde{P}(\{\Theta\})\]

  • Notice that \(P(\{\Theta\})\) is not class specific: it takes information from all the classes!

  • The form of this penalty function will encourage the solutions to share certain characteristics such as locations of sparsity or value.

  • Depending on the form we choose and the values of the tuning parameters, we could essentially force the joint graphical lasso to perform unrelated graphical lasso fits on each of the \(K\) classes (i.e. if \(\widetilde{P}(\{\Theta\})\) is zero; see the sketch after this list).

  • Let’s look at the possible forms for \(\widetilde{P}(\{\Theta\})\)!
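
As a rough check of that reduction, the sketch below fits JGL with \(\lambda_2 = 0\) and compares the sparsity pattern to separate graphical lasso fits from the glasso package (reusing the simulated sim_list from above). Because the two packages scale their penalties differently (JGL weights each class log-likelihood by \(n_k\)), treat this as a qualitative comparison rather than an exact equivalence:

#library(glasso) # assumption of this sketch: the glasso package is installed

# joint fit with lambda2 = 0: no sharing between classes
fit_joint <- JGL(Y = sim_list, penalty = "fused",
                 lambda1 = 0.1, lambda2 = 0,
                 return.whole.theta = TRUE)

# unrelated class-by-class graphical lasso fits
fit_sep <- lapply(sim_list, function(Y) glasso::glasso(cov(Y), rho = 0.1))

# compare the class-1 sparsity patterns; expect broad (not exact) agreement
supp_joint <- abs(fit_joint$theta[[1]]) > 1e-5
supp_sep   <- abs(fit_sep[[1]]$wi) > 1e-5
mean(supp_joint == supp_sep) # fraction of matching entries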

2.5.1 Fused Graphical Lasso

  • Fused Graphical Lasso (FGL) uses the following penalty function:

\[P(\{\Theta\}) = \lambda_1\sum^K_{k=1}\sum_{i \neq j}|\theta^{(k)}_{ij}| + \lambda_2\sum_{k<k'}\sum_{i,j}|\theta^{(k)}_{ij}-\theta^{(k')}_{ij}|\]

  • When \(\lambda_1\) is large, FGL makes sparse estimates of \(\hat{\Theta}^{(1)}, ... , \hat{\Theta}^{(K)}\)
  • When \(\lambda_2\) is large, many elements of \(\hat{\Theta}^{(1)}, ... , \hat{\Theta}^{(K)}\) will be the same across classes
  • So, FGL “borrows information aggressively across classes, encouraging similar network structure and similar edge values”

2.5.2 Group Graphical Lasso

  • Group Graphical Lasso (GGL) uses the following penalty function:

\[P(\{\Theta\}) = \lambda_1\sum^K_{k=1}\sum_{i \neq j}|\theta^{(k)}_{ij}| + \lambda_2\sum_{i \neq j}\left(\sum^K_{k=1}{\theta^{(k)}_{ij}}^2\right)^{1/2}\]

  • Lasso penalty applied to elements of the precision matrices
  • Group lasso penalty is applied to the (i, j) element across all K precision matrices
  • When \(\lambda_1\) is large, GGL makes sparse estimates of \(\hat{\Theta}^{(1)}, ... , \hat{\Theta}^{(K)}\)
  • So, GGL encourages only a shared pattern of sparsity, not shared edge values (unlike FGL, which encourages sharing of both)
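
To see the two penalty terms side by side numerically, here is a small illustrative helper (my own code, not part of the JGL package) that evaluates \(P(\{\Theta\})\) under each form for a list of precision matrices:

# evaluate the FGL and GGL penalties for a list of K precision matrices
# (illustrative helper, not part of the JGL package)
jgl_penalty <- function(theta_list, lambda1, lambda2, type = c("fused", "group")) {
  type <- match.arg(type)
  K <- length(theta_list)
  offdiag <- function(m) { diag(m) <- 0; m }
  # lasso term: lambda1 * sum of |theta_ij| over off-diagonals, all classes
  lasso_term <- lambda1 * sum(sapply(theta_list, function(m) sum(abs(offdiag(m)))))
  if (type == "fused") {
    # fused term: sum over class pairs k < k' of elementwise absolute differences
    extra <- 0
    for (k in 1:(K - 1)) for (kp in (k + 1):K)
      extra <- extra + sum(abs(theta_list[[k]] - theta_list[[kp]]))
  } else {
    # group term: L2 norm of each off-diagonal (i, j) element across the K classes
    sq <- Reduce(`+`, lapply(theta_list, function(m) offdiag(m)^2))
    extra <- sum(sqrt(sq))
  }
  lasso_term + lambda2 * extra
}

# toy example with two 2x2 precision matrices
th <- list(matrix(c(1, .3, .3, 1), 2), matrix(c(1, 0, 0, 1), 2))
jgl_penalty(th, lambda1 = .1, lambda2 = .1, type = "fused")
jgl_penalty(th, lambda1 = .1, lambda2 = .1, type = "group")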

3 Application of JGL to metabolomics dataset

3.1 Data description

The data come from the Hyperglycemia and Adverse Pregnancy Outcome (HAPO) Study. Metabolites were measured in blood samples from pregnant women in four ancestry groups: Afro-Caribbean, Mexican American, Northern European, and Thai.

The data consist of an ID variable, ancestry group, and metabolites (p = 51 features). The ancestry groups (K = 4) are balanced (n = 400 each, 1600 total observations). There is also a variable “fpg” (fasting plasma glucose), which I omit from this analysis.

Within the data there are three groups of metabolites: Amino Acids, Acyl carnitines, and Other.

Here’s what the raw data look like:

# examine ancestry group variable
#summary(as.factor(dat$anc_gp))

head(dat)[,1:10]
##       id anc_gp  fpg    mt1_1     mt1_2    mt1_3    mt1_4     mt1_5    mt1_6
## 1 hm0001    ag3 75.6 218.2223  76.99525 19.06366 14.23091  86.75162 135.2109
## 2 hm0002    ag3 84.6 292.6314 136.41320 43.14854 17.77549 120.17344 213.6531
## 3 hm0003    ag4 79.2 361.1135  79.98370 22.15848 13.05497  74.75441 136.1587
## 4 hm0004    ag3 73.8 274.0428  79.76154 19.72682 12.68744  69.34854 146.7262
## 5 hm0005    ag1 91.8 270.9049  88.98260 39.20949 15.73498  95.56857 145.2876
## 6 hm0006    ag4 73.8 271.4034  99.63992 27.81137 20.91344  95.55926 223.7438
##      mt1_7
## 1 64.00578
## 2 91.30156
## 3 83.67878
## 4 72.67280
## 5 48.49431
## 6 68.94172

We can take a look at a few of the metabolites to get an idea of their distributions and missingness.

summary(dat$mt1_1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   115.6   281.8   322.4   328.5   370.6   660.4
summary(dat$mt2_1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  -4.631  -3.341  -3.040  -3.044  -2.750  -1.154       1
summary(dat$mt3_1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   17.28   21.39   22.64   22.96   24.64   27.25      11

3.2 Data Processing

3.2.1 Center and Scale Data within Class

JGL assumes the data are centered and scaled, so we standardize each metabolite feature.

Here I reshape the data into “long” format and standardize each metabolite separately. Check out how the data look now:

# prepare data into list of K datasets as needed as input to JGL function

# make data into long format then center and scale by metabolite

## note: we center and scale across all samples, not within each ancestry group, because scaling within group would remove the between-group signal we want to detect

dat_long <- dat %>%
  select(-fpg) %>%
  pivot_longer(cols = starts_with("mt"),
               names_to = "metabolite",
               values_to = "value",
               values_drop_na = TRUE) %>%
  group_by(metabolite) %>%
  # scale() returns a one-column matrix, which is why the printed tibble
  # below shows scaled_value[,1]
  mutate(scaled_value = scale(value, center = TRUE, scale = TRUE)) %>%
  ungroup()

head(dat_long)
## # A tibble: 6 × 5
##   id     anc_gp metabolite value scaled_value[,1]
##   <chr>  <chr>  <chr>      <dbl>            <dbl>
## 1 hm0001 ag3    mt1_1      218.            -1.66 
## 2 hm0001 ag3    mt1_2       77.0           -0.942
## 3 hm0001 ag3    mt1_3       19.1           -1.30 
## 4 hm0001 ag3    mt1_4       14.2           -0.744
## 5 hm0001 ag3    mt1_5       86.8           -0.591
## 6 hm0001 ag3    mt1_6      135.            -1.17
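
As a quick sanity check (my own addition, not in the original workflow), each metabolite should now have mean approximately 0 and standard deviation 1:

# verify standardization: every metabolite should have mean ~0 and sd ~1
dat_long %>%
  group_by(metabolite) %>%
  summarize(mean = mean(as.numeric(scaled_value)),
            sd = sd(as.numeric(scaled_value))) %>%
  head()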

Now we can check the distributions of a few of the variables. The data look roughly normal. Metabolites in class 3 (“Other”) appear to have means that vary across ancestry groups.

# if you want to look at one metabolite
# dat_long %>% 
#   filter(metabolite %in% c("mt1_1")) %>%
#   ggplot(aes(x=value, color=anc_gp)) +
#     geom_density()+
#   facet_grid(metabolite ~.)

# demonstrative density plot showing centering/scaling of data
dat_long %>% 
  filter(metabolite %in% c("mt1_1", "mt2_1", "mt3_1")) %>%
  ggplot(aes(x=scaled_value, color=anc_gp)) +
    geom_density()+
  facet_grid(metabolite ~.)

3.2.2 Make K n by p matrices and deal with missingness

Now the raw data are centered and scaled! Next we need to get them into K n-by-p matrices, as required by the JGL package.

So in our case, we need 4 matrices (4 ancestry groups), each with 400 observations of 51 metabolites.

Note: Normally we would impute any missing data. For simplicity, this code fills missing values with 0 on the standardized scale (effectively mean imputation).
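
If you want something a bit more principled than zero-filling, one common quick approach in metabolomics is half-minimum imputation: replace each missing value with half the observed minimum of that metabolite, before scaling. A hedged sketch of how this could slot into the pipeline (my own illustration, not run in this tutorial):

# illustrative alternative: half-minimum imputation on the raw values,
# applied before centering/scaling (proper multiple imputation would be
# better still)
# caveat: half-minimum only makes sense for positive-valued metabolites;
# some here (e.g. mt2_*) look log-scaled, so a different rule would be needed
dat_long_hm <- dat %>%
  select(-fpg) %>%
  pivot_longer(cols = starts_with("mt"),
               names_to = "metabolite", values_to = "value") %>%
  group_by(metabolite) %>%
  mutate(value = if_else(is.na(value), min(value, na.rm = TRUE) / 2, value),
         scaled_value = as.numeric(scale(value))) %>%
  ungroup()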

# make list of ancestry groups
ancestry_groups <- sort(unique(dat$anc_gp))

# filter by ancestry group, then make wide matrix of scaled values
# create a list of the results
# Here I'm using `purrr::map()` to iterate over the 4 ancestry groups and build the list. You could use base R lapply() instead if you prefer.
dat_mat_list <- purrr::map(ancestry_groups,
                    ~dat_long %>%
                      filter(anc_gp == .x) %>%
                      select(-c(anc_gp, value)) %>%
                      pivot_wider(names_from = metabolite, 
                                  values_from = scaled_value,
                                  #for simplicity putting 0 as missing values. Normally should do imputation.
                                  values_fill = 0) %>%
                      select(-id) %>%
                      as.matrix())

names(dat_mat_list) <- ancestry_groups

Let’s check to see if our list contains the right structure:

str(dat_mat_list)
## List of 4
##  $ ag1: num [1:400, 1:51] -0.869 -0.474 -0.893 1.082 0.578 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:51] "mt1_1" "mt1_2" "mt1_3" "mt1_4" ...
##  $ ag2: num [1:400, 1:51] 0.169 0.698 -1.174 -1.012 -0.591 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:51] "mt1_1" "mt1_2" "mt1_3" "mt1_4" ...
##  $ ag3: num [1:400, 1:51] -1.664 -0.541 -0.822 0.152 -0.591 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:51] "mt1_1" "mt1_2" "mt1_3" "mt1_4" ...
##  $ ag4: num [1:400, 1:51] 0.4919 -0.8616 0.5032 0.0342 -0.157 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:51] "mt1_1" "mt1_2" "mt1_3" "mt1_4" ...

It does! So now we have a list of 4 matrices, each with 400 standardized observations of 51 metabolites.

3.2.3 Heatmap to visualize the processed data

Before we proceed, let’s quickly visualize the processed data in a heatmap to see whether there are any obvious patterns.

# making one full wide data frame with all 1600 observations, using the centered-scaled values for metabolites that we processed above. 
met_wide_all <- dat_long %>%
  select(-value) %>%
  pivot_wider(names_from = metabolite, 
              values_from = scaled_value,
              #for simplicity putting 0 as missing values. Normally should do imputation.
              values_fill = 0) %>%
  arrange(anc_gp) #arranging by ancestry group for the heatmap

#make matrix out of dataframe, removing the id and group labels since pheatmap just takes numeric matrix as input
met_wide_all_mat <- t(as.matrix(met_wide_all%>%
  select(-c(id, anc_gp))))
met_wide_all_mat[is.na(met_wide_all_mat)] <- 0
colnames(met_wide_all_mat) <- met_wide_all$id

#pheatmap can annotate your heatmap nicely with colors per class! so I'm preparing those here. 

#first I want the columns annotated by ancestry group
my_sample_col <- data.frame(ancestry = as.factor(met_wide_all$anc_gp))
rownames(my_sample_col) <- met_wide_all$id
# then I want the rows annotated by metabolite group, we have three groups of metabolites mt1, mt2, mt3
met_group <- str_split_fixed(rownames(met_wide_all_mat), "_", 2)[,1] %>%
  str_replace_all(c("mt1" = "Amino Acids",
                  "mt2" = "Acyl carnitines",
                  "mt3" = "Other"))
my_sample_row <- data.frame(met_group = met_group)
row.names(my_sample_row) <- row.names(met_wide_all_mat)

# finally we input the numeric matrix and annotations
pheatmap(met_wide_all_mat, 
         cluster_cols = FALSE, 
         show_colnames = FALSE, 
         show_rownames = FALSE, 
         annotation_col = my_sample_col,
         annotation_row = my_sample_row)

Here the x axis has the 1600 samples, sorted by ancestry group. The y axis has the 51 metabolites, clustered using hierarchical clustering and color-labeled by metabolite group (there are 3 “groups” in the data, with prefixes mt1, mt2, and mt3, which indicate the Amino Acids, Acyl carnitines, and Other groups respectively).

From this heatmap we can see big differences in the values of the “Other” metabolite group (pink) across the 4 ancestry groups. Metabolites from the Amino Acids and Acyl carnitines groups appear more homogeneous across the classes.

Let’s get to the joint graphical lasso now! We will essentially fit a graphical lasso to the data from each ancestry group, while borrowing strength from the other classes — which is the key strength of JGL.

3.3 Apply Joint Graphical Lasso

Danaher et al. wrote the R package JGL as a companion to their paper. As described in Section 2.5, the paper proposes two penalty functions: the Fused Graphical Lasso (FGL) and Group Graphical Lasso (GGL) penalties.

We will first go over basic results for both penalties using preset values of the \(\lambda_1\) and \(\lambda_2\) parameters, visualize the results, and then discuss strategies for tuning the parameters.

3.3.1 Using Fused Graphical Lasso penalty

We use the JGL() function from the JGL package, with penalty = "fused" specified.

Note: For now we are just putting in a pre-selected value for \(\lambda_1\) and \(\lambda_2\), but in a later section we will go over methods to tune the parameters.

fgl_results = JGL(Y = dat_mat_list,
                  penalty = "fused",
                  lambda1 = .15,
                  lambda2 = 0.2,
                  return.whole.theta = FALSE)

str(fgl_results) # the theta contains a list of estimated matrices, one for each of the K classes. We will extract the thetas for visualization with igraph. 
## List of 3
##  $ theta            :List of 4
##   ..$ : num [1:39, 1:39] 1.46 0 0 0 0 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:39] "mt1_1" "mt1_2" "mt1_3" "mt1_4" ...
##   .. .. ..$ : chr [1:39] "mt1_1" "mt1_2" "mt1_3" "mt1_4" ...
##   ..$ : num [1:39, 1:39] 1.46 0 0 0 0 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:39] "mt1_1" "mt1_2" "mt1_3" "mt1_4" ...
##   .. .. ..$ : chr [1:39] "mt1_1" "mt1_2" "mt1_3" "mt1_4" ...
##   ..$ : num [1:39, 1:39] 1.46 0 0 0 0 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:39] "mt1_1" "mt1_2" "mt1_3" "mt1_4" ...
##   .. .. ..$ : chr [1:39] "mt1_1" "mt1_2" "mt1_3" "mt1_4" ...
##   ..$ : num [1:39, 1:39] 1.46 0 0 0 0 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:39] "mt1_1" "mt1_2" "mt1_3" "mt1_4" ...
##   .. .. ..$ : chr [1:39] "mt1_1" "mt1_2" "mt1_3" "mt1_4" ...
##  $ theta.unconnected:List of 4
##   ..$ : Named num [1:12] 13.98 16.69 7.94 4.81 8.27 ...
##   .. ..- attr(*, "names")= chr [1:12] "mt3_2" "mt3_3" "mt3_4" "mt3_6" ...
##   ..$ : Named num [1:12] 13.98 16.69 7.94 4.81 8.27 ...
##   .. ..- attr(*, "names")= chr [1:12] "mt3_2" "mt3_3" "mt3_4" "mt3_6" ...
##   ..$ : Named num [1:12] 13.98 16.69 7.94 4.81 8.27 ...
##   .. ..- attr(*, "names")= chr [1:12] "mt3_2" "mt3_3" "mt3_4" "mt3_6" ...
##   ..$ : Named num [1:12] 13.98 16.69 7.94 4.81 8.27 ...
##   .. ..- attr(*, "names")= chr [1:12] "mt3_2" "mt3_3" "mt3_4" "mt3_6" ...
##  $ connected        : logi [1:51] TRUE TRUE TRUE TRUE TRUE TRUE ...
##  - attr(*, "class")= chr "jgl"
print.jgl(fgl_results)
## 
## Number of connected nodes:  39 
## Number of subnetworks in each class:  1 1 1 1 
## Number of edges in each class:  117 117 117 117 
## Number of edges shared by all classes:  117 
## L1 norm of off-diagonal elements of classes' Thetas:  41.60834 41.60834 41.60834 41.60834
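
As a side check (my own addition), the per-class edge counts that print.jgl() reports can be approximated directly from the theta list by counting non-zero off-diagonal entries. The threshold below is my choice and may differ slightly from the package’s internal one:

# approximate print.jgl's per-class edge counts from theta directly
sapply(fgl_results$theta, function(th) sum(abs(th[upper.tri(th)]) > 1e-8))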

The basic code itself is simple, although the package does not come with easy-to-use visualization functions. The following code chunk contains a Frankenstein mix of code I borrowed from online sources and from colleagues (thank you Kate and Yukun again!!).

Here is the basic strategy for plotting the output of JGL():

  • Extract the \(K\) estimated precision matrices from the JGL() result
  • Use the graph_from_adjacency_matrix() function from the igraph package on each of the \(K\) estimated precision matrices
  • Either use igraph::plot.igraph() and tweak inputs to make a nice graph, or use the cobbled-together code that was passed down from Kate to Yukun to me. Each of us has tweaked it slightly to our liking! I have made the function plot_jgl() in the companion script helper_functions.r, which is available on my GitHub along with this presentation.
# extract all estimated precision matrices from the result
inv_covar_matrices <- fgl_results$theta
names(inv_covar_matrices) <- ancestry_groups

#now use the `graph_from_adjacency_matrix()` function from igraph to create igraph graphs from the adjacency matrices

#again, since we have a list of precision matrices, I use the map() function to apply igraph::graph_from_adjacency_matrix() over the list

graph_list <- map(ancestry_groups,
                  ~graph_from_adjacency_matrix(
                    # -cov2cor() converts the precision matrix to partial
                    # correlations: rho_ij = -theta_ij/sqrt(theta_ii*theta_jj)
                    -cov2cor(inv_covar_matrices[[.x]]),
                    weighted = T,
                    mode = "undirected",
                    diag = FALSE
                  ))
names(graph_list) <- ancestry_groups

# can use the basic igraph code to plot
# plot.igraph(graph_list[["ag1"]],
#             layout = layout.fruchterman.reingold)

# or can use function I made from Kate/Yukun's code to make a pretty circular graph
#plot_jgl(graph_list[[1]], multiplier = 3)

# specify colors for vertices (groups of metabolites in our case)

color_opts <- RColorBrewer::brewer.pal(3, "Dark2")
color_list <- data.frame(class = met_group) %>%
  mutate(color = case_when(
    class == "Amino Acids" ~ color_opts[1],
    class == "Acyl carnitines" ~ color_opts[2],
    class == "Other" ~ color_opts[3],
    TRUE ~ NA_character_
    )
  )

# color and labels for edges
edge_palette <- brewer.pal(n = 8, name = "RdBu")[c(8,1)]
edge_labels <- c("Positive", "Negative")

Examining the graphs below, we can see that the estimated networks, especially for the strong partial correlations, are largely similar across the four groups, at least with this choice of lambdas.

Click tabs below to see the graphs from the different ancestry groups

3.3.1.1 FGL Class 1

plot_jgl(graph_list[[1]], multiplier = 3, 
         vertex.color = color_list$color,
         vertex.label.color = color_list$color,
         main = "Class 1, FGL")
legend("bottomleft", inset=.02, title="Metabolite class",
   unique(color_list$class), fill=color_opts, cex=0.8)
legend("topleft", inset=.02, title="Edge",
   edge_labels, fill=edge_palette, cex=0.8)

3.3.1.2 FGL Class 2

plot_jgl(graph_list[[2]],  multiplier = 3, 
         vertex.color = color_list$color,
         vertex.label.color = color_list$color,
         main = "Class 2, FGL")
legend("bottomleft", inset=.02, title="Metabolite class",
   unique(color_list$class), fill=color_opts, cex=0.8)
legend("topleft", inset=.02, title="Edge",
   edge_labels, fill=edge_palette, cex=0.8)

3.3.1.3 FGL Class 3

plot_jgl(graph_list[[3]], multiplier = 3, 
         vertex.color = color_list$color,
         vertex.label.color = color_list$color,
         main = "Class 3, FGL")
legend("bottomleft", inset=.02, title="Metabolite class",
   unique(color_list$class), fill=color_opts, cex=0.8)
legend("topleft", inset=.02, title="Edge",
   edge_labels, fill=edge_palette, cex=0.8)

3.3.1.4 FGL Class 4

plot_jgl(graph_list[[4]],  multiplier = 3, 
         vertex.color = color_list$color,
         vertex.label.color = color_list$color,
         main = "Class 4, FGL")
legend("bottomleft", inset=.02, title="Metabolite class",
   unique(color_list$class), fill=color_opts, cex=0.8)
legend("topleft", inset=.02, title="Edge",
   edge_labels, fill=edge_palette, cex=0.8)

3.3.2 Using Group Graphical Lasso penalty

Now let’s run the “group” penalty in the JGL() function instead. Same procedure as above, except with penalty = "group" as input.

## run ggl:
ggl_results = JGL(Y=dat_mat_list,
                  penalty="group",
                  lambda1=.15,
                  lambda2=.2,
                  return.whole.theta=TRUE)
#str(ggl_results)
print.jgl(ggl_results)
## 
## Number of connected nodes:  31 
## Number of subnetworks in each class:  1 1 1 2 
## Number of edges in each class:  94 95 92 92 
## Number of edges shared by all classes:  84 
## L1 norm of off-diagonal elements of classes' Thetas:  26.26214 21.82969 24.76903 23.12329

Extract the estimated precision matrices and prepare them for visualization.

# extract all estimated precision matrices from result
ggl_inv_covar_matrices <- ggl_results$theta
names(ggl_inv_covar_matrices) <- ancestry_groups

#now use function from igraph to create igraph graphs from adjacency matrices
ggl_graph_list <- map(ancestry_groups,
                  ~graph_from_adjacency_matrix(
                    -cov2cor(ggl_inv_covar_matrices[[.x]]),
                    weighted = T,
                    mode = "undirected",
                    diag = FALSE
                  ))
names(ggl_graph_list) <- ancestry_groups

3.3.2.1 GGL Class 1

plot_jgl(ggl_graph_list[[1]],  multiplier = 3, 
         vertex.color = color_list$color,
         vertex.label.color = color_list$color,
         main = "Class 1, GGL")
legend("bottomleft", inset=.02, title="Metabolite class",
   unique(color_list$class), fill=color_opts, cex=0.8)
legend("topleft", inset=.02, title="Edge",
   edge_labels, fill=edge_palette, cex=0.8)

3.3.2.2 GGL Class 2

plot_jgl(ggl_graph_list[[2]], multiplier = 3, 
         vertex.color = color_list$color,
         vertex.label.color = color_list$color,
         main = "Class 2, GGL")
legend("bottomleft", inset=.02, title="Metabolite class",
   unique(color_list$class), fill=color_opts, cex=0.8)
legend("topleft", inset=.02, title="Edge",
   edge_labels, fill=edge_palette, cex=0.8)

3.3.2.3 GGL Class 3

plot_jgl(ggl_graph_list[[3]], multiplier = 3, 
         vertex.color = color_list$color,
         vertex.label.color = color_list$color,
         main = "Class 3, GGL")
legend("bottomleft", inset=.02, title="Metabolite class",
   unique(color_list$class), fill=color_opts, cex=0.8)
legend("topleft", inset=.02, title="Edge",
   edge_labels, fill=edge_palette, cex=0.8)

3.3.2.4 GGL Class 4

plot_jgl(ggl_graph_list[[4]], multiplier = 3, 
         vertex.color = color_list$color,
         vertex.label.color = color_list$color,
         main = "Class 4, GGL")
legend("bottomleft", inset=.02, title="Metabolite class",
   unique(color_list$class), fill=color_opts, cex=0.8)
legend("topleft", inset=.02, title="Edge",
   edge_labels, fill=edge_palette, cex=0.8)

3.3.3 What do those lambdas do anyways???

As we described above, having large \(\lambda_1\) forces sparse solutions for both penalty functions.

When \(\lambda_2\) is high in FGL it enforces similar values across classes (aggressive sharing of information).

When \(\lambda_2\) is high in GGL it enforces similar sparsity locations across classes, but does not enforce similar values.

Let’s look at the graphs below to see what different extreme values of lambda do in the metabolomics dataset.

Here I fit 4 models for each penalty function with the following combinations of lambdas (the exact “large” values differ slightly between the two penalties; see the code in each tab):

  • Small \(\lambda_1\) (0.01), small \(\lambda_2\) (0.01)
  • Small \(\lambda_1\) (0.01), large \(\lambda_2\) (0.75)
  • Large \(\lambda_1\) (0.4 for GGL, 0.5 for FGL), small \(\lambda_2\) (0.01)
  • Large \(\lambda_1\) (0.4 for GGL, 0.5 for FGL), large \(\lambda_2\) (0.5 for GGL, 0.75 for FGL)

Note: when I pushed the lambda values higher than this, the fits produced completely sparse graphs with no edges.

lambda.eff <- c(0.01, 0.75) # the "small" and "large" lambda values used below

# function to make the plots I want below: input data, penalty type, and lambdas.
# made a function to shorten the code later.
lambda_plots <- function(data, penalty = "group", lambda1, lambda2, ...){
  res <- JGL(Y = data,
             penalty = penalty,
             lambda1 = lambda1,
             lambda2 = lambda2,
             return.whole.theta = TRUE)
  # information criterion for this fit, shown in the plot subtitle
  ic <- vBIC(data, res, thr = 0.0001)
  resgraph <- graph_from_adjacency_matrix(
                    -cov2cor(res$theta[[1]]),
                    weighted = T,
                    mode = "undirected",
                    diag = FALSE
                  )
  plot_jgl(resgraph,
           multiplier = 3,
           sub = paste("AIC", round(ic, 2)),
           ...)
}

3.3.3.1 GGL Small \(\lambda_1\) Small \(\lambda_2\)

lambda_plots(data = dat_mat_list,
             penalty = "group",
             lambda1 = lambda.eff[1],
             lambda2 = lambda.eff[1],
             vertex.color = color_list$color,
             vertex.label.color = color_list$color,
             main = "GGL small lambda1 small lambda2"
             )

3.3.3.2 GGL Small \(\lambda_1\) Large \(\lambda_2\)

lambda_plots(data = dat_mat_list,
             penalty = "group",
             lambda1 = lambda.eff[1],
             lambda2 = 0.75,
             vertex.color = color_list$color,
             vertex.label.color = color_list$color,
             main = "GGL small lambda1 large lambda2"
             )

3.3.3.3 GGL Large \(\lambda_1\) Small \(\lambda_2\)

lambda_plots(data = dat_mat_list,
             penalty = "group",
             lambda1 = 0.4,
             lambda2 = lambda.eff[1],
             vertex.color = color_list$color,
             vertex.label.color = color_list$color,
             main = "GGL large lambda1 small lambda2"
             )

3.3.3.4 GGL Large \(\lambda_1\) Large \(\lambda_2\)

lambda_plots(data = dat_mat_list,
             penalty = "group",
             lambda1 = 0.4,
             lambda2 = 0.5,
             vertex.color = color_list$color,
             vertex.label.color = color_list$color,
             main = "GGL large lambda1 large lambda2"
             )

3.3.3.5 FGL Small \(\lambda_1\) Small \(\lambda_2\)

lambda_plots(data = dat_mat_list,
             penalty = "fused",
             lambda1 = lambda.eff[1],
             lambda2 = lambda.eff[1],
             vertex.color = color_list$color,
             vertex.label.color = color_list$color,
             main = "FGL small lambda1 small lambda2"
             )

3.3.3.6 FGL Small \(\lambda_1\) Large \(\lambda_2\)

lambda_plots(data = dat_mat_list,
             penalty = "fused",
             lambda1 = lambda.eff[1],
             lambda2 = 0.75,
             vertex.color = color_list$color,
             vertex.label.color = color_list$color,
             main = "FGL small lambda1 large lambda2"
             )

3.3.3.7 FGL Large \(\lambda_1\) Small \(\lambda_2\)

lambda_plots(data = dat_mat_list,
             penalty = "fused",
             lambda1 = 0.5,
             lambda2 = lambda.eff[1],
             vertex.color = color_list$color,
             vertex.label.color = color_list$color,
             main = "FGL large lambda1 small lambda2"
             )

3.3.3.8 FGL Large \(\lambda_1\) Large \(\lambda_2\)

lambda_plots(data = dat_mat_list,
             penalty = "fused",
             lambda1 = 0.5,
             lambda2 = 0.75,
             vertex.color = color_list$color,
             vertex.label.color = color_list$color,
             main = "FGL large lambda1 large lambda2"
             )

Depending on your application you may want sparse results or dense results, or you may prefer tuning based on something like AIC, which we will go through next.

3.4 Tuning parameters

In what we’ve done so far we’ve had to pre-select the lambdas.

Let’s show how we would input an assortment of lambdas and tune those parameters!

The Koyejo Lab GitHub repository is full of code to assist with different methods to select parameters for JGL.

They use the following process:

  • Do a 2D grid search over a sequence of possible values for \(\lambda_1\) and \(\lambda_2\), run JGL() for each
  • Collect Akaike Information Criterion (AIC) for each combination
  • Identify lambda combination with the lowest AIC
  • Build final model with tuned parameters.

You can use this method with either the fused or the group penalty.

Danaher et al. note that this 2D grid search can be really computationally expensive! If \(p\) is large, you can hold \(\lambda_2\) constant and search for the best \(\lambda_1\), then use that value to search for \(\lambda_2\).

3.4.1 Example: tuning lambdas using a 2D grid search with the metabolomics data (GGL)

Below I’m making a very small 3x3 grid of lambdas for run-time demonstration purposes (it takes my computer less than a minute; run time went up substantially with a larger grid, and my computer crashed).

interval_l = 3
lambda.eff <- seq(0.01, 0.4, len = interval_l)
aic_vec <- matrix(NA, length(lambda.eff),length(lambda.eff))

#search grid of lambdas
for(i in 1:length(lambda.eff)){
    for(j in 1:length(lambda.eff)){
        fit00 <- JGL(Y = dat_mat_list, penalty = "group",
                     lambda1 = lambda.eff[i], lambda2 = lambda.eff[j],
                     return.whole.theta = TRUE)
        aic_vec[i,j] <- vBIC(dat_mat_list, fit00, thr = 0.0001)
   }
}


min(aic_vec)
## [1] -11.83102
which(aic_vec == min(aic_vec), arr.ind = TRUE)
##      row col
## [1,]   1   2

The plot below visualizes the AIC over our grid search.

image(x=lambda.eff,y=lambda.eff,z=aic_vec,
      ylab='lambda_2',xlab='lambda_1')
contour(x=lambda.eff,y=lambda.eff,z=aic_vec, add = TRUE,nlevels = 10)

Finally, identify the lambda combination with the lowest AIC.

# identify which lambda combination had the minimum AIC
min_idx <- which(aic_vec == min(aic_vec), arr.ind = TRUE)
i_idx <- min_idx[1, 1]
j_idx <- min_idx[1, 2]

#assign tuned lambdas based on your search
lam_1 <- lambda.eff[i_idx]
lam_2 <- lambda.eff[j_idx]

But again, even with this relatively small dataset my computer didn’t handle the 2D grid search well (perhaps I did something wrong!). So I coded the approach below, which tunes one parameter at a time.

3.4.2 Example: a computationally faster way to tune the parameters

Here I demonstrate how to tune \(\lambda_1\) by holding \(\lambda_2\) small and constant, then using the selected \(\lambda_1\) to tune \(\lambda_2\), as the authors suggest when a 2D grid search is computationally prohibitive.

interval_l = 10
lambda.eff <- seq(0.01, 0.4, len = interval_l)
aic_vec1 <- matrix(NA, length(lambda.eff), 1) #for lambda1
aic_vec2 <- matrix(NA, length(lambda.eff), 1) #for lambda2


#search length of lambda1, keeping lambda2 constant and small
for(i in 1:length(lambda.eff)){
        fit00 <- JGL(Y=dat_mat_list,penalty="group",lambda1=lambda.eff[i],lambda2=lambda.eff[1], return.whole.theta=TRUE)
        aic_vec1[i,1] <- vBIC(dat_mat_list, fit00, thr=0.0001)
}

# identify which had min aic
i_idx <- which(aic_vec1 == min(aic_vec1), arr.ind = TRUE)[1,1]
#assign tuned lambda1 based on your search
lam_1 <- lambda.eff[i_idx]

#now search for lambda2, using value of lambda1 learned above
for(j in 1:length(lambda.eff)){
        fit00 <- JGL(Y=dat_mat_list,penalty="group",lambda1=lambda.eff[i_idx],lambda2=lambda.eff[j], return.whole.theta=TRUE)
        aic_vec2[j,1] <- vBIC(dat_mat_list, fit00, thr=0.0001)
}

# identify which lambda2 had min aic
j_idx <- which(aic_vec2 == min(aic_vec2), arr.ind = TRUE)[1,1]
#assign tuned lambda2 based on your search
lam_2 <- lambda.eff[j_idx]

Here are the final tuned values for \(\lambda_1\) and \(\lambda_2\) using this method:

print(paste("lambda_1", lam_1))
## [1] "lambda_1 0.0533333333333333"
print(paste("lambda_2", lam_2))
## [1] "lambda_2 0.01"

3.4.3 Run JGL again with tuned parameters

Now let’s run GGL again with those tuned lambdas.

ggl_tuned_results <- JGL(Y=dat_mat_list,
                  penalty="group",
                  lambda1= lam_1,
                  lambda2= lam_2,
                  return.whole.theta=TRUE)

# extract all estimated precision matrices from result
inv_covar_matrices <- ggl_tuned_results$theta
names(inv_covar_matrices) <- ancestry_groups

#now use the `graph_from_adjacency_matrix()` function from igraph to create igraph graphs from the adjacency matrices

#again, since we have a list of precision matrices, I use the map() function to apply igraph::graph_from_adjacency_matrix() over the list

graph_list <- map(ancestry_groups,
                  ~graph_from_adjacency_matrix(
                    -cov2cor(inv_covar_matrices[[.x]]),
                    weighted = T,
                    mode = "undirected",
                    diag = FALSE
                  ))
names(graph_list) <- ancestry_groups

3.4.3.1 Tuned GGL Class 1

plot_jgl(graph_list[[1]], multiplier = 3, 
         vertex.color = color_list$color,
         vertex.label.color = color_list$color,
         main = "Class 1, GGL")
legend("bottomleft", inset=.02, title="Metabolite class",
   unique(color_list$class), fill=color_opts, cex=0.8)
legend("topleft", inset=.02, title="Edge",
   edge_labels, fill=edge_palette, cex=0.8)

3.4.3.2 Tuned GGL Class 2

plot_jgl(graph_list[[2]], multiplier = 3, 
         vertex.color = color_list$color,
         vertex.label.color = color_list$color,
         main = "Class 2, GGL")
legend("bottomleft", inset=.02, title="Metabolite class",
   unique(color_list$class), fill=color_opts, cex=0.8)
legend("topleft", inset=.02, title="Edge",
   edge_labels, fill=edge_palette, cex=0.8)

3.4.3.3 Tuned GGL Class 3

plot_jgl(graph_list[[3]], multiplier = 3, 
         vertex.color = color_list$color,
         vertex.label.color = color_list$color,
         main = "Class 3, GGL")
legend("bottomleft", inset=.02, title="Metabolite class",
   unique(color_list$class), fill=color_opts, cex=0.8)
legend("topleft", inset=.02, title="Edge",
   edge_labels, fill=edge_palette, cex=0.8)

3.4.3.4 Tuned GGL Class 4

plot_jgl(graph_list[[4]], multiplier = 3, 
         vertex.color = color_list$color,
         vertex.label.color = color_list$color,
         main = "Class 4, GGL")
legend("bottomleft", inset=.02, title="Metabolite class",
   unique(color_list$class), fill=color_opts, cex=0.8)
legend("topleft", inset=.02, title="Edge",
   edge_labels, fill=edge_palette, cex=0.8)

What I take from these networks is that there are dense edges within the metabolite classes, and fewer (though some) cross-class edges. This is consistent with what we saw in the data-exploration heatmap.

3.4.4 Identifying shared and unique edges

I’m still learning about the kinds of further analysis you would do after running JGL, but one common step is to identify which edges are shared across groups and which are unique.

# go through the list and use the igraph::get.edgelist() function to grab each class's matrix of edges
test_edges <- map(graph_list, ~as.data.frame(get.edgelist(.)))

shared_edges <- inner_join(test_edges$ag1, test_edges$ag2) %>%
  inner_join(test_edges$ag3) %>%
  inner_join(test_edges$ag4)

#there's definitely a more efficient way to code this, but I'm just slapping it together for now. Using anti_join() to identify the edges unique to each class compared to all others (see the more compact sketch after this chunk).

unique1 <- anti_join(test_edges$ag1, test_edges$ag2) %>%
  anti_join(test_edges$ag3) %>%
  anti_join(test_edges$ag4)

unique2 <- anti_join(test_edges$ag2, test_edges$ag1) %>%
  anti_join(test_edges$ag3) %>%
  anti_join(test_edges$ag4)

unique3 <- anti_join(test_edges$ag3, test_edges$ag1) %>%
  anti_join(test_edges$ag2) %>%
  anti_join(test_edges$ag4)

unique4 <- anti_join(test_edges$ag4, test_edges$ag1) %>%
  anti_join(test_edges$ag2) %>%
  anti_join(test_edges$ag3)
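
Since the chunk above is admittedly repetitive, here is a more compact sketch (my own, same logic) that anti-joins each class’s edge list against every other class in a single pass:

# compact version of the unique-edge computation: for each class,
# successively anti_join against every other class's edge list
unique_edges <- map(ancestry_groups, function(g) {
  others <- test_edges[setdiff(ancestry_groups, g)]
  Reduce(function(acc, other) anti_join(acc, other, by = c("V1", "V2")),
         others, init = test_edges[[g]])
})
names(unique_edges) <- ancestry_groups
# e.g. unique_edges$ag1 should match unique1 above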

For this analysis, there are 103 edges shared across all 4 ancestry groups. There are 37 edges unique to ancestry group 1, 51 unique to ancestry group 2, and 12 unique to ancestry group 3; unique4 holds the edges found only in ancestry group 4.

From here collaborators might look into these unique or shared edges and their corresponding nodes for further hypothesis generation. You could also start to do some differential network analysis as Dr. Shutta presented in a previous seminar (DINGO).

4 References

Patrick Danaher, Pei Wang, and Daniela Witten (2011). The joint graphical lasso for inverse covariance estimation across multiple classes. http://arxiv.org/abs/1111.0324

Raji Balasubramanian, Denise Scholtens (2021). ENAR Conference Presentation. (Metabolomics dataset and description)

Koyejo Lab JGL

4.1 Packages and resources used to create this tutorial

  • igraph: for plotting network graphs

  • JGL: for running Joint Graphical Lasso

  • tidyverse: for data wrangling

  • RColorBrewer: for pretty color palettes

  • pheatmap: My favorite heatmap package, makes the heatmaps with annotated tracks (See this blog post by Dave Tang for a nice tutorial on using this package.)

  • I built the tutorial site using GitHub Pages and RMarkdown. See here for Jenny Bryan’s guide to using Git and RMarkdown.