Network Density Analysis


Our post-GWAS analysis method (network density analysis; NDA) reveals new biological features of numerous disease states and traits. It works by examining a coexpression network of transcription start sites (discovered in FANTOM5). We find that transcripts containing GWAS hits for a given trait tend to fall into denser groupings in the coexpression network than randomly selected transcripts.

NDA demonstrates that GWAS hits for a given disease tend to lie near promoter/enhancer elements with similar expression profiles. This enables us to find more hits, fine-map probable causative SNPs, and implicate cell types in pathogenesis. Surprisingly, for some diseases, the underlying variants fall into distinct functional groups, suggesting either dual mechanisms of disease or distinct disease endotypes.

The results are published in:

Baillie JK et al. Shared Activity Patterns Arising at Genetic Susceptibility Loci Reveal Underlying Genomic and Cellular Architecture of Human Disease. PLOS Computational Biology 14, no. 3 (March 1, 2018): e1005934. PMC5849332.

Pairwise coexpression networks derived from GWAS results. Each coloured ball indicates a transcription start region containing a GWAS-associated variant. Red: significantly coexpressed by network density analysis. Light blue: all other transcription start regions containing GWAS-associated variants for this phenotype.

(3d visualisation by vasturiano)

Overview of approach

Network density analysis method for detecting significant coexpression among GWAS hits. (a) A subset of regulatory elements containing disease-associated SNPs is identified. (b) The strength of the link between each pair of these regulatory regions is quantified, first as the Spearman correlation, then as the -log10 of a p-value: the probability, specific to this regulatory region, of a Spearman correlation of at least this strength arising by chance. This p-value is determined from the empirical distribution of correlations between this regulatory region and all other regulatory regions in the genome-wide network. (c) The subset of regulatory regions containing disease-associated SNPs forms an unexpectedly dense grouping in the network. The NDA score assigned to any one node is the sum of the links it shares with other nodes in the chosen subset. (d) NDA scores from the input subset of regulatory elements are compared with NDA scores from permuted subsets of regulatory elements in order to quantify the false discovery rate (FDR).
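Steps (b) to (d) can be illustrated with a minimal Python sketch. This is not the published implementation: the function names (link_strengths, nda_scores, empirical_fdr), the input expression matrix, and the use of uniformly random subsets as the permuted background are all assumptions made for illustration, and the published method may permute subsets differently.

```python
import numpy as np


def rank_rows(x):
    """Rank values within each row (assumes continuous data, so no ties)."""
    return np.argsort(np.argsort(x, axis=1), axis=1).astype(float)


def link_strengths(expr):
    """Step (b): -log10 empirical p-value for every ordered pair of regions.

    expr is a hypothetical (n_regions, n_samples) expression matrix.
    Entry [i, j] is -log10 of the fraction of region i's correlations with
    all other regions that are at least as strong as its correlation with j.
    """
    n = expr.shape[0]
    rho = np.corrcoef(rank_rows(expr))      # Spearman = Pearson on ranks
    np.fill_diagonal(rho, -np.inf)          # self-correlation ranks lowest
    order = np.argsort(np.argsort(rho, axis=1), axis=1)
    p = (n - order) / (n - 1)               # strongest link gets 1 / (n - 1)
    links = -np.log10(p)
    np.fill_diagonal(links, 0.0)            # a node has no link to itself
    return links


def nda_scores(links, subset):
    """Step (c): for each node, sum its links to the other subset nodes."""
    return links[np.ix_(subset, subset)].sum(axis=1)


def empirical_fdr(links, subset, n_perm=200, seed=0):
    """Step (d): compare observed scores with scores from random subsets.

    Assumption: the background is same-sized subsets drawn uniformly at
    random. For each observed score t, the estimate is
    (mean count of permuted scores >= t) / (count of observed scores >= t).
    """
    rng = np.random.default_rng(seed)
    n = links.shape[0]
    observed = nda_scores(links, subset)
    null = np.concatenate([
        nda_scores(links, rng.choice(n, size=len(subset), replace=False))
        for _ in range(n_perm)
    ])
    expected_false = np.array([(null >= t).sum() for t in observed]) / n_perm
    n_called = np.array([(observed >= t).sum() for t in observed])
    return observed, np.minimum(expected_false / n_called, 1.0)
```

Note that the link matrix in this sketch is not symmetric: the p-value for a pair is computed against the first region's own distribution of correlations, matching the "specific to this regulatory region" wording above.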

View results of example analyses

Height: 8882 SNPs searched, 471 promoters hit, 166 distinct regions mapped, 29 significantly-coexpressed regions
Systolic Blood Pressure: 417 SNPs searched, 25 promoters hit, 13 distinct regions mapped
Diastolic Blood Pressure: 711 SNPs searched, 26 promoters hit, 14 distinct regions mapped
High-density lipoprotein: 5410 SNPs searched, 450 promoters hit, 101 distinct regions mapped, 17 significantly-coexpressed regions
Low-density lipoprotein: 4644 SNPs searched, 321 promoters hit, 92 distinct regions mapped, 19 significantly-coexpressed regions
Total Cholesterol: 6421 SNPs searched, 519 promoters hit, 128 distinct regions mapped, 29 significantly-coexpressed regions
Crohn's disease: 1924 SNPs searched, 217 promoters hit, 70 distinct regions mapped, 23 significantly-coexpressed regions
Triglycerides: 4863 SNPs searched, 437 promoters hit, 97 distinct regions mapped, 23 significantly-coexpressed regions
Ulcerative colitis: 2162 SNPs searched, 234 promoters hit, 83 distinct regions mapped, 20 significantly-coexpressed regions