Genome-wide association studies have shown an association between single nucleotide polymorphisms (SNPs) in new chromosomal regions (1p13.1, 2q36.3, 9p21, and 10q11.21) and coronary artery disease and myocardial infarction. The SNPs in the 9p21 region form a risk haplotype because of the strong linkage disequilibrium in this area. These SNPs have been extensively replicated in several European and Asian populations and are associated with other pathologies, such as abdominal aortic and intracranial aneurysms, and with intermediate phenotypes such as arterial stiffness and coronary calcium. The 9p21 risk haplotype lies in a region without annotated genes, near CDKN2A and CDKN2B, known tumor suppressor genes encoding cyclin-dependent kinase inhibitors. In the remaining regions, the SNPs are located in genes with known roles in atherosclerosis and in genes with previously unrecognized roles. Incorporating genetic information in the form of SNPs has been shown to slightly improve the prediction of long-term cardiovascular risk estimated by the Framingham function, allowing individuals to be reclassified into more accurate risk categories. Gene expression studies have found that the expression levels of CDKN2A, CDKN2B, and ANRIL are coregulated and associated with the risk haplotype and with the severity of atherosclerosis.
INTRODUCTION
Cardiovascular disease is the leading cause of mortality in industrialized countries. Since 1996, coronary artery disease (CAD) has been responsible for approximately 31% of cardiovascular mortality in Spain, with myocardial infarction (MI) being the most common cause (61%). In 2005, there were 72 950 cases of MI in Spain; 60% of these patients were admitted to hospital and 40% died before arrival. The health care costs of coronary disease were 1953 million euros in 2003.1
Epidemiological and animal studies have identified the cardiovascular risk factors (CRF) that predispose to CAD, such as elevated low-density lipoprotein cholesterol (LDLc), age, obesity, smoking, low levels of high-density lipoprotein cholesterol (HDLc), elevated triglycerides, diabetes mellitus, and hypertension.2 Furthermore, a genetic component has been confirmed by studies in monozygotic twins and by studies in which a family history of CAD has been associated with coronary events.
From the genetic standpoint, CAD is classified as a complex disease, although some forms show a simple Mendelian inheritance pattern, such as familial hypercholesterolemia caused by mutations in the LDL receptor gene, PCSK9, and ApoB.2 The main difference is that Mendelian diseases are caused by mutations in a single gene that lead to a dysfunctional change in the encoded protein, thus increasing the risk of the disease, whereas complex diseases are caused by multiple gene polymorphisms with small effect sizes interacting with environmental risk factors.
Linkage and association studies have been used to map the causal genes of CAD. Linkage studies follow a family-based design in which microsatellites distributed throughout the genome are genotyped (Table 1), and have led to the identification of genes such as MEF2A,3 ALOX5AP,4 and TNFSF4.5 Association studies usually follow a case-control design and investigate candidate genes assumed to be involved in the physiology of the disease; they are therefore based on a prior hypothesis. Comparing the frequency of genetic polymorphisms in cases and controls determines whether the gene is associated with the disease. The type of polymorphism most commonly used in these studies is the single-nucleotide polymorphism (SNP), a variation in a single base pair of the DNA sequence. The main limitations of association studies include lack of replication in different populations, small sample sizes with consequently low statistical power, genotyping errors, inaccurate clinical characterization of the disease, inadequate selection of cases and controls, and population substructure.6
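As a minimal sketch of the case-control comparison described above, allele counts in cases and controls can be arranged in a 2×2 table, tested with a Pearson chi-square test, and summarized by the allelic odds ratio. The function name and the counts are hypothetical; the allelic test assumes Hardy-Weinberg proportions, and many studies use a genotype-based trend test instead.

```python
import math


def allelic_test(case_risk, case_other, ctrl_risk, ctrl_other):
    """Allelic odds ratio, Woolf 95% CI and Pearson chi-square (1 df) from allele counts."""
    a, b, c, d = case_risk, case_other, ctrl_risk, ctrl_other
    odds_ratio = (a * d) / (b * c)
    # Woolf standard error of ln(OR)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = (math.exp(math.log(odds_ratio) - 1.96 * se),
          math.exp(math.log(odds_ratio) + 1.96 * se))
    n = a + b + c + d
    # Pearson chi-square for a 2x2 table, without continuity correction
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail p-value of a chi-square with 1 df: erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, ci, chi2, p_value


# Hypothetical study: 1000 cases and 1000 controls (2000 alleles each)
odds_ratio, ci, chi2, p = allelic_test(case_risk=1100, case_other=900,
                                       ctrl_risk=1000, ctrl_other=1000)
print(f"OR={odds_ratio:.2f} (95% CI {ci[0]:.2f}-{ci[1]:.2f}), chi2={chi2:.2f}, P={p:.2g}")
```

The risk allele is more frequent in cases (55%) than in controls (50%), giving an odds ratio of about 1.22 with P of roughly 0.0015.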
Table 1. Glossary of Genetic Terms Used in the Article.
Term | Concept |
Genotyping | Determining the genotype of an individual for a genetic variation, whether a polymorphism or a mutation |
Transcript | Each of the mRNA variants resulting from the alternative splicing characteristic of human genes |
Hardy-Weinberg equilibrium | Mathematical model establishing that allele and genotype frequencies in a population remain constant from generation to generation in the absence of mutation, natural selection, and other evolutionary forces |
Minor allele | Given 2 alleles A (0.7) and B (0.3) of a polymorphism, the minor allele is the one with the lower relative frequency (B) |
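Applied to the minor-allele example in Table 1 (A = 0.7, B = 0.3), the Hardy-Weinberg model gives the expected genotype frequencies below; these expectations are what a Hardy-Weinberg test compares with the observed genotype counts.

```latex
\begin{aligned}
p &= f(A) = 0.7, \qquad q = f(B) = 1 - p = 0.3,\\
f(AA) &= p^2 = 0.49, \qquad f(AB) = 2pq = 0.42, \qquad f(BB) = q^2 = 0.09,\\
p^2 + 2pq + q^2 &= (p + q)^2 = 1.
\end{aligned}
```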
Meta-analyses of case-control studies have detected associations of some polymorphisms with MI (MTHFR-C677T, CETP-TaqIB, PON1-Q192R, eNOS Glu298Asp, Prothrombin G20210A, F5, AT1R 1166 A/C, ApoB [XbaI, EcoRI, Ins/Del], ApoE ε4/ε4, ACE DD, LPL Ser447Ter), but many of these associations are false positives.7
The aim of this article is to review recent advances in knowledge of the genetic component of CAD derived from genome-wide association studies (GWAS), and their potential clinical usefulness.
GENOME-WIDE ASSOCIATION STUDIES
Recently, GWAS have successfully identified loci associated with CAD, MI, type 2 diabetes mellitus, rheumatoid arthritis, Crohn disease, bipolar disorder, and other conditions listed in a catalog of GWAS.8 These studies genotype hundreds of thousands of SNPs distributed throughout the genome in large case-control samples using DNA microarrays; unlike candidate-gene studies, they are not based on a prior hypothesis. The HapMap project's data on genotype frequencies and haplotype structure have made it possible to select the minimum set of SNPs that must be genotyped in GWAS9 to capture most of the common genetic variation in the human genome.
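SNP selection based on HapMap relies on linkage disequilibrium (LD): if 2 SNPs are strongly correlated, genotyping one of them (a "tag" SNP) captures most of the information of the other. A minimal sketch of the usual pairwise measure r², computed from haplotype and allele frequencies; the r² ≥ 0.8 tagging threshold is a common convention assumed here, and the frequencies are hypothetical.

```python
def ld_r2(p_ab, p_a, p_b):
    """Squared correlation r^2 between two biallelic SNPs.

    p_ab: frequency of haplotypes carrying allele A at SNP 1 and allele B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def is_adequate_tag(r2, threshold=0.8):
    """Assumed rule: one SNP tags another if their r^2 reaches the threshold."""
    return r2 >= threshold


# Complete LD: allele A always occurs with allele B, so r^2 is 1 (up to rounding)
print(ld_r2(p_ab=0.3, p_a=0.3, p_b=0.3))
# No LD: haplotype frequency equals the product of allele frequencies, so r^2 is ~0
print(ld_r2(p_ab=0.09, p_a=0.3, p_b=0.3))
```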
Compared with candidate-gene case-control studies, the advantages of GWAS derive from the use of large samples; automated genotyping of SNPs across the whole genome (and thus not restricted to candidate genes); verification of sex, family relationships, and ancestry; and sample quality control to exclude duplicates. In addition, quality control excludes from the analysis SNPs with missing data, those deviating from Hardy-Weinberg equilibrium in controls, those with a minor allele frequency <1%, and those with a genotyping rate <80%-90%. Associated SNPs are selected using a statistical significance threshold that keeps the false-positive rate within acceptable limits, and an association is considered definitive only if it is replicated in other populations.9, 10
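The SNP-level filters listed above can be sketched as follows. The thresholds (genotyping rate ≥ 90%, minor allele frequency ≥ 1%, Hardy-Weinberg P ≥ 10⁻⁶ in controls), the `SnpCounts` record, and the function names are illustrative assumptions, not the criteria of any particular study. The genome-wide significance level of 5×10⁻⁸ is the widely used convention, roughly a Bonferroni correction for about 1 million independent tests.

```python
import math
from dataclasses import dataclass


@dataclass
class SnpCounts:
    """Hypothetical genotype counts for one SNP in controls, plus failed calls."""
    name: str
    n_aa: int
    n_ab: int
    n_bb: int
    n_missing: int


def hwe_p_value(n_aa, n_ab, n_bb):
    """Pearson chi-square test (1 df) of Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(snp, min_call_rate=0.90, min_maf=0.01, min_hwe_p=1e-6):
    """Apply genotyping-rate, minor-allele-frequency and Hardy-Weinberg filters."""
    called = snp.n_aa + snp.n_ab + snp.n_bb
    if called == 0:
        return False
    call_rate = called / (called + snp.n_missing)
    p = (2 * snp.n_aa + snp.n_ab) / (2 * called)
    maf = min(p, 1 - p)
    return (call_rate >= min_call_rate
            and maf >= min_maf
            and hwe_p_value(snp.n_aa, snp.n_ab, snp.n_bb) >= min_hwe_p)


# Conventional genome-wide significance: 0.05 / ~1,000,000 independent tests
GENOME_WIDE_ALPHA = 0.05 / 1_000_000

print(passes_qc(SnpCounts("snp_ok", 490, 420, 90, 0)))        # in HWE, MAF 0.3
print(passes_qc(SnpCounts("snp_hwe_fail", 600, 200, 200, 0)))  # heterozygote deficit
```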
The limitations of these studies include the difficulty of detecting loci with small effect sizes; the exclusive focus on SNPs, so that other polymorphisms such as copy-number variations (CNVs) and microsatellites are not analyzed; the lack of assessment of the contribution of less common SNPs; and the fact that these studies have usually been conducted in individuals of European ancestry, leaving other population groups little studied.
Genome-Wide Association Studies for Coronary Artery Disease and Myocardial Infarction (Region 9p21)
Recent GWAS have found a new region without annotated genes (9p21) associated with CAD9, 11, 12 and MI13 independently of the CRF. Region 9p21 contains several SNPs in linkage disequilibrium, such as rs1333049, rs10757274, rs10757278, rs2383206, and rs2383207; of these, rs1333049 has shown the strongest evidence of association (odds ratio [OR]=1.24; 95% confidence interval [CI], 1.20-1.29). A meta-analysis of some 40 000 subjects showed that 25% of European individuals carry 2 copies of the rs1333049 risk allele, which increases the risk of CAD 1.6-fold.14 In addition, these SNPs have been associated with abdominal aortic and intracranial aneurysms, arterial stiffness, myocardial damage caused by coronary spasm, severe coronary stenosis, and coronary artery calcification.6
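The homozygote figure can be related to the per-allele odds ratio by simple arithmetic. Under a multiplicative (log-additive) model, which is an assumption here and not a statement of the cited meta-analysis, each risk-allele copy multiplies the odds by the per-allele OR, so 2 copies give OR². Under Hardy-Weinberg equilibrium, the proportion of risk homozygotes is the squared allele frequency. The sketch uses the rs1333049 per-allele OR of 1.24 quoted above and an illustrative risk-allele frequency of 0.5.

```python
def genotype_odds_ratios(per_allele_or):
    """Odds ratios for 0, 1 or 2 risk-allele copies under an assumed multiplicative model."""
    return {copies: per_allele_or ** copies for copies in (0, 1, 2)}


ors = genotype_odds_ratios(1.24)         # per-allele OR for rs1333049 quoted in the text
risk_allele_freq = 0.5                   # illustrative value, not from the meta-analysis
homozygote_freq = risk_allele_freq ** 2  # Hardy-Weinberg expectation for 2 copies

print(round(ors[2], 2))  # → 1.54
print(homozygote_freq)   # → 0.25
```

The result (about 1.5) is close to, but not the same as, the reported 1.6-fold increase, which was estimated directly from the data rather than from this calculation.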
Gene Expression Studies at Chromosome Region 9p21
Of all the loci associated with CAD and MI, region 9p21 has been the most replicated and has shown the strongest association, prompting efforts to determine the molecular mechanism underlying this relationship. The 9p21 cardiovascular risk haplotype is located in an area without annotated genes, close to the INK4a/ARF cluster, which contains the tumor suppressor genes CDKN2A and CDKN2B, implicated in various cancers. Adjacent to these, an antisense noncoding RNA gene known as ANRIL has been identified.15 Several groups have evaluated the expression of CDKN2A, CDKN2B, and ANRIL and its association with the 9p21 risk SNPs.
Broadbent et al.,16 using quantitative reverse-transcription PCR (qRT-PCR), detected expression of the ANRIL transcript (DQ485453) in primary coronary smooth muscle cells, macrophages, and carotid endarterectomy and abdominal aortic aneurysm tissue samples. Jarinova et al.17 demonstrated that ANRIL contains 4 evolutionarily conserved sequences, one of which (CNS3) enhances gene expression more strongly when amplified from risk homozygotes than from reference homozygotes. Folkersen et al.18 identified 8 novel ANRIL transcripts in lymphoblastoid cells and in carotid artery, medial aorta, and mammary artery plaque tissue. Holdt et al.19 analyzed gene expression in peripheral blood mononuclear cells of CAD patients and in carotid, aortic, and femoral plaques. These studies show that ANRIL is the only gene whose expression differs in association with the 9p21 risk SNPs and with atherosclerotic severity.
In addition, Mathews et al.20 demonstrated in coronary endarterectomy plaque samples that smooth muscle cells undergo senescence associated with overexpression of the CDKN2A (p16) and p21 proteins, as measured by Western blotting. Furthermore, knockout mice for several tumor suppressor genes, such as p19ARF, p53, p27Kip1, and pRB, develop more severe atherosclerosis,21 indicating a role of cell cycle control proteins in this disease.
In summary, gene expression studies of 9p21 show that CDKN2A, CDKN2B, and ANRIL are expressed in atherosclerotic tissue and are transcriptionally coregulated. ANRIL has several transcripts whose expression levels correlate more strongly with the genotypes of the 9p21 risk SNPs than do those of CDKN2A/2B. In addition, GWAS have identified ANRIL as a susceptibility gene for type 2 diabetes,22 glioma,23 and basal cell carcinoma.24
Genome-Wide Association Studies for Coronary Artery Disease and Myocardial Infarction (Other Chromosome Regions)
GWAS have also found associations of CAD and MI with SNPs in chromosome regions other than 9p21. Some of these studies, and the characteristics of the most replicated SNPs, are described below.
The SNP rs599839 (1p13.1) is of special importance because it has been associated with increased LDLc values, CAD, and MI; that is, with a CRF and with its clinical outcomes. The variant rs599839 is located in the intergenic region adjacent to the PSRC1/CELSR2/SORT1 genes. Although the function of the first 2 genes remains unknown, SORT1 (sortilin 1) encodes a multiligand cell surface receptor that binds LDL receptor-associated protein (RAP), lipoprotein lipase, and apolipoprotein A-V, and participates in endocytosis and intracellular protein trafficking.25 Linsel-Nitschke et al.,25 using whole blood genome expression profiles, found that the G allele of rs599839 was associated with higher SORT1 expression levels, lower LDLc concentrations, and a 9% reduction in CAD risk. This association is supported by the finding that overexpression of sortilin 1 in HEK293 cells transfected with SORT1 cDNA increased LDL uptake into these cells. The authors propose a mechanism in which LDL binds to the sortilin receptor in the plasma membrane and is then internalized by endocytosis. Kathiresan et al.,26 using global human liver expression data, found that rs599839 influences PSRC1/CELSR2/SORT1 mRNA levels, with the strongest regulatory effect on SORT1. Subsequently, Musunuru et al.27 fine-mapped the 1p13 haplotype responsible for the association with LDLc and identified SNP rs12740374 as a variant that creates a C/EBP transcription factor binding site, altering hepatic SORT1 expression. Using SORT1 overexpression and siRNA knockdown in mouse liver, these authors demonstrated that this gene alters LDLc levels. The importance of this novel pathway is underscored by a 40% difference in MI risk between alternative 1p13 homozygotes, an effect comparable to that of common variants of LDLR and PCSK9.
The 1p13 minor allele frequency is around 30% in Europeans, and the allele is also present in other populations; this locus is therefore considered a global genetic determinant of MI.27 In another study, the risk allele A of rs599839 was associated with increased susceptibility to CAD (OR=1.29; 95% CI, 1.18-1.40).28
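Genotype-expression associations such as the one between rs599839 and SORT1 expression are commonly summarized by regressing expression on the number of copies of an allele (0, 1, or 2) under an additive model. A minimal ordinary least-squares sketch; the function name and the expression values are hypothetical, and real analyses also adjust for covariates and normalize expression, which is not shown here.

```python
def additive_eqtl(dosages, expression):
    """OLS slope and intercept of expression on allele dosage (0, 1 or 2 copies)."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    slope = sxy / sxx  # change in expression per additional allele copy
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical log-expression values for 6 individuals, 2 per genotype group
dosage = [0, 0, 1, 1, 2, 2]
log_expr = [5.0, 5.2, 5.5, 5.7, 6.0, 6.2]
slope, intercept = additive_eqtl(dosage, log_expr)
print(f"slope={slope:.2f} per allele, intercept={intercept:.2f}")
```

A positive slope corresponds to the pattern reported for the G allele, where each additional copy is associated with higher expression.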
The intronic polymorphism rs6922269 is located in the MTHFD1L gene, which encodes a mitochondrial enzyme involved in the synthesis of tetrahydrofolate derivatives. A functional connection between MTHFD1L and CAD is plausible because its activity influences plasma homocysteine levels.11
The SNP rs2943634 is located in region 2q36.3, where only a pseudogene (ENSG00000197218) has been annotated; this SNP has been associated with hypertension and low HDLc levels in the MORGAN cohort.29 The SNP rs501120 (10q11.21) is located upstream of CXCL12, a chemokine with a central role in tissue regeneration in ischemic heart disease and in angiogenesis via the recruitment of endothelial progenitor cells.11
The SNP rs17228212 is located in SMAD3, a transcriptional modulator activated by TGF-β that participates in cell growth and growth inhibition, both of which are fundamental to the progression of atherosclerotic plaque.11 This SNP has been associated with non-HDL cholesterol in the MORGAN study,29 and its effect on CAD is assumed to be mediated by this CRF. The SNP rs9818870 is located in MRAS, which belongs to the RAS superfamily of GTP-binding proteins and is highly expressed in the heart. MRAS is thought to participate in adhesion molecule signaling, an important process in the initial phase of atherosclerotic disease. The variant rs9982601 is located in SLC5A3, which participates in Na+ and myo-inositol transport in response to hypertonic stress.11
The SNP rs12526453 is located in PHACTR1, which encodes an inhibitor of protein phosphatase 1, the enzyme responsible for dephosphorylating serine and threonine residues in proteins.30
The polymorphism rs17465637 is located in MIA3, which is involved in cell growth and growth inhibition.11 Finally, SNP rs3184504 is located in SH2B3, which encodes an adaptor protein in the intracellular signaling pathway of T lymphocyte activation.30 Table 2 lists these and other SNPs, extracted from the GWAS catalog8 by searching for the terms “coronary heart disease” and “myocardial infarction.” Note that once the target loci have been identified, further studies (gene and protein expression, knockout/knockdown, biological activity) are required to explain the associations and identify therapeutic targets of potential clinical interest.
Table 2. Single-Nucleotide Polymorphisms Associated With Coronary Artery Disease and Myocardial Infarction in European Populations.
SNP (dbSNP) | Gene/chromosome region | Frequency of the risk allele in controls | Odds ratio (95% CI) | P (trend test) |
WTCCC 9 | ||||
rs17672135 C/T | FMN2 | Allele T 86% | 1.32 (0.79-2.22) | 1.04×10−4 |
rs383830 A/T | 5q21 | Allele T 78% | 1.92 (1.40-2.63) | 5.72×10−6 |
rs6922269 A/G | MTHFD1L | Allele A 25% | 1.65 (1.32-2.06) | 6.33×10−6 |
rs8055236 G/T | 19q12 | Allele G 80% | 2.23 (1.56-3.17) | 9.73×10−6 |
rs688034 C/T | SEZ6L | Allele T 31% | 1.62 (1.34-1.95) | 6.90×10−6 |
rs1333049 C/G | 9p21 | Allele C 47% | 1.90 (1.61-2.24) | 1.79×10−14 |
rs7250581 | 19q12 | Allele G 78% | 1.40 (1.05-1.86) | 9.12×10−6 |
Samani et al. 11 | ||||
rs1333049 C/G | 9p21 | Allele C 47% | 1.37 (1.27-1.49) | 1.80×10−14 |
rs6922269 A/G | MTHFD1L | Allele A 25% | 1.23 (1.13-1.35) | 6.33×10−6 |
rs2943634 A/C | 2q36.3 | Allele C 66% | 1.22 (1.11-1.33) | 1.19×10−5 |
rs599839 A/G | PSRC1 | Allele A 77% | 1.24 (1.12-1.38) | 2.19×10−5 |
rs17465637 A/C | MIA3 | Allele C 71% | 1.23 (1.12-1.34) | 1.00×10−5 |
rs501120 A/G | 10q11.21 | Allele T 87% | 1.24 (1.09-1.41) | 1.31×10−3 |
rs17228212 C/T | SMAD3 | Allele C 70% | 1.19 (1.09-1.3) | 1.18×10−4 |
McPherson et al. 12 | ||||
rs10757274 A/G | 9p21 | Allele G 49% | 1.29 (1.09-1.52) | — |
rs2383206 A/G | 9p21 | Allele G 73% | 1.26 (1.07-1.48) | — |
Helgadottir et al. 13 | ||||
rs2383207 A/G | 9p21 | Allele G 49% | 1.25 (1.17-1.34) | 1.3×10−11 |
rs10757278 C/T | 9p21 | Allele G 45% | 1.29 (1.21-1.38) | 3.6×10−14 |
Kathiresan et al. 28 | ||||
rs12526453 C/G | PHACTR1 | Allele C 65% | 1.13 (1.09-1.17) | 6.54×10−10 |
rs6725887 C/T | WDR12 | Allele C 14% | 1.16 (1.10-1.22) | 4.29×10−7 |
rs9982601 C/T | SLC5A3/MRPS6/KCNE2 | Allele T 13% | 1.19 (1.13-1.27) | 2.12×10−9 |
rs4977574 A/G | 9p21 | Allele G 56% | 1.28 (1.24-1.33) | 1.08×10−41 |
rs1746048 C/T | CXCL12 | Allele C 84% | 1.19 (1.13-1.25) | 8.14×10−11 |
rs646776 A/G | CELSR2/PSRC1/SORT1 | Allele T 81% | 1.18 (1.12-1.25) | 9.36×10−11 |
rs17465637 A/C | MIA3 | Allele C 72% | 1.13 (1.09-1.19) | 1.33×10−8 |
rs1122608 G/T | LDLr | Allele G 75% | 1.14 (1.09-1.19) | 1.49×10−8 |
rs11206510 C/T | PCSK9 | Allele T 81% | 1.15 (1.1-1.21) | 2.02×10−8 |
Gudbjartsson et al. 30 | ||||
rs3184504 C/T | SH2B3 | Allele T | 1.13 (1.08-1.18) | 8.06×10−8 |
Trégouët et al. 31 | ||||
rs2048327 (SLC22A3) | SLC22A3-LPAL2-LPA | CCTC: 0.021 | CCTC: 1.82 (1.56-2.11) | 4.20×10−15 |
rs3127599 (LPAL2) | | CTTG: 0.178 | CTTG: 1.20 (1.13-1.27) | 1.19×10−9 |
rs7767084, rs10755578 (LPA) | WTCCC | |||
Clarke et al. 32 | ||||
rs10455872 | LPA | Allele G 7% | 1.70 (1.49-1.95) | 3.60×10−166 |
rs3798220 | LPA | Allele C 2% | 1.92 (1.48-2.49) | 5.90×10−51 |
Erdmann et al. 33 | ||||
rs9818870 C/T | MRAS | — | 1.15 (1.11-1.19) | 7.44×10−13 |
rs7048915 A/G | GLIS3 | — | 0.95 (0.92-0.99) | .0073 |
rs2259816 A/C | HNF1A/C12orf43 | — | 1.08 (1.05-1.11) | 4.81×10−7 |
dbSNP, single nucleotide polymorphism database; SNP, single nucleotide polymorphism.
Risk allele frequencies were obtained from controls. When a SNP lies in a region without annotated genes, the chromosome region is given instead. Odds ratios refer to the genetic model of homozygosity for the risk allele.
Cardiovascular risk tables are mathematical models, based on prospective cohort studies, that estimate the risk of developing CAD over time from a given set of risk factors. In the mid-20th century, the Framingham study developed the first risk table; others have since been created, such as PROCAM, QRISK, and ASSIGN.34
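Many risk functions of this kind, including the Framingham functions, are derived from Cox proportional hazards models. As a general sketch (the coefficients β_i, cohort mean values x̄_i, and baseline survival S_0(t) are specific to each published function and are not given in this article), the predicted risk of an event by time t for a person with risk-factor values x_i is:

```latex
\hat{R}(t) = 1 - S_0(t)^{\exp\left(\sum_{i} \beta_i x_i - \sum_{i} \beta_i \bar{x}_i\right)}
```

In this form, a genetic risk score could enter the model as one more term β_i x_i alongside the conventional risk factors.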
However, lipid-related CRFs alone have proven to be imperfect predictors of cardiovascular risk, as they are affected by many variables. Genetic polymorphisms, on the other hand, remain constant throughout life, which underlines the potential value of genetic screening. Several groups have included SNPs in cardiovascular risk tables to create models with greater predictive power than CRFs alone, allowing subjects to be reclassified into more accurate risk categories. Reclassification analysis assesses whether incorporating a given genetic risk score (composed of several SNPs) moves subjects who develop events (cases) toward higher risk categories more often than toward lower ones, and subjects who remain event-free (controls) toward lower risk categories more often than toward higher ones.34
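The reclassification analysis just described is commonly quantified as the net reclassification improvement (NRI). A minimal sketch of the categorical NRI, assuming each subject's risk category (ordered integers, e.g., 0 = low, 1 = intermediate, 2 = high) is known under the old and new models, together with whether the subject had an event; the function name and data are hypothetical.

```python
def net_reclassification_improvement(old_cat, new_cat, event):
    """Categorical NRI = [P(up|event) - P(down|event)] + [P(down|no event) - P(up|no event)]."""
    up_e = down_e = n_e = 0
    up_ne = down_ne = n_ne = 0
    for old, new, had_event in zip(old_cat, new_cat, event):
        if had_event:
            n_e += 1
            up_e += new > old     # bool counts as 0 or 1
            down_e += new < old
        else:
            n_ne += 1
            up_ne += new > old
            down_ne += new < old
    return (up_e - down_e) / n_e + (down_ne - up_ne) / n_ne


# Hypothetical: 4 cases and 4 controls, all starting in the intermediate category
old = [1, 1, 1, 1, 1, 1, 1, 1]
new = [2, 2, 1, 0, 0, 0, 1, 2]
event = [True, True, True, True, False, False, False, False]
nri = net_reclassification_improvement(old, new, event)
print(nri)
```

Here 2 of 4 cases move up and 1 moves down (+0.25), and 2 of 4 controls move down and 1 moves up (+0.25), giving an NRI of 0.5.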
Humphries et al.34 investigated coronary risk prediction in subjects from the Northwick Park Heart Study II followed up for 10.8 years. The area under the ROC curve for a model composed of the CRFs age, triglycerides, total cholesterol, smoking, and systolic blood pressure was 0.66 (0.61-0.70), whereas for a model composed of SNPs in the UCP2, ApoE, LPL, and ApoA4 genes it was 0.62 (0.58-0.66). The model including both genotypic and environmental variables (area under the ROC curve, 0.72 [0.68-0.76]) significantly increased the predictive power of the Framingham equation compared with either model alone.
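The area under the ROC curve used in these comparisons equals the probability that a randomly chosen subject who had an event has a higher predicted risk than a randomly chosen subject who did not, with ties counted as one half. A minimal pairwise sketch with hypothetical scores; it is O(n²), which is adequate for illustration but not for large cohorts.

```python
def roc_auc(scores_events, scores_non_events):
    """AUC as the proportion of (event, non-event) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for s_e in scores_events:
        for s_n in scores_non_events:
            if s_e > s_n:
                wins += 1.0
            elif s_e == s_n:
                wins += 0.5
    return wins / (len(scores_events) * len(scores_non_events))


# Hypothetical predicted risks
auc = roc_auc([0.8, 0.6, 0.4], [0.5, 0.3, 0.2])
print(round(auc, 3))
```

Eight of the 9 pairs are ranked correctly, so the AUC is 8/9; an uninformative model scores 0.5.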
In a follow-up study of 10 000 individuals from the ARIC (Atherosclerosis Risk in Communities) cohort conducted over 14.6 years, Brautbar et al.35 analyzed the prediction of MI, coronary revascularization, and cardiac death, and genotyped one SNP on 9p21 (rs10757274). The authors reclassified 1.3% and 0.8% of the subjects relative to the Framingham equations; the 9p21 allele had the greatest influence on the intermediate risk categories. Morrison et al.36 followed the ARIC cohort for cardiovascular events over an average of 13 years and genotyped 116 SNPs; they created a score that was significantly associated with CAD in black individuals (risk ratio [RR]=1.2; 95% CI, 1.11-1.29) and in European individuals (RR=1.1; 95% CI, 1.06-1.14). In black individuals, the area under the ROC curve of the model combining the risk score and CRFs was significantly larger than that of the model including CRFs alone; in Europeans, the difference was not statistically significant.
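A multi-SNP genetic risk score is commonly either a simple count of risk alleles or a sum weighted by each SNP's log odds ratio. How the scores in these studies were built is not detailed here, so both variants are shown as a sketch; the SNP names and per-allele odds ratios are hypothetical.

```python
import math

# Hypothetical per-allele odds ratios for three hypothetical SNPs (illustration only)
PER_ALLELE_OR = {"snp_a": 1.28, "snp_b": 1.18, "snp_c": 1.15}


def genetic_risk_scores(risk_allele_counts):
    """Return (unweighted, weighted) scores for a dict mapping SNP -> risk-allele copies (0-2).

    unweighted: total number of risk alleles carried
    weighted:   sum of copies x ln(per-allele OR)
    """
    unweighted = sum(risk_allele_counts.values())
    weighted = sum(n * math.log(PER_ALLELE_OR[snp]) for snp, n in risk_allele_counts.items())
    return unweighted, weighted


count, score = genetic_risk_scores({"snp_a": 2, "snp_b": 1, "snp_c": 0})
# Under a multiplicative model, exp(score) is the odds ratio versus carrying no risk alleles
print(count, round(math.exp(score), 3))
```

The weighted version lets SNPs with larger effects contribute more, whereas the unweighted count treats every risk allele equally.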
Paynter et al.37 followed 22 129 European women health professionals participating in the Women's Genome Health Study for 10.2 years. They genotyped the SNP rs10757274, which was associated with cardiovascular events (RR=1.25; 95% CI, 1.04-1.51). However, adding this SNP to a predictive model using CRFs (including C-reactive protein and a family history of MI) did not improve the model's discriminative power.
McGeachie et al.38 used data from the prospective Multi-Ethnic Study of Atherosclerosis (MESA), which is based on markers of subclinical atherosclerosis. The authors created a predictive model of coronary calcification with 13 SNPs and an ancestry informative marker; the clinical variables were sex, age, weight, smoking, and diabetes mellitus. The area under the ROC curve of the combined model (SNPs plus clinical variables) was 85%, compared with 77% for the model with SNPs alone and 78.3% for the model with clinical variables alone. In Spain, 2 private companies (Ferrer inCode and GenDiag) have developed a cardiovascular risk prediction test called Cardioincode (http://www.ferrerincode.com), which adds 11 GWAS-associated SNPs to the CRFs and improves on predictions based on CRFs alone.
Although the statistical improvements obtained in these studies are not sufficient to warrant immediate clinical application, and the prediction models can be used only at the population level rather than the individual level, they remain relevant: the associated SNPs, despite conferring moderate risk, are common in the general population and thus provide valuable epidemiological information. Because GWAS are an active research area, it may become possible to build better predictive models that include a greater number of SNPs associated with CAD, MI, and CRF, as well as other types of polymorphisms, such as the CNVs recently investigated in GWAS.39
Conflicts of interest
None declared.
Received 3 August 2010
Accepted 27 January 2011
Corresponding author: Unidad de Investigación, Servicio de Nefrología, Hospital Universitario de Gran Canaria Dr. Negrín, Universidad de Las Palmas de Gran Canaria, 35010 Las Palmas de Gran Canaria, Spain. jrodperd@gobiernodecanarias.org