Stroke: the scale of the problem
Stroke represents a major health problem. Every year in the USA, 795,000 people experience a new or recurrent stroke. Mortality data from 2008 indicate that stroke accounted for 1 in 18 deaths in the USA. The Framingham Study showed that 1 in 5 women and 1 in 6 men aged 55 to 75 years will experience stroke sometime during their life. Although the incidence may be reducing in developed countries, it has been estimated that stroke mortality will double worldwide by 2020, owing to an ageing population and an increasing incidence in developing countries.
Stroke is not only one of the major causes of death, but is also the leading cause of long-term disability, making it very costly to the economy. It has been estimated that the cost of the 152,000 incident strokes annually in the UK, and the cost of looking after patients who have had previous stroke, is around £8 billion in total, including at least £3 billion in direct healthcare costs.
Cerebrovascular disease (CVD) also causes vascular dementia, which is not only an important cause of dementia in its own right, but also seems to act synergistically with Alzheimer's disease pathology, increasing the chance of resulting clinical dementia. In addition, CVD is the most common cause of adult-onset epilepsy, and there is increasing evidence that vascular changes contribute to late-onset depression.
Conventional cardiovascular risk factors are important in stroke risk, and include hypertension, smoking, diabetes mellitus, hyperlipidaemia, and coexistent cardiovascular disease including ischemic heart disease and atrial fibrillation. However, conventional risk factors fail to account for all stroke risk, as the proportion of unexplained risk has been estimated at about 50%, although such estimates vary.
Stroke as a syndrome not a single disease
Stroke describes the clinical syndrome of focal neurological loss of function, usually of sudden onset, resulting from disturbance in the blood supply to the brain. It can be caused by occlusion of, or hemorrhage from, a cerebral blood vessel. About 80% to 85% of stroke cases are ischemic, whereas 15% to 20% are hemorrhagic.
Cerebral hemorrhage can be caused by multiple pathologies. Most cerebral hemorrhages are primary intracerebral hemorrhages, and many of these are subcortical hemorrhages associated with hypertension. However, many other pathologies can also cause intracerebral hemorrhage, including cerebral amyloid angiopathy and underlying structural lesions. A minority of cerebral hemorrhage cases results from subarachnoid hemorrhage, which is often associated with rupture of an intracerebral aneurysm.
Not only can cerebral hemorrhage be caused by multiple different pathologies, but ischemic stroke is also heterogeneous. The three main causes of ischemic stroke are large-artery stenosis, small-vessel disease (SVD), and cardioembolism.
Large-artery stroke results from atherosclerotic plaque in the carotid, vertebral, or major intracerebral arteries. Plaques, often associated with stenosis, become unstable, resulting in formation of thrombus, which subsequently embolizes distally to occlude cerebral vessels.
Cardioembolic stroke results from a variety of intracardiac pathologies including atrial fibrillation, cardiomyopathy, and thrombus at the site of previous myocardial infarction.
SVD (lacunar stroke) affects the small perforating arteries supplying subcortical structures, and results in small lacunar infarcts affecting the white matter and deep grey matter nuclei. The major risk factor for SVD is hypertension, and the underlying pathologies described include both diffuse small-vessel arteriopathy (lipohyalinosis) and focal atheroma.
In addition to these three most common types of ischemic stroke, there are many other rarer causes, including carotid and vertebral dissection, vasculitis, and single-gene disorders.
Identifying individual stroke subtypes requires detailed investigation including brain imaging, imaging of the extracerebral and intracerebral vessels, and cardiac imaging. Despite this, in as many as 25% to 40% of patients with ischemic stroke, an underlying pathology cannot be found.
This heterogeneity of stroke implies that different pathological mechanisms and risk factors are responsible for different stroke subtypes. Recent genetic studies in stroke are consistent with this, as described later.
The role of genetics in stroke risk
A number of single-gene disorders can result in both ischemic and hemorrhagic stroke, and these tend to cause specific stroke subtypes. The most common is cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL), a monogenic cause of cerebral SVD. Although important to an individual patient, these monogenic causes of stroke are rare, and contribute little to overall population risk of stroke. 
Epidemiological studies suggest that genetic risk factors are important for common 'sporadic' stroke. The most robust data on the heritability of a disease come from twin studies, which compare the incidence of stroke in monozygotic and dizygotic twins. Twin studies in stroke support a genetic predisposition, but the number of stroke cases in prospective twin studies is small, and therefore the confidence intervals are wide. There is much more information from family-history studies, which show that a family history of stroke is more common in stroke cases than in stroke-free controls [11, 12]. Such an association could be caused by recall bias, with stroke cases more likely to have identified a family history of stroke, but prospective data from the Framingham Study, where information on family history was taken before the onset of stroke, confirm this association. However, an association with family history could also be caused by shared early life environment, and separating this from genetic risk is difficult. Family-history data suggest that the genetic risk may vary by stroke subtype, with stronger associations being reported for the large-artery disease and SVD subtypes [12, 14].
How can we identify genes for stroke?
Three main methods have been used: linkage, the candidate-gene approach, and genome-wide association studies (GWAS). Linkage relies on identifying associations between chromosomal markers and disease phenotype within families. Linkage is good at identifying genes that are associated with greatly increased risk, but is less successful in more common polygenic diseases, in which multiple genes each contribute a small amount to overall risk. Linkage techniques have identified many disease-causing genes, but these have been primarily single-gene disorders. Using linkage, a number of genes causing monogenic stroke, such as the NOTCH3 gene causing CADASIL, have been discovered, but the technique has also been used to look for variants contributing to polygenic stroke. This approach found that variants in the phosphodiesterase 4D gene PDE4D were associated with ischemic stroke in an Icelandic population, but this could not be replicated in other European populations, suggesting that it is either not important in stroke as a whole, or is only important in specific populations.
Until recently, the main technique used to look for genes predisposing to common stroke was the candidate-gene method. Using this method, genetic variants, usually single-nucleotide polymorphisms (SNPs), are identified in a 'candidate' gene that is thought to be involved in stroke risk. The frequency of the SNP is then compared between stroke patients and controls, using a case-control approach. Many candidate-gene studies on stroke have been published, but the results have been largely disappointing, with few associations replicated. This picture is common to the genetics of many other complex diseases. The reasons for this lack of success have been explored both in general, and specifically for stroke. Important factors are likely to include small sample sizes; a failure to replicate positive associations, coupled with publication bias resulting in preferential publication of positive associations; and a failure to phenotype cases accurately. A further problem with candidate-gene studies is that associations can only be identified in genes already known and implicated in stroke risk; completely novel genes cannot be identified.
The field of complex genetics has been revolutionized by the GWAS approach, which uses microarray technology to genotype up to one million or more SNPs, spanning the whole genome, in an individual subject. A case-control or cohort approach is then used to compare the frequency of individual SNPs between disease cases and controls, and this is combined with rigorous statistical multiple-comparison methods to account for the many associations tested. Unlike the candidate-gene method, GWAS allows associations between novel chromosomal loci and disease to be identified. This technology has been combined with a realization of the importance of both very large sample sizes and the need to replicate positive associations prior to publication. By September 2011, GWAS had identified more than 1,600 novel associations with many complex diseases. Novel genetic associations have been reported in many cardiovascular diseases and risk factors, including myocardial infarction, hypertension, hyperlipidemia, and diabetes. Most genetic variants discovered using GWAS account for only a small increase in disease risk, with odds ratios (ORs) most often between 1.1 and 1.3. This means that large sample sizes are required to identify such variants, and has resulted in the formation of disease consortiums that combine data from multiple studies in meta-analyses. Some of these now have 50,000 or more samples, allowing identification of variants with progressively smaller ORs.
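The two quantities at the heart of a case-control GWAS, the per-SNP odds ratio and the multiple-comparison threshold, can be illustrated with a minimal sketch. The allele counts below are hypothetical and chosen only to give an OR in the 1.1 to 1.3 range; the function names are illustrative, and a simple Bonferroni correction for one million tests is assumed as one common way of setting a genome-wide significance threshold.

```python
import math

def allelic_odds_ratio(case_risk, case_other, control_risk, control_other):
    """Odds ratio and 95% CI (Woolf's log method) from a 2x2 table of allele counts."""
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    # Standard error of log(OR) is the root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(
        1 / case_risk + 1 / case_other + 1 / control_risk + 1 / control_other
    )
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

# Hypothetical allele counts (two alleles per person), for illustration only.
odds_ratio, ci_lower, ci_upper = allelic_odds_ratio(
    case_risk=2200, case_other=3800, control_risk=2000, control_other=4000
)
# Assumed: one million SNPs tested at an overall alpha of 0.05.
threshold = bonferroni_threshold(0.05, 1_000_000)

print(f"OR = {odds_ratio:.2f} (95% CI {ci_lower:.2f} to {ci_upper:.2f})")
print(f"Per-SNP significance threshold = {threshold:.1e}")
```

Because the threshold is so stringent, a modest OR reaches significance only when the confidence interval is narrow, which is why sample size dominates GWAS design.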
Genome-wide association studies in stroke
GWAS in stroke have lagged behind those in other cardiovascular diseases, perhaps because the heterogeneity of the stroke phenotype was thought to make the chance of success less likely; however, the approach is now identifying novel genetic variants for stroke. In this brief review, I will focus on ischemic stroke, although advances are also being made in cerebral hemorrhage and in the genetics of intracranial aneurysms.
Initial studies attempted to replicate GWAS associations that had first been found in other diseases associated with increased stroke risk. Two variants (PITX2 and ZFHX3), which were initially associated with atrial fibrillation, have both been shown to be independent risk factors for ischemic stroke; associations are only apparent with cardioembolic stroke and not with other stroke subtypes [21, 22]. A variant at chromosome 9p21, which was originally associated with myocardial infarction and coronary artery disease, was found to be associated with ischemic stroke across multiple populations, but this association was present only with large artery stroke. The same locus has been associated with aortic and intracranial aneurysms.
There have been fewer novel GWAS associations initially identified in stroke itself, although a number of large GWAS in ischemic stroke are now taking place. Recently, the Wellcome Trust Case Control Consortium 2 ischemic stroke GWAS identified a novel association at 7p21; the most likely underlying gene is HDAC9, encoding histone deacetylase 9. This association was confined to the large artery stroke subtype. HDAC9 is a member of a large family of genes that encode proteins responsible for deacetylation of histones, and therefore regulate chromatin structure and gene transcription. The mechanism by which variants in HDAC9 increase large artery stroke risk is not immediately clear, although the specific association with large artery stroke might suggest that they act through increasing atherogenesis. Sodium valproate, which has HDAC inhibitory properties, has been shown to inhibit atherosclerosis in animal models, and intriguingly, sodium valproate therapy in humans has been associated with lower stroke and myocardial infarction rates compared with other anti-epileptics.
A GWAS on Japanese cases with ischemic stroke identified an SNP in a member of the protein kinase C (PKC) family, PRKCH, which was associated with small-vessel stroke. This was replicated in an independent cohort, and the association has been further replicated in a Chinese population, and also with MRI-determined silent brain infarction. PRKCH is a serine/threonine kinase that regulates a variety of cellular functions including differentiation, proliferation, and apoptosis. The SNP identified is very rare in white populations, and whether it contributes to disease risk in populations other than Japanese and Chinese, or whether the association can be replicated more widely, remains to be determined.
Most studies to date have used a case-control design. This design could be open to bias if the gene in question is associated with early mortality in a disease such as stroke, where there is a significant early mortality, and if samples are collected only from survivors. Cohort studies, in which subjects are followed for many years and those developing stroke are compared with those remaining stroke-free, avoid this bias; however, such studies tend to have smaller numbers of stroke, and also subtyping can be difficult as the strokes may occur in multiple hospitals. A meta-analysis of prospective cohort GWAS reported an association with the 12p13 region, but this could not be replicated in a much larger case-control study or in a second Swedish study. This could be because the variant is associated with stroke mortality, but it could also represent a false-positive association.
An alternative design is to genotype more than one family member and use family-based analysis methods. These can involve parent and offspring trios or more extensive family structures, or be based on sibling pairs. The former are difficult to collect for a late-onset disease such as stroke; affected sibling-pair collections are more realistic, but even collecting these in stroke is challenging, and obtaining the large sample sizes required for GWAS has proved difficult.
In summary, GWAS studies have identified a few robust associations with ischemic stroke. The associations found to date have been for specific stroke subtypes, emphasizing the importance of careful phenotyping, and suggesting that different stroke subtypes have different pathophysiologic mechanisms and genetic risk-factor profiles.
The largest GWAS studies in stroke to date have comprised about 3,000 cases. Larger studies are underway. The Stroke Genetics Network (SiGN) study, funded by the National Institute of Neurological Disorders and Stroke, aims to genotype at least a further 6,000 cases. The Metastroke collaboration has brought together groups with GWAS data on ischemic stroke from throughout the world, and currently comprises about 14,000 cases and 50,000 controls.
Meta-analysis of GWAS data from tens of thousands of patients with stroke is likely to identify further variants, as it has for other complex diseases. If variants do indeed predispose to specific stroke subtypes, then even larger sample sizes may be required. For example, even though a sample size of 10,000 sounds large, it will only include approximately 2,000 individuals with large artery stroke.
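The sample-size argument can be made concrete with a rough power calculation. The sketch below is an approximation under stated assumptions, not the method used by any of the consortia: it assumes equal numbers of cases and controls, a two-sided test comparing allele frequencies, a normal approximation, a genome-wide significance level of 5e-8, and a hypothetical risk-allele frequency of 0.3 in controls. All function and parameter names are illustrative.

```python
import math
from statistics import NormalDist

def case_allele_frequency(control_freq, odds_ratio):
    """Risk-allele frequency in cases implied by the control frequency and allelic OR."""
    odds = odds_ratio * control_freq / (1 - control_freq)
    return odds / (1 + odds)

def cases_needed(control_freq, odds_ratio, alpha=5e-8, power=0.80):
    """Approximate number of cases (with an equal number of controls) needed
    to detect an allelic odds ratio with a two-sided test of two proportions."""
    p0 = control_freq
    p1 = case_allele_frequency(p0, odds_ratio)
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))
    ) ** 2
    alleles_per_group = numerator / (p1 - p0) ** 2
    # Each person contributes two alleles.
    return math.ceil(alleles_per_group / 2)

# Hypothetical control allele frequency of 0.3, ORs spanning the typical GWAS range.
for odds_ratio in (1.1, 1.2, 1.3):
    print(odds_ratio, cases_needed(0.3, odds_ratio))
```

Under these assumptions, detecting an OR of 1.2 requires more cases than the approximately 2,000 large artery strokes in a sample of 10,000, which illustrates why subtype-specific associations call for very large collections.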
The GWAS approach is suited to identifying common variants, each of which contributes a small amount to disease risk. It is less effective at detecting rare variants, which might still be important in disease risk. Whole-genome sequencing enables these rare variants to be identified, and the cost of this technique is rapidly falling. As yet, results are not available from this approach for stroke, although studies are underway. Many current sequencing studies limit coverage to sequencing of the exome, or protein coding part, of the genome. Exome sequencing has been successful in many rare, primarily monogenic, diseases and may offer a cost-effective way to screen for multiple single-gene causes of stroke in one assay. Studies are underway using exome sequencing to try to identify rare variants that may contribute to more common polygenic diseases. Such studies may particularly benefit from the use of family-based approaches, to help separate causal from non-causal variants.
Another emerging area is epigenetics, although to date there have been few studies for stroke. Epigenetics describes the study of heritable changes in gene expression or cellular phenotype, which are caused by mechanisms other than changes in the underlying DNA sequence. It therefore refers to functionally relevant modifications in the genome that do not involve a change in the nucleotide sequence. Examples of such changes are DNA methylation and histone modification, both of which serve to regulate gene expression without altering the DNA sequence. Methods are now becoming available to assess epigenetic changes. For example, array-based methods can be used for typing DNA methylation in large populations, and case-control studies similar to those being performed for GWAS are beginning to reveal interesting results.
How can identifying genes for stroke help patients?
Even though monogenic stroke is rare, identifying the underlying gene can be important for the individual patient. In such diseases, a mutation in a specific gene results in disease, and most individuals with the mutation are likely to develop stroke or other clinical presentations of the disease at some stage in their life. Identifying the underlying mutation allows diagnosis, information on prognosis, and in some cases, specific treatments. It also enables counseling of other family members, and prenatal testing if desired. However, many monogenic forms of stroke are untreatable, and therefore, specialized genetic counseling is important before mutation testing. This is particularly important in asymptomatic individuals, or those with mild disease; for example, potential CADASIL patients who have migraine but have not yet developed stroke or dementia.
However, the vast majority of stroke is 'polygenic', with many genes thought to be involved, each conferring a small risk and probably interacting with multiple environmental risk factors to cause disease. How can such genetic knowledge benefit patients? Can we really offer personalized medicine in which the genetic profile in an individual provides useful information on stroke risk? Genetic testing for polygenic diseases is already being developed, and indeed, some gene tests for cardiovascular disease have already been marketed, with the individual purchasing a test online and sending off a saliva swab for DNA extraction and testing for a number of at-risk SNPs. However, the clinical use of such tests has been questioned. Genetic epidemiological studies have suggested that the sibling relative risk for stroke is approximately 2 to 3, with the higher estimate applying if younger patients are considered. Assuming that the genetic variants identified confer ORs of between 1.1 and 1.2, it has been calculated that 100 to 300 different genetic variants would be needed to explain this degree of risk. Therefore, panels that include only 5 to 10 or so variants will explain only a small proportion of overall disease risk and so be poor disease predictors. The genetic variants for stroke described to date account for only a small proportion of overall stroke risk. Therefore, even when combined, their predictive value is low, and even if an individual has these variants, they may not develop stroke during their lifetime, whereas people without these variants could be at risk of stroke. Until we have a more complete understanding of the molecular basis of genetic variation, such predictive testing is likely to provide limited information.
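The reasoning behind the number of variants needed can be sketched with a standard population-genetics approximation. This is not necessarily the calculation used in the cited estimate. It assumes that each variant acts multiplicatively on risk, that variants are independent and combine multiplicatively, that the OR approximates the per-allele relative risk, and that every variant has the same hypothetical risk-allele frequency of 0.3. Under a multiplicative single-locus model, the sibling relative risk contributed by one locus is (1 + w/2)^2, where w = pq(r - 1)^2 / (q + pr)^2 for allele frequency p, q = 1 - p, and relative risk r.

```python
import math

def sibling_risk_per_locus(allele_freq, relative_risk):
    """Sibling relative risk from one locus under a multiplicative model."""
    p, q = allele_freq, 1 - allele_freq
    w = p * q * (relative_risk - 1) ** 2 / (q + p * relative_risk) ** 2
    return (1 + w / 2) ** 2

def variants_needed(target_sibling_risk, allele_freq, relative_risk):
    """Number of identical, independent loci whose combined (multiplied)
    sibling relative risk reaches the target value."""
    per_locus = sibling_risk_per_locus(allele_freq, relative_risk)
    return math.ceil(math.log(target_sibling_risk) / math.log(per_locus))

# Hypothetical allele frequency of 0.3; targets and ORs taken from the text.
for target in (2, 3):
    for odds_ratio in (1.1, 1.2):
        print(target, odds_ratio, variants_needed(target, 0.3, odds_ratio))
```

With these assumptions the counts run from roughly a hundred to several hundred variants, the same order of magnitude as the published estimate, and far more than the 5 to 10 variants on a typical marketed panel.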
There are also questions about the usefulness of such personalized testing in patients with complex diseases such as stroke. We already know many risk factors for stroke, such as hypertension and smoking, but despite their importance, patient compliance is often suboptimal. Unless there are specific novel treatments for individual genetic variants, it is likely that the advice given to a patient identified as having a high genetic risk of stroke would merely be to adhere more closely to cardiovascular risk-factor prevention, yet it is unclear whether such high-risk patients would indeed do so. Furthermore, there is the possibility that patients deemed to have low genetic risk might pay less attention to general risk-factor prevention and therefore, expose themselves to increased risk. There has also been concern over the psychological consequences of testing.
Therefore, the clinical use of genetic profiling of stroke risk is likely to be some way in the future. In the more immediate future, identifying novel genetic variants may contribute to treating disease by identifying new pathways involved in the pathogenesis of stroke. Using information gained from GWAS to develop novel treatments for complex diseases is beginning to bear fruit in other diseases such as macular degeneration and Crohn's disease [42, 43]. One criticism is that the genetic associations identified are unlikely to be important in view of the small increase in risk (OR) associated with each one. However, it is relevant that the total variance in disease risk explained by genes involved in pathological processes targeted by established, successful drugs, such as statins for hypercholesterolaemia, sulfonylureas for diabetes, and estrogens for bone density, is often small.
One area where personalized genetic medicine may have earlier application is pharmacogenomics. Genetic variation influences drug metabolism, and thus both drug efficacy and risk of drug-related adverse effects (AEs). Pharmacogenomics uses an individual's genotype to assist in choosing therapies and identifying the optimal dose, with the aim of ensuring maximum efficacy with minimal AEs. In addition, it can provide new insights into the mechanisms of drug action, and therefore contribute to the development of new therapeutic agents. As yet, pharmacogenomics has had little effect on routine clinical stroke care in most countries, but two potential applications have been proposed in patients with CVD, including stroke: tailoring anti-platelet therapy with clopidogrel, and tailoring warfarin dosage.
Clopidogrel is widely used for prevention of secondary stroke. It is more effective than aspirin alone, and has similar effectiveness to the combination of aspirin and dipyridamole in long-term secondary prevention. Approximately 5 to 30% of clopidogrel-treated patients exhibit low or no reactivity to clopidogrel, which is referred to as 'clopidogrel resistance.' Clopidogrel requires transformation into an active metabolite by cytochrome P450 (CYP) for its anti-platelet effect. Different CYP isoenzymes are responsible for clopidogrel activation, and among these, CYP2C19 has been found to play a key role. Carriers of at least one CYP2C19*2 reduced-function allele (about 25 to 30% of the population) have a one-third reduction in the active metabolite of clopidogrel compared with non-carriers, whereas the 2% of individuals homozygous for the polymorphism have a much greater reduction. The variant has been associated with a corresponding reduction in platelet inhibition.
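The carrier and homozygote frequencies quoted for CYP2C19*2 can be checked against each other with a short calculation. This is a minimal sketch that assumes Hardy-Weinberg equilibrium and a single reduced-function allele, neither of which is stated in the text; the function name is illustrative.

```python
import math

def carrier_fraction_from_homozygous(homozygous_fraction):
    """Under Hardy-Weinberg equilibrium, derive the allele frequency from the
    homozygote frequency (q squared) and the fraction carrying at least one copy."""
    allele_freq = math.sqrt(homozygous_fraction)
    carriers = 1 - (1 - allele_freq) ** 2
    return allele_freq, carriers

# 2% homozygous, as quoted in the text.
allele_freq, carriers = carrier_fraction_from_homozygous(0.02)
print(f"allele frequency = {allele_freq:.3f}, carriers = {carriers:.1%}")
```

A 2% homozygote frequency implies an allele frequency of about 0.14 and a carrier fraction of about 26%, which sits within the quoted range of 25 to 30%.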
A number of reports have suggested that this variant was associated with increased cardiovascular events in patients on clopidogrel, particularly after coronary stenting, where there is a high risk of stent thrombosis. This led to the US Food and Drug Administration (FDA) announcing in 2010 that clopidogrel would receive a boxed warning in the prescribing information, which cautioned that slow metabolism of clopidogrel was associated with higher cardiovascular event rates, and suggested that genetic testing could identify individuals who were slow metabolizers, thereby allowing physicians to implement 'alternative treatment strategies.' This FDA announcement was controversial. The American Heart Association (AHA) and American College of Cardiology (ACC) issued a consensus statement that concluded 'The evidence base is insufficient to recommend either routine genetic or platelet function testing at the present time'. This controversy has continued, with arguments being made both for and against testing, although these have been primarily applied to treatment in stented individuals and not in non-stented patients with coronary ischaemia or patients with stroke.
A recent systematic review and meta-analysis identified 32 studies of 42,016 patients reporting 3,545 cardiovascular events and 1,413 bleeding events. Six studies were randomized trials with comparison against another treatment (effect-modification design) whereas the remaining 26 studies comprised individuals exposed to clopidogrel with no control arm (treatment-only design). In the analysis of treatment-only studies, individuals with one or more reduced-function CYP2C19 alleles had a lower risk of bleeding, and a higher risk of cardiovascular events. However, when analyses were restricted to studies with 200 or more events, the association was no longer significant, consistent with publication bias. In studies that included a control arm, the CYP2C19 genotype was not associated with modification of the effect of clopidogrel on cardiovascular end points or bleeding.
Even if an association can be shown between a genetic variant and drug efficacy, the most powerful evidence for the use of such a genetic test in clinical practice is a randomized, controlled trial that compares a strategy of modifying treatment, based on the results of genetic screening, with standard care (that is, no testing). Such trials will not only provide reliable estimates of the effect of genotype on drug response but will also take into account the potential effect of the testing procedure itself on patient outcomes.
Most studies of the CYP2C19 polymorphisms have been in coronary artery disease, but in a genetic substudy of the Clopidogrel for High Atherothrombotic Risk and Ischemic Stabilization, Management, and Avoidance (CHARISMA) study, about 20% of the 4,819 genotyped patients had ischemic stroke at entry. Carriers of CYP2C19 loss-of-function alleles did not have an increased rate of ischemic events, but did have a significantly lower rate of any bleeding when on clopidogrel.
The story of a personalized pharmacogenomic approach to clopidogrel therapy illustrates the difficulties in implementing such an approach, and the need for rigorous assessment of its benefit and effect on clinical outcome. This does not mean that the approach may not prove useful in the longer term. The CYP2C19 loss-of-function alleles account for only 12% of the variability in response to clopidogrel, whereas 72% of the variability is heritable. Genetic testing of a wider range of variants that better captures this heritability is likely to provide more predictive information.
Warfarin reduces the risk of stroke in patients with non-valvular atrial fibrillation, and is also used in patients with other cardiac lesions associated with a high risk of cardioembolism, including prosthetic heart valves and mural thrombus. The high variability in drug response means that blood monitoring of coagulation with the international normalized ratio (INR) is required, but there is a narrow therapeutic index, with a risk of thrombosis with under-anti-coagulation and of hemorrhage with over-anti-coagulation. Warfarin is the second leading drug-related reason for emergency department visits, and the most frequently cited reason for drug-related mortality. Therefore, methods to improve the safety and effectiveness of warfarin therapy would have wide application.
A number of genetic variants have been shown to influence warfarin levels. Warfarin is a racemic mixture, with S-warfarin being more potent than R-warfarin. CYP2C9 is a hepatic drug-metabolizing enzyme in the CYP450 superfamily, and is the primary metabolizing enzyme of S-warfarin. Two common CYP2C9 allozymes have markedly reduced enzyme activity. It was shown that patients who required a low final dose of warfarin on the basis of INR values often carried one or two of these two common CYP2C9 variant alleles, and were at increased risk for hemorrhage during warfarin therapy, presumably because they metabolize the drug more slowly. Vitamin K epoxide reductase complex subunit 1 is the target for warfarin-based anticoagulants, and SNPs in the VKORC1 gene are also associated with the dose of warfarin required to achieve a target INR value. Together, the CYP2C9 and VKORC1 polymorphisms explain about 30 to 40% of the total variation in the final warfarin dose.
To assess the added contribution of testing for these genetic variants, the clinical and genetic data from 4,043 patients were used to create a dose algorithm that was based on clinical variables only, and an algorithm in which genetic information was added to the clinical variables. This was validated in a second cohort of 1,009 subjects. Use of the pharmacogenetic algorithm produced dose recommendations that were significantly closer to the required stable therapeutic dose than those derived from the clinical algorithm, particularly for patients who required unusually high or low warfarin doses.
Supporting this approach, a multicenter national study prospectively collected data on rate of hospitalization over a 6-month period in 896 patients receiving warfarin genotyping, and compared this with 2,688 matched historical controls. The genotyped cohort had 31% fewer hospitalizations overall, and 28% fewer hospitalizations for bleeding or thromboembolism.
In February 2010, the FDA revised the label on warfarin, providing genotype-specific ranges of doses, and suggesting that genotypes be taken into consideration when the drug is prescribed. CYP2C9 and VKORC1 genotyping is now clinically available, as are web-based algorithms incorporating genotypic information to calculate dosage. Despite this, implementation of this genetic testing in clinical practice has been slow. It has been argued that a truly randomized trial is required to confirm the effect on clinical management and to examine cost-effectiveness. In addition, new anticoagulants with a wider therapeutic range and acting by different mechanisms have been shown to be as effective as warfarin in stroke prevention, and may be preferred for patients for whom warfarin therapy is difficult or has anticipated side-effects.
Promise of pharmacogenomics
The cases of clopidogrel and warfarin demonstrate the promise of pharmacogenomics, but also the difficulties in evaluating the clinical effect of such an approach. With the increasing reliance on evidence-based medicine and large randomized trials, it is likely that a similar degree of evidence will be required before such approaches are widely implemented. Nevertheless, this is an area that could significantly improve targeting of therapies and reduce side-effects.