Arterial Spin Labeling Calibration between Sites and Comparability
Thomas T. Liu
Matthias J.P. van Osch
Matthias Günther
Xavier Golay
Arterial spin labeling (ASL) has been in use for more than two decades,1 yet it is still not part of standard clinical practice.2 There are many reasons for this, and it would be pointless to single out one culprit. Several aspects of ASL have nevertheless made it particularly difficult to calibrate, and reproducibility studies were lacking for many years, until recent developments made it possible to test several sequences across different sites. The main reasons for the lack of clinical interest include the many implementations of ASL and their confusing acronyms; the low signal-to-noise ratio (SNR) of the technique, which makes it difficult to use in noncooperative patient populations; the lack of established phantoms or other calibration tools; and the uncertainty of the clinical community toward a self-declared quantitative magnetic resonance imaging (MRI) method and its potential clinical uses.2 This situation is now changing. One of the main drivers has been the recent implementation of commercially available MRI ASL sequences, together with the increased availability of MRI systems operating at high magnetic field strength (≥3T).3
This chapter will summarize the current state of the art in published multicenter ASL studies and will provide a small overview of the existing initiatives designed to standardize ASL protocols for most neurologic applications.
Reproducibility
The success of MRI-based multicenter reproducibility studies often depends on the consistency of the examination parameters. This includes everything from the briefing of the subjects to the length of the protocol and the scan order, the subject positioning and fixation in the scanner, and the operator-dependent planning of the image sections to be acquired. If any of these steps is not carried out consistently, the chance of achieving good reproducibility decreases almost linearly with the number of sites and subjects scanned within the trial.
Patient briefing matters for patient comfort, and a lack of comfort translates directly into more severe motion artifacts. Equally important is the length of the protocol: generally, the older or the more disabled the patient population, the shorter the protocol should be. It is also advisable to perform the most motion-sensitive acquisitions at the beginning of the protocol, when the patient is still very stable. This has to be weighed against the design of the trial itself, because some of the primary outcome measures might not be motion-sensitive acquisitions.
Patient positioning is important for the postprocessing of the data, because differences in slice angulation can easily affect the subjective reading by radiologists or change quantitative measures, in particular for advanced MRI pulse sequences. This problem is exacerbated in multislice acquisitions, especially when acquired with a gap between slices, and is of less importance in true three-dimensional isotropic acquisitions, such as high-resolution anatomical imaging. ASL has been, at least historically, mostly acquired in a multislice anisotropic fashion.4 Furthermore, some sequences, such as the quantitative signal targeting by alternating radiofrequency labeling of arterial regions (QUASAR) sequence,5 do not cover the entire brain and, as such, might present an even more difficult challenge when comparing multiple scans from multiple centers. In addition, some methods use an echo-planar imaging (EPI) readout, in which distortion occurs mostly along the phase-encode dimension. Thus, a different choice of phase-encode direction alone can lead to considerably different distortions within the same patient. Similarly, the tilting of the head relative to the main magnetic field can lead to significantly different distortions.
Quality Analysis for Arterial Spin Labeling
Very little, if anything, has been published on quality analysis of ASL data. As a subtraction technique that relies heavily on signal averaging, ASL is by definition very sensitive to motion artifacts. To tackle this issue, three main approaches have been proposed. The first is to automatically discard any pair of subtracted (control-label) images that shows clear motion-related artifacts.6,7 This can be done based on the mean signal intensity of the subtracted images over time, because motion-corrupted images will show a much larger average signal than the one related only to perfusion.7 Because a large number of averages is usually acquired (typically 30 to 50, see Chapter 15), removing the averages that are most corrupted by motion might be one solution. Here, a slight reduction in the number of averages (by about 20%) might still provide an acceptable perfusion-weighted image with a relatively small reduction in SNR. Figure 34.1 shows the effects of averaging on the standard deviation of the perfusion signal in babies, a patient population highly subject to motion artifacts. The disadvantage of this technique is that, if it were used in a clinical trial, it might induce a bias because the SNR could be related to the type or severity of the disease, which might be correlated with patient motion. The second possibility is to coregister successive averages, similarly to what is done in functional MRI.8 Such algorithms might be difficult to use because ASL is usually acquired using an anisotropic acquisition scheme (see Chapter 16), and, therefore, interpolation errors might produce more spurious signal than the registration can correct. In addition, subtle changes in the ASL images between label and control might lead to biased corrections, rendering the whole dataset unusable. The final approach is the use of prospective motion correction (currently available on every clinical platform), in which motion is determined during acquisition and corrected at run time by using either a navigator approach9 or an optical tracking method.10 Navigator methods, such as the one proposed by Thesen et al.,9 have the disadvantage of a substantial time lag in the motion correction response (e.g., 1 repetition time (TR)), while optical methods require extra hardware and are not available at most clinical sites, rendering the method difficult to use for multicenter studies.
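The first approach, discarding control-label pairs whose subtracted signal is abnormally large, can be sketched in a few lines. This is a minimal illustration and not the method of the cited studies: the function name, the use of the mean absolute difference as the per-pair summary, and the median-plus-scaled-MAD threshold are all assumptions made for the example.

```python
import numpy as np

def reject_motion_pairs(control, label, n_mads=3.0):
    """Discard control-label pairs with an abnormally large subtracted signal.

    control, label: arrays of shape (n_pairs, ...), one image per repetition.
    n_mads: hypothetical threshold, in scaled median absolute deviations.
    Returns (perfusion_weighted, keep_mask, relative_snr).
    """
    diff = np.asarray(control, dtype=float) - np.asarray(label, dtype=float)
    # One summary value per pair: mean absolute difference over all voxels
    # (an assumption; the text only speaks of the mean subtracted signal).
    per_pair = np.abs(diff).reshape(diff.shape[0], -1).mean(axis=1)
    median = np.median(per_pair)
    # 1.4826 scales the MAD to a standard deviation for Gaussian data.
    mad = 1.4826 * np.median(np.abs(per_pair - median))
    keep = per_pair <= median + n_mads * mad
    # Average only the retained pairs into a perfusion-weighted image.
    perfusion_weighted = diff[keep].mean(axis=0)
    # With uncorrelated noise, SNR scales with the square root of the
    # number of averages, so this is the SNR relative to keeping all pairs.
    relative_snr = np.sqrt(keep.sum() / keep.size)
    return perfusion_weighted, keep, relative_snr
```

Under the usual assumption that SNR scales with the square root of the number of averages, discarding 20% of the pairs leaves about 89% of the original SNR (the square root of 0.8), which is why the loss is described as relatively small. The bias concern raised above still applies: the number of retained pairs, and hence the SNR, can differ systematically between patient groups.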
Apart from motion correction, no real analysis of the artifacts and other errors present in ASL perfusion-weighted or calculated cerebral blood flow (CBF) maps has been proposed. In particular, the lack of a good perfusion phantom has hindered the development of proper quality analysis, and most comparison studies (presented later in the chapter) have relied on repeated scanning of a set of healthy individuals, usually graduate students and young postdoctoral fellows from several sites. Using volunteers for comparative assessment of perfusion parameters between sites bears the risk of bias owing to biologic variation among the volunteers, and it is well known that factors such as caffeine intake, hydration, time of day, and physical activity, among others, all have an effect on human physiology. In addition, no procedure published thus far accounts for the differences among manufacturers' software or for software modifications during longitudinal studies.
The main assumption, however, is that the measured parameter, the CBF, is a physiologic parameter, independent of the acquisition strategy (Chapter 16) and quantification model (Chapter 15). Some of the studies that compared acquisition methods, field strength, or quantification procedures are summarized in the next section.
Single-Site Reproducibility Studies
The vast majority of papers published about ASL did not try to address the problem of reproducibility (for a review, see Golay et al.4). However, a few studies have been published lately that examined the different pulse sequences usually available on clinical platforms.
Wang et al.11 looked at the reproducibility of ASL experiments in the context of power calculations for pharmacologic MRI. Their main finding was that, with the precision of their pseudo-continuous ASL (pCASL) sequence, detection of a 15% change in CBF with 90% power, given 15% variation between repeated measurements, would require approximately 20 subjects. Another study showed similar results, requiring 7 to 15 subjects per group in a crossover design (i.e., using the same subjects in two sessions), 6 to 10 subjects in a within-session design (i.e., using subjects scanned within a single session), and 20 to 41 subjects in a between-groups design (i.e., using patients and controls) for the detection of a 15% change in CBF, depending on the region of interest.12 Henriksen et al.13 found that ASL would be the preferred method if a low within-subject variability is required, such as in the case of a drug effect in a patient population, as compared with phase-contrast MR angiography, dynamic contrast-enhanced perfusion, and oxygen-based positron emission tomography. These data were also in line with another study that found that ASL (pCASL in this case) provides a reliable whole-brain CBF measurement in elderly adults as compared with traditional oxygen-based positron emission tomography perfusion.14
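The order of magnitude of such sample sizes can be checked with the textbook normal-approximation formula for comparing two group means. The sketch below is not the calculation performed in the cited studies; the function name, the two-sided significance level of 0.05, and the two-group design are assumptions made for illustration.

```python
from statistics import NormalDist

def subjects_per_group(effect_pct, sd_pct, alpha=0.05, power=0.90):
    """Approximate subjects per group needed to detect a difference in means.

    effect_pct: change to detect, in percent of baseline CBF.
    sd_pct: standard deviation of the measurement, in the same units.
    alpha: two-sided significance level (assumed, not from the text).
    Uses n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / effect) ** 2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return 2 * ((z_alpha + z_beta) * sd_pct / effect_pct) ** 2

# A 15% change, 15% variation, 90% power: roughly 21 subjects per group.
n = subjects_per_group(15.0, 15.0)
```

With the effect size equal to the measurement variation, the formula depends only on the chosen significance level and power, and gives a value close to the approximately 20 subjects quoted above. Paired designs such as crossover or within-session studies need fewer subjects because between-subject variance drops out, consistent with the smaller numbers reported for those designs.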
Finally, other groups have examined the reproducibility of pCASL,15,16 pulsed ASL,17,18,19 or both,20 all performed at 3T, the de facto field strength for ASL studies. Their conclusions were all similar, with a within-subject variation of between ±10% and ±20% for whole gray matter assessment.
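Within-subject variation of this kind is commonly summarized as a within-subject coefficient of variation computed from repeated scans. The following sketch shows one common definition; the function name and the input layout are assumptions, and the cited studies may have used different estimators.

```python
import numpy as np

def within_subject_cv(cbf):
    """Within-subject coefficient of variation from repeated measurements.

    cbf: array of shape (n_subjects, n_repeats), for example the mean gray
    matter CBF of each subject in each session (hypothetical layout).
    Returns sqrt(mean within-subject variance) / grand mean, as a fraction.
    """
    cbf = np.asarray(cbf, dtype=float)
    # Unbiased variance across the repeats of each subject.
    within_var = cbf.var(axis=1, ddof=1)
    return np.sqrt(within_var.mean()) / cbf.mean()
```

Multiplying the result by 100 expresses it as a percentage, which is the form in which the 10% to 20% range above is quoted.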