Next: Conclusions Up: Permutation Testing Made Practical Previous: Comparison to Other Methods

Test Cases

For an evaluation of the permutation test on actual fMRI data, we used images from eight normal, right-handed subjects, collected during performance of the Stroop colour-word interference task [MacLeod 1991]. All subjects signed an informed-consent form approved by the McLean Hospital Institutional Review Board, and had no history of head injury, psychotropic medication, seizure disorder, substance abuse, or other neurological or psychiatric disorder. Functional scans were acquired on a Signa 1.5T system (General Electric, Milwaukee, Wisconsin) modified by Advanced NMR Systems (Wilmington, Massachusetts).

The paradigm consisted of two 30-second blocks of the task, alternating with three 30-second periods of rest. During the task periods, subjects viewed colour names, each printed in an incongruent colour (e.g. the word 'BLUE' written in red), projected onto a screen in front of the scanner. Subjects were asked to say the name of the display colour; to succeed, they had to suppress the tendency to read the colour name. Stimuli were displayed in lines of six. During each 30-second activation period, six of these lines were presented for 4.5 seconds each, with 0.5-second inter-stimulus intervals.

Fifty T2*-weighted single-shot gradient-echo coronal images in each of 12 slices were acquired (effective TE = 40 ms, TR = 3 s, flip angle 90°, 64×64 matrix, in-plane resolution 3.125 mm, slice thickness 6 mm, slice gap 1 mm). Slices were perpendicular to the plane defined by the anterior and posterior commissures, and covered the region from the central sulcus to the tip of the frontal pole. The images were motion-corrected in k-space using the DART (Decoupled Automated Rotation and Translation) algorithm [Maas & al. 1997].
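The block timing and the acquisition parameters together fix which of the fifty scans fall in task and which in rest. The following is a minimal sketch of that mapping, not the code used in the study. It assumes that scanning begins at the onset of the first rest block and that no hemodynamic delay is modelled; the function name `square_wave` is illustrative only.

```python
# Sketch of the block design as a per-scan square wave.
# Assumptions (not stated in the text): the first scan coincides with the
# start of the first rest block, and no hemodynamic lag is applied.
TR_S = 3.0       # repetition time in seconds (from the text)
N_SCANS = 50     # images per slice (from the text)
BLOCK_S = 30.0   # duration of each rest or task block (from the text)


def square_wave(n_scans=N_SCANS, tr_s=TR_S, block_s=BLOCK_S):
    """Return 0 for scans acquired during rest and 1 for scans during task."""
    wave = []
    for scan in range(n_scans):
        onset_s = scan * tr_s
        block_index = int(onset_s // block_s)  # blocks 0..4: rest, task, rest, task, rest
        wave.append(block_index % 2)           # odd-numbered blocks are task
    return wave


reference = square_wave()
print(reference)
print(sum(reference), "task scans out of", len(reference))
```

With two task blocks between three rest blocks, the run necessarily starts and ends with rest, so 20 of the 50 scans are labelled as task.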

Brain tissue was distinguished from non-brain areas of the image using an automated procedure that examined the median-filtered histogram of voxel intensities averaged over the entire time series, and selected the minimum value of this histogram within the interval between the brain and skull-air peaks. The intensity associated with this histogram minimum was then used as a threshold to identify putative brain voxels. Finally, a region-growing algorithm identified the largest set of connected putative brain voxels, and labelled all areas within or enclosed by this region as brain. Minor corrections to the output of this procedure were implemented by hand for each data set.
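The threshold-selection step of this procedure can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used here: the number of histogram bins, the width of the median filter, and the rule for locating the two peaks (the tallest smoothed bin in each half of the intensity range) are all choices made for the example, and the names `median_filter` and `valley_threshold` are hypothetical. The region-growing and hand-correction steps are not shown.

```python
from statistics import median


def median_filter(counts, width=5):
    """Running median over a histogram; windows are truncated at the edges."""
    half = width // 2
    return [median(counts[max(0, i - half): i + half + 1])
            for i in range(len(counts))]


def valley_threshold(intensities, n_bins=64, width=5):
    """Return the intensity at the histogram minimum between two peaks.

    `intensities` holds one time-averaged intensity per voxel.
    """
    lo, hi = min(intensities), max(intensities)
    if hi == lo:
        raise ValueError("single intensity value; no valley to find")
    bin_width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for value in intensities:
        index = min(int((value - lo) / bin_width), n_bins - 1)
        counts[index] += 1
    smooth = median_filter(counts, width)
    # Assumption: the background peak is the tallest bin in the lower half of
    # the intensity range and the brain peak the tallest in the upper half.
    half = n_bins // 2
    low_peak = max(range(half), key=lambda i: smooth[i])
    high_peak = max(range(half, n_bins), key=lambda i: smooth[i])
    # First bin attaining the minimum smoothed count between the two peaks.
    valley = min(range(low_peak, high_peak + 1), key=lambda i: smooth[i])
    return lo + (valley + 0.5) * bin_width  # centre of the valley bin


# Synthetic, purely illustrative intensities: a dark mode near 10 and a
# bright mode near 100.
background = [v for v in range(0, 21) for _ in range(10 * (11 - abs(v - 10)))]
brain = [v for v in range(70, 131) for _ in range(31 - abs(v - 100))]
threshold = valley_threshold(background + brain)
print(threshold)
```

Voxels whose time-averaged intensity exceeds the returned value would then be the putative brain voxels passed to the region-growing step.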

The permutation test as described above was applied to each data set, restricted to the set of brain voxels. In a parallel procedure, the standard Bonferroni-corrected test was also applied, with the correction factor calculated as the reciprocal of the number of brain voxels (that is, the per-voxel significance level was the nominal level divided by the number of brain voxels). In each case, a simple square wave was used as the ideal time series against which to compute correlations. The probability values output by the permutation test and by the Bonferroni procedure were transformed to z-scores for storage and further computation.
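The Bonferroni correction and the conversion of probabilities to z-scores can be sketched as below. The exact transformation used for storage is not specified in the text; this example assumes two-tailed probabilities mapped through the inverse standard normal distribution, with the sign of the correlation restored afterwards. The function names and the lower clipping bound `P_FLOOR` are hypothetical.

```python
from statistics import NormalDist

P_FLOOR = 1e-12  # hypothetical lower bound that keeps z finite when p is 0


def bonferroni_p(p_uncorrected, n_voxels):
    """Corrected probability: the uncorrected value times the number of tests.

    Comparing this with alpha is equivalent to comparing the uncorrected
    probability with alpha / n_voxels.
    """
    return min(1.0, p_uncorrected * n_voxels)


def p_to_z(p_two_tailed, sign=1):
    """Map a two-tailed probability to a signed z-score (assumed convention)."""
    p = min(max(p_two_tailed, P_FLOOR), 1.0)
    return sign * NormalDist().inv_cdf(1.0 - p / 2.0)


# Illustrative numbers only: 2000 brain voxels, one voxel with p = 1e-5.
corrected = bonferroni_p(1e-5, 2000)
print(corrected, p_to_z(corrected, sign=-1))
```

Under this convention a corrected two-tailed probability of 0.05 corresponds to |z| of about 1.96.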

Table 1 gives descriptive statistics for each test applied to each data set, as well as comparative statistics between the two tests. For each of the two test procedures, the total number of voxels activated at a two-tailed α of 0.05 was calculated. In all cases the permutation test activated more voxels than the Bonferroni test, and in no case did the permutation test omit any voxels that were activated by the Bonferroni test. The increase in activated volume ranged from 7% to 26%.

The total number of activated voxels that were part of clusters was also calculated, where a cluster was defined as any group of more than one adjacent voxel, within a single slice, in which the sign of the activation was uniformly positive or uniformly negative. For the purposes of this computation, diagonal adjacency was allowed. Although the permutation test always increased the total number of activated voxels, as Table 1 shows, it always decreased the proportion of unclustered voxels. In other words, the voxels added by the permutation test tended to be part of activated clusters rather than occurring over widespread regions throughout the image (see Figure 2 for an example).
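The cluster definition above amounts to connected-component labelling within each slice, with eight-neighbour adjacency and the constraint that all members share the same sign. A minimal sketch follows; the encoding of a slice as a grid of +1, -1 and 0 and the name `count_clustered` are assumptions made for the example.

```python
def count_clustered(sign_slice):
    """Count significant voxels that belong to a cluster of more than one voxel.

    `sign_slice` is a 2-D list for one slice: +1 for an activated voxel,
    -1 for a deactivated voxel, 0 for a voxel that is not significant.
    Adjacency includes diagonals, and a cluster must have a uniform sign.
    """
    n_rows = len(sign_slice)
    n_cols = len(sign_slice[0]) if n_rows else 0
    seen = [[False] * n_cols for _ in range(n_rows)]
    clustered = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if sign_slice[r][c] == 0 or seen[r][c]:
                continue
            sign = sign_slice[r][c]
            stack = [(r, c)]
            seen[r][c] = True
            size = 0
            while stack:  # flood fill over same-sign neighbours
                y, x = stack.pop()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < n_rows and 0 <= nx < n_cols
                                and not seen[ny][nx]
                                and sign_slice[ny][nx] == sign):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            if size > 1:
                clustered += size
    return clustered


# Illustrative slice: two clusters of two voxels each and two isolated voxels.
example = [
    [1, 0, 0, -1],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [-1, -1, 0, 0],
]
total = sum(1 for row in example for v in row if v != 0)
print(count_clustered(example), "clustered of", total, "significant voxels")
```

The proportion of unclustered voxels for a data set would then be one minus the ratio of the clustered count to the total, summed over slices.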

Figure 2. Comparison of activations in the prefrontal cortex of Subject 1. Activated voxels are cross-hatched upwards, deactivated voxels downwards. Voxels activated only under the permutation test are highlighted with boxes. Left and right are reversed as per radiological convention. Note how the permutation test extends the left dorsolateral activation into the sulcus between the inferior and middle frontal gyri, and links previously unconnected deactivations in medial orbitofrontal cortex, as well as adding voxels close to other regions of activation.

For the set of voxels that were activated by both tests (which in all cases equalled the set of voxels activated by the Bonferroni test), we compared the activation levels from each test, and produced a count of the number of voxels that were more activated by one test than by the other. Voxels whose probability levels differed between the two tests by an amount less than the resolution of the permutation test (10^-4 in the case of our implementation) were considered equally activated and were thus excluded from these counts. As can be seen in Table 1, the permutation test almost always produced higher levels of activation than the Bonferroni test. Binomial tests comparing the counts were all highly significant.
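The counting rule and the binomial comparison can be sketched as follows. The text does not say whether the binomial tests were one- or two-sided; this example assumes an exact two-sided test against a null probability of 0.5, and treats the lower probability as the stronger activation. The names `compare_tests` and `binomial_two_sided` are illustrative.

```python
from math import comb

RESOLUTION = 1e-4  # resolution of the permutation test (from the text)


def compare_tests(p_perm, p_bonf, resolution=RESOLUTION):
    """Count voxels more strongly activated by each test.

    Voxels whose probabilities differ by less than `resolution` are treated
    as equally activated and left out of both counts.
    """
    perm_stronger = bonf_stronger = 0
    for pp, pb in zip(p_perm, p_bonf):
        if abs(pp - pb) < resolution:
            continue
        if pp < pb:
            perm_stronger += 1
        else:
            bonf_stronger += 1
    return perm_stronger, bonf_stronger


def binomial_two_sided(k, n):
    """Exact two-sided binomial probability of k successes in n trials, p = 0.5."""
    if n == 0:
        return 1.0
    extreme = max(k, n - k)
    tail = sum(comb(n, i) for i in range(extreme, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Illustrative probabilities for five voxels activated by both tests.
p_perm = [0.001, 0.002, 0.010, 0.03000, 0.004]
p_bonf = [0.004, 0.009, 0.030, 0.03005, 0.002]
counts = compare_tests(p_perm, p_bonf)
print(counts, binomial_two_sided(counts[0], sum(counts)))
```

In this toy example the fourth voxel is excluded because its two probabilities differ by less than the resolution, leaving three voxels favouring the permutation test and one favouring the Bonferroni test.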

We also wished to get some idea of the relative strengths of activations of the voxels identified by the two tests. In particular, we wished to determine whether the permutation test selectively identifies weakly activated voxels. To answer this question, for each data set we examined the z-scores (transformed from probability levels) of the set of voxels that were activated more strongly by one test than by the other. (In half of the cases, no voxels were activated more strongly by the Bonferroni test than by the permutation test; the table cells corresponding to these cases are therefore empty.) In each case two comparisons were performed, one on the set of z-scores derived from the Bonferroni test and one on the set of z-scores derived from the permutation test. In all cases analysed, for both comparisons separately, the z-scores of the voxels that were more strongly activated by the permutation test were lower than those of the voxels that were more strongly activated by the Bonferroni test. Thus the permutation test demonstrated an ability to identify weaker activations.

Table 1. [Table body not recoverable from the source. Surviving caption fragment: "... ...nificance of difference in permutation-test z-scores."]
