Reader Comments

Difficulties with this study

Posted by Kounios on 02 Feb 2008 at 23:15 GMT

This is an interesting paper, but a careful read reveals a number of problems. This is frustrating, because the study had the potential to be quite informative. Some of the major problems are described below. The most important one is last.

a) Examination of the figures shows that the data are very noisy. This is partly explained by the fact that the number of data points per cell per subject is very low, well below what is usually considered acceptable in EEG/ERP research. In fact, the data were noisy enough that the authors resorted to averaging neighboring electrodes in an attempt to boost statistical power, even though their EEG acquisition system had only 32 electrodes.

b) The authors utilized the basic experimental design, stimuli, and approach of Jung-Beeman et al. (PLoS Biology, 2004). Yet, while solving the same problems, the subjects used by S&B were extremely slow by comparison. In fact, S&B don’t even give the mean reaction times, but it is interesting that their timeout interval was 45 sec and that one subject’s data were excluded because his or her mean impasse times were “peculiar” because they were “too early” (i.e., median of 17.3 sec). In contrast, the subjects of Jung-Beeman et al. (2004) had mean (not median) reaction times under 10 sec. In addition, note that when S&B provided “hints” to subjects who either reached impasse or timed out, the hints did not seem to help them to solve the problems. So it is not clear what S&B’s subjects were doing.

c) S&B used an EEG analysis technique known as Event-Related Synchronization/Desynchronization (ERS/ERD) in which EEG power for each frequency band and each time sample is computed as a ratio using power during a baseline interval. In this case, the baseline interval was the half second before each problem was presented (during a warning interval). From the results of Kounios et al. (Psychological Science, 2006), it is known that brain activity during such a pre-problem interval actually predicts later brain activity during the solving of the problem; in some cases, it actually anticipates later brain activity during problem solving. And this pre-problem brain activity predicts what type of problem-solving strategy (and associated brain activity) the subject will have during problem solving. So, by using the pre-problem interval as a baseline, S&B may have wiped out many of the strongest and most interesting brain activations while artificially creating others. In contrast, Jung-Beeman et al. (2004) used absolute EEG power without such baselining. Jung-Beeman’s EEG finding of a burst of gamma band activity over the right anterior temporal lobe associated with sudden insight was replicated by them with a very different measure – fMRI. S&B did not find this right temporal gamma burst, even though they used the same compound remote associate problems as Jung-Beeman et al. with a very similar procedure. The fact that S&B used a very similar procedure and a problematic analysis technique and did not replicate the basic Jung-Beeman et al. finding (which was itself replicated with fMRI) casts doubt on the accuracy of all of S&B’s EEG findings. The authors could fix this problem by (a) collecting more data and (b) reanalyzing their data using the wavelet method used by Jung-Beeman et al. instead of using the ERS/D method (with its inappropriate prestimulus baseline).
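
For readers unfamiliar with the measure, the baseline dependence described in (c) can be illustrated with a minimal sketch. It assumes the common percent-change form of ERS/ERD, (power - baseline) / baseline x 100; the function name `erd_ers_percent` and the toy power values are hypothetical and are not taken from S&B's analysis.

```python
def erd_ers_percent(power, baseline_len):
    """Percent power change relative to the mean of the first baseline_len samples.

    Assumed form: 100 * (power - baseline) / baseline.
    """
    baseline = sum(power[:baseline_len]) / baseline_len
    return [100.0 * (p - baseline) / baseline for p in power]


# Two hypothetical trials with identical power during problem solving (4.0)
# but different power in the pre-problem (baseline) interval.
trial_low_baseline = [2.0, 2.0, 4.0, 4.0]
trial_high_baseline = [4.0, 4.0, 4.0, 4.0]

low = erd_ers_percent(trial_low_baseline, baseline_len=2)
high = erd_ers_percent(trial_high_baseline, baseline_len=2)

print(low)   # [0.0, 0.0, 100.0, 100.0]
print(high)  # [0.0, 0.0, 0.0, 0.0]
```

In this toy case the same absolute task-period power appears either as a 100% increase or as no change at all, depending only on the pre-problem interval. That is the concern when baseline activity is systematically related to what follows.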

Responses to your comments

joydeep replied to Kounios on 04 Feb 2008 at 17:44 GMT

Thanks for the comments. Here are my immediate responses to your critical comments:

1. Your comment
the data were noisy enough that the authors resorted to averaging neighboring electrodes in an attempt to boost statistical power, even though their EEG acquisition system had only 32 electrodes.

Our Response:
I could not understand what was meant by “ … the data were noisy enough that the authors resorted to averaging technique.”
First of all, no averaging was done in the data analysis, i.e. spectral power was calculated at each time and each frequency point for each electrode location, followed by statistical comparison, again for each time point, frequency, and electrode location. And in order to avoid the multiple comparison problem and to control the Type I error, these three-dimensional matrices of F-values (time x frequency x electrode) were used as an input to a nonparametric randomization test (Osipova et al., J Neuroscience, 2006).
The TFR plots were averaged only for visual inspection and also to show the link between the cluster and TFR.
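
The general idea of a cluster-based randomization test can be sketched as follows. This is a simplified illustration under stated assumptions, not the analysis used in the paper: it works on one dimension (time points only) instead of time x frequency x electrode, uses a paired t statistic with random sign flips, and all function names, the threshold of 2.0, and the simulated data are hypothetical.

```python
import math
import random


def paired_t(diffs):
    """One-sample t statistic of per-subject condition differences at one point."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)


def cluster_sums(tvals, threshold):
    """Sum t values over each run of adjacent points with |t| > threshold and equal sign."""
    sums, current, sign = [], 0.0, 0
    for t in tvals:
        s = (t > threshold) - (t < -threshold)  # +1, -1, or 0
        if s != 0 and s == sign:
            current += t
        else:
            if sign != 0:
                sums.append(current)
            current, sign = (t, s) if s != 0 else (0.0, 0)
    if sign != 0:
        sums.append(current)
    return sums


def max_cluster_mass(data, threshold):
    """data[subject][point] holds condition differences; returns the largest |cluster sum|."""
    n_points = len(data[0])
    tvals = [paired_t([subj[i] for subj in data]) for i in range(n_points)]
    return max((abs(c) for c in cluster_sums(tvals, threshold)), default=0.0)


def cluster_permutation_p(data, threshold=2.0, n_perm=500, seed=0):
    """Compare the observed maximum cluster mass with its sign-flip permutation distribution."""
    rng = random.Random(seed)
    observed = max_cluster_mass(data, threshold)
    exceed = 0
    for _ in range(n_perm):
        flipped = []
        for subj in data:
            sign = rng.choice((-1.0, 1.0))  # randomly swap the two conditions per subject
            flipped.append([sign * v for v in subj])
        if max_cluster_mass(flipped, threshold) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Simulated differences: 12 subjects, 20 time points, an effect at points 8 to 12.
sim = random.Random(1)
data = [[sim.gauss(0.0, 1.0) + (1.5 if 8 <= i < 13 else 0.0) for i in range(20)]
        for _ in range(12)]
observed, p_value = cluster_permutation_p(data)
print(observed, p_value)
```

Only one statistic per permutation (the largest summed cluster) enters the null distribution, which is how this family of tests handles multiple comparisons. The observed cluster sum is judged against sums computed in exactly the same way on the permuted data.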


Your comment
b) The authors utilized the basic experimental design, stimuli, and approach of Jung-Beeman et al. (PLoS Biology, 2004). Yet, while solving the same problems, the subjects used by S&B were extremely slow by comparison.

Our Response:
One of the primary aims of this study was to study the neural correlates of mental impasse (MI), so we purposefully decided to give “sufficient” time to work on the problem. MI can only be properly registered when the participants have seriously tried, as opposed to reporting MI immediately after the problem was given. Actually we asked the participants to try seriously! This twist in the design might contribute to an increase in reaction time, but I would not categorize the group of participants as very slow.


Your comment:
In fact, S&B don’t even give the mean reaction times, but it is interesting that their timeout interval was 45 sec and that one subject’s data were excluded because his or her mean impasse times were “peculiar” because they were “too early” (i.e., median of 17.3 sec).

Our Response:
See Table 2 and the supplementary tables for the numbers.
See the earlier comments – we believed that if the MI button was pressed too early, the participant had not exhausted the possible set of alternatives. This is not an unrealistic assumption, we hope. It is true that we lost a few trials, but what we kept truly reflected MI.

Your comment:
In addition, note that when S&B provided “hints” to subjects who either reached impasse or timed out, the hints did not seem to help them to solve the problems. So it is not clear what S&B’s subjects were doing.

Our Response:
See Table 2 which shows the percentages of correct and incorrect trials with and without hint.
Typically, the participants correctly solved 38.8% (mean, SD = 9.6) of all compound remote associate task trials without a hint and gave an incorrect solution to 11.2% (SD = 8.0). After hint presentation, 39.4% (SD = 8.1) of all solutions were correct and 4.7% (SD = 6.1) of the hints led to an incorrect solution. 55.9% (SD = 11.9) of all post-hint trials could not be solved within the provided time limit. These are clearly stated at the beginning of the Results section.


Your comment:
c) S&B used an EEG analysis technique known as Event-Related Synchronization/Desynchronization (ERS/ERD) in which EEG power for each frequency band and each time sample is computed as a ratio using power during a baseline interval. In this case, the baseline interval was the half second before each problem was presented (during a warning interval). From the results of Kounios et al. (Psychological Science, 2006), it is known that brain activity during such a pre-problem interval actually predicts later brain activity during the solving of the problem; in some cases, it actually anticipates later brain activity during problem solving. And this pre-problem brain activity predicts what type of problem-solving strategy (and associated brain activity) the subject will have during problem solving. So, by using the pre-problem interval as a baseline, S&B may have wiped out many of the strongest and most interesting brain activations while artificially creating others. In contrast, Jung-Beeman et al. (2004) used absolute EEG power without such baselining. Jung-Beeman’s EEG finding of a burst of gamma band activity over the right anterior temporal lobe associated with sudden insight was replicated by them with a very different measure – fMRI. S&B did not find this right temporal gamma burst, even though they used the same compound remote associate problems as Jung-Beeman et al. with a very similar procedure. The fact that S&B used a very similar procedure and a problematic analysis technique and did not replicate the basic Jung-Beeman et al. finding (which was itself replicated with fMRI) casts doubt on the accuracy of all of S&B’s EEG findings. The authors could fix this problem by (a) collecting more data and (b) reanalyzing their data using the wavelet method used by Jung-Beeman et al. instead of using the ERS/D method (with its inappropriate prestimulus baseline).

Our Response:
First, what we believed important is the change in spectral power, not the absolute power – on prior inspection we did find wide variations in raw power across subjects. The normalized power (with respect to baseline, or resting activity) is a standard measure of brain oscillations and also offers easier interpretation than the raw power values. Further, baseline-corrected power offers an excellent technical advantage over raw power: instantaneous raw power estimates are a nonlinear function of the data, and raw power usually follows a chi-square distribution if the raw EEG amplitudes are normally distributed. So one has to be really cautious when using standard parametric statistics on the raw power values. However, if one corrects the raw power with respect to the baseline power, the differences in instantaneous power values are normally distributed. This would subsequently justify the application of parametric statistics. See the paper by Kiebel et al. (Human Brain Mapping, 2005) for a nice treatment of this issue.
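
The distributional point can be checked in a small simulation. This is only a sketch under simplifying assumptions that are not from the paper: amplitudes are independent standard normal samples, power is the squared amplitude, and task and baseline intervals have the same variance. It compares the skewness of raw power with the skewness of the task-minus-baseline difference, so it shows symmetry only and does not establish full normality.

```python
import random


def skewness(values):
    """Sample skewness: third central moment divided by variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(0)
n_samples = 20000

# Squared Gaussian amplitudes follow a chi-square distribution (1 degree of freedom).
task_power = [rng.gauss(0.0, 1.0) ** 2 for _ in range(n_samples)]
baseline_power = [rng.gauss(0.0, 1.0) ** 2 for _ in range(n_samples)]

# Baseline correction by subtraction (an assumed form of the correction).
difference = [t - b for t, b in zip(task_power, baseline_power)]

skew_raw = skewness(task_power)
skew_diff = skewness(difference)
print(skew_raw, skew_diff)
```

Raw power comes out strongly right-skewed, while the difference is close to symmetric around zero. Whether the difference is close enough to normal for a given parametric test would still need to be checked on real data.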

On complex demodulation or wavelet:
Different analyses such as Morlet wavelet, Hilbert transform based complex demodulation, or STFT are largely equivalent, because they all amount to a linear convolution with some filtering kernel (see Kiebel et al., Human Brain Mapping, 2005; Bruns, J Neurosci Meth, 2004). The Hilbert transform based complex demodulation technique offers excellent temporal resolution, so it seems very appropriate for studying time-varying changes in brain oscillations.
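
The equivalence claim can be illustrated numerically for one simple case. The sketch below is not the Hilbert-based pipeline described here: it assumes plain complex demodulation (shift the frequency of interest to 0 Hz, then smooth with a window) and compares it with filtering by a windowed complex sinusoid, which is a wavelet-like kernel with a Hann window instead of a Gaussian one. The sampling rate, frequency, window length, and toy signal are all hypothetical.

```python
import cmath
import math


def demodulate_then_smooth(x, freq, fs, window):
    """Shift freq down to 0 Hz, then low-pass with a weighted moving window."""
    shifted = [v * cmath.exp(-2j * math.pi * freq * n / fs) for n, v in enumerate(x)]
    half = len(window) // 2
    return [sum(window[k] * shifted[n - half + k] for k in range(len(window)))
            for n in range(half, len(x) - half)]


def filter_with_kernel(x, freq, fs, window):
    """Apply a single linear filter whose kernel is a windowed complex sinusoid."""
    half = len(window) // 2
    kernel = [w * cmath.exp(-2j * math.pi * freq * (k - half) / fs)
              for k, w in enumerate(window)]
    return [sum(kernel[k] * x[n - half + k] for k in range(len(window)))
            for n in range(half, len(x) - half)]


fs = 250.0    # sampling rate in Hz (assumed)
freq = 10.0   # frequency of interest in Hz (assumed)
length = 51   # window length in samples (assumed)
hann = [0.5 - 0.5 * math.cos(2 * math.pi * k / (length - 1)) for k in range(length)]
window = [w / sum(hann) for w in hann]

# Toy signal: a 10 Hz oscillation whose amplitude doubles halfway through.
signal = [(1.0 if n < 250 else 2.0) * math.cos(2 * math.pi * freq * n / fs)
          for n in range(500)]

power_demod = [abs(v) ** 2 for v in demodulate_then_smooth(signal, freq, fs, window)]
power_kernel = [abs(v) ** 2 for v in filter_with_kernel(signal, freq, fs, window)]
max_difference = max(abs(a - b) for a, b in zip(power_demod, power_kernel))
print(max_difference)
```

The two outputs differ only by a unit-magnitude phase factor at each sample, so the power estimates agree up to floating-point error. Differences between methods in practice come from the choice of kernel shape and width, not from the name of the transform.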

The aim of this study was to examine the spatiotemporal neural dynamics during different stages of insightful problem solving. Without causing any offence to the earlier seminal studies on neural correlates of insight, we strongly felt that a binary division based on subjectively reported insight and noninsight tells us little about the actual cognitive processes involved in solving these problems (Weisberg, personal communication). Researchers have spent too much time on the Aha! phenomenon but have not paid proper attention to the neurocognitive mechanisms associated with different characteristic features of insightful problem solving.

We did mention a few reasons why our results are a bit different from those of Jung-Beeman et al. See the paper for details.

RE: Responses to your comments

Kounios replied to joydeep on 05 Feb 2008 at 17:39 GMT

Thanks for your replies, Joy. Below I respond to these. My replies are in capital letters, to make it easier to keep all this straight. - JK

1. Your comment
the data were noisy enough that the authors resorted to averaging neighboring electrodes in an attempt to boost statistical power, even though their EEG acquisition system had only 32 electrodes.

Our Response:
I could not understand what was meant by “ … the data were noisy enough that the authors resorted to averaging technique.”
First of all, no averaging was done in the data analysis, i.e. spectral power was calculated at each time and each frequency point for each electrode location, followed by statistical comparison, again for each time point, frequency, and electrode location. And in order to avoid the multiple comparison problem and to control the Type I error, these three-dimensional matrices of F-values (time x frequency x electrode) were used as an input to a nonparametric randomization test (Osipova et al., J Neuroscience, 2006).
The TFR plots were averaged only for visual inspection and also to show the link between the cluster and TFR.

THE TIME-FREQUENCY PLOTS LOOK VERY NOISY. IN THE METHOD SECTION OF THE PAPER (PAGE E1459), IT STATES: "NEXT WE SEARCHED FOR BOTH POSITIVE AND NEGATIVE T-STATISTIC CLUSTERS IN TIME, FREQUENCY AND ELECTRODE SPACE, WHERE WE CONSIDERED ELECTRODES WITH A DISTANCE OF LESS THAN 7 CM AS NEIGHBORS (YIELDING ON AVERAGE 6.8 NEIGHBORS PER CHANNEL). ... WE ASSUMED THAT A ROBUST CLUSTER SHOULD ENCOMPASS AT LEAST 4 NEIGHBORING CHANNELS. ...FOR EACH CLUSTER WE CALCULATED THE SUM OF THE T-TEST STATISTICS AS THE TEST STATISTIC." I UNDERSTAND THAT CHANNELS WERE AVERAGED IN THE FIGURES. BUT IT IS NOT CLEAR TO ME WHAT THE JUSTIFICATION IS FOR SUMMING THE T-SCORES FOR NEIGHBORING CHANNELS RATHER THAN AVERAGING THEM. DOESN'T SUMMING THESE SCORES INFLATE THE STATISTICAL SIGNIFICANCE?


Your comment
b) The authors utilized the basic experimental design, stimuli, and approach of Jung-Beeman et al. (PLoS Biology, 2004). Yet, while solving the same problems, the subjects used by S&B were extremely slow by comparison.

Our Response:
One of the primary aims of this study was to study the neural correlates of mental impasse (MI), so we purposefully decided to give “sufficient” time to work on the problem. MI can only be properly registered when the participants have seriously tried, as opposed to reporting MI immediately after the problem was given. Actually we asked the participants to try seriously! This twist in the design might contribute to an increase in reaction time, but I would not categorize the group of participants as very slow.


Your comment:
In fact, S&B don’t even give the mean reaction times, but it is interesting that their timeout interval was 45 sec and that one subject’s data were excluded because his or her mean impasse times were “peculiar” because they were “too early” (i.e., median of 17.3 sec).

Our Response:
See Table 2 and the supplementary tables for the numbers.
See the earlier comments – we believed that if the MI button was pressed too early, the participant had not exhausted the possible set of alternatives. This is not an unrealistic assumption, we hope. It is true that we lost a few trials, but what we kept truly reflected MI.

I DID NOT SEE THE TABLE IN THE SUPPLEMENTARY ONLINE INFORMATION. MY APOLOGIES. THE REACTION TIMES ARE THERE. FOR LOW RESTRUCTURING (0 AND 1), THE REACTION TIMES ARE COMPARABLE TO THOSE IN JUNG-BEEMAN ET AL. HOWEVER, FOR HIGH RESTRUCTURING (2 AND 3), THE REACTION TIMES WERE, BY COMPARISON, VERY SLOW. EVEN THOUGH JUNG-BEEMAN ET AL. DIDN'T DIRECTLY EXAMINE IMPASSE, IT IS CURIOUS THAT THEIR SUBJECTS, BY COMPARISON TO THE PRESENT DATA, DIDN'T SEEM TO HAVE IMPASSES.


Your comment:
In addition, note that when S&B provided “hints” to subjects who either reached impasse or timed out, the hints did not seem to help them to solve the problems. So it is not clear what S&B’s subjects were doing.

Our Response:
See Table 2 which shows the percentages of correct and incorrect trials with and without hint.
Typically, the participants correctly solved 38.8% (mean, SD = 9.6) of all compound remote associate task trials without a hint and gave an incorrect solution to 11.2% (SD = 8.0). After hint presentation, 39.4% (SD = 8.1) of all solutions were correct and 4.7% (SD = 6.1) of the hints led to an incorrect solution. 55.9% (SD = 11.9) of all post-hint trials could not be solved within the provided time limit. These are clearly stated at the beginning of the Results section.

YES, THE HINTS INCREASED CORRECT SOLUTIONS BY ONLY 0.6%. THE HINTS SEEMED ONLY TO REDUCE INCORRECT RESPONSES, NOT TO HELP SUBJECTS COME UP WITH CORRECT ONES. THE HINTS EFFECTIVELY GOT SUBJECTS TO INCREASE THEIR CONSERVATISM BY GETTING THEM TO WITHHOLD RESPONSES THAT ARE INCORRECT. BUT WHY DIDN'T THE HINTS HELP THEM TO GET CORRECT SOLUTIONS?

Your comment:
c) S&B used an EEG analysis technique known as Event-Related Synchronization/Desynchronization (ERS/ERD) in which EEG power for each frequency band and each time sample is computed as a ratio using power during a baseline interval. In this case, the baseline interval was the half second before each problem was presented (during a warning interval). From the results of Kounios et al. (Psychological Science, 2006), it is known that brain activity during such a pre-problem interval actually predicts later brain activity during the solving of the problem; in some cases, it actually anticipates later brain activity during problem solving. And this pre-problem brain activity predicts what type of problem-solving strategy (and associated brain activity) the subject will have during problem solving. So, by using the pre-problem interval as a baseline, S&B may have wiped out many of the strongest and most interesting brain activations while artificially creating others. In contrast, Jung-Beeman et al. (2004) used absolute EEG power without such baselining. Jung-Beeman’s EEG finding of a burst of gamma band activity over the right anterior temporal lobe associated with sudden insight was replicated by them with a very different measure – fMRI. S&B did not find this right temporal gamma burst, even though they used the same compound remote associate problems as Jung-Beeman et al. with a very similar procedure. The fact that S&B used a very similar procedure and a problematic analysis technique and did not replicate the basic Jung-Beeman et al. finding (which was itself replicated with fMRI) casts doubt on the accuracy of all of S&B’s EEG findings. The authors could fix this problem by (a) collecting more data and (b) reanalyzing their data using the wavelet method used by Jung-Beeman et al. instead of using the ERS/D method (with its inappropriate prestimulus baseline).

Our Response:
First, what we believed important is the change in spectral power, not the absolute power – on prior inspection we did find wide variations in raw power across subjects. The normalized power (with respect to baseline, or resting activity) is a standard measure of brain oscillations and also offers easier interpretation than the raw power values. Further, baseline-corrected power offers an excellent technical advantage over raw power: instantaneous raw power estimates are a nonlinear function of the data, and raw power usually follows a chi-square distribution if the raw EEG amplitudes are normally distributed. So one has to be really cautious when using standard parametric statistics on the raw power values. However, if one corrects the raw power with respect to the baseline power, the differences in instantaneous power values are normally distributed. This would subsequently justify the application of parametric statistics. See the paper by Kiebel et al. (Human Brain Mapping, 2005) for a nice treatment of this issue.

MY PREVIOUS COMMENT STANDS. IN THIS CASE, LOOKING AT THE CHANGE IN ACTIVITY FROM PRESTIMULUS BASELINE DISTORTS THE RESULTS BECAUSE THE PRESTIMULUS ACTIVITY HAS A SYSTEMATIC RELATIONSHIP TO POST-STIMULUS ACTIVITY. THE FACT THAT JUNG-BEEMAN'S EEG RESULTS WERE REPLICATED WITH FMRI, BUT THAT THE S&B PAPER COULDN'T REPLICATE THESE ALREADY-REPLICATED FINDINGS SUPPORTS THE NOTION THAT BASELINING WITH PRESTIMULUS ACTIVITY WARPS THE RESULTS.

On complex demodulation or wavelet:
Different analyses such as Morlet wavelet, Hilbert transform based complex demodulation, or STFT are largely equivalent, because they all amount to a linear convolution with some filtering kernel (see Kiebel et al., Human Brain Mapping, 2005; Bruns, J Neurosci Meth, 2004). The Hilbert transform based complex demodulation technique offers excellent temporal resolution, so it seems very appropriate for studying time-varying changes in brain oscillations.

The aim of this study was to examine the spatiotemporal neural dynamics during different stages of insightful problem solving. Without causing any offence to the earlier seminal studies on neural correlates of insight, we strongly felt that a binary division based on subjectively reported insight and noninsight tells us little about the actual cognitive processes involved in solving these problems (Weisberg, personal communication). Researchers have spent too much time on the Aha! phenomenon but have not paid proper attention to the neurocognitive mechanisms associated with different characteristic features of insightful problem solving.

We did mention a few reasons why our results are a bit different from those of Jung-Beeman et al. See the paper for details.

MY PREVIOUS COMMENTS STAND.

THANK YOU FOR YOUR REPLIES.
JK