As CA readers are aware, Stephan Lewandowsky of the University of Western Australia recently published an article relying on fraudulent responses at stridently anti-skeptic blogs to yield fake results.
In addition, it turns out that Lewandowsky misrepresented explained variances from principal components as explained variances from factor analysis, a very minor peccadillo in comparison. In a recent post, I observed inconsistencies resulting from this misdescription, but was then unable to diagnose precisely what Lewandowsky had done. In today’s post, I’ll establish this point.
Rather than conceding the problems of his reliance on fake/fraudulent data and thanking his critics for enabling him to withdraw the paper, Lewandowsky has instead doubled down by not merely pressing forward with publication of results relying on fake data, but attempting to “manufacture doubt” about the validity of criticisms, including his most recent diatribe – to which I respond today.
In a post several days ago, I temporarily considered other issues in the Lewandowsky article beyond the reliance on fake responses, reporting on my then progress in trying to replicate results – not easy since his article omitted relevant methodological information. Separate from this, Roman Mureika and I (but especially Roman) have made further progress in trying to replicate the SEM steps – more on this later.
I reported a puzzle about explained variance results as reported in Lewandowsky’s article – results that could not be replicated using a standard factor analysis algorithm. Roman Mureika also tried to figure out the discrepancy without success. I pointed out that Lewandowsky’s factor analysis did not seem to have much effect on the downstream results where the real problems lay.
The reason why we were unable to replicate Lewandowsky’s explained variance from factor analysis was that his explained variance results were not from factor analysis, but from the different (though related) technique of principal components, a technique very familiar to CA readers.
The clue to reverse engineering this particular Lewandowsky misrepresentation came from a passing comment in Lewandowsky’s blog in which he stated:
Applied to the five “climate science” items, the first factor had an eigenvalue of 4.3, representing 86% of the variance. The second factor had an eigenvalue of only .30, representing a mere 6% of the variance. Factors are ordered by their eigenvalues, so all further factors represent even less variance.
Eigenvalues are a term that arises from eigen decomposition; for a correlation matrix, which is symmetric and positive semi-definite, the singular values from a singular value decomposition (SVD) are the same as the eigenvalues. As an experiment, I did a simple SVD of the correlation matrix (the first step in principal components) and was immediately able to replicate this and other Lewandowsky results, as detailed below. Lewandowsky’s explained variance did not come from the factors arising from factor analysis, but from the eigenvectors arising from principal components. No wonder that we couldn’t replicate his explained variances.
But instead of conceding these results, Lewandowsky fabricated an issue regarding the number of retained eigenvectors in this analysis, a point with which I had not taken issue and which did not affect the criticism, as I’ll detail.
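The percentages in the quoted comment are themselves consistent with a principal components reading. The eigenvalues of a correlation matrix of p items sum to p (its trace), so each eigenvalue divided by p is that component's share of total variance. A minimal R sketch of the arithmetic follows, using only the two eigenvalues quoted above. The variable names are illustrative, and the random data at the end is an assumed stand-in, not the survey data.

```r
# Eigenvalues quoted for the five "climate science" items
p <- 5
eig_quoted <- c(4.3, 0.30)

# Share of total variance under principal components:
# eigenvalue divided by the number of items
share <- eig_quoted / p
round(100 * share)   # 86 and 6, matching the quoted percentages

# Why dividing by p works: the eigenvalues of any correlation matrix sum to
# its trace, which is the number of variables.
# Checked here on random stand-in data (NOT the survey data).
set.seed(1)
X <- matrix(rnorm(200 * p), ncol = p)
eig_sim <- svd(cor(X))$d
sum(eig_sim)         # equals p up to rounding error
```

Factor analysis proportions are not computed this way: they come from sums of squared loadings, which exclude the variance assigned to each item's uniqueness.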
Factor Analysis of “Free Market” Questions
Lewandowsky’s first example was his factor analysis of the free market questions, where he reported results using two factors as follows:
For free-market items, a single factor comprising 5 items (all but FMNotEnvQual) accounted for 56.5% of the variance; the remaining item loaded on a second factor (17.7% variance) by itself and was therefore eliminated.
In my previous post, I described my emulation, in which I confirmed that factor analysis with two factors did indeed result in loading of the second factor on the second question, but, as a puzzle, observed that I had been unable to replicate the reported variances:
I tested this using the R function factanal, the loading report from which is shown below. The loading on the first factor so calculated (48.7%) is somewhat lower than Lewandowsky’s report (56.6%). In addition, while the second factor is indeed dominated by the second question, there is also some loading from the third question onto this factor. Such a simple first step should be directly replicable. The calculation here is close, but only close.
I’m not sure that this (or the other factor analysis calculations) matter in the end. My interpretation of Lewandowsky’s sketchy description is that the “latent variable” used downstream is a weighted average of the 5 retained series (with relatively even weights – see his Table 2.)
pc=factanal(lew[,1:6],factors=2)
pc$loadings
....
               Factor1 Factor2
SS loadings      2.924   1.095
Proportion Var   0.487   0.183
Cumulative Var   0.487   0.670
However, if one does the first step of a (correlation) principal components analysis – a SVD on the correlation matrix – one does get the results claimed by Lewandowsky as shown in the following code:
R=cor(lewdat[,1:6])
pc=svd(R)
pc$d/sum(pc$d) #proportion of explained variance
# 0.5653 0.1774 0.0873 0.0735 0.0570 0.0395
pc$d
# 3.392 1.065 0.524 0.441 0.342 0.237
In this example, I had done the calculation with the same number of retained factors as Lewandowsky but had got a different result. In his blog post, Lewandowsky ignored this example, focusing instead on a different example where I had reported results using a different number of factors (though this was not the only experiment that I had done.) Lewandowsky pretended that it was the difference in number of retained factors that explained the difference, but this was untrue – even though he knew or ought to have known that the discrepancies arose in this case where there was no difference in retained factors. Why did Lewandowsky conceal this in his blog post?
As noted in the earlier post, this discrepancy didn’t seem that important for the downstream analysis.
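The gap between the two kinds of "explained variance" can be illustrated on simulated data. The sketch below is an assumption-laden illustration, not a reconstruction of Lewandowsky's calculation: the six items, the two latent factors and the loadings are all invented, and only base R functions (`cor`, `svd`, `factanal`) are used.

```r
# Simulated stand-in data: six items driven by two latent factors.
# The loadings below are invented for illustration only.
set.seed(42)
n <- 500
p <- 6
L <- cbind(c(0.8, 0.7, 0.6, 0, 0, 0),
           c(0, 0, 0, 0.7, 0.7, 0.6))
scores <- matrix(rnorm(n * 2), ncol = 2)
noise_sd <- sqrt(1 - rowSums(L^2))   # keeps each item at unit variance
noise <- matrix(rnorm(n * p), ncol = p) %*% diag(noise_sd)
X <- scores %*% t(L) + noise

# Principal components: eigenvalues of the correlation matrix over their sum
# (the sum equals p for a correlation matrix)
pca_share <- svd(cor(X))$d / p

# Factor analysis: sum of squared loadings per factor over the number of items
fa <- factanal(X, factors = 2)
fa_share <- colSums(unclass(fa$loadings)^2) / p

round(cumsum(pca_share)[2], 3)   # variance share of the first two components
round(sum(fa_share), 3)          # variance share of the two factors
```

In this simulation the principal components share is the larger of the two, since components absorb item-specific variance that factor analysis assigns to the uniquenesses. Reporting one as if it were the other therefore produces numbers that a factor analysis cannot reproduce, whatever the number of retained factors.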
Factor Analysis of the Conspiracy Questions
Lewandowsky’s incorrect reporting of explained variance for factor analysis of the conspiracy questions had an identical explanation. Lewandowsky reported his results as follows:
For conspiracist ideation, two factors were identified that accounted for 42.0 and 9.6% of the variance, respectively, with the items involving space aliens (CYArea51 and CYRoswell) loading on the second factor and the remaining 10 on the first one (CYAIDS and CYClimChange were not considered).
Again, I had been unable to replicate these results. I made no attempt to replicate Lewandowsky’s procedure for deciding how many factors should be used, an aspect of the algorithm not discussed in the article itself.
Once again, the explained variance reported by Lewandowsky can be obtained by an SVD on the correlation matrix (principal components), i.e. from the first two eigenvectors, not the first two factors.
R=cor(lewdat[,c(13:15,17:24,26)])
pc=svd(R); round(pc$d/sum(pc$d),3)
#  0.420 0.096 0.071 0.066 0.061 0.058 0.049 0.047 0.044 0.039 0.032 0.017
round(pc$d,3)
#  5.036 1.154 0.848 0.793 0.732 0.692 0.593 0.568 0.533 0.463 0.383 0.205
Factor Analysis on CO2 Questions
On the factor analysis of the five CO2 questions, Lewandowsky said only:
The 5 climate change items (including CauseCO2) loaded on a common factor that explained 86% of the variance; all were retained.
As with the other two factor analyses, I was unable to replicate Lewandowsky’s reported explained variance. Once again, it can be shown that Lewandowsky’s explained variance claim comes not from the first factor, but from the first eigenvector from principal components. Factor analysis with a single factor yields an explained variance of 82.7%. By coincidence, factor analysis using two factors yields explained variance of 86%.
Puzzled by the unexplained inconsistencies, I speculated that Lewandowsky might have inadvertently reported the explained variance from two factors, rather than the explained variance from one factor. (I certainly wasn’t taking a position on the merit of either decision; I was simply trying to figure out what Lewandowsky had done. I had asked at Lewandowsky’s blog that he provide source code to clarify various questions, but he did not respond.) I reported this speculation as follows:
I wasn’t able to replicate Lewandowsky’s claim at all. I got explained variance of 43.5% in the first factor (versus Lewandowsky’s 86%). I notice that the explained variance for two factors was 86%: maybe Lewandowsky got mixed up between one and two factors. If so, would such an error “matter”? In Team-world, we’ve seen that even using contaminated data upside down is held not to “matter”; perhaps the same holds in Lew-world, where we’ve already seen that use of fake and even fraudulent data is held not to “matter”.
As it turns out, I was right to question Lewandowsky’s explained variance claims, but my speculation as to the source of the 86% value was incorrect: as noted above, Lewandowsky had misrepresented explained variance from principal components eigenvectors as explained variance from factor analysis.
Lewandowsky’s Manufactured Doubt
At his blog, Lewandowsky fulminated:
How could Mr. McIntyre fail to reproduce our EFA?
Simple: In contravention of normal practice, he forced the analysis to extract two factors. This is obvious in his R command line:
In this and all other EFAs posted on Mr. McIntyre’s blog, the number of factors to be extracted was chosen by fiat and without justification.
In Lewandowsky’s FM factor analyses, it was Lewandowsky’s decision to use two factors. Lewandowsky did not describe his methodology for deciding on two factors and, in my blog post, I made no attempt to guess. The reason why I was unable to replicate his results for FM and Conspiracy questions (or CO2 questions) had nothing to do with the number of retained factors. It was to do with Lewandowsky’s reporting of explained variance using principal components.
It’s hard to believe that Lewandowsky was unaware of this at the time that he wrote his blog post. If so, his suggestion that I had “rigged” my reanalysis to locate discrepancies is contemptible even by Lewandowsky’s standards:
Or else, he intentionally rigged his re-“analysis” so that it deviated from our EFA’s in the hope that no one would see through his manufacture of doubt.
The reason why I was unable to replicate these results was that Lewandowsky had presented explained variance from principal components – not from factor analysis.
Retained Eigenvectors and Factors
The decision on how many eigenvectors/principal components to retain has been a wheelhouse issue at Climate Audit.
Steig (in Steig et al 2009) had misunderstood the commentary of North et al 1982 on principal components and had additionally and incorrectly reified Chladni patterns as physical patterns. We observed that Steig’s retention of only three eigenvectors had incorrectly spread observed warming in the Antarctic peninsula onto the rest of the continent.
Retained principal components also (famously) arose in discussion of MBH, where Mann and others have created massive disinformation. Mann had notoriously used a highly biased principal components algorithm (not that a centered principal components method was necessarily correct either.) Using a centered principal components method, the bristlecone hockey stick, said by Mann to be the “dominant pattern of variance”, was demoted to a lower order PC (the PC4). Why a PC4 of a regional network should be a unique and magic thermometer for the entire world was never explained. In the NAS panel report, they recommended that bristlecones be avoided in reconstructions (regardless of which PC.) One would have thought that this would have put a silver bullet in the MBH reconstruction. However, the climate science community has proved unequal to the small task of rejecting Mann’s use of contaminated upside-down Tiljander data. All these issues linger on.
In the wake of the original error in Mann’s PC algorithm, Mann proposed (ex post) a retention policy that went deep enough to include the bristlecones. There was no mention of this method in the original paper. Nor did this algorithm yield the pattern of retained eigenvectors in other networks. Nor has Mann ever explained how he calculated the number of retained eigenvectors. Instead, Mann and his associates merely threw mud, seemingly to the great pleasure of the climate community.
Like Mann, Lewandowsky did not describe his retention policy in his original article. Nor was it even described in his blog post. Instead, Lewandowsky curiously described the effect of a common retention as “illustrative”:
One core aspect of EFA is that the researcher must decide on the number of factors to be extracted from a covariance matrix. There are several well-established criteria that guide this selection. In the case of our data, all acknowledged criteria yield the same conslusions.
For illustrative purposes we focus on the simplest and most straightforward criterion, which states one should extract factors with an eigenvalue > 1. (If you don’t know what an eigenvalue is, that’s not a problem—all you need to know is that this quantity should be >1 for a factor to be extracted). The reason is that factors with eigenvalues < 1 represent less variance than a single variable, which negates the entire purpose of EFA, namely to represent the most important dimensions of variation in the data in an economical way.
Lewandowsky observed that application of the eigenvalue-greater-than-1 criterion to the CO2 questions resulted in the retention of 1 factor. The second eigenvalue of the FM questions was greater than 1, but in that case, Lewandowsky eliminated the question loading on the second factor.
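The eigenvalue-greater-than-1 rule is simple enough to apply directly to the eigenvalues already reported in this post. The following is a minimal sketch: the list and its names are illustrative, and for the CO2 items only the two eigenvalues quoted from Lewandowsky's blog are available (the remaining ones are stated there to be smaller still).

```r
# Eigenvalues as reported earlier in this post
# (co2: only the two values quoted from Lewandowsky's blog)
eig <- list(
  free_market = c(3.392, 1.065, 0.524, 0.441, 0.342, 0.237),
  conspiracy  = c(5.036, 1.154, 0.848, 0.793, 0.732, 0.692,
                  0.593, 0.568, 0.533, 0.463, 0.383, 0.205),
  co2         = c(4.3, 0.30)
)

# Eigenvalue-greater-than-1 rule: count the eigenvalues above 1
n_retained <- sapply(eig, function(e) sum(e > 1))
n_retained   # free_market 2, conspiracy 2, co2 1
```

Note that the rule operates on eigenvalues of the correlation matrix, i.e. on principal components output, which is the same quantity from which the reported explained variances can be reproduced.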
Again, at this point, I’m not taking exception to any particular criterion. Nor did I argue that two factors should be retained for the CO2 questions. I was merely trying to guess how Lewandowsky had obtained his explained variance results.
Lewandowsky ended his blog post with the following accusation:
There are two explanations for this obvious flaw in Mr. McIntyre’s re-“analysis”. Either he made a beginner’s mistake, in which case he should stop posing as an expert in statistics and take a refresher of Multivariate Analysis 101. Or else, he intentionally rigged his re-“analysis” so that it deviated from our EFA’s in the hope that no one would see through his manufacture of doubt.
As noted above, the reason why I was unable to replicate Lewandowsky’s explained variance claims was because they were incorrect – they came from the eigenvectors (from principal components) and not the factors (from factor analysis). The person who appears to be in need of Multivariate 101 is Lewandowsky himself.
Lewandowsky’s attempt to divert attention to the number of retained factors was a fabricated diversion on several counts. I made no attempt to emulate Lewandowsky’s unreported retention procedure. I used two factors to analyse the FM question not because of a “fiat” on my part, but because Lewandowsky himself had used that number. Nor did I propose the use of two factors for the third (CO2) analysis: I merely noticed that 86% explained variance arose with two factors. As matters turned out, Lewandowsky had made a different error – one that I had not guessed in my previous post, but one that pervaded his other factor analyses as well.
Lewandowsky repugnantly alleged that I might have “intentionally rigged his re-“analysis” so that it deviated from our EFA’s in the hope that no one would see through his manufacture of doubt.”
Lewandowsky’s results are bogus because of his reliance on fake and fraudulent data, not because of replication issues in his factor analysis. Nor do I believe that there should be any “doubt” on this point. In my opinion, the evidence is clearcut: Lewandowsky used fake responses from respondents at stridently anti-skeptic blogs who fraudulently passed themselves off as skeptics to the seemingly credulous Lewandowsky.
That Lewandowsky additionally misrepresented explained variances from principal components as explained variances from factor analysis seems a very minor peccadillo in comparison (as I noted at the time.) On this last point, to borrow Lewandowsky’s words, there seem to be two alternatives. Either Lewandowsky “made a beginner’s mistake, in which case he should stop posing as an expert in statistics and take a refresher of Multivariate Analysis 101”.
Or else Lewandowsky, cognizant of how thoroughly compromised his results are by fake/fraudulent data, rather than thanking his critics for spotting defects and withdrawing his study, has decided to double down by trying to manufacture doubt about criticism of the degree to which his data and results have been thoroughly compromised in the “hope that no one would see through his manufacture of doubt.”