As most readers will know, this Thursday (18th September 2014) residents of Scotland will vote in a referendum on whether Scotland should become independent of the UK. While the No campaign had previously maintained a reasonably healthy lead over Yes, polls of voting intention suggest the race has tightened considerably in recent weeks. In particular, two polls have now shown a larger proportion saying they will vote Yes than saying they will vote No. With a flurry of polls conducted in the last week, each giving slightly different results, I decided to perform a simple meta-analysis of the poll results to estimate the current state of play based on the available evidence.
The data
The meta-analysis includes data from the six most recent polls as of 14th September, all carried out in the preceding week, as listed on the excellent website whatscotlandthinks.org. For each poll, the reported percentages for Yes and No and the total number polled were used to calculate the number stating an intention to vote Yes and the number polled excluding undecided voters, as shown below.
[table]
Paper,Poll company,Poll start,Poll end,Number polled (excl. undecided),Number saying Yes
The Times and The Sun,YouGov,09/09/2014,11/09/2014,1205,571
The Guardian,ICM,09/09/2014,11/09/2014,820,400
The Observer,Opinium,09/09/2014,11/09/2014,992,475
Better Together,Survation,10/09/2014,12/09/2014,844,389
The Sunday Telegraph,ICM,10/09/2014,12/09/2014,642,345
The Sunday Times,Panelbase,10/09/2014,14/09/2014,943,466
[/table]
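The Yes proportions used in the analysis are simply the ratio of the last two columns. As a small Python sketch (not part of the original analysis, which was done in R), the table can be held as a list of tuples and turned into per-poll Yes shares. `counts_from_percentages` is a hypothetical helper showing how published percentages and sample sizes might be converted into counts; the rounding rule it uses is an assumption, since the exact rounding behind the table is not stated.

```python
# Counts from the table above:
# (paper, poll company, number saying Yes, number polled excluding undecideds)
POLLS = [
    ("The Times and The Sun", "YouGov", 571, 1205),
    ("The Guardian", "ICM", 400, 820),
    ("The Observer", "Opinium", 475, 992),
    ("Better Together", "Survation", 389, 844),
    ("The Sunday Telegraph", "ICM", 345, 642),
    ("The Sunday Times", "Panelbase", 466, 943),
]


def counts_from_percentages(yes_pct, no_pct, total_polled):
    """Hypothetical helper: Yes count and decided count from published percentages.

    Rounds to the nearest whole respondent (ties go to the even number, as
    Python's round does). This rounding rule is an assumption.
    """
    yes = round(total_polled * yes_pct / 100)
    decided = round(total_polled * (yes_pct + no_pct) / 100)
    return yes, decided


def yes_proportions(polls):
    """Yes share among respondents who stated a voting intention, per poll."""
    return {paper: yes / decided for paper, _company, yes, decided in polls}


for paper, share in yes_proportions(POLLS).items():
    print(f"{paper:<22} {share:.4f}")
```

The printed shares should match the `proportion` column of the R output at the end of this post; only the Sunday Telegraph/ICM poll puts Yes above 50% among decided voters.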
The analysis
The analysis was conducted in R using the metaprop function in the meta package. This function combines the number of Yes responses and the number polled (excluding undecideds) in each poll to give an overall estimate of the proportion of Yes voters. Two analyses are performed: a so-called fixed-effect meta-analysis and a random-effects meta-analysis. The former assumes that the polls are all estimating the same underlying proportion and differ only because of sampling variation. The latter assumes that each poll is estimating a somewhat different underlying proportion, with these underlying proportions differing, for example, because of differences in design and sample selection methods between the polls.
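For readers without R, the two models can be sketched in Python. This is a minimal re-implementation, not the meta package's code, assuming the method listed at the bottom of the R output at the end of this post: each poll's Yes proportion is logit-transformed, the polls are pooled with inverse-variance weights, and for the random-effects model the between-poll variance tau^2 is estimated with the DerSimonian-Laird method. Function names such as `meta_prop` are hypothetical.

```python
import math

# (poll, number saying Yes, number polled excluding undecideds), from the table above
POLLS = [
    ("The Times and The Sun", 571, 1205),
    ("The Guardian", 400, 820),
    ("The Observer", 475, 992),
    ("Better Together", 389, 844),
    ("The Sunday Telegraph", 345, 642),
    ("The Sunday Times", 466, 943),
]
Z95 = 1.959964  # two-sided 95% standard normal quantile


def logit(p):
    return math.log(p / (1 - p))


def expit(x):
    return 1 / (1 + math.exp(-x))


def inverse_variance(estimates, variances):
    """Inverse-variance weighted mean, its standard error, and the weights."""
    weights = [1 / v for v in variances]
    mean = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    return mean, math.sqrt(1 / sum(weights)), weights


def to_proportion(mean, se):
    """Back-transform a logit-scale estimate and its 95% CI to proportions."""
    return expit(mean), expit(mean - Z95 * se), expit(mean + Z95 * se)


def meta_prop(polls):
    # Each poll's logit(Yes share) and its approximate variance 1/x + 1/(n - x)
    ys = [logit(x / n) for _, x, n in polls]
    vs = [1 / x + 1 / (n - x) for _, x, n in polls]

    # Fixed effect: all polls estimate one common proportion
    fe_mean, fe_se, w = inverse_variance(ys, vs)

    # Cochran's Q and the DerSimonian-Laird estimate of between-poll variance tau^2
    q = sum(wi * (yi - fe_mean) ** 2 for wi, yi in zip(w, ys))
    df = len(polls) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random effects: add tau^2 to every poll's variance before weighting
    re_mean, re_se, _ = inverse_variance(ys, [v + tau2 for v in vs])

    return {
        "fixed": to_proportion(fe_mean, fe_se),
        "random": to_proportion(re_mean, re_se),
        "Q": q,
        "tau2": tau2,
        "I2": max(0.0, (q - df) / q) if q > 0 else 0.0,
    }


for key, value in meta_prop(POLLS).items():
    print(key, value)
```

On these six polls the pooled estimates should closely match the R output (about 0.4859 for the fixed-effect model and 0.4872 for the random-effects model). Working on the logit scale keeps the back-transformed confidence limits between 0 and 1, and the random-effects model spreads weight more evenly across polls because tau^2 is added to every poll's variance.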
The results
The so-called forest plot below shows the results of the meta-analysis. Under the random-effects model, the overall estimated proportion voting Yes is 48.72%, with a 95% confidence interval from 46.82% to 50.62%. Thus, on the basis of these six polls alone (and see the caveats below), the estimated proportion who will vote Yes is just over one percentage point below 50%, while the upper end of the confidence interval shows that the data are also consistent with the ‘true’ proportion being fractionally above 50%.
An interesting observation from the forest plot is that the 95% confidence intervals from the different polls largely overlap with each other (the only pair that does not is Better Together/Survation and The Sunday Telegraph/ICM, and then only narrowly). This illustrates the point that, given the relatively small numbers polled in each, small differences between the poll results could be purely due to sampling error.
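To see why overlapping intervals are expected, here is a rough Python check, offered as an illustration only. It uses the simple normal-approximation (Wald) interval p ± 1.96 sqrt(p(1 - p)/n) rather than the exact Clopper-Pearson intervals reported by R, which are similar for samples of this size. The helper name `wald_interval` is hypothetical.

```python
import math

# (poll, number saying Yes, number polled excluding undecideds), from the table above
POLLS = [
    ("The Times and The Sun", 571, 1205),
    ("The Guardian", 400, 820),
    ("The Observer", 475, 992),
    ("Better Together", 389, 844),
    ("The Sunday Telegraph", 345, 642),
    ("The Sunday Times", 466, 943),
]


def wald_interval(yes, n, z=1.959964):
    """Approximate 95% CI for a proportion using the normal approximation."""
    p = yes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


intervals = {name: wald_interval(yes, n) for name, yes, n in POLLS}

# Pairs of polls whose intervals do not overlap at all
names = list(intervals)
disjoint = [
    (a, b)
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if intervals[a][1] < intervals[b][0] or intervals[b][1] < intervals[a][0]
]

for name, (lo, hi) in intervals.items():
    print(f"{name:<22} [{lo:.4f}; {hi:.4f}]  margin {(hi - lo) / 2:.4f}")
print("non-overlapping pairs:", disjoint)
```

With samples of roughly 650 to 1,200 decided respondents, the margins come out at roughly three to four percentage points either side of each poll's estimate, and the only pair of intervals that fails to overlap is Survation's and the Sunday Telegraph's.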
Interpretation
On the basis of this simple analysis of the six most recent polls, and ignoring those who were undecided, the vote really does look like it is hanging on a knife edge, with a slight advantage still for the No side. Yet more motivation, if it were needed, for the teams on both sides of the argument to campaign hard in the final few days.
Caveats
The simple analysis presented here likely has many flaws. I’m not a pollster, and I’m not even a survey statistician (I’m a biostatistician!).
The meta-analysis treats each poll as if it were a simple random sample, taken only of people willing to state how they will vote. In truth, the poll samples are constructed using more sophisticated survey design techniques, and probably shouldn’t be analysed (as I have done) as if they were simple random samples.
The meta-analysis also completely ignores the still sizable proportion of people polled who currently don’t know which way they will vote. Using results based only on those who have said which way they will vote corresponds to implicitly assuming that the current undecided voters will decide to vote in the same proportions as those who have already made up their mind. This assumption is likely false – it may well be that among those voters currently undecided a larger or smaller proportion will vote yes compared to those currently stating their intentions.
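The implicit assumption can be made concrete with a small sensitivity calculation. In this Python sketch, the number of undecided respondents and the share of them who eventually vote Yes are hypothetical inputs (the table above does not record undecided counts), and for simplicity every undecided respondent is assumed to end up voting. The function name `overall_yes_share` is hypothetical.

```python
def overall_yes_share(yes, decided, undecided, undecided_yes_share):
    """Yes share among all respondents once undecideds have chosen.

    yes, decided: Yes count and number polled excluding undecideds.
    undecided: number of undecided respondents (hypothetical input).
    undecided_yes_share: assumed fraction of undecideds who vote Yes.
    Assumes every undecided respondent ends up voting.
    """
    return (yes + undecided_yes_share * undecided) / (decided + undecided)


# Example with the YouGov counts from the table and a hypothetical 100 undecideds
decided_share = 571 / 1205
same_split = overall_yes_share(571, 1205, 100, decided_share)
leaning_yes = overall_yes_share(571, 1205, 100, 0.6)
print(f"{decided_share:.4f} {same_split:.4f} {leaning_yes:.4f}")
```

Setting `undecided_yes_share` to the decided-only share `yes / decided` leaves the estimate unchanged, which is exactly the assumption made by ignoring undecideds. Any other value pulls the overall Yes share towards that value, by an amount that grows with the number of undecideds.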
R output
For those who are interested, the R output from the meta-analysis is shown below. It gives the proportion from each poll, the 95% CI for each, the weight assigned to each poll (in the fixed- and random-effects analyses), the estimated I^2 (the proportion of variability attributable to true heterogeneity), the estimate of the between-poll variance (tau^2), and a test of whether this variance differs from zero. Based on I^2, around half of the observed variation is attributed to genuine between-poll heterogeneity. The test for heterogeneity is not quite statistically significant at the 5% level (p = 0.073).
                      proportion           95%-CI %W(fixed) %W(random)
The Times and The Sun     0.4739 [0.4453; 0.5025]     22.12      19.16
The Guardian              0.4878 [0.4531; 0.5226]     15.09      15.99
The Observer              0.4788 [0.4473; 0.5104]     18.23      17.57
Better Together           0.4609 [0.4269; 0.4952]     15.44      16.19
The Sunday Telegraph      0.5374 [0.4979; 0.5765]     11.75      13.94
The Sunday Times          0.4942 [0.4618; 0.5266]     17.36      17.16

Number of studies combined: k=6

                      proportion           95%-CI  z p.value
Fixed effect model        0.4859 [0.4726; 0.4991] NA      --
Random effects model      0.4872 [0.4682; 0.5062] NA      --

Quantifying heterogeneity:
tau^2 = 0.0045; H = 1.42 [1; 2.25]; I^2 = 50.3% [0%; 80.3%]

Test of heterogeneity:
     Q d.f. p.value
 10.07    5  0.0734

Details on meta-analytical method:
- Inverse variance method
- DerSimonian-Laird estimator for tau^2
- Logit transformation
- Clopper-Pearson confidence interval for individual studies
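The I^2 and H values, and the p-value of the heterogeneity test, follow directly from Q and its degrees of freedom (k - 1 = 5). The Python sketch below recomputes them from the rounded Q = 10.07 printed above. `chi2_sf` is a hypothetical helper implementing the standard closed-form chi-square upper tail for integer degrees of freedom, used here only to avoid a SciPy dependency.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with integer degrees of freedom df."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: start from the df = 1 tail and add one term per extra pair of df
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def heterogeneity(q, df):
    """I^2, H and the p-value of Cochran's Q test of homogeneity."""
    i2 = max(0.0, (q - df) / q)
    h = math.sqrt(q / df)
    return {"I2": i2, "H": h, "p": chi2_sf(q, df)}


summary = heterogeneity(10.07, 5)
print(summary)
```

This gives I^2 of about 50.3%, H of about 1.42 and p of about 0.073, close to the printed 0.0734 (the small difference comes from using the rounded Q).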