Wednesday, September 30, 2015

The electoral calculation: Methodological note – Le Journal de Montreal

The electoral calculation is an interactive site that provides daily estimates of the vote and of the seat count for the October 19 election, based on a model that projects survey results onto each of the 338 federal ridings.

This note outlines the methodology we used to design the survey projection tool entitled “The electoral calculation”, to be published until election day on the websites of the Journal de Montreal and the Journal de Québec and, on occasion, in the pages of these newspapers. In building this site I received valuable assistance from Matthew Pfeffer, statistician-programmer, who performed the statistical analyses and simulations; Alexandre Rousseau of the Journal de Montreal, who designed and programmed the interactive website; and Michel Dumais, director of the Journal's Reviews section.

This section serves several purposes. First, it aims to present all publicly available survey results for the federal election in a clear and easily accessible way, and to complement this presentation with an estimate of the impact of each party's level of popular support in each region on the distribution of seats in the House of Commons.

At all stages of this work, we favored four basic scientific principles. First, our approach emphasizes transparency. Each component of our approach is set out explicitly and quantified, which in principle should allow anyone willing to put in the time and effort to replicate our results. It will also let those who wish to criticize our process know on what basis to do so. Second, to the extent possible, we have sought to ground the main components of our model for converting survey data into seats on a verifiable empirical basis rather than on arbitrary approximations. Third, in presenting our results, we sought to account for the uncertainty surrounding the measurement of the main components of our model. Finally, taking advantage of the opportunities the various platforms offer for presenting graphic, textual and numeric information, we have sought to favor clear, accessible and integrated communication of the results of our analyses. Of course, we are open to feedback from readers and users of the Journal's website, which will help us improve the various components.

A poll is not a prediction

Estimating the popular vote and projecting the number of seats each party could win in an electoral system like Canada’s are delicate exercises that must be approached with prudence and modesty. The first thing the reader should remember is that a survey is not a prediction of the final result on election day but a snapshot of the state of opinion at the time the survey was taken. Obviously, the closer a survey is to election day, the better its results tend to match the eventual outcome, but polls must always be considered indicators of the state of opinion at the time they are conducted and not predictions of the results of an upcoming election.

Several surveys are better than one

While the results of a single survey rarely correspond in every respect to actual voting intentions, most experts agree that the average of polls conducted by reputable firms is more likely to be close to the true distribution of voting intentions than a single isolated survey. This is why the data underlying our projections come from the most recent public opinion polls. The method for collecting and compiling national surveys and regional or provincial data is the one developed by Claire Durand, a professor of sociology at the University of Montreal and a recognized specialist in survey methodology, who shared some data and assisted us in setting up our database.

Whenever one or more new polls are made public, a new estimate is produced, taking as its reference point the last day of fieldwork of the most recently completed survey. During the election campaign, all public surveys conducted by recognized firms whose fieldwork ends on a given date or during the previous six days form the basis of an aggregate estimate. Before the campaign, polls are less frequent and are therefore aggregated over a two-week period. Once a set of surveys is assembled, the results of each are weighted within each province or region according to the temporal proximity of the survey and its actual sample size in that region. Thus, if two surveys were conducted on the same day, their weights will be proportional to their sample sizes.

A second level of weighting within each period takes account of how recent each survey is. During the campaign, the aggregation period is seven days. Relative to a survey of equal size conducted on the last day of the period, a survey conducted one day earlier is weighted 6/7, one conducted two days earlier 5/7, and so on down to 1/7 for the first day of the one-week period. Before the start of the campaign, we used an aggregation period of two weeks, and polls are weighted in the same way (1/14 less for each day further back).
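The two levels of weighting described above can be sketched as follows. This is only an illustration, not the code behind the site: the `Poll` structure and the function names are hypothetical, and multiplying the time weight by the regional sample size is an assumption about how the two criteria are combined.

```python
from dataclasses import dataclass

@dataclass
class Poll:
    days_before_end: int  # 0 = fieldwork ended on the last day of the window
    sample_size: int      # respondents in the region (hypothetical field)
    share: float          # a party's voting intentions in the region, in percent

def time_weight(days_before_end: int, window: int = 7) -> float:
    """Linear decay: 1 on the last day, 1/window on the first day of the window."""
    if not 0 <= days_before_end < window:
        return 0.0  # poll falls outside the aggregation window
    return (window - days_before_end) / window

def aggregate(polls, window: int = 7) -> float:
    """Weighted average share; weight = time weight x sample size (assumption)."""
    weights = [time_weight(p.days_before_end, window) * p.sample_size for p in polls]
    total = sum(weights)
    if total == 0:
        raise ValueError("no poll falls inside the aggregation window")
    return sum(w * p.share for w, p in zip(weights, polls)) / total
```

With `window=14`, the same functions reproduce the pre-campaign rule of 1/14 less per day.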

Changes in the electoral map: from 308 to 338 ridings

Before we get to the projection of survey results onto ridings, it must be emphasized that the ridings for the 2015 election are not exactly the same as in 2011. Every ten years, following the Canadian census, an independent commission redraws the electoral map to reflect the new distribution of the population as closely as possible and to give adequate representation to the provinces whose population has grown most rapidly. The 2011 map had 308 ridings and the 2015 map has 338. Quebec has gained three ridings (in the Montreal area), Ontario 15, and Alberta and British Columbia six each (see here). In other provinces, minor changes were also made to reflect population shifts. Elections Canada provides a database showing how many votes each party received at each polling station and how these polling stations are redistributed among the new ridings. It is from these results, transposed onto the new map, that we calculate the 2011 results in each riding used below. The political effects of these changes in the electoral map were analyzed in a previous post on this blog: “The card (election) mistress Harper”

How do we go from polls to seats?

The method for projecting survey results into seat counts is based on projecting onto each riding the variations in levels of support that can be measured by sampling at higher levels (region, province or group of provinces). Our units of comparison are the provinces or groups of provinces used by recognized pollsters (Atlantic Provinces, Quebec, Ontario, Prairies, Alberta, British Columbia). By definition, a proportion of voting intentions measured in a province represents the average of the vote across all ridings of that province. The same is true for a change in voting intentions. Since we know the actual vote in the last election for each party in each riding and in every province, and since we can estimate by sampling the vote for each party in each province, we can estimate the proportion of votes for each party in each riding by assuming that the change in this measure between the last election and the poll is more or less uniform across all ridings of a province. It is also possible that certain factors specific to individual candidates lead them to receive more or less support than their party's average in the province, and we take this possibility into account as well.

We therefore seek to establish the parameters that link these measures according to the following functions (difference or ratio):

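Difference: $$\mathrm{VoteEst2015}_{ijk} = f\left[\left(\mathrm{Vote2011}_{ijk} + (\mathrm{Sond2015}_{jk} - \mathrm{Vote2011}_{jk})\right) + \mathrm{Candidate}_{ik}\right]$$

Ratio: $$\mathrm{VoteEst2015}_{ijk} = f\left[\mathrm{Vote2011}_{ijk} \times \left(\mathrm{Sond2015}_{jk} / \mathrm{Vote2011}_{jk}\right) + \mathrm{Candidate}_{ik}\right]$$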

Where:

$\mathrm{VoteEst2015}_{ijk}$ = Estimated percentage of the vote at the time of the survey in each riding (i) for each party (k).

$\mathrm{Vote2011}_{ijk}$ = Percentage of the actual vote in the 2011 election in each riding (i) for each party (k).

$\mathrm{Sond2015}_{jk}$ = Percentage of voting intentions measured by survey for party k in province j.

$\mathrm{Vote2011}_{jk}$ = Percentage of the actual vote in 2011 for party k in province j.

$\mathrm{Candidate}_{ik}$ = Characteristics specific to the candidate of party k in riding i (we used three variables coded 1 if the attribute is present and 0 if it is absent: Is the candidate the incumbent? Is the candidate, or has the candidate been, a minister? Is the candidate the leader of the party?)

To ground this function on a solid empirical basis, we used a database including all relevant observable variables for all candidates of the five major parties in all ridings for the previous three elections, from 2006 to 2011. The aim is first to compare the ratio-based model with the difference-based one, and then to assess whether the three candidate variables have an effect in the actual data:

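Difference: $$\mathrm{Vote}_{ijkt} = a + b_1 \left(\mathrm{Vote}_{ijk,t-1} + (\mathrm{Vote}_{jkt} - \mathrm{Vote}_{jk,t-1})\right) + b_2\,\mathrm{Incumbent} + b_3\,\mathrm{Minister} + b_4\,\mathrm{Leader}$$

Ratio: $$\mathrm{Vote}_{ijkt} = a + b_1 \left(\mathrm{Vote}_{ijk,t-1} \times (\mathrm{Vote}_{jkt} / \mathrm{Vote}_{jk,t-1})\right) + b_2\,\mathrm{Incumbent} + b_3\,\mathrm{Minister} + b_4\,\mathrm{Leader}$$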

Where:

$\mathrm{Vote}_{ijkt}$ = Percentage of the actual vote in a given election (time t) from 2006 to 2011 in each riding (i) for each party (k).

$\mathrm{Vote}_{ijk,t-1}$ = Percentage of the actual vote in the previous election (time t-1) in each riding (i) for each party (k).

$\mathrm{Vote}_{jkt}$ = Percentage of the vote in a given election (time t) in province j for party k.

$\mathrm{Vote}_{jk,t-1}$ = Percentage of the vote in the previous election (time t-1) in province j for party k.

$a$ is the estimated constant.

$b_1$, $b_2$, $b_3$ and $b_4$ are the estimated coefficients.

The model that best fits the actual data is the one based on differences, in which the coefficients of the “Minister” and “Leader” variables are not statistically significant.

The model retained is therefore:

$$\mathrm{Vote}_{ijkt} = (0.8 \pm 0.2) + (0.947 \pm 0.010) \left(\mathrm{Vote}_{ijk,t-1} + (\mathrm{Vote}_{jkt} - \mathrm{Vote}_{jk,t-1})\right) + (2.2 \pm 0.5)\,\mathrm{Incumbent}$$

NOTE: The estimated coefficients have margins of error, which we take into account in the conversion equation. Summary statistics for this equation are: R = 0.971; R² = 0.943; n = 3,727.

Transposing this equation to the quantities we use for the estimation, we get:

$$\mathrm{VoteEst2015}_{ijk} = (0.8 \pm 0.2) + (0.947 \pm 0.010) \left(\mathrm{Vote2011}_{ijk} + (\mathrm{Sond2015}_{jk} - \mathrm{Vote2011}_{jk})\right) + (2.2 \pm 0.5)\,\mathrm{Candidate}_{ik}$$
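As a purely illustrative calculation using only the central values of the coefficients, take hypothetical figures that are not drawn from any real riding: a party that received 30% in a riding in 2011 and 25% province-wide in 2011, that now polls at 28% in the province, and whose candidate is the incumbent (the candidate term equals 1, on the assumption that it stands for the incumbent indicator retained in the fitted model):

$$0.8 + 0.947 \times \left(30 + (28 - 25)\right) + 2.2 \times 1 = 0.8 + 31.251 + 2.2 = 34.251$$

The projected share in this example is about 34.3%, before any margin of error is taken into account.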

The next step is to introduce a margin of error around the voting intentions measured by survey for each party in each province. We do not use the margin of error provided by the polling firms, because it is based on the full sample and a uniform proportion; we need an error based on the actual sample sizes and the different proportions (NB: within the margin of error, most of the values obtained over a large number of repetitions are concentrated around the center, following a so-called normal distribution).

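$$\mathrm{VoteEst2015}_{ijk} = (0.8 \pm 0.2) + (0.947 \pm 0.010) \left(\mathrm{Vote2011}_{ijk} + \left((\mathrm{Sond2015}_{jk} \pm 1.96\, e_{\mathrm{Sond2015}_{jk}}) - \mathrm{Vote2011}_{jk}\right)\right) + (2.2 \pm 0.5)\,\mathrm{Candidate}_{ik}$$

where: $e_{\mathrm{Sond2015}_{jk}} = \sqrt{p(1-p)} / \sqrt{n}$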

However, there are some exceptions to this equation. Applying this model to some recent results, we found that it predicts the order of the parties and the winner in each riding relatively stably among parties that have a chance of winning, but that it introduces heavy distortions among small parties and exaggerates the effect of small differences in their case. When a party's 2011 vote in the riding is less than twice the difference between its provincial vote in the recent survey and in the 2011 election, we substitute the ratio measure for the difference measure:

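If $\mathrm{Vote2011}_{ijk} < 2\,(\mathrm{Sond2015}_{jk} - \mathrm{Vote2011}_{jk})$, then:

$$\mathrm{VoteEst2015}_{ijk} = (0.8 \pm 0.2) + (0.947 \pm 0.010) \left(\mathrm{Vote2011}_{ijk} \times \frac{\mathrm{Sond2015}_{jk} \pm 1.96\, e_{\mathrm{Sond2015}_{jk}}}{\mathrm{Vote2011}_{jk}}\right) + (2.2 \pm 0.5)\,\mathrm{Candidate}_{ik}$$

where: $e_{\mathrm{Sond2015}_{jk}} = \sqrt{p(1-p)} / \sqrt{n}$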
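The choice between the difference and ratio forms can be sketched as below, using only the central values of the coefficients and ignoring the margins of error. The function name and its arguments are hypothetical, the condition is transcribed literally from the rule above, and the candidate term is treated as a simple incumbent indicator, which is an assumption.

```python
# Central values of the fitted coefficients quoted in the text.
A, B1, B_INC = 0.8, 0.947, 2.2

def project(vote2011_riding, vote2011_prov, poll2015_prov, incumbent=False):
    """Return (estimated share in percent, method used) for one party in one riding."""
    swing = poll2015_prov - vote2011_prov
    if vote2011_riding < 2 * swing:
        # Small base relative to the provincial swing: scale proportionally.
        base = vote2011_riding * (poll2015_prov / vote2011_prov)
        method = "ratio"
    else:
        # Default case: add the provincial swing to the riding's 2011 result.
        base = vote2011_riding + swing
        method = "difference"
    return A + B1 * base + B_INC * int(incumbent), method
```

For instance, with hypothetical inputs, a party at 5% in a riding whose provincial support rose from 10% to 20% is handled by the ratio form, whereas adding the 10-point swing directly would triple its riding share.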

For the analysis of an individual survey, the sample size is the actual number of interviews; for an aggregation of polls, we use not the cumulative number but the average number of interviews, to avoid narrowing the margin of error too much.

The next step is to produce 10,000 simulated estimates of the VoteEst2015 variable for each party in each riding (using a procedure called Monte Carlo simulation), which gives us a range of possible outcomes across a very large number of combinations of possible values of the estimated coefficients and of the survey data within their margins of error.

Applying our formula gives percentages for all five major parties in each riding that reflect their relative positions. However, we must adjust the results so that the percentages of the five major parties sum to 100 (other parties, which have only a tiny chance of winning, are excluded from our analysis).
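A short sketch of how the sample-size choice affects the margin of error. It assumes the usual binomial standard error, sqrt(p(1-p)/n), with shares expressed in percent; the function names are hypothetical.

```python
import math

def standard_error(share_percent: float, n: float) -> float:
    """Binomial standard error of a share, in percentage points (assumed formula)."""
    p = share_percent / 100
    return 100 * math.sqrt(p * (1 - p) / n)

def effective_n(sample_sizes) -> float:
    """Average number of interviews across the aggregated polls, not their sum."""
    return sum(sample_sizes) / len(sample_sizes)

def margin_of_error(share_percent: float, sample_sizes, z: float = 1.96) -> float:
    """95% margin of error (z = 1.96) around a share, in percentage points."""
    return z * standard_error(share_percent, effective_n(sample_sizes))
```

With two hypothetical regional samples of 300 and 500 interviews, the average of 400 is used, so the margin around a 50% share is the same as for a single poll of 400 respondents; using the cumulative 800 would give a narrower margin.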

Compiling the 10,000 simulation results in each riding allows us to estimate two important quantities for each party: the average percentage, which corresponds to our estimate of the vote for the party, and the proportion of simulations in which the party comes first, which corresponds to its odds of winning. It is important to note that this estimate of the odds is not in itself a prediction. It is valid only for the period of the survey and depends on the assumption that the movement of votes between the previous election and the time of the survey is more or less uniform across the ridings of a province or region. What must be understood is that the closer two percentages of voting intentions are, the more likely it is, given the errors inherent in our measurements, that the model has incorrectly identified the winner. When the leader's odds of winning are below two-thirds (67%), the race is too tight to identify a clear leader. Between two-thirds and 95%, we can identify a leader but the race remains tight. Between 95% and 100%, we can speak of a clear leader, but shifts in trend beyond the margin of error could alter the situation. When 100% of the 10,000 simulations point in the same direction, we can speak of a clear lead, which can prove insurmountable in many cases.
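The simulation, normalization and odds calculation can be sketched for a single riding as follows. This is a simplified illustration rather than the procedure actually used on the site. It assumes that the ± values on the coefficients are 95% margins of normally distributed errors, that one set of coefficients is drawn per simulation and shared by all parties, that only the difference form is applied, and that the candidate term is an incumbent indicator. The input structure and the regional sample size `n_poll` are hypothetical.

```python
import math
import random

# Central values and margins of the fitted coefficients quoted in the text.
COEFS = {"a": (0.8, 0.2), "b1": (0.947, 0.010), "b_inc": (2.2, 0.5)}

def draw(rng, mean, margin95):
    """Normal draw whose 95% margin of error is `margin95` (assumption)."""
    return rng.gauss(mean, margin95 / 1.96)

def simulate_riding(parties, n_poll, n_sims=10_000, seed=0):
    """parties: name -> (vote2011_riding, vote2011_prov, poll2015_prov, incumbent),
    all shares in percent. Returns (mean_share, odds) keyed by party name."""
    rng = random.Random(seed)
    totals = dict.fromkeys(parties, 0.0)
    wins = dict.fromkeys(parties, 0)
    for _ in range(n_sims):
        a = draw(rng, *COEFS["a"])
        b1 = draw(rng, *COEFS["b1"])
        b_inc = draw(rng, *COEFS["b_inc"])
        raw = {}
        for name, (v_rid, v_prov, poll, incumbent) in parties.items():
            p = poll / 100
            se = 100 * math.sqrt(p * (1 - p) / n_poll)  # percentage points
            poll_draw = rng.gauss(poll, se)
            est = a + b1 * (v_rid + (poll_draw - v_prov)) + b_inc * int(incumbent)
            raw[name] = max(est, 0.0)  # a share cannot be negative
        scale = 100 / sum(raw.values())  # rescale so the parties sum to 100
        shares = {name: value * scale for name, value in raw.items()}
        wins[max(shares, key=shares.get)] += 1
        for name, share in shares.items():
            totals[name] += share
    mean_share = {name: totals[name] / n_sims for name in parties}
    odds = {name: wins[name] / n_sims for name in parties}
    return mean_share, odds

def describe(leader_odds):
    """Map the leader's odds to the four categories described above."""
    if leader_odds < 2 / 3:
        return "too close to call"
    if leader_odds < 0.95:
        return "leader identified, tight race"
    if leader_odds < 1.0:
        return "clear leader"
    return "clear lead"
```

Calling `simulate_riding` with a dictionary of hypothetical parties returns the average share and the odds for each, and `describe(max(odds.values()))` gives the corresponding category.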

A note on constituency polls

In a number of constituencies, polls may be conducted during the election campaign to assess more specifically the voting intentions of voters in that particular constituency. These surveys provide a valuable indication of the state of opinion in these ridings, and we take them into account in the following way. For the day on which a constituency survey is conducted, our estimate is a 50/50 weighting of the survey results (taking their margin of error into account in the simulations) and the voting intentions estimated by our projection model. This avoids giving too much weight to surveys whose quality is sometimes questioned by some experts. For the dates following a constituency survey, we adjust the voting intentions for each party according to the evolution of its support across the province between the day of the survey and the following days. A comprehensive list of constituency polls is available here.
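A minimal sketch of this treatment, with hypothetical function names. The 50/50 blend follows the description above; applying the later provincial change as a simple additive shift is an assumption, since the text does not specify the exact form of the adjustment.

```python
def blend_on_poll_day(riding_poll_share: float, model_share: float) -> float:
    """Equal-weight average of the constituency poll and the model estimate."""
    return 0.5 * riding_poll_share + 0.5 * model_share

def carry_forward(blended_share: float, prov_on_poll_day: float, prov_now: float) -> float:
    """Shift the blended estimate by the provincial change since the poll (assumption)."""
    return blended_share + (prov_now - prov_on_poll_day)
```

For example, a constituency poll at 40% and a model estimate of 30% give 35% on the day of the poll; if the party then rises from 28% to 30% province-wide, the carried-forward estimate becomes 37%.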

Some exceptions for Nanos polls

The firm Nanos Research produces good-quality polls that are regularly reported by CTV and the Globe and Mail. Since early September, this firm's polls have been published daily thanks to a “rolling” sampling method in which a third of the sample is renewed each day, for a total of 1,200 respondents, which allows the firm to deliver new results every day. However, since the sample is in fact fully renewed only every three days, we chose to include this firm's surveys only every three days. Moreover, unlike other pollsters, Nanos does not distinguish the results for Alberta from those of the two other Prairie provinces. We therefore cannot use its survey results for these three provinces.

An evolving instrument and formats

Over the course of the campaign, this post will be updated occasionally to reflect changes in our analysis procedure or in the ways we display the data. Feel free to send us your suggestions for improving the presentation of the data.
