Wednesday, October 29, 2008

Polling and Data


Everyone's abuzz with the general election coming up. We see cool touch screen maps on CNN and tons of polling data all around us. I just heard that one candidate took a stance on the whole Georgia/Florida water feud to pick up some votes from Floridian oyster farmers. What?! This is really crazy to me. How does the campaign even know these people exist?

My issue isn't with the oyster farmers. It's the polls I question. As a matter of fact, I am a huge huge supporter of informed decision making, and guess what, data is information. Let me contextualize my point of view.

For years I worked with non-profits. Non-profits are notorious for being disorganized and mismanaged*. But the misuse of data and the inaccurate conclusions drawn from poorly collected information struck me from the beginning. I happened to be working for an awesome small organization which has now gone national. Fortunately they were cutting edge: 1) they had a database to track all of their "clients"; 2) the database tracked a lot of different types of data; 3) they understood the value of using this data to make decisions for and about their "clients"; and 4) they were willing to learn. I was fortunate enough to be put in charge of the database, and I was the lead number cruncher. Interestingly enough, I spent most of my time educating my co-workers on how to accurately communicate our results. It was easy to say that 80% of our clients improved when in reality 80% of clients received the instruction and tools which we believed led to improvement. Big difference.
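To make that difference concrete, here is a minimal sketch with entirely made-up numbers (none of these figures come from my old organization). The point is only that "received the program" and "actually improved" use different counts, so the two percentages can sit far apart:

```python
# Hypothetical counts, for illustration only.
clients = 200                 # total clients in the database
received_program = 160        # clients who got the instruction and tools
improved = 112                # clients with a measured improvement at follow-up

# Same denominator, very different claims.
print(f"Received the program: {received_program / clients:.0%}")   # 80%
print(f"Measurably improved:  {improved / clients:.0%}")           # 56%
```

Saying "80% improved" when you only know the first number is exactly the kind of shortcut we had to train ourselves out of.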

But I digress. As I got deeper into the data analysis and our number crunching became more complex, I stumbled upon an anecdote from a college prof. She warned of the dangers of using statistical software to do data analysis. In her day she had to write the code for the stats software herself and then crunch the numbers. It took her years to get the training to do this, and guess what, she knew her stuff. Today, data analysis software is easy to come by and large data sets are often offered for free. This means that more people are crunching numbers and coming to conclusions. Today's statistical conclusions are, in essence, cheaper, and this has affected their quality. Understand this: the people crunching the numbers today don't necessarily have the same training as the people crunching numbers years ago.
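Her point is easy to see when you look at how little code a basic poll calculation takes now. Here is a minimal sketch with hypothetical numbers, using nothing but the textbook normal approximation; note that nothing in it tells you whether the sample itself was any good:

```python
import math

# Hypothetical poll: 1,000 respondents, 52% favor candidate A.
n, p = 1000, 0.52

# Approximate 95% margin of error for a simple random sample
# (normal approximation to the binomial).
moe = 1.96 * math.sqrt(p * (1 - p) / n)
print(f"{p:.0%} +/- {moe:.1%}")   # roughly 52% +/- 3.1%
```

Three lines and you have a headline number. The training my prof was talking about is what tells you when those three lines are the wrong three lines.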

So, I pose this question: do you trust poll results? Every pollster has their own methodology. Everyone has their biases. Is it possible for polls to be used as a political tool to sway the public? I have seen organizations want a good result so badly that they unknowingly distort their conclusions. What is your opinion?

* This isn't a statement about all non-profits. There are many organizations that are awesome businesses, and I think some publicly traded companies could take a hint from these smaller organizations.

2 comments:

Daniel said...

Sure. The bigger questions are: To what extent do I trust them? Do I understand what they're telling me? Can I understand the differences between samples and methods and results?

There are statistics freaks out there who go to great lengths to dissect every polling result ever collected, and thank god for them.

But there's a saying about how the precision of our analysis can exceed the accuracy of our method. No matter what the margin of error is within an individual sample, we're making a mistake if we look at the predictive value of a snapshot and consider it absolute. Particularly in politics.

That said, when you've got lots and lots of polls indicating generally the same result, you can state with great confidence that the outcome is likely to go one way or the other. And when there are lots of polls and they're very close but they return different outcomes, then the answer isn't "a narrow lead for X," but "a toss-up."
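A rough sketch of that aggregation logic, with invented margins (none of these are real polls): average the polls, look at how much they scatter, and call it a lead only when the average clearly exceeds the noise.

```python
import statistics

# Candidate X's margin, in percentage points, from several hypothetical polls.
consistent_polls = [4.0, 5.5, 3.0, 4.5, 6.0]
split_polls = [1.0, -0.5, 0.5, -1.0, 1.5]

def summarize(margins):
    mean = statistics.mean(margins)
    # Standard error of the average margin across the polls.
    se = statistics.stdev(margins) / len(margins) ** 0.5
    verdict = "clear lead" if abs(mean) > 2 * se else "toss-up"
    return verdict, mean, se

for name, polls in [("consistent", consistent_polls), ("split", split_polls)]:
    verdict, mean, se = summarize(polls)
    print(f"{name}: average margin {mean:+.1f} +/- {2 * se:.1f} -> {verdict}")
```

The first set prints a clear lead, the second a toss-up, even though a single poll from either set could have told you almost anything.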

And on the subject of how "you can't trust the polls," one big statement: Polls may be flawed, but once you account for those flaws, statistics can give you some idea of how unlikely a particular outcome is.

This is hugely important, particularly when you consider that the odds of exit polls being so wrong, all in favor of Bush, in three battleground states (Ohio, Pennsylvania and Florida) were about 1 in 660,000. This is the mathematical smoking gun, and so it is in the interest of the cheaters to make sure that citizens adopt a jaded, "oh, you can't trust polls" attitude.
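For what it's worth, the arithmetic behind a "1 in N" claim like that is simple to sketch, even though the inputs below are hypothetical and not the actual exit-poll figures behind the 1-in-660,000 estimate: if you assume the three misses were independent, their individual tail probabilities just multiply.

```python
import math

def normal_tail(z):
    """One-sided tail probability of a standard normal beyond z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical z-scores for how far each state's exit poll missed
# in the same direction (illustrative values only).
z_scores = {"State A": 2.3, "State B": 2.2, "State C": 2.4}

# Under independence, multiply the individual tail probabilities.
combined = math.prod(normal_tail(z) for z in z_scores.values())
print(f"Combined probability under independence: about 1 in {round(1 / combined):,}")
```

Three individually unremarkable misses, all leaning the same way, compound into long odds very quickly.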

Anonymous said...

There are many ways in which one could use numbers to come to various conclusions (which is the main reason stats exist), but you are right: it is hard to make judgments based solely on the precision of a group of numbers gathered for a specific reason (the reason being the biasing factor). Freakonomics has many clear examples of how the manipulation of numbers can lead us to discover some amazing facts, insofar as they reflect the temporary evidence of a comparison. For me, however, the bigger question is not whether the numbers are exact but whether or not reporting these numbers early in the game influenced the outcome of the general election. In other words, whether or not premature polling influenced the final verdict by discouraging the opposition and motivating the winning party.

Another, perhaps more important, question, and a rather personal one: how does the United States have such LOW standards when it comes to the most important act of democracy? In other words, how could we even claim that turnout this year was HIGHER than expected? In Ecuador, my country, we expect the WHOLE country to vote, so any number greater than the figure which encompasses all citizens over 18 years of age is unexpected. There aren't long lines either, because people vote on Sunday and there are more voting locations per capita than there would ever be here, even though the United States is regarded as the nation of nations!