Statistical analysis calls Research 2000 polls into question

Markos Moulitsas fired Research 2000 as the pollster retained by Daily Kos a few weeks ago after R2K fared poorly in “pollster ratings” compiled by FiveThirtyEight’s Nate Silver. At the time I wondered whether Markos reacted a bit harshly, since Silver himself admitted, “The absolute difference in the pollster ratings is not very great.” In addition, some polling experts had raised questions about Silver’s rating system (see also here).

Today Markos published a remarkable analysis of “problems in plain sight” with Research 2000’s polling. Three researchers uncovered “extreme anomalies” in certain results and concluded, “We do not know exactly how the weekly R2K results were created, but we are confident they could not accurately describe random polls.” You should click over and read the whole thing, but here are the anomalies in question:

  1. A large set of number pairs which should be independent of each other in detail, yet almost always are either both even or both odd.

  2. A set of polls on separate groups which track each other far too closely, given the statistical uncertainties.

  3. The collection of week-to-week changes, in which one particular small change (zero) occurs far too rarely. This test is particularly valuable because the reports exhibit a property known to show up when people try to make up random sequences.
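A quick simulation (my own illustration, not part of the researchers' analysis; the specific counts below are made up for scale, not taken from the report) shows why anomalies 1 and 3 are so damning under genuinely independent random sampling:

```python
import random
from math import comb

random.seed(0)

# Anomaly 1: two independently sampled percentages (e.g., men vs. women)
# should share parity (both even or both odd) only about half the time.
pairs = 10_000
same_parity = sum(random.randint(0, 100) % 2 == random.randint(0, 100) % 2
                  for _ in range(pairs))
print(same_parity / pairs)  # ~0.5

# So matching parity in nearly every one of hundreds of pairs is essentially
# impossible by chance. For example, the chance of 775+ matches in 778
# independent pairs:
p_extreme = sum(comb(778, i) * 0.5**778 for i in range(775, 779))
print(p_extreme)  # astronomically small

# Anomaly 3: for honest polls of a stable population, a week-to-week change
# of exactly zero (after rounding to whole percent) should be common.
def poll_pct(n=1200, p=0.45):
    # One simulated poll: n respondents, true support p, reported as a whole percent
    hits = sum(random.random() < p for _ in range(n))
    return round(100 * hits / n)

trials = 2_000
zero_changes = sum(poll_pct() == poll_pct() for _ in range(trials))
print(zero_changes / trials)  # roughly 0.2 -- zero is a very common change
```

In honest tracking polls, in other words, "no change" should show up all the time, and subgroup parities should agree only about half the time; the R2K numbers violated both expectations.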

Markos has renounced “any post we’ve written based exclusively on Research 2000 polling” and asked polling sites to “remove any Research 2000 polls commissioned by us from their databases.”

Based on the report of the statisticians, it’s clear that we did not get what we paid for. We were defrauded by Research 2000, and while we don’t know if some or all of the data was fabricated or manipulated beyond recognition, we know we can’t trust it. Meanwhile, Research 2000 has refused to offer any explanation.

This analysis only covered R2K’s weekly national tracking polls for Daily Kos, but based on the findings I no longer have confidence in R2K’s state polling either, including various Iowa polls I’ve discussed at Bleeding Heartland. Some of those were commissioned by Daily Kos, and others were commissioned by KCCI-TV, the CBS affiliate in Des Moines.

Last year the Strategic Vision polling firm was brought down by convincing allegations that at least some of its polling results had been fabricated. Research 2000 had a much better reputation than Strategic Vision, though. Markos listed some of the news organizations that have commissioned R2K polls. I am seeking comment from KCCI News Director Dave Busiek about the company’s future plans regarding polls, and I’ll update this post when I hear back from him.

Share any relevant thoughts in this thread.

UPDATE: Daily Kos is suing Research 2000 for fraud, and R2K has issued a cease and desist letter to Silver’s blog FiveThirtyEight.com.

WEDNESDAY UPDATE: Mark Blumenthal contacted a forensic data guru for his take on the statistical anomalies. Excerpt:

[Walter] Mebane says he finds the evidence presented “convincing,” though whether the polls are “fraudulent” as Kos claims “is unclear…Could be some kind of smoothing algorithm is being used, either smoothing over time or toward some prior distribution.”

When I asked about the specific patterns reported by Grebner et al., he replied:

None of these imply that no new data informed the numbers reported for each poll, but if there were new data for each poll the data seems to have been combined with some other information—which is not necessarily bad practice depending on the goal of the polling—and then jittered.

In other words, again, the strange patterns in the Research 2000 data suggest they were produced by some sort of weighting or statistical process, though it is unclear exactly what that process was.

JULY 4 UPDATE: Mark Blumenthal reviews what we know so far about this “troubling” story at Pollster.com.

About the Author(s)

desmoinesdem

  • Research 2000's Response

    Research 2000 just responded: http://www.fivethirtyeight.com…

    This is going to be interesting!

  • Lies, Damned Lies, and Statistics

    The problem for me is that you never get to see the code book to determine how variables are defined in the context of the polling matrix. I can’t find anything from R2K on any poll that would let me accurately defend their research in the simplest of terms, i.e. mean, median, mode, and range.

    Maybe I just don’t get “polling”; I didn’t do “polling” as an RA working on NIH projects.

    And don’t get me wrong, I’m not just picking on R2K. The dependency on large firms contracting polling numbers is absolutely ridiculous. It’s an addiction to a game that most people I know can’t even wrap their heads around.

    People who oppose my political views throw numbers at me, as if that nails the argument in their favor.  And when I question them, I find that they are incapable of even defining “mean, median, mode, and range”, let alone getting into parametric vs non-parametric tests.  For example, DmD, what is the standard normal distribution for Chi-square?  C’mon, think girl, you know this one…

    • only took one semester of statistics

      It probably won’t surprise you to learn I’m more of a verbal person than a numbers person. I do know what mean, median, mode and range are, though!

      • Only one semester? That actually surprises me.

        I guess hanging out and working with geekernatural types most of my life has colored my sense of what they actually teach people. 🙂

        I love math; many things can be defined with arithmetic. Some trend analysis can produce a pretty accurate model of aspects of society.

        And I understand that these independent contractors have some proprietary things going on, but I see that as a bit of a problem.

        That’s something to ponder: how to create an open source polling team and methodology that would produce independently verifiable results. Of course it might be a problem for the folks cashing in on the media’s numbers addiction.

        Move over Indymedia!  And where’s Ragbrai08 these days when I need her to help get this off the ground?  Need an ubergeek here!

  • Same Pattern in KCCI Polls

    See here and here:

    http://www.kcci.com/news/22602…

    http://www.kcci.com/politics/2…

    Men and women are always either both even or both odd in every single breakdown.

    • that's a small sample size

      You could easily have that happen by chance in a few polls. It’s more weird when you look at hundreds of polls. But if there is some kind of algorithm creating that pattern, it seems logical that R2K would have used it in state-level polls as well as in national polls.
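      A rough back-of-envelope (my own illustration, treating each male/female pair as independent) shows how quickly “by chance” stops being plausible as the number of breakdowns grows:

```python
# If each male/female pair matches parity with probability ~1/2 under
# independence, the chance that all k pairs match is about 2**-k.
def p_all_match(k):
    return 0.5 ** k

print(p_all_match(3))   # 0.125 -- plausible in one small poll
print(p_all_match(10))  # ~0.001 -- already suspicious for a single poll
print(p_all_match(30))  # ~1e-9 -- essentially impossible across a few polls
```

      Since each poll has many questions and breakdowns, even a handful of polls carries dozens of pairs, which is why the consistent pattern across the KCCI releases is telling.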

    • hang on

      I see what you mean–it’s not just on the favorability questions, it’s on every question in the poll that men and women are either both even numbers or both odd numbers. Weird.

    • also true for KCCI poll of June 2010

      On every question, responses for men and women are either both odd numbers or both even numbers:

      http://www.kcci.com/politics/2…
