March 20, 2014

Guilty Until Proven Innocent


A few years ago, I accepted an engagement with a physician practice that had undergone a Centers for Medicare & Medicaid Services (CMS) audit and, as a result of the extrapolation, was required to repay over a million dollars in what the auditor labeled overpayments. My job was to conduct a full statistical review of the sampling, point estimates, and extrapolation and render an opinion as to the validity of the auditor’s findings. As often happens with my analyses, I found that the auditor had followed neither CMS’s rules nor basic statistical standards of practice. But it gets worse, and here’s the story:

The audit consisted of a review of 100 claims in four strata. As is often the case with these audits, the strata were created without merit or logic. The purpose of stratification is to separate the claims within the universe into bundles so that the characteristics of the data within each bundle are mostly homogeneous; that is, they pretty much look the same while being significantly different from the characteristics of the data within the other bundles. As with so many audits, in this case the strata were created based on paid-to-provider amounts. For example, stratum 1 may be claims with paid-to-provider amounts of under $100, stratum 2 with amounts between $100 and $250, stratum 3 with amounts between $250 and $1,000, and stratum 4 with amounts over $1,000. In doing this, the auditor assumes that the dollar amounts define the claim characteristics, which in many cases simply doesn’t hold up. A more appropriate method would be to separate E&M codes from surgical codes, for example, as each has a totally different mechanism for selecting the correct code, and so a different set of characteristics. Anyway, back to the story.

The other major problem with the analysis was that the auditor chose to select an equal number of claims for each stratum, which works when you extrapolate by stratum, but in this case (as with every other case I have worked on), that was not how it was done. In fact, after the audit was complete, the auditor recombined the results into one single 100-claim sample file and calculated the extrapolation from all 100 sample data points together. This is a huge problem, and it occurs way too often in these audits. Let’s say that 75 percent of your claims had a paid-to-provider amount that placed them into the first stratum, which consists of the lowest-value claims. The sample, by design, would contain 25 percent of these claims, which is one-third of their share of the universe. Then let’s say that the fourth stratum, which contained the highest-value claims (over $1,000 in our example, meaning at least 10 times the value of those in stratum 1), was only 1 percent of the universe. Still, because of the sampling design, this group makes up 25 percent of the sample, which is 25 times its share of the universe. By doing this, you under-sample the lowest-value claims while oversampling the highest-value claims. The result is a sample that is significantly biased towards those high-value claims.
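To put rough numbers on that bias, here is a minimal Python sketch comparing the two ways of extrapolating. Every figure in it is hypothetical and chosen only for illustration: the 10,000-claim universe, the 16 and 8 percent shares for strata 2 and 3, and the mean overpayment per sampled claim in each stratum are my assumptions, not data from the audit. Only the 75 percent and 1 percent shares and the 25 claims per stratum follow the example above.

```python
# Hypothetical numbers for illustration only; none come from the actual audit.
# Each tuple: (claims in universe, claims sampled, mean overpayment per sampled claim in $)
strata = [
    (7500, 25, 10.0),   # stratum 1: 75% of the universe (lowest-value claims)
    (1600, 25, 40.0),   # stratum 2: assumed 16% of the universe
    (800, 25, 150.0),   # stratum 3: assumed 8% of the universe
    (100, 25, 500.0),   # stratum 4: 1% of the universe (highest-value claims)
]


def pooled_extrapolation(strata):
    """Recombine all sampled claims into one file, then extrapolate."""
    universe_size = sum(N for N, _, _ in strata)
    sample_size = sum(n for _, n, _ in strata)
    sample_total = sum(n * mean for _, n, mean in strata)
    # Mean overpayment across the pooled sample, applied to every claim.
    return universe_size * sample_total / sample_size


def stratified_extrapolation(strata):
    """Extrapolate within each stratum, then add the stratum totals."""
    return sum(N * mean for N, _, mean in strata)


pooled = pooled_extrapolation(strata)
stratified = stratified_extrapolation(strata)
print(f"Pooled:     ${pooled:,.0f}")
print(f"Stratified: ${stratified:,.0f}")
print(f"Overstatement factor: {pooled / stratified:.1f}x")
```

With these made-up inputs, the pooled approach extrapolates to $1,750,000 while the stratum-by-stratum approach yields $309,000. The exact gap depends entirely on the assumed figures, but the direction of the error does not: pooling an equal-allocation sample gives the high-value stratum far more weight than it has in the universe.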

In Chaves County Home Health Servs. v. Sullivan, a court of appeals found that “absent an explicit provision in the statute that requires individualized claims adjudications for overpayment assessments against providers, the private interest at stake is easily outweighed by the government interest in minimizing administrative burdens; in light of the fairly low risk of error, so long as the extrapolation is made from a representative sample and is statistically significant, the government interest predominates.” This means that, while extrapolation is considered a reasonable and acceptable method to determine overpayment amounts, it cannot be relied upon unless it is “made from a representative sample and is statistically significant.” Do you see what is missing here? That phrase doesn’t include the term “random,” and that matters, because a sample can certainly be random (every data point has an equal opportunity to be selected) without being representative or statistically significant.

So in this case, using the two most common statistical tests for representativeness (the two-sample t-test and the chi-square goodness-of-fit test), the results were unequivocal: this sample was not representative of the universe of claims, and so it could not meet the standard the court described. This is not my opinion, mind you; it is what the tests showed. Both tests yielded miserable failures, and it was quite obvious why. The average paid-to-provider amount for the sample was three times that of the universe, which means that the estimated overpayment amount per claim was almost certainly higher than it should have been, resulting in an extrapolation that was hugely overstated. This, of course, presented a serious financial injury to the practice. And the reason was that the auditor so oversampled the highest-paid claims and under-sampled the lowest-paid claims, which resulted in an abject failure of the chi-square test. Remember, at this point, my opinion is moot; the facts spoke for themselves.
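For readers who want to see how a chi-square goodness-of-fit check of this kind works, here is a minimal Python sketch. It is not the analysis from the case. The universe shares are hypothetical (75 and 1 percent from the earlier example, with 16 and 8 percent assumed for the middle strata), and the sample is the equal-allocation design of 25 claims per stratum. The 7.815 cutoff is the standard chi-square critical value for three degrees of freedom at the 0.05 level.

```python
# Hypothetical inputs for illustration; not the figures from the actual audit.
universe_share = [0.75, 0.16, 0.08, 0.01]  # assumed share of universe claims per stratum
observed = [25, 25, 25, 25]                # sampled claims per stratum (equal allocation)

sample_size = sum(observed)

# Counts we would expect in the sample if it mirrored the universe.
expected = [share * sample_size for share in universe_share]

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Critical value for df = 4 - 1 = 3 at alpha = 0.05.
CRITICAL_VALUE_DF3 = 7.815

# Caveat: the expected count for stratum 4 is only 1, below the usual
# rule of thumb of 5, so a real analysis might merge strata or use an
# exact test. The sketch only shows the mechanics of the calculation.
print(f"Chi-square statistic: {chi_square:.2f}")
print("Sample mirrors the universe across strata:",
      chi_square <= CRITICAL_VALUE_DF3)
```

With these assumed shares, the statistic comes out at roughly 650, far beyond the 7.815 cutoff, so the hypothesis that the sample reflects the universe’s distribution across strata is rejected. A sample allocated in proportion to the universe (75, 16, 8, and 1 claims) would produce a statistic of zero.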

As expected, the practice lost at both the redetermination and reconsideration levels, mostly because appeals at these levels are a complete and total waste of time and taxpayer money. The administrative law judge (ALJ) hearing was finally scheduled, but 18 months down the line. Fast forward to a few months ago and the practice finally got to plead its case before the ALJ, but not before already paying back nearly half a million dollars, because in the CMS audit universe, a practice is guilty until proven innocent. During the hearing, I presented my statistical findings, and when it came time for the auditor’s statistician to counter, he outright lied to the judge. He stated that his company always computes the extrapolated overpayment amount per stratum and then combines these to get a total. In fact, the sample plan and extrapolation reports specifically stated that this was not the case; the documentation stated that the results were first combined and then the extrapolation was performed on the entire sample. The report also referred to an Excel file that, in fact, supported exactly what the document stated. This was not my first rodeo with this auditing entity, and in the last 50 or so audit reviews that I have conducted, they never did it the way the statistician claimed.

When the judge pointed this out to their statistician, he said something like the following: “OK, your honor, even if Mr. Cohen is correct and the sample is not statistically valid or representative of the universe, what are we supposed to do? Throw it out and start over?” This was one of the few times in my life that I was speechless. My jaw was on the floor and all I could do was sputter, stutter and spit! When the judge asked for my response to that question, I didn’t know what to say, except “heck yes, your honor, you throw it out and start again” – and the judge agreed. In fact, before the end of the hearing he ruled that the extrapolation was invalid and that overpayments, if there were any, would be determined based on the face value of the claims. And the story didn’t end there, as by the time the hearing was over, the auditor conceded more than 50 percent of the denied claims.

What I have shared with you here is not some unique, one-in-a-million case study. This happens all the time to medical providers all across the country. Currently, the ALJ level has a backlog of nearly half a million cases. Anyone wonder why? Because the auditors are basically incompetent and providers know that once an impartial and reasonable person hears their case, the results will almost always be in their favor. At that point, plenty of questions inevitably pop up. Why didn’t anyone at the first two levels of appeal bother to consider the facts of the statistical review? Why did it take nearly three years for this practice to be vindicated in what was a clear case of abusive and egregious overreach by the auditing agency? Why shouldn’t the auditor repay the practice all of its costs involved in defending at least the statistical portion of the analysis? And perhaps of greatest importance, why aren’t any sanctions placed on the auditors when they lie to a judge in order to prevail on their findings?

Until we answer these questions and we engage our leaders to right these wrongs, we are going to continue to disincentivize physicians and other medical providers from treating patients covered by government programs – and for me, as 65 isn’t as far away as it used to be, that has become a grave concern. It is unfortunate that, in order to get paid for providing quality care to a given patient population, healthcare providers have to be subject to reviews from government contractors that appear, at least from my experience, to have a significant imbalance between hubris and ethics.

About the Author

Frank Cohen is the senior analyst for The Frank Cohen Group, LLC. He is a healthcare consultant who specializes in data mining, applied statistics, practice analytics, decision support and process improvement. Mr. Cohen is also a proud member of the National Society of Certified Healthcare Business Consultants (NSCHBC.ORG).

Frank D. Cohen, MPA, MBB

Frank Cohen is the director of analytics and business intelligence for DoctorsManagement, a Knoxville, Tenn.-based consulting firm. Mr. Cohen specializes in data mining, applied statistics, practice analytics, decision support, and process improvement.
