May 31, 2018

Analysis: Bias Alleged in Extrapolation Audits


Improper use of extrapolation statistics is stunningly egregious, according to the author.

Over the years, I have written extensively about the statistical extrapolations that are a large part of the audit process. In general, auditors—both government and private—make use of the extrapolation to significantly increase the overpayment estimates imposed on providers.

For example, an auditor may review, say, 30 claims and find a total of $1,250 in overpayments. Using statistical extrapolation techniques, they project that overpayment onto the universe of claims from which the sample of 30 was drawn. In many cases, this pushes that $1,250 to hundreds of thousands, if not millions, of dollars in overpayment demand.
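The arithmetic behind that jump is simple mean-per-unit extrapolation. Here is a minimal sketch using the $1,250/30-claim figures from the example above; the universe size is a hypothetical number I chose for illustration:

```python
# Mean-per-unit extrapolation, as described above. The universe size
# below is hypothetical; the sample figures come from the example.
sample_size = 30
universe_size = 12_000          # hypothetical number of claims in the frame
sample_overpayment = 1_250.00   # total overpayment found in the sample

mean_overpayment = sample_overpayment / sample_size   # overpayment per sampled claim
point_estimate = mean_overpayment * universe_size     # projected onto the universe

print(f"Mean overpayment per claim: ${mean_overpayment:,.2f}")
print(f"Extrapolated demand:        ${point_estimate:,.2f}")
```

In practice, the demand is usually based not on this point estimate but on the lower bound of a 90 percent confidence interval around it, a convention discussed later in this article; the point estimate is shown here only to illustrate the scale of the multiplication.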

One has to know that, in order to employ extrapolation, everything else has to be done as accurately and precisely as possible. At least, that’s the opinion of the statistical community at large. But apparently, that opinion is not shared by the U.S. Department of Health and Human Services (HHS) or the Centers for Medicare & Medicaid Services (CMS), as evidenced in the hundreds of post-audit extrapolation cases I have either reviewed or worked on directly as the statistician defending the provider.

The patterns I have seen with regard to improper use of extrapolation statistics are stunningly egregious, and while I would like to lay blame on the auditors, such as the Program Safeguard Contractors (PSCs), Recovery Audit Contractors (RACs), Zone Program Integrity Contractors (ZPICs), Unified Program Integrity Contractors (UPICs), etc., the real blame is on the guidelines, which are written in such a way as to give huge latitude to the auditors when it comes to their math and methodologies. These guidelines are contained in Chapter 8 of the Program Integrity Manual (PIM) and one would think that because improper use of extrapolation can impose huge and inaccurate penalties, these guidelines would at least approximate the standards of statistical practice. Yet nothing could be further from the truth, and I have hundreds of examples to prove my case.

The provider, in any of these audits, has the right to appeal the findings of the auditor, and CMS has established five levels of appeal, although most providers will only take advantage of the first three.

The first is called Redetermination, and the review is conducted by a Medicare Administrative Contractor (MAC). Now, get this straight: the first level of appeal is handled by folks who are paid by the same boss as the folks who conducted the audit. You would be hard-pressed to find anyone who would deny that the redetermination is nothing more than a rubber stamp of approval for the auditor’s findings. It is a near-complete waste of time, and even CMS can’t quote a percentage of extrapolations that are excluded at this level. I have never seen an extrapolation thrown out at the first level of appeal.

The second level of appeal is called Reconsideration, which is performed by a Quality Improvement Contractor (QIC), but guess with whom they are contracted? Of course: the same folks who pay the auditor and the MAC. Once again, this is normally just a rubber stamp of the audit and so far, the provider has wasted months and thousands of dollars on a process that is so broken that it shouldn’t exist at all. But it does. In the past 10 years of working on hundreds of post-audit extrapolation mitigation cases, I have had a total of five extrapolations thrown out at this level. And all of these were due to the auditor failing to produce the requisite information.

The third level, and likely the last for most folks, is the Administrative Law Judge (ALJ). This is the first level of appeal that employs an independent and unbiased arbitrator, and my experience has been that well in excess of 50 percent of ALJ hearings have resulted in the discharge of the extrapolation due to fatal flaws found within the statistical process. At face value, it is quite amazing that a truly independent judge would find so many extrapolations faulty enough to be thrown out when this almost never happens at the prior two levels. It not only defies probability, it defies logic; and I can’t think of a stronger case for investigating either incompetence or structured bias at the first two levels.

As an example, in one audit where the face value of $34,000 was extrapolated to $2.4 million, the auditor used the wrong unit for the audit. Here, the auditor chose to use the claim as the unit, but each claim contained so many claim lines representing various unrelated procedure codes that the overall variability in both the distribution of procedures and the paid amounts should have invalidated the use of the sample for extrapolation. But the QIC responded by quoting the PIM, where it says, “In principle, any type of sampling unit is permissible as long as the total aggregate of such units covers the population of potential mis-paid amounts.” Note the words “in principle.” What does that mean? I take it to mean that it might work as long as everything else falls into place. But this same section also says that sampling units may be individual lines within claims, individual claims, or clusters of claims (e.g., a beneficiary). The auditor is saying that, even though using the claim as the unit produces a flawed result, as long as it is mentioned in the PIM, it’s okay. Even though in this case the use of the beneficiary as the unit would have produced a far more accurate result, the PIM protects the auditor from abiding by the standards of statistical practice. It should be noted that in this case, a federal judge agreed with me rather than the auditor.

In another case, it was clear the sample size was way too small, and the distribution of the paid amount for the sample frame (and subsequently the sample) was heavily skewed to the right. When this happens, the normal-approximation reasoning that underpins the extrapolation, via the Central Limit Theorem (CLT), breaks down. For the CLT to justify treating the estimated mean overpayment as approximately normal, certain conditions must hold: the sample must be large enough to overcome the skew in the data, the sampling units must be independent of each other, and when sampling without replacement, the sample generally shouldn’t exceed 10 percent of the sample frame. In this case, the issue of normality was in question. Here, the QIC quoted a section of the PIM, where it says, “A challenge to the validity of the sample that is sometimes made is that the particular sample size is too small to yield meaningful results. Such a challenge is without merit as it fails to take into account all of the other factors that are involved in the sample design.” What does that mean, anyway? What other factors, and how can you say the challenge is “without merit” if you don’t specify those other factors? Again, the rules within the PIM are in contrast to standards of statistical practice, yet they seem to be cast in stone.
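The skew problem is easy to see in a simulation. The sketch below (illustrative only; none of these numbers come from the case) draws a heavily right-skewed hypothetical sample frame and shows that even the means of samples of 30 remain noticeably skewed, which is exactly the condition under which a normal-theory confidence interval becomes unreliable:

```python
# Illustrative simulation: with a heavily right-skewed paid-amount
# distribution, the sampling distribution of the mean from n = 30
# remains skewed, undermining normal-approximation confidence intervals.
import random
import statistics

random.seed(1)

# Hypothetical sample frame: most amounts small, a few very large.
frame = [random.lognormvariate(3.0, 1.2) for _ in range(10_000)]

def sample_mean(n):
    """Mean of one simple random sample of size n from the frame."""
    return statistics.mean(random.sample(frame, n))

means = [sample_mean(30) for _ in range(2_000)]

# In a symmetric (normal-like) distribution, mean and median nearly
# coincide; a persistent gap betrays residual skew.
print(f"frame:        mean={statistics.mean(frame):.1f}  median={statistics.median(frame):.1f}")
print(f"n=30 means:   mean={statistics.mean(means):.1f}  median={statistics.median(means):.1f}")
```

The wider the gap between mean and median in the second line, the less trustworthy a confidence interval built on the normality assumption, and hence the extrapolated demand derived from it.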

In one case, I objected to the use of the paid amount as the variable of interest for both calculating sample size and stratification. And again, the auditor quoted chapter and verse from the PIM, stating, “A common situation is one in which the overpayment amount in a frame of claims is thought to be significantly correlated with the amount of the original payment to the provider or supplier.” This correlation rarely exists, because the range of paid amounts for any one line item can be huge. For example, in a recent audit, the paid amount for a 99215 ranged from $0 (that’s right, that line item wasn’t paid) to around $195. If you calculated the correlation coefficient between the paid amounts and the overpaid amounts, you would see that there was almost no correlation between them. But, again, if the PIM doesn’t address that issue specifically, then the auditor takes license to violate those statistical standards.

This one is of particular interest to me because the U.S. Department of Health and Human Services (HHS) Office of Inspector General (OIG) and Dr. Don Edwards, who works closely with CMS to create and defend these guidelines, have both objected to the use of the paid amount as the variable of interest. Go to the FAQ page the OIG has put together for folks subject to a Corporate Integrity Agreement (CIA). There, you will see this question: “Can the full sample size be estimated based on paid amounts rather than overpayments?” The answer is, “No. Estimating the sample size based on paid amounts will yield an estimate of the sample size needed to determine the actual amount paid, a figure already known. Instead, the objective is to estimate the amount of the overpayment; thus the figures entered into the full sample size calculation must be the overpayment amounts.” The auditor’s response? “We aren’t the OIG and are not bound by their guidelines, even though those guidelines adhere to sound statistical principles.” What a sorry excuse! Even when it can be shown that there isn’t any correlation, they fall back on the latitude given under the PIM. In fact, one of the reference books listed in the PIM is “Sampling Techniques” by William Cochran, Ph.D., a former professor of statistics at Harvard University. This book is widely considered the bible of statistical sampling and extrapolation. On page 78 of that text, under section 4.7, Professor Cochran specifies exactly what steps the auditor could take in order to estimate the overpayment amount. But hey, I guess that just requires too much work on the auditor’s part, so rather than follow proper statistical practices, it is easier to hide behind the poorly written and elastic guidelines within the PIM.

Further on this point, even Professor Don Edwards weighed in on this flaw. On October 14, 2009, Professor Edwards gave a talk at the 13th Annual Medicare/Medicaid Statistics and Data Analytics conference in Omaha, NE. On slide 22, he makes the following statement: “In short, if you stratify by paid amount, YOU’RE STRATIFYING THE WRONG POPULATION.” ’Nough said.

One of the more egregious issues has to do with precision. Precision can be defined as the reproducibility of a set of measurements, or as the property of an estimate having a small random error of estimation. In general, you want the results of the sample to be as precise as possible, especially if you are going to use the results for extrapolation. With precision calculations, the smaller the precision percentage, the better.

In a recent case, the precision was a whopping 34 percent, which is unheard-of with regard to applying the results to an extrapolation. In this case, it meant that the difference between the mean overpayment and the lower bound of the 90 percent confidence interval was 34 percent of the mean! That is just huge! And by no means can an estimate that imprecise be considered appropriate for extrapolation. The auditor’s response? “There is no requirement that a certain precision level be achieved.” If you’re like me, you might find this really hard to believe. Basically, the auditor is saying that because the PIM does not specifically spell out some range of precision values, precision simply no longer matters. But it does matter, and it matters in the right way. Go back to the OIG FAQ page for CIAs and you will see that they state that if the precision is above 25 percent, you should stop the audit and go home. The Office of Management and Budget (OMB) clearly states that when looking at overpayment calculations, the sample should be designed such that the results meet a precision level of no more than 2.5 percent. In fact, in the August 2008 publication of the Federal Register, CMS echoes the same requirements. So it does matter; it matters to the OIG, the OMB, CMS, and the general statistical community. It only doesn’t matter to CMS auditors.
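For readers who want to check an auditor’s precision figure themselves, here is a minimal sketch of the calculation as defined above: the gap between the mean overpayment and the lower bound of the 90 percent confidence interval, expressed as a percentage of the mean. The per-claim overpayments are hypothetical and deliberately variable, not data from the case:

```python
# Precision as defined above: (mean - lower bound of 90% CI) / mean.
# The overpayment figures are hypothetical, chosen to show how a small,
# highly variable sample yields a precision far above the 25% OIG cutoff.
import math
import statistics

overpayments = [0, 0, 0, 0, 15, 20, 30, 45, 60, 75, 90, 120, 150, 180, 310]

n = len(overpayments)
mean = statistics.mean(overpayments)
se = statistics.stdev(overpayments) / math.sqrt(n)   # standard error

t_90 = 1.761                       # t critical value, 90% two-sided, df = 14
lower_bound = mean - t_90 * se
precision = (mean - lower_bound) / mean              # == t_90 * se / mean

print(f"mean=${mean:.2f}  lower bound=${lower_bound:.2f}  precision={precision:.0%}")
```

Anything above the OIG’s 25 percent threshold, as here, signals that the sample is too small or too variable to support an extrapolated demand.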

The examples go on and on and on, and I imagine I could write hundreds of pages that exemplify the fatal flaws executed by government and private payer auditors. But where does the blame lie? Is it with the auditors or is it with the PIM, which constitutes a set of guidelines that are in conflict with so many standards of statistical practice? I say it’s both. Unfortunately, the auditors are financially incentivized to find as much overpayment as possible, so, in lay speak, they are mostly bounty hunters and they really don’t care whether they get it right or not. For the practice, it costs nearly $110 for every claim that is appealed, win, lose or draw. For the auditor, however, there isn’t any penalty for consistently getting it wrong. There aren’t any requirements with regard to accuracy or appeal rates when it comes to these complex reviews.

It’s time for this to change.

It’s time to take a stand against this. Literally billions of dollars a year are being extorted from healthcare providers due to a system of rules, regulations, and guidelines that are not outdated, but just plain wrong. If there are standards (and there are), then they need to apply to everyone, including the government. I am grateful for groups like the Physician Advocacy Institute (PAI), a very active and effective physician advocacy group. I know that they have been involved in trying to get Chapter 8 of the PIM updated, but they are meeting with resistance at every turn, and guess who are the opponents? I suggest you check out their web page and click on the Fair Medical Audits link. We need other groups to get involved, as well.

In the world according to Frank, every provider in the country would drop out of Medicare next week. I know it sounds harsh and I know it would place a burden on Medicare recipients, but in the long run, it would force a change in the system. Maybe not to advocacy, but certainly away from antagonism. But alas, that’s not going to happen.

My dear friend Henry Shaw used to say that people change when the pain of where they are at becomes greater than the fear of where they might end up. I guess we just aren’t yet in enough pain. In the words of one of my favorite musicians, Sam Cooke, a change is gonna come.

And that’s the world according to Frank.



Frank D. Cohen, MPA, MBB

Frank Cohen is the director of analytics and business intelligence for DoctorsManagement, a Knoxville, Tenn.-based consulting firm. He specializes in data mining, applied statistics, practice analytics, decision support, and process improvement. He is a member of the RACmonitor editorial board and a popular contributor on Monitor Monday.

