*The PIM is a woefully inadequate guide for audits leveraging extrapolation.*

EDITOR’S NOTE: This is the fourth in a series of reports on alleged bias the author has uncovered in extrapolation audits.

This is part four of a three-part series expressing my disappointment in chapter 8 of the Medicare Program Integrity Manual (PIM).

Did you get that? Part IV of a three-part series. That’s right.

After I finished the trilogy, I realized that I had missed one very important aspect of inferential statistics and extrapolation: the issue of outliers. Actually, I could easily be forgiven for forgetting this topic because, and this is really strange, the word “outlier” appears nowhere in the entirety of chapter 8 of the PIM. It is strange because outliers are a huge issue for inferential statistical methods – and in particular, for extrapolation.

First, a bit of background. In addition to not mentioning outliers, chapter 8 of the PIM also fails to mention the “median” as a critical metric for inferential statistical calculations – and as I will explain later, using the median rather than the mean is quite common in the presence of outliers. I know about this omission firsthand: in a recent federal trial in which I was the statistical expert, the prosecutor asked me to find anywhere in chapter 8 where the word “median” was used, and, knowing this was just a setup question, I answered honestly that it did not appear. Her conclusion, then, was that if the PIM didn’t mention it, it couldn’t be a viable metric for extrapolation. That’s simply false, and I would venture that not just statisticians, but anyone who studied high-school mathematics would resoundingly agree.

Outliers are nearly always a part of a billing and coding audit for reasons discussed in prior articles. Specifically, paid amounts in billing audits are almost always heavily skewed right because they are bounded on the left by zero: the least a provider can be paid for any procedure is, well, nothing. And while there is a practical limit to the maximum a provider can be paid, that limit can be very high, which stretches the data out to the right. For this type of skewed dataset, it is almost always more appropriate and accurate to use the median rather than the mean, and this is particularly true when outliers are present in the data.
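To make the skew concrete, here is a minimal Python sketch using a small, entirely hypothetical set of claim paid amounts (the values are invented for illustration). It computes the mean, the median, and the Fisher-Pearson sample skewness coefficient; a positive skewness and a mean well above the median are the signature of a right-skewed distribution.

```python
import statistics

# Hypothetical claim paid amounts (USD); invented for illustration only.
paid = [0.88, 25.00, 42.10, 55.75, 61.20, 78.40, 96.00, 130.50, 212.00, 6243.00]

def sample_skewness(values):
    """Fisher-Pearson coefficient of skewness: m3 / m2**1.5."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mu) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

mean_paid = statistics.mean(paid)
median_paid = statistics.median(paid)
skew = sample_skewness(paid)

print(f"mean   = ${mean_paid:,.2f}")
print(f"median = ${median_paid:,.2f}")
print(f"skew   = {skew:.2f}")  # positive => long right tail
```

Here the single large claim pulls the mean to roughly ten times the median, while the median stays among the typical payments.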

According to the National Institute of Standards and Technology (NIST), part of the U.S. Department of Commerce, “an outlier is an observation that lies an abnormal distance from other values in a random sample from a population. In a sense, this definition leaves it up to the analyst (or a consensus process) to decide what will be considered abnormal.” Notice that whether a data point (or points) constitutes an outlier is left to the analyst to decide. The definition also mentions a consensus process, which means using established calculations based on standards of statistical practice.
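One widely used consensus rule, described in the same NIST/SEMATECH e-Handbook of Statistical Methods, is the box-plot "fence" test: values beyond Q1 - 1.5 × IQR or Q3 + 1.5 × IQR (the inner fences) are mild outliers, and values beyond Q1 - 3 × IQR or Q3 + 3 × IQR (the outer fences) are extreme outliers. The sketch below assumes Python 3.8+ (for `statistics.quantiles`) and uses invented paid amounts; quartile conventions differ slightly between software packages, so the fence values may not match other tools exactly.

```python
import statistics

def tukey_fences(values):
    """Classify values using box-plot (Tukey) fences.

    Returns (mild, extreme). Mild outliers fall outside the inner fences
    (1.5 * IQR) but inside the outer fences (3 * IQR); extreme outliers
    fall outside the outer fences.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # 'exclusive' method by default
    iqr = q3 - q1
    inner_lo, inner_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_lo, outer_hi = q1 - 3.0 * iqr, q3 + 3.0 * iqr
    mild, extreme = [], []
    for v in values:
        if v < outer_lo or v > outer_hi:
            extreme.append(v)
        elif v < inner_lo or v > inner_hi:
            mild.append(v)
    return mild, extreme

# Hypothetical claim paid amounts (USD); invented for illustration only.
paid = [0.88, 25.00, 42.10, 55.75, 61.20, 78.40, 96.00, 130.50, 212.00, 6243.00]
mild, extreme = tukey_fences(paid)
print("mild outliers:   ", mild)
print("extreme outliers:", extreme)
```

The point is not that this particular rule is mandatory, but that it is documented and reproducible, so anyone can check where the line was drawn.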

Now, I could write a whole book on outliers, but there are some basic tenets that are easy to understand and follow if an extrapolation is to produce an accurate estimate of overpayment in a billing audit.

To begin with, outliers are common because of the way in which most auditors define the audit unit. The most common unit is the claim, and as discussed in prior articles in this series, the claim is not usually the best unit. This is because a single claim can contain many claim lines, each of which can represent a different procedure code, code type, or category, with highly variable paid amounts. In fact, all one has to do is look at the paid claims data within one’s own organization to see what a mess this can be for statistical extrapolation.

For example, I can have five claims that all represent the same diagnosis but contain different procedure codes in different code categories, such as E&M, imaging, surgical, pathology, etc. And within those five claims, I could show you the same code paid at varying amounts, or not paid at all. Also, when you mix claims from different providers in different specialties with different levels of patient acuity, you will often see huge swings in payments. Again, for example, I can show you data from a recent audit in which claim paid amounts ranged from $0.88 to $6,243. That’s a huge range, and it contained lots of outliers, both high and low.

What is the effect of outliers on extrapolation? Some of it depends on which metric is used to measure central tendency. If the average (or mean) is used, then outliers will have a huge impact, because the mean is very sensitive to them. And while any data set can have both low and high outliers, most are high outliers, given the nature of the billing process. Not too long ago, I testified in federal court as a statistical expert, defending a physician against a couple of extrapolations performed by a Zone Program Integrity Contractor (ZPIC). I was trying to explain to the judge why outliers were so destructive when using the average rather than the median, and he asked me to give him an example, so here is what I told him:

“A statistician walks into a bar with 100 patrons. With his pencil and paper, he goes around and asks each patron what their income was over the past year. After getting the responses, the statistician calculates an average income of $48,000, and because the data was normally distributed, a median income of $48,000. So far, so good. Then, Bill Gates walks into the bar and orders a beer. Now, on average, everyone in the bar is a multi-millionaire, but the median remains the same: $48,000.”
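The arithmetic behind the bar story is easy to reproduce. This sketch builds 100 hypothetical incomes that are symmetric around $48,000, then adds one enormous income (the $1 billion figure is an arbitrary stand-in, not a real number for anyone) and recomputes both measures.

```python
import statistics

# 100 hypothetical patrons with incomes symmetric around $48,000:
# 99 values spaced $200 apart from $38,200 to $57,800, plus one more at $48,000.
incomes = [48_000 + 200 * k for k in range(-49, 50)] + [48_000]

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)

# Hypothetical annual income for the newcomer; any huge value makes the point.
incomes_after = incomes + [1_000_000_000]
mean_after = statistics.mean(incomes_after)
median_after = statistics.median(incomes_after)

print(f"before: mean ${mean_before:,.0f}, median ${median_before:,.0f}")
print(f"after:  mean ${mean_after:,.0f}, median ${median_after:,.0f}")
```

One new value moves the mean from $48,000 to roughly $9.9 million, while the median does not move at all.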

The point is this: since we are pretty much stuck with the average, we need to aggressively address the issue of outliers. Remember, the median is not mentioned anywhere in chapter 8, so auditors don’t feel they need to consider it, even though it is accepted under standards of statistical practice (and in most cases, is the more appropriate and accurate metric).

There are three ways to address the issue of outliers. The first is stratification, which is the process of separating a population into more homogeneous subsets. While I would argue against using the paid amount as the basis for stratification, it is the variable of interest the auditors use, and there are statistical calculations and processes that can be employed to stratify a population effectively and properly on the paid amount.
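The article does not name a specific calculation, but one classical, documented way to set strata boundaries on a skewed variable such as paid amount is the Dalenius-Hodges cumulative square-root-of-frequency (cum √f) rule: bin the values into equal-width classes, cumulate the square roots of the class counts, and cut that cumulative total into equal pieces. The sketch below is a simplified version; the function name, the bin count, and the lognormal test data are assumptions for illustration, not anything the PIM prescribes.

```python
import math
import random

def cum_sqrt_f_boundaries(values, n_strata, n_bins=50):
    """Approximate Dalenius-Hodges (cum sqrt f) stratum boundaries.

    Returns a sorted list of at most n_strata - 1 cut points.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return []  # all values equal: nothing to stratify
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # put the max in the last bin
        counts[idx] += 1

    # Cumulative sum of sqrt(frequency) across the equal-width bins.
    cum, running = [], 0.0
    for c in counts:
        running += math.sqrt(c)
        cum.append(running)

    # For each of the n_strata - 1 targets, cut at the upper edge of the
    # bin whose cumulative value is closest to that target.
    step = running / n_strata
    cuts = set()
    for k in range(1, n_strata):
        target = k * step
        best = min(range(n_bins), key=lambda i: abs(cum[i] - target))
        cuts.add(lo + (best + 1) * width)
    return sorted(c for c in cuts if lo < c < hi)

# Hypothetical right-skewed paid amounts (lognormal), invented for illustration.
rng = random.Random(42)
paid = [round(rng.lognormvariate(4.5, 1.0), 2) for _ in range(2000)]
cuts = cum_sqrt_f_boundaries(paid, n_strata=4)
print("stratum boundaries:", [round(c, 2) for c in cuts])
```

Whatever rule is used, the key is that the break points come from a stated calculation rather than a guess.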

Unfortunately, this requires a bit of effort, and I have never seen a single audit in which the stratification was based on any logic or reasoning. Mostly, the auditors just guess at some break point and leave it at that. Remember, chapter 8 does not require auditors to explain their logic (or lack of logic) in making any statistical decision, so we are often left second-guessing, which is never a good strategy.

The second method is to change the variable of interest, the unit, or both. For example, instead of using the paid amount, I could stratify by code type or category; I covered this extensively in Part III of this series. Regarding units, if we went to a more granular level, such as the specific procedure code, we would still see some variability in payment, but not nearly as much, since we would no longer be dealing with a composite payment built from a variable number of lines within each claim. In some cases, such as the federal trial referenced above, the auditor would have been better served using the beneficiary rather than the claim as the unit. That would have normalized the payment amount and smoothed the payment variability.
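The effect of choosing a more granular unit can be checked with a simple dispersion measure. This sketch uses invented claim lines (the CPT codes serve only as labels, and the amounts are not fee schedule amounts) and compares the coefficient of variation, the standard deviation divided by the mean, when the claim is the unit versus when each procedure code is treated on its own. Codes with a single line are skipped because a sample standard deviation needs at least two values.

```python
import statistics
from collections import defaultdict

# Hypothetical claim lines: (claim_id, procedure_code, paid_amount_usd).
lines = [
    ("C1", "99213", 75.00), ("C1", "71046", 28.00),
    ("C2", "99213", 72.50),
    ("C3", "99214", 110.00), ("C3", "71046", 30.00),
    ("C3", "88305", 70.00), ("C3", "27447", 1450.00),
    ("C4", "99213", 78.00), ("C4", "88305", 68.50),
    ("C5", "99214", 105.00),
]

def cv(values):
    """Coefficient of variation: sample standard deviation / mean."""
    return statistics.stdev(values) / statistics.mean(values)

claim_totals = defaultdict(float)  # claim as unit: sum of its lines
by_code = defaultdict(list)        # procedure code as unit
for claim_id, code, paid in lines:
    claim_totals[claim_id] += paid
    by_code[code].append(paid)

claim_cv = cv(list(claim_totals.values()))
code_cvs = {code: cv(v) for code, v in by_code.items() if len(v) >= 2}

print(f"claim as unit: CV = {claim_cv:.2f}")
for code, value in sorted(code_cvs.items()):
    print(f"code {code} as unit: CV = {value:.3f}")
```

In this made-up example the claim-level totals vary far more than the payments for any single code, because each claim total mixes a different set of services.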

The third method is to exclude the outliers from the extrapolation, which is the preferred method, since it is never appropriate to include an outlier in an extrapolation. The way to do this is with something called a certainty stratum, which is mentioned in section 8.4.11.1 of chapter 8, as follows: “If it is believed that the amount of overpayment is correlated with the amount of the original payment and the universe distribution of paid amounts is skewed to the right, i.e., with a set of extremely high values, it may be advantageous to define a ‘certainty stratum,’ selecting all of the sampling units starting with the largest value and working backward to the left of the distribution.”
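Chapter 8 describes the certainty stratum only in words. A minimal sketch of the mechanics, assuming a cutoff has already been chosen, might look like the following. The PIM does not say how to choose the cutoff; here the upper inner box-plot fence (Q3 + 1.5 × IQR) is used purely as an illustrative, documented rule, and the claim IDs and paid amounts are hypothetical.

```python
import random
import statistics

def upper_inner_fence(values):
    """Q3 + 1.5 * IQR: one documented rule for 'extremely high' values."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 + 1.5 * (q3 - q1)

def split_certainty_stratum(units, cutoff):
    """Split (unit_id, paid) pairs at a cutoff.

    Units with paid >= cutoff form the certainty stratum (audited 100%),
    listed from the largest value downward; the rest are left for sampling.
    """
    certainty = sorted((u for u in units if u[1] >= cutoff),
                       key=lambda u: u[1], reverse=True)
    remainder = [u for u in units if u[1] < cutoff]
    return certainty, remainder

# Hypothetical universe of 1,000 claims with right-skewed paid amounts.
rng = random.Random(7)
units = [(f"CLM{i:04d}", round(rng.lognormvariate(4.5, 1.0), 2)) for i in range(1000)]

cutoff = upper_inner_fence([paid for _, paid in units])
certainty, remainder = split_certainty_stratum(units, cutoff)
sample = rng.sample(remainder, 50)  # simple random sample from the rest

print(f"cutoff ${cutoff:,.2f}: {len(certainty)} claims audited with certainty, "
      f"{len(sample)} of {len(remainder)} remaining claims sampled")
```

Because the cutoff comes from a named rule, the question of where "working backward" stops has a reproducible answer.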

Reread the two parts of this. The first is conditional on the overpayment being correlated with the original payment amount, which in my experience is rarely the case. Regardless, since the guidelines do nothing to actually address that issue, auditors almost always simply accept it as fact without subjecting the data to even the most basic statistical tests. The second part mentions the need to control for “extremely high values,” yet once again, it does nothing to define them.
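The PIM's condition that overpayment be correlated with the original payment can be checked on the audited sample itself. This minimal sketch uses invented (paid, overpaid) pairs and computes Pearson's r and, because billing data are skewed, Spearman's rank correlation, which is less sensitive to outliers. The functions are hand-written to avoid depending on a particular library version; a real analysis would also report a significance test and a confidence interval.

```python
import math

def pearson(x, y):
    """Pearson correlation; assumes neither x nor y is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical audited sample: paid amount and determined overpayment (USD).
paid = [45.0, 72.5, 88.0, 110.0, 130.0, 155.0, 210.0, 480.0, 1450.0, 6243.0]
overpaid = [45.0, 0.0, 20.0, 0.0, 130.0, 0.0, 35.0, 0.0, 0.0, 600.0]

r = pearson(paid, overpaid)
rho = spearman(paid, overpaid)
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```

If the two measures disagree sharply, a handful of large values is likely driving the apparent correlation.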

Are they talking about outliers? If so, what statistical method or methods are recommended? Well, the answer is none, because the PIM guidelines serve as a shield to protect the auditor from reasonable objections, not as a true source of statistical validity or fairness. The passage even says to start with the highest values and work backward. But backward to where? They might say that this is left to the statistician to determine, but whose statistician? Obviously, they can do whatever they want and defend it by pointing to the nebulous wording here, and if I were to contest their work, for example by stating that they didn’t follow any standard or logic, their defense again would be that they are not required to explain themselves.

This section further states: “When a stratum is sampled with certainty, i.e., auditing all of the sample units contained therein, the contribution of that stratum to the overall sampling error is zero.” This means that the data points placed into the certainty stratum are, in effect, separated from the rest of the sample and therefore should not be included in the extrapolation – and this is very true. Here is the next sentence: “In that manner, extremely large overpayments in the sample are prevented from causing poor precision in estimation.” This, again, is quite true, but one defense I have seen a ZPIC put forth for not using a certainty stratum is that they are not held to any standard regarding precision. That is true. Even though precision is an important part of the U.S. Department of Health and Human Services (HHS) Office of Inspector General (OIG) audit process, and specific guidelines are given by the Office of Management and Budget (OMB) and in the Federal Register, chapter 8 sets no precision standard – and as such, auditors generally don’t consider themselves bound by one.
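The statement that a fully audited stratum contributes nothing to sampling error follows from the finite population correction in the usual stratified expansion estimator. A minimal sketch, assuming simple random sampling without replacement within each stratum and invented overpayment amounts, estimates the total as the sum of N_h times each stratum's sample mean, and its variance as the sum of N_h² (1 - n_h/N_h) s_h² / n_h. For a certainty stratum n_h = N_h, so the factor (1 - n_h/N_h) is zero: its overpayments are counted at face value rather than extrapolated, and it adds nothing to the variance.

```python
import statistics

def stratified_total(strata):
    """Estimate a population total and its variance from a stratified sample.

    Each stratum is a dict with "N" (number of units in the stratum) and
    "sample" (audited overpayment amounts for the n sampled units). The
    total is sum(N_h * mean_h); the variance is
    sum(N_h**2 * (1 - n_h / N_h) * s_h**2 / n_h), assuming simple random
    sampling without replacement within each stratum.
    """
    total, variance = 0.0, 0.0
    for stratum in strata:
        N, y = stratum["N"], stratum["sample"]
        n = len(y)
        total += N * statistics.mean(y)
        if n < N:
            # Sampled stratum: needs n >= 2 for a sample variance.
            fpc = 1 - n / N  # finite population correction
            variance += N ** 2 * fpc * statistics.variance(y) / n
        # If n == N (certainty stratum), fpc would be 0, so the stratum
        # adds nothing to the variance; its amounts enter the total as-is.
    return total, variance

# Invented overpayment amounts (USD) for illustration only.
strata = [
    {"N": 900, "sample": [0.0, 12.5, 0.0, 40.0, 8.0, 0.0, 22.0, 0.0, 15.0, 5.0]},
    {"N": 5, "sample": [600.0, 0.0, 1200.0, 75.0, 310.0]},  # certainty stratum
]
total, variance = stratified_total(strata)
print(f"estimated total ${total:,.2f}, standard error ${variance ** 0.5:,.2f}")
```

Dropping the certainty stratum from the list changes the estimated total but leaves the variance exactly the same, which is the point the PIM passage makes.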

Finally, we read this in the same section: “In practice, the decision of whether or not to sample the right tail with certainty depends on fairly accurate prior knowledge of the distribution of overpayments, and also on the ability to totally audit one stratum while having sufficient resources left over to sample from each of the remaining strata.” This means that the decision to use a certainty stratum depends on whether the auditor can conduct a distribution and outlier analysis – yet, out of the hundreds of extrapolation audits in which I have been engaged as a statistical expert, I have rarely seen any auditor produce these types of analyses.

But it goes on to say that regardless of its necessity, its importance, or its contribution to precision and accuracy, the auditor does not have to employ this method if it involves too much work on their part. Is that even possible? That’s like saying that you have a right to a jury trial, unless the process of picking a jury will take too much time and effort, leaving not enough time for a fair trial. What a bunch of hooey.

In the interest of brevity, I am not going to repeat all of my conclusions from the prior three articles. In fact, I don’t think I have to, because I am confident that any reasonable person, statistician or not, understands how inadequate and outdated chapter 8 has become when it is used as a tool to inappropriately recoup billions of dollars from otherwise hard-working and honest healthcare providers. I believe we are reaching a boiling point and that there is going to be a revolt against CMS regarding the use of these guidelines. You can deny people their rights for only so long before the last straw breaks the camel’s back, and this, in my opinion, is that straw.

And that’s the world according to Frank.

**Program Note:** Register to attend the Frank Cohen webcast today on this subject at 1:30 pm ET.