Yogi Berra is credited with the saying "It's tough to make predictions, especially about the future." The line is often attributed instead to Niels Bohr, the brilliant physicist who received the Nobel Prize in Physics in 1922, but more power to Yogi for getting the credit. A derivative of this might go something like this: "It's tough to code, especially when using E&M codes." (Credit to Frank Cohen)
The use of evaluation and management (E&M) codes is strictly bound to a set of complex and sometimes confusing criteria found within one of two sets of guidelines. There are the 1995 guidelines (which were really introduced in 1994) and the 1997 guidelines (which were really introduced in 1996). Both are made up of a series of tables, lists, and grids that must be followed in order to determine the correct codes. This is particularly true for, say, office and hospital encounters. For both of these, there are three key components: history, physical exam, and medical decision-making.
Which code you choose depends upon a very complex and combinatorial matrix of grids, which convert criteria into key components. In fact, I once did an informal calculation of the complexity of coming up with the proper E&M code, through which I determined that a provider (or coder) has to work through some 1,600 decision points before assigning a code. And even then, if you believe several good scholarly studies that have been performed on the subject, 42 percent of those who reviewed a given chart would disagree with the other 58 percent who reviewed the same chart, at least as it pertains to code level. God bless coders. I don't know how you do it, but I'm glad it's you and not me. I'll stick to simple areas, like applied statistics and predictive analytics. It may not sound easy, but at least I am bound to a set of rules that are relatively axiomatic rather than elastic.
So there's the history lesson on E&M codes, or at least as much of it as I care to give. The point is, it's tough to code, especially when using E&M codes. As such, human beings have long been searching for an easier and more consistent method to code correctly. I guess we should forget about scrapping the incredibly poor method of guidelines we use now because, well, nobody seems to talk about that much. Instead, the industry has created a niche market of "cheat sheets," and even though I am not a coder, I have seen hundreds if not thousands of these over the past 20 years. Go to any medical conference and probably half the vendors are giving them away for free. They are cards, little or big, or one-sheets that, in their own way, use grids to simplify the process. Great. Works for me.
Eventually technology was introduced into the system, and if a little bit is good, then a lot is better, right? Wrong, actually. But there are those who see it as the silver bullet to the E&M coding problem. Right now there are electronic medical record (EMR) systems out there that purport to be able to select the most correct and appropriate E&M codes based on the information the provider enters into the program. The question is, does it work? And even more importantly, if it does work, how well does it work? This, my friends, can be a compliance nightmare. I have worked with many organizations that have faced this dilemma: shoot or don’t shoot. Should they employ the automatic level-of-service wizard, or not?
My experience has been that, without knowing whether there will be a negative or positive impact (meaning whether the "wizard" would create a greater risk through an increase in coding errors), it is best not to use these automated systems for these purposes. Now, I didn't say you shouldn't use them at all; I said that, without some evidence that they don't increase risk, you shouldn't use them. And this is where statistics comes into play.
I have designed and conducted several experiments for clients to determine whether the “wizard” would prove to be a benefit or a hazard, and this is a summary of how it would work.
There are two parts to these types of studies. One examines visible risk and the other looks at inherent risk. For the former, it can be as simple as trending the average levels of codes within each category before and after the “wizard” is turned on within the system. The goal is to assess whether a change in utilization patterns will pose a visible risk; that is, will it look different enough to draw attention? Even if the coding is correct, the auditor doesn’t know that unless and until he or she reviews your charts. This is about being the squeaky wheel, and at the outset, if there is a large positive shift, this puts you on notice that your risk for auditor review may increase. But this is not a reason to scrap the wizard. That comes in the second part.
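The visible-risk check can be sketched in a few lines of Python. This is a minimal illustration, not part of the author's method: it uses made-up claim data and assumes the established-patient office visit codes 99211-99215, mapped to their numeric levels 1-5, as the example category being trended.

```python
from statistics import mean

# Established-patient office visit codes mapped to levels of service 1-5.
# (Illustrative category; any E&M category could be trended this way.)
LEVEL = {"99211": 1, "99212": 2, "99213": 3, "99214": 4, "99215": 5}

def average_level(codes):
    """Average level of service across a list of billed E&M codes."""
    return mean(LEVEL[c] for c in codes)

# Hypothetical (made-up) billing mixes before and after the wizard goes live.
before = ["99213"] * 50 + ["99212"] * 30 + ["99214"] * 20
after  = ["99213"] * 40 + ["99214"] * 40 + ["99215"] * 20

shift = average_level(after) - average_level(before)
print(f"before: {average_level(before):.2f}, "
      f"after: {average_level(after):.2f}, shift: {shift:+.2f}")
```

A large positive shift in the average level is exactly the "squeaky wheel" pattern described above: even if every code is correct, the utilization profile now looks different enough to draw an auditor's attention.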
For the second part of the study, the goal is to test the system for error frequency – not to conduct a statistically valid analysis of error by providers, as the purpose is to test risk, not to create it. To begin, we need to establish a baseline error rate. To do this, I take a random sample of, say, office and hospital visit codes from some group of providers within the practice. It can be all providers or just some. If it is only some, you are getting into more complex sampling methods, but since this isn't really a statistically valid study, do your best to pick a random group of providers. How large should the sample be? That depends on how precise you want your measurements to be. For example, if you estimate a starting error rate of, say, 20 percent and you want to be accurate within plus-or-minus 5 percent, you will need a minimum of 528 units. If you are OK with plus-or-minus 10 percent, then you can go with a minimum of 137 units.

In any case, you take this sample and sterilize it, meaning that you strip out the names of the physicians; the purpose is not to conduct a qualitative analysis by provider, but rather a quantitative analysis for the group. Audit the charts and record the error rate. Let's say, for the sake of argument, you get an error rate of 15 percent.

Then you turn on the wizard. Train for the first 30 days, normalize for the next 30 days, and then take another random sample (with the same number of units from the same group of physicians) from the next 30 days. Audit these and compare the results. If the error rate is the same or less, then the wizard doesn't introduce any additional risk. If the error rate is higher (in my case, I like to see it statistically significantly higher), then forget the wizard, as it will introduce more error and likely increase your risk.
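Both steps of this study can be sketched in Python. A caveat: the sample-size function below uses the standard normal-approximation formula for estimating a proportion, which yields different minimums than the figures quoted above (528 and 137); those figures likely come from a particular sampling tool or incorporate additional corrections not shown here. The audit counts in the comparison are made up for illustration.

```python
import math

def sample_size(p, margin, z=1.96):
    """Minimum n to estimate a proportion near p to within +/- margin,
    at roughly 95% confidence (normal approximation)."""
    return math.ceil(z * z * p * (1 - p) / (margin * margin))

def two_proportion_z(errors_before, n_before, errors_after, n_after):
    """z statistic for comparing the before and after error rates
    (pooled two-proportion z-test)."""
    p1 = errors_before / n_before
    p2 = errors_after / n_after
    pooled = (errors_before + errors_after) / (n_before + n_after)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_before + 1 / n_after))
    return (p2 - p1) / se

# Hypothetical audits: 30 errors in 200 charts before the wizard (15%)
# versus 44 errors in 200 charts after (22%).
z = two_proportion_z(30, 200, 44, 200)
print(f"z = {z:.2f}")
```

With these made-up numbers, z exceeds 1.645, the one-sided critical value at the 5 percent level, so the post-wizard error rate would count as statistically significantly higher – the "forget the wizard" outcome.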
The point here is that, before you employ a new technology, test it to see what impact it will have on your organization. It is my experience that CMS is not a fan of automated level-of-service wizards, so if you are determined to use one, at least make sure that, when the auditor comes knocking, you can say that you conducted a test and found no statistically significant difference between using the wizard and not using it. And a point of clarity here: the test itself can be significant without the review of the physicians being significant. This is important because, remember, the goal is not to identify individual charting and coding issues, but rather to establish a benchmark for comparison and testing.
And what about that 15 percent error rate? Well, fix it – and that’s one less thing to worry about.
About the Author
Frank Cohen is the senior analyst for The Frank Cohen Group, LLC and CIO for DoctorsManagement. He is a healthcare consultant who specializes in data mining, applied statistics, practice analytics, decision support, and process improvement. Mr. Cohen is also a proud member of the National Society of Certified Healthcare Business Consultants (nschbc.org).
Contact the Author
To comment on this article go to firstname.lastname@example.org