What do you do when the predicted is zero?

A very strange spirometry report came across my desk a couple of days ago.

            Observed:   Predicted:   %Predicted:
FVC:        3.07        0            29767
FEV1:       2.15        0            37586
FEV1/FVC:   70          71           101%

My first thought was that some of the demographic information had been entered incorrectly, but when I checked the patient’s age, height, gender and race, all were present, all were reasonably within the normal range for human beings in general and, more importantly, all agreed with what was in the hospital’s database for the patient. I tried changing the patient’s height, age, race and gender to see if it would make a difference. This made small changes in the percent predicted, but the predicteds were still zero.

Or were they? They actually couldn’t have been zero, regardless of what was showing up on the report, since the observed test values are divided by the predicted values and if the predicted were really zero, then we’d have gotten a “divide by zero” error, and that wasn’t happening. Instead the predicted values had to be very close to zero, but not actually zero, and the software was rounding the value down to zero for the report. Simple math showed me the predicted value for FVC was (very) approximately 0.0103 liters, but why was this happening?
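The “simple math” here is just the percent predicted formula run backwards: if percent predicted = observed / predicted × 100, then predicted = observed / (percent predicted / 100). A minimal sketch in Python, using the numbers from the report above (the function name is my own, and rounding to the nearest whole number is only an assumption about how the report formats the value):

```python
def back_calculate_predicted(observed, percent_predicted):
    """Recover the predicted value implied by an observed value and its percent predicted."""
    if percent_predicted == 0:
        raise ValueError("a percent predicted of zero implies no usable predicted value")
    return observed / (percent_predicted / 100.0)


# Values taken from the odd report: observed FVC 3.07 L at 29767 %predicted,
# observed FEV1 2.15 L at 37586 %predicted.
fvc_predicted = back_calculate_predicted(3.07, 29767)
fev1_predicted = back_calculate_predicted(2.15, 37586)

print(f"Implied predicted FVC:  {fvc_predicted:.4f} L")
print(f"Implied predicted FEV1: {fev1_predicted:.4f} L")

# Assumption: the report rounds the predicted to a whole number for display.
print(f"FVC predicted as it might be displayed: {round(fvc_predicted)}")
```

Both implied predicteds are small but non-zero, which is consistent with the report showing a zero without the software ever dividing by zero.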

Fortunately, my lab’s database can be viewed and manipulated via MS Access. Although I am far from an expert in SQL itself, MS Access makes it relatively easy to write database queries and I’ve been doing this for at least the last 20 years or so. For example, there are several pulmonary physician researchers who look for patients that meet certain age, gender, FEV1 and FEV1/FVC ratio criteria for possible enrollment in different studies. I wrote a rather complicated query for this (it calculates predicted FVC, FEV1 and FEV1/FVC ratio using the NHANES III reference equations) about 15 years ago, and have updated it every time it needed a new test date range, new patient age range, new FEV1 range or whatever. I’ve written numerous other types of queries off and on for years and I’m reasonably familiar with the lab database and its structure. This meant that if there was a problem with the data being used to calculate the predicteds, there was only a limited set of inter-related files that the error could be in, and I had a good idea which ones they were.

I wrote a quick query that selected the patient’s records and started paging through them. After a bit of searching I finally found the data field that was causing the problem. I corrected the value I found in it, and when I re-generated the report it finally came out looking a lot more normal.

            Observed:   Predicted:   %Predicted:
FVC:        3.07        3.41         90%
FEV1:       2.15        2.36         91%
FEV1/FVC:   70          71           101%

So what was the data field? It was labeled prediction_correction_factor and, as best I can understand it, it appears to be a value that is used to modify the predicted after it has been calculated. The value that was in the field was 21757, and all other patients tested on the same day had no value at all in that field (i.e. the same field was empty for everybody else, even for those with the same ethnicity). Since this is the first time I can remember ever seeing this problem, it was most probably a computer glitch of some kind.

So, problem solved and maybe I could go back to other things? Well, sort of yes, since as I said this problem doesn’t seem to have happened before and I’ve been reviewing reports for over 25 years. But I got to thinking about it and the only reason that it was obvious was that the value in that field was so large. It might never have been noticed if it had been a smaller value that made only a small change to a predicted value. Just as importantly there are also some issues about where this field is located in the database and how it can affect the predicted values for not just one visit, but all of the visits a patient makes to the lab.

First, why is there a field like this in the first place? My best guess (and it will probably remain a guess since the company we acquired our test systems from hasn’t answered any of the questions I’ve had about the database in over 10 years) is that it is a holdover from the time when it was more common to perform racial corrections by taking 85% of the Caucasian predicted value for Blacks and 92% for Asians (or thereabouts, anyway). Since the 2005 ATS/ERS standards for interpretation were published, however, the recommendation has been that ethnicity-specific equations be used instead, and that is the way our software is currently organized; there are different sets of equations for different ethnicities and the software selects which set is to be used based on the ethnicity entered in patient demographics. But our database goes back well before 2005 and there has to be a way to bridge the difference between older and newer test records, and this field may have been one of the ways that this was done.

But our lab database actually stores a patient’s demographic information across several files. For example, values that can and do change from one visit to another, such as weight, height and age, are kept in one file, and information that doesn’t change, such as ethnicity and date of birth, is kept in another. There is, in fact, only one record kept for a patient’s non-changing information no matter how many visits they make to the PFT Lab. Since this record contains the patient’s ethnicity, it shouldn’t be a surprise that it also contains the prediction_correction_factor. What this means is that if this field is somehow altered it will affect all patient visits, not just the current one.

There is no general fix for this kind of problem since realistically we are completely dependent on our test system software to calculate predicted and percent predicted values (not to mention making the test measurements in the first place). It’s just not possible to check all of the calculations by hand. We don’t have the manpower and we don’t have the time. I could write a query that regularly looks for odd values in the prediction_correction_factor field, but that’s not something that many other labs could do.
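For what it’s worth, a screening query of that kind could look something like the sketch below. It uses Python’s built-in sqlite3 module with an in-memory database rather than MS Access, and everything except the field name prediction_correction_factor is hypothetical: the table names (patient_static, patient_visit), the other columns and the sample rows are all invented for illustration and are not our vendor’s actual schema. It simply flags any patient whose field is not empty and counts how many visits share that one record; deciding which non-empty values are legitimate is left to whoever reviews the list.

```python
import sqlite3

# Hypothetical schema: one unchanging record per patient, many visit records.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient_static (
    patient_id INTEGER PRIMARY KEY,
    ethnicity TEXT,
    prediction_correction_factor REAL
);
CREATE TABLE patient_visit (
    visit_id INTEGER PRIMARY KEY,
    patient_id INTEGER REFERENCES patient_static(patient_id),
    visit_date TEXT
);
""")

# Invented sample data: only patient 2 has a value in the field.
conn.executemany(
    "INSERT INTO patient_static VALUES (?, ?, ?)",
    [(1, "Caucasian", None), (2, "Caucasian", 21757), (3, "Black", None)],
)
conn.executemany(
    "INSERT INTO patient_visit VALUES (?, ?, ?)",
    [(10, 1, "2016-01-05"), (11, 2, "2014-03-02"), (12, 2, "2015-06-17"),
     (13, 2, "2016-01-05"), (14, 3, "2016-01-05")],
)

# Flag every patient with a non-empty factor and count the visits it touches.
SCREEN_SQL = """
SELECT s.patient_id,
       s.prediction_correction_factor,
       COUNT(v.visit_id) AS visits_affected
FROM patient_static AS s
LEFT JOIN patient_visit AS v ON v.patient_id = s.patient_id
WHERE s.prediction_correction_factor IS NOT NULL
GROUP BY s.patient_id, s.prediction_correction_factor
ORDER BY s.patient_id
"""
suspect_rows = conn.execute(SCREEN_SQL).fetchall()

for patient_id, factor, visits_affected in suspect_rows:
    print(f"Patient {patient_id}: factor={factor}, visits affected={visits_affected}")
```

Because the field lives in the one-per-patient record, the visit count shows how far a single bad value reaches.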

Still, it was somewhat disconcerting to find that our lab database contains a single field that is able to wreak as much havoc on predicted calculations as it appears that this one is able to.

What is particularly concerning is that this field is not accessible through any of the regular lab software functions (which I suppose makes sense if it is a vestigial function) and can only be reached by looking directly at the database. If we weren’t able to do this, the only possible fix would have been to delete all of the patient’s records (for all of their visits, including their demographic records) and re-enter the numerical values manually, losing all of the graphical information (flow-volume loops and volume-time curves) along the way. And it would have taken a while to figure out that this was our only option because I know I would have tried a lot of other things before I thought about deleting and re-entering everything.

This problem is a reminder that we still need to check calculated values whenever we think something is “off”. There’s no guarantee that our systems are always correct and, as complicated as they are, errors of one kind or another are probably inevitable.

Creative Commons License
PFT Blog by Richard Johnston is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License

3 thoughts on “What do you do when the predicted is zero?”

  1. Dear Mr. Johnston, thank you for your interesting case. I am working in the PFT industry as a physiologist. Answering clients’ questions about reference values and coding them into software is part of my job. I found that Mr. George Box was totally right in saying that all models are wrong. Sometimes a modern, powerful model simply does not fit a target population in a European country, even though it was based on Caucasians. So I don’t trust any predicted value. Recently I found that from a good dataset, we can build an algorithm that distinguishes patients from healthy persons directly from the measured values of PFT indices (a machine learning based approach). Such an algorithm detects the pathological pattern in combined PFT data. The algorithm sees all interactions and correlations between PFT parameters and estimates the probability of disease. The conventional predictive models simply aim to transform the measured value into a Z-score or percentile, but the physician will always need to interpret those results using a classification rule (decision tree).

    • Le –

      I think that the limitations of reference equations have a lot to do with the choice of anthropometric measurements used when they are generated. Machine learning sounds interesting but you’ll pardon me if I remain somewhat skeptical. I’ve seen numerous approaches to automated analysis over the years (branching logic, fuzzy logic, neural nets, Bayesian statistics) and all have largely failed due to an inability to assess test quality. Machine learning is only as good as the quality and completeness of the dataset it is trained with. The quality and accuracy of its output can only be as good as the quality and completeness of the data it is analyzing.

      Regards, Richard
