When back-extrapolation goes astray

A spirometry report that looked very questionable came across my desk recently. The flow-volume loop was misshapen and the technician’s notes indicated that the results had been highly variable and to “interpret with caution”. I pulled up the raw test results and saw a series of test efforts with flow-volume loops that were all somewhat flattened and with no consistency in either the loops or the numerical results.

This kind of inconsistency can be an indication of poor patient effort but can also occur because of airway problems. The cardio-thoracic surgeons at my hospital have an active airway stenting program and so we see a fair number of patients with tracheomalacia. One hallmark of tracheomalacia is a flow limitation that usually shows up as a flat expiratory plateau in the flow-volume loop. These loops had peak flow-ish humps, but the humps seemed to appear in a different location in every loop and they seemed to have a relatively high frequency flutter.

Back_extrapolation_04_redacted

Back_extrapolation_06_redacted

One plausible explanation for the inconsistent results is vocal cord dysfunction (VCD). VCD is characterized by the paradoxical closure of the vocal cords that results in wheezing or stridor and shortness of breath. The gold standard for diagnosing it is laryngoscopy while the patient is symptomatic but it can be difficult to make a definitive diagnosis since symptoms can often come and go. VCD can mimic asthma but patients usually don’t respond to bronchodilators and have negative challenge tests. Spirometry results like these can only be suggestive, however.

The real problem, though, was that the spirometry effort that had been selected for reporting indicated the patient had moderately severe airway obstruction (FEV1 56% of predicted) even though several other efforts had a significantly higher FEV1. When I checked the numerical values it was apparent that this effort had been selected because it was the effort with the highest FEV1 whose back-extrapolation met ATS-ERS criteria.

Back-extrapolation is a technique for standardizing the beginning of exhalation during a forced spirometry effort.

Taken from the ATS-ERS Standardisation of spirometry, page 324.

Specifically, the ATS-ERS statement on spirometry says “…the back-extrapolation method traces back from the steepest slope on the volume-time curve.” Since flow is the slope of the volume-time curve, this steepest slope coincides with peak flow. This approach to standardizing the measurement of time makes sense, but it rests on the assumption that what needs correcting is a “soft” start to exhalation, where the actual beginning of the effort is somewhat indeterminate.
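To make the technique concrete, here is a minimal sketch of back-extrapolation on a sampled volume-time curve. It is not how any particular vendor's software does it. The function names are hypothetical, the curve is assumed to start at zero volume, and the steepest slope is taken between consecutive samples with no smoothing. The acceptance limit used is the ATS-ERS 2005 one: extrapolated volume under 5% of FVC or 0.150 L, whichever is greater.

```python
def interpolate(times, volumes, t):
    """Linearly interpolate the exhaled volume (L) at time t (s)."""
    if t <= times[0]:
        return volumes[0]
    for i in range(1, len(times)):
        if t <= times[i]:
            frac = (t - times[i - 1]) / (times[i] - times[i - 1])
            return volumes[i - 1] + frac * (volumes[i] - volumes[i - 1])
    return volumes[-1]


def back_extrapolate(times, volumes):
    """Return (time_zero, extrapolated_volume, peak_flow) for one effort.

    Assumes volumes[0] is 0 L, i.e. the curve starts at the volume baseline.
    """
    # The steepest slope of the volume-time curve is the peak flow.
    peak_flow, peak_i = 0.0, 0
    for i in range(1, len(times)):
        flow = (volumes[i] - volumes[i - 1]) / (times[i] - times[i - 1])
        if flow > peak_flow:
            peak_flow, peak_i = flow, i
    if peak_i == 0:
        raise ValueError("no positive flow found in this effort")
    # Trace the steepest segment back to zero volume: that is the new time zero.
    time_zero = times[peak_i - 1] - volumes[peak_i - 1] / peak_flow
    # The extrapolated volume is what had already been exhaled at time zero.
    extrapolated_volume = interpolate(times, volumes, time_zero)
    return time_zero, extrapolated_volume, peak_flow


def meets_ev_criterion(extrapolated_volume, fvc):
    """ATS-ERS 2005 limit: under 5% of FVC or 0.150 L, whichever is greater."""
    return extrapolated_volume < max(0.05 * fvc, 0.150)
```

Note that nothing in this calculation cares where in the effort the steepest slope occurs, which is exactly the weakness described in this post.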

When I looked at the other efforts, there was one with a higher FEV1 but whose back-extrapolation was too high. When I looked closely at that effort, what I saw was that the start of the effort was actually quite good but that the peak flow occurred quite late in the effort.

Back_extrapolation_01_redacted

This means that the back-extrapolation was taken from a slope in the volume-time curve that came after 50% of the exhalation had already occurred.
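A late peak flow like this is easy to detect numerically. The sketch below reports how much of the FVC had already been exhaled at the point of steepest slope. The function names are hypothetical, and the 0.5 default is borrowed from this one case rather than from any standard. It also assumes that "50% of the exhalation" refers to volume, not time.

```python
def volume_fraction_at_peak_flow(times, volumes):
    """Fraction of the FVC already exhaled where the curve is steepest."""
    fvc = max(volumes)
    peak_flow, peak_i = 0.0, 0
    for i in range(1, len(times)):
        flow = (volumes[i] - volumes[i - 1]) / (times[i] - times[i - 1])
        if flow > peak_flow:
            peak_flow, peak_i = flow, i
    if peak_i == 0:
        raise ValueError("no positive flow found in this effort")
    # Use the midpoint of the steepest segment as the volume at peak flow.
    volume_at_peak = (volumes[peak_i - 1] + volumes[peak_i]) / 2
    return volume_at_peak / fvc


def peak_flow_is_late(times, volumes, max_fraction=0.5):
    """Flag efforts whose peak flow comes after max_fraction of the FVC.

    max_fraction is an illustrative threshold, not an ATS-ERS criterion.
    """
    return volume_fraction_at_peak_flow(times, volumes) > max_fraction
```

An effort flagged this way is one where the back-extrapolated time zero deserves a look by eye before the effort is rejected.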

Effort back extrapolated

I don’t think this is how the back-extrapolation technique was meant to be applied. I also think that the effort actually meets expectations for a rapid start and that the computerized back-extrapolation is mis-calculating the true start of the test. I pulled the volume-time curve into a graphics program with a ruler and found that if time zero was taken from the “real” start of the test, the FEV1 was actually 0.22 L less than reported.
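The ruler measurement amounts to reading the volume one second after a different time zero. A minimal sketch of that recalculation follows, assuming the volume-time samples are available and that volume is measured from the start of exhalation. The function names are hypothetical, and the two time-zero values are whatever the software and the reviewer each arrive at.

```python
def volume_at(times, volumes, t):
    """Linearly interpolate the exhaled volume (L) at time t (s)."""
    if t <= times[0]:
        return volumes[0]
    for i in range(1, len(times)):
        if t <= times[i]:
            frac = (t - times[i - 1]) / (times[i] - times[i - 1])
            return volumes[i - 1] + frac * (volumes[i] - volumes[i - 1])
    return volumes[-1]


def fev1_from_time_zero(times, volumes, time_zero):
    """FEV1 is the volume exhaled one second after the chosen time zero."""
    return volume_at(times, volumes, time_zero + 1.0)


def fev1_overestimate(times, volumes, software_time_zero, observed_time_zero):
    """How much larger the software's FEV1 is than the one measured from the
    start of the effort as judged by eye (positive when the software's
    time zero is later than the observed one)."""
    reported = fev1_from_time_zero(times, volumes, software_time_zero)
    recalculated = fev1_from_time_zero(times, volumes, observed_time_zero)
    return reported - recalculated
```

Because exhaled volume only increases with time, a time zero that is placed too late will always inflate the FEV1, which is the direction of the 0.22 L difference here.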

Effort back extrapolated 02

Even taking this into consideration, the re-calculated FEV1 from this effort was almost a liter greater than that of the effort that had been selected. It was therefore a more accurate representation of what the patient was capable of, so I selected this effort to be reported. In the end, both the FVC and FEV1 were reported to be within normal limits (WNL).

Was this the correct choice? Realistically, all of the test efforts were flawed for one reason or another and none of them could be considered to have acceptable quality. For this reason alone no choice could be the “right” choice. The effort that had been originally selected, however, was chosen simply because it met the ATS-ERS criteria for back-extrapolation, even though it was quite flawed in other ways.

Original selection

I’ve tried to decide whether this situation indicates a degree of failure in our training program or not. We try to teach new technicians the criteria for selecting spirometry results and even in this case I think we’ve been somewhat successful (if for no other reason than the note to “interpret with caution”), but I’m at a bit of a loss on how we can teach the times when the selection criteria need to be ignored. I think this is only something that can come from experience and for the moment I’m going to have to leave it at that.

However, this problem also points out some significant limitations of the testing software. Out of curiosity I let the software select the “best” spirometry effort for reporting and it went back to the original selection. It is apparent that the software has been designed to reject any effort that does not meet the ATS-ERS criteria for back-extrapolation. To some extent I understand this but in this case there was another effort with an FEV1 that was larger by almost a liter. Back-extrapolation or not, this is an indication there is something wrong with the automatic selection process.

This also leads me to some concern about the results from office spirometry systems. I suspect that the staff performing tests on these systems rely heavily on the software to select the correct efforts. The manufacturers of these systems can say with complete honesty that they meet the ATS-ERS standards but how many times is a spirometry effort automatically rejected because it is only a little bit outside these guidelines despite being substantially better overall?

There are established criteria for spirometry test quality. There are valid reasons for all of these criteria but when test quality is poor it becomes necessary to understand the difference between “should” meet criteria and “must” meet criteria. At the moment there appears to be mostly a binary [ meets criteria / does not meet criteria ] approach to decision making. Since each of the criteria has a different degree of importance, it seems to me that weighting the criteria, and how far a test effort is from meeting a specific criterion, would give a more nuanced approach. It would be easy to say that none of this patient’s spirometry efforts should have been reported since none of them came close to meeting the criteria for test quality. You have to work with what you get sometimes, and in this case I think that not reporting would have been a worse choice than reporting what were admittedly flawed results.
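As a rough illustration of the difference between the two approaches, the sketch below compares a binary selection rule with a weighted one. Everything in it is an assumption made for the example: the class and function names, the idea of subtracting the excess extrapolated volume from the FEV1, the weight, and the numbers in the usage note. It is not a validated scoring scheme and it looks at only one criterion.

```python
from dataclasses import dataclass


@dataclass
class Effort:
    name: str
    fev1: float                 # liters
    fvc: float                  # liters
    extrapolated_volume: float  # liters


def ev_limit(fvc):
    """ATS-ERS 2005 limit: 5% of FVC or 0.150 L, whichever is greater."""
    return max(0.05 * fvc, 0.150)


def binary_pick(efforts):
    """Highest FEV1 among efforts that meet the criterion, else None."""
    passing = [e for e in efforts if e.extrapolated_volume < ev_limit(e.fvc)]
    return max(passing, key=lambda e: e.fev1) if passing else None


def weighted_pick(efforts, ev_weight=1.0):
    """Highest FEV1 after a penalty proportional to the excess extrapolated
    volume. ev_weight is a hypothetical tuning knob, not a published value."""
    def score(effort):
        excess = max(0.0, effort.extrapolated_volume - ev_limit(effort.fvc))
        return effort.fev1 - ev_weight * excess
    return max(efforts, key=score)
```

With two made-up efforts, `Effort("A", 1.60, 3.0, 0.10)` and `Effort("B", 2.55, 3.4, 0.30)`, the binary rule rejects B outright and reports A, while the weighted rule with the default weight reports B because an excess of 0.13 L does not come close to offsetting an FEV1 that is almost a liter higher. A much larger weight brings the two rules back into agreement.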

References:

Brusasco V, Crapo R, Viegi G. Series ATS/ERS Task Force: Standardisation of Pulmonary Function Testing. Standardisation of spirometry. Eur Respir J 2005; 26: 319-338.

Morris MJ, Christopher KL. Diagnostic criteria for the classification of vocal cord dysfunction. Chest 2010; 138: 1213-1223.

Wilson MA, King CS, Holley AB, Greenburg DL, Mikita JA. Clinical and lung-function variables associated with vocal cord dysfunction. Respir Care 2009; 54(4): 467-473.

Creative Commons License
PFT Blog by Richard Johnston is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
