A modest proposal for a clinical spirometry grading system

A while back I reviewed the spirometry grading system that was included in the 2017 ATS reporting standards. My feeling was, and continues to be, that its usefulness is very limited because it’s mostly a reproducibility grading system that relies on a few easy-to-measure parameters. This doesn’t mean that a grading system can’t be helpful, just that it needs to be focused differently.

In a clinical PFT lab many patients have difficulty performing adequate and reproducible spirometry, but that doesn’t mean the results aren’t clinically useful. Moreover, suboptimal quality results may be the very best the patient is ever able to produce. So what’s more important in a grading system than reproducibility is the ability to assess the clinical utility of a reported spirometry effort.

The two most important results that come from spirometry are the FEV1 and the FVC, and I strongly believe that they need to be assessed separately. For each of these values there are two aspects that need to be determined. First, is there a reliable probability that the reported value is correct? Second, are any errors causing the reported value to be underestimated or overestimated? The two are inter-related since a value with excellent reliability is not going to have any significant errors, but if there are errors then a reviewer needs to know which direction the result is being biased.

The current ATS/ERS standards contain specific thresholds for certain spirometry values such as expiratory time and back-extrapolation. Although these are certainly indications of test quality they are almost always used in a binary [pass | fail] manner. In order to assess clinical usefulness however, you instead need to grade these on a scale. For example an expiratory time of 5.9 seconds for spirometry from a 60 year-old individual would mean that there is a small probability that the FVC is underestimated, but with an expiratory time of 1.9 seconds the FVC would have a very high probability of being underestimated and this needs to be recognized in order to assess clinical utility.

Note: Although the A-B-C-D-F grading system is rather prosaic it is still universally understandable, so I will use it for grading reliability. An A grade or an F grade are probably easy to assign but differentiating between B-C-D may be more subjective, particularly since reliability depends on multiple parameters and judging their relative contribution is always going to be subjective at some point. For bias, I will be using directional characters (↑↓) to show the direction of the bias (i.e. positive or negative), so ↑ will indicate probable overestimation, ↓ will indicate probable underestimation, and ~ indicates a neutral bias.

FEV1 / Back extrapolation:

Back-extrapolation is a way to assess the quality of the start of a spirometry effort and the accuracy of the timing of the FEV1. The ATS/ERS statement says that the back-extrapolated volume must be less than 5% of the FVC or less than 0.150 L, whichever is greater.

My experience is that an elevated back-extrapolation tends to cause FEV1 to be overestimated far more often than underestimated. So a suggested grading system for back-extrapolation would be (and I’ll be the first to admit these are off the top of my head and open for discussion):

FEV1:
Back-Extrapolation:                  Reliability:   Bias:
Within standards:                    A              ~
> 1 x standard, < 1.5 x standard:    B
> 1.5 x standard, < 2 x standard:    C              ↑↑
> 2 x standard, < 2.5 x standard:    D              ↑↑↑
> 2.5 x standard:                    F              ↑↑↑↑
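As a rough illustration, the table can be written as a small Python function. This is only a sketch of the off-the-cuff scale above, not a validated algorithm. The names `bev_limit` and `grade_back_extrapolation` are invented here, the handling of values that fall exactly on a boundary is an assumption, and the bias for grade B is returned as `None` because the table leaves it blank.

```python
from typing import NamedTuple, Optional


class BackExtrapolationGrade(NamedTuple):
    reliability: str
    bias: Optional[str]  # None where the table leaves the bias unspecified


def bev_limit(fvc_l: float) -> float:
    """ATS/ERS limit: 5% of the FVC or 0.150 L, whichever is greater."""
    return max(0.05 * fvc_l, 0.150)


def grade_back_extrapolation(bev_l: float, fvc_l: float) -> BackExtrapolationGrade:
    """Grade FEV1 reliability and bias from the back-extrapolated volume (L)."""
    # Express the back-extrapolated volume as a multiple of the standard.
    ratio = bev_l / bev_limit(fvc_l)
    # Assumption: a value exactly on a boundary takes the better grade.
    if ratio <= 1.0:
        return BackExtrapolationGrade("A", "~")
    if ratio <= 1.5:
        return BackExtrapolationGrade("B", None)
    if ratio <= 2.0:
        return BackExtrapolationGrade("C", "↑↑")
    if ratio <= 2.5:
        return BackExtrapolationGrade("D", "↑↑↑")
    return BackExtrapolationGrade("F", "↑↑↑↑")
```

For an FVC of 4.00 L the limit is 0.200 L, so a back-extrapolated volume of 0.35 L is 1.75 times the standard and would be graded C with an upward bias.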

Infection Control

The issue of infection control has been a topic of a couple of discussions I’ve had lately. In particular, it was reported to me that a PFT lab had come under fire from a Joint Commission inspector who did not believe that filter mouthpieces were adequate and that “patient valves and circuits need to be sterilized between each patient”.

Unfortunately with all the other things we have to worry about it’s all too easy to become blasé about infection control. This despite the fact that every hospital I’ve visited in the last dozen or so years has posted numerous signs about hand washing and the safe disposal of contaminated supplies. But maybe it’s because we’re inundated with reminders that we’ve developed a blind spot about it.

The 2005 ATS/ERS statement on general considerations has two pages devoted to infection control (pages 155-157). The ATS procedure manual also has four pages devoted to infection control (pages 34-38), although much of this is devoted to a discussion of tuberculosis, cystic fibrosis and sterilization procedures. Of necessity, the ATS/ERS statement and ATS procedure manual discuss infection control in generalities and any given lab will need to have a policy tailored for their specific circumstances. Even so, either or both of these (as well as Kendrick et al’s 2003 review) should be the basis for your lab’s policy on infection control (and you do have one, don’t you?).

So what are the issues?

Diseases can be transmitted by direct contact (saliva) or indirect contact (airborne particles). PFT Labs need to prevent cross-transmission of diseases by the use of barrier devices (gloves, filter mouthpieces) and proper cleaning procedures.

So yeah, it’s as simple as that, but as usual the devil is in the details and in particular there are trade-offs between expense, time and efficacy.

Is gas trapping more common than we think it is?

Over the last couple of years I’ve run across a number of test systems that do not include tidal loops along with the maximal flow-volume loop. I’ve wondered why this was done and because of this I’ve thought a lot about tidal flow-volume loops and what additional information, if any, they add to spirometry interpretation.

One of my thoughts has been about the relationship between obesity and the IC and ERV. FVC and TLC are often reasonably preserved even with relatively severe obesity. FRC, on the other hand, is often noticeably affected with even minor changes in BMI (and interestingly this applies to reduced as well as elevated BMIs). When FRC decreases because of obesity the IC usually increases and the ERV decreases and for this reason the IC/ERV ratio has been suggested as a way to monitor changes in FRC without having to actually measure lung volumes.

IC and ERV are not measured as part of spirometry but the position of the tidal loops gives at least a general indication of their magnitude and I’ve noticed that there’s a moderately good correlation between BMI and the position of the tidal loop.

With this in mind, I see up to a dozen reports a week with restrictive-looking spirometry (i.e. symmetrically reduced FVC and FEV1 with a normal FEV1/FVC ratio) on patients with a diagnosis of asthma. This is nothing new and there have probably been at least 10 articles in the last decade about the Restrictive Spirometry Pattern (RSP). Interpreting these kinds of spirometry results is always problematic, particularly when there are no prior lung volume measurements to rule-in or rule-out restriction. I’ve noticed however, that patients with a restrictive spirometry pattern almost always have the tidal loop on the far right-hand side of the flow-volume loop (zero or near zero ERV). For example:

            Observed:   %Predicted:
FVC:        1.65        74
FEV1:       1.21        73
FEV1/FVC:   73          100

But there doesn’t seem to be any relationship between this observation and the patient’s BMI and in fact, this is seen even when BMI is normal or somewhat reduced.

Telling the right story

The 2005 ATS/ERS spirometry standard makes it permissible, and even recommends, that the FVC and FEV1 be selected from different efforts. I disagree somewhat with their criteria for selecting the FEV1 but overall reporting composite results makes a lot of sense. In an ideal world we’d always get the best FVC and FEV1 in a single effort but what we more often get is a good FEV1 with a poor FVC or a poor FEV1 with a good FVC. So, it best serves the clinical needs of the patient to report the best elements from multiple spirometry efforts.

However, I was disappointed that the 2017 ATS reporting standards did not in any way address how to indicate that composite results are being reported, nor did they resolve the selection of the flow-volume loops and volume-time curves that accompany the numerical results. That leaves it to us to decide how to do this but this in turn is often limited by the capabilities of our equipment’s software.

One test system that I routinely take to a free spirometry screening clinic will only report the three “best” efforts based solely on the largest combined FVC + FEV1. Admittedly, to some extent this follows the 2005 ATS/ERS spirometry standards selection criteria but other than deleting a specific test effort I cannot override these selections nor can I mix and match the FVC and FEV1 values. This means that what it reports as the “best” effort doesn’t always agree with what in reality are the best results.
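To make the difference concrete, here is a minimal Python sketch of the two selection rules: a single "best" effort chosen by the largest FVC + FEV1, versus a composite that takes the FVC and the FEV1 from whichever efforts they were largest in. The `Effort` type, the function names and the three efforts are invented for illustration and are not taken from any vendor's software. Using "largest value" for each parameter is also an assumption standing in for whatever selection criteria a lab actually applies.

```python
from typing import NamedTuple, Sequence


class Effort(NamedTuple):
    label: str
    fvc: float   # litres
    fev1: float  # litres


def best_by_sum(efforts: Sequence[Effort]) -> Effort:
    """Single 'best' effort: the one with the largest FVC + FEV1."""
    return max(efforts, key=lambda e: e.fvc + e.fev1)


def composite(efforts: Sequence[Effort]) -> dict:
    """Largest FVC and largest FEV1, each tagged with its source effort."""
    fvc_src = max(efforts, key=lambda e: e.fvc)
    fev1_src = max(efforts, key=lambda e: e.fev1)
    return {
        "FVC": fvc_src.fvc, "FVC effort": fvc_src.label,
        "FEV1": fev1_src.fev1, "FEV1 effort": fev1_src.label,
    }


# Hypothetical efforts, invented for illustration only.
efforts = [
    Effort("1", fvc=2.40, fev1=2.01),
    Effort("2", fvc=2.62, fev1=1.85),
    Effort("3", fvc=2.55, fev1=1.95),
]

print(best_by_sum(efforts))
print(composite(efforts))
```

With these made-up numbers the sum-based rule picks effort 3 (FVC 2.55, FEV1 1.95), which contains neither the largest FVC (2.62, effort 2) nor the largest FEV1 (2.01, effort 1). The composite reports both and keeps track of where each came from.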

My lab’s software however, allows us to select which test efforts the FVC and FEV1 come from. In addition we can select which test effort the ancillary measurements (Peak Flow, Expiratory Time, FIVC, FEF50, etc.) come from and which effort the flow-volume loop and volume-time graphs come from.

It is therefore possible to select the FVC, FEV1, ancillary measurements and the graphs from entirely different test efforts. Thankfully, this is almost never done but when I review reports what I see most frequently is that the FVC is selected from one test effort, but the FEV1, ancillary measurements and graphs are selected from another. To some extent this makes sense because I’d usually agree that the Peak Flow should always be associated with the FEV1, and if that’s the case, then so should the flow-volume loop. The problem with this is that the FVC often comes from a test effort with a substantially longer expiratory time, and when results are selected this way the volume-time curve and expiratory time are instead reported for the effort the FEV1 came from.

This leads to a report that looks like this:

             Observed:   Predicted:   %Predicted:
FVC:         2.62        3.65         72%
FEV1:        2.01        2.58         78%
FEV1/FVC:    77          72           107%
Peak Flow:   8.83        6.73         131%
Exp. Time:   1.20

with graphs like:

I’ve got the old back-extrapolation blues

A couple days ago I pulled my copy of the Intermountain Thoracic Society manual on pulmonary function testing off the bookshelf and thumbed through it a bit. It was first published in 1975 and was the first major attempt towards standardizing the performance and interpretation of PFTs.

My first thought was that we’ve come a long way since then. Most importantly our understanding of what spirometry can (and cannot) tell us has improved dramatically.

Equipment too, has advanced since 1975, most particularly due to the first equipment standards that were published in that decade. As a reminder, spirometer accuracy was not a given and there are a number of studies dating from that time period that detailed just how woefully inaccurate many of them were.

In 1975 computerized spirometers were exceptionally rare and I was reminded of this because 141 pages (two-thirds!) of the ITS manual are filled with look-up tables for predicted values and ATPS – BTPS – STPD conversion factors.

Most spirometry systems were entirely manual and the majority of us measured FVC and FEV1 manually from pen tracings on kymograph paper. The results were then hand-calculated and then hand-written onto report forms. Since our equipment is so much more accurate and our computers acquire and calculate test results automatically, everything is so much better now, isn’t it?

Overall, I’d have to say yes. Testing is much quicker and more accurate than it used to be in 1975, and no, I’m not particularly nostalgic about those days.

{Arrrhh, gather round lads and lasses and let me tell you of the days when coal-fired steam-powered spirometers rumbled and hissed in basement labs everywhere; when you had to solve regression equations with your slide rule on the fly or risk the horror of ripped kymograph paper, exploding alveolar sample bags and spirometer bells gone ballistic without warning. The toll this daily physical and mental trauma took amongst the lowly pulmonary techs was terrifying and only the bravest continued the daily battle against gnarly patients, sneering doctors, black-hearted administrators and monopolistic manufacturers…

…Oops! Wrong time-line; those are memories from the universe one north and two left of ours. Too much steampunk sci-fi late at night and too little sleep left me momentarily confused}

I ran across an error today that reminded me that although computerized test systems are essential to our ability to run efficient and accurate labs, at the same time the limitations of the software that comes along with them hinder our ability to detect and correct errors.

A spirometry quality grading system. Or is it?

A set of guidelines for grading spirometry quality was included with the recently published ATS recommendations for a standardized pulmonary function report. These guidelines are similar to others published previously so they weren’t a great surprise but as much as I may respect the authors of the standard my first thought was “when was the last time any of these people performed routine spirometry?” The authors acknowledge that the source for these guidelines is epidemiological, and if I were conducting a research study that required spirometry these guidelines would be useful towards knowing which results to keep and which to toss, but for routine clinical spirometry they’re pretty useless.

I put these thoughts aside because I had other projects I was working on but I was reminded of them when I recently performed spirometry on an individual who wasn’t able to perform a single effort without major errors. The person in question was an otherwise intelligent and mature individual but found themselves getting more frustrated and angry with each effort because they couldn’t manage to perform the test right. I did my best to explain and demonstrate what they were supposed to do each time but after the third try they refused to do any more. About the only thing that was reportable was the FEV1 from a single effort.

This may be a somewhat extreme case but it’s something that those of us who perform PFTs are faced with every day. There are many individuals that have no problems performing spirometry but sometimes we’re fortunate to get even a single test effort that meets all of the ATS/ERS criteria. The presence or absence of test quality usually isn’t apparent in the final report however, and for this reason I do understand the value in some kind of quality grading system. But that also implies that the grading system serves the purpose for which it is intended.

In order to quantify this I reviewed the spirometry performed by 200 patients in my lab in order to determine how many acceptable and reproducible results there were. To be honest, as bad as I thought the quality problem was, when I looked at the numbers it was worse than I imagined.

The spirometry quality grading system is:

Grade: Criteria:
A ≥3 acceptable tests with repeatability within 0.150 L (for age 2–6, 0.100 L ), or 10% of highest value, whichever is greater
B ≥2 acceptable tests with repeatability within 0.150 L (for age 2–6, 0.100 L ), or 10% of highest value, whichever is greater
C ≥2 acceptable tests with repeatability within 0.200 L (for age 2–6, 0.150 L ), or 10% of highest value, whichever is greater
D ≥2 acceptable tests with repeatability within 0.250 L (for age 2–6, 0.200 L ), or 10% of highest value, whichever is greater
E 1 acceptable test
F No acceptable tests
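Read as a decision rule, the table could be sketched in Python roughly as follows. This is a literal reading of the table as printed above and not the ATS reference implementation. In particular, applying the "10% of highest value" alternative at every age is an assumption about how the parentheses should be read. The table also does not say what to do with two or more acceptable tests that miss the D limit, so falling back to E in that case is a further assumption. The function name and parameters are invented for illustration.

```python
def spirometry_grade(acceptable: int, repeatability_l: float,
                     highest_l: float, age_years: float) -> str:
    """Return the A-F quality grade for an FVC or FEV1.

    acceptable      -- number of acceptable test efforts
    repeatability_l -- difference between the two largest values (L)
    highest_l       -- the highest value of the parameter being graded (L)
    """
    if acceptable <= 0:
        return "F"
    if acceptable == 1:
        return "E"
    young = 2 <= age_years <= 6
    ten_percent = 0.10 * highest_l
    # (grade, minimum acceptable tests, limit for age 2-6, limit otherwise)
    rows = [
        ("A", 3, 0.100, 0.150),
        ("B", 2, 0.100, 0.150),
        ("C", 2, 0.150, 0.200),
        ("D", 2, 0.200, 0.250),
    ]
    for grade, min_tests, young_limit, limit in rows:
        # Assumption: "or 10% of highest value, whichever is greater"
        # is applied to every age group, as the table is printed.
        allowed = max(young_limit if young else limit, ten_percent)
        if acceptable >= min_tests and repeatability_l <= allowed:
            return grade
    # Assumption: the table is silent on this case; fall back to E.
    return "E"
```

If the 10% alternative is meant to apply only to ages 2–6, only the `allowed` line needs to change.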

3-Equation DLCO

One of the limitations of the single-breath DLCO is that the equation used to calculate results implicitly assumes that the entire breath-holding period occurs at TLC. Mathematically, what happens to the diffusion of carbon monoxide (CO) during inspiration and expiration is not a consideration:
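For reference, the conventional single-breath calculation is usually written along the following lines. The symbols are labels chosen here and are assumptions rather than notation taken from this post: V_A is the alveolar volume (STPD), t_BH the breath-holding time in seconds, P_B the barometric pressure in mmHg, 47 the water vapor pressure at body temperature, F_A,CO(0) the alveolar CO fraction at the start of breath-holding (estimated from the dilution of the tracer gas) and F_A,CO(t) the CO fraction in the alveolar sample.

```latex
D_{LCO} = \frac{V_A \times 60}{(P_B - 47)\, t_{BH}}
          \times \ln\!\left(\frac{F_{A,CO}(0)}{F_{A,CO}(t)}\right)
```

Only a single volume and a single time appear in this form, which is where the assumption that everything happens at one lung volume comes from.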

The different approaches towards measuring breath-holding time (BHT) make allowances for inspiration and expiration to one extent or another but realistically they should be considered fudge factors.

The 3-equation DLCO was first proposed by Graham et al in 1980 and it received its name because there is a separate equation for each phase of the single-breath DLCO maneuver. The individual equations are based on the mass-balance equation and attempt to account for the mass of CO inhaled, absorbed and exhaled during the single-breath maneuver. One of the most significant differences is that an iterative approach is used to determine DLCO. Specifically, an initial estimate of DLCO is made and then compared against the values measured during the three phases. Any difference between the observed and expected values is used to re-estimate the DLCO, which is then compared again. The authors indicated that 10 iterations are usually sufficient to converge on a DLCO value that meets all measured conditions with a high degree of accuracy.

VA, two ways

One of the recommendations in the 2017 ERS/ATS DLCO standards was that VA should be calculated using a mass balance equation. I’ve discussed this approach previously, but basically the volume of the exhaled tracer gas is accumulated over the entire exhalation and the amount of tracer gas presumed to remain in the lung is used to calculate VA. The conceptual problem with this for DLCO measurements is that VA is calculated using the entire exhalation but CO uptake is based solely on the CO concentration in the alveolar sample. Since VA calculated using mass balance tends to be larger than VA calculated traditionally in subjects with ventilation inhomogeneities, this means that DLCO calculated with a mass balance VA is also going to be proportionally larger.

This problem has concerned me for a while but what wasn’t clear was what difference should be expected in the VA (and DLCO) when it is calculated both ways. In order to figure this out I’ve taken a real-world example of a subject with severe COPD and calculated the difference in VA and DLCO.

Fortunately, my lab software lets me download the raw data for DLCO tests (volume, CH4, CO at 10 msec intervals) into a spreadsheet. The PFT results for the subject looked like this:

             Observed:   %Predicted:
FVC (L):     2.39        97%
FEV1 (L):    0.66        36%
FEV1/FVC:    27          38%

TLC (L):     6.11        126%
FRC (L):     4.84        174%
RV (L):      4.04        171%

DLCO:        9.21        57%
VA (L):      3.19        68%
Vinsp (L):   2.32

In order to use the mass balance approach with the spreadsheet I found that I could determine the start of exhalation after the breath-holding period but determining where the alveolar plateau started was much more difficult. For this reason I had to include the dead space but made adjustments for this when calculating VA.

To start off with, using the inspired volume and concentration of CH4 in the DLCO test gas mixture, the volume of inhaled CH4 was:

2.32 L x 0.003 = 6.96 ml.
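The same arithmetic as a tiny Python helper, mostly to keep the units straight (litres in, millilitres out). The function name is made up for this sketch. The 0.003 is the CH4 fraction used in the calculation above.

```python
def inhaled_tracer_ml(inspired_volume_l: float, tracer_fraction: float) -> float:
    """Volume of tracer gas inhaled, in ml, from inspired volume (L) and gas fraction."""
    return inspired_volume_l * tracer_fraction * 1000.0


ch4_ml = inhaled_tracer_ml(2.32, 0.003)
print(f"Inhaled CH4: {ch4_ml:.2f} ml")
```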

What’s normal about airway resistance?

The question that was actually posed to me a month or so ago was “when is RAW abnormal?” I didn’t have a good answer at the time since airway resistance (RAW) tests are not performed by my lab. The pulmonary physicians I work with don’t think that RAW is a clinically useful measurement and for a variety of reasons I don’t disagree with this. Nevertheless, RAW testing is routinely performed in many labs around the world so I thought it would be interesting to spend some time researching this.

When asking what’s normal the first issue is which RAW value are you talking about? The measurement of airways resistance using a body plethysmograph was first described by DuBois et al in 1956. Airway resistance (RAW) is the amount of pressure required to generate a given flow rate and is reported in cm H2O/L/Sec. A number of physiologists quickly found that the reciprocal of RAW, conductance (GAW), which is expressed as the flow rate for a given driving pressure (L/sec/cm H2O), was also a useful way to describe the pressure-flow relationship of the airways.

For technical reasons TGV (Thoracic Gas Volume) must be measured at the same time as RAW. It was soon noted that there was a relationship between RAW and TGV and that airway resistance decreased as lung volume increased.

Thinking about the past

This is the time of the year when it’s traditional to review the past. That’s what “Auld lang syne”, the song most associated with New Year’s celebrations, is all about. I too have been thinking about the past but it’s not been about absent friends, it’s been about trend reports and assessing trends.

In the May 2017 issue of Chest, Quanjer et al reported their study on the post-bronchodilator response in FEV1. I’ve discussed this previously and they noted that the current ATS/ERS standard for a significant post-bronchodilator change of ≥12% and ≥200 ml penalized the short and the elderly. Their finding was that a significant change was better assessed by the absolute change in percent predicted (i.e. 8%) rather than a relative change.

I’ve thought about how this could apply to assessing changes in trends ever since then. The current standard for a significant change in FEV1 over time (also discussed previously) is anything greater than:

which is good in that it is a way to reference changes over any arbitrary time period but it also looks at it as a relative change (i.e. ±15%). A 15% change however, comes from occupational spirometry, not clinical spirometry, and the presumption, to me at least, is that it’s geared towards individuals who have more-or-less normal spirometry to begin with.

A ±15% change may make sense if your FEV1 is already near 100% of predicted but there are some problems with this for individuals who aren’t. For example, a 75 year-old 175 cm Caucasian male would have a predicted FEV1 of 2.93 L from the NHANESIII reference equations. If this individual had severe COPD and an FEV1 of 0.50 L (17% of predicted), then a ±15% relative change in FEV1 would be ±0.075 L (75 ml). That amount of change is half the acceptable amount of intrasession repeatability (150 ml) in spirometry testing and it’s hard to consider a change this small as anything but chance or noise. It’s also hard to consider this a clinically significant change.
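The arithmetic in this example can be laid out in a few lines of Python. The function names are invented for this sketch. The 8 percentage points of predicted is Quanjer et al’s bronchodilator figure, borrowed here only to show the scale of an absolute change; treating it as a trend threshold would be an assumption, not an established standard.

```python
def percent_predicted(observed_l: float, predicted_l: float) -> float:
    """Observed value as a percentage of the predicted value."""
    return 100.0 * observed_l / predicted_l


def relative_change_l(observed_l: float, fraction: float = 0.15) -> float:
    """Litres corresponding to a relative change (default 15%) in the observed value."""
    return observed_l * fraction


def absolute_pp_change_l(predicted_l: float, points: float) -> float:
    """Litres corresponding to a change of `points` percentage points of predicted."""
    return predicted_l * points / 100.0


PREDICTED_FEV1 = 2.93    # L, from the example above
REPEATABILITY_L = 0.150  # L, intrasession repeatability limit

for fev1 in (2.93, 0.50):
    rel = relative_change_l(fev1)
    print(f"FEV1 {fev1:.2f} L ({percent_predicted(fev1, PREDICTED_FEV1):.0f}% predicted): "
          f"15% relative change = {rel:.3f} L, "
          f"{rel / REPEATABILITY_L:.1f} x the repeatability limit")

# Illustration only: 8 percentage points of predicted, expressed in litres.
print(f"8 points of predicted = {absolute_pp_change_l(PREDICTED_FEV1, 8):.3f} L")
```

A relative threshold shrinks along with the FEV1, so at 0.50 L it falls well inside the noise of the test. A threshold tied to the predicted value stays the same size regardless of how low the FEV1 is.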