FEV1 and VC should be measured separately

The FEV1 and VC provide quite different information about a patient’s lungs. Unfortunately, spirometry as it is currently practiced is optimized towards generating an accurate FEV1 more than an accurate VC. This is partly due to limitations in the maneuver itself and partly due to the lack of accurate end-of-test criteria for an adequate VC. In one sense this is okay since more than one person that I’ve known and respected has said that “it’s all about the FEV1”.

Having said that, an accurate FEV1/VC ratio is essential for detecting and quantifying airway obstruction and an SVC maneuver is more likely to obtain a more accurate VC. This matters because the current ATS/ERS spirometry guidelines recommend that the FEV1/VC be reported, where the VC is the largest value obtained from any test and reference equations indicate that the SVC is routinely larger than the FVC:

So, shouldn’t we be routinely performing both FVC and SVC maneuvers when we do spirometry on our patients? And why aren’t we?

A modest proposal for a clinical spirometry grading system

A while back I reviewed the spirometry grading system that was included in the 2017 ATS reporting standards. My feeling was, and continues to be, that its usefulness is very limited because it’s mostly a reproducibility grading system that relies on a few easy-to-measure parameters. This doesn’t mean that a grading system can’t be helpful, just that it needs to be focused differently.

In a clinical PFT lab many patients have difficulty performing adequate and reproducible spirometry, but that doesn’t mean the results aren’t clinically useful. Moreover, suboptimal quality results may be the very best the patient is ever able to produce. So what’s more important in a grading system than reproducibility is the ability to assess the clinical utility of a reported spirometry effort.

The two most important results that come from spirometry are the FEV1 and the FVC, and I strongly believe that they need to be assessed separately. For each of these values there are two aspects that need to be determined. First, is there a reliable probability that the reported value is correct? Second, are any errors causing the reported value to be underestimated or overestimated? The two are inter-related since a value with excellent reliability is not going to have any significant errors, but if there are errors then a reviewer needs to know which direction the result is being biased.

The current ATS/ERS standards contain specific thresholds for certain spirometry values such as expiratory time and back-extrapolation. Although these are certainly indications of test quality they are almost always used in a binary [pass | fail] manner. In order to assess clinical usefulness however, you instead need to grade these on a scale. For example an expiratory time of 5.9 seconds for spirometry from a 60 year-old individual would mean that there is a small probability that the FVC is underestimated, but with an expiratory time of 1.9 seconds the FVC would have a very high probability of being underestimated and this needs to be recognized in order to assess clinical utility.

Note: Although the A-B-C-D-F grading system is rather prosaic it is still universally understandable, so I will use it for grading reliability. An A grade or an F grade are probably easy to assign but differentiating between B-C-D may be more subjective, particularly since reliability depends on multiple parameters and judging their relative contribution is always going to be subjective at some point. For bias, I will be using directional characters (↑↓) to show the direction of the bias (i.e. positive or negative), so ↑ will indicate probable overestimation, ↓ will indicate probable underestimation, and ~ indicates a neutral bias.

FEV1 / Back extrapolation:

Back-extrapolation is a way to assess the quality of the start of a spirometry effort and the accuracy of the timing of the FEV1. The ATS/ERS statement says that the back-extrapolated volume must be less than 5% of the FVC or less than 0.150 L, whichever is greater.

My experience is that an elevated back-extrapolation tends to cause FEV1 to be overestimated far more often than underestimated. So a suggested grading system for back-extrapolation would be (and I’ll be the first to admit these are off the top of my head and open for discussion):

Back-Extrapolation:                  Reliability:   Bias:
Within standards:                    A              ~
> 1 x standard, < 1.5 x standard:    B
> 1.5 x standard, < 2 x standard:    C              ↑↑
> 2 x standard, < 2.5 x standard:    D              ↑↑↑
> 2.5 x standard:                    F              ↑↑↑↑
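As a rough sketch of how a table like this could be applied in software, the snippet below expresses the back-extrapolated volume as a multiple of the ATS/ERS limit and looks up the grade. The function names are hypothetical, the handling of values that land exactly on a boundary is an assumption (the table leaves those cases open), and the B row returns an empty bias because none is listed above.

```python
def bev_limit(fvc_l: float) -> float:
    """ATS/ERS limit quoted above: 5% of the FVC or 0.150 L, whichever is greater."""
    return max(0.05 * fvc_l, 0.150)


def grade_back_extrapolation(bev_l: float, fvc_l: float) -> tuple[str, str]:
    """Return (reliability grade, bias marker) for a back-extrapolated volume.

    Assumption: a value exactly on a boundary (1.5x, 2x, 2.5x) is given the
    lower grade; the table itself does not say where such values belong.
    """
    ratio = bev_l / bev_limit(fvc_l)
    if ratio <= 1.0:
        return ("A", "~")
    if ratio < 1.5:
        return ("B", "")  # no bias is listed for B in the table
    if ratio < 2.0:
        return ("C", "↑↑")
    if ratio < 2.5:
        return ("D", "↑↑↑")
    return ("F", "↑↑↑↑")


# Hypothetical effort: FVC 4.00 L with a back-extrapolated volume of 0.35 L.
# The limit is 0.200 L, so this is 1.75 x the standard.
print(grade_back_extrapolation(0.35, 4.0))
```

Working from a multiple of the limit rather than a fixed volume keeps the grading consistent between small and large lungs, which is the point of the "whichever is greater" rule.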

Is gas trapping more common than we think it is?

Over the last couple of years I’ve run across a number of test systems that do not include tidal loops along with the maximal flow-volume loop. I’ve wondered why this was done and because of this I’ve thought a lot about tidal flow-volume loops and what additional information, if any, they add to spirometry interpretation.

One of my thoughts has been about the relationship between obesity and the IC and ERV. FVC and TLC are often reasonably preserved even with relatively severe obesity. FRC, on the other hand, is often noticeably affected with even minor changes in BMI (and interestingly this applies to reduced as well as elevated BMI’s). When FRC decreases because of obesity the IC usually increases and the ERV decreases and for this reason the IC/ERV ratio has been suggested as a way to monitor changes in FRC without having to actually measure lung volumes.

IC and ERV are not measured as part of spirometry but the position of the tidal loops gives at least a general indication of their magnitude and I’ve noticed that there’s a moderately good correlation between BMI and the position of the tidal loop.

With this in mind, I see up to a dozen reports a week with restrictive-looking spirometry (i.e. symmetrically reduced FVC and FEV1 with a normal FEV1/FVC ratio) on patients with a diagnosis of asthma. This is nothing new and there have probably been at least 10 articles in the last decade about the Restrictive Spirometry Pattern (RSP). Interpreting these kinds of spirometry results is always problematic, particularly when there are no prior lung volume measurements to rule-in or rule-out restriction. I’ve noticed however, that patients with a restrictive spirometry pattern almost always have the tidal loop on the far right-hand side of the flow-volume loop (zero or near zero ERV). For example:

Observed: %Predicted:
FVC: 1.65 74
FEV1: 1.21 73
FEV1/FVC: 73 100

But there doesn’t seem to be any relationship between this observation and the patient’s BMI and in fact, this is seen even when BMI is normal or somewhat reduced.

Telling the right story

The 2005 ATS/ERS spirometry standard makes it permissible, and even recommends, that the FVC and FEV1 be selected from different efforts. I disagree somewhat with their criteria for selecting the FEV1 but overall reporting composite results makes a lot of sense. In an ideal world we’d always get the best FVC and FEV1 in a single effort but what we more often get is a good FEV1 with a poor FVC or a poor FEV1 with a good FVC. So, it best serves the clinical needs of the patient to report the best elements from multiple spirometry efforts.

However, I was disappointed that the 2017 ATS reporting standards did not in any way address how to indicate that composite results are being reported, nor did they resolve the selection of the flow-volume loops and volume-time curves that accompany the numerical results. That leaves it to us to decide how to do this, but this in turn is often limited by the capabilities of our equipment’s software.

One test system that I routinely take to a free spirometry screening clinic will only report the three “best” efforts based solely on the largest combined FVC + FEV1. Admittedly, to some extent this follows the 2005 ATS/ERS spirometry standards selection criteria but other than deleting a specific test effort I cannot override these selections nor can I mix and match the FVC and FEV1 values. This means that what it reports as the “best” effort doesn’t always agree with what in reality are the best results.
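To make the difference concrete, here is a minimal sketch contrasting a "largest FVC + FEV1" selection with a composite selection. The `Effort` class, the function names and the three efforts are all hypothetical, and the sketch assumes every effort has already been judged acceptable.

```python
from dataclasses import dataclass


@dataclass
class Effort:
    label: str
    fvc: float   # litres
    fev1: float  # litres


def best_by_sum(efforts):
    """Single 'best' effort: the one with the largest FVC + FEV1."""
    return max(efforts, key=lambda e: e.fvc + e.fev1)


def composite(efforts):
    """Largest FVC and largest FEV1, each allowed to come from a different effort."""
    fvc_effort = max(efforts, key=lambda e: e.fvc)
    fev1_effort = max(efforts, key=lambda e: e.fev1)
    return fvc_effort, fev1_effort


# Hypothetical session (values invented for illustration only).
efforts = [
    Effort("effort 1", fvc=2.30, fev1=2.10),  # good start, short exhalation
    Effort("effort 2", fvc=2.62, fev1=1.70),  # slow start, long exhalation
    Effort("effort 3", fvc=2.45, fev1=2.01),  # middling on both
]

single = best_by_sum(efforts)
fvc_from, fev1_from = composite(efforts)
print("largest sum:", single.label, single.fvc, single.fev1)
print("composite:  FVC", fvc_from.fvc, "from", fvc_from.label,
      "| FEV1", fev1_from.fev1, "from", fev1_from.label)
```

In this made-up session the effort with the largest sum contains neither the largest FVC nor the largest FEV1, which is the situation described above.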

My lab’s software however, allows us to select which test efforts the FVC and FEV1 come from. In addition we can select which test effort the ancillary measurements (Peak Flow, Expiratory Time, FIVC, FEF50, etc.) come from and which effort the flow-volume loop and volume-time graphs come from.

It is therefore possible to select the FVC, FEV1, ancillary measurements and the graphs from entirely different test efforts. Thankfully, this is almost never done but when I review reports what I see most frequently is that the FVC is selected from one test effort, but the FEV1, ancillary measurements and graphs are selected from another. To some extent this makes sense because I’d usually agree that the Peak Flow should always be associated with the FEV1, and if that’s the case, then so should the flow-volume loop. The problem with this is that the FVC often comes from a test effort with a substantially longer expiratory time and when results are selected this way the volume-time curve and expiratory time are instead reported for the effort the FEV1 came from.

This leads to a report that looks like this:

Observed: Predicted: %Predicted:
FVC: 2.62 3.65 72%
FEV1: 2.01 2.58 78%
FEV1/FVC: 77 72 107%
Peak Flow: 8.83 6.73 131%
Exp. Time: 1.20

with graphs like:

I’ve got the old back-extrapolation blues

A couple days ago I pulled my copy of the Intermountain Thoracic Society manual on pulmonary function testing off the bookshelf and thumbed through it a bit. It was first published in 1975 and was the first major attempt towards standardizing the performance and interpretation of PFTs.

My first thought was that we’ve come a long way since then. Most importantly our understanding of what spirometry can (and cannot) tell us has improved dramatically.

Equipment too, has advanced since 1975, most particularly due to the first equipment standards that were published in that decade. As a reminder, spirometer accuracy was not a given and there are a number of studies dating from that time period that detailed just how woefully inaccurate many of them were.

In 1975 computerized spirometers were exceptionally rare and I was reminded of this because 141 pages (two-thirds!) of the ITS manual are filled with look-up tables for predicted values and ATPS – BTPS – STPD conversion factors.

Most spirometry systems were entirely manual and the majority of us measured FVC and FEV1 manually from pen tracings on kymograph paper. The results were then hand-calculated and then hand-written onto report forms. Since our equipment is so much more accurate and our computers acquire and calculate test results automatically, everything is so much better now, isn’t it?

Overall, I’d have to say yes. Testing is much quicker and more accurate than it used to be in 1975, and no, I’m not particularly nostalgic about those days.

{Arrrhh, gather round lads and lasses and let me tell you of the days when coal-fired steam-powered spirometers rumbled and hissed in basement labs everywhere; when you had to solve regression equations with your slide rule on the fly or risk the horror of ripped kymograph paper, exploding alveolar sample bags and spirometer bells gone ballistic without warning. The toll this daily physical and mental trauma took amongst the lowly pulmonary techs was terrifying and only the bravest continued the daily battle against gnarly patients, sneering doctors, black-hearted administrators and monopolistic manufacturers…

…Oops! Wrong time-line; those are memories from the universe one north and two left of ours. Too much steampunk sci-fi late at night and too little sleep left me momentarily confused}

I ran across an error today that reminded me that although computerized test systems are essential to our ability to run efficient and accurate labs, at the same time the limitations of the software that comes along with them hinder our ability to detect and correct errors.

A spirometry quality grading system. Or is it?

A set of guidelines for grading spirometry quality was included with the recently published ATS recommendations for a standardized pulmonary function report. These guidelines are similar to others published previously so they weren’t a great surprise but as much as I may respect the authors of the standard my first thought was “when was the last time any of these people performed routine spirometry?” The authors acknowledge that the source for these guidelines is epidemiological and if I was conducting a research study that required spirometry these guidelines would be useful towards knowing which results to keep and which to toss but for routine clinical spirometry, they’re pretty useless.

I put these thoughts aside because I had other projects I was working on but I was reminded of them when I recently performed spirometry on an individual who wasn’t able to perform a single effort without a major error. The person in question was an otherwise intelligent and mature individual but found themselves getting more frustrated and angry with each effort because they couldn’t manage to perform the test right. I did my best to explain and demonstrate what they were supposed to do each time but after the third try they refused to do any more. About the only thing that was reportable was the FEV1 from a single effort.

This may be a somewhat extreme case but it’s something that those of us who perform PFTs are faced with every day. There are many individuals that have no problems performing spirometry but sometimes we’re fortunate to get even a single test effort that meets all of the ATS/ERS criteria. The presence or absence of test quality usually isn’t apparent in the final report however, and for this reason I do understand the value in some kind of quality grading system. But that also implies that the grading system serves the purpose for which it is intended.

In order to quantify this I reviewed the spirometry performed by 200 patients in my lab in order to determine how many acceptable and reproducible results there were. To be honest, as bad as I thought the quality problem was, when I looked at the numbers it was worse than I imagined.

The spirometry quality grading system is:

Grade: Criteria:
A ≥3 acceptable tests with repeatability within 0.150 L (for age 2–6, 0.100 L ), or 10% of highest value, whichever is greater
B ≥2 acceptable tests with repeatability within 0.150 L (for age 2–6, 0.100 L ), or 10% of highest value, whichever is greater
C ≥2 acceptable tests with repeatability within 0.200 L (for age 2–6, 0.150 L ), or 10% of highest value, whichever is greater
D ≥2 acceptable tests with repeatability within 0.250 L (for age 2–6, 0.200 L ), or 10% of highest value, whichever is greater
E 1 acceptable test
F No acceptable tests
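For illustration, the table can be turned into a short function. This is only a sketch under several assumptions that the table does not spell out: repeatability is taken as the difference between the two largest values from acceptable efforts, the "10% of highest value" alternative is read as applying only to the age 2–6 limits, and a session with two or more acceptable efforts that differ by more than the D limit is treated like a single acceptable effort (E). All names are hypothetical.

```python
EPS = 1e-9  # tolerance so floating-point rounding does not push a value over a limit

# (limit in L for ages above 6, limit in L for ages 2-6), in order of grade
BANDS = [(0.150, 0.100), (0.200, 0.150), (0.250, 0.200)]


def limit_for(adult_l, child_l, age_years, highest_l):
    """Repeatability limit for one grade band.

    Assumption: the 10%-of-highest-value alternative applies only to ages 2-6.
    """
    if 2 <= age_years <= 6:
        return max(child_l, 0.10 * highest_l)
    return adult_l


def ats_grade(acceptable_l, age_years):
    """Grade a session from the FEV1 (or FVC) values of its acceptable efforts."""
    values = sorted(acceptable_l, reverse=True)
    if len(values) == 0:
        return "F"
    if len(values) == 1:
        return "E"
    highest, second = values[0], values[1]
    spread = highest - second  # assumed definition of repeatability
    grades = ["A" if len(values) >= 3 else "B", "C", "D"]
    for (adult_l, child_l), grade in zip(BANDS, grades):
        if spread <= limit_for(adult_l, child_l, age_years, highest) + EPS:
            return grade
    return "E"  # assumption: not repeatable within any band


print(ats_grade([3.10, 3.05, 2.90], age_years=45))
print(ats_grade([3.10, 2.92], age_years=45))
```

Note that nothing in the function looks at expiratory time, back-extrapolation or any other measure of how good the individual efforts were beyond the acceptable/unacceptable flag, which is why the grade says more about reproducibility than about clinical usefulness.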

Thinking about the past

This is the time of the year when it’s traditional to review the past. That’s what “Auld lang syne”, the song most associated with New Year’s celebrations, is all about. I too have been thinking about the past but it’s not been about absent friends, it’s been about trend reports and assessing trends.

In the May 2017 issue of Chest, Quanjer et al reported their study on the post-bronchodilator response in FEV1. I’ve discussed this previously and they noted that the current ATS/ERS standard for a significant post-bronchodilator change of ≥12% and ≥200 ml penalized the short and the elderly. Their finding was that a significant change was better assessed by the absolute change in percent predicted (i.e. 8%) rather than a relative change.
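A small sketch may help show how the two approaches can disagree. The function name and both patients are hypothetical, and treating the 8% cutoff as inclusive is an assumption.

```python
def bd_response(pre_l, post_l, predicted_l):
    """Compare two ways of calling a post-bronchodilator FEV1 change significant."""
    change_l = post_l - pre_l
    pct_of_baseline = 100 * change_l / pre_l
    change_in_pct_predicted = 100 * change_l / predicted_l
    return {
        # ATS/ERS: at least 12% of baseline AND at least 200 ml
        "ats_ers": pct_of_baseline >= 12 and change_l >= 0.200,
        # Change in percent predicted (assumed inclusive cutoff of 8)
        "pct_predicted": change_in_pct_predicted >= 8,
    }


# Hypothetical small, elderly patient: predicted FEV1 1.60 L, 1.00 L -> 1.15 L.
# A 15% rise from baseline, but only 150 ml.
print(bd_response(1.00, 1.15, 1.60))

# Hypothetical tall patient with severe obstruction: predicted 4.00 L, 1.00 L -> 1.25 L.
# 250 ml and 25% of baseline, but only about 6% of predicted.
print(bd_response(1.00, 1.25, 4.00))
```

The first patient fails the ATS/ERS criteria only because of the 200 ml floor, while the second passes them with a change that is small relative to their predicted FEV1.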

I’ve thought about how this could apply to assessing changes in trends ever since then. The current standard for a significant change in FEV1 over time (also discussed previously) is anything greater than:

which is good in that it is a way to reference changes over any arbitrary time period but it also looks at it as a relative change (i.e. ±15%). A 15% change however, comes from occupational spirometry, not clinical spirometry, and the presumption, to me at least, is that it’s geared towards individuals who have more-or-less normal spirometry to begin with.

A ±15% change may make sense if your FEV1 is already near 100% of predicted but there are some problems with this for individuals who aren’t. For example, a 75 year-old 175 cm Caucasian male would have a predicted FEV1 of 2.93 L from the NHANESIII reference equations. If this individual had severe COPD and an FEV1 of 0.50 L (17% of predicted), then a ±15% relative change in FEV1 would be ±0.075 L (75 ml). That amount of change is half the acceptable amount of intrasession repeatability (150 ml) in spirometry testing and it’s hard to consider a change this small as anything but chance or noise. It’s also hard to consider this a clinically significant change.
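The arithmetic in this example is easy to check, and the same few lines show how the size of a 15% relative change scales with the starting FEV1. The variable and function names are hypothetical; the numbers are the ones quoted above.

```python
def relative_change_limit_l(fev1_l, fraction=0.15):
    """Size in litres of a relative change (default 15%) for a given FEV1."""
    return fraction * fev1_l


predicted_l = 2.93      # NHANESIII predicted FEV1 quoted above
severe_fev1_l = 0.50    # FEV1 of the patient with severe COPD
repeatability_l = 0.150 # intrasession repeatability

pct_predicted = 100 * severe_fev1_l / predicted_l       # about 17% of predicted
limit_severe = relative_change_limit_l(severe_fev1_l)   # 15% of 0.50 L
limit_normal = relative_change_limit_l(predicted_l)     # 15% of 2.93 L

print(f"FEV1 is {pct_predicted:.0f}% of predicted")
print(f"15% change at 0.50 L: {limit_severe:.3f} L "
      f"({limit_severe / repeatability_l:.1f} x repeatability)")
print(f"15% change at 2.93 L: {limit_normal:.3f} L "
      f"({limit_normal / repeatability_l:.1f} x repeatability)")
```

For someone at 100% of predicted the same 15% works out to roughly 0.44 L, nearly three times the repeatability limit, which is why a relative threshold behaves so differently at the two ends of the range.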

Why the FEV1/FVC ratio LLN as a percent of the predicted FEV1/FVC ratio is important

My medical director and I had a discussion today about where the cutoff for a normal FEV1/FVC ratio would be for a 93 year old patient of his. Part of the problem is that there are almost no reference equations for patients this age and the best you can usually do is to extrapolate. Another part is that anybody in their 90’s is a survivor and must have had good lung function throughout their life to reach that age, which means that they aren’t average so it’s not clear how well extrapolation actually works in this population. The final part is that the guidelines for PFT interpretation that are used by my lab were put into place about 40 years ago and reflect the thoughts at that time. I updated part of the guidelines with the 2005 ATS/ERS interpretation algorithm about 10 years ago, but the thresholds for normalcy (as well as the reference equations we use) still haven’t changed all that much. I’ve brought this issue up a number of times over the years (usually every time I get a new medical director) but haven’t gotten a consensus from the pulmonary physicians on either the need for change or for what threshold values should be used.

Anyway, both my medical director and I felt that the LLN for the FEV1/FVC ratio (when viewed as a percent of the predicted FEV1/FVC ratio) is probably lower for a 75 year old (and certainly for a 93 year old) than it is for a 25 year old, and that the current lab guidelines for interpretation were probably diagnosing airway obstruction in the elderly more often than they should. My lab currently uses the NHANESIII reference equations for spirometry however, and I wasn’t sure they showed this particularly well since the equations for the FEV1/FVC ratio and its LLN are quite simplistic compared to those for FVC and FEV1.

The NHANESIII reference equations were published in 1999 and at that time they were derived from the largest population that had ever been studied (7428 subjects, 40.9% male, 59.1% female) and with the most sophisticated statistical analysis that had been used up until that time. In 2012 however, the Global Lung Function Initiative (GLI) released a set of reference equations using data obtained from 73 centers world-wide on 97,759 subjects (44.7% male, 55.3% female). Statistical analysis of the GLI data was performed using the Lambda, Mu, Sigma (LMS) approach and a set of equations was derived that covered ages 3 to 95.

I have some reservations about how well the GLI equations match the population served by my lab but it’s a moot point whether I like them or not since even now, 5 years after the GLI equations were published, my lab’s software has not been updated to include them. The reason for this is that the GLI spirometry equations use what are called “splines” to generate the spirometry reference values and these are taken from a look-up table. My lab’s software does have an equation editor but it will not accommodate lookup tables so the GLI equations can’t be added. I’m sure our equipment manufacturer could get around this if they really wanted to, but so far it hasn’t happened.

I do have a lot of respect for the GLI equations however, and think that the overall view they give of the normal distribution of FVC, FEV1 and the FEV1/FVC ratio is far more correct than those of any prior studies. Using a spreadsheet tool downloaded from the GLI that lets me generate the GLI spirometry predicted values and the NHANESIII reference equations I decided to take a closer look at their predicted FEV1/FVC ratios and their LLNs.


A couple weeks ago I was asked whether it was safe for a patient with an abdominal aortic aneurysm (AAA) to have pulmonary function testing. My first thought was that it was probably unsafe but after a moment or two of thought I realized that I hadn’t reviewed the subject for a long time. When I checked the 2005 ATS/ERS general testing guidelines (there are no contraindications in the 2005 spirometry guidelines) I found that AAA wasn’t mentioned at all. In fact, the only absolute contraindication mentioned was that patients with a recent myocardial infarction (<1 month) should not be tested. Some relative contraindications were mentioned:

  • chest or abdominal pain
  • oral or facial pain
  • stress incontinence
  • dementia or confusional state

and activities that should be avoided prior to testing include:

  • smoking within 1 hour of testing
  • consuming alcohol within 4 hours of testing
  • performing vigorous exercise within 30 minutes of testing
  • wearing clothing that restricts the chest or abdomen
  • eating a large meal within 2 hours of testing

but these were factors where test results were likely to be suboptimal and not actually contraindications.

This got me curious since I thought that pulmonary function testing was contraindicated for more conditions than just an MI. I reviewed the 1994 and then the 1987 ATS statements on spirometry but again found no mention of contraindications. Ditto on the 1993 ERS statement on spirometry and lung volumes. Finally, in the 1996 AARC clinical guidelines for spirometry I found a much longer list of contraindications:

  • hemoptysis of unknown origin
  • pneumothorax
  • recent myocardial infarction
  • recent pulmonary embolus
  • thoracic, abdominal or cerebral aneurysms
  • recent eye surgery
  • presence of an acute disease process that might interfere with test performance (e.g. nausea, vomiting)
  • recent surgery of thorax or abdomen

So where did the AARC’s list of contraindications come from? And why is there such a discrepancy between the ATS/ERS and the AARC guidelines?

Assessing post-BD improvement in FEV1 and FVC as a percent of the predicted

The 2005 ATS/ERS standards for assessing post-bronchodilator changes in FVC and FEV1 have been criticized numerous times. A recent article in the May issue of Chest (Quanjer et al) has taken it to task on two specific points:

  • the change in FVC and FEV1 has to be at least 200 ml
  • the change is assessed based on the percent change (≥12%) from the baseline value

The article points out that the 200 ml minimum change requires a proportionally larger change for a positive bronchodilator response in the short and the elderly. Additionally, by basing the post-BD change on the baseline value it lowers the threshold (in terms of an absolute change) for a positive bronchodilator response as airway obstruction becomes more severe. As a way of mitigating these problems the article recommends looking at the post-bronchodilator change as a percent of predicted rather than as a percent of baseline.
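Both effects fall straight out of the arithmetic. The sketch below works out the smallest change that satisfies both ATS/ERS conditions for a range of baselines, and what the 200 ml floor amounts to as a percent of predicted. The function names and the baseline and predicted values are hypothetical.

```python
def min_change_l(baseline_l):
    """Smallest change meeting both conditions: at least 12% of baseline and at least 0.200 L."""
    return max(0.12 * baseline_l, 0.200)


def floor_as_pct_predicted(predicted_l):
    """The 200 ml floor expressed as a percent of the predicted value."""
    return 100 * 0.200 / predicted_l


# As baseline falls (more severe obstruction) the required absolute change
# shrinks until it reaches the 200 ml floor.
for baseline in (3.0, 2.0, 1.0, 0.5):
    needed = min_change_l(baseline)
    print(f"baseline {baseline:.2f} L: needs {needed:.3f} L "
          f"({100 * needed / baseline:.0f}% of baseline)")

# The same 200 ml is a much larger share of a small predicted value
# (short or elderly) than of a large one.
for predicted in (1.5, 3.0, 4.5):
    print(f"predicted {predicted:.1f} L: 200 ml is "
          f"{floor_as_pct_predicted(predicted):.1f}% of predicted")
```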

The article is notable (and its authors are to be commended) because it studied 31,528 pre- and post-bronchodilator spirometry records from both clinical and epidemiological sources from around the world. For the post-bronchodilator FEV1 and FVC:

  • the actual change in L
  • the percent change from baseline
  • the change in percentage of predicted
  • the Z-score

were determined.
