This is the time of the year when it’s traditional to review the past. That’s what “Auld lang syne”, the song most associated with New Year’s celebrations, is all about. I too have been thinking about the past but it’s not been about absent friends, it’s been about trend reports and assessing trends.
In the May 2017 issue of Chest, Quanjer et al reported their study on the post-bronchodilator response in FEV1. I’ve discussed this previously and they noted that the current ATS/ERS standard for a significant post-bronchodilator change of ≥12% and ≥200 ml penalized the short and the elderly. Their finding was that a significant change was better assessed by the absolute change in percent predicted (i.e. 8%) rather than a relative change.
I’ve thought about how this could apply to assessing changes in trends ever since then. The current standard for a significant change in FEV1 over time (also discussed previously) is anything greater than:
which is good in that it is a way to reference changes over any arbitrary time period but it also looks at it as a relative change (i.e. ±15%). A 15% change however, comes from occupational spirometry, not clinical spirometry, and the presumption, to me at least, is that it’s geared towards individuals who have more-or-less normal spirometry to begin with.
A ±15% change may make sense if your FEV1 is already near 100% of predicted but there are some problems with this for individuals who aren’t. For example, a 75 year-old 175 cm Caucasian male would have a predicted FEV1 of 2.93 L from the NHANESIII reference equations. If this individual had severe COPD and an FEV1 of 0.50 L (17% of predicted), then a ±15% relative change in FEV1 would be ±0.075 L (75 ml). That amount of change is half the acceptable amount of intrasession repeatability (150 ml) in spirometry testing and it’s hard to consider a change this small as anything but chance or noise. It’s also hard to consider this a clinically significant change.
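The arithmetic in this example is easy to check with a few lines of Python. This is only a sketch: the function name is my own invention, and comparing the result against the 150 ml repeatability figure is just a way of framing the numbers quoted above, not part of any standard.

```python
def relative_change_threshold(fev1_l, relative_threshold=0.15):
    """Absolute change (in L) implied by a relative threshold on the current FEV1."""
    return fev1_l * relative_threshold

predicted_fev1_l = 2.93    # L, predicted value quoted in the example
observed_fev1_l = 0.50     # L, the severe COPD example
repeatability_l = 0.150    # L, intrasession repeatability quoted in the example

percent_predicted = 100 * observed_fev1_l / predicted_fev1_l
threshold_l = relative_change_threshold(observed_fev1_l)

print(f"FEV1 is {percent_predicted:.0f}% of predicted")              # 17%
print(f"A 15% relative change is {threshold_l * 1000:.0f} ml")       # 75 ml
print(f"That is {threshold_l / repeatability_l:.2f} of repeatability")
```

The same function applied to an FEV1 near the predicted value (2.93 L) gives a threshold of about 440 ml, which shows how much the absolute size of a "15% change" depends on where the patient starts.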
The 2005 ATS/ERS standards for assessing post-bronchodilator changes in FVC and FEV1 have been criticized numerous times. A recent article in the May issue of Chest (Quanjer et al) has taken it to task on two specific points:
- the change in FVC and FEV1 has to be at least 200 ml
- the change is assessed based on the percent change (≥12%) from the baseline value
The article points out that the 200 ml minimum change requires a proportionally larger change for a positive bronchodilator response in the short and the elderly. Additionally, by basing the post-BD change on the baseline value it lowers the threshold (in terms of an absolute change) for a positive bronchodilator response as airway obstruction becomes more severe. As a way of mitigating these problems the article recommends looking at the post-bronchodilator change as a percent of predicted rather than as a percent of baseline.
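To make the difference between the two approaches concrete, here is a minimal Python sketch. The function names and the patient values are hypothetical, chosen only to illustrate the point about short and elderly patients. The 8% of predicted figure is the one the article reports.

```python
def ats_ers_2005_positive(pre_l, post_l):
    """ATS/ERS 2005 rule: change of >=12% of baseline AND >=200 ml."""
    change_l = post_l - pre_l
    return change_l >= 0.200 and change_l / pre_l >= 0.12

def change_percent_predicted(pre_l, post_l, predicted_l):
    """Post-BD change expressed as a percentage of the predicted value."""
    return 100 * (post_l - pre_l) / predicted_l

# Hypothetical small, elderly patient: predicted FEV1 of 2.00 L
pre, post, predicted = 1.20, 1.38, 2.00

print(ats_ers_2005_positive(pre, post))                       # False (only 180 ml)
print(change_percent_predicted(pre, post, predicted) >= 8.0)  # True (about 9%)
```

In this made-up case the change is 15% of baseline and about 9% of predicted, but it fails the ATS/ERS rule solely because it falls short of 200 ml.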
The article is notable (and its authors are to be commended) because it studied 31,528 pre- and post-bronchodilator spirometry records from both clinical and epidemiological sources from around the world. For the post-bronchodilator FEV1 and FVC:
- the actual change in L
- the percent change from baseline
- the change in percentage of predicted
- the Z-score
I had finished reviewing a pre- and post-BD spirometry report yesterday and was about to toss it on my out pile when I noticed something a bit odd about the post-BD results. I pulled it back and spent some time trying to decide if the interpretation needed to be changed but after a lot of internal debate I finally let it go as it was. I’ve continued to think about it however, and although I’m not sure that was the right decision I still haven’t come up with a clear answer.
Here’s what I saw:
The reported pre-BD and post-BD results were from good quality tests and met the criteria for repeatability. My problem is that the baseline results were normal but if I had seen the post-BD results by themselves I would have considered them to show mild airway obstruction.
I’ve been reviewing the literature on PFT interpretation lately and in doing so I ran across one of the issues that’s bothered me for a while. Specifically, my lab has been tasked with following the 2005 ATS/ERS guidelines for interpretation and, using this algorithm, these results:
would be read as mild airway obstruction.
Although it seems odd to have to call a normal FEV1 obstruction I’ve been mostly okay with this since my lab has a number of patients with asthma whose best FVC and FEV1 obtained at some point in the past were 120% of predicted or greater but whose FEV1 frequently declines to 90% or 100% of predicted. In these cases since prior studies showed a normal FEV1/FVC ratio then an interpretation of a mild OVD is probably correct even though the FEV1 itself is well above the LLN, and this is actually the situation for this example.
Cigarette smoking raises the probability that an individual will get lung cancer, chronic bronchitis and/or emphysema (among many other things). Nicotine is addictive and smokers often need significant motivation in order to quit. Lung age is a tool that was designed to give smokers an additional incentive to do this. The concept is fairly simple: by reformulating an FEV1 reference equation it is possible to take an individual’s actual FEV1 and estimate the age of their lungs (ELA). Because cigarette smoking can cause airway obstruction it tends to mimic premature lung aging which means that when a smoker’s FEV1 is used to calculate an ELA it can be significantly greater than their real or chronological age (CLA).
This idea was first proposed by Morris and Temple in 1985. Using Morris et al’s 1971 spirometry reference equations they studied the effect of calculating an estimated lung age (ELA) using observed FVC, FEV1 and FEF25-75 values both singly and in combinations and found that the FEV1 had the lowest standard error. The ELA calculation based on Morris et al’s FEV1 reference equations has achieved a degree of popularity and is available on at least one personal spirometer (Pulmolife, sold by Carefusion, MDSpiro and Vitalograph) and as an on-line calculator from a couple different websites (Chestx-ray.com and Lung Foundation of Australia).
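The reformulation itself is simple algebra. If a reference equation has the linear form FEV1 = a × height − b × age + c, then solving for age gives the estimated lung age. The Python sketch below uses made-up coefficients purely to show the mechanics. They are not the Morris coefficients and should not be used for anything real.

```python
# Illustrative coefficients only (NOT from any published reference equation)
A_HEIGHT = 0.04   # L per cm of height
B_AGE = 0.03      # L lost per year of age
C_CONST = -2.5    # L, constant term

def predicted_fev1(age_yr, height_cm):
    """Hypothetical linear reference equation for FEV1 in L."""
    return A_HEIGHT * height_cm - B_AGE * age_yr + C_CONST

def estimated_lung_age(observed_fev1_l, height_cm):
    """Solve the reference equation for age, given an observed FEV1."""
    return (A_HEIGHT * height_cm + C_CONST - observed_fev1_l) / B_AGE

# A 50 year-old, 175 cm subject whose observed FEV1 is below predicted
print(round(predicted_fev1(50, 175), 2))        # predicted FEV1 in L
print(round(estimated_lung_age(2.4, 175), 1))   # ELA in years, older than 50
```

With these placeholder numbers an observed FEV1 of 2.4 L against a predicted 3.0 L works out to a lung age of about 70, which is the kind of gap between ELA and chronological age that the approach relies on.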
Interestingly, the effectiveness of ELA towards quitting smoking has been studied only a handful of times. One often-quoted study of smoking cessation (Parkes et al) saw double the quit rate (13.6% vs 6.4%) when ELA was used as an intervention but the study’s methodology has since been criticized and its results have not been duplicated.
Recently my lab has had some turnover with a couple of older staff leaving and new staff coming on board. While reviewing reports I’ve found a number of instances where the incorrect FVC and FEV1 were reported. Taking these as “teachable moments” I’ve been annoying the staff with emails whenever I find something notably wrong. I had thought that our rules for selecting the best FVC and FEV1 were fairly straightforward but given the number of corrections I’ve made lately it seemed like it would be a good idea to revisit our policy on this subject.
The process I’ve used for selecting the best FVC and FEV1 has evolved over the years. Initially I was told to select the single spirometry effort that had the largest combined FVC and FEV1. Later on test quality became a factor (not that it wasn’t in the beginning but there aren’t a lot of quality indicators for a pen trace on kymograph paper). How to juggle the different quality rules wasn’t altogether clear however (they seemed to change a bit with whichever physician was reviewing PFTs at the time), and I was still supposed to somehow select just a single spirometry effort.
Most recently this was simplified by only having to select the largest FVC (regardless of test quality) from any spirometry effort and then the largest FEV1 as long as it came from a spirometry effort with good quality. This is pretty much in accord with the ATS/ERS spirometry standards but with one important difference, and that is that we use Peak Expiratory Flow (PEF) as an indicator of test quality.
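A rough Python sketch of this selection rule might look like the following. The `Effort` structure, the `good_quality` flag and especially the cutoff of 90% of the best PEF are assumptions made for illustration. The rule as described here does not specify exactly how PEF is applied.

```python
from dataclasses import dataclass

@dataclass
class Effort:
    fvc: float          # L
    fev1: float         # L
    pef: float          # L/sec
    good_quality: bool  # assumed flag for the usual quality checks

def select_best(efforts, pef_fraction=0.90):
    """Largest FVC from any effort; largest FEV1 from good, near-maximal efforts.

    pef_fraction is a placeholder cutoff, not a published criterion.
    """
    best_fvc = max(e.fvc for e in efforts)
    best_pef = max(e.pef for e in efforts)
    candidates = [e for e in efforts
                  if e.good_quality and e.pef >= pef_fraction * best_pef]
    best_fev1 = max(e.fev1 for e in candidates) if candidates else None
    return best_fvc, best_fev1

efforts = [
    Effort(fvc=4.10, fev1=3.20, pef=8.5, good_quality=True),
    Effort(fvc=4.25, fev1=3.35, pef=6.0, good_quality=True),  # submaximal blast
    Effort(fvc=4.00, fev1=3.10, pef=8.2, good_quality=True),
]
print(select_best(efforts))  # (4.25, 3.2)
```

In this invented session the largest FVC still comes from the second effort, but its FEV1 is passed over because the PEF suggests the effort was not maximal.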
Strictly speaking the ATS/ERS standards state that
“The largest FVC and the largest FEV1 (BTPS) should be recorded after examining the data from all of the usable curves, even if they do not come from the same curve.”
There are, of course, a number of quality indicators for spirometry efforts that are used to indicate whether a curve is “usable”. These include things like back-extrapolation, expiratory time, terminal expiratory flow rate and repeatability but the one thing they do not include is PEF.
Despite not being within the ATS/ERS standards the reason that we use PEF in the selection process is found in the phrase “maximal forced effort” that is part of the ATS/ERS definition for FVC and FEV1. It has long been recognized (certainly since the early 1980’s and most likely earlier) that the FVC and FEV1 from a submaximal spirometry effort were often higher than the FVC and FEV1 from a maximal effort. So, is the largest FEV1 correct (as long as it meets the basic ATS/ERS criteria) or should it be the FEV1 from the effort with the highest PEF?
These two efforts from the same patient testing session highlight this dilemma. Both meet the ATS/ERS criteria for the start of the test which is what primarily applies to FEV1 (and PEF).
Everyone uses the FEV1/FVC ratio as the primary factor in determining the presence or absence of airway obstruction but there are differences of opinion about what value of FEV1/FVC should be used for this purpose. Currently there are two main schools of thought: those that advocate the use of the GOLD fixed 70% ratio and those that instead advocate the use of the lower limit of normal (LLN) for the FEV1/FVC ratio.
The Global Initiative for Chronic Obstructive Lung Disease (GOLD) has stated that a post-bronchodilator FEV1/FVC ratio less than 70% should be used to indicate the presence of airway obstruction and this is applied to individuals of all ages, genders, heights and ethnicities. The official GOLD protocol was first released in the early 2000’s and was initially (although not currently) seconded by both the ATS and ERS. The choice of 70% is partly happenstance since it was one of two fixed FEV1/FVC ratio thresholds in common use at the time (the other was 75%) and partly arbitrary (after all why not 69% or 71% or ??).
The limitations of using a fixed 70% ratio were recognized relatively early. In particular it has long been noted that the FEV1/FVC ratio declines normally with increasing age and is also inversely proportional to height. For these reasons the 70% threshold tends to over-diagnose COPD in the tall and elderly and under-diagnose airway obstruction in the short and young. Opponents of the GOLD protocol say that the age-adjusted (and sometimes height-adjusted) LLN for the FEV1/FVC ratio overcomes these obstacles.
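The disagreement between the two approaches can be sketched in Python. The LLN is computed here using the common convention of the predicted value minus 1.645 times the standard error of the estimate (roughly the 5th percentile). The predicted ratios and the SEE of 6 are hypothetical values chosen for illustration and do not come from any particular reference equation.

```python
def obstructed_by_gold(ratio_pct, threshold_pct=70.0):
    """Fixed-ratio rule: FEV1/FVC below 70%."""
    return ratio_pct < threshold_pct

def lower_limit_of_normal(predicted_ratio_pct, see_pct):
    """LLN as predicted minus 1.645 x SEE (approximately the 5th percentile)."""
    return predicted_ratio_pct - 1.645 * see_pct

def obstructed_by_lln(ratio_pct, predicted_ratio_pct, see_pct):
    return ratio_pct < lower_limit_of_normal(predicted_ratio_pct, see_pct)

# Hypothetical tall, elderly subject: predicted ratio 72%, observed 68%
print(obstructed_by_gold(68), obstructed_by_lln(68, 72, 6))   # True False

# Hypothetical short, young subject: predicted ratio 86%, observed 73%
print(obstructed_by_gold(73), obstructed_by_lln(73, 86, 6))   # False True
```

The first case is flagged only by the fixed ratio and the second only by the LLN, which is the over- and under-diagnosis pattern described above.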
Proponents of the GOLD protocol acknowledge the limitation of the 70% ratio when it is applied to individuals of different ages but state that the use of a simple ratio that is easy to remember means that more individuals are assessed for COPD than would be otherwise. They point to other physiological threshold values (such as for blood pressure or blood sugar levels) that are also understood to have limitations, yet remain in widespread use. They also state that it makes it easier to compare results and prevalence statistics from different studies. In addition at least two studies have shown that there is a higher mortality of all individuals with an FEV1/FVC ratio below 70% regardless of whether or not they were below the FEV1/FVC LLN. Another study noted that in a large study population individuals with an FEV1/FVC ratio below 70% but above the LLN had a greater degree of emphysema and more gas trapping (as measured by CT scan), and more follow-up exacerbations than those below the LLN but above the 70% threshold.
Since many of the LLN versus GOLD arguments are based on statistics it would be useful to look at the predicted FEV1/FVC ratios in order to get a sense of how much under- and over-estimation occurs with the 70% ratio. For this reason I graphed the predicted FEV1/FVC ratio from 54 different reference equations for both genders and a variety of ethnicities. Since a number of PFT textbooks have stated that the FEV1/FVC ratio is relatively well preserved across different populations what I initially expected to see was a clustering of the predicted values. What I saw instead was an exceptionally broad spread of values.
The patients whose reports I review have always been very accommodating. An issue of one kind or another catches my attention and before I know it I find several more reports that are similarly involved. Thanks to our patients I’ve had a number of reports come across my desk recently that showed a combination of restrictive and obstructive defects. This particular one may not be the best possible example but it seems to illustrate several points fairly well.
Interpreting results like this as combined (or mixed) defects using the ATS/ERS algorithm seems relatively straightforward.
From Brusasco V, Crapo R, Viegi G. ATS/ERS Task Force: Standardisation of pulmonary function testing. Interpretive strategies for lung function tests. Eur Respir J 2005; 26, page 956
The algorithm starts by using the FEV1/FVC ratio to determine whether obstruction is present and only then considers whether or not the FVC and TLC are normal. It occurred to me however, that this assumes that the normal range of the FEV1/FVC ratio is preserved when TLC decreases below normal. Given the markedly different causes of restrictive lung disease, expecting the FEV1/FVC ratio to remain within the normal range over a relatively broad range of lung capacities (and without necessarily knowing the cause for any reduction) seems a bit far-fetched. Interestingly enough however, it actually turns out to be reasonably true.
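The order of the decisions can be sketched in Python as below. This is my simplified reading of the flow just described (ratio first, then FVC, then TLC). It leaves out the DLCO branches entirely, and the function name and example values are made up.

```python
def interpret(ratio, ratio_lln, fvc, fvc_lln, tlc, tlc_lln):
    """Simplified sketch of the ratio -> FVC -> TLC decision order."""
    obstructed = ratio < ratio_lln
    # TLC is only consulted when the FVC is reduced
    restricted = fvc < fvc_lln and tlc < tlc_lln
    if obstructed:
        return "mixed defect" if restricted else "obstruction"
    return "restriction" if restricted else "normal"

# Hypothetical values: ratio in %, volumes in L
print(interpret(62, 68, 2.1, 3.0, 3.9, 4.8))  # mixed defect
print(interpret(62, 68, 2.1, 3.0, 5.5, 4.8))  # obstruction (low FVC, normal TLC)
print(interpret(80, 68, 2.1, 3.0, 3.9, 4.8))  # restriction
```

Note that the ratio is compared against the same LLN in every branch, regardless of the TLC, which is exactly the assumption being questioned here.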
The current ATS/ERS standards for a positive bronchodilator response are an increase in FEV1 or FVC of ≥ 12% and ≥ 200 ml. These standards are largely based on the ability to detect a change that is far enough above the normal variability in FEV1 and FVC to be considered significant. One problem with this is that the amount of variability that is considered to be “normal” is overly influenced by a relatively small number of subjects that have a high degree of variability.
At least one group of investigators has suggested that a way around this is to subject all of an individual’s pre- and post-bronchodilator spirometry to statistical analysis in order to determine their coefficient of variability. Once this is known, the pre- and post-bronchodilator efforts can be assessed as a group to determine whether there has been a statistically significant change. Using this approach they were able to show that a rather large number of subjects that did not meet the ATS/ERS criteria did have a statistically significant improvement in FEV1.
But an increase that is statistically significant or one that is greater than normal variability is not the same thing as clinical significance. Numerous investigators have noted that patients can have a post-bronchodilator clinical improvement as shown by a decrease in dyspnea or an increase in exercise capacity without any notable change in FEV1 or FVC. Clinical significance is hard to measure however, particularly since it is unclear which criteria should be used to measure it.
Long-term survival is certainly clinically significant and a recent article in Chest (Ward et al) has linked the increase in post-bronchodilator FEV1 to this fact. What these investigators have been able to show was that individuals with a post-bronchodilator increase in FEV1 that was 8% of predicted or greater showed a significantly better long-term survival than individuals with a smaller increase.
There are a couple of different ways to assess changes in FEV1 from one patient visit to another. For several decades my lab has used a change of >=10% and >=200 ml as the threshold for a significant change. Recently the ATS released standards for occupational spirometry that included an age-adjusted change in FEV1 of >=15% as the threshold for significant change. For the time being we have continued to use the 10% threshold when comparing results that are relatively close in time and are using the 15% threshold when they are separated by a much longer period. Since we haven’t actually gotten around to defining what is recent and what isn’t there is still a bit of uncertainty in how we apply this but even though there are differences in thresholds and how the numbers are calculated both approaches are essentially numerical. Recently a couple of reports crossed my desk that have caused me to wonder whether a qualitative change should also be a consideration.
In the 14 years between these two tests the FEV1 has decreased by 0.56 L or 12.6%. By the 10% threshold criterion this is a significant change but I think that 14 years is a reasonably long period of time and the age-adjusted change is only 5.1% which indicates this change is not significant.
In the year between these two tests the FEV1 has decreased by 0.22 L or 7.0%, which doesn’t meet either criterion for a significant change.
But what has changed between these tests is that in both instances the spirometry went from normal to showing mild obstruction. This is a qualitative change and I think it is likely significant.
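For reference, the purely numerical side of this comparison can be sketched in Python. The FEV1 values below are back-calculated from the changes quoted above (0.56 L at 12.6% and 0.22 L at 7.0%) and are assumptions, not the actual reported results. The age-adjusted calculation is not included because its formula isn't given here.

```python
def percent_change(previous_l, current_l):
    """Change relative to the earlier value, in percent."""
    return 100 * (current_l - previous_l) / previous_l

def significant_by_lab_rule(previous_l, current_l, pct=10.0, ml=200.0):
    """My lab's rule: a change of >=10% AND >=200 ml."""
    change_ml = abs(current_l - previous_l) * 1000
    return abs(percent_change(previous_l, current_l)) >= pct and change_ml >= ml

# Back-calculated (assumed) values for the two pairs of tests
first = (4.44, 3.88)    # 14 years apart
second = (3.14, 2.92)   # 1 year apart

print(round(percent_change(*first), 1), significant_by_lab_rule(*first))    # -12.6 True
print(round(percent_change(*second), 1), significant_by_lab_rule(*second))  # -7.0 False
```

Neither function has any way to register that the interpretation moved from normal to mild obstruction, which is the gap that a qualitative criterion would have to fill.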