A spirometry quality grading system. Or is it?

A set of guidelines for grading spirometry quality was included with the recently published ATS recommendations for a standardized pulmonary function report. These guidelines are similar to others published previously, so they weren’t a great surprise, but as much as I may respect the authors of the standard, my first thought was “when was the last time any of these people performed routine spirometry?” The authors acknowledge that the source for these guidelines is epidemiological, and if I were conducting a research study that required spirometry they would be useful for knowing which results to keep and which to toss, but for routine clinical spirometry they’re pretty useless.

I put these thoughts aside because I had other projects I was working on, but I was reminded of them when I recently performed spirometry on an individual who wasn’t able to perform a single effort without a major error. The person in question was an otherwise intelligent and mature individual but became more frustrated and angry with each effort because they couldn’t manage to perform the test correctly. I did my best to explain and demonstrate what they were supposed to do each time, but after the third try they refused to do any more. About the only thing that was reportable was the FEV1 from a single effort.

This may be a somewhat extreme case, but it’s something those of us who perform PFTs are faced with every day. Many individuals have no problems performing spirometry, but with others we’re fortunate to get even a single test effort that meets all of the ATS/ERS criteria. The presence or absence of test quality usually isn’t apparent in the final report, however, and for this reason I do understand the value of some kind of quality grading system. But that also implies that the grading system serves the purpose for which it is intended.

In order to quantify this I reviewed the spirometry performed by 200 patients in my lab to determine how many acceptable and reproducible results there were. To be honest, as bad as I thought the quality problem was, the numbers were worse than I imagined.

The spirometry quality grading system is:

Grade: Criteria:
A ≥3 acceptable tests with repeatability within 0.150 L (for age 2–6, 0.100 L), or 10% of highest value, whichever is greater
B ≥2 acceptable tests with repeatability within 0.150 L (for age 2–6, 0.100 L), or 10% of highest value, whichever is greater
C ≥2 acceptable tests with repeatability within 0.200 L (for age 2–6, 0.150 L), or 10% of highest value, whichever is greater
D ≥2 acceptable tests with repeatability within 0.250 L (for age 2–6, 0.200 L), or 10% of highest value, whichever is greater
E 1 acceptable test
F No acceptable tests
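The logic of the table is simple enough to sketch in code. The following is my own minimal illustration, not anything from the standard: it applies the adult limits only, and the age 2–6 limits and the 10%-of-highest-value alternative are left out for brevity. The function name and inputs are hypothetical.

```python
def spirometry_grade(values):
    """Grade a spirometry session from the FVC (or FEV1) values, in
    litres, of the efforts already judged *acceptable* (adult limits
    only; paediatric limits and the 10% alternative are omitted)."""
    n = len(values)
    if n == 0:
        return "F"              # no acceptable tests
    if n == 1:
        return "E"              # a single acceptable test
    s = sorted(values)
    diff = s[-1] - s[-2]        # repeatability of the two best efforts
    if diff <= 0.150:
        return "A" if n >= 3 else "B"
    if diff <= 0.200:
        return "C"
    if diff <= 0.250:
        return "D"
    return "E"  # >=2 acceptable tests but repeatability worse than
                # 0.250 L; the table doesn't list this case explicitly
```

So, for example, three acceptable efforts with FVCs of 4.01, 3.98 and 3.90 L repeat within 0.030 L and would grade ‘A’, while two efforts of 4.01 and 3.78 L would grade ‘D’.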

It’s important to note that this grading system is based primarily on the reproducibility of acceptable tests. Acceptable tests are:

  1. A good start of exhalation, with extrapolated volume <5% of FVC or 0.150 L, whichever is greater
  2. Free from artifacts
  3. No cough during the first second of exhalation (for FEV1)
  4. No glottis closure or abrupt termination (for FVC)
  5. No early termination or cutoff (for FVC)
  6. Maximal effort provided throughout the maneuver
  7. No obstructed mouthpiece
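Only the first of these criteria is fully numeric; the rest require a technician’s judgment. As a rough sketch (my own function name, with volumes assumed to be in litres), the start-of-test check reduces to:

```python
def good_start(extrapolated_volume, fvc):
    """Start-of-test check: the back-extrapolated volume must be
    below 5% of the FVC or 0.150 L, whichever is greater."""
    return extrapolated_volume < max(0.05 * fvc, 0.150)
```

An extrapolated volume of 0.18 L would therefore fail against a 2.0 L FVC (the 0.150 L floor applies) but pass against a 4.0 L FVC, where 5% gives a 0.200 L limit.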

There were 703 spirometry tests from the 200 patients, for an average of 3.5 tests per patient. The lowest number of tests performed was 3, the maximum was 7. Out of 200 patients, 50 patients (25%) were unable to perform a single acceptable test and would have received an ‘F’ quality grade. Another 51 patients (26%) were able to perform one acceptable test and would have received an ‘E’ quality grade. Only 38 patients (19%) were able to perform three (or more) acceptable tests and receive an ‘A’ quality grade. The remaining 61 patients would have gotten a ‘B’, ‘C’ or ‘D’ quality grade.

The distribution of errors was (some efforts had more than one error):

Expiratory time < 6 seconds: 314
End-of-test: 268
FVC > 0.15 L or 10%: 201
FEV1 > 0.15 L or 10%: 126
PEF < 20% max: 117
Back-extrapolation: 45
Pauses that affected FVC or FEV1: 43

It’s apparent from this that the biggest problem most patients have is with the length of their exhalation (the EOT criteria, expiratory time and FIVC > FVC) and that this primarily impacts the FVC rather than the FEV1. The error counts for the factors that affect the FEV1 (back-extrapolation, peak flow, pauses) are a lot smaller. To some extent this doesn’t surprise me, since I’ve always felt that in spirometry testing the FEV1 was more reliable than the FVC.

There is an additional point the quality grading system does not address, and that is composite results: reporting the highest FVC (regardless of which effort it came from) along with the highest FEV1, which is allowed and even encouraged by the ATS/ERS spirometry standards. Composite results were reported for 69 out of the 200 patients (35%). I did not try to analyze these closely, but I can say that 22 out of these 69 (32%) had no acceptable test efforts. Some fraction of these, however, combined an effort with an acceptable FEV1 and an effort with an acceptable FVC, yet the grading system would still have given them an ‘F’.

Note: I didn’t try to correlate the number or type of spirometry errors with the technicians who performed the tests. Partly because I wasn’t interested, partly because which patient you get is usually the luck of the draw, and partly because in the past, when I was the lab manager, I always took the toughest patients and probably would have had one of the highest error rates, so there isn’t necessarily any correlation here.

I can’t prove it, but I think these statistics are reasonably representative of the experience in most PFT labs. Some labs are going to be better, some are going to be worse. I like to think that my lab is better than most, but that’s purely subjective, and regardless of how good (or bad) a lab’s staff are, in the final analysis it comes down to the patient’s ability to perform spirometry, and that really isn’t as good as you might think it ought to be. To (badly) paraphrase Clausewitz, “even though spirometry is simple, when testing humans even the simple is very difficult.”

In the ICU there’s something called alarm fatigue where alarms are going off more or less continuously because a patient moved or because of bad connections or because the alarm limits are set too stringently (or whatever). Medical staff often become deaf to these alarms and stop paying attention to them, sometimes with adverse consequences for their patients.

So, the problem is that over 50% of my lab’s patients would have gotten an ‘E’ or an ‘F’ grade. If you were interpreting reports, how quickly would you get ‘alarm fatigue’ if those were the most common quality grades you saw? For that matter, how long would it take you to get the idea that your PFT lab was mostly staffed with incompetents?

I’m sure the authors of the quality grading system would argue that the results should be used as part of a quality improvement plan, and although I would agree with the sentiment, the reasons for suboptimal test quality (probably partly psychological, partly physiological and partly medical) are not easily quantifiable. In addition, what’s labeled a spirometry quality grading system is really a reproducibility grading system for ‘acceptable’ quality tests. I’m not going to say that this doesn’t serve a useful purpose but it should be labeled for what it is.

A problem that everyone who interprets pulmonary function results faces (with varying degrees of success, since the skill is usually only acquired from experience) is assessing suboptimal quality tests to determine which parts are meaningful and informative, and which parts aren’t. Given that over half our patients would only have gotten an ‘E’ or an ‘F’ grade, what would have been far more useful than a grading system is official guidelines for determining the information content of suboptimal quality tests. A spirometry effort that doesn’t meet acceptability criteria may still have something useful to say about expiratory volume or flow rates. This in turn could be used to say something useful about the probable presence or absence of airway obstruction and restriction, and allow us to at least salvage something out of suboptimal spirometry test quality.


Brusasco V, Crapo R, Viegi G. ATS/ERS task force: Standardisation of lung function testing. Standardisation of spirometry. Eur Respir J 2005; 26(2): 318-339.

Brusasco V, Crapo R, Viegi G. ATS/ERS task force: Standardisation of lung function testing. Interpretive strategies for lung function tests. Eur Respir J 2005; 26(6): 948-968.

Graham BL, Coates AL, Wanger J et al. Recommendations for a standardized pulmonary function report. An official American Thoracic Society technical statement. Am J Respir Crit Care Med 2017; 196(11): 1463-1472.

PFT Blog by Richard Johnston is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

11 thoughts on “A spirometry quality grading system. Or is it?”

  1. Good article.

    I have been in spirometry sales for 36+ years and coaching is the key. People just seem not to care about getting a proper effort out of the patient. Spirometers have become little more than a revenue enhancement device. The doctors don’t seem to care either.

    With regards to the grading system being A-F, SMH, bad choice. Americans see the ‘E’ or ‘F’ and they think something is wrong. Perhaps a 1-5 grading system would be better. In any event, it is not going to change the fact you have people that JUST DON’T CARE. They go to work to collect a paycheck.

    • William –

      I’ve visited a number of PFT labs and wouldn’t paint quite such a gloomy picture but I know little about what’s going on with office spirometry. You do put your finger on an important issue, though. Performing good quality spirometry is harder than it looks but there are no particular incentives towards putting any effort into it. There are also no particular penalties for performing poor quality spirometry and given the problems that patients can have with it anyway this can be hard to detect.

      I’m sure that office spirometry is being abused to some extent. Several of the spirometer companies have sections on their websites that explain which CPT codes should be used for billing, along with how to calculate the payback period for the spirometer (which is usually only a couple of months). There are probably many office practices that look at spirometry as a cash cow and couldn’t care less about test quality. There are other office practices, however, that really are trying to serve their patients but don’t realize what goes into using and maintaining a spirometer properly (something too many spirometer companies downplay), and despite their best intentions they too are performing poor quality spirometry.

      I don’t see any easy solution since nobody is going to push to require certification or licensure in order to perform office spirometry. It’s possible that some umbrella organization like the AMA or the insurance companies might eventually create guidelines for office spirometry that would suggest/require some kind of certification (AARC or NIOSH) but I’m not going to hold my breath. Even then, as you point out, you can’t make people care about their job.

      Regards, Richard

  2. Richard,
    Thanks for talking about this.
    It appears that the major issue in your lab (and I suspect it will be true for any lab using automated software to determine grading) has to do with determining the EOT. It accounted for your top two reasons for an unacceptable result: ~52% of the total errors, or at least 45% of all tests. Computer software can see if the person exhaled for 6 seconds (or 3 for kids) and if they reached a plateau (less than 25 mL/sec). What that software cannot detect is the FIRST ATS/ERS criterion for end of test…
    “End of test criteria
    It is important for subjects to be verbally encouraged to
    continue to exhale the air at the end of the manoeuvre to obtain
    optimal effort, e.g. by saying ‘‘keep going’’. EOT criteria are
    used to identify a reasonable FVC effort, and there are two
    recommended EOT criteria, as follows. 1) The subject cannot or
    should not continue further exhalation. Although subjects
    should be encouraged to achieve their maximal effort, they
    should be allowed to terminate the manoeuvre on their own at
    any time, especially if they are experiencing discomfort. The
    technician should also be alert to any indication that the patient
    is experiencing discomfort, and should terminate the test if a
    patient is becoming uncomfortable or is approaching syncope.
    2) The volume–time curve shows no change in volume
    (<0.025 L) for ≥1 s, and the subject has tried to exhale for
    ≥3 s in children aged <10 yrs and for ≥6 s in subjects aged
    >10 yrs.”

    This means that when we invalidate a test because there was no plateau or the patient did not exhale for 6 sec, we are wrong if it was all the patient could do and they were giving good effort. This is our job. We are the ones determining whether there really was good effort or not. I could have a pt. who achieves all the other criteria for EOT without a good effort, and the computer would give it a good grade. It would be wrong. I can exhale all my volume in about 4.5 seconds. I’ve got decades of results from bio-QC to show it. The computer gives me a failing grade every time even though I am completely empty. Until we as technologists have the ability to correct the EOT pass/fail to allow for this part of the EOT criteria, we’ll always have this problem with any grading system that includes EOT. So, take heart: I believe it is highly unlikely that your lab is only putting out good quality tests 55% of the time, and what you are seeing is this “bias”, or inability of software to judge good patient effort. I think it’s job security for us.

    • Ralph –

      You’re preaching to the choir. I may not have addressed it as such, but the quality grading system is primarily for ‘acceptable’ tests, and acceptability is rigidly defined. It specifically includes the EOT and expiratory time criteria, which is why I included them in the analysis even though I knew it might make my lab look bad. I know perfectly well there are many people whose spirometry does not meet the criteria for an acceptable test and yet is not only the best they are capable of, but clearly and reliably informative at the same time. I am sure that many of the paper’s authors are aware of this, yet for whatever reason they still went with what I presume to be the party line.

      Regards, Richard

  3. Dear Richard,

    It is nice to read someone who goes beyond the “black & white” of the text and uses critical thinking.

    As a vendor, I have actually “never” tested anyone – I have only been an observer to RTs and other people during their trials and tribulations while trying to collect FVCs from their patients. I have seen the “alarms” that go off after every trial that does not meet “this” or “that” criterion.

    As some people look at me with frustration on not getting “A’s” I try and tell them that ATS criteria will not know when their patient is “GREEN” in the face and about to pass out. I remind them to please use their own professional judgement with the results – and to always take care of the patient first!

    We have definitely become a world of grading systems everywhere you go. As you said, we are all trying to collect the best data possible. Each piece (FVC, FEV1, FIVC, PEF, TLC, FRC, etc.) can give value IF there is confidence in the number, i.e. is my TLC larger than my VA from the DLCO test? As you said, the FEV1 is probably the most important and repeatable part of an FVC test, so why not just shift completely to performing 2 repeatable SVCs and then collect at least 3 acceptable FEV1s? Most of the software these days will calculate FEV1/SVC. It looks like most mid-flow measurements have been put to the side in the most recent “Reporting” standards anyway.

    So Richard, please keep reminding people to “think critically” – it should make us all better at what we do.

  4. What about the comparability of Laboratory PFTs with handheld office spirometry like In2itive or the EasyOne?

    • Anthony –

      Both spirometers claim to meet or exceed the 2005 ATS/ERS spirometer requirements. The NDD EasyOne was evaluated relatively rigorously in 2008 (Barr RG et al. Reproducibility and validity of a handheld spirometer. Respir Care 2008; 53(4): 433-441). I’ve been unable to find a product evaluation for the In2itive, but Vitalograph has a good reputation for quality spirometers. It would appear that either would perform spirometry with acceptable quality.

      The EasyOne and In2itive are both handheld models however, and one drawback is that there is little feedback for the technician during the test. Flow-volume loops and volume-time curves can be evaluated afterwards (on small LCD displays). Personally I find being able to watch the flow-volume loop and volume-time curve in real time helps me to be more confident while encouraging the patient and in the results themselves so I would prefer a PC/laptop-based spirometer, but that’s me.

      A bigger and perhaps more important issue is how well the software that comes with either spirometer integrates with any existing test systems you may have and whether you find the database / testing / review / reporting aspects of the software to meet your needs.

      Regards, Richard

      • Very helpful Richard, thank you.

        I was wondering, even from a research perspective, about the comparability of parameters like FEV1 in the context of a full PFT vs handheld spirometry (esp. if not using the EasyOne), given the inability to verify the validation testing or to establish whether any bias exists, much less to what extent.

        • Anthony –

          Over the course of a clinical trial or other research study the FEV1 measured by a handheld spirometer should probably be fairly stable. In research it wouldn’t matter so much what the absolute value or percent predicted FEV1 is but how much it changes over time and in which direction. For this reason handheld spirometers are probably adequate and I know of more than one company that provides handheld spirometers with e-diary functions specifically for clinical trials.

          Regards, Richard

  5. Hi Richard
    I am a Respiratory Scientist from Sydney with 18 years’ experience in PFTs. I also run spirometry training around Australia. I love to drop in and read your blog and have a particular interest in improving the quality of spirometry tests, especially in how to educate the primary care and occupational health sectors about achieving high quality tests. I support the grading scale for these industries to help them assess quality and apply a standardised approach; however, I also encourage them to record why the grade was given, to help the interpreting Drs make use of the information. For example, Grade C – due to cough. Or Grade D – due to EOT errors.
    What I am also finding confusing for them is, once the tests have been graded, they want direction on what happens next. In an ideal world I suppose we should repeat the tests with a different operator; however, time and funding do not always support this. At this stage I encourage them to use the grading to determine if they should ‘interpret with caution’ or if results are ‘probably clinically reliable’. However, I am interested in hearing other suggestions from industry 🙂
    Thanks for a great blog and keeping us thinking critically.

    • Sarah –

      FVC and FEV1 need to be graded using different criteria, so a single grade is always going to miss the point in one way or another. Even then, as you’ve found, specific quality factors do not necessarily translate well into the ability to say “probably clinically reliable” or “garbage”. Although I’d like to see the next ATS/ERS spirometry standard include a discussion of quality that covered the clinical relevance of suboptimal test quality, I doubt this will happen, largely because PFT interpretation is actually far more difficult than it looks. For example, I find that the way I judge spirometry from a 20-year-old asthmatic differs from that of a 70-year-old patient with emphysema or, as importantly, from that of a 40-year-old with tracheomalacia, and trying to explain these differences is hard.

      Regards, Richard
