PUFF

PUFF: an expert system for interpretation of pulmonary function data

Janice S. Aikins, John C. Kunz, Edward H.
Shortliffe and Robert J. Fallat
CS-TR-82-931
September 1982

Heuristic Programming Project
Departments of Medicine and Computer Science
Stanford University

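Applying artificial intelligence techniques to real-world problems has produced promising research results, but the resulting systems have seldom proved useful in their domains of expertise. DENDRAL and MOLGEN are exceptions. This report introduces PUFF, a program that interprets data from pulmonary function tests and is used in the pulmonary function laboratory of a large hospital. Several significant limitations stand in the way of greater success and call for further research.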

Abstract

The application of Artificial Intelligence techniques to real-world problems has produced promising research results, but seldom has a system become a useful tool in its domain of expertise. Notable exceptions are the DENDRAL [1] and MOLGEN [2] systems. This paper describes PUFF, a program that interprets lung function test data and has become a working tool in the pulmonary physiology lab of a large hospital. Elements of the problem that paved the way for its success are examined, as are significant limitations of the solution that warrant further study.

1. Introduction

Researchers in the field of Artificial Intelligence (AI) are just beginning to produce systems that capture the specialized knowledge of experts and that use this knowledge to perform difficult tasks. Although the technology is still rather new, a small set of programs now exists as "tools" useful for building these so-called "expert systems." This paper describes an expert system, called PUFF, which was built using EMYCIN, a generalization of an earlier medical system named MYCIN. The task chosen for PUFF is described briefly, and the rationale for the appropriateness of this choice is presented. PUFF was initially developed on the SUMEX computer, a large research machine at Stanford University, and was later rewritten in a production version to run on the hospital's own mini-computer. We describe here the history of the PUFF project and its current status, including observations about its limitations and successes. We also take a brief look at the knowledge representation and control structure used for the SUMEX version of the system. Finally, the results of a formal evaluation of the production version of PUFF are presented.

2. Task

PUFF interprets measurements from respiratory tests administered to patients in the pulmonary (lung) function laboratory at Pacific Medical Center in San Francisco. The laboratory includes equipment designed to measure the volume of the lungs, the ability of the patient to move air into and out of the lungs, and the ability of the lungs to get oxygen into the blood and carbon dioxide out (see footnote 1). The pulmonary physiologist interprets these measurements in order to determine the presence and severity of lung disease in the patient. An example of such measurements and an interpretation statement are shown in Fig. 1. The test measurements listed in the top half of the figure are collected by the laboratory equipment. The pulmonary physiologist then dictates the interpretation statements to be included in a typewritten report. All of the measurements are given as a percent of the predicted values for a normal patient of the same sex, height, and weight. The interpretation and final diagnosis are a summary of the reasoning about the combinations of measurements obtained in the lung tests.

3. Rationale

PUFF's task is to interpret such a set of pulmonary function (PF) test results, and to produce a set of interpretation statements and a diagnosis for the patient. The problem of developing an automated pulmonary function interpretation system was chosen for several reasons:

(1) The interpretation of pulmonary function tests is a problem that occurs daily in most hospitals, so a computer program that captures the expertise involved in interpreting these tests, and that can assist in providing interpretations, fills a practical need.

(2) The biomedical researchers at Pacific Medical Center (PMC) were interested in the problem and were eager to work with us on developing a solution. It was possible that such a system could enhance the effectiveness of patient care and the pulmonary physician's efficiency. In addition, solution of this relatively simple interpretation problem could identify possibilities for further research into more difficult interpretation tasks.

(3) PF data interpretation was a problem which the Artificial Intelligence researchers were particularly interested in solving in order to demonstrate the generality and power of expert system techniques. Putting a system into clinical use would contribute to the credibility of those techniques, and also would show their promise and limitations in clinical practice. Earlier AI programs had demonstrated competence, but their use had required large amounts of professional time simply for data input. PUFF, however, produced PF interpretations automatically without the necessity for user interaction. Thus we hoped that PUFF would be used by the clinical staff.

(4) PF data interpretation was a problem which was large enough to be interesting (the biomedical researchers did not know how to solve it, and the AI researchers did not know whether their techniques would be appropriate) and small enough that a pilot project of several months' duration could concretely demonstrate the feasibility of a longer development effort. Furthermore, the amount of domain-specific knowledge involved in pulmonary function testing is limited enough to make it feasible to acquire, understand, and represent that knowledge.

(5) The domain of pulmonary physiology is a circumscribed field: the data needed to interpret patient status are available from the patient's history and from measurements taken in a single laboratory. Other large bodies of knowledge are not required in order to produce accurate diagnoses of pulmonary disease in the patient (see footnote 2). All the data used in the laboratory at PMC were already available in a computer; the computer data were known to be accurate, reliable, and relevant to the interpretation task. The clinical staff in the PF lab were already receptive to the use of computers within their clinical routines.

(6) Pulmonary physiologists who interpret test measurements tend to phrase their interpretations similarly from one case to the next. One goal of PUFF was to generate reports from a set of prototypical interpretation statements, thus saving the staff a great deal of tedious work. The staff themselves would not be displaced by this tool because their expertise still would be necessary to verify PUFF's output, to handle unexpected complex cases, and to correct interpretations that they felt were inaccurate.

4. Project History and Status

This research developed from work done on the MYCIN system [3]. That program used a knowledge base of production rules [4] to perform infectious disease consultations. PUFF was initially built using a generalization of the MYCIN system called EMYCIN [5]. EMYCIN, or "Essential MYCIN," consists of the domain-independent features of MYCIN, principally the rule interpreter, explanation, and knowledge acquisition modules [6]. It provides a mechanism for representing domain-specific knowledge in the form of production rules, and for performing consultations in that domain. Just as MYCIN consists of EMYCIN plus a set of facts and rules about the diagnosis and therapy of infectious diseases, PUFF consists of the EMYCIN programs plus a pulmonary disease knowledge base.
EMYCIN (and hence the EMYCIN version of PUFF) is written in INTERLISP [7] and runs on a DEC KI-10 at the Stanford SUMEX-AIM computer facility. In order to run PUFF on a PDP-11 at Pacific Medical Center, a second version of the program was created after the EMYCIN version had been refined. This was done by translating the production rules into procedures and writing them in the BASIC language. Conversion to BASIC was an advantage because the PDP-11 was located on the same site as the laboratory, and its schedule could be easily controlled to support production operation by the system users. However, as a result of the conversion, the production and development versions of PUFF became incompatible, and modifications made to one system were sometimes difficult to make in the other.
The PDP-11 version is now routinely used in the pulmonary function laboratory and provides lung test interpretations for about ten patients daily. Since the system became operational in 1979, it has interpreted the results of over 4000 cases. The BASIC code is currently being converted again so that it will run on a dedicated personal computer.
The form of the interpretations generated by PUFF is shown in Fig. 2. This report is for the same patient as in Fig. 1, seen several years later. As in the typed report, the pulmonary function test data are set forth, followed by the interpretation statements and a pulmonary function diagnosis. The pulmonary physiologist checks the PUFF report and, if necessary, the interpretation is edited on-line prior to printing the final report for physician signature and entry into the patient record. Approximately 85% of the reports generated are accepted without modifications. The change made to most others simply adds a statement suggesting that the patient's physician compare the interpretation with tests taken during previous visits. For example, statements such as "These test results are consistent with those of previous visits" or "These test results show considerable improvement over those in the previous visit" might be made. PUFF was not designed to represent knowledge about multiple visits, so this kind of statement must be added by the pulmonary physician.

5. Observations

PUFF is a practical assistant to the pulmonary physiologist, and thus is a satisfactory and exciting result of the research done with production rule consultation systems. PUFF's performance is good enough that it is used daily in clinical service, and it has the support of both the hospital staff and its administration. However, limitations are recognized in the following areas (see footnote 3): (1) representation of prototypical patterns, (2) addition or modification of rules to represent knowledge not previously encoded, (3) alteration of the order in which information is requested during the consultation, and (4) explanation of system performance.
The first point refers to the fact that many cases can be viewed as relatively simple variations of typical patterns. PUFF does not recognize that a case fits a typical pattern, nor can it recognize that a case differs in some important way from typical patterns. As a result, PUFF's explanations of its diagnoses lack some of the richness of explanation that physicians can use when a case meets, or fails to meet, the expectations of a prototypical case. The medical knowledge in PUFF is encoded as "rules." Rules encode relatively small and independent bodies of domain knowledge. The rule formalism makes modification of the program's knowledge much easier than when that knowledge is embedded in computer code.
However, additions or modifications to the rules, as referred to in the second point, have caused difficulties because changes to one rule sometimes affect the behavior of other rules in unanticipated ways. The last two points apply only to the EMYCIN version of PUFF, which runs interactively in a consultation-style question-and-answer mode with the user. In that system, questions are sometimes asked in an unusual order, and explanations of both the final interpretation and the questions being asked of the user need to be improved.
Even though PUFF does exhibit certain limitations, the representation of pulmonary knowledge as production rules allows the encoding of interpretive expertise which previously was difficult to define because it is heuristic knowledge of the expert. EMYCIN on the SUMEX computer provided an excellent environment for acquiring, encoding, and debugging this expertise. However, it would have been inefficient and somewhat impractical to use the EMYCIN version of PUFF in a hospital setting. The simplicity of EMYCIN's reasoning process made the translation into BASIC procedures a feasible task, thus allowing the hospital's own computer staff to take over maintenance of the system. The BASIC version of PUFF runs in "batch" mode and does not require interaction with a physician. We believe that this system was readily accepted by the pulmonary staff for several reasons: First, the program's interpretations are consistently accurate. Second, explanations of diagnoses are appropriately detailed so that the user has confidence in the accuracy of correct diagnoses and enough information with which to recognize and modify incorrect diagnoses. Third, less physician time is required to produce consistently high-quality reports using the system than is required to analyze and dictate case reports without it. Finally, the program is well integrated into the routine of the laboratory; its use requires very little extra technician effort.

6. Overview of EMYCIN-PUFF

6.1 Knowledge Representation

The knowledge base of the EMYCIN-PUFF system consists of (a) a set of 64 production rules dealing with the interpretation of pulmonary function tests and (b) a set of 69 clinical parameters. The production version (BASIC-PUFF) has been extended to include 400 production rules and 76 clinical parameters. The clinical parameters in EMYCIN-PUFF represent pulmonary function test results (e.g., TOTAL LUNG CAPACITY and RESIDUAL VOLUME), patient data (e.g., AGE and REFERRAL DIAGNOSIS), and data which are derived from the rules (e.g., FINDINGS associated with a disease and SUBTYPES associated with the disease). There may be auxiliary information associated with the clinical parameters, such as a list of expected values and an English translation used in communicating with the user.
The production rules operate on associative (attribute-object-value) triples, where the attributes are the clinical parameters, the object is the patient, and the values are given by the patient data and lung test results. Questions are asked during the consultation in an attempt to fill in values for the parameters. The production rules consist of one or more "premise" clauses followed by one or more "action" clauses. Each premise is a conjunction of predicates operating on associative triples in the knowledge base. A sample PUFF production rule is shown in Fig. 3. The rules are coded internally in LISP. The user of the system sees the production rules in their English form, which is shown first in the figure. The English version is generated automatically from templates, as is described in [5].
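The representation described above can be illustrated with a minimal Python sketch. It is not the EMYCIN implementation: the names `Rule`, `Conclusion`, `make_mmf_rule`, and the patient label are hypothetical, the inclusive bounds in `between` are an assumption about how a range test behaves, and the FVC cutoff is passed in as an argument so that the sketch does not commit to a particular value. Only the first action of the Fig. 3 rule (concluding the degree indicated by the MMF with certainty .5) is modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# (attribute, object) -> value. The object is always the patient here.
Triples = Dict[Tuple[str, str], float]


@dataclass
class Conclusion:
    attribute: str    # clinical parameter being concluded
    value: str        # concluded value
    certainty: float  # certainty attached to the conclusion


@dataclass
class Rule:
    name: str
    premise: Callable[[Triples, str], bool]  # predicates over the triples
    actions: List[Conclusion]

    def apply(self, facts: Triples, patient: str) -> List[Conclusion]:
        """Return the rule's conclusions if its premise holds, else nothing."""
        return list(self.actions) if self.premise(facts, patient) else []


def between(x: float, lo: float, hi: float) -> bool:
    # Assumption: both bounds are inclusive.
    return lo <= x <= hi


def make_mmf_rule(fvc_cutoff: float) -> Rule:
    """Build a rule shaped like the one in Fig. 3 (hypothetical helper)."""
    def premise(facts: Triples, patient: str) -> bool:
        mmf = facts[("MMF", patient)]  # percent of predicted
        fvc = facts[("FVC", patient)]  # percent of predicted
        return ((between(mmf, 35, 45) and fvc > fvc_cutoff) or
                (between(mmf, 25, 35) and fvc < fvc_cutoff))

    return Rule("mmf-moderate", premise,
                [Conclusion("DEG-MMF", "MODERATE", 0.5)])


rule = make_mmf_rule(fvc_cutoff=80.0)  # cutoff chosen for illustration only
facts = {("MMF", "PATIENT-1"): 40.0, ("FVC", "PATIENT-1"): 90.0}
fired = rule.apply(facts, "PATIENT-1")
```

Keeping the premise as a function over the triple store and the actions as plain data mirrors the premise/action split of the rules, and makes it straightforward to list which parameters a rule can conclude.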

6.2 Control Structure

The EMYCIN-PUFF control structure is primarily a goal-directed, backward chaining of production rules. The goal of the system at any time is to determine a value for a given clinical parameter. To conclude a value for that clinical parameter, it tries a pre-computed list of rules whose actions conclude values for the clinical parameter (refer to [6] for details). If the rules fail to conclude a value for a parameter, a question is then asked of the user in order to obtain that value. An exception to this process occurs for parameters labeled ASKFIRST parameters. These represent information generally known by the user, such as results of pulmonary function tests. For these parameters it is more efficient simply to ask a consultation question than to attempt to infer the information by means of rules (see footnote 4).
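The goal-directed control flow just described can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not EMYCIN's rule interpreter: the `Rule` class, `find_value`, the parameter names, and the toy thresholds are all hypothetical, certainty factors are omitted, and the search stops at the first rule that yields a value.

```python
from typing import Callable, Dict, List, Optional, Set


class Rule:
    def __init__(self, concludes: str, needs: List[str],
                 infer: Callable[[Dict[str, object]], Optional[object]]):
        self.concludes = concludes  # parameter this rule can conclude
        self.needs = needs          # parameters read by its premise
        self.infer = infer          # returns a value, or None if premise fails


def find_value(param: str,
               rules_for: Dict[str, List[Rule]],
               askfirst: Set[str],
               ask: Callable[[str], Optional[object]],
               known: Dict[str, object]) -> Optional[object]:
    """Backward-chain to a value for `param`, asking the user as a fallback."""
    if param in known:
        return known[param]
    value = None
    if param in askfirst:
        value = ask(param)  # ASKFIRST: ask before trying any rules
    else:
        for rule in rules_for.get(param, []):  # pre-computed rule list
            inputs = {p: find_value(p, rules_for, askfirst, ask, known)
                      for p in rule.needs}  # each input becomes a subgoal
            if any(v is None for v in inputs.values()):
                continue
            value = rule.infer(inputs)
            if value is not None:
                break  # simplification: stop at the first conclusion
        if value is None:
            value = ask(param)  # rules failed, so ask the user
    known[param] = value
    return value


# Toy knowledge base (hypothetical rules and thresholds).
rules_for = {
    "DEG-MMF": [Rule("DEG-MMF", ["MMF"],
                     lambda v: "MODERATE" if 35 <= v["MMF"] <= 45 else None)],
    "DIAGNOSIS": [Rule("DIAGNOSIS", ["DEG-MMF"],
                       lambda v: "OAD" if v["DEG-MMF"] == "MODERATE" else None)],
}
asked: List[str] = []
answers = {"MMF": 40.0}


def ask(param: str) -> Optional[object]:
    asked.append(param)  # record the order in which questions are asked
    return answers.get(param)


result = find_value("DIAGNOSIS", rules_for, {"MMF"}, ask, {})
```

In this run only the test result MMF is requested from the user; the intermediate parameter and the diagnosis are inferred by chaining backward from the goal.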

7. Evaluation of the BASIC-PUFF Performance System

The knowledge base from the original performance version of PUFF was tested on 107 cases chosen from files in the pulmonary function laboratory at Pacific Medical Center.
Those 107 cases formed a representative sample of the various pulmonary diseases, their degrees and subtypes. Modifications were made to the knowledge base and the cases were tried again. This iteration continued until our collaborating expert was satisfied that the system's interpretations agreed with his own. At this point the system was "frozen" and a new set of 144 cases was selected and interpreted by the system. All 144 cases also were interpreted separately by two pulmonary physiologists (the expert working with us and a physician from a different medical center).
The results of the comparison of interpretations by each diagnostician are presented in the table in Fig. 4. The table compares "close" agreement in diagnosing the severity of the disease, where "close" is defined as differing by at most one degree of severity. Thus, for example, diagnoses of mild (degree=1) and moderate (degree=2) are considered close, while mild and severe (degree=3) are not. Further, a diagnosis of normal is not considered to be close to a diagnosis of a mild degree of any disease.
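The "close" criterion can be stated as a short Python sketch. The function names and the sample pairs are hypothetical, and the numeric coding of the scale is an assumption made for illustration; the text only requires that neighboring degrees differ by one and that normal is never close to a disease degree.

```python
from typing import Iterable, Tuple

# Assumed ordinal coding of the severity scale (illustrative only).
SEVERITY = {"normal": 0, "mild": 1, "moderate": 2, "severe": 3}


def is_close(a: str, b: str) -> bool:
    """True if two severity labels count as 'close' agreement."""
    da, db = SEVERITY[a], SEVERITY[b]
    if (da == 0) != (db == 0):
        return False  # normal is not close to any degree of disease
    return abs(da - db) <= 1  # differ by at most one degree


def percent_close(pairs: Iterable[Tuple[str, str]]) -> float:
    """Percentage of diagnosis pairs that are in close agreement."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no cases to compare")
    return 100.0 * sum(is_close(a, b) for a, b in pairs) / len(pairs)


# Made-up pairs of (diagnostician A, diagnostician B) severity labels.
sample = [("mild", "moderate"), ("mild", "severe"),
          ("normal", "mild"), ("moderate", "moderate")]
agreement = percent_close(sample)
```

Two of the four made-up pairs are close (mild/moderate and moderate/moderate), so `agreement` evaluates to 50.0.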
The table shows that the overall rate of agreement between the two physiologists on the diagnoses of disease was 92%. The agreement between PUFF and the physician who served as the expert to develop the PUFF knowledge base (MD-1 in the table) was 96%. Finally, the agreement between PUFF and the physician who had no part in the development of the PUFF knowledge base (MD-2) was 89%. Fig. 5 shows the distribution of diagnoses by each diagnostician. The number of diagnoses made by each diagnostician does not total 144 because patients were often diagnosed as having more than one disease.

8. Conclusions

The PUFF research has demonstrated that if the task, domain, and researchers are carefully matched, then the application of existing techniques can result in a system which successfully performs a moderately complicated task of medical diagnosis. Success of the program can be measured not only in terms of the system's technical performance but, equally importantly, by the ease and practicality of the system's day-to-day use in the lab for which it was designed. Rule-based representation allowed easy codification and later modification of expertise, and the simplicity of the rule interpreter in the INTERLISP version facilitated translation into BASIC and implementation on the hospital's own PDP-11 machine.
Using EMYCIN allowed the researchers to move quickly from a point where they found it difficult even to describe the diagnostic process to a point where a simple diagnostic model was implemented. Having a diagnostic model allowed them to focus on individual issues in order to improve that model. Although PUFF does not itself represent new Artificial Intelligence techniques, its success is a testimonial for EMYCIN. In addition, its simplicity has facilitated careful analysis of EMYCIN's rule representation and control structure and has led to other productive research efforts ([8] and [9]).

Acknowledgements

The PUFF research team consists of an interdisciplinary group of physicians and computer scientists. In addition to the authors, these have included Larry Fagan, Ed Feigenbaum, Penny Nii, Dr. John Osborn, Dr. 6. J. Rubin, and Dianne Sierra. We also thank Dr. B.A. Votteri for his help in evaluating PUFF's performance, and Doug Aikins for his editorial help with this paper.
This research was funded in part by NIH grants MB-00134 and GM-24669. Computer facilities were provided by the SUMEX-AIM facility at Stanford University under NIH grant RR-00786. Dr. Shortliffe is supported by research career development award LM-00048 from the National Library of Medicine. Dr. Aikins was supported by the Xerox Corporation under the direction of the Xerox Palo Alto Research Center.

Legends to Figures

Figure 1. Verbatim copy of pulmonary function report dictated by physician
Figure 2. Pulmonary function report generated by PDP-11 version of PUFF
Figure 3. A PUFF production rule in English and LISP versions
Figure 4. Summary of percent agreement in 144 cases
Figure 5. Number of diagnoses by each diagnostician for 144 cases

PRESBYTERIAN HOSPITAL OF PMC
CLAY AND BUCHANAN, BOX 7999
SAN FRANCISCO, CA. 94128
PULMONARY FUNCTION LAB

The vital capacity is low, the residual volume is high as is the total lung capacity, indicating air trapping and overinflation. This is consistent with a moderately severe degree of airway obstruction as indicated by the low FEV1, low peak flow rates and curvature to the flow volume loop. Following isoproterenol aerosol there is virtually no change. The diffusing capacity is low indicating loss of alveolar capillary surface.
Conclusion: Overinflation, fixed airway obstruction and low diffusing capacity would all indicate moderately severe obstructive airway disease of the emphysematous type. Although there is no response to bronchodilators on this one occasion, more prolonged use may prove to be more helpful.
PULMONARY FUNCTION DIAGNOSIS: OBSTRUCTIVE AIRWAY DISEASE, MODERATELY SEVERE, EMPHYSEMATOUS TYPE

FIGURE 1.

PRESBYTERIAN HOSPITAL OF PMC
CLAY AND BUCHANAN, BOX 7999
SAN FRANCISCO, CA. 94128
PULMONARY FUNCTION LAB

INTERPRETATION: ELEVATED LUNG VOLUMES INDICATE OVERINFLATION. IN ADDITION, THE RV/TLC RATIO IS INCREASED, SUGGESTING A MODERATELY SEVERE DEGREE OF AIR TRAPPING.
THE FORCED VITAL CAPACITY IS NORMAL. THE FEV1/FVC RATIO AND MID-EXPIRATORY FLOW ARE REDUCED AND THE AIRWAY RESISTANCE IS INCREASED, SUGGESTING MODERATELY SEVERE AIRWAY OBSTRUCTION. FOLLOWING BRONCHODILATION, THE EXPIRED FLOWS SHOW MODERATE IMPROVEMENT. HOWEVER, THE RESISTANCE DID NOT IMPROVE. THE LOW DIFFUSING CAPACITY INDICATES A LOSS OF ALVEOLAR CAPILLARY SURFACE, WHICH IS MILD.
CONCLUSIONS: THE LOW DIFFUSING CAPACITY, IN COMBINATION WITH OBSTRUCTION AND A HIGH TOTAL LUNG CAPACITY IS CONSISTENT WITH A DIAGNOSIS OF EMPHYSEMA. ALTHOUGH BRONCHODILATORS WERE ONLY SLIGHTLY USEFUL IN THIS ONE CASE, PROLONGED USE MAY PROVE TO BE BENEFICIAL TO THE PATIENT.
PULMONARY FUNCTION DIAGNOSIS: 1. MODERATELY SEVERE OBSTRUCTIVE AIRWAYS DISEASE. EMPHYSEMATOUS TYPE.

FIGURE 2.

RULE011

If: 1) A: The mmf/mmf-predicted ratio is between 35 and 45, and
       B: The fvc/fvc-predicted ratio is greater than 80, or

    2) A: The mmf/mmf-predicted ratio is between 25 and 35, and
       B: The fvc/fvc-predicted ratio is less than 80

Then: 1) There is suggestive evidence (.5) that the degree of obstructive airways disease as indicated by the MMF is moderate, and

      2) It is definite (1.0) that the following is one of the findings about the diagnosis of obstructive airways disease: Reduced mid-expiratory flow indicates moderate airway obstruction.

PREMISE: [$AND ($OR ($AND (BETWEEN* (VAL1 CNTXT MMF) 35 45)
                          (GREATERP* (VAL1 CNTXT FVC) 80))
                    ($AND (BETWEEN* (VAL1 CNTXT MMF) 25 35)
                          (LESSP* (VAL1 CNTXT FVC) 80]
ACTION: (DO-ALL (CONCLUDE CNTXT DEG-MMF MODERATE TALLY 500)
                (CONCLUDETEXT CNTXT FINDINGS-OAD
                              (TEXT $MMF/FVC2) TALLY 1000))

FIGURE 3.

FIGURE 5

Footnotes

1. Measurements include spirometry and, optionally, body plethysmography, single-breath CO diffusion capacity, and arterial blood gases. Measurements can be made at rest, following inhalation of a bronchodilator, and during exercise.
2. This was a problem in MYCIN, a related system for determining the diagnosis and therapy for infectious disease cases. The results produced by the system often suffered because it lacked knowledge about related diseases that were also present in the patient.
3. Many of these problems are also present in other rule-based systems; they motivated the development of the experimental CENTAUR system [8].
4. In the BASIC version of PUFF implemented at PMC, all of the test data are known ahead of time, so that "asking a question" merely entails retrieving another datum from a stored file.

References

[1] Buchanan, B. G., and Feigenbaum, E. A. DENDRAL and Meta-DENDRAL: Their Applications Dimension. Artificial Intelligence 11(1,2) (1978), pp. 5-24.
[2] Friedland, P. Knowledge-based Experiment Design in Molecular Genetics. Proceedings of the Sixth International Joint Conference on Artificial Intelligence, 1979, pp. 285-287.
[3] Shortliffe, E. H. Computer-Based Medical Consultations: MYCIN. New York: American Elsevier, 1976.
[4] Davis, R., and King, J. An Overview of Production Systems. In Machine Intelligence (E. W. Elcock and D. Michie, Eds.), Vol. 8, pp. 300-332. New York: Wiley & Sons, 1977.
[5] van Melle, W. EMYCIN: A Domain-independent Production-rule System for Consultation Programs. STAN-CS-80-820, Stanford University, June 1980.
[6] Shortliffe, E. H., Davis, R., Buchanan, B., Axline, S., Green, C., Cohen, S. Computer-based Consultations in Clinical Therapeutics: Explanation and Rule Acquisition Capabilities of the MYCIN System. Computers and Biomedical Research, 8 (1975), pp. 303-320.
[7] Teitelman, W. INTERLISP Reference Manual. Xerox Palo Alto Research Center, Palo Alto, Ca., October 1978.
[8] Aikins, J. S. Prototypes and Production Rules: A Knowledge Representation for Computer Consultations. STAN-CS-80-814, Stanford University, August 1980.
[9] Smith, D. E., and Clayton, J. E. A Frame-based Production System Architecture. Proceedings of the First Annual National Conference on Artificial Intelligence, 1980, pp. 164-166.