Annette M. Molinaro
Department of Neurological Surgery, University of California, San Francisco, San Francisco, California; Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, California
Corresponding Author: Annette M. Molinaro, PhD, UCSF Department of Neurosurgery, 400 Parnassus Ave A850b, Room A 808, San Francisco CA 94143-0372 (annette.molinaro@ucsf.edu).
Neuro-Oncology Practice, Volume 2, Issue 4, December 2015, Pages 162–166, https://doi.org/10.1093/nop/npv030
Received: 12 May 2015
Published: 07 September 2015
Annette M. Molinaro, Diagnostic tests: how to estimate the positive predictive value, Neuro-Oncology Practice, Volume 2, Issue 4, December 2015, Pages 162–166, https://doi.org/10.1093/nop/npv030
Abstract
When a patient receives a positive result from a diagnostic test, they assume they have the disease. However, the positive predictive value (PPV), ie, the probability that they have the disease given a positive test result, is rarely equal to one. To assist their patients, doctors must explain the chance that they do in fact have the disease. However, physicians frequently miscalculate the PPV as the sensitivity and/or misinterpret the PPV, which increases anxiety in patients and generates unnecessary tests and consultations. The reasons for this miscalculation, as well as three ways to calculate the PPV, are reviewed here.
Keywords: diagnostic tests, false positive rate, positive predictive value, sensitivity, statistics
Prevalence of glioma is 0.003%. A patient comes into the clinic complaining of headaches and memory loss. A new blood test for diagnosis of glioma is available. The patient tests positive. From the literature (see Table 1) you know that the sensitivity of the test is 96.7% and the false positive rate is 4%. What is the probability that this patient who tested positive actually has glioma?
This is an understandably difficult problem, since it pertains to conditional probabilities (sensitivity, specificity, and positive predictive value [PPV]) and varying reference populations (those with disease and those without). Nonetheless, an informed interpretation of diagnostic tests is increasingly important, especially as novel biomarkers are used in the detection of disease. Unfortunately, studies have shown that more than 75% of doctors answer questions similar to the one above incorrectly.1–5
Table 1. Fictional table from literature.
Test Result | Glioma Present | Glioma Absent | Total |
---|---|---|---|
Positive | 29 | 2 | 31 |
Negative | 1 | 48 | 49 |
Total | 30 | 50 | 80 |
In these data, the prevalence of disease is P(D) = 30/80 = 0.375; the sensitivity is P(Test positive | Glioma present) = 29/30 = 0.967; the false positive rate is P(Test positive | Glioma absent) = 2/50 = 0.04. See Table 2 for formulas.
The goal of this review is to ease the calculation of conditional probabilities (eg, the PPV in the example above) by explaining three ways to solve them: conditional probability equations, tree diagrams (with probabilities), and natural frequencies. You have the option of reviewing all three or just one or two of the approaches. Any of the three will get you to the correct answer. We begin with the calculation via conditional probabilities and follow with building tree diagrams for a visual representation. Subsequently, we illustrate a way to translate this information via natural frequencies for you and your patients so that they too understand the meaning of a positive or negative test result.
Approach 1: Conditional Probability Equations
Conditional probabilities are important in the interpretation of diagnostic tests because the test results influence our understanding of whether the patient has a disease. However, the test results are not synonymous with the presence or absence of disease. The conditional probabilities that we need to understand are sensitivity, specificity, PPV, and negative predictive value (NPV). These probabilities are defined by two events: the presence of disease and a positive test result.
Sensitivity is defined as the probability of a positive test result given the presence of disease, written as: P(positive test | disease present). The vertical line can be read as “given.” Specificity is defined as the probability of a negative test result given the absence of disease, ie, P(negative test | disease absent). PPV is defined as the probability of the presence of disease given a positive test result, ie, P(disease present | positive test). NPV is defined as the probability of the absence of disease given a negative test result, ie, P(disease absent | negative test). Given the similarities in calculation between PPV and NPV, we will focus only on the former here.
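There are two important things to know about conditional probabilities. First, conditional probabilities are not reciprocal, ie,

$$P(A \mid B) \neq P(B \mid A)$$

This is important to note because it means that sensitivity does not equal PPV, ie,

$$P(\text{positive test} \mid \text{disease present}) \neq P(\text{disease present} \mid \text{positive test})$$

This is one of the most common errors that doctors make when calculating PPV: they simply equate it with the test's sensitivity.

Second, you can write a conditional probability as:

$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$$

The fraction on the right matters because it is how we will connect the sensitivity to the PPV; this will become clearer once we learn how to rewrite the numerator on the right-hand side. To do so, we need the multiplication rule, which gives the probability that both events occur, ie, P(A and B). This can be written as:

$$P(A \text{ and } B) = P(A \mid B) \times P(B)$$

or with our events as:

$$P(\text{positive test and disease present}) = P(\text{positive test} \mid \text{disease present}) \times P(\text{disease present})$$

which is equivalent to:

$$P(\text{positive test and disease present}) = \text{sensitivity} \times \text{prevalence}$$

Similarly the probability of a false positive can be written as:

$$P(\text{positive test and disease absent}) = P(\text{positive test} \mid \text{disease absent}) \times P(\text{disease absent}) = \text{false positive rate} \times (1 - \text{prevalence})$$

Now we can connect the PPV to the sensitivity:

$$\text{PPV} = P(\text{disease present} \mid \text{positive test})$$

Expressed as the other form of conditional probability, we can see this as:

$$\text{PPV} = \frac{P(\text{disease present and positive test})}{P(\text{positive test})}$$

And by applying the multiplication rule, we can rewrite this as:

$$\text{PPV} = \frac{P(\text{positive test} \mid \text{disease present}) \times P(\text{disease present})}{P(\text{positive test})}$$

In the denominator, a positive test can come from those patients with the presence of disease (true positives) and those with the absence of disease (false positives). Therefore we can write:

$$P(\text{positive test}) = P(\text{positive test and disease present}) + P(\text{positive test and disease absent})$$

The two probabilities on the right were defined above. We can continue the calculation to get the PPV:

$$\text{PPV} = \frac{\text{sensitivity} \times \text{prevalence}}{\text{sensitivity} \times \text{prevalence} + \text{false positive rate} \times (1 - \text{prevalence})}$$

In the example of the test for glioma above, we would substitute the values for prevalence, sensitivity, and false positives, and calculate:

$$\text{PPV} = \frac{0.967 \times 0.00003}{0.967 \times 0.00003 + 0.04 \times 0.99997} = \frac{0.000029}{0.000029 + 0.0399988} \approx 0.0007$$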
Thus, the chance that the patient has glioma given a positive test result is 0.07%.
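For readers who prefer to check the arithmetic in code, the calculation can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `positive_predictive_value` and its parameter names are assumptions introduced here, not part of the article; the input values are those of the glioma example.

```python
def positive_predictive_value(prevalence: float,
                              sensitivity: float,
                              false_positive_rate: float) -> float:
    """Return P(disease present | positive test) from the three given quantities."""
    # P(positive test and disease present) = sensitivity x prevalence
    true_positive = sensitivity * prevalence
    # P(positive test and disease absent) = false positive rate x (1 - prevalence)
    false_positive = false_positive_rate * (1 - prevalence)
    # P(positive test) is the sum of the two ways a positive test can arise.
    return true_positive / (true_positive + false_positive)


# Glioma example: prevalence 0.003%, sensitivity 96.7%, false positive rate 4%.
ppv = positive_predictive_value(prevalence=0.00003,
                                sensitivity=0.967,
                                false_positive_rate=0.04)
print(f"PPV = {ppv:.5f} ({ppv:.2%})")  # PPV = 0.00072 (0.07%)
```

Note that the prevalence must be entered as a proportion (0.00003), not as a percentage (0.003).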
There are many similarities between a 2 × 2 table (Table 1) and conditional probabilities. You can see from Table 2 how to calculate sensitivity, specificity, and PPV from a 2 × 2 table. However, PPV can only be calculated from a 2 × 2 table if the prevalence [P(Disease present) = number of people with disease/number of people in population (or sample)] in the table is the same as that in the population. Typically, the prevalence in a 2 × 2 table does not reflect the population prevalence because the table is based on case-control data, in which a specified number of cases (patients with disease) and controls (patients without disease) are studied for the purpose of finding associations. For example, in Table 1 the hypothetical data are based on a case-control study with 30 cases and 50 controls, and thus the prevalence of disease is 30/80 = 37.5%. Using the same calculations as above but with a prevalence of 37.5%, the PPV equals 94%, which is incorrect, as we know the prevalence in the population is 0.003%. Thus, if the prevalence of the disease in a 2 × 2 table is not the same as in the population, you cannot calculate the PPV (or NPV) from the table.
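The contrast between the table-based PPV and the population PPV can be sketched in Python using the counts from Table 1. The variable names (`tp`, `fp`, `fn`, `tn`, `naive_ppv`, `adjusted_ppv`) are illustrative assumptions, not the article's notation.

```python
# Counts from Table 1 (case-control data: 30 cases, 50 controls).
tp, fp = 29, 2   # positive tests: glioma present, glioma absent
fn, tn = 1, 48   # negative tests: glioma present, glioma absent

sensitivity = tp / (tp + fn)                         # 29/30
false_positive_rate = fp / (fp + tn)                 # 2/50
table_prevalence = (tp + fn) / (tp + fp + fn + tn)   # 30/80

# PPV read directly off the table; valid only if the table prevalence
# matches the population prevalence, which it does not here.
naive_ppv = tp / (tp + fp)                           # 29/31

# PPV recomputed with the population prevalence from the example (0.003%).
population_prevalence = 0.00003
adjusted_ppv = (sensitivity * population_prevalence) / (
    sensitivity * population_prevalence
    + false_positive_rate * (1 - population_prevalence)
)

print(f"table prevalence = {table_prevalence:.3f}")
print(f"naive PPV from table = {naive_ppv:.1%}")
print(f"PPV at population prevalence = {adjusted_ppv:.2%}")
```

Sensitivity and the false positive rate are computed within a single column of the table, so they do not depend on how many cases and controls were sampled; the PPV is computed across a row, so it does.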
Table 2. A 2 × 2 table with test results in the rows and disease status in the columns.
Sensitivity, specificity, and false positive/negative rates can be calculated from any such 2 × 2 table. Positive and negative predictive values can only be calculated from a 2 × 2 table if the prevalence of disease in the table is the same as that in the population. It should be noted that the false positive rate is P(positive test | disease absent), while the false positive cell in the 2 × 2 table is P(positive test and disease absent).
Approach 2: Tree Diagrams
Another way to display the data is in a tree diagram3,6 (Fig. 1). Starting on the left at the “Individual,” the first split corresponds to disease status: the patient either has the disease or does not. The top line going from “Individual” to “Disease” shows the prevalence of disease, while the bottom line shows the probability of not having the disease, P(no disease) = 1 − P(disease). Similar to disease status, the test result can either be positive or negative. The line between “Disease” and “Positive test” displays the sensitivity, ie, P(positive test | disease), whereas the line between “No Disease” and “Negative test” shows the specificity, ie, P(negative test | no disease). The conditional probabilities associated with the other two lines, the false positive/negative rates, can be written similarly. Note that the two lines coming from the same box must sum to one, eg, P(disease) + P(no disease) = 1. That is also true for sensitivity and the false negative rate as well as the false positive rate and specificity. The four squares of the 2 × 2 table can also be calculated on the far right of the tree diagram by using the multiplication rule, eg, P(disease and positive test) = P(positive test | disease) × P(disease).
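We can display the information from the original question in a tree diagram to help calculate the PPV. In Fig. 2, the known information is in bold and the inferred information is in italic. Note that the people with a positive test are either true positives (disease present and a positive test) or false positives (no disease and a positive test). Because the prevalence is already taken into account when the true positives in the tree diagram are calculated, a simpler way of calculating the PPV is:

$$\text{PPV} = \frac{\text{true positives}}{\text{true positives} + \text{false positives}}$$

Or, expressed as the other form of conditional probability:

$$P(\text{disease present} \mid \text{positive test}) = \frac{P(\text{disease present and positive test})}{P(\text{disease present and positive test}) + P(\text{disease absent and positive test})}$$

If we substitute numbers from the tree diagram, we can calculate:

$$\text{PPV} = \frac{0.000029}{0.000029 + 0.0399988} \approx 0.0007$$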
Thus, the chance that the patient has glioma given a positive test result is 0.07%. This PPV should be clearly communicated to the patient. As it can be difficult to explain conditional probabilities to patients, we will explore an alternative option.
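The tree for the glioma example can also be sketched in Python: the three given quantities define the branches, the remaining branches follow because the two lines leaving a box sum to one, and each leaf is the product along its path. The dictionary layout and variable names below are assumptions chosen for illustration.

```python
# Given in the example.
prevalence = 0.00003          # P(disease)
sensitivity = 0.967           # P(positive test | disease)
false_positive_rate = 0.04    # P(positive test | no disease)

# Inferred: the two lines leaving the same box must sum to one.
p_no_disease = 1 - prevalence
false_negative_rate = 1 - sensitivity
specificity = 1 - false_positive_rate

# Leaves of the tree: multiply along each path (multiplication rule).
leaves = {
    ("disease", "positive"): prevalence * sensitivity,
    ("disease", "negative"): prevalence * false_negative_rate,
    ("no disease", "positive"): p_no_disease * false_positive_rate,
    ("no disease", "negative"): p_no_disease * specificity,
}

for (status, result), probability in leaves.items():
    print(f"{status:>10} -> {result:<8} {probability:.7f}")

# PPV = true positives / (true positives + false positives)
true_pos = leaves[("disease", "positive")]
false_pos = leaves[("no disease", "positive")]
ppv = true_pos / (true_pos + false_pos)
print(f"PPV = {ppv:.2%}")
```

The four leaf probabilities correspond to the four cells of a 2 × 2 table for the whole population and sum to one.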
Fig. 1. Tree diagram representing all possible outcomes of a diagnostic test. P(A) is the probability of Event A. P(B|A) is the conditional probability of Event B given Event A. FPR is the false positive rate = P(Positive test | Disease absent). FNR is the false negative rate = P(Negative test | Disease present).
Fig. 2. Tree diagram representing all possible outcomes and conditional probabilities given in the hypothetical diagnostic test example. Text in bold is given in the example. Text in italic is calculated from the given information in bold. FPR is the false positive rate = P(Positive test | Disease absent). FNR is the false negative rate = P(Negative test | Disease present).
Approach 3: Natural frequencies
To help patients understand conditional probabilities you can translate them to natural frequencies with or without the use of a tree diagram.1,3 Natural frequencies are the way most people are presented with statistics and, thus, make interpretation simpler. We can directly translate the original question into natural frequencies and illustrate the ease with which the question can be answered.
Three out of every 100 000 people have glioma. A patient comes into the clinic complaining of headaches and memory loss. A new blood test for diagnosis of glioma is available. She tests positive. From the literature you know that of the three people out of 100 000 with glioma, all three will likely have a positive blood test. Of the 99 997 people without glioma, 4000 will still have a positive blood test. Of the patients with a positive blood test, how many actually have glioma?
Now the answer is much more straightforward to calculate: it is 3/(3 + 4000) = 3/4003 ≈ 0.0007, or 0.07%. Again, this is the PPV, the chance that a patient with a positive test result actually has glioma.
One of the reasons natural frequencies make this problem easier to understand is that they use the same reference group. For example, three patients (with a positive blood test and glioma) and 4000 patients (with a positive blood test and no glioma) both refer to the same group of 100 000 people. In contrast, in the original question the sensitivity refers to the group of three patients with glioma, while the false positive rate (like the specificity) refers to the group of 99 997 people without glioma. A pitfall of using natural frequencies is that mistakes can be made in translating the conditional probabilities to frequencies, and thus caution must be used.
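One way to reduce translation mistakes is to derive the counts mechanically. The Python sketch below converts the conditional probabilities of the example into whole-person counts for a single reference population; the function name `natural_frequencies`, the default population of 100 000, and the rounding to whole people are assumptions made for illustration.

```python
def natural_frequencies(prevalence, sensitivity, false_positive_rate,
                        population=100_000):
    """Translate conditional probabilities into counts within one reference population."""
    with_disease = round(prevalence * population)
    without_disease = population - with_disease
    # Counts are rounded to whole people, as in the narrative version of the question.
    true_positives = round(sensitivity * with_disease)
    false_positives = round(false_positive_rate * without_disease)
    return with_disease, without_disease, true_positives, false_positives


with_d, without_d, tp, fp = natural_frequencies(0.00003, 0.967, 0.04)
print(f"{with_d} of 100 000 people have glioma; {tp} of them test positive.")
print(f"{fp} of the {without_d} people without glioma also test positive.")
print(f"{tp} of {tp + fp} people with a positive test have glioma "
      f"({tp / (tp + fp):.2%}).")
```

Because of the rounding to whole people, the result can differ slightly from the exact probability calculation, although both give 0.07% when rounded in this example.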
Conclusion
Positive predictive value is the probability that a person who receives a positive test result actually has the disease. This is what patients want to know. Nonetheless, physicians frequently miscalculate and/or misinterpret the PPV, which results in increased anxiety in patients and generates unnecessary tests and consultations. One of the reasons for miscalculation is that conditional probabilities are not reciprocal, meaning that P(A | B) ≠ P(B | A), or in our example that sensitivity does not equal PPV. A second reason is that the PPV relies on the prevalence of disease and therefore the PPV cannot be calculated from a data set that does not have the same prevalence as the population. Finally, conditional probabilities can be conceptual and many studies have shown that reframing the problem in natural frequencies (with or without tree diagrams) increases the ability of a physician to correctly calculate the PPV.1,3
Here we have shown three ways to calculate the PPV: conditional probabilities, tree diagrams, and natural frequencies. In all three, we show that the PPV of the hypothetical blood test equals 0.07%. The implication of this is crucial but often goes unnoticed. For any rare disease, such as glioma, the percentage of false positives tends to be appreciable even though the sensitivity and specificity may be high. The ramification is that the vast majority of positive test results will be false positives. An advantage of a low prevalence of disease is that a patient with a negative test result is very unlikely to have the disease, ie, the negative predictive value (NPV) is large. In the hypothetical example the NPV can be calculated similarly to the PPV and shown to exceed 99.99%.
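The NPV calculation mirrors the PPV calculation, with true and false negatives in place of true and false positives. A minimal Python sketch for the hypothetical example follows; the function name `negative_predictive_value` and its parameter names are assumptions introduced here.

```python
def negative_predictive_value(prevalence: float,
                              sensitivity: float,
                              false_positive_rate: float) -> float:
    """Return P(disease absent | negative test)."""
    specificity = 1 - false_positive_rate
    # P(negative test and disease absent) = specificity x (1 - prevalence)
    true_negative = specificity * (1 - prevalence)
    # P(negative test and disease present) = false negative rate x prevalence
    false_negative = (1 - sensitivity) * prevalence
    return true_negative / (true_negative + false_negative)


npv = negative_predictive_value(prevalence=0.00003,
                                sensitivity=0.967,
                                false_positive_rate=0.04)
print(f"NPV = {npv:.6f}")
```

With these inputs the NPV comes out above 99.99%, in line with the statement that a negative result makes glioma very unlikely.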
Given the current focus on finding novel biomarkers to be used in the detection of disease, an informed interpretation of diagnostic tests is increasingly important. Equally important is the translation of this information to your patients. We hope these tools will be helpful in both understanding and relaying conditional probabilities to your patients.
Funding
This study was supported by R01 CA163687 (Annette M. Molinaro, Principal Investigator).
Acknowledgments
The author would like to thank Jennifer Clarke, David Elson, and Seunggu Han for their input and suggestions on presentation of this material.
Conflict of interest statement. None declared.
References
1. Gigerenzer G, Edwards A. Simple tools for understanding risks: from innumeracy to insight. Br Med J. 2003;327(7417):741–744.
2. Casscells W, Schoenberger A, Graboys TB. Interpretation by Physicians of Clinical Laboratory Results. N Engl J Med. 1978;299(18):999–1001.
3. Friederichs H, Ligges S, Weissenstein A. Using Tree Diagrams without Numerical Values in Addition to Relative Numbers Improves Students’ Numeracy Skills: A Randomized Study in Medical Education. Med Decis Making. 2014;34(2):253–257.
4. Manrai AK, Bhatia G, Strymish J, Kohane IS, Jain SH. Medicine's uncomfortable relationship with math: Calculating positive predictive value. JAMA Intern Med. 2014;174(6):991–993.
5. Eddy D. Probabilistic reasoning in clinical medicine: problems and opportunities. In: Kahneman D, Slovic P, Tversky A. Judgement under uncertainty: Heuristics and Biases. Cambridge, UK: Cambridge University Press; 1982:249–267.
6. Baldi B, Moore DS. The Practice of Statistics in the Life Sciences, 2nd ed. New York, NY: W. H. Freeman; 2010.
© The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Neuro-Oncology. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.