Diagnostic tests: how to estimate the positive predictive value



Neurooncol Pract. 2015 Dec; 2(4): 162–166.

Published online 2015 Sep 7. doi:10.1093/nop/npv030

PMCID: PMC6664615

PMID: 31386059

Annette M. Molinaro


Abstract

When a patient receives a positive test result from a diagnostic test they assume they have the disease. However, the positive predictive value (PPV), ie, the probability that they have the disease given a positive test result, is rarely equal to one. To assist their patients, doctors must explain the chance that they do in fact have the disease. However, physicians frequently miscalculate the PPV as the sensitivity and/or misinterpret the PPV, which results in increased anxiety in patients and generates unnecessary tests and consultations. The reasons for this miscalculation as well as three ways to calculate the PPV are reviewed here.

Keywords: diagnostic tests, false positive rate, positive predictive value, sensitivity, statistics

The prevalence of glioma is 0.003%. A patient comes into the clinic complaining of headaches and memory loss. A new blood test for diagnosis of glioma is available. The patient tests positive. From the literature (see Table 1) you know that the sensitivity of the test is 96.7% and the false positive rate is 4%. What is the probability that this patient who tested positive actually has glioma?

Table 1.

Fictional table from literature.

                        Disease Status
Test Result     Glioma Present   Glioma Absent   Total
Positive              29                2          31
Negative               1               48          49
Total                 30               50          80


In these data, the prevalence of disease is P(D) = 30/80 = 0.375; the sensitivity is P(Test positive | Glioma present) = 29/30 = 0.967; the false positive rate is P(Test positive | Glioma absent) = 2/50 = 0.04. See Table 2 for formulas.

This is an understandably difficult problem, since it pertains to conditional probabilities (sensitivity, specificity, and positive predictive value [PPV]) and varying reference populations (those with disease and those without). Nonetheless, an informed interpretation of diagnostic tests is increasingly important, especially as novel biomarkers are used in the detection of disease. Unfortunately, studies have shown that more than 75% of doctors answer questions similar to the one above incorrectly [1–5].

The goal of this review is to ease the calculation of conditional probabilities (eg, the PPV in the example above) by explaining three ways to solve them: conditional probability equations, tree diagrams (with probabilities), and natural frequencies. You have the option of reviewing all three or just one or two of the approaches. Any of the three will get you to the correct answer. We begin with the calculation via conditional probabilities and follow with building tree diagrams for a visual representation. Subsequently, we illustrate a way to translate this information via natural frequencies for you and your patients so that they too understand the meaning of a positive or negative test result.

Approach 1: Conditional Probability Equations

Conditional probabilities are important in the interpretation of diagnostic tests because the test results influence our understanding of whether the patient has a disease. However, the test results are not synonymous with the presence or absence of disease. The conditional probabilities that we need to understand are sensitivity, specificity, PPV, and negative predictive value (NPV). These probabilities are defined by two events: the presence of disease and a positive test result.

Sensitivity is defined as the probability of a positive test result given the presence of disease, written as P(positive test | disease present). The vertical line can be read as “given.” Specificity is defined as the probability of a negative test result given the absence of disease, ie, P(negative test | disease absent). PPV is defined as the probability of the presence of disease given a positive test result, ie, P(disease present | positive test). NPV is defined as the probability of the absence of disease given a negative test result, ie, P(disease absent | negative test). Given the similarities in calculation between PPV and NPV, we will focus only on the former here.

There are two important things to know about conditional probabilities. First, conditional probabilities are not reciprocal, ie,

$$P(\text{Event A} \mid \text{Event B}) \neq P(\text{Event B} \mid \text{Event A}).$$

This is important to note, as it means that sensitivity does not equal PPV, ie,

$$P(\text{positive test} \mid \text{disease present}) \neq P(\text{disease present} \mid \text{positive test}).$$

This is one of the most common errors that doctors make when calculating PPV: they simply equate it with the test's sensitivity.

Second, you can write a conditional probability as:

$$P(\text{Event A} \mid \text{Event B}) = \frac{P(\text{Event A and Event B})}{P(\text{Event B})}.$$

The importance of the fraction on the right has to do with how we will connect the sensitivity to PPV and will become clearer when we learn how to rewrite the numerator on the right-hand side. To do so, we need the multiplication rule, which gives the probability that both events occur, ie, P(Event A and Event B). This can be written as:

$$P(\text{Event A and Event B}) = P(\text{Event B}) \times P(\text{Event A} \mid \text{Event B})$$

or with our events as:

$$P(\text{disease present and positive test}) = P(\text{disease present}) \times P(\text{positive test} \mid \text{disease present})$$

which is equivalent to:

$$P(\text{True positive}) = \text{Prevalence} \times \text{Sensitivity}.$$

Similarly the probability of a false positive can be written as:

$$P(\text{False positive}) = P(\text{disease absent and positive test}) = P(\text{disease absent}) \times P(\text{positive test} \mid \text{disease absent}) = (1 - \text{Prevalence}) \times \text{False positive rate}$$
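As a worked illustration (a sketch that simply substitutes the values from the glioma question: prevalence 0.003% = 0.00003, sensitivity 0.967, false positive rate 0.04), the two joint probabilities come out approximately as:

```latex
\begin{align*}
P(\text{True positive})  &= 0.00003 \times 0.967 = 0.00002901 \approx 0.000029 \\
P(\text{False positive}) &= (1 - 0.00003) \times 0.04 = 0.0399988 \approx 0.04
\end{align*}
```

Even before any further calculation, the false positive probability is more than a thousand times larger than the true positive probability.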

Now we can connect the PPV to the sensitivity:

$$\text{PPV} = P(\text{disease present} \mid \text{positive test})$$

Expressed as the other form of conditional probability, we can see this as:

$$= \frac{P(\text{disease present and positive test})}{P(\text{positive test})}$$

And by applying the multiplication rule, we can rewrite this as:

$$= \frac{P(\text{disease present}) \times P(\text{positive test} \mid \text{disease present})}{P(\text{positive test})} = \frac{\text{Prevalence} \times \text{Sensitivity}}{P(\text{positive test})}$$

In the denominator, a positive test can come from those patients with the presence of disease (true positives) and those with the absence of disease (false positives). Therefore we can write: P(positive test) = P(true positive) + P(false positive). The two probabilities on the right were defined above. We can continue the calculation to get the PPV:

$$= \frac{\text{Prevalence} \times \text{Sensitivity}}{P(\text{true positive}) + P(\text{false positive})}$$

In the example of the test for glioma above, we would substitute the values for prevalence, sensitivity, and false positives, and calculate:

$$= \frac{(0.00003)(0.967)}{(0.00003)(0.967) + (1 - 0.00003)(0.04)} = 0.000725$$

Thus, the chance that the patient has glioma given a positive test result is 0.07%.
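The calculation above can be sketched in a few lines of Python. This is only an illustration under the article's stated inputs; the function name `positive_predictive_value` and its parameter names are hypothetical and do not come from the article.

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """PPV = P(true positive) / (P(true positive) + P(false positive))."""
    # Multiplication rule: joint probability = marginal x conditional.
    p_true_positive = prevalence * sensitivity
    p_false_positive = (1.0 - prevalence) * false_positive_rate
    return p_true_positive / (p_true_positive + p_false_positive)


# Values from the glioma question: prevalence 0.003%, sensitivity 96.7%,
# false positive rate 4%.
ppv = positive_predictive_value(
    prevalence=0.00003, sensitivity=0.967, false_positive_rate=0.04
)
print(f"PPV = {ppv:.6f} ({ppv:.2%})")
```

Note that the function takes the false positive rate rather than the specificity, mirroring how the question is posed; the two are related by specificity = 1 − false positive rate.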

There are many similarities between a 2 × 2 table (Table 1) and conditional probabilities. You can see from Table 2 how to calculate sensitivity, specificity, and PPV from a 2 × 2 table. However, PPV can only be calculated from a 2 × 2 table if the prevalence [P(Disease present) = number of people with disease/number of people in population (or sample)] in the table is the same as that in the population. Typically, the prevalence in a 2 × 2 table does not reflect the population prevalence because the table is based on case-control data in which a specified number of cases (patients with disease) and controls (patients without disease) are studied for the purpose of finding associations. For example, in Table 1 the hypothetical data are based on a case-control study with 30 cases and 50 controls and thus the prevalence of disease is (30/80) = 37.5%. Using the same calculations as above but with a prevalence of 37.5%, the PPV equals 94%, which is incorrect, as we know the prevalence in the population is 0.003%. Thus, if the prevalence of the disease in a 2 × 2 table is not the same as in the population you cannot calculate the PPV (or NPV) from the table.
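To make the dependence on prevalence concrete, the following sketch recomputes the PPV twice with the Table 1 sensitivity and false positive rate: once with the case-control prevalence of 37.5% and once with the population prevalence of 0.003%. The helper `ppv` is a hypothetical name used only for this illustration.

```python
def ppv(prevalence, sensitivity, false_positive_rate):
    """PPV = P(true positive) / (P(true positive) + P(false positive))."""
    p_true_positive = prevalence * sensitivity
    p_false_positive = (1.0 - prevalence) * false_positive_rate
    return p_true_positive / (p_true_positive + p_false_positive)


sensitivity = 29 / 30          # from Table 1
false_positive_rate = 2 / 50   # from Table 1

# Same test characteristics, two different prevalences.
ppv_case_control = ppv(30 / 80, sensitivity, false_positive_rate)  # table prevalence
ppv_population = ppv(0.00003, sensitivity, false_positive_rate)    # population prevalence

print(f"PPV with table prevalence (37.5%):       {ppv_case_control:.2%}")
print(f"PPV with population prevalence (0.003%): {ppv_population:.2%}")
```

With the table prevalence the result is the same as reading 29/31 directly off the Positive row of Table 1, which is why a PPV taken straight from case-control counts is misleading for a rare disease.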

Table 2.

A 2 × 2 table with test results in the rows and disease status in the columns


Sensitivity, specificity, and false positive/negative rates can be calculated from any such 2 × 2 table. Positive and negative predictive values can only be calculated from a 2 × 2 table if the prevalence of disease in the table is the same as that in the population. It should be noted that the false positive rate is P(positive test | disease absent), while the false positive cell in the 2 × 2 table corresponds to P(positive test and disease absent).
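The quantities named in this note can be sketched from the four cell counts of a 2 × 2 table. The `TwoByTwo` class below is a hypothetical helper written for this illustration, not something defined in the article, and it uses the standard definitions of each rate; the counts are those of Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cell counts of a 2 x 2 diagnostic table (illustrative helper)."""
    true_pos: int    # positive test, disease present
    false_pos: int   # positive test, disease absent
    false_neg: int   # negative test, disease present
    true_neg: int    # negative test, disease absent

    @property
    def total(self) -> int:
        return self.true_pos + self.false_pos + self.false_neg + self.true_neg

    def sensitivity(self) -> float:
        # P(positive test | disease present)
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        # P(negative test | disease absent)
        return self.true_neg / (self.true_neg + self.false_pos)

    def false_positive_rate(self) -> float:
        # Conditional: P(positive test | disease absent)
        return self.false_pos / (self.false_pos + self.true_neg)

    def false_negative_rate(self) -> float:
        # Conditional: P(negative test | disease present)
        return self.false_neg / (self.false_neg + self.true_pos)

    def joint_false_positive(self) -> float:
        # Joint: P(positive test and disease absent), relative to the whole table
        return self.false_pos / self.total


table1 = TwoByTwo(true_pos=29, false_pos=2, false_neg=1, true_neg=48)
print(f"false positive rate (conditional): {table1.false_positive_rate():.3f}")
print(f"false positive cell (joint):       {table1.joint_false_positive():.3f}")
```

For Table 1 the conditional rate is 2/50 while the joint proportion is 2/80, which is the distinction the note draws between the false positive rate and the false positive cell.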

Approach 2: Tree Diagrams

Another way to display the data is in a tree diagram [3,6] (Fig. 1). Starting on the left at the “Individual,” the first split corresponds to disease status: the patient either has disease or does not. The top line going from “Individual” to “Disease” shows the prevalence of disease while the bottom line shows the probability of not having the disease, 1 − Prevalence. Similar to disease status, the test result can either be positive or negative. The line between “Disease” and “Positive test” displays the sensitivity, ie, P(positive test | disease present), whereas the line between “No Disease” and “Negative test” shows the specificity, ie, P(negative test | disease absent). The conditional probabilities associated with the other two lines, the false positive/negative rates, can be written similarly. Note that the two lines coming from the same box must sum to one, eg, prevalence + (1 − prevalence) = 1. That is also true for sensitivity and the false negative rate as well as the false positive rate and specificity. The four squares of the 2 × 2 table can also be calculated on the far right of the tree diagram by using the multiplication rule, eg,

$$P(\text{true positive}) = P(\text{disease present and positive test}) = P(\text{disease present}) \times P(\text{positive test} \mid \text{disease present}) = \text{Prevalence} \times \text{Sensitivity}$$

$$P(\text{false positive}) = P(\text{disease absent and positive test}) = P(\text{disease absent}) \times P(\text{positive test} \mid \text{disease absent}) = (1 - \text{Prevalence}) \times \text{False positive rate}.$$
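The four end points of such a tree can be sketched by multiplying along each path. The example below uses the values from the glioma question; the dictionary layout and variable names are assumptions made for illustration only.

```python
prevalence = 0.00003          # P(disease present)
sensitivity = 0.967           # P(positive test | disease present)
false_positive_rate = 0.04    # P(positive test | disease absent)

# Each leaf is the product of the two branch probabilities leading to it.
leaves = {
    ("disease", "positive test"): prevalence * sensitivity,                         # true positive
    ("disease", "negative test"): prevalence * (1 - sensitivity),                   # false negative
    ("no disease", "positive test"): (1 - prevalence) * false_positive_rate,        # false positive
    ("no disease", "negative test"): (1 - prevalence) * (1 - false_positive_rate),  # true negative
}

for (status, result), probability in leaves.items():
    print(f"{status:>10} -> {result}: {probability:.7f}")

# Branches from the same box sum to one, so the four leaves sum to one as well.
total = sum(leaves.values())
print(f"sum of leaves = {total:.6f}")
```

Checking that the leaves sum to one is a quick way to catch a branch that was mislabeled, for example using the specificity where the false positive rate belongs.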

Fig. 1.

Tree diagram representing all possible outcomes of a diagnostic test. P(A) is the probability of Event A. P(B|A) is the conditional probability of Event B given Event A. FPR is the false positive rate = P(Positive test | Disease absent). FNR is the false negative rate = P(Negative test | Disease present).

Fig. 2.

Tree diagram representing all possible outcomes and conditional probabilities given in the hypothetical diagnostic test example. Text in bold is given in the example. Text in italic is calculated from the given information in bold. FPR is the false positive rate = P(Positive test | Disease absent). FNR is the false negative rate = P(Negative test | Disease present).

We can display the information from the original question in a tree diagram to help calculate the PPV. In Fig. 2, the known information is in bold and the inferred information is in italic. Note that the people with a positive test are either true positives (disease present and a positive test) or false positives (no disease and a positive test). Because the prevalence in the tree diagram is considered in calculating true positives, a simpler way of calculating the PPV is:

$$\text{PPV} = P(\text{Disease} \mid \text{Positive test})$$

Or, expressed as the other form of conditional probability:

$$= \frac{P(\text{Disease and Positive test})}{P(\text{Positive test})} = \frac{P(\text{True Positive})}{P(\text{True Positive}) + P(\text{False Positive})}$$

If we substitute numbers from the tree diagram, we can calculate:

$$= \frac{0.000029}{0.000029 + 0.04} = 0.000725$$

Thus, the chance that the patient has glioma given a positive test result is 0.07%. This PPV should be clearly communicated to the patient. As it can be difficult to explain conditional probabilities to patients, we will explore an alternative option.

Approach 3: Natural Frequencies

To help patients understand conditional probabilities you can translate them to natural frequencies with or without the use of a tree diagram [1,3]. Natural frequencies are the way most people are presented with statistics and, thus, make interpretation simpler. We can directly translate the original question into natural frequencies and illustrate the ease with which the question can be answered.

Three out of every 100 000 people have glioma. A patient comes into the clinic complaining of headaches and memory loss. A new blood test for diagnosis of glioma is available. She tests positive. From the literature you know that of the three people out of 100 000 with glioma, all three will likely have a positive blood test. Of the 99 997 people without glioma, 4000 will still have a positive blood test. Of the patients with a positive blood test, how many actually have glioma?

Now the answer is much more straightforward to calculate: it is 3/(3 + 4000) = 0.0007. Again, this is the PPV, the chance that a patient with a positive test result actually has glioma.

One of the reasons natural frequencies make this problem easier to understand is that they use the same reference group. For example, three patients (with a positive blood test and glioma) and 4000 patients (with a positive blood test and no glioma) both refer to the same group of 100 000 people. In contrast, in the original question the sensitivity refers to the group of three patients with glioma while the false positive rate (like the specificity) refers to the group of 99 997 people without glioma. A pitfall of using natural frequencies is that mistakes can be made in translating the conditional probabilities to frequencies and thus caution must be used.
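The translation from conditional probabilities to natural frequencies can be sketched as follows. The reference population of 100 000 and the input values come from the example; the variable names and the choice to round each count to a whole person are assumptions of this sketch.

```python
population = 100_000
prevalence = 0.00003          # 0.003%
sensitivity = 0.967
false_positive_rate = 0.04

# Everything is counted within the same reference group of 100 000 people.
n_disease = round(population * prevalence)               # 3 people with glioma
n_no_disease = population - n_disease                    # 99 997 people without glioma
n_true_pos = round(n_disease * sensitivity)              # 2.901, rounded to 3
n_false_pos = round(n_no_disease * false_positive_rate)  # 3999.88, rounded to 4000

ppv_natural = n_true_pos / (n_true_pos + n_false_pos)
print(f"{n_true_pos} of {n_true_pos + n_false_pos} positive tests are true positives "
      f"(PPV = {ppv_natural:.4f})")
```

Rounding to whole people makes the statement easy to say aloud but shifts the answer slightly (about 0.00075 here versus 0.000725 from the equations), which is one concrete form of the translation caveat mentioned above.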

Conclusion

Positive predictive value is the probability that a person who receives a positive test result actually has the disease. This is what patients want to know. Nonetheless, physicians frequently miscalculate and/or misinterpret the PPV, which results in increased anxiety in patients and generates unnecessary tests and consultations. One of the reasons for miscalculation is that conditional probabilities are not reciprocal, meaning that P(B|A) ≠ P(A|B), or in our example that sensitivity does not equal PPV. A second reason is that the PPV relies on the prevalence of disease and therefore the PPV cannot be calculated from a data set that does not have the same prevalence as the population. Finally, conditional probabilities can be conceptual and many studies have shown that reframing the problem in natural frequencies (with or without tree diagrams) increases the ability of a physician to correctly calculate the PPV [1,3].

Here we have shown three ways to calculate the PPV: conditional probabilities, tree diagrams, and natural frequencies. In all three, we show that the PPV of the hypothetical blood test equals 0.07%. The implication of this is crucial but often goes unnoticed. For any rare disease, such as glioma, the percentage of false positives tends to be appreciable even though the sensitivity and specificity may be high. The ramification is that the vast majority of positive test results will be false positives. An advantage of a low prevalence of disease is that a patient with a negative test result is very unlikely to have the disease, ie, the negative predictive value (NPV) is large. In the hypothetical example the NPV can be calculated similarly to the PPV and shown to exceed 99.99%.
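The NPV calculation is not written out in the text; a minimal sketch, assuming it mirrors the PPV derivation with true negatives and false negatives in place of true and false positives, is given below. The function name is hypothetical.

```python
def negative_predictive_value(prevalence, sensitivity, false_positive_rate):
    """NPV = P(true negative) / (P(true negative) + P(false negative))."""
    specificity = 1.0 - false_positive_rate   # P(negative test | disease absent)
    false_negative_rate = 1.0 - sensitivity   # P(negative test | disease present)
    p_true_negative = (1.0 - prevalence) * specificity
    p_false_negative = prevalence * false_negative_rate
    return p_true_negative / (p_true_negative + p_false_negative)


# Values from the glioma question.
npv = negative_predictive_value(
    prevalence=0.00003, sensitivity=0.967, false_positive_rate=0.04
)
print(f"NPV = {npv:.6%}")
```

Under these inputs the result lies just above 99.99%, so a negative test is very reassuring even though a positive test is usually a false alarm.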

Given the current focus on finding novel biomarkers to be used in the detection of disease, an informed interpretation of diagnostic tests is increasingly important. Equally important is the translation of this information to your patients. We hope these tools will be helpful in both understanding and relaying conditional probabilities to your patients.

Funding

This study was supported by R01 CA163687 (Annette M. Molinaro, Principal Investigator).

Acknowledgments

The author would like to thank Jennifer Clarke, David Elson, and Seunggu Han for their input and suggestions on presentation of this material.

Conflict of interest statement. None declared.

References

1. Gigerenzer G, Edwards A. Simple tools for understanding risks: from innumeracy to insight. Br Med J. 2003;327(7417):741–744.

2. Casscells W, Schoenberger A, Graboys TB. Interpretation by Physicians of Clinical Laboratory Results. N Engl J Med. 1978;299(18):999–1001.

3. Friederichs H, Ligges S, Weissenstein A. Using Tree Diagrams without Numerical Values in Addition to Relative Numbers Improves Students’ Numeracy Skills: A Randomized Study in Medical Education. Med Decis Making. 2014;34(2):253–257.

4. Manrai AK, Bhatia G, Strymish J, Kohane IS, Jain SH. Medicine's uncomfortable relationship with math: Calculating positive predictive value. JAMA Intern Med. 2014;174(6):991–993.

5. Eddy D. Probabilistic reasoning in clinical medicine: problems and opportunities. In: Kahneman D, Slovic P, Tversky A, eds. Judgment under Uncertainty: Heuristics and Biases. Cambridge, UK: Cambridge University Press; 1982:249–267.

6. Baldi B, Moore DS. The Practice of Statistics in the Life Sciences, 2nd ed. New York, NY: W. H. Freeman; 2010.

Articles from Neuro-Oncology Practice are provided here courtesy of Oxford University Press
