Predictive Value of Olfactory and Taste Symptoms in the Diagnosis of COVID-19: A Systematic Review and Meta-Analysis
Abstract
Objectives
This study evaluated the diagnostic value of various symptoms of coronavirus disease 2019 (COVID-19) in screening for this disease.
Methods
Two authors (working independently) comprehensively reviewed six databases (PubMed, Cochrane Database, Embase, Web of Science, Scopus, and Google Scholar) from their dates of inception until November 2020. The predictive value of patient-reported symptoms, including otolaryngologic and general symptoms, was evaluated in adults who underwent testing for COVID-19. True-positive, true-negative, false-positive, and false-negative data were extracted from each study. The methodological quality of the included studies was evaluated using the quality assessment of diagnostic accuracy studies tool (ver. 2).
Results
Twenty-eight prospective and retrospective studies were included in the meta-analysis. The diagnostic odds ratio (DOR) of a change in olfaction and/or taste was 10.20 (95% confidence interval [CI], 8.43–12.34). The area under the summary receiver operating characteristic curve was 0.8. Olfactory and/or taste changes had a low sensitivity (0.57; 95% CI, 0.47–0.66) but moderate negative (0.78; 95% CI, 0.69–0.85) and positive (0.78; 95% CI, 0.66–0.87) predictive values and a high specificity (0.91; 95% CI, 0.83–0.96). Olfactory and/or taste changes had a higher diagnostic value than the other otolaryngologic symptoms, a higher DOR and specificity, and a similar or higher diagnostic value than the other general symptoms.
Conclusion
Among otolaryngologic symptoms, olfactory and/or taste dysfunction was the most closely associated with COVID-19 and its general symptoms, and should therefore be considered when screening for the disease.
INTRODUCTION
Since its initial outbreak in 2019, coronavirus disease 2019 (COVID-19), the acute respiratory illness caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus, has continued to spread at an exponential rate. The causative agent, SARS-CoV-2, is most frequently transmitted between people through respiratory droplets and aerosols [1]. Influenza-like symptoms or mild pneumonia develop in >80% of COVID-19 patients, and most patients do not need to be hospitalized [2]. However, significant viral transmission has been traced to these mildly symptomatic and non-hospitalized patients [3].
In the absence of a specific treatment and with vaccine trials still underway [4], the rapid and reliable diagnosis of COVID-19 infection followed by the strict isolation of patients is the most effective means of controlling disease spread [5]. Currently, the diagnosis of COVID-19 is mostly made by reverse-transcription polymerase chain reaction (RT-PCR) testing of respiratory samples, with further discriminative features of the disease often apparent on chest computed tomography (CT) scans [6,7]. However, RT-PCR tests are not always readily available, especially in some countries or regions, and the delayed reporting of test results due to a large number of samples in certain institutions may lead to a delay in the proper quarantining of patients. Since the outbreak of the pandemic, the clinical symptoms of COVID-19-positive patients have been described in many reports [8]. Given the limited clinical resources, it is important to identify the most predictive symptoms of COVID-19 infection to ensure the timely quarantining of patients to curtail disease spread [9]. Therefore, we conducted a meta-analysis comparing the diagnostic value of olfactory and/or taste changes as well as other otolaryngologic symptoms and general symptoms with the current reference test (RT-PCR). Furthermore, considering the inclusion of various and heterogeneous studies, the diagnostic accuracy of COVID-19 was sub-analyzed according to the use of validated olfactory and/or taste disorder (OTD) questionnaires or tools, as well as demographic factors and severity of disease.
MATERIALS AND METHODS
Ethical statements
This review did not involve human participants. Therefore, our Institutional Review Board waived the need for informed consent for this systematic review and meta-analysis.
Literature search
Clinical studies were retrieved from PubMed, the Cochrane Central Register of Controlled Trials, Embase, Web of Science, Scopus, and Google Scholar. The search period was from the date of database inception until November 2020. The search terms were “coronavirus disease 2019,” “severe acute respiratory syndrome coronavirus 2,” “coronavirus,” “COVID-19,” “anosmia,” “ageusia,” “dysgeusia,” “smell,” “taste,” “smell disorders,” “taste disorders,” “PCR,” “diagnostic accuracy,” “signs,” “symptoms,” “cough,” “diarrhea,” “dyspnea,” “fatigue,” “fever,” “headache,” and “myalgia.” Only studies written in English were reviewed. In the five database searches, all possible keywords were combined with “OR” across all fields, and the English-language restriction was applied with “AND.” As a brief, partial example, the following combination of search details was used in Medline: (“COVID-19”[Mesh], or “coronavirus disease 2019” [All Fields], or “severe acute respiratory syndrome coronavirus 2”[All Fields]) AND “diagnosis”[All Fields] AND (“Signs”[All Fields] and “Symptoms”[All Fields] OR (“anosmia”[Mesh] OR “Smell”[Mesh] OR “Olfaction Disorders”[Mesh] OR “Ageusia” [Mesh] OR “Dysgeusia”[Mesh] “Taste”[Mesh] OR “Taste Disorders”[Mesh] OR “Taste and Smell Impairment”[All Fields]) OR (“Cough”[Mesh] OR “Cough”[All Fields] OR “Coughs”[All Fields]) OR (“Diarrhea”[Mesh] OR “Diarrhea”[All Fields] OR “Diarrheas”[All Fields]) OR “Fatigue”[Mesh] OR “Fatigue”[All Fields] OR “Lassitude”[All Fields]) OR (“Fever”[Mesh] OR “Fever”[All Fields] OR “Fevers”[All Fields] OR “Pyrexia”[All Fields] OR “Pyrexias”[All Fields]) OR “Headache”[Mesh] OR “Myalgia”[Mesh] AND “English”[All Fields]). We used similar search words for the other databases. The reference list of each publication was examined to ensure that no relevant studies had been omitted. All abstracts and titles of candidate studies were assessed by two independent reviewers (DHK and SHH).
Studies that did not address smell and taste disorders in the context of COVID-19 were excluded. Detailed search terms and queries are described in Supplementary Table 1.
Selection criteria
The inclusion criteria were (1) English language, (2) prospective or retrospective study protocol, (3) comparison of the prevalence of various symptoms, including smell or taste disorders, in patients or controls tested by PCR via pharyngeal swab, and (4) eligibility in sensitivity and specificity analyses. The exclusion criteria were (1) case report format, (2) review article format, and (3) lack of diagnostic power regarding smell or taste disorders. The search strategy is summarized in Fig. 1.
Data extraction and risk of bias assessment
We compared the presence or absence of each symptom with the PCR results from respiratory secretions. Then, we extracted TP (true positive), FP (false positive), TN (true negative), and FN (false negative) values to calculate diagnostic accuracy, defined as the diagnostic odds ratio (DOR), sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV). The calculations were as follows: DOR, (TP/FP)/(FN/TN); sensitivity, TP/(TP+FN); specificity, TN/(TN+FP); NPV, TN/(TN+FN); PPV, TP/(TP+FP). The summary receiver operating characteristic (SROC) curve and the area under the curve (AUC) were also analyzed [1-6,9-41].
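The formulas above can be illustrated with a minimal Python sketch. The class name, function name, and the counts are hypothetical and chosen only for illustration; they are not taken from any included study or from the authors' analysis code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from comparing a symptom (index test) with PCR (reference test)."""
    tp: int  # symptom present, PCR positive
    fp: int  # symptom present, PCR negative
    fn: int  # symptom absent, PCR positive
    tn: int  # symptom absent, PCR negative


def accuracy_measures(t: TwoByTwo) -> dict:
    """Return the five measures defined in the text for a single study."""
    return {
        "sensitivity": t.tp / (t.tp + t.fn),
        "specificity": t.tn / (t.tn + t.fp),
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        # (TP/FP)/(FN/TN) is algebraically equal to (TP*TN)/(FP*FN)
        "dor": (t.tp / t.fp) / (t.fn / t.tn),
    }


# Hypothetical counts for illustration only
example = TwoByTwo(tp=57, fp=9, fn=43, tn=91)
measures = accuracy_measures(example)
```

A study with a zero cell would cause a division by zero here; how such studies are handled (for example, with a continuity correction) is a separate analytic choice that this sketch does not make.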
DORs were calculated with 95% confidence intervals (CIs), using random-effects models that considered both within- and between-study variation. DOR values ranged from 0 to infinity, with higher values indicative of a better diagnostic performance. A value of 1 indicated that the presence or absence of disease could not be inferred. The logarithm of each DOR was calculated to obtain an approximately normal distribution [42]. The SROC approach is the method of choice for the meta-analysis of studies reporting both sensitivity and specificity. As the discriminatory power of a test increases, the SROC curve shifts toward the top left-hand corner of the ROC space (i.e., toward the point where both sensitivity and specificity equal 1 [100%]). The AUC can range from 0 to 1, with higher values indicative of a better performance. To be useful, a diagnostic tool must exhibit good reliability; thus, our analysis focused on the reliability of symptoms. As the data were examined by clinicians, the most important type of reliability was interrater agreement, assessed by comparing interpretations of the results between two or more independent assessors. From all studies, data were collected regarding the number of patients, the true-positive, true-negative, false-positive, and false-negative values, which were used to calculate the AUCs and the DORs. Study quality was analyzed using the quality assessment of diagnostic accuracy studies tool (ver. 2; QUADAS-2).
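As an illustration of why the DOR is handled on the logarithmic scale, the following sketch computes a single-study DOR with a Wald-type 95% CI. The standard error formula (square root of the summed reciprocal cell counts) is the usual large-sample approximation for a log odds ratio. The function name and the 0.5 continuity correction for zero cells are assumptions of this sketch and are not stated in the text.

```python
import math


def dor_with_ci(tp, fp, fn, tn, z=1.96, correction=0.5):
    """Return (DOR, lower, upper) with the CI built on the log scale.

    Assumption of this sketch: if any cell is zero, `correction` is added
    to every cell so that the logarithm and its standard error are defined.
    """
    if min(tp, fp, fn, tn) == 0:
        tp, fp, fn, tn = (x + correction for x in (tp, fp, fn, tn))
    log_dor = math.log((tp * tn) / (fp * fn))
    # Large-sample standard error of a log odds ratio
    se = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    lower = math.exp(log_dor - z * se)
    upper = math.exp(log_dor + z * se)
    return math.exp(log_dor), lower, upper


# Hypothetical counts for illustration only
dor, lower, upper = dor_with_ci(57, 9, 43, 91)
```

Because the interval is symmetric around log(DOR), it is asymmetric around the DOR itself after exponentiation, which is why pooled DOR intervals such as 8.43–12.34 are not centered on the point estimate.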
Statistical analysis and outcome measurements
The meta-analysis was conducted using the R statistical software (R Foundation for Statistical Computing, Vienna, Austria). The R package MADA was used to pool the diagnostic outcomes and generate SROC curves. Pooled sensitivity, specificity, NPV, PPV, and DOR estimates were generated with 95% CIs. Heterogeneity, referring to the variation in outcomes between studies, was then analyzed using the I2 statistic, which ranges from 0% (no heterogeneity) to 100% (maximum heterogeneity). Outcomes without a significant level of heterogeneity (I2 <50%) were analyzed with the fixed-effects model, which assumes that all studies come from a common population. By contrast, when significant heterogeneity among outcomes was found (defined as I2 >50%), the random-effects model was used. This model assumes that the true effects in individual studies may differ from one another and that they are normally distributed. Forest plots were drawn for the sensitivity, specificity, and NPV and for the SROC curves.
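The model-selection rule described above can be sketched in Python as follows. This is a generic inverse-variance illustration, not the authors' R pipeline: the text does not state which between-study variance estimator was used, so the DerSimonian-Laird estimator is an assumption here, as are the function name and the treatment of I2 exactly equal to 50% (assigned to the fixed-effects branch).

```python
def pool_effects(effects, variances, threshold=50.0):
    """Pool per-study effects (e.g., log DORs) by inverse-variance weighting.

    Returns Cochran's Q, I2 (in percent), the DerSimonian-Laird tau^2,
    both pooled estimates, and the one selected by the I2 threshold.
    """
    k = len(effects)
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # DerSimonian-Laird between-study variance (assumed estimator)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    random = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    model = "random" if i2 > threshold else "fixed"
    return {
        "q": q,
        "i2": i2,
        "tau2": tau2,
        "fixed": fixed,
        "random": random,
        "model": model,
        "pooled": random if model == "random" else fixed,
    }


# Hypothetical log-scale effects and variances for illustration only
homogeneous = pool_effects([1.0, 1.0, 1.0], [0.1, 0.2, 0.3])
heterogeneous = pool_effects([0.0, 2.0, 4.0], [0.01, 0.01, 0.01])
```

In the hypothetical heterogeneous example, the study effects differ far more than their variances allow, so I2 is close to 100% and the random-effects estimate is selected.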
RESULTS
Thirty-eight studies with 120,256 participants were included in this meta-analysis. The study characteristics are described in Supplementary Table 2, and bias assessment of the studies is shown in Supplementary Table 3.
Diagnostic accuracy of OTD and only olfactory disorder
Eleven prospective and retrospective studies addressing OTD were included. The DOR of OTD was 10.20 (95% CI, 8.43–12.34; I2=64.0%) (Fig. 2). The area under the SROC curve was 0.80 (Fig. 3). OTD had a low sensitivity (0.57; 95% CI, 0.47–0.66; I2=97.5%), but moderate NPV (0.78; 95% CI, 0.69–0.85; I2=98.7%) and PPV (0.78; 95% CI, 0.66–0.87; I2=98.7%) and a high specificity (0.91; 95% CI, 0.83–0.96; I2=99.4%) (Supplementary Fig. 1).
Seventeen prospective and retrospective studies addressed only olfactory disorder (OD). The DOR of OD was 10.37 (95% CI, 6.31–17.05; I2=83.9%) (Fig. 2), and the area under the SROC curve was 0.80 (Fig. 3). OD alone yielded results similar to those of OTD with respect to diagnostic accuracy, with a low sensitivity (0.50; 95% CI, 0.34–0.66; I2=97.1%), moderate NPV (0.77; 95% CI, 0.64–0.87; I2=98.8%) and PPV (0.78; 95% CI, 0.66–0.87; I2=93.8%), and a high specificity (0.93; 95% CI, 0.86–0.97; I2=97.2%) (Supplementary Fig. 2). Compared with OTD, OD had a lower sensitivity (0.50 vs. 0.55, P<0.001) but a higher specificity (0.93 vs. 0.91, P=0.0003) and DOR (10.37 vs. 10.20, P<0.001). By contrast, there was no significant difference in NPV (0.77 vs. 0.78, P=0.065) or PPV (0.78 vs. 0.78, P=0.82) between the two groups.
Given the statistical heterogeneity in the accuracy of the diagnosis, both the heterogeneity and the diversity of the enrolled studies had to be taken into account to ensure that there were no significant biases. Thus, a subgroup analysis was performed to analyze the effects of the different measurements of olfactory or taste dysfunction (validated instruments vs. non-validated surveys), severity of COVID-19 symptoms (mild to moderate vs. severe), and ethnicity (Asian vs. Caucasian) on the diagnostic efficacy.
For the OTD data, the validated instruments subgroup comprised only one study, such that a subgroup analysis was not possible. For the OD data, the validated instruments subgroup consisted of three studies, which were then subjected to a subgroup analysis. The validated instruments subgroup tended to be less specific (0.92 vs. 0.93), but the sensitivity (0.79 vs. 0.44), NPV (0.83 vs. 0.76), PPV (0.85 vs. 0.75), and DOR (41.30 vs. 9.02) were higher than in the non-validated instruments subgroup. In the subgroup analysis regarding severity of COVID-19 symptoms, the severe subgroup tended to show a lower sensitivity (0.37 vs. 0.61; 0.41 vs. 0.59) and NPV (0.52 vs. 0.82; 0.77 vs. 0.78), but a higher specificity (0.98 vs. 0.89; 0.97 vs. 0.87) and PPV (0.94 vs. 0.73; 0.85 vs. 0.74) than the mild to moderate subgroup for OTD and OD, respectively. A subgroup analysis regarding ethnicity was not possible, because only one study was conducted among Asian patients.
Diagnostic accuracy of other otolaryngologic symptoms and general symptoms
Other otolaryngologic symptoms, such as nasal symptoms and sore throat, were of low diagnostic accuracy (sensitivity, 20%; specificity, 74%–80%; NPV, 62%–75%; PPV, 22%–30%; AUC, 0.46–0.54). There were no significant associations between these symptoms and the prevalence of COVID-19. However, sore throat (DOR, 0.66; 95% CI, 0.38–1.15) tended to be negatively related to COVID-19 positivity (Table 1).
Among the general symptoms (cough, diarrhea, dyspnea, fatigue, fever, headache, and myalgia), diarrhea, fatigue, fever, and myalgia were significantly positively correlated with COVID-19 positivity. Diarrhea and dyspnea had a low sensitivity (0.10–0.20) and PPV (0.20–0.30), and a moderate specificity and NPV (0.70–0.80). Fatigue, fever, and myalgia had a moderate specificity (0.5–0.8) and NPV (0.7–0.8) and a low sensitivity (0.4–0.6) and PPV (0.2–0.3). Thus, the other symptoms were diagnostically less powerful than OTD (Table 2).
DISCUSSION
The early and accurate diagnosis of SARS-CoV-2 infection is key to halting the COVID-19 pandemic, given the high propagation rate of the virus, the rapid spread of disease worldwide, and the adverse, often fatal consequences of infection [1,6]. The autumn-winter season in the northern hemisphere is generally marked by the circulation of influenza and other respiratory viruses that initially may be difficult to distinguish from COVID-19 [17]. While RT-PCR and thoracic CT scans are definitive diagnostic tools, their accessibility may be limited due to a shortage of medical resources or inefficient policy-making decisions. Thus, controlling the spread of COVID-19 in the community requires that the distinctive clinical features of the disease be readily recognized such that those patients can then be appropriately managed [18].
Currently, the COVID-19 symptoms recognized by the World Health Organization include coughing, fever, fatigue, and difficulty breathing [1]. The U.S. Centers for Disease Control and Prevention initially listed three major symptoms (fever, cough, and shortness of breath), but as the epidemic progressed added chills, myalgia, headache, sore throat, and the loss of taste and/or smell [14]. However, the clinical manifestations of COVID-19 are often non-specific, resembling those of other influenza-like illnesses and thus complicating the clinical diagnosis of COVID-19. As data regarding the diagnostic power of highly specific symptoms in predicting COVID-19 positivity are limited [1], we quantified the specificity, sensitivity, PPV, and NPV of symptoms reported by the World Health Organization and the health authorities of other countries in a pooled sample of patients who underwent SARS-CoV-2 testing, including those with positive and negative test results. This information is important for both the general public and health care professionals, as it enables faster and more effective isolation procedures and treatment [10].
In our study, OTD had a pooled sensitivity of 0.57, a pooled specificity of 0.91, a pooled NPV of 0.78, a pooled PPV of 0.78, and an AUC of 0.80. The area under the SROC curve (0.70–0.80) indicated moderate diagnostic accuracy [43]. The sensitivity of OTD for detecting COVID-19 positivity was 57%, which is not high enough for diagnostic purposes. However, the specificity of OTD in estimating COVID-19 negativity was 91%, which is high enough to exclude false-positive COVID-19 diagnoses. In a direct comparison of OTD with OD, OTD was less specific (0.91 vs. 0.93, P<0.001) but more sensitive (0.55 vs. 0.50, P<0.001). These results showed that, for patients with apparent COVID-19, there is no clinical difference between OTD and OD.
Based on the NPVs and PPVs determined in this study (70%–80%), false-negatives and false-positives would need to be considered in the use of OTD and OD to detect COVID-19. With an NPV of ~80%, a negative test would be a false-negative in 20% of the patients and COVID-19 would therefore go undetected. Conversely, a PPV of ~70% suggested that 30% of the patients would have a false-positive COVID-19 test. False-positive results can lead to over-treatment, but false-negative results will prevent patients from receiving essential treatment services in addition to increasing the risk of disease spread in the community. However, these predictions depend on estimates of prevalence. Since prevalence is often highly variable, no meaningful information can be obtained by combining these values. For example, our study estimated a prevalence of olfactory dysfunction ranging from 6% to 84%, whereas for a given diagnostic test, neither sensitivity nor specificity will be affected by the prevalence. Therefore, sensitivity and specificity should be given greater weight than predictive values when assessing diagnostic accuracy [44,45].
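The dependence of predictive values on prevalence can be made concrete with a short Python sketch based on Bayes' theorem. The sensitivity (0.57) and specificity (0.91) are the pooled OTD estimates reported in this study; the disease prevalences (5%, 20%, and 50%) and the function name are hypothetical values chosen only for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a disease prevalence strictly between 0 and 1."""
    tp = sensitivity * prevalence               # expected true-positive fraction
    fn = (1 - sensitivity) * prevalence         # expected false-negative fraction
    tn = specificity * (1 - prevalence)         # expected true-negative fraction
    fp = (1 - specificity) * (1 - prevalence)   # expected false-positive fraction
    return tp / (tp + fp), tn / (tn + fn)


# Pooled OTD estimates from this study; the prevalences are hypothetical
SENSITIVITY, SPECIFICITY = 0.57, 0.91
table = [
    (prev, *predictive_values(SENSITIVITY, SPECIFICITY, prev))
    for prev in (0.05, 0.20, 0.50)
]
for prev, ppv, npv in table:
    print(f"prevalence={prev:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

With sensitivity and specificity held fixed, the PPV rises and the NPV falls as prevalence increases, which is why predictive values pooled across studies with very different prevalences are hard to interpret.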
In a previous meta-analysis on the prevalence of olfactory or taste dysfunction in patients with COVID-19, the correlation between self-reported olfactory function and objective measures was generally poor, which may have caused the significant heterogeneity in the summed prevalence. The study classified the University of Pennsylvania Smell Identification Test, Sniffin’ Sticks, and the questionnaire or reporting tool developed by the American Academy of Otolaryngology-Head and Neck Surgery as validated instruments [20]. In our study, the same classification was applied in a subgroup analysis, which showed that OTD identified with validated instruments was highly sensitive (~80%) and specific (>90%). These results are consistent with the more accurate diagnostic ability of these instruments than of self-reporting in the diagnosis of OD [46]. While our results suggest that a validated tool for OTD can be used as a screening test, this subgroup analysis involved only three studies, which were highly heterogeneous. Thus, prospective studies using validated measurement tools in a large number of patients are needed to support our recommendations.
In addition, it has been reported that age, severity of COVID-19 (mild to moderate or severe), and even ethnicity affect the clinical symptoms of COVID-19 [47-49]. We tried to evaluate the effects of these factors on olfactory-related symptoms. Because the enrolled studies were limited, a subgroup analysis related to age and ethnicity could not be conducted. However, in the subgroup analysis according to severity, OTD tended to be less sensitive, but more specific, in the severe subgroup than in the mild to moderate subgroup (37% vs. 60%; 98% vs. 89%). It was recently reported that olfaction-related symptoms may not be identified or could be neglected in COVID-19 patients with more severe respiratory symptoms (i.e., a higher false-negative rate) [49]. In the context of diagnostic accuracy, low sensitivity can be interpreted as the result of a high false-negative rate. This tendency is partially consistent with a recent study finding that the overall prevalence of olfactory and taste dysfunction was 31% in patients with severe symptoms, which was lower than the 67% in mild-to-moderate symptomatic home-isolated patients [20]. In addition, specificity generally tends to be inversely related to sensitivity.
Primary physicians and otolaryngologists are likely to be the first clinicians to encounter patients with symptoms suggesting COVID-19 or who are mildly symptomatic. They should therefore be aware of the predictive value of other common symptoms. However, in our study, nasal symptoms, sore throat, and other otolaryngologic symptoms were of no diagnostic value (low sensitivity and specificity of ~20% and ~80%, respectively) for COVID-19 and were not significantly associated with its prevalence. These findings are consistent with, and support, those of previous reports showing that, unlike other upper respiratory infections, COVID-19 is likely to present with OD in the absence of other nasal symptoms. This finding suggests direct viral damage to the chemosensory system [5] and is consistent with both the neuro-invasive tendency of the SARS-CoV-2 virus and the ability of olfactory nerve cells to act as a gateway for neuronal invasion [16].
None of the general non-respiratory (fatigue, fever, headache, diarrhea, and myalgia) or respiratory (cough and dyspnea) symptoms were of high diagnostic accuracy (low sensitivity of 20%–60% and moderate specificity of 40%–80%). However, non-respiratory symptoms, including diarrhea, fatigue, fever, and myalgia, were significantly associated with COVID-19 positivity, unlike respiratory symptoms such as cough and dyspnea. Possible reasons for this are as follows. First, most of the enrolled studies were retrospective or cross-sectional with self-reporting questionnaires [20]. Thus, recall and selection bias may have led to the over-representation of patients with atypical (non-respiratory) symptoms [21]. Second, the enrolled studies were comparative and included all patients with upper respiratory tract infections and RT-PCR tests. Accordingly, respiratory symptoms would have been common among the enrolled patients regardless of their COVID-19 status, rather than being more common in the COVID-19-positive group. Third, most of the enrolled patients had mild to moderate symptoms, whereas dyspnea, as a marker of more severe COVID-19 disease, might not have been captured in the surveyed studies [5,21]. Therefore, these results are relevant for differentiating COVID-19 from other respiratory infections, not from a healthy condition, and should be interpreted with caution.
Although a diagnosis based on symptoms or signs would be difficult and of low diagnostic value compared with the current reference test, this study is the first meta-analysis to synthesize the clinical meanings of otolaryngologic and general symptoms with an eye towards the fact that primary physicians and otolaryngologists are on the front lines in the era of COVID-19. In particular, since respiratory sampling for RT-PCR requires personal protective equipment and is practically impossible in primary care clinics, this knowledge could be helpful for developing a presumptive screening questionnaire to prevent clinicians from contacting patients in person. Based on our results, OTD showed higher diagnostic value than other otolaryngologic and general symptoms among patients with upper respiratory symptoms. Furthermore, compared to non-validated instruments, validated olfactory and/or taste questionnaires or tools had a clinically relevant higher diagnostic accuracy.
Our meta-analysis had several limitations. First, due to the significant heterogeneity of the data pooled in this study, a random-effects model and subgroup analysis had to be used. The source of this heterogeneity was likely to be the wide range (6%–84%) of the reported prevalence of olfactory dysfunction [20]. In addition, RT-PCR from nasopharyngeal swabs is the main diagnostic test for COVID-19, but as previously reported, its sensitivity is only 56%–83% [2], which may lead to misclassification and diagnostic bias and thus to a heterogeneity similar to that of prevalence [6]. Second, cross-sectional or retrospective studies have inherent limitations. Together, these two factors may have contributed to an under- or overestimation of the actual prevalence. A third limitation was the variability of the tools used to assess olfactory and taste dysfunction, as most were self-reporting olfactory and gustatory dysfunction questionnaires, the weaknesses of which are well-recognized [20].
Considering the limited accessibility of medical resources, including RT-PCR tests, during the COVID-19 pandemic, screening for OTD or OD may be a valuable tool among patients with influenza-like symptoms. Compared to non-validated instruments, validated questionnaires or tools had a high clinical diagnostic accuracy. Prospective studies with larger numbers are needed to confirm our findings.
HIGHLIGHTS
▪ A timely prediction of coronavirus disease 2019 (COVID-19) based on symptoms is important for quarantining patients.
▪ Olfactory and/or taste dysfunction was the symptom most closely associated with COVID-19.
▪ Validated olfactory and/or taste tools have higher diagnostic value for COVID-19.
Notes
Sung Won Kim is an associate editor of the journal but was not involved in the peer reviewer selection, evaluation, or decision process of this article. No other potential conflicts of interest relevant to this article were reported.
Acknowledgements
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2018R1D1A1B07045421), and the Bio & Medical Technology Development Program of the NRF funded by the Ministry of Science & ICT (2018M3A9E8020856, 2019M3A9H2032424, 2019M3E5D5064110). The sponsors had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
AUTHOR CONTRIBUTIONS
Conceptualization: DHK, SHH. Data curation: SWK, GS, SYL. Formal analysis: SWK, GS, SYL. Funding acquisition: DHK, SHH, SWK. Methodology: SWK, GS, SYL. Project administration: DHK, SHH. Visualization: DHK, SHH. Writing–original draft: DHK, SHH. Writing–review & editing: DHK, SHH, SWK, GS, SYL.
Supplementary Materials
Supplementary materials can be found via https://doi.org/10.21053/ceo.2020.02369.