Ontological specification of quality of chronic disease data in EHRs to support decision analytics: a realist review

Abstract

This systematic review examined the current state of conceptualization and specification of data quality and the role of ontology based approaches to develop data quality based on "fitness for purpose" within the health context. A literature review was conducted of all English language studies, from January 2000-March 2013, which addressed data/information quality, fitness for purpose of data, used and implemented ontology-based approaches. Included papers were critically appraised with a "context-mechanism-impacts/outcomes" overlay. We screened 315 papers, excluded 36 duplicates, 182 on abstract review and 46 on full-text review; leaving 52 papers for critical appraisal. Six papers conceptualized data quality within the "fitness for purpose" definition. While most agree with a multidimensional definition of DQ, there is little consensus on a conceptual framework. We found no reports of systematic and comprehensive ontological approaches to DQ based on fitness for purpose or use. However, 16 papers used ontology-specified implementations in DQ improvement, with most of them focusing on some dimensions of DQ such as completeness, accuracy, correctness, consistency and timeliness. The majority of papers described the processes of the development of DQ in various information systems. There were few evaluative studies, including any comparing ontological with non-ontological approaches, on the assessment of clinical data quality and the performance of the application.

Background

The growing use of electronic health records (EHRs) raises issues of semantic interoperability and the quality management/improvement of large datasets derived from multiple EHRs. Improved data quality in EHRs can improve the quality of decisions and lead to better policies and strategies that actually meet needs, as well as better evidence-based care and patient outcomes.

The acceptable level of data quality is not fixed in the system. Rather, health professionals can provide it at different times, and data users need to assess that quality contextually, based on the fitness for research, audit and quality assurance purposes (Devillers et al. 2007). It is important to take a user viewpoint of quality because it is the end user who evaluates whether or not data is fit for use. A focus is the quality of patient or disease registers derived from EHRs to support policy and practice. Patient registers need a level of completeness, and the information they contain needs a level of correctness and consistency, to be useful for clinical, quality improvement and research purposes (Liaw et al. 2011).

DQ can be described from two perspectives: (1) the intrinsic quality of data elements and sets of data elements (data sets) and (2) how the set meets the user’s needs, i.e. fitness for purpose. The commonly accepted definition of DQ has been epitomized in the International Organization for Standardization (ISO) definition: "the totality of features and characteristics of an entity that bears on its ability to satisfy stated and implied needs" (ISO 8402-1986, Quality Vocabulary). DQ has also been conceptualised in terms of its "fitness for purpose/use" in a few papers (Wang 1998; Wang et al. 1996).

Intrinsic DQ refers to the extent that data is free of defects as measured by specific DQ dimensions, including "accuracy, perfection, freshness and uniformity" (Redman 2005) and "completeness, unambiguity, meaningfulness and correctness" (Choquet et al. 2010; Orme et al. 2007; Wand and Wang 1996; Yao et al. 2005). The Canadian Institute for Health Information recommendations were the basis for an information quality framework comprising 69 quality criteria grouped into 24 quality characteristics, which were further grouped into 6 quality dimensions: accuracy, timeliness, comparability, usability, relevance and privacy & security (Kerr et al. 2007). Research in DQ has tended to focus on the identification of generic quality characteristics such as accuracy, currency and completeness (Orme et al. 2007; Wang et al. 1996) or completeness, correctness, consistency and timeliness (Liaw et al. 2011) as core dimensions of DQ that are relevant across application domains. However, our previous review shows that there is a lack of consensus on a conceptual framework and definition for DQ (Liaw et al. 2013).
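To make the distinction between an intrinsic dimension and "fitness for purpose" concrete, the following is a minimal sketch, not a method taken from any of the reviewed papers. It assumes completeness is measured as the share of required fields that hold a value, and that each purpose brings its own required fields and threshold. The register rows, field names and thresholds are all hypothetical.

```python
# Hypothetical register rows; field names and values are illustrative only.
REGISTER = [
    {"patient_id": "p1", "diagnosis": "type 2 diabetes", "hba1c": 7.2, "last_visit": "2012-11-03"},
    {"patient_id": "p2", "diagnosis": "type 2 diabetes", "hba1c": None, "last_visit": "2013-01-15"},
    {"patient_id": "p3", "diagnosis": None, "hba1c": 6.8, "last_visit": None},
]


def completeness(rows, required_fields):
    """Share of required field slots that hold a value (i.e. are not None)."""
    total = len(rows) * len(required_fields)
    if total == 0:
        return 0.0
    filled = sum(
        1 for row in rows for field in required_fields if row.get(field) is not None
    )
    return filled / total


def fit_for_purpose(rows, required_fields, threshold):
    """Assumed rule: data are fit if completeness meets the purpose's threshold."""
    return completeness(rows, required_fields) >= threshold


# The same register can be fit for one purpose and unfit for another.
audit_ok = fit_for_purpose(REGISTER, ["diagnosis"], threshold=0.6)
research_ok = fit_for_purpose(
    REGISTER, ["diagnosis", "hba1c", "last_visit"], threshold=0.9
)
```

In this sketch the intrinsic measure (completeness) stays the same function throughout; only the required fields and the threshold change with the purpose, which is the sense in which quality is assessed contextually.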

Many studies regularly report a range of deficiencies in the information collected for professional practice (Devillers et al. 2007; Kahn et al. 2002), clinical (Azaouagh and Stausberg 2008; de Lusignan et al. 2010; Mitchell and Westerduin 2008; Moro and Morsillo 2004) and health promotion (Gillies 2000b) purposes. Similar deficiencies exist with information in geographic (Devillers et al. 2007; Ivanova et al. 2013), hospital and general practice (Liaw et al. 2012) information systems, where the lack of coding rules means that much of the data is incomplete or in relatively inaccessible text format. The evidence is more encouraging for data for administrative purposes (Lain et al. 2008; Quan et al. 2008). Hybrid record keeping systems in primary care are believed to be more complete than computer-only or paper-only systems (Hamilton et al. 2003).

Relational database models have been prevalent in the last few decades, enabling information to be efficiently stored and retrieved within a hierarchical database architecture. On the other hand, ontologies, usually with non-hierarchical databases, have been used in applications that require more flexibility in capturing semantic meaning. However, there is no well-documented evidence or experiment to suggest that one is better than the other in terms of outputs, data quality and fitness for purpose.

In contrast to our previous review (Liaw et al. 2013), this systematic review will examine the breadth and depth of research into the conceptualization of data quality based on the "fitness for purpose" paradigm, methodologies to specify data quality for implementation, some advantages of ontology-based approaches to develop data quality, and semantic interoperability. This study aims to examine the role of ontology-based approaches to develop data quality based on "fitness for purpose" whereas the previous review focused on data quality as a general concept in health context. This study was broader in the databases searched and the search terms and produced results built on the previous literature review (Liaw et al. 2013) to address the following questions:

  1. How is data quality being conceptualized within the "fitness for purpose" definition for a range of uses?

  2. What specification methodologies are being used to specify data quality for implementation?

  3. What ontology-specified implementations are being used and how do they compare with other methods? and

  4. How is the impact of implementing ontology-based specifications for data quality in chronic disease management being measured and evaluated?

Methods

A literature review was conducted of all English language studies, from January 2000-March 2013, which addressed data/information quality, fitness for purpose, used ontology-based approaches and involved healthcare/chronic disease. Inclusion criteria were: (a) conceptualises data quality based on "fitness for purpose"; (b) formal methodologies used to specify data quality for implementation; (c) involved some form of data models and ontologies to improve quality of clinical data in EHRs and patient registers; and (d) used data models and ontology-based approaches in CDM. These papers were screened by title and abstract content for inclusion. The references of the included papers were hand-searched for other eligible papers.

Included papers were critically appraised with a "context-mechanism-impacts/outcomes" framework. Appraised papers were summarized using specifically developed templates and discussed to achieve the final consensus on how it addressed the review questions. The conceptual framework developed for the literature review included:

  • Context: integrated CDM, evidence-based practice, evidence-based policy, patient or disease registers, "decision analytics";

  • Mechanisms: methods to assess and manage quality of the register/EHR and data quality based on "fitness for purpose", ontology-based approaches;

  • Impacts/outcomes: measurable impacts/outcomes based on improved quality of the register, data quality, "fitness for purpose", "decision analytics".

The search strategy and keywords were organised around the three broad concepts:

  • Context: Diseases (chronic diseases, chronic illnesses, chronic disease management, chronic illness management, electronic health records (EHRs), registers);

  • Mechanisms: Data models and ontology (ontological based models, ontology approaches, ontology based multi agent systems (OBMAS), and ontological framework);

  • Impacts: Data Quality (data quality, information quality, data quality management, data quality assessment, quality of register, fitness for purpose).

The search was repeated three times with the following phrases:

(data quality OR information quality) AND ("fitness for purpose" OR "fitness for use") AND (quality of register* OR quality of electronic health records) AND (decision analytics) in Title, Abstract or Keywords, Subject or MESH

(ontology OR data model*) in Title, Abstract or Keywords, Subject or MESH AND (data quality OR information quality OR quality of register) in Title, Abstract or Keywords, Subject or MESH AND (fitness for purpose OR fitness for use) AND (decision analytics) in Title, Abstract or Keywords, Subject or MESH

((ontology AND traditional data model*) in Title, Abstract or Keywords, Subject or MESH OR (ontology AND SQL) in Title, Abstract or Keywords, Subject or MESH) AND (chronic diseases OR chronic illnesses) in Title, Abstract or Keywords, Subject or MESH AND (data quality OR information quality OR quality of register) in Title, Abstract or Keywords, Subject or MESH.

The initial screening of the articles was based on their abstracts. AR read all abstracts independently, and studies without electronic abstracts were excluded. Selection of relevant articles was based on the information obtained from the abstracts and was agreed upon in discussion with co-authors. In the case of differences, the original paper was obtained and agreement was achieved after it was read. We hand-searched the references of the included papers to ensure completeness of the search. Papers that satisfied the inclusion criteria were independently examined by the authors and any disagreements were resolved by consensus. AR appraised all 52 papers with the realist "context-mechanism-impacts/outcomes" approach, using an extraction template (see Additional file 1: Figure S1).

The template kept the extracted information consistent and focused on the analysis and synthesis of the literature review by study types, methods, tools, outputs and impacts in terms of: requirements analysis, design and tools development, implementation, deployment and testing, evaluation: descriptive evaluation, comparative and/or contemporary control. The quality appraisal uses traditional methods of critical appraisal for validity (internal and external), reliability, generalizability and relevance of the research methods, tools and measurements. We also classified a paper as having addressed "fitness for purpose" if it a) defined a purpose for the project or dataset and b) assessed whether the data or dataset was fit for the specified purpose.

Results

The main medical, computer and business sciences online databases were searched: MEDLINE (67 papers), the Cochrane Library (18 papers), ISI Web of Knowledge (35 papers), Science Direct (75 papers), Scopus (76 papers), IEEE Xplore (25 papers), and Springer (19 papers). All search strategies were also extended to the following business databases (Emerald Fulltext, Business Source Premier, Biotechnology and Bioengineering Abstracts, British Humanities Index (BHI), and Proquest Asian Business and Reference) to find more business analytics papers; however, these searches yielded no additional papers in this area. Table 1 summarises the sources of the 315 papers found.

Table 1 Online databases used and papers found

In the first iteration, searches using a combination of keywords and controlled vocabulary term searches (specifically in Titles and Subjects fields of all papers) were conducted. The application of Titles and Subjects fields in a user’s search strategy and search limitation in each database has been shown to increase relevance, precision and recall (McJunkin 1995). We screened 315 papers, excluded 36 duplicates, 182 on abstract review and 46 on full-text review; leaving 52 papers for critical appraisal. Of these, 6 papers conceptualized data quality within the fitness for purpose definition for a range of uses, 16 used a defined process to specify data quality for implementation, 2 used ontology-specified implementations in DQ improvement and compared them with non-ontological approaches, and 28 demonstrated how the impact of implementing ontology-based specifications for data quality in chronic disease management is being measured and evaluated.

Table 1 shows that, by field of publication, 85 papers (26.98%) were in medicine and health, 44 papers (13.97%) were in computer and IT sciences, and 186 papers (59.05%) were in multi-disciplinary areas, substantially more than the other two groups.
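The percentages by field follow directly from the counts and the total of 315 papers. As a quick arithmetic check (a sketch only; the dictionary keys are shorthand labels, not the paper's own category names):

```python
# Paper counts by field of publication, as reported in the text.
counts = {
    "medicine and health": 85,
    "computer and IT sciences": 44,
    "multi-disciplinary": 186,
}

total = sum(counts.values())

# Share of each field as a percentage of all papers found, to two decimals.
shares = {field: round(100 * n / total, 2) for field, n in counts.items()}

for field, share in shares.items():
    print(f"{field}: {counts[field]} of {total} papers ({share}%)")
```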

Figure 1 shows how other eligible papers were included in the second iteration using a hand-searching process. The references were retrieved from the papers included in the first iteration, and those whose keywords matched the search keywords were selected. Based on their title, keywords, abstract and full text, 7 papers were included from the hand-searching.

Figure 1 Paper selection process.

It can be seen from the data in Table 2 that most of the papers (54%) show the various roles and impacts of ontology based approaches in CDM and how those approaches can be evaluated.

Table 2 Distribution of papers by review questions

Table 3 presents the analysis of papers by study type and how they contributed to the review questions. The majority (83%) of studies involved design and tools development; 38% implemented/deployed and tested implementations; and 20% conducted a descriptive evaluation. A considerable number of studies (42 papers) demonstrate that the ontological approach was used to address semantic interoperability, data linkage, data integration, remote patient monitoring and reduce complexity of information models and networks. The majority of ontology-specified implementations in this category did not compare the performances and processes between ontology and non-ontology approaches. There were few attempts to conceptualize data quality based on "fitness for purpose" definition in a range of uses and purposes.

Table 3 Distribution of papers by study types and review questions

Figure 2 shows an increase in papers on ontology in CDM and DQ from 2006. There is an increase in studies reporting on the use of "fitness for purpose" when dealing with data quality from 2010 (probably starting with the small spike in 2007). This suggests that researchers may be starting to take a more realistic approach to the quality of "big data": the intrinsic data quality is important but it does not need to be perfect to be "fit for purpose".

Figure 2 Distribution of papers from each category by year.

Figure 3 gives a breakdown of the frequency of the studies conducted in different continents, based on the setting of the studies. Europe is the most prolific, with 42.6% of the authors affiliated with European universities and institutions. North America is next with 21.3% of the studies, followed by Oceania (18%), Asia (13.1%), South America (3.3%) and Africa (1.7%). Although a paper's affiliation with a university in a particular country does not necessarily mean that the context under study was in the same country or even continent, it might provide insights to a limited extent. For example, the volume of data quality research and proposed ontological frameworks seems to be much higher in European countries. That might be because of a greater concern with DQ and/or ontologies in Europe. South America and Africa have relatively fewer papers than the other continents, which is consistent with the general trends. The distribution of papers by continent might suggest that the topic has attracted the attention of academics as well as health professionals as a major concern for patient registers.

Figure 3 Distribution of papers found by continent.

The drivers of ontological approaches for DQ and/or CDM include better software for: (1) quality of care and/or health care issues and (2) the description, assessment and management of DQ in health (e.g. role of clinical guidelines in DQ, effects of quality of information in CISs and networking, defining and describing various attributes of DQ) as well as individual dimensions of DQ (e.g. accuracy, completeness, correctness, and consistency).

Conceptualization of data quality within the "fitness for purpose" paradigm

Table 4 shows a few studies have conceptualized and implemented data quality based on the "fitness for purpose" definition in their data models for a range of uses in health and non-health areas including improved searches for spatial data resources, including in languages other than English (Ivanova et al. 2013), support expert users in the assessment of the fitness for purpose of a given dataset (Devillers et al. 2007), better decision making (Chen 2009), support analyses in comparative effectiveness research (Kahn et al. 2012), support agents to choose how much information to gather (Chen 2009), and for research and clinical purposes (Liaw et al. 2011).

Table 4 Papers where data quality was conceptualized within fitness for purpose paradigm

Many studies regularly report a range of deficiencies in the information collected for professional requirements (Devillers et al. 2007; Kahn et al. 2002), clinical (Azaouagh and Stausberg 2008; de Lusignan et al. 2010; Mitchell and Westerduin 2008; Moro and Morsillo 2004) and health promotion (Gillies 2000b) purposes. Similar deficiencies exist with data in geographic (Devillers et al. 2007; Ivanova et al. 2013), hospital and general practice information systems (Liaw et al. 2012), where the lack of coding rules means that much of the data is incomplete or in relatively inaccessible text format. The evidence is more encouraging for data for administrative purposes (Lain et al. 2008; Quan et al. 2008). Hybrid record keeping systems in primary care are believed to be more complete than computer-only or paper-only systems (Hamilton et al. 2003).

Methodologies to specify data quality for implementation

Table 5 shows that the majority of studies (81%) reported the design and development of tools to specify data quality for implementation; requirements analysis e.g. literature reviews and qualitative research methodologies (75%); system implementation, deployment and testing of information systems (25%), and descriptive evaluation (12%). There were no outcomes or comparative evaluation of the methodologies used.

Table 5 Methodologies used to specify data quality for implementation

Various qualitative methods, such as interviews and report analysis, usually interpreted using grounded theory, have been used to evaluate usability (Kerr et al. 2007), privacy (Stvilia et al. 2009), comparability (Kerr et al. 2007) and relevance (Kerr et al. 2007). Consistency (Chen et al. 2009) of data has been assessed with concept mapping in non-health contexts. Timeliness (currency) (Huaman et al. 2009; Kerr et al. 2007), accuracy (precision) (Stvilia et al. 2009), reliability (Britt et al. 2007), representativeness (Britt et al. 2007), correctness (Gillies 2000a) and completeness (Kiragga et al. 2011) were assessed with quantitative statistical methods.

Ontology-specified implementation to develop data quality and compare with other models

Table 6 shows the two papers found that used both ontological and non-ontological approaches to DQ in clinical information systems (CIS). Both papers suggested that ontology-based models had more advantages than other data models in the health domain. For example, Mabotuwana and Warren (2009) showed that an ontology-driven approach to determining which patients needed a follow-up in hypertension management provided more advantages than SQL. They listed the limitations of the traditional SQL-based approach as i) lack of abstract, domain-level query support; ii) lack of the notion of a hierarchy; and iii) the nature of temporal SQL queries (Mabotuwana and Warren 2009). They used SWRL rules, which allow users to write rules to reason about individuals and to infer new knowledge about these individuals. The ontology-based approach was sufficiently flexible to enable new audit criteria to be easily added as required, and it provided easy visualization of the knowledge base and standardized ways of querying the knowledge base. However, the paper was not explicit about whether a formal outcome-based comparison of ontological and non-ontological approaches was conducted.
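The "notion of a hierarchy" limitation can be illustrated with a small sketch. This is not Mabotuwana and Warren's implementation and it does not use SWRL or an OWL reasoner; it is plain Python under the assumption that drug classes form a simple is-a hierarchy. The hierarchy, the prescriptions and the function names are hypothetical.

```python
# Hypothetical is-a hierarchy: each concept maps to its direct parent class.
SUBCLASS_OF = {
    "ACE inhibitor": "antihypertensive",
    "beta blocker": "antihypertensive",
    "ramipril": "ACE inhibitor",
    "atenolol": "beta blocker",
    "metformin": "antidiabetic",
}

# Hypothetical prescription records as (patient id, drug name).
PRESCRIPTIONS = [("p1", "ramipril"), ("p2", "metformin"), ("p3", "atenolol")]


def is_a(concept, ancestor):
    """True if concept equals ancestor or descends from it via SUBCLASS_OF."""
    seen = set()
    while concept is not None and concept not in seen:
        if concept == ancestor:
            return True
        seen.add(concept)  # guard against accidental cycles in the mapping
        concept = SUBCLASS_OF.get(concept)
    return False


def patients_on(drug_class):
    """Patients prescribed any drug that falls under the given class."""
    return sorted({pid for pid, drug in PRESCRIPTIONS if is_a(drug, drug_class)})


print(patients_on("antihypertensive"))
print(patients_on("ACE inhibitor"))
```

A domain-level query such as "patients on any antihypertensive" is answered by walking the hierarchy rather than by listing every drug name in the query. A flat table with only drug names would need that list maintained by hand, which is one reading of limitations i) and ii) above.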

Table 6 Studies that compared ontologies and other data models in specification and implementation

Maragoudakis et al. (2008) developed an ontology with 5 domains for a clinical Decision Support System (CDSS) for management of Chronic Obstructive Pulmonary Disease (COPD). The ontology, based on hierarchical Bayesian networks, encoded a domain (COPD), and the authors compared the predictive accuracy of this ontology-based hierarchical Bayesian network method with linear programming and artificial neural network methods (Maragoudakis et al. 2008).

Using 10-fold cross validation with precision and recall metrics, they concluded that the Hierarchical Bayesian method is comparable to Artificial Neural Network (ANN) and far more accurate than linear programming approaches. In addition, their ontology can be easily updated with new elements, whereas doing this with an ANN would be a painstaking, laborious process. The most important advantage of such an approach, however, is the ability to shift this model to other domains, incorporating new mobile network appliances - such as GPS - and new hospitals and other health institutes, in an attempt to effectively monitor a patient in different locations.
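For readers unfamiliar with the evaluation design, k-fold cross validation splits the data into k parts and uses each part once as the test set while training on the rest. The sketch below shows only the splitting step, under simplifying assumptions (contiguous folds, no shuffling or stratification); it does not reproduce the study's models or data, and the function name is hypothetical.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for k contiguous folds.

    Fold sizes differ by at most one when n_samples is not divisible by k.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Example: 25 hypothetical patient records split into 10 folds.
folds = list(k_fold_indices(25, k=10))
```

Each record appears in exactly one test fold, so metrics such as precision and recall can be averaged over the k test sets to estimate how a method performs on unseen cases.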

The impact of ontologies for data quality in CDM and their evaluation

As Table 7 shows, a considerable number of studies in this category have been published on the application of ontologies in both health and non-health areas. However, they do not compare ontologies with other data models. Studies demonstrating the impact of ontology-based implementations covered clinical decision support systems (Brüggemann and Grüning 2009; Min et al. 2009; Topalis et al. 2011), information management (O’Donoghue et al. 2009; Young et al. 2009), diagnosis (Nimmagadda et al. 2008), and clinical data analysis and management (Li and Ko 2007). A few studies examined ontology-based approaches to support data consistency (Esposito 2008a) and accuracy. However, we found no reports on any systematic and comprehensive ontological approaches to DQ issues or evaluation in the various contexts.

Table 7 The impact of implemented ontologies for the management of data quality

The application of ontological approaches to data quality management addressed the following issues: data quality problems and errors (Brüggemann and Grüning 2009), data heterogeneity problem (Min et al. 2009), semantic decision making (Lee et al. 2009), efficient services (Li and Ko 2007), procedures concerning the acquisition of data (Nimmagadda et al. 2008), classification and identification of specific patients types (Lee et al. 2009; Wang et al. 2007), data collection, data sharing and data integration (Min et al. 2009; O’Donoghue et al. 2009; Perez-Rey et al. 2006; Young et al. 2009). There were no studies that examined efficiency or effectiveness of ontology-based models in DQ management.

As Table 8 shows, the second application is the use of domain ontologies for the assessment of data quality in querying requirements (Mabotuwana and Warren 2009), extracting knowledge from natural language documents (Valencia-Garcia et al. 2008), and data expression (Preece et al. 2008). The majority of these studies used precision and recall as metrics to assess the accuracy of, and validate, the ontological approaches (Brank et al. 2005; Brewster et al. 2004; Euzenat 2007; Gangemi et al. 2006; Li 2010; Min et al. 2009; Pathak et al. 2012a, 2012b; Spasic and Ananiadou 2005; Stvilia et al. 2009; Valencia-Garcia et al. 2008; Wang et al. 2007).

Table 8 The impact of implemented ontologies for the assessment of data quality

Despite a growing body of literature on ontology-based approaches in assessing the accuracy of the retrieval of clinical data, none of them have attempted to compare the performance between ontology-based and other (non-ontological) approaches. Most studies have used precision and sensitivity (recall) to assess the accuracy of ontology-based approaches in health domains (Brewster et al. 2004; Euzenat 2007; Gangemi et al. 2006; McGarry et al. 2007; Min et al. 2009; Pathak et al. 2012a, 2012b; Spasic and Ananiadou 2005; Stvilia et al. 2009; Valencia-Garcia et al. 2008; Wang et al. 2007).
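Precision and recall have standard definitions in retrieval evaluation: precision is the share of retrieved items that are relevant, and recall (sensitivity) is the share of relevant items that are retrieved. A minimal sketch follows; the patient identifiers are hypothetical, and returning 0.0 for an empty denominator is a convention assumed here, not something taken from the reviewed studies.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieval result.

    retrieved: items the approach returned (e.g. patients flagged by a query)
    relevant:  items that should have been returned (the reference standard)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: a query flags four patients, three truly qualify.
flagged = {"p1", "p2", "p3", "p4"}
reference = {"p2", "p3", "p5"}
precision, recall = precision_recall(flagged, reference)
```

Because both metrics depend only on the retrieved set and a reference standard, the same function could score an ontology-based query and a non-ontological one against the same reference, which is the kind of comparison the review found missing.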

Table 9 sets out various definitions to identify the most common criteria for assessing the validity of ontologies and data models. Studies have attempted to define criteria such as Flexibility, Reusability, Cohesiveness, Precision, and Recall. However, there are fewer coordinated attempts to define other criteria such as Scalability, Completeness, Correctness, Extensibility, and Adaptability.

Table 9 Metrics to evaluate and compare ontology and traditional data model approaches

There are overlaps in the definition of criteria such as Flexibility, Scalability, Completeness, Correctness, Extensibility, and Adaptability in both ontological and non-ontological approaches. There was no guidance in the literature on the definition and scope of Reusability, Cohesiveness, Precision, and Recall in the data model approaches. Standardising these metrics can help to standardise the specification of ontologies and data models, and in turn the comparison of ontology and non-ontology approaches.

Discussion

This review examined the role of ontology-based approaches to develop data quality based on "fitness for purpose" in the health context. The findings updated and corroborated much of our previous work in this field and added new knowledge to ontology-based approaches to data quality and "fitness for purpose" of information systems.

How is data quality being conceptualized within the "fitness for purpose" definition for a range of uses?

We found few papers that used DQ within the definition of fitness for purpose. There are more studies on ontologies for the management of DQ (26 papers) and the assessment of DQ in all contexts (11 papers). These findings support the current perception of DQ as a complex concept with many dimensions, often overlapping conceptually (Wand and Wang 1996). Liaw et al. (2011) developed a conceptual framework for DQ that includes intrinsic DQ (correctness and consistency) of data elements and fitness for purpose (completeness) of the data set for research and clinical purposes.

What specification methodologies are being used to specify data quality for implementation?

The literature on the specification of data quality for implementation is fragmentary and offers no comprehensive approach. The findings of the current study are consistent with our previous review (Liaw et al. 2013) in that the ontological approach to developing DQ is poorly evaluated. However, most agreed that DQ is a multidimensional construct (Devillers et al. 2007; Nimmagadda et al. 2008), with completeness, accuracy, correctness, consistency and timeliness being the most commonly used dimensions. A few studies examined ontology-based approaches to support data consistency and accuracy. However, no research was found that formally and systematically assessed the association between ontologies for DQ and fitness for purpose in various contexts.

What ontology-specified implementations are being used and how do they compare with other methods?

There were few comparative and evaluative studies that assessed data quality or compared ontological and non-ontological approaches to representing knowledge in clinical information systems. This literature review suggests that, compared to non-hierarchical data models, there may be more advantages and benefits in the use of ontologies to solve semantic clinical data quality issues and improve the validity and reliability of data retrieval, collection, storage, extraction and linkage algorithms and tools. Formal ontological approaches enable the systematic development of automated, valid and reliable methods to assess and manage DQ and semantic interoperability issues (Lee et al. 2009; Valencia-Garcia et al. 2008; Verma et al. 2009, 2008). The expressiveness of ontology-based models can facilitate accuracy and precision compared to non-ontology models and approaches (Esposito 2008a, 2008b; Preece et al. 2008).

Current ontological approaches have had limited evaluation. There are few comparative studies in the chronic disease management domain and even fewer examining data quality. The challenges to the development and validation of an ontology-based model for the assessment and management of DQ include methodological immaturity, an immature knowledge base, and a lack of tools to support ontology-based design of information systems, evaluation of ontological approaches, and engagement of users in design and implementation. There are insufficient studies to define ontology evaluation metrics comprehensively and to show practical techniques for evaluating ontological approaches in terms of flexibility, scalability and reusability versus non-ontology based models.

How is the impact of implementing ontology-based specifications for data quality in chronic disease management being measured and evaluated?

Current evidence demonstrates that there is a lack of valid and reliable data quality assurance (Arts et al. 2003, 2002b) to ensure fitness for a range of uses by consumers, patients, health providers and professionals. This study has added to our understanding of ontology-based approaches to improve the quality of the data so that it is useful for various purposes such as clinical research, teaching, audit and evaluation (e.g. quality assurance and clinical decision making). The main advantages of building ontologies for data quality in health are to automate the extraction of data from EHRs into clinical data warehouses; assessment and management of the intrinsic quality and completeness of this "big data" so that they are fit for purposes such as research, quality improvement and health information exchange and sharing; management of controlled vocabularies and optimising semantic interoperability; curation of data for use by human users and applications such as electronic decision support systems; mining of data to discover relationships between the concepts; discovery of new knowledge; and reuse of knowledge in the management of chronic diseases (Abidi 2011; Buranarach et al. 2009; Gedzelman et al. 2005; Gupta et al. 2003; Jara et al. 2009).

Limitations of the review

The majority of studies involved the design and development of tools for data models and ontologies in the health area and chronic diseases, rather than the implementation, deployment and evaluation of the relevant procedures and tools. The trends are encouraging for ontological approaches. However, there are no formal large-scale studies that systematically compare the quality of outputs of ontological and non-ontological approaches to the assessment and management of data quality and fitness for purpose of the implementations. We did not search the grey literature, an important source in this relatively immature field, and access to proprietary materials was also limited. Future investigations might use an ontological approach to develop data quality in different administrative, financial and clinical information systems.

Managerial implications

The findings of this study have several important practical implications for developing enterprise information systems. For instance, a health organisation can determine the current status of advancement of their ontology and information model, to guide the further design of a semantic strategy and to achieve specific goals, given the current data quality in their clinical information systems (CIS). The findings of this study and our previous review may serve as a benchmark for developing an ontology model as a tool for assessing and managing data quality in clinical information systems.

In addition, when developing CIS and clinical data warehouses, managers can determine which features or functions of ontology-based approaches could better support their health professionals and patients. Managers can also use the ontology model to develop their information system across all dimensions of data quality: it can show them the major strengths and weaknesses of their information quality in terms of supporting end users in their decision-making process. This is the fitness for purpose paradigm.
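The fitness for purpose idea can be illustrated with a minimal sketch in which the same records are scored separately for each intended use. The purposes, field names, sample records and the 0.8 threshold below are hypothetical and chosen only for illustration; they are not taken from the reviewed studies, and only the completeness dimension is considered.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]

# Hypothetical purposes and the data elements each one requires (illustrative only).
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "clinical_audit": ["diagnosis", "hba1c", "blood_pressure"],
    "billing": ["diagnosis", "provider_id"],
}


def completeness(records: List[Record], purpose: str) -> float:
    """Return the fraction of records holding every field required for `purpose`."""
    required = REQUIRED_FIELDS[purpose]
    if not records:
        return 0.0
    complete = sum(
        all(rec.get(field) not in (None, "") for field in required)
        for rec in records
    )
    return complete / len(records)


def fitness_report(records: List[Record], threshold: float = 0.8) -> Dict[str, bool]:
    """Flag, per purpose, whether completeness meets the (assumed) threshold."""
    return {
        purpose: completeness(records, purpose) >= threshold
        for purpose in REQUIRED_FIELDS
    }


# Made-up sample records: the second one lacks an HbA1c value.
SAMPLE: List[Record] = [
    {"diagnosis": "T2DM", "hba1c": "7.1", "blood_pressure": "130/80", "provider_id": "P1"},
    {"diagnosis": "T2DM", "hba1c": None, "blood_pressure": "140/90", "provider_id": "P2"},
]

if __name__ == "__main__":
    print(fitness_report(SAMPLE))
```

In this sketch the same dataset is judged fit for the hypothetical billing purpose but not for clinical audit, which is the sense in which data quality depends on the intended use.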

Conclusion

The understanding of data quality as a multidimensional concept, applied to individual data elements (intrinsic DQ) and to sets of data elements (extrinsic DQ), is progressing. Ontological approaches are emerging and are theoretically important for addressing the relationships among overlapping concepts in this complex area. This review has described the current published literature in this domain and points to a number of directions for ongoing research into the use of ontological approaches to managing the fitness for purpose of "big data" from multiple EHRs.

Abbreviations

CDM: Chronic disease management

DQ: Data quality

DQM: Data quality management

ePBRN: Electronic practice-based research network

EHR: Electronic health record

CIS: Clinical information system

GPS: General practice system

OBMAS: Ontology-based multi-agent system

SNOMED CT AU: Systematised nomenclature of medicine clinical terms, Australian release

MESH: Medical subject headings

COPD: Chronic obstructive pulmonary disease

ANN: Artificial neural network

SWRL: Semantic web rule language

SPARQL: SPARQL protocol and RDF query language

References

  1. Abidi SR: Ontology-based knowledge modeling to provide decision support for comorbid diseases. 2011. Paper presented at the 19th European Conference in Artificial Intelligence. Retrieved from http://www.scopus.com/inward/record.url?eid=2-s2.0-79952016090&partnerID=40&md5=d6e8e7441e3e9118fa395e5fc0b77b95

  2. Arts D, de Keizer N, Scheffer GJ, de Jonge E: Quality of data collected for severity of illness scores in the Dutch National Intensive Care Evaluation (NICE) registry. Intensive Care Medicine 2002a,28(5):656–659. 10.1007/s00134-002-1272-z

  3. Arts DG, de Keizer NF, Scheffer GJ: Defining and improving data quality in medical registries: a literature review, case study, and generic framework. Journal of the American Medical Informatics Association 2002b,9(6):600–611. 10.1197/jamia.M1087

  4. Arts DG, Bosman RJ, de Jonge E, Joore JC, de Keizer NF: Training in data definitions improves quality of intensive care data. Critical Care 2003,7(2):179–184. 10.1186/cc1886

  5. Azaouagh A, Stausberg J: Frequency of hospital-acquired pneumonia–comparison between electronic and paper-based patient records. Pneumologie 2008,62(5):273–278. 10.1055/s-2008-1038099

  6. Brank J, Grobelnik M, Mladenić D: A survey of ontology evaluation techniques. 2005. Paper presented at the Proc. of 8th Int. Multi-Conf. Information Society

  7. Brewster C, Alani H, Dasmahapatra S, Wilks Y: Data Driven Ontology Evaluation. 2004. Paper presented at the International Conference on Language Resources and Evaluation. Retrieved from http://eprints.soton.ac.uk/259062/

  8. Britt H, Miller G, Bayram C: The quality of data on general practice - a discussion of BEACH reliability and validity. Australian Family Physician 2007,36(1–2):36–40.

  9. Brüggemann S, Grüning F: Using ontologies providing domain knowledge for data quality management. Studies in Computational Intelligence 2009, 221: 187–203. 10.1007/978-3-642-02184-8_13

  10. Buranarach M, Chalortham N, Chatvorawit P, Thein Y, Supnithi T: An ontology-based framework for development of clinical reminder system to support chronic disease healthcare. 2009.

  11. Chen FH: Modeling the effect of information quality on risk behavior change and the transmission of infectious diseases. Mathematical Biosciences 2009,217(2):125–133. 10.1016/j.mbs.2008.11.005

  12. Chen WL, Zhang SD, Gao X: Anchoring the Consistency Dimension of Data Quality Using Ontology in Data Integration. 2009 Sixth Web Information Systems and Applications Conference, IEEE 2009.

  13. Choquet R, Qouiyd S, Ouagne D, Pasche E, Daniel C, Boussaïd O, et al.: The information quality triangle: A methodology to assess clinical information quality. 2010. Paper presented at the 13th World Congress on Medical and Health Informatics, Medinfo 2010, Cape Town

  14. Cunningham-Myrie C, Reid M, Forrester TE: A comparative study of the quality and availability of health information used to facilitate cost burden analysis of diabetes and hypertension in the Caribbean. West Indian Medical Journal 2008,57(4):383–392.

  15. de Lusignan S, Khunti K, Belsey J, Hattersley A, van Vlymen J, Gallagher H, et al.: A method of identifying and correcting miscoding, misclassification and misdiagnosis in diabetes: a pilot and validation study of routinely collected data. Diabetic Medicine 2010, 27: 203–209. 10.1111/j.1464-5491.2009.02917.x

  16. Devillers R, Bedard Y, Jeansoulin R, Moulin B: Towards spatial data quality information analysis tools for experts assessing the fitness for use of spatial data. International Journal of Geographical Information Science 2007,21(3):261–282. 10.1080/13658810600911879

  17. Esposito M: Congenital Heart Disease: An ontology-based approach for the examination of the cardiovascular system. In Knowledge - Based Intelligent Information and Engineering Systems, Pt 1, Proceedings Vol. 5177 Edited by: Lovrek I. 2008a, 509–516.

  18. Esposito M: An ontological and non-monotonic rule-based approach to label medical images. Los Alamitos: IEEE Computer Soc; 2008b.

  19. Euzenat J: Semantic Precision and Recall for Ontology Alignment Evaluation. 2007. Paper presented at the Twentieth International Joint Conference on Artificial Intelligence (IJCAI-07)

  20. Gangemi A, Catenacci C, Ciaramita M, Lehmann J: Modelling ontology evaluation and validation. 2006. Paper presented at the Proceedings of the 3rd European conference on The Semantic Web: research and applications

  21. Gedzelman S, Simonet M, Bernhard D, Diallo G, Palmer P: Building an ontology of cardio-vascular diseases for concept-based information retrieval. Computers in Cardiology 2005, 32: 255–258.

  22. Gillies A: Assessing and improving the quality of information for health evaluation and promotion. Methods of Information in Medicine 2000a,39(3):4.

  23. Gillies A: Assessing and improving the quality of information for health evaluation and promotion. Methods of Information in Medicine 2000b,39(3):208–212.

  24. Gupta A, Ludäscher B, Grethe JS, Martone ME: Towards a formalization of disease-specific ontologies for neuroinformatics. Neural Networks 2003,16(9):1277–1292. 10.1016/j.neunet.2003.07.008

  25. Hamilton WT, Round AP, Sharp D, Peters TJ: The quality of record keeping in primary care: a comparison of computerised, paper and hybrid systems. The British Journal of General Practice 2003,53(497):929–933. discussion 933

  26. Huaman MA, Araujo-Castillo RV, Soto G, Neyra JM, Quispe JA, Fernandez MF, et al.: Impact of two interventions on timeliness and data quality of an electronic disease surveillance system in a resource limited setting (Peru): a prospective evaluation. BMC Med Inform Decis Mak. 2009, 9: 16. 10.1186/1472-6947-9-16

  27. Ivanova I, Morales J, de By RA, Beshe TS, Gebresilassie MA: Searching for spatial data resources by fitness for use. Journal of Spatial Science 2013,58(1):15–28. 10.1080/14498596.2012.759087

  28. Jacquelinet C, Burgun A, Delamarre D, Strang N, Djabbour S, Boutin B, et al.: Developing the ontological foundations of a terminological system for end-stage diseases, organ failure, dialysis and transplantation. International Journal of Medical Informatics 2003,70(2–3):317–328. 10.1016/S1386-5056(03)00046-7

  29. Jara AJ, Blaya FJ, Zamora MA, Skarmeta AFG: An Ontology and Rule Based Intelligent Information System to Detect and Predict Myocardial Diseases. New York: IEEE; 2009.

  30. Kahn BK, Strong DM, Wang RY: Information quality benchmarks: product and service performance. Communications of the ACM 2002,45(4):8.

  31. Kahn MG, Batson D, Schilling LM: Data model considerations for clinical effectiveness researchers. Medical Care 2012, 50 Suppl: S60-S67.

  32. Kerr K, Norris A, Stockdale R: Data quality, information and decision making: a healthcare case study. 2007. Paper presented at the 18th Australasian Conference on Information Systems, Toowoomba, Australia

  33. Kiragga AN, Castelnuovo B, Schaefer P, Muwonge T, Easterbrook PJ: Quality of data collection in a large HIV observational clinic database in sub-Saharan Africa: implications for clinical research and audit of care. Journal of the International AIDS Society 2011,14(1).

  34. Lain SJ, Roberts CL, Hadfield RM, Bell JC, Morris JM: How accurate is the reporting of obstetric haemorrhage in hospital discharge data? A validation study. Australian and New Zealand Journal of Obstetrics and Gynaecology 2008,48(5):481–484. 10.1111/j.1479-828X.2008.00910.x

  35. Lee CS, Wang MH, Acampora G, Loia V, Hsu CY: Ontology-based Intelligent Fuzzy Agent for Diabetes Application. New York: IEEE; 2009.

  36. Li Z: An ontology-driven concept-based information retrieval approach for Web documents. Edmonton, Alberta: University of Alberta; 2010.

  37. Li HC, Ko WM: Automated food ontology construction mechanism for diabetes diet care. New York: IEEE; 2007.

  38. Liaw S, Taggart J, Dennis S, Yeo A: Data quality and fitness for purpose of routinely collected data – a general practice case study from an electronic Practice-Based Research Network (ePBRN). In AMIA 2011 Annual Symposium Improving Health: Informatics and IT Changing the World; October 22–26, 2011. Washington DC, US: AMIA; 2011:785–94.

  39. Liaw ST, Chen HY, Maneze D, Taggart J, Dennis S, Vagholkar S, Bunker J: Health reform: is routinely collected electronic information fit for purpose? Emergency Medicine Australasia 2012,24(1):57–63. 10.1111/j.1742-6723.2011.01486.x

  40. Liaw ST, Rahimi A, Ray P, Taggart J, Dennis S, de Lusignan S, et al.: Towards an ontology for data quality in integrated chronic disease management: a realist review of the literature. International Journal of Medical Informatics 2013,82(2):139. 10.1016/j.ijmedinf.2012.12.007

  41. Lima L, Novais P, Costa R, Cruz J, Neves J: Decision Making Based on Quality-of-Information a Clinical Guideline for Chronic Obstructive Pulmonary Disease Scenario. In Distributed Computing and Artificial Intelligence Vol. 79. Edited by: de Leon A, de Carvalho F, Rodríguez-González S, De Paz Santana J, Rodríguez J. Berlin/Heidelberg: Springer; 2010:417–424.

  42. Mabotuwana T, Warren J: An ontology-based approach to enhance querying capabilities of general practice medicine for better management of hypertension. Artificial Intelligence in Medicine 2009,47(2):87–103. 10.1016/j.artmed.2009.07.001

  43. Maiga G, Williams D: A flexible approach for user evaluation of biomedical ontologies. International Journal of Computing and ICT Research 2008,2(2):62–74.

  44. Maragoudakis M, Lymberopoulos D, Fakotakis N, Spiropoulos K: A Hierarchical, Ontology-Driven Bayesian Concept for Ubiquitous Medical Environments- A Case Study for Pulmonary Diseases. In 2008 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Vols 1–8. New York: IEEE; 2008:3807–3810.

  45. McGarry K, Garfield S, Wermter S: Auto-extraction, representation and integration of a diabetes ontology using Bayesian networks. In Twentieth IEEE International Symposium on Computer-Based Medical Systems, Proceedings Edited by: Kokol P, Podgorelec V, MiceticTurk D, Zorman M, Verlic M. 2007, 612–617.

  46. McJunkin MC: Precision and recall in title keyword searches. Information Technology and Libraries 1995,14(3):161–171.

  47. Min H, Manion FJ, Goralczyk E, Wong YN, Ross E, Beck JR: Integration of prostate cancer clinical data using an ontology. Journal of Biomedical Informatics 2009,42(6):1035–1045. 10.1016/j.jbi.2009.05.007

  48. Mitchell J, Westerduin F: Emergency department information system diagnosis: how accurate is it? Emergency Medicine Journal 2008,25(11):784. 10.1136/emj.2007.050104

  49. Moody DL, Shanks GG: Improving the quality of data models: empirical validation of a quality management framework. Information Systems 2003,28(6):619–650. 10.1016/S0306-4379(02)00043-1

  50. Moro ML, Morsillo F: Can hospital discharge diagnoses be used for surveillance of surgical-site infections? Journal of Hospital Infection 2004,56(3):239–241. 10.1016/j.jhin.2003.12.022

  51. Nimmagadda SL, Nimmagadda SK, Dreher H: Ontology based data warehouse modeling and managing ecology of human body for disease and drug prescription management. 2008 2nd IEEE International Conference on Digital Ecosystems and Technologies 2008, 465–473.

  52. O’Donoghue J, Herbert J, O’Reilly P, Sammon D: Towards Improved Information Quality: The Integration of Body Area Network Data within Electronic Health Records. In Ambient Assistive Health and Wellness Management in the Heart of the City, Proceeding Vol. 5597 Edited by: Mokhtari M, Khalil I, Bauchet J, Zhang D, Nugent C. 2009, 299–302.

  53. Orme AM, Yao H, Etzkorn LH: Indicating ontology data quality, stability, and completeness throughout ontology evolution. Journal of Software Maintenance and Evolution-Research and Practice 2007,19(1):49–75. 10.1002/smr.341

  54. Pannarale P, Catalano D, De Caro G, Grillo G, Leo P, Pappada G, et al.: GIDL: a rule based expert system for GenBank intelligent data loading into the molecular biodiversity database. BMC Bioinformatics 2012,13(Suppl 4):S4. 10.1186/1471-2105-13-S4-S4

  55. Pathak J, Kiefer RC, Bielinski SJ, Chute CG: Mining the human phenome using semantic web technologies: a case study for type 2 diabetes. AMIA Annual Symposium Proceedings 2012a, 2012: 699–708.

  56. Pathak J, Kiefer RC, Chute CG: Using semantic web technologies for cohort identification from electronic health records for clinical research. AMIA Summits on Translational Science Proceedings 2012b, 2012: 10–19.

  57. Perez-Rey D, Maojo V, Garcia-Remesal M, Alonso-Calvo R, Billhardt H, Martin-Sanchez F, et al.: ONTOFUSION: ontology-based integration of genomic and clinical databases. Computers in Biology and Medicine 2006,36(7–8):712–730.

  58. Pinto HS: Ontologies: how can they be built? Knowledge and Information Systems 2004,6(4):441–464. 10.1007/s10115-003-0138-1

  59. Preece A, Missier P, Embury S, Jin B, Greenwood M: An ontology-based approach to handling information quality in e-science. Concurrency and Computation-Practice and Experience 2008,20(3):253–264.

  60. Quan H, Li B, Saunders LD, Parsons GA, Nilsson CI, Alibhai A, et al.: Assessing validity of ICD-9-CM and ICD-10 administrative data in recording clinical conditions in a unique dually coded database. Health Services Research 2008,43(4):1424–1441. 10.1111/j.1475-6773.2007.00822.x

  61. Redman T: Measuring data accuracy. In Information Quality. Edited by: Rea W. Armonk NY: ME Sharpe Inc.; 2005:21.

  62. Spasic I, Ananiadou S: A flexible measure of contextual similarity for biomedical terms. Pacific Symposium on Biocomputing 2005, 10: 197–208.

  63. Stvilia B, Mon L, Yi YJ: A model for online consumer health information quality. Journal of the American Society for Information Science and Technology 2009,60(9):1781–1791. 10.1002/asi.21115

  64. Topalis P, Dialynas E, Mitraka E, Deligianni E, Siden-Kiamos I, Louis C: A set of ontologies to drive tools for the control of vector-borne diseases. Journal of Biomedical Informatics 2011,44(1):42–47. 10.1016/j.jbi.2010.03.012

  65. Valencia-Garcia R, Fernandez-Breis JT, Ruiz-Martinez JM, Garcia-Sanchez F, Martinez-Bejar R: A knowledge acquisition methodology to ontology construction for information retrieval from medical documents. Expert Systems 2008,25(3):314–334. 10.1111/j.1468-0394.2008.00464.x

  66. Verma A, Kasabov N, Rush A, Song Q: Ontology based personalized modeling for chronic disease risk analysis: an integrated approach. 2008. Paper presented at the 15th international conference on Advances in neuro-information processing

  67. Verma A, Fiasché M, Cuzzola M, Iacopino P, Morabito P, Kasabov N: Ontology based personalized modeling for type 2 diabetes risk analysis: An Investigated Approach. In ICONIP 2009, Part II. Edited by: Leung CS, Lee M, Chan JH. Berlin: Springer-Verlag; 2009:360–366.

  68. Wand Y, Wang RY: Anchoring data quality dimensions in ontological foundations. Communications of the ACM 1996,39(11):86–95.

  69. Wang R: A product perspective on total data quality management. Communications of the ACM 1998,41(2):58–65. 10.1145/269012.269022

  70. Wang R, Strong D, Guarascio L: Beyond accuracy: what data quality means to data consumers. Journal of Management Information Systems 1996,12(4):5–33.

  71. Wang MH, Lee CS, Li HC, Ko WM: Ontology-based fuzzy inference agent for diabetes classification. New York: IEEE; 2007.

  72. Yao H, Orme A, Etzkorn LH: Cohesion metrics for ontology design and application. Journal of Computer Science 2005,1(1):107–113.

  73. Young L, Tu SW, Tennakoon L, Vismer D, Astakhov V, Gupta A, et al.: Ontology Driven Data Integration for Autism Research. In 2009 22nd IEEE International Symposium on Computer-Based Medical Systems. New York: IEEE; 2009:54–60.

Acknowledgment

The authors would like to thank Dr Sarah Dennis and Dr Sanjyot Vagholkar for their previous and ongoing contributions in this study.

Author information

Correspondence to Siaw-Teng Liaw.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

STL, AR and PR developed the conceptual framework and templates for the literature review. AR managed the review and appraised all included papers as part of his PhD studies. All authors discussed their appraisals with AR and STL to achieve consensus. AR prepared this paper iteratively with input from all co-authors prior to submission. All authors read and approved the final manuscript.

Keywords

  • Data quality
  • Fitness for purpose
  • Data model
  • Ontology development methodology