A Bayesian Network-based customer satisfaction model: a tool for management decisions in railway transport
Decision Analytics volume 3, Article number: 4 (2016)
Abstract
We formalise and present an innovative general approach for developing complex system models from survey data by applying Bayesian Networks. The challenges and approaches to converting survey data into usable probability forms are explained and a general approach for integrating expert knowledge (judgements) into Bayesian complex system models is presented. The structural complexities of the Bayesian complex system modelling process, based on various decision contexts, are also explained along with a solution. A novel application of Bayesian complex system models as a management tool for decision making is demonstrated using a railway transport case study. Customer satisfaction, which is a Key Performance Indicator in public transport management, is modelled using data from customer surveys conducted by Queensland Rail, Australia.
Introduction
The success of a business is reflected in its established Key Performance Indicators (KPIs). In order to manage the KPIs of a particular business efficiently, it is essential to understand the current performance level of each KPI, the impacts of each KPI on the business objectives, the factors influencing each KPI, and the often complex network of interactions between these factors. Importantly, in order to facilitate timely evidence-based management, this understanding must be not only conceptual, but quantitative and adaptive to new information. Moreover, a manager requires the ability to not only analyse the current health of the system, but also to assess how the future business environment may influence the KPIs and hence the business objectives.
In this article we show how Bayesian Networks (BNs) can be developed and used as an effective management tool for KPI analysis where the KPIs represent the overall performances of various factors with objective and subjective performance measures. For specificity, we focus on one of the most significant and germane KPIs for business management, customer satisfaction (McColl-Kennedy and Schneider 2000) and base the model on one of the most ubiquitous methods for obtaining information about customer satisfaction, the attitudinal survey.
There is a wealth of methods for analysing customer satisfaction surveys. These include summary statistical evaluations, factor analysis and its variants including customer satisfaction indices (Fornell et al. 1996; Kristensen et al. 2000; Johnson et al. 2001), linear regression and its variants (Ting and Chen 2002; Chatterjee and Hadi 2008), non-parametric non-linear approaches such as classification and regression trees (De’ath and Fabricius 2000), latent factor approaches such as structural equation models (Hackl and Westlund 2000), multi-criteria approaches (Siskos et al. 1998), and so on. While these approaches offer many insights into the system of interest, they do not focus on modelling the system as a whole (Anderson et al. 2004; Kenett and Salini 2011). In many ways, a Bayesian Network borrows from all of these approaches to create a more flexible modelling environment and a whole system approach.
BNs have been used successfully to model complex systems in diverse fields including ecology and environment (Johnson et al. 2010; Denham et al. 2011), medicine (Donald et al. 2009; McGree et al. 2012; Radice 2012; Medina et al. 2013; Hsieh et al. 2013), finance (Sahin 2006; Sun and Shenoy 2007), business management (Lariviere and Porteus 1999; Anderson et al. 2004; Cai et al. 2011; Dogan 2012), project management (Melo and Sanchez 2008; Cho 2009; Lee et al. 2009), the military (Falzon 2006; Johansson and Falkman 2008) and transport (Ozbay and Noyan 2006; Janssens et al. 2006; Trucco et al. 2008; Ferreiro et al. 2012). BNs have also been used in the past for customer satisfaction modelling (Salini and Kenett 2009; Hsu et al. 2009; Gasparini et al. 2011; Kisioglu and Topcu 2011; Ferreira and Borenstein 2012; Eboli and Mazzulla 2011; Oa et al. 2013; Turkyilmaz et al. 2013; Perucca and Salini 2014). These studies have shown distinct advantages of BNs, including the ability to model complex interrelations between factors, perform scenario analysis, undertake sophisticated interrogations of the system, and include other sources of information in the model, such as observational and experimental data, results from previous experiments, learnings from published literature and expert judgement. BNs have an inherent capability to work with interrelated attributes (Yu et al. 2004), as opposed to the attribute exclusivity required by some other techniques such as Multi-criteria Analysis. BNs can also show the impacts of lower level factors on intermediate level factors as well as on top level factors (final outcomes), whereas methods such as CBA or MADM tend to consolidate intermediate factors into the final outcome. In this study we analyse customer satisfaction as a KPI that depends on a large number of interrelated factors.
A BN is the most suitable technique for this study because it can represent the factor relationships of this KPI and supports various analyses (such as what-if, influence and sensitivity analyses) while showing their impacts throughout the complete factor structure.
In this study we introduce a novel general approach for developing Bayesian Networks from survey data. The approach provides practical guidelines for developing complex system models for real world systems in business. We focus on the effective use of survey data and demonstrate the new approach and the capabilities of the developed models in the context of a substantive case study of customer satisfaction in a large-scale public transport network operated by Queensland Rail, Australia.
In the following sections we briefly explain the theoretical background of Bayesian Networks and then describe in more detail our general BN development approach. We follow this with a discussion of the case study and an exposition of the corresponding analyses.
Background
In this section we review the two underlying bases of our approach, survey data and Bayesian Networks.
Survey data
We identify five key features of survey questionnaires which can guide model development. First, the questionnaire itself can provide a clear description of the objective of the survey. Second, the questions themselves can assist in the definition of key explicit factors. Third, the grouping or categorisation of questions can assist in the identification of key latent factors. Fourth, a well-structured questionnaire can provide guidance about the relationships between these factors. Finally, an attitudinal questionnaire typically facilitates quantification of the corresponding model. For example, responses may be categorical (e.g., gender), in which case the proportions of responses in each category can be used as probability estimates in the Bayesian Network. Alternatively, responses with an underlying continuous scale (e.g., age) may be treated as continuous or discretised into a set of ordered categories, such as “young” and “old”, where these are appropriately defined. Attitudinal responses measured on a Likert scale can similarly be treated as approximately continuous, depending on the context and scale, or as discrete, with the number of categories equal to the number of levels of the scale or to a smaller number of categories representing, for example, High, Medium and Low, or Good and Bad. Qualitative responses can generally be sorted into categories and included in the model. Although the BN development approach we describe below can be applied quite generally, for specificity we focus here on survey data obtained from attitudinal questionnaires. It is beyond the scope of this paper to discuss questionnaire design and conduct, but we utilise the features of a well-established and widely deployed questionnaire in the construction of the BN. Although Bayesian Networks can be quantified using continuous probability distributions, here we use discrete (ordered or unordered) distributions.
Treating the responses in this manner allows more information to be extracted from a survey and incorporated in the BN. Decisions about the number of categories depend on the expected information loss incurred by aggregating responses versus the level of detail that the BN can bear, given the sample size and model structure. This is explained in further detail below.
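As a small illustration of discretisation, the sketch below collapses a 5-point Likert item into the High/Medium/Low categories mentioned above and returns category proportions that could serve as a node's probability table. The 5-point scale, the cut points in the hypothetical `LIKERT_TO_CATEGORY` mapping and the function name are our assumptions, not features of any particular survey.

```python
from collections import Counter

# Hypothetical mapping from a 5-point Likert scale to three ordered categories;
# the cut points are an assumption and should be agreed with domain experts.
LIKERT_TO_CATEGORY = {1: "Low", 2: "Low", 3: "Medium", 4: "High", 5: "High"}
CATEGORIES = ("High", "Medium", "Low")


def category_proportions(responses, mapping=LIKERT_TO_CATEGORY):
    """Collapse Likert responses and return the proportion in each category.

    Missing or out-of-range responses (None, 0, 6, ...) are dropped before counting.
    """
    collapsed = [mapping[r] for r in responses if r in mapping]
    if not collapsed:
        raise ValueError("no valid responses")
    counts = Counter(collapsed)
    return {c: counts[c] / len(collapsed) for c in CATEGORIES}


if __name__ == "__main__":
    # Made-up answers to one satisfaction question (None = no response).
    answers = [5, 4, 4, 3, 2, 5, 1, None, 3, 4]
    print(category_proportions(answers))
```

Collapsing to fewer categories trades detail for more respondents per cell, which matters once these proportions are cross-classified against other questions.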
Bayesian Networks
Bayesian Networks are also known as recursive graphical models, belief networks, causal probabilistic networks, causal networks and influence diagrams among others (Daly et al. 2011). A BN can be expressed as two components, the first qualitative and the second quantitative (Nadkarni and Shenoy 2001, 2004). The qualitative expression is depicted as a directed acyclic graph (DAG), which consists of a set of variables (denoted by nodes) and relationships between the variables (denoted by arcs) (Salini and Kenett 2009).
The quantitative expression comprises the probabilities of the variables. Figure 1 shows a Bayesian Network with three variables X, Y and Z. Variables X and Y are parents of variable Z, which indicates that Z is the dependent (child) node. The probability distribution of Z is therefore conditional on the states of X and Y.
The probabilities in a Bayesian Network are simplified by the DAG structure of the BN, by applying directional separation (d-separation) (Pearl 1988) and a Markov property assumption (Jensen and Nielsen 2007; Johnson et al. 2010), so that each variable is conditionally independent of its non-descendants given its parents. Thus, the joint probability distribution in a BN with n nodes \((X_1,\dots ,X_n)\) can be formulated as

\[ P(X_1,\dots ,X_n) = \prod_{i=1}^{n} P\big(X_i \mid P_a(X_i)\big) \]
where \(P_a(X_i)\) is the set of parent nodes of \(X_i\) (Heckerman et al. 1995; Johnson et al. 2010). For Fig. 1 the above equation can be written as

\[ P(X, Y, Z) = P(X)\,P(Y)\,P(Z \mid X, Y) \]
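The factorisation for Fig. 1 can be checked numerically. The sketch below uses made-up probabilities (not survey estimates) with the age, gender and satisfaction states used in the running example; it builds the joint distribution from P(X), P(Y) and P(Z | X, Y) and sums out X and Y to obtain P(Z). All names are illustrative.

```python
from itertools import product

# Made-up probabilities for the Fig. 1 example (not estimates from any survey):
# X = age, Y = gender, Z = satisfaction.
P_X = {"<=30": 0.6, "30+": 0.4}
P_Y = {"Male": 0.5, "Female": 0.5}
P_Z_GIVEN_XY = {
    ("<=30", "Male"): {"High": 0.2, "Low": 0.8},
    ("<=30", "Female"): {"High": 0.4, "Low": 0.6},
    ("30+", "Male"): {"High": 0.6, "Low": 0.4},
    ("30+", "Female"): {"High": 0.8, "Low": 0.2},
}
Z_STATES = ("High", "Low")


def joint(x, y, z):
    """P(X=x, Y=y, Z=z) = P(x) * P(y) * P(z | x, y): the factorisation for Fig. 1."""
    return P_X[x] * P_Y[y] * P_Z_GIVEN_XY[(x, y)][z]


def marginal_z():
    """P(Z), obtained by summing the factorised joint over X and Y."""
    totals = dict.fromkeys(Z_STATES, 0.0)
    for x, y, z in product(P_X, P_Y, Z_STATES):
        totals[z] += joint(x, y, z)
    return totals


if __name__ == "__main__":
    total = sum(joint(x, y, z) for x, y, z in product(P_X, P_Y, Z_STATES))
    print(f"joint sums to {total:.6f}")
    print(marginal_z())
```

Here the factorised form needs 1 + 1 + 4 = 6 free parameters, against 7 for an unrestricted joint table over three binary variables; the gap widens rapidly in larger, sparsely connected networks.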
Model development approach
Here we describe our development approach for constructing Bayesian Network models from survey data in seven stages: identify the objective, develop the BN structure, formalise the structure of the model, quantify the model, interrogate the model, validate the model and communicate the results.
Stage I: identify the objective
From the available survey data identify one top level node, e.g., performance, cost, safety, etc. As discussed above, these are typically directly related to the business’s KPIs. One such objective becomes the top level node in the model. Without loss of generality, in this study we assume that there is one principal objective, hence one top level node (Customer satisfaction), although this node itself could comprise multiple subgoals.
Stage II: develop the network structure
The development of the Bayesian Network’s structure requires identification of a set of factors that influence the objective(s), and the relationships between these factors. It also requires definition of the categories or states for each factor, as discussed above. This stage is thus divided into four substages.
Stage II(a): identify key factors Identify the key factors that affect the principal objective(s) defined in Stage I. Key factors will become parent nodes of the top level node and will have further sub factors influencing them.
Stage II(b): identify remaining factors Identify the various factors influencing the key factors. This stage is applied repeatedly until the lowest level of factors is reached; these are not influenced by any other factors that are considered of interest to include in the model. Note that Stages II(a) and II(b) may be used interchangeably and achieved iteratively; for example, if at Stage II(b) there are several factors which can be combined to form a key factor, then this is included as per Stage II(a) after Stage II(b) is applied. The lowest level factors modelled by simple Conditional Probability Tables in a BN are similar to simple objects in Object Oriented Bayesian Networks (OOBNs) (Koller and Pfeffer 1997) and the BN higher level factors with complex CPTs are similar to OOBN complex objects comprised of several simple objects.
Stage II(c): identify node states The possible states for each factor can be identified based on available data and expert opinion. A factor may have binary states such as “Yes” and “No”; or may have multiple states such as “High”, “Medium” and “Low”. For example, in Fig. 1, nodes X, Y and Z may represent age, sex and satisfaction rating, with states “\(\le\)30 years” and “30\({+}\) years”, “Male” and “Female”, and “High” and “Low”, respectively. Note that binary classifications are used here for illustration; more states can be defined in practice.
Stage II(d): identify relationships The relationships (shown as arrows in BN models) between the various nodes (objective, key factors, remaining factors) are identified based on their influences on each other. Often the relationships can be identified from the data but domain experts should be consulted for confirmation.
Stage III: formalise the model structure
The model structure depends on how the factors are organised in the model, and it is identified based on the decision perspectives: the same factors can be modelled in different structures depending on the decision analysis requirements. In some cases the model structure can be learnt from the data (Salini and Kenett 2009), but more often the appropriate model structure is determined by, or at least requires validation from, the decision maker (domain expert).
The Bayesian Network model structure also depends on the types of node involved. Three commonly used types of BN node are the Nature node, the Utility node and the Decision node (Ticehurst et al. 2007). A nature node describes the possible states of a variable and the probability of each state; this type of node can be qualitative or quantitative (discrete or continuous). A utility node is a continuous variable describing the desirability of the consequences of a set of outcomes. A decision node represents a controllable variable that provides a choice to the decision maker (Robertson 2004).
The Object-Oriented variation of Bayesian Networks (OOBNs) is suitable for modelling large complex domains (Uusitalo 2007). OOBNs provide a framework for modelling large complex data structures by simplifying the knowledge representation and facilitating reuse of nodes and network fragments, and thus offer an ideal platform for applying the object-oriented concepts of modern software development (Johnson et al. 2010).
The time variant nature of complex system structures can be modelled using a Dynamic Bayesian Network (DBN) which is an extension of a BN with additional capabilities for representing temporal data as time slices (Murphy and Mian 1999; Johnson et al. 2010).
Stage IV: quantify the model
Bayesian Network model quantification involves defining and estimating the probability distributions for each factor. Since we are considering discretised factors, these distributions can be represented as marginal probability tables for the nodes with no parents (terminal nodes), and conditional probability tables (CPTs) for the nodes with parents. For convenience, we denote all probability tables as CPTs. In the following, we use the survey data where possible, although the probabilities could be modified to include other information, for example from other studies or based on expert judgement, if the survey data are not informative about the CPTs or where the other data sources are considered to be more reliable.
The CPTs can be estimated in a number of ways, using direct or indirect methods. We describe four such approaches here.
Stage IV(a): direct quantification of terminal nodes The Conditional Probability Table for a terminal node can be quantified by calculating the relevant proportions in the survey data. For example, referring to Fig. 1 and continuing the example in Stage II(c) above, the CPT for Y (gender) would be estimated by calculating the proportion of male and female respondents in the survey or, if the information is available, in the reference population. Again, it is beyond the scope of this paper to discuss sampling and estimation issues related to surveys; further detail can be found elsewhere (Cochran 1977; Scheaffer et al. 2011).
Stage IV(b): direct quantification of other nodes If the survey data are sufficiently rich in terms of interactions between questions and sample size, Conditional Probability Tables for other nodes can be estimated by cross-classification of relevant questions in the questionnaire. For example, returning to Fig. 1 and the running example, the CPT for Z (satisfaction) can be estimated by creating a \(2\times 2\times 2\) table of satisfaction responses (high/low) by age (young/old) and gender (male/female). More generally, the probability of a combined state \(X = L, Y = L\) can be estimated by cross-classification as

\[ P(X = L, Y = L) = \frac{n}{N} \]
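where n is the number of observations with \(X = L\) and \(Y = L\), and N is the total number of observations. The corresponding CPT entries for Z are obtained by conditioning: the count for each satisfaction level within a given age-by-gender cell is divided by the number of respondents in that cell.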
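Stages IV(a) and IV(b) amount to counting. The sketch below estimates a marginal table for a terminal node, the joint proportion \(n/N\) for a combined state, and a CPT by cross-classification, using a handful of made-up records; the field names and helper functions are hypothetical.

```python
from collections import Counter


def marginal(records, var):
    """Stage IV(a): proportion of records in each state of a terminal node."""
    counts = Counter(r[var] for r in records)
    n = sum(counts.values())
    return {state: c / n for state, c in counts.items()}


def joint_proportion(records, assignment):
    """n / N for a combined state, e.g. {"age": "30+", "gender": "Female"}."""
    n = sum(all(r[k] == v for k, v in assignment.items()) for r in records)
    return n / len(records)


def cpt_by_cross_classification(records, child, parents, child_states):
    """Stage IV(b): P(child = s | parents) = n(parents, s) / n(parents).

    Parent configurations that never occur in the data are absent from the
    result; these are the empty cells that Stages IV(c) and IV(d) address.
    """
    joint_counts, parent_counts = Counter(), Counter()
    for r in records:
        key = tuple(r[p] for p in parents)
        joint_counts[(key, r[child])] += 1
        parent_counts[key] += 1
    return {
        key: {s: joint_counts[(key, s)] / n for s in child_states}
        for key, n in parent_counts.items()
    }


# Made-up survey records for the running example.
RECORDS = [
    {"age": "<=30", "gender": "Male", "satisfaction": "Low"},
    {"age": "<=30", "gender": "Male", "satisfaction": "High"},
    {"age": "<=30", "gender": "Female", "satisfaction": "High"},
    {"age": "30+", "gender": "Male", "satisfaction": "High"},
    {"age": "30+", "gender": "Female", "satisfaction": "High"},
    {"age": "30+", "gender": "Female", "satisfaction": "High"},
    {"age": "30+", "gender": "Female", "satisfaction": "Low"},
    {"age": "<=30", "gender": "Male", "satisfaction": "Low"},
]

if __name__ == "__main__":
    print(marginal(RECORDS, "gender"))
    print(joint_proportion(RECORDS, {"age": "30+", "gender": "Female"}))
    cpt = cpt_by_cross_classification(RECORDS, "satisfaction", ["age", "gender"], ("High", "Low"))
    for key, row in cpt.items():
        print(key, row)
```

With only eight records, two of the four parent cells rest on a single respondent, which illustrates why sample size limits how far cross-classification can be pushed.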
Stage IV(c): indirect quantification of other nodes by simulation In more complex networks, it is often possible to estimate part of the Conditional Probability Table by crossclassification but to have no data with which to estimate other cells. In this case, it may be possible to develop a statistical model that allows for simulation of the missing probabilities, given the completed cells. For example, returning again to the running example associated with Fig. 1, suppose that three out of the four cells are known for the age by gender table; given the marginal probabilities for age and gender and making some assumptions about the interaction between the two factors, the missing cell could be estimated in this manner.
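One possible way to realise Stage IV(c) for a CPT (rather than for the age-by-gender table) is sketched below. As its "assumption about the interaction", it assumes that age and gender act additively on the log-odds of high satisfaction, so the missing cell of P(Z = High | X, Y) follows from the other three; drawing the observed cells from Beta posteriors then carries their sampling uncertainty into the filled-in value by simulation. The logit model, the uniform priors, the 0/1 cell coding and the counts are all our assumptions, not the authors' procedure.

```python
import math
import random


def logit(p):
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0) at the boundaries
    return math.log(p / (1.0 - p))


def expit(v):
    return 1.0 / (1.0 + math.exp(-v))


def fill_missing_cell(p00, p10, p01):
    """Estimate p11 = P(Z=High | X=1, Y=1) from the three observed cells,
    assuming no X-by-Y interaction on the logit scale:
    logit(p11) = logit(p10) + logit(p01) - logit(p00)."""
    return expit(logit(p10) + logit(p01) - logit(p00))


def simulate_missing_cell(counts, draws=5000, seed=1):
    """Propagate sampling uncertainty into the missing cell by simulation.

    counts maps each observed cell (x, y) to (n_high, n_total). Each cell
    probability is drawn from a Beta(1 + n_high, 1 + n_total - n_high)
    posterior (uniform prior) and pushed through fill_missing_cell.
    Returns the mean and an approximate 90% interval.
    """
    rng = random.Random(seed)
    sims = []
    for _ in range(draws):
        p = {cell: rng.betavariate(1 + k, 1 + n - k) for cell, (k, n) in counts.items()}
        sims.append(fill_missing_cell(p[(0, 0)], p[(1, 0)], p[(0, 1)]))
    sims.sort()
    mean = sum(sims) / draws
    return mean, (sims[int(0.05 * draws)], sims[int(0.95 * draws) - 1])


if __name__ == "__main__":
    # Made-up counts (n_high, n_total) for the three observed cells.
    counts = {(0, 0): (20, 100), (1, 0): (40, 100), (0, 1): (30, 100)}
    print(fill_missing_cell(0.2, 0.4, 0.3))
    print(simulate_missing_cell(counts))
```

With these numbers the point estimate is 8/15 (about 0.53), because under the no-interaction assumption the odds multiply: (0.4/0.6)(0.3/0.7)/(0.2/0.8) = 8/7.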
Stage IV(d): indirect quantification of other nodes by approximation If the number of parent factors for a node is large enough to make the Conditional Probability Table prohibitive to calculate directly, or if for some other reason the missing data in the CPTs is too pervasive to enable estimation by simulation, then approximate methods may be used to complete the CPT. One such method is to ignore interactions and estimate the CPT using marginal probabilities only. In this case it is only necessary to know the ordering of the marginal probabilities and to obtain a set of weights representing the relative importance of each of the parent nodes. These weights can sometimes be calculated from the data, but can also be specified using other information such as ancillary reports, published literature or expert judgement.
Pursuing the running example, suppose that the survey data did not provide direct information about Z, but that it was known that the largest customer satisfaction scores are expected for the ‘old’ \((X=1)\) and ‘female’ \((Y=1)\) group, and the lowest for the ‘young’ \((X=0)\) and ‘male’ \((Y=0)\) group. Further, suppose that age was determined (by previous experience, say) to be twice as important as gender. Then the weights can be specified as \((w_X=2,w_Y=1)\) and the CPT for Z could be approximated by

\[ P(Z = S_Z \mid X, Y) = \frac{w_X\, I(X = S_X) + w_Y\, I(Y = S_Y)}{w_X + w_Y} \]
where \(S_Z\), \(S_X\) and \(S_Y\) represent the states of Z, X and Y, respectively, associated with the highest satisfaction (‘High’, ‘old’ and ‘female’); \(I(X=S_X)\) equals 1 if X is in the ‘old’ state and 0 otherwise; and \(I(Y=S_Y)\) equals 1 if Y is in the ‘female’ state and 0 otherwise. The probability of the remaining (‘Low’) state of Z is one minus this quantity. This equation can be modified in an obvious manner for more than two parent nodes and for nodes with more than two states.
In practice it might be useful to add two terms, \(\delta _0\) and \(\delta _1\), to the above equation to allow, respectively, a non-zero probability for the worst-case scenario for Z (when all parent nodes are in their worst state) and a probability less than 1 for the best-case scenario for Z (when all parent nodes are in their best state). The values of \(\delta _0\) and \(\delta _1\) could again be elicited from experts, or taken from the literature or other studies, in which case experts may need to validate the values.
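The weighted approximation of Stage IV(d) can be tabulated mechanically. The sketch below computes P(Z = High | parents) as the weighted share of parents in their best state and, as one assumed way of adding \(\delta_0\) and \(\delta_1\), rescales that share linearly to the interval \([\delta_0, 1-\delta_1]\). The function and argument names are hypothetical.

```python
from itertools import product


def weighted_cpt(weights, best_states, states, delta0=0.0, delta1=0.0):
    """Approximate P(Z = High | parents) without interaction terms.

    The score is the weighted share of parents in their best state; it is
    rescaled linearly to [delta0, 1 - delta1] so that the worst case keeps
    probability delta0 and the best case reaches 1 - delta1 (an assumption
    about how the two terms enter).
    """
    names = list(weights)
    total = sum(weights.values())
    cpt = {}
    for combo in product(*(states[n] for n in names)):
        score = sum(weights[n] for n, s in zip(names, combo) if s == best_states[n]) / total
        p_high = delta0 + (1.0 - delta0 - delta1) * score
        cpt[combo] = {"High": p_high, "Low": 1.0 - p_high}
    return cpt


if __name__ == "__main__":
    cpt = weighted_cpt(
        weights={"X": 2, "Y": 1},  # age twice as important as gender
        best_states={"X": "old", "Y": "female"},
        states={"X": ("young", "old"), "Y": ("male", "female")},
        delta0=0.05,
        delta1=0.05,
    )
    for combo, dist in cpt.items():
        print(combo, round(dist["High"], 3))
```

With `delta0 = delta1 = 0` the four rows reproduce the 0, 1/3, 2/3 and 1 pattern implied by the equation; the deltas only compress that range, so the ordering of the parent configurations is preserved.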
Stage V: interrogate the Bayesian Network model
Once the Bayesian Network has been constructed and quantified, it can be interrogated for the purposes of obtaining results of interest and also for validation; see Stage VI below. We describe here four examples of interrogation, recognising that other results and inspections may be of interest to the business analyst and manager.
Stage V(a): current status analysis The initial states of the model are recorded as a baseline. This baseline is used to understand the impacts of any change in the model by comparison with the new state.
Stage V(b): sensitivity analysis The responsiveness of each node (variable) in the model is investigated through sensitivity analysis. This analysis is conducted by varying node probabilities in a systematic manner to reveal the trigger points.
Stage V(c): influence analysis The magnitudes of impacts of parent nodes on their respective child nodes are identified through an influence analysis. The analysis results can provide a better understanding of the most significant factors in a decision scenario.
Stage V(d): scenario analysis Scenario analysis, or “what-if” analysis, is conducted by setting evidence for nodes. It helps in assessing the impacts of past changes in the system, and future plans can be developed by assessing the impacts of simulated possible changes.
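The four kinds of interrogation can be illustrated on the three-node network of Fig. 1. The sketch below uses made-up CPTs and exact inference by enumeration, which is practical only for very small networks (dedicated BN software uses more efficient algorithms). The "influence" measure shown, the change in P(Z = High) between a parent's best and worst state, is one simple choice of ours and is not necessarily the measure computed by any particular package.

```python
from itertools import product

# Made-up CPTs for the Fig. 1 network (X = age, Y = gender, Z = satisfaction).
P_X = {"young": 0.6, "old": 0.4}
P_Y = {"male": 0.5, "female": 0.5}
P_Z_HIGH = {("young", "male"): 0.2, ("young", "female"): 0.4,
            ("old", "male"): 0.6, ("old", "female"): 0.8}


def query(var, evidence=None, p_x=P_X):
    """Posterior P(var | evidence), computed by enumerating the joint distribution."""
    evidence = evidence or {}
    totals = {}
    for x, y, z in product(p_x, P_Y, ("High", "Low")):
        world = {"X": x, "Y": y, "Z": z}
        if any(world[k] != v for k, v in evidence.items()):
            continue  # this configuration contradicts the evidence
        p_high = P_Z_HIGH[(x, y)]
        p = p_x[x] * P_Y[y] * (p_high if z == "High" else 1.0 - p_high)
        totals[world[var]] = totals.get(world[var], 0.0) + p
    norm = sum(totals.values())
    return {state: p / norm for state, p in totals.items()}


def influence(parent, best, worst):
    """Crude influence measure: change in P(Z=High) between a parent's best and worst state."""
    return query("Z", {parent: best})["High"] - query("Z", {parent: worst})["High"]


if __name__ == "__main__":
    print("V(a) baseline P(Z=High):", round(query("Z")["High"], 3))
    for p_old in (0.0, 0.25, 0.5, 0.75, 1.0):  # V(b): one-way sensitivity on P(X=old)
        p_x = {"young": 1.0 - p_old, "old": p_old}
        print("V(b)", p_old, round(query("Z", p_x=p_x)["High"], 3))
    print("V(c) influence of age:", round(influence("X", "old", "young"), 3))
    print("V(c) influence of gender:", round(influence("Y", "female", "male"), 3))
    print("V(d) what-if X=old:", round(query("Z", {"X": "old"})["High"], 3))
    print("V(d) P(X=old | Z=High):", round(query("X", {"Z": "High"})["old"], 3))
```

The last query runs in the diagnostic direction: setting evidence on the outcome and reading the updated belief about a cause, which a BN supports as readily as forward prediction.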
Stage VI: validate the Bayesian Network model
Having understood the model’s dynamics, the next step is to validate its correctness within the application domain independently, using a standard validation framework (Pitchforth and Mengersen 2013).
Stage VI(a): validate objective The model’s objective is often based on business objectives defined by domain experts. In some cases the objective may be identified from empirical data but this still requires validation by experts. Importantly, the BN’s outcomes should be reviewed in light of the objective, to ensure that the model is representing the objective adequately and providing outputs that are informative about the objective.
Stage VI(b): validate structure The model’s structure can be examined by domain experts to confirm that the model accurately represents the system of interest. In some cases the network and its subnetworks can be validated against other data or literature.
Stage VI(c): validate quantification The Conditional Probability Tables in the Bayesian Network model can be validated with the help of domain experts and compared with other summarised information or reports if available. The internal consistency of the CPTs can also be evaluated, for example by deleting some nodes and assessing the validity of the collapsed CPTs. If sufficient information is available, the reliability of the CPTs can be assessed using replicated subsamples of the data.
Stage VI(d): validate results The results obtained in the interrogations performed in Stage V can also be validated by domain experts, as well as by comparison with other analytical tools if available. As above, if sufficient data are available, cross-validation can be performed by using subsamples of the data as training and test datasets.
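A minimal sketch of such cross-validation for a single CPT follows, assuming synthetic records generated from a made-up model, Laplace (add-one) smoothing so that held-out records never receive zero probability, and mean held-out log-likelihood as the score. It compares the CPT P(Z | X, Y) with a marginal-only baseline P(Z); a higher (less negative) score indicates better out-of-sample prediction. All names and settings are illustrative assumptions.

```python
import math
import random
from collections import Counter

Z_STATES = ("High", "Low")


def make_records(n, seed=0):
    """Synthetic (x, y, z) records from a made-up model, for illustration only."""
    rng = random.Random(seed)
    p_high = {("young", "male"): 0.2, ("young", "female"): 0.4,
              ("old", "male"): 0.6, ("old", "female"): 0.8}
    records = []
    for _ in range(n):
        x = rng.choice(("young", "old"))
        y = rng.choice(("male", "female"))
        z = "High" if rng.random() < p_high[(x, y)] else "Low"
        records.append((x, y, z))
    return records


def fit(train, use_parents=True, alpha=1.0):
    """Laplace-smoothed estimate of P(Z | X, Y), or of P(Z) alone if use_parents is False."""
    joint, totals = Counter(), Counter()
    for x, y, z in train:
        key = (x, y) if use_parents else ()
        joint[(key, z)] += 1
        totals[key] += 1

    def prob(x, y, z):
        key = (x, y) if use_parents else ()
        return (joint[(key, z)] + alpha) / (totals[key] + alpha * len(Z_STATES))

    return prob


def cross_validate(records, k=5, use_parents=True):
    """Mean held-out log-likelihood per record over k interleaved folds."""
    loglik = 0.0
    for fold in range(k):
        test = records[fold::k]
        train = [r for i, r in enumerate(records) if i % k != fold]
        prob = fit(train, use_parents)
        loglik += sum(math.log(prob(x, y, z)) for x, y, z in test)
    return loglik / len(records)


if __name__ == "__main__":
    data = make_records(2000)
    print("CPT model:     ", round(cross_validate(data), 4))
    print("marginal model:", round(cross_validate(data, use_parents=False), 4))
```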
Stage VII: communicate Bayesian Network results
An important step of the Bayesian Network model development approach is to create the modes by which the BN results will be communicated to business managers and other stakeholders. Three such approaches are suggested here. First, the model itself can be used as an interactive software tool; there are now many software packages for developing a BN in this manner, including Genie (2015), BayesiaLab (2015), Netica (2015) and Hugin (2015). Second, general information templates can be constructed and then tailored for specific situations. These templates may take the form of a report containing the BN model, the CPTs, the analytic results and the validations. Alternatively, a template may be a more concise summary of the outcomes; one option is to create a form of ‘management dashboard’ in which the performances of the key nodes are depicted numerically and/or visually. For example, a traffic light colour coding system could be used, with red, orange and green indicating poor, moderate and good performance, respectively. The general design of such a dashboard is shown in Fig. 2. Third, the BN model can be integrated into existing management tools already used by the business.
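A traffic-light rule of the kind described for the dashboard can be stated in a few lines. The sketch below keys the colour on P(Negative) alone, in line with the case study's emphasis on reducing negative responses; the thresholds, node names and distributions are hypothetical placeholders that carry no meaning for any real model.

```python
# Hypothetical thresholds on P(Negative); they would need to be agreed with managers.
RED_AT = 0.30
ORANGE_AT = 0.15


def traffic_light(dist, red_at=RED_AT, orange_at=ORANGE_AT):
    """Map a node's Positive/Neutral/Negative distribution to a dashboard colour."""
    p_negative = dist["Negative"]
    if p_negative >= red_at:
        return "red"
    if p_negative >= orange_at:
        return "orange"
    return "green"


def dashboard(nodes):
    """Return (node, colour, P(Negative)) rows for a simple text dashboard."""
    return [(name, traffic_light(dist), dist["Negative"]) for name, dist in nodes.items()]


if __name__ == "__main__":
    toy_nodes = {  # made-up distributions, not model outputs
        "Node A": {"Positive": 0.50, "Neutral": 0.40, "Negative": 0.10},
        "Node B": {"Positive": 0.30, "Neutral": 0.50, "Negative": 0.20},
        "Node C": {"Positive": 0.20, "Neutral": 0.40, "Negative": 0.40},
    }
    for name, colour, p_neg in dashboard(toy_nodes):
        print(f"{name:<8} {colour:<7} P(Negative)={p_neg:.2f}")
```

Keeping the rule separate from the model means managers can revise thresholds without touching the BN itself.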
Summary of the approach
Our Bayesian Network model development approach is an iterative process, as depicted in Fig. 3, which can be applied for developing models from survey data. Some surveys reveal the key objective of the survey but others keep their purpose hidden. The BN objective as per Stage I can be defined based on the objective shown on the survey but a domain expert should be consulted to find out if there are any other hidden decision objectives.
Key factors, subsequent factors and their relationships (Stage II) can be identified from the survey questionnaires and summarised reports. Survey questionnaires are often well structured, with questions grouped into categories. A BN model can be structured around such questionnaires, with key factors representing major question groups and lower level factors representing the inner layers of each question group. A well-structured questionnaire can also give the BN modeller clear guidance for identifying relationships between factors, as described in Stage II(d). In particular, the relationships between key factors and their sub-factors are clearly evident in grouped questions. Not all relationships between factors can be identified from the survey structure alone, hence active involvement of domain experts is strongly suggested.
The survey structure may often help to define the initial BN structure as described in Stage III, but the decision perspective of the BN model needs to be clarified through expert involvement. In the case of survey-based BN modelling, the quantification described in Stage IV is often based on the survey data. Relative weights for various factors can often be identified from survey data or summarised data, and probabilities that cannot be estimated directly may be quantified by simulation and approximation.
The BN model is then interrogated as described in Stage V. The interrogation is done systematically rather than arbitrarily; the survey structure and expert advice often help to plan the interrogation so that it tests the business requirements. The validation process described in Stage VI is based on the survey structure, data and expert consultation, and is carried out throughout the whole BN development process. Stage VII encompasses communication during the complete development process.
Case study
As a concrete illustration of the new approach, here we describe a large-scale case study in which a Bayesian Network model was developed from survey data collected by a major public transport utility.
Study objective
Queensland Rail is a large railway transport organisation in Australia which considers Customer Satisfaction one of its most significant business Key Performance Indicators. To gauge current customer satisfaction levels, Queensland Rail uses traditional questionnaire-based survey methods. Although the surveys provide detailed information, they are rarely translated into a model, and hence the inferential capability of the survey data is limited. For example, it is difficult to understand how individual factors affect overall customer satisfaction, to make predictions about satisfaction levels, or to analyse cause-and-effect scenarios in order to plan customer service management. Bayesian Network models can successfully meet these requirements.
Following Stage I of the development approach described above, the principal objective of this model was identified as the level of Customer Satisfaction among Queensland Rail’s customers, which translated into a top level node Customer Satisfaction in the Bayesian Network.
The data made available by Queensland Rail for the purpose of developing the model consisted of responses to a customer questionnaire, in which customers were asked about their experiences of various attributes (factors) of customer service. These factors are currently divided into two levels (Table 1). The survey was conducted with a large number of passengers \((1000{+})\) travelling at peak and off-peak times. Off-peak times are defined as 9:00 am to 3:30 pm and 7:00 pm to 2:00 am the following day on weekdays, and all day on weekends and gazetted public holidays; all other times are considered peak times. For each service factor, the customer responses were categorised as Positive, Neutral or Negative experiences.
Model development
As per Stage II(a), two key factors influencing the customer satisfaction level were identified, Journey Components and Holistic Components. Journey Components refers to the factors which have direct impact on the travel experience. Holistic Components are indirect factors about the organisation and environment that have significant impacts on customer satisfaction.
Following Stage II(b), the remaining factors relevant to the model were identified using Table 1 and a series of interviews with experts at Queensland Rail. The Journey Components node was deemed to be influenced by four factors: (a) Carriage, (b) Station Facility, (c) Operation Information and (d) Other. The Holistic Components node has two influencing factors: (a) Service Factors and (b) Passenger Factors. A similar iterative process was applied to identify the remaining lower level factors.
As per Stage II(c), each node was designed with three probability states, Positive, Neutral and Negative. The definitions of these states differed for each factor and were determined in collaboration with Queensland Rail experts and in light of the available data.
Stage II(d) was then applied to establish the relationships between the factors in the model. The complete model with all the factors and their relationships is shown in Fig. 4.
Formalisation of model structure
We developed the model structure based on the decision perspectives. In the model shown in Fig. 4, node Station Facilities has two influencing factors, CBD Facilities and Suburban Facilities, each of which has three parent nodes: Station, Platform and Access to Station. The current structure is appropriate if the manager (decision maker) is interested in the status of current facilities in the city and suburban locations. However, if the manager requires knowledge about the current status of platforms, the station or access to the station, then this model will need to be restructured. A possible alternative model structure for Station Facilities in this style is shown in Fig. 5.
The Genie software tool (Genie 2015) was used to develop the Bayesian Network models in this study. The nodes were first drawn based on the objective and factors identified in Stages I, II(a) and II(b). Next, the relationship arrows were drawn as per the relationships identified in Stage II(d). Once the relationships were established, Genie automatically created blank Conditional Probability Tables, which were then populated following the quantification approach of Stage IV.
Model quantification
Following Stage IV of the model development approach, we constructed the Conditional Probability Tables for each node and estimated the probabilities. For the nodes without parents (terminal nodes), the probability tables were quantified using the proportion of customers with Positive, Neutral and Negative satisfaction responses in the survey data; see Table 2. Weights for these nodes were determined as 0.52, 0.16 and 0.32; these were provided by Queensland Rail experts based on past analytical knowledge. The weights were used to approximate the Conditional Probability Table for the node CBD Facilities, as shown in Table 3. The CPTs for each of the nodes in the BN were calculated in a similar manner. The fully quantified model is shown in Fig. 6.
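The three weights quoted above can be turned into a three-state CPT in the spirit of Stage IV(d). The sketch below assumes that (i) the weights 0.52, 0.16 and 0.32 belong to Station, Platform and Access to Station, in that order; (ii) each parent state maps to the child state with the same label; and (iii) parents do not interact. It is an illustration under those assumptions and is not claimed to reproduce Table 3. A second helper shows that, with independent parents, the child's marginal is then simply the weighted average of the parents' marginals.

```python
from itertools import product

STATES = ("Positive", "Neutral", "Negative")


def weighted_cpt(weights, states=STATES):
    """P(child = s | parents) = share of total weight held by parents in state s.

    Assumes parent and child share the same state labels and ignores
    interactions between parents.
    """
    names = list(weights)
    total = sum(weights.values())
    cpt = {}
    for combo in product(states, repeat=len(names)):
        cpt[combo] = {
            s: sum(weights[n] for n, c in zip(names, combo) if c == s) / total
            for s in states
        }
    return cpt


def child_marginal(cpt, parent_marginals):
    """P(child), summing over parent configurations (parents assumed independent)."""
    out = dict.fromkeys(STATES, 0.0)
    for combo, row in cpt.items():
        w = 1.0
        for marg, state in zip(parent_marginals, combo):
            w *= marg[state]
        for s in STATES:
            out[s] += w * row[s]
    return out


# Assumed assignment of the quoted weights to the parents of CBD Facilities.
WEIGHTS = {"Station": 0.52, "Platform": 0.16, "Access to Station": 0.32}

if __name__ == "__main__":
    cpt = weighted_cpt(WEIGHTS)
    print(cpt[("Positive", "Negative", "Negative")])
```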
Model interrogation
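We then tested the complete model in Fig. 6 by interrogating it through a current status assessment, a sensitivity analysis, an influence analysis and a series of scenario analyses.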
Current status assessment. Figure 6 shows the initial probabilities for all the nodes in the model. After the probabilities are propagated through the BN, the topmost node, Customer Satisfaction, has positive/neutral/negative probabilities of 0.31/0.56/0.14, and the two key nodes Journey Components and Holistic Components have positive/neutral/negative probabilities of 0.32/0.52/0.16 and 0.30/0.60/0.10, respectively. Across the model, the “neutral” state generally has the highest probability. According to Queensland Rail experts, a “neutral” response generally means the service is at an acceptable level, and reducing the probability of a “negative” score is their key focus.
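Propagation can be illustrated for a single node. Assuming the parents of a node are independent (true when they share no common ancestor) and, hypothetically, that its CPT is a weighted sum of parent states, the child's marginal is obtained by summing over all parent configurations. The sketch uses the reported Journey Components and Holistic Components marginals as inputs, but the weights 0.6 and 0.4 are invented, so the output is not the model's Customer Satisfaction value.

```python
from itertools import product

STATES = ("Positive", "Neutral", "Negative")


def child_marginal(parent_marginals, weights):
    """Marginal of a child node, summing over all parent-state combinations.

    Assumes independent parents and a hypothetical weighted-sum CPT in which
    P(child = s | configuration) is the total weight of parents in state s.
    """
    result = {s: 0.0 for s in STATES}
    for config in product(STATES, repeat=len(parent_marginals)):
        # Joint probability of this configuration of parent states.
        p_config = 1.0
        for marginal, state in zip(parent_marginals, config):
            p_config *= marginal[state]
        # Spread that probability over child states according to the CPT row.
        for weight, state in zip(weights, config):
            result[state] += p_config * weight
    return result


journey = {"Positive": 0.32, "Neutral": 0.52, "Negative": 0.16}   # reported
holistic = {"Positive": 0.30, "Neutral": 0.60, "Negative": 0.10}  # reported
marginal = child_marginal([journey, holistic], [0.6, 0.4])        # weights invented
print({s: round(p, 3) for s, p in marginal.items()})
# {'Positive': 0.312, 'Neutral': 0.552, 'Negative': 0.136}
```

Under a weighted-sum CPT the enumeration collapses to a weighted average of the parent marginals, which is a quick sanity check; BN software such as GeNIe uses general-purpose inference algorithms instead.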
Sensitivity analysis. We conducted a sensitivity analysis consisting of repeated modifications to all of the CPTs in order to determine the relative influence of these changes on the key nodes in the BN. The analysis showed the greatest sensitivity for the nodes Service Factors and Suburban Facilities.
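A one-at-a-time sensitivity analysis of this kind can be sketched as follows: move a fixed amount of probability from Negative to Positive in one input node, hold the others fixed, and record the change in the output's Negative probability. This is a simplified stand-in (perturbing root-node distributions under a hypothetical weighted-sum CPT), not the procedure built into GeNIe, and the node marginals and weights are illustrative.

```python
STATES = ("Positive", "Neutral", "Negative")


def child_marginal(parent_marginals, weights):
    # Under a (hypothetical) weighted-sum CPT with independent parents, the
    # child's marginal is the weighted average of the parent marginals.
    return {
        s: sum(w * m[s] for w, m in zip(weights, parent_marginals)) for s in STATES
    }


def one_at_a_time_sensitivity(parent_marginals, weights, delta=0.05):
    """Change in P(child = Negative) when each parent in turn moves `delta`
    probability from Negative to Positive, with the other parents held fixed."""
    base = child_marginal(parent_marginals, weights)["Negative"]
    effects = []
    for i, marginal in enumerate(parent_marginals):
        shifted = dict(marginal)
        moved = min(delta, shifted["Negative"])  # keep probabilities non-negative
        shifted["Negative"] -= moved
        shifted["Positive"] += moved
        perturbed = list(parent_marginals)
        perturbed[i] = shifted
        effects.append(child_marginal(perturbed, weights)["Negative"] - base)
    return effects


# Illustrative inputs: three parents with the same hypothetical marginal.
parents = [{"Positive": 0.3, "Neutral": 0.5, "Negative": 0.2}] * 3
effects = one_at_a_time_sensitivity(parents, [0.52, 0.16, 0.32])
print([round(e, 3) for e in effects])  # [-0.026, -0.008, -0.016]
```

In this linear toy setting the effect of each input is proportional to its weight; in the real model the responses depend on the full CPTs.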
Influence analysis. An influence analysis of this model revealed the following:

1. The Journey Components node has a stronger influence on the Customer Satisfaction node than the Holistic Components node.
2. The Station Facilities node has the strongest influence on the Journey Components node.
3. The Ticketing, Bus Connection and Affordability nodes have the strongest influences on their upper-level nodes Operation Information, Other and Passenger Factors, respectively.
Figure 9 in “Appendix” shows the results of the influence analysis on the Queensland Rail Bayesian Network (thicker lines indicate stronger influence).
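Arc thickness in an influence display of this kind is typically derived from a distance between a child's conditional probability distributions as one parent's state varies. The sketch below assumes one such measure: for each configuration of the other parents, take the largest Euclidean distance between the child's conditional distributions, then average over configurations. The exact measure behind Fig. 9 is not stated, and the weighted-sum CPT and the assignment of the weights 0.52/0.16/0.32 to three parents are assumptions.

```python
from itertools import product
from math import dist  # Python 3.8+

STATES = ("Positive", "Neutral", "Negative")


def weighted_sum_row(weights, config):
    """CPT row under a hypothetical weighted-sum scheme, as a list over STATES."""
    row = [0.0] * len(STATES)
    for w, state in zip(weights, config):
        row[STATES.index(state)] += w
    return row


def influence_strength(weights, parent):
    """Average, over configurations of the other parents, of the largest
    Euclidean distance between the child's conditional distributions obtained
    by varying only `parent`."""
    others = [i for i in range(len(weights)) if i != parent]
    distances = []
    for other_states in product(STATES, repeat=len(others)):
        rows = []
        for state in STATES:
            config = [None] * len(weights)
            config[parent] = state
            for i, s in zip(others, other_states):
                config[i] = s
            rows.append(weighted_sum_row(weights, config))
        distances.append(max(dist(a, b) for a in rows for b in rows))
    return sum(distances) / len(distances)


weights = [0.52, 0.16, 0.32]
print([round(influence_strength(weights, i), 3) for i in range(3)])
# [0.735, 0.226, 0.453]
```

With a weighted-sum CPT each parent's strength works out to its weight times √2, so the ranking simply follows the weights; with CPTs estimated from data, the measure can rank parents differently.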
Scenario analyses. We conducted a series of analyses by changing values of the CPTs according to scenarios defined by Queensland Rail managers and observing the resulting changes in probabilities throughout the model. Five such scenarios are presented below. Figure 7 shows the initial values for the three top-level nodes; see Fig. 6 for the complete set of values for all the nodes.
Scenario 1: increased train frequency. Assume that Queensland Rail has added a number of new trains to increase service frequency, and that a new survey would show a significant increase in Positive responses for Frequency. We introduced this evidence into the initial model by setting the node Frequency to positive/neutral/negative equal to 0.74/0.23/0.03. As shown in Fig. 8a, the top nodes Customer Satisfaction, Journey Components and Holistic Components changed to 0.33/0.54/0.13, 0.32/0.52/0.16 and 0.35/0.56/0.09, respectively. The Journey Components node was unaffected because the Frequency node has no influence on it, whereas there were positive impacts on Holistic Components and Customer Satisfaction. Complete results for the model are provided in Fig. 10 in “Appendix”.
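The observation that Journey Components is unaffected by a change to Frequency can be checked on a toy hierarchy. In the sketch below, only the three top-level node names, the Frequency node and the Scenario 1 distribution (0.74/0.23/0.03) come from the text; the placeholder nodes, weighted-sum CPTs, weights and baseline priors are hypothetical. Changing the distribution of a parentless node alters only its descendants, so the branch that does not contain Frequency keeps exactly the same marginal.

```python
STATES = ("Positive", "Neutral", "Negative")

# Toy hierarchy: child -> {parent: weight}, each child using a weighted-sum CPT.
# "Journey factor A/B" and "Other holistic factor" are hypothetical placeholders,
# and all weights are invented for illustration.
STRUCTURE = {
    "Journey Components": {"Journey factor A": 0.6, "Journey factor B": 0.4},
    "Holistic Components": {"Frequency": 0.5, "Other holistic factor": 0.5},
    "Customer Satisfaction": {"Journey Components": 0.5, "Holistic Components": 0.5},
}
# Non-root nodes, ordered so that every parent is computed before its child.
ORDER = ["Journey Components", "Holistic Components", "Customer Satisfaction"]


def propagate(priors):
    """Marginals for every node, given distributions for the parentless nodes.

    Valid here because the toy structure is a tree, so each node's parents
    are independent and the weighted-sum CPT reduces to a weighted average.
    """
    marginals = dict(priors)
    for node in ORDER:
        marginals[node] = {
            s: sum(w * marginals[p][s] for p, w in STRUCTURE[node].items())
            for s in STATES
        }
    return marginals


default_prior = {"Positive": 0.3, "Neutral": 0.5, "Negative": 0.2}  # hypothetical
baseline_priors = {
    name: default_prior
    for name in ("Journey factor A", "Journey factor B", "Frequency", "Other holistic factor")
}
# Scenario 1: replace the Frequency distribution with the value used in the text.
scenario_priors = dict(
    baseline_priors, Frequency={"Positive": 0.74, "Neutral": 0.23, "Negative": 0.03}
)

baseline = propagate(baseline_priors)
scenario = propagate(scenario_priors)
for node in ("Customer Satisfaction", "Journey Components", "Holistic Components"):
    before = round(baseline[node]["Positive"], 3)
    after = round(scenario[node]["Positive"], 3)
    print(f"{node}: P(Positive) {before} -> {after}")
```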
Scenario 2: fare increases. Suppose that, due to a recent increase in travel costs, affordability has decreased, leading to a substantial increase in Negative responses for this component of the survey. The evidence for Affordability was set as positive/neutral/negative equal to 0.07/0.13/0.8. As shown in Fig. 8b, the top nodes Customer Satisfaction, Journey Components and Holistic Components changed to 0.3/0.53/0.17, 0.32/0.52/0.16 and 0.27/0.52/0.21, respectively. The evidence had a significant negative impact on the Holistic Components node and a moderate negative impact on Customer Satisfaction, while Journey Components was unchanged. This result is consistent with the Influence Analysis results (see Fig. 9 in “Appendix”), which identified Affordability as a high-impact factor. The impacts of the change on the complete model are shown in Fig. 11.
Scenario 3: infrastructure improvements. Assume that significant improvements have been made to the suburban stations, boosting customer satisfaction with them; the probabilities for the Station Sub node are therefore set at positive/neutral/negative equal to 0.8/0.16/0.04. As shown in Fig. 8c, the top nodes Customer Satisfaction, Journey Components and Holistic Components changed to 0.33/0.54/0.13, 0.34/0.51/0.15 and 0.3/0.6/0.1, respectively. The changes to Station Sub produced small increases in Customer Satisfaction and Journey Components and left Holistic Components unchanged. This result highlights that the node Station Sub has relatively little influence on higher-level factors, as can be verified from the Influence Analysis results shown in Fig. 9 in “Appendix”. The results for the complete model are shown in Fig. 12.
Scenario 4: improved infrastructure but decreased safety. In this scenario we assume that several factors have changed at once: passenger safety has deteriorated, but the CBD stations have been upgraded and a new ticketing system has improved customer service. The evidence is thus set as positive/neutral/negative equal to 0.86/0.14/0.0 for Station CBD, 0.71/0.28/0.01 for Ticketing and 0.09/0.73/0.18 for Safety. As shown in Fig. 8d, the top nodes Customer Satisfaction, Journey Components and Holistic Components changed to 0.36/0.51/0.13, 0.4/0.45/0.15 and 0.28/0.62/0.1, respectively. The result shows a significant increase in positive responses for Journey Components and Customer Satisfaction but a slight decrease for Holistic Components. The results for the complete model are shown in Fig. 13 in “Appendix”.
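Since Queensland Rail's focus is on reducing negative responses, it can help to tabulate each scenario's effect relative to the baseline. The snippet below only does arithmetic on the rounded values reported for Scenarios 1 to 4 (it does not rerun the model), so changes of ±0.01 are at the limit of the reported precision.

```python
# Reported (Positive, Neutral, Negative) probabilities for the three top nodes.
BASELINE = {
    "Customer Satisfaction": (0.31, 0.56, 0.14),
    "Journey Components": (0.32, 0.52, 0.16),
    "Holistic Components": (0.30, 0.60, 0.10),
}
SCENARIOS = {
    "Scenario 1": {
        "Customer Satisfaction": (0.33, 0.54, 0.13),
        "Journey Components": (0.32, 0.52, 0.16),
        "Holistic Components": (0.35, 0.56, 0.09),
    },
    "Scenario 2": {
        "Customer Satisfaction": (0.30, 0.53, 0.17),
        "Journey Components": (0.32, 0.52, 0.16),
        "Holistic Components": (0.27, 0.52, 0.21),
    },
    "Scenario 3": {
        "Customer Satisfaction": (0.33, 0.54, 0.13),
        "Journey Components": (0.34, 0.51, 0.15),
        "Holistic Components": (0.30, 0.60, 0.10),
    },
    "Scenario 4": {
        "Customer Satisfaction": (0.36, 0.51, 0.13),
        "Journey Components": (0.40, 0.45, 0.15),
        "Holistic Components": (0.28, 0.62, 0.10),
    },
}


def change_from_baseline(scenario):
    """Per-node change in (Positive, Neutral, Negative) relative to the baseline."""
    return {
        node: tuple(round(new - old, 2) for new, old in zip(values, BASELINE[node]))
        for node, values in scenario.items()
    }


for name, values in SCENARIOS.items():
    delta = change_from_baseline(values)["Customer Satisfaction"]
    print(f"{name}: Positive {delta[0]:+.2f}, Negative {delta[2]:+.2f}")
# Scenario 1: Positive +0.02, Negative -0.01
# Scenario 2: Positive -0.01, Negative +0.03
# Scenario 3: Positive +0.02, Negative -0.01
# Scenario 4: Positive +0.05, Negative -0.01
```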
Scenario 5: improving customer satisfaction. Rather than changing low-level factors and observing the effect, in this scenario we choose a target for a high-level factor and determine how to achieve it. Assume that the manager aims to increase positive responses for the business objective (top-level node) Customer Satisfaction and would like to know what should be improved to achieve this aim. The target evidence for Customer Satisfaction is set to positive/neutral/negative equal to 1/0/0 and the model is interrogated in an inverse manner. As shown in Fig. 8e, the probabilities required for the Journey Components and Holistic Components factors are 0.45/0.48/0.08 and 0.36/0.57/0.07, respectively. The results indicate that achieving significant improvements in Customer Satisfaction requires major improvements in the Journey Components factor. This complements the results of the Influence Analysis, in which Journey Components was found to be more influential than Holistic Components. Results for the complete model can be viewed in Fig. 14 in “Appendix”.
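Inverse interrogation is an application of Bayes' rule: fix the child's state and compute the posterior distribution of each parent. A minimal enumeration sketch follows, again assuming independent parents and a hypothetical weighted-sum CPT with invented weights 0.6 and 0.4. It illustrates the mechanism only and will not reproduce the Fig. 8e values, which come from the full model's CPTs.

```python
from itertools import product

STATES = ("Positive", "Neutral", "Negative")


def posterior_parents(parent_marginals, weights, target="Positive"):
    """P(parent_i = s | child = target) for each parent, by enumeration.

    Assumes independent parents and a hypothetical weighted-sum CPT in which
    P(child = target | configuration) is the total weight of parents in `target`.
    """
    joint = {}
    for config in product(STATES, repeat=len(parent_marginals)):
        prior = 1.0
        for marginal, state in zip(parent_marginals, config):
            prior *= marginal[state]
        likelihood = sum(w for w, state in zip(weights, config) if state == target)
        joint[config] = prior * likelihood
    evidence = sum(joint.values())  # P(child = target)
    posteriors = []
    for i in range(len(parent_marginals)):
        post = {s: 0.0 for s in STATES}
        for config, p in joint.items():
            post[config[i]] += p / evidence
        posteriors.append(post)
    return posteriors


journey = {"Positive": 0.32, "Neutral": 0.52, "Negative": 0.16}   # reported
holistic = {"Positive": 0.30, "Neutral": 0.60, "Negative": 0.10}  # reported
post_journey, post_holistic = posterior_parents([journey, holistic], [0.6, 0.4])
print({s: round(p, 3) for s, p in post_journey.items()})
# {'Positive': 0.738, 'Neutral': 0.2, 'Negative': 0.062}
```

The parent with the larger weight shifts further towards Positive, which mirrors, in simplified form, why the more influential Journey Components node needs the larger improvement.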
Validation
To validate the model’s structure, we conducted a series of interviews with Queensland Rail experts in the Customer Service department, who confirmed that the factors (nodes) and their relationships represent the functional structure of the department. The Bayesian Network’s structure also closely matches the structure of Queensland Rail’s survey questionnaire, so the experts’ experience with that questionnaire provides further support for the appropriateness of the BN model.
The conditional probability tables represent the quantified model. CPTs were developed based on actual survey data currently used at Queensland Rail for performance evaluations. Expert confirmation was also obtained regarding the conformity of the model data with current practices at Queensland Rail.
The results of the BN analyses were validated using two approaches: expert confirmation and comparison with an existing tool. Experts with many years of experience in Queensland Rail’s Customer Service department confirmed that the Sensitivity Analysis and Influence Analysis results were as they expected, and that the Scenario Analysis results were within their expected ranges. A spreadsheet-based tool is currently used to calculate overall Customer Satisfaction at Queensland Rail; the results obtained from our BN model were similar to those of this tool but provided more detailed information. Overall, the model was found to meet Queensland Rail’s requirements for customer satisfaction analysis.
Information communication
We communicated the results of this study to Queensland Rail as a software tool developed using GeNIe (Genie 2015). The tool demonstrated the interactive features of the model and gave Queensland Rail managers an opportunity to use the models in practice. A detailed report was also provided explaining the model development process along with the various analysis results. Throughout the model development stages, information was freely exchanged between Queensland Rail stakeholders and the researchers.
Conclusion
The innovation of this paper is the formalisation and presentation of a general approach for developing Bayesian Network models based on survey data and expert opinion. The approach will provide practical guidance to both researchers and industry practitioners in developing BN complex system models. The resulting models allow the factors influencing enterprise-level Key Performance Indicators to be understood, analysed and compared on a uniform measurement scale. The public transport case study described herein demonstrates that the approach can be applied to large-scale business operations.
The survey-data-based models produced by our approach can be used by managers as a powerful decision support aid. For example, high-impact factors in a Bayesian Network can be identified using an influence analysis; sensitivity and influence analyses can help in understanding the responsiveness of BN nodes; and scenario analyses can provide a deeper understanding of how the system of interest may respond to changes in the business environment.
For the Queensland Rail case study, through scenarios 1, 2 and 3 we demonstrated the effects on the generated Bayesian model when a single factor is changed. The results showed that any probability change in a particular node will affect all the subsequent child nodes but other parallel nodes and their child nodes will remain unaffected. In scenario 4 we showed the model’s capabilities in analysing changes in multiple nodes. This analytical capacity makes BN models a powerful tool for decision support. A manager can use such a tool in two ways:

(a) As a planning tool, whereby the manager can create hypothetical scenarios and simulate the outcomes before finalising an action plan. This provides the manager with quantitative and visual comparisons between decision options.
(b) As a performance management tool, whereby the manager evaluates changes in overall performance based on completed actions. This provides the manager with the capacity to compare planned and achieved goals.
Through scenario 5 we then demonstrated the versatility of BN models in decision support. With the inverse interrogation capacity, the manager can set the business goals to be achieved, based on business constraints such as time and cost. The model can then be investigated to identify the most fruitful areas for improvement in order to achieve the predefined business goals.
References
Anderson RD, Mackoy RD, Thompson VB, Harrell G. A Bayesian network estimation of the service-profit chain for transport service satisfaction. Decis Sci. 2004;35(4):665–89.
Bayesia: BayesiaLab software home page: http://www.bayesia.com/en/index.php. 2015.
Cai Z, Sun S, Si S, Yannou B. Identifying product failure rate based on a conditional bayesian network classifier. Exp Syst Appl. 2011;38(5):5036–43. doi:10.1016/j.eswa.2010.09.146.
Chatterjee S, Hadi AS. Computational considerations. New York: Wiley; 2008. p. 281–296. http://dx.doi.org/10.1002/9780470316764.ch9.
Cho S. A linear bayesian stochastic approximation to update project duration estimates. Eur J Oper Res. 2009;196(2):585–93.
Cochran WG. Sampling techniques. 3rd ed. New York: Wiley; 1977.
Daly R, Shen Q, Aitken S. Learning Bayesian networks: approaches and issues. Knowl Eng Rev. 2011;26(02):99–157.
De’ath G, Fabricius KE. Classification and regression trees: a powerful yet simple technique for ecological data analysis. Ecology. 2000;81(11):3178–92.
Decision Systems Laboratory: Genie software home page. http://genie.sis.pitt.edu/. 2015.
de Melo ACV, Sanchez AJ. Software maintenance project delays prediction using bayesian networks. Exp Syst Appl. 2008;34(2):908–19. doi:10.1016/j.eswa.2006.10.040.
Denham R, Falk M, Mengersen K. The Bayesian conditional independence model for measurement error: applications in ecology. Environ Ecol Stat. 2011;18:239–55.
de Oña J, de Oña R, Eboli L, Mazzulla G. Perceived service quality in bus transit service: a structural equation approach. Transp Policy. 2013;29:219–26.
Dogan I. Analysis of facility location model using bayesian networks. Exp Syst Appl. 2012;39(1):1092–104. doi:10.1016/j.eswa.2011.07.109.
Donald M, Cook A, Mengersen K. Bayesian networks for risk of diarrhea associated with the use of recycled water. Risk Anal. 2009;29(12):1672–85.
Eboli L, Mazzulla G. A methodology for evaluating transit service quality based on subjective and objective measures from the passenger’s point of view. Transp Policy. 2011;18(1):172–81.
Falzon L. Using bayesian network analysis to support centre of gravity analysis in military planning. Eur J Oper Res. 2006;170(2):629–43.
Ferreira L, Borenstein D. A fuzzy-Bayesian model for supplier selection. Exp Syst Appl. 2012;39(9):7834–44. doi:10.1016/j.eswa.2012.01.068.
Ferreiro S, Arnaiz A, Sierra B, Irigoien I. Application of bayesian networks in prognostics for a new integrated vehicle health management concept. Exp Syst Appl. 2012;39(7):6402–18. doi:10.1016/j.eswa.2011.12.027.
Fornell C, Johnson MD, Anderson EW, Cha J, Bryant BE. The American customer satisfaction index: nature, purpose, and findings. J Mark. 1996;60(4):7–18.
Gasparini M, Pellerey F, Proietti M. Bayesian hierarchical models to analyze customer satisfaction data for quality improvement: a case study. Appl Stoch Models Bus Ind. 2011;28(6):571–84.
Hackl P, Westlund AH. On structural equation modelling for customer satisfaction measurement. Total Qual Manag. 2000;11(4–6):820–5.
Heckerman D, Geiger D, Chickering DM. Learning Bayesian networks: the combination of knowledge and statistical data. Mach Learn. 1995;20:197–243.
Hsieh JCF, Cramb SM, McGree JM, Baade PD, Dunn NA, Mengersen KL. Bayesian spatial analysis for the evaluation of breast cancer detection methods. Aust N Z J Stat. 2013;55(4):351–67.
Hsu CI, Shih ML, Huang BW, Lin BY, Lin CN. Predicting tourism loyalty using an integrated bayesian network mechanism. Exp Syst Appl. 2009;36(9):11760–3. doi:10.1016/j.eswa.2009.04.010.
Hugin Expert A/S: Hugin software home page. http://www.hugin.com/. 2015.
Janssens D, Wets G, Brijs T, Vanhoof K, Arentze T, Timmermans H. Integrating bayesian networks and decision trees in a sequential rule-based transportation model. Eur J Oper Res. 2006;175(1):16–34.
Jensen FV, Nielsen TD. Bayesian networks and decision graphs. 2nd ed. Berlin: Springer; 2007.
Johansson F, Falkman G. A bayesian network approach to threat evaluation with application to an air defense scenario. In: 2008 11th International conference on information fusion, 2008; p. 1–7.
Johnson S, Fielding F, Hamilton G, Mengersen K. An integrated Bayesian network approach to lyngbya majuscula bloom initiation. Mar Environ Res. 2010;69(1):27–37.
Johnson MD, Gustafsson A, Andreassen TW, Lervik L, Cha J. The evolution and future of national customer satisfaction index models. J Econ Psychol. 2001;22(2):217–45.
Johnson S, Mengersen K, de Waal A, Marnewick K, Cilliers D, Houser AM, Boast L. Modelling cheetah relocation success in southern Africa using an iterative Bayesian network development cycle. Ecol Model. 2010;221(4):641–51.
Kenett RS, Salini S. Modern analysis of customer satisfaction surveys: comparison of models and integrated analysis. Appl Stoch Models Bus Ind. 2011;27(5):465–75.
Kisioglu P, Topcu YI. Applying bayesian belief network approach to customer churn analysis: a case study on the telecom industry of turkey. Exp Syst Appl. 2011;38(6):7151–7. doi:10.1016/j.eswa.2010.12.045.
Koller D, Pfeffer A. Object-oriented Bayesian networks. In: Proceedings of the thirteenth conference on uncertainty in artificial intelligence. UAI’97, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA; 1997. p. 302–13. http://dl.acm.org/citation.cfm?id=2074226.2074262.
Kristensen K, Martensen A, Gronholdt L. Customer satisfaction measurement at Post Denmark: results of application of the European Customer Satisfaction Index methodology. Total Qual Manag. 2000;11(7):1007–15.
Lariviere MA, Porteus EL. Stalking information: Bayesian inventory management with unobserved lost sales. Manag Sci. 1999;45(3):346–63.
Lee E, Park Y, Shin JG. Large engineering project risk management using a bayesian belief network. Exp Syst Appl. 2009;36(3, Part 2):5880–7. doi:10.1016/j.eswa.2008.07.057.
McColl-Kennedy J, Schneider U. Measuring customer satisfaction: why, what and how. Total Qual Manag. 2000;11(7):883–96.
McGree JM, Drovandi CC, Thompson MH, Eccleston JA, Duffull SB, Mengersen K, Pettitt AN, Goggin T. Adaptive Bayesian compound designs for dose finding studies. J Stat Plan Inference. 2012;142(6):1480–92.
Medina LA, Jankovic M, Kremer GEO, Yannou B. An investigation of critical factors in medical device development through bayesian networks. Exp Syst Appl. 2013;40(17):7034–45. doi:10.1016/j.eswa.2013.06.014.
Murphy K, Mian S. Modelling gene expression data using dynamic Bayesian networks. Technical report, University of California, Berkeley. 1999.
Nadkarni S, Shenoy PP. A Bayesian network approach to making inferences in causal maps. Eur J Oper Res. 2001;128(3):479–98.
Nadkarni S, Shenoy PP. A causal mapping approach to constructing Bayesian networks. Dec Support Syst. 2004;38(2):259–81.
Norsys Software Corp: Netica software home page. http://www.norsys.com/netica.html. 2015.
Onsel Sahin S, Ulengin F, Ulengin B. A bayesian causal map for inflation analysis: the case of turkey. Eur J Oper Res. 2006;175(2):1268–84.
Ozbay K, Noyan N. Estimation of incident clearance times using Bayesian networks approach. Accid Anal Prev. 2006;38(3):542–55.
Pearl J. Probabilistic reasoning in intelligent systems. San Francisco: Morgan Kaufmann Publishers Inc; 1988.
Perucca G, Salini S. Travellers’ satisfaction with railway transport: a bayesian network approach. Qual Technol Quant Manag. 2014;11(1):71–84.
Pitchforth J, Mengersen K. A proposed validation framework for expert elicited bayesian networks. Exp Syst Appl. 2013;40(1):162–7. doi:10.1016/j.eswa.2012.07.026.
Radice R. A bayesian approach to modelling reticulation events with application to the ribosomal protein gene rps11 of flowering plants. Aust N Z J Stat. 2012;54(4):401–26.
Robertson D, Wang QJ. Bayesian networks for decision analyses—an application to irrigation system selection. Aust J Exp Agric. 2004;44(2):145–50.
Salini S, Kenett RS. Bayesian networks of customer satisfaction survey data. J Appl Stat. 2009;36(11):1177–89.
Scheaffer RL, Mendenhall WI, Ott RL, Gerow K. Elementary survey sampling. 7th ed. Boston: Cengage Learning; 2011.
Siskos Y, Grigoroudis E, Zopounidis C, Saurais O. Measuring customer satisfaction using a collective preference disaggregation model. J Glob Optim. 1998;12:175–95.
Sun L, Shenoy PP. Using bayesian networks for bankruptcy prediction: some methodological issues. Eur J Oper Res. 2007;180(2):738–53.
Ticehurst JL, Newham LTH, Rissik D, Letcher RA, Jakeman AJ. A Bayesian network approach for assessing the sustainability of coastal lakes in new south wales, australia. Environ Model Softw. 2007;22(8):1129–39.
Ting SC, Chen CN. The asymmetrical and nonlinear effects of store quality attributes on customer satisfaction. Total Qual Manag. 2002;13(4):547–69.
Trucco P, Cagno E, Ruggeri F, Grande O. A Bayesian belief network modelling of organisational factors in risk analysis: a case study in maritime transportation. Reliab Eng Syst Saf. 2008;93(6):845–56.
Turkyilmaz A, Oztekin A, Zaim S, Demirel OF. Universal structure modeling approach to customer satisfaction index. Ind Manag Data Syst. 2013;113(7):932–49.
Uusitalo L. Advantages and challenges of Bayesian networks in environmental modelling. Ecol Model. 2007;203(3–4):312–8.
Yu J, Smith VA, Wang PP, Hartemink AJ, Jarvis ED. Advances to Bayesian network inference for generating causal networks from observational biological data. Bioinformatics. 2004;20(18):3594–603.
Authors' contributions
All the authors contributed equally. All authors read and approved the final manuscript.
Acknowledgements
This paper was developed within the CRC for Infrastructure and Engineering Asset Management, established and supported under the Australian Government’s Cooperative Research Centres Programme. The authors gratefully acknowledge the financial support provided by the CRC.
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Keywords
 Decision making
 Bayesian modelling
 Management tools
 Customer satisfaction
 Transportation