Supervised classification with interdependent variables to support targeted energy efficiency measures in the residential sector
 Mariya Sodenkamp^{1},
 Ilya Kozlovskiy^{1} and
 Thorsten Staake^{1, 2}
DOI: 10.1186/s40165-015-0018-2
© Sodenkamp et al. 2016
Received: 7 November 2015
Accepted: 21 November 2015
Published: 27 January 2016
Abstract
This paper presents a supervised classification model in which the indicators of correlation between dependent and independent variables within each class are utilized to transform the large-scale input data to a lower dimension without loss of recognition-relevant information. In the case study, we use consumption data recorded by smart electricity meters of 4200 Irish dwellings, along with half-hourly outdoor temperature, to derive 12 household properties (such as type of heating, floor area, age of house, number of inhabitants, etc.). Survey data containing the characteristics of 3500 households enables algorithm training. The results show that the presented model outperforms ordinary classifiers with regard to accuracy and temporal characteristics. The model allows incorporating any kind of data affecting the energy consumption time series, or, in the more general case, any data affecting the class-dependent variable, while minimizing the risk of the curse of dimensionality. The gained information on household characteristics makes targeted energy-efficiency measures by utility companies and public bodies possible.
Keywords
Energy consumption, Household characteristics, Energy efficiency, Consumer behaviour, Pattern recognition, Multivariate analysis, Interdependent variables
Background
Reducing energy consumption is the best sustainable long-term answer to the challenges associated with increasing demand for energy, fluctuating oil prices, uncertain energy supplies, and fears of global warming (European Commission 2008; Wenig et al. 2015). Since the household sector represents around 30 % of final global energy consumption (International Energy Agency 2014), customized energy-efficiency measures can contribute significantly to reducing air pollution and carbon emissions while supporting economic growth. There is a wide range of potential courses of action that can encourage energy efficiency in dwellings, including flexible pricing schemes, load shifting, and direct feedback mechanisms (Sodenkamp et al. 2015). The major challenge, however, is to decide upon appropriate energy-efficiency measures when household profiles are unknown.
In recent years, several attempts have been made toward mechanisms for the recognition of energy-consumption-related dwelling characteristics. In particular, unsupervised learning techniques group households with similar consumption patterns into clusters (Figueiredo et al. 2005; Sánchez et al. 2009); however, each cluster must then be interpreted by an expert. On the other hand, the existing supervised classification of private energy users relies upon the analysis of consumption curves and survey data (Beckel et al. 2013; Hopf et al. 2014; Sodenkamp et al. 2014). Here, the average prediction failure rate even with the best classifier (a support vector machine) exceeds 35 %. The reason for this low performance seems to be that energy consumption is generally determined by a number of other relevant variables, such as economic, social, demographic and climatic indices, energy price, household characteristics, residents’ lifestyle, as well as cognitive variables such as values, needs and attitudes (Beckel et al. 2013; Santos et al. 2014; Elias and Hatziargyriou 2009; Santin 2011; Xiong et al. 2014). In principle, one can include all prediction-relevant data, regardless of its intrinsic relationships, in a classification task in the form of features. This, however, leads to high spatiotemporal complexity and incurs a significant risk of the curse of dimensionality.
In predictive analytics, the classification (or discrimination) problem refers to the assignment of observations (objects, alternatives) to predefined unordered homogeneous classes (Zopounidis and Doumpos 2000; Carrizosa and Morales 2013). Supervised classification implies that the function mapping objects described by the data into categories is constructed from so-called training instances—data with respective class labels or rules. This is realized in a two-step process: first, a prediction model is built from known class labels or a set of rules; then, new data is automatically classified based on this model.
In practice, the problem of finding functions with good prediction accuracy and low spatiotemporal complexity is challenging. Performance of all classifiers depends to a large extent on the volume of the input variables and on interdependencies and redundancies within the data (Joachims 1998; Kotsiantis 2007). At this point, dimensionality reduction is critical to minimizing the classification error (Hanchuan et al. 2005).
Feature selection is the first group of dimensionality-reduction methods; these identify the most characterizing attributes of the observed data (Hanchuan et al. 2005). More general methods that create new features based on transformations or combinations of the original feature set are termed feature extraction algorithms (Jain and Zongker 1997). By definition, all dimensionality-reduction methods result in some loss of information, since data is removed from the dataset. Hence, it is of great importance to reduce the data in a way that preserves the important structures within the original data set (Johansson and Johansson 2009).
In environmental and energy studies, the need to analyze large amounts of multivariate data raises a fundamental problem: how to discover compact representations of interacting, high-dimensional systems (Roweis and Saul 2000).
The existing prediction methods that treat multidimensional data include multivariate classification based on statistical models (e.g., linear and quadratic discriminant analysis) (Fisher 1936; Smith 1946), preference disaggregation (such as UTADIS and PREFDIS) (Zopounidis and Doumpos 2000), criteria aggregation (e.g., ELECTRE Tri) (Yu 1992; Roy 1993; Mastrogiannis et al. 2009), and model development (e.g., regression analysis and decision rules) (Greco et al. 1999; Srinivasan and Shocker 1979; Flitman 1997), among others. The common pitfall of these approaches is that they treat observations independently and neglect important complexity issues. Correlation-based classification (Beidas and Weber 1995) considers the influence between different subsets of measurements (instead of using both subsets, the correlation between them is computed and used in their place), but treats these correlations as features only. It also does not consider the possibility that the correlation depends on the class labels.
In this work, we propose a supervised machine-learning method called DIDClass (Dependent-Independent Data Classification) that exploits the magnitudes of interaction (correlation) among multiple classification-relevant variables. It establishes a classification machine (probabilistic regression) with embedded correlation indices and a single normalized dataset. We distinguish between dependent observations, which are affected by the classes, and independent observations, which are not affected by the classes but influence the dependent variables. The motivation for such a concept is twofold: first, to enable simultaneous consideration of multiple factors that characterize an object of interest (i.e., energy consumption affected by economic and demographic indices, energy prices, climate, etc.); and second, to represent high-dimensional systems in a compact way while minimizing the loss of valuable information.
Our study of household classification is based on half-hourly readings of smart electricity meters from 4200 Irish households collected over an 18-month period, survey data containing energy-efficiency-relevant dwelling information (type of heating, floor area, age of house, number of inhabitants, etc.), and weather figures for that region. The results indicate that DIDClass recognizes all household characteristics with better accuracy and temporal performance than the existing classifiers.
Thus, DIDClass is an effective and highly scalable mechanism that makes broad analysis of electricity consumption and classification of residential units possible. This opens new opportunities for the reasonable deployment of targeted energy-efficiency measures in the private sector.
The remainder of this paper is organized as follows. “Supervised classification with class-independent data” describes the developed dimensionality-reduction and classification method. “Application of DIDClass to household classification based on smart electricity meter data and weather conditions” presents the application of the model to the data. The conclusions are given in “Conclusion”.
Supervised classification with class-independent data
Problem definition and mathematical formulation
Supervised classification is a typical problem of predictive analytics. Given a training set \( \bar{N} = \left\{ {\left( {\bar{x}_{1} , y_{1} } \right), \ldots ,\left( {\bar{x}_{n} ,y_{n} } \right)} \right\} \) containing n ordered pairs of observations (measurements) \( \bar{x}_{i} \) with class labels \( y_{i} \in J \) and a test set \( \bar{M} = \left\{ {\bar{x}_{n + 1} , \ldots ,\bar{x}_{n + m} } \right\} \) of m unlabeled observations, the goal is to find class labels for the measurements in the test set \( \bar{M} \).
Let \( \bar{X} = \{ \bar{x}_{i} \} \) be a family of observations, and \( Y = \{ y_{i} \} \) be a family of associated true labels. Classification implies construction of a mapping function \( \bar{x}_{i} \mapsto f\left( {\bar{x}_{i} } \right) \) of the input, i.e. vector \( \bar{x}_{i} \in \bar{X} \), into the output, i.e. a class label \( y_{i} \).
A classification can be either done by a single assignment (only one class label \( y_{i} \) is assigned to one sample \( \bar{x}_{i} \)), or as a probability distribution over \( J \) classes. The latter algorithms are also called probabilistic classifiers.
We define a family \( S \) of measurements \( s_{i} \) as independent. Simply put, the observations \( s_{i} \) are not influenced by the class labels \( y_{i} \). The remaining observations are called dependent, and are defined as \( X = \left\{ {x_{i} } \right\} \) with \( x_{i} = \{ z \in \bar{x}_{i} \mid z \notin s_{i} \} \).
We implement the given notation of dependent and independent variables in our DIDClass prediction methodology.
The DIDclass model
Step 1: Estimation of interdependencies between the datasets
DIDClass makes use of the relationships between the input variables of the classification. Therefore, once the input datasets are available, it is necessary to test the underlying hypotheses about the associations between the variables. Throughout this paper, the following two assumptions are made.
Assumption 1
The independent variables \( S \) are statistically independent of the class labels \( Y \), as defined by (1).
In other words, classification based on \( s_{i} \) is random.
Assumption 2
Independent variables \( S \) affect the dependent variables \( X \) and this influence can be measured or approximated.
The correlation coefficients can be found by solving a regression model of \( X \) expressed through the regressors \( S \) and the class labels \( Y \). The choice of the regression model should account for the following criteria:
1. Model fit. Depending on the choice of the forecasting model and problem specifics, any appropriate quality measure [e.g., the coefficient of determination \( R^{2} \) or the Akaike information criterion (D’Agostino 1986)] can be used to estimate the discrepancy between the observed and expected values.
2. Generalizing performance. The model should be able to classify observations of unknown group membership (class labels) into a known population (training instances) (Lamont and Connell 2008) without overfitting the training data.
3. Effect size. If the strength of the relationships between the input variables is small, the independent data can be ignored and the application of DIDClass is not necessary. The effect size is estimated in relation to the distances between classes using appropriate indices (e.g., Pearson’s \( r \) or Cohen’s \( f^{2} \)).
The functions \( f_{j} \) describe how the dependent variables can be calculated from the independent ones and the class labels. These functions are utilized in later steps of the presented algorithm to normalize the response measures and eliminate the predictor variables. The unknown functions \( f_{j} \) are typically estimated with the maximum-likelihood approach in an iteratively reweighted least-squares procedure, with maximum quasi-likelihood, or with Bayesian techniques (Nelder and Baker 1972; Radhakrishna and Rao 1995). Thus, a unique \( f_{j} \) corresponds to each class \( j \in J \). Alternatively, a single relationship model can be built for the set of all classes by adding dummy variables for these classes.
The correlation coefficients \( \alpha \) and \( \beta \) can be calculated using the ordinary least squares method. If the relationships are not linear, a more complex regression model can be used; for instance, for a polynomial dependency, powers of \( S \) can be added on the right side of Eq. (4). In this way, networks with mutually dependent variables can also be taken into account, as shown in Fig. 2.
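As a minimal sketch of this per-class estimation, assuming a purely linear model \( x = \alpha_j + \beta_j s \) for each class (all function and variable names here are illustrative, not part of the original formulation):

```python
import numpy as np

def fit_class_models(x, s, y):
    """Fit a separate linear model x = alpha_j + beta_j * s for each class j
    by ordinary least squares.

    x, s : 1-D arrays of dependent / independent measurements
    y    : class label of each training observation
    Returns a dict {class label: (alpha_j, beta_j)}.
    """
    x, s, y = np.asarray(x, float), np.asarray(s, float), np.asarray(y)
    models = {}
    for j in np.unique(y):
        mask = y == j
        A = np.column_stack([np.ones(mask.sum()), s[mask]])  # design matrix [1, s]
        coef, *_ = np.linalg.lstsq(A, x[mask], rcond=None)
        models[j] = (coef[0], coef[1])
    return models
```

Fitting one model per class is equivalent to the dummy-variable formulation mentioned above when all interaction terms between the dummies and \( s \) are included.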
Step 2: Integration of dependent and independent measurements
In order to take into consideration the correlation coefficients revealed in the previous step, and to transform the multivariate input data to a lower dimension without loss of classification-relevant information, we normalize the dependent variables with respect to the independent ones. Normalization means the elimination of changes in the dependent measurements that occur due to shifts in the independent values, and the transformation of \( X \) into \( X^{{\prime }} \).
Since the relationships of \( X \) and \( S \) are different for each class, the normalization is also class-dependent. Each measurement \( x_{i} \) in the training set \( N \) is normalized according to the corresponding class label \( y_{i} \).
Every \( x_{i}^{{\prime }} \) is the normalized representation of the dependent measurement \( x_{i} \). The term \( f_{{y_{i} }} \left( {s_{i} } \right)  f_{{y_{i} }} \left( {s_{1} } \right) \) describes the expected difference between \( s_{i} \) and \( s_{1} \) according to the chosen regression model. Hereby, \( s_{1} \) is the default state, and no normalization is needed for \( x_{i} \,{\text{with}}\, s_{i} = s_{1} \). As a result, there are \( a \) normalization functions for \( a \) class labels. Without loss of generality, any value can be chosen as the default value; however, a data-specific value may allow for a better interpretation of the results.
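Under a linear per-class model, this normalization step reduces to a few lines. A sketch with hypothetical names, where `models` maps each class to its fitted coefficients from Step 1:

```python
import numpy as np

def normalize_training(x, s, y, models, s_default):
    """Apply x'_i = x_i - f_{y_i}(s_i) + f_{y_i}(s_default) to every training
    observation, using the class-specific linear models
    f_j(s) = alpha_j + beta_j * s estimated in Step 1."""
    x_norm = np.empty(len(x), dtype=float)
    for i, (xi, si, yi) in enumerate(zip(x, s, y)):
        alpha, beta = models[yi]
        # For a linear f_j the shift simplifies to -beta * (s_i - s_default)
        x_norm[i] = xi - (alpha + beta * si) + (alpha + beta * s_default)
    return x_norm
```

If the measurements lie exactly on their class lines, the normalized values collapse to a single point per class, which is the intuition behind the separability results in "Verification" below.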
Step 3: Classification
Once all classification-relevant input datasets are integrated into one normalized set, it can be used as the input for a probabilistic classifier (further referred to as \( C \)) that returns a probability distribution over the set of class labels.
To enable prediction of the class of a new observation, it is necessary to normalize its value. The challenge, however, is to choose the appropriate normalization function \( f_{j} \) from the \( a \) functions constructed in Step 1. Since the class labels of the test data are unknown, there is no a priori knowledge of which normalization function should be used. It is possible to apply any \( f_{j} \), but the classification is more likely to be successful if the correct \( f_{j} \) is chosen (i.e., the test data belongs to the class from which the normalization function was derived). Therefore, DIDClass tests all functions for a new observation: each test-set measurement is transformed and classified \( a \) times, and the averages of the resulting probabilities for each class are derived. The observation is assigned to the class with the highest resulting probability.
The prediction process is formally described below.
Temporal complexity of classification depends on the dimension of included variables (Lim et al. 2000).
Classification training with DIDClass uses the normalized set \( N^{{\prime }} \) with \( n \) variables of dimension \( d_{x} \). A classification without DIDClass would use the initial training set \( N \) with dimension \( d_{x} + d_{s} \), where \( d_{x} \) and \( d_{s} \) stand for the dimensions of the variables in \( X \) and \( S \), respectively.
The prediction using DIDClass is done for \( a \times n \) variables of dimension \( d_{x} \), whereas a classification without DIDClass would be done for only \( n \) variables, but of the higher dimension \( d_{x} + d_{s} \). Hence, DIDClass can perform better or worse than a classifier that analyzes all data in its initial form (without correlation information), depending on the number of categories \( a \) and the dimension of the independent variables. In particular, DIDClass is more efficient for algorithms whose training complexity exceeds their prediction complexity, which holds for the majority of commonly used classifiers (e.g., support vector machines (SVM), AdaBoost, random forest, linear discriminant analysis).
The algorithm of DIDClass
Input: training data set \( \left\{ {\left( {x_{1} , s_{1} , y_{1} } \right), \ldots ,\left( {x_{n} ,s_{n} ,y_{n} } \right)} \right\} \) and test examples \( \left\{ {\left( {x_{n + 1} , s_{n + 1} } \right), \ldots , \left( {x_{n + m} ,s_{n + m} } \right)} \right\} \) 
Output: probability distributions \( \left\{ {P^{n + 1} , \ldots , P^{n + m} } \right\} \) over the classes 
1. Show that the correlation between independent and dependent measurements exists 
2. Show that the independent measurements are not affected by the class labels 
3. Compute the influence (\( f_{j} \)) of different independent measurements on distinct class labels, by solving the generalized linear model 
4. \( x_{i} = \mathop \sum \limits_{j \in J} d_{ij} f_{j} \left( {s_{i} } \right) + \varepsilon_{i} \) 
5. For each \( i \in \{ 1, \ldots , n\} \) compute the normalized measurements: 
6. \( x_{i}^{{\prime }} = x_{i}  f_{{y_{i} }} \left( {s_{i} } \right) + f_{{y_{i} }} \left( {s_{1} } \right) \) 
7. Create the probabilistic classifier \( C \) with the training set \( \left\{ {\left( {x_{1}^{{\prime }} ,y_{1} } \right), \ldots ,\left( {x_{n}^{{\prime }} ,y_{n} } \right)} \right\} \) 
8. For each \( i \in \{ n + 1, \ldots ,n + m\} \) 
9. For each \( j \in J \) compute the normalized measurements for the unlabelled data under each possible class:
10. \( x_{i}^{{j^{{\prime }} }} = x_{i}  f_{j} \left( {s_{i} } \right) + f_{j} \left( {s_{1} } \right) \) 
11. Apply the classifier \( C \) to the normalized measurements 
12. For all \( k \in J \), let \( p_{jk}^{i} := \) the probability of \( x_{i}^{j^\prime} \) belonging to class \( k \)
13. Let \( P_{l}^{i} = \frac{{\sum_{j \in J} p_{jl}^{i} }}{a} \) be the probability of \( x_{i} \) belonging to class \( l \)
14. Return \( \left\{ {P^{n + 1} , \ldots , P^{n + m} } \right\} \) 
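Steps 8–14 can be sketched in a few lines. This illustration assumes the linear per-class models of Step 1 and a classifier exposing a scikit-learn-style `predict_proba` method; both the interface and the names are assumptions, not part of the paper:

```python
import numpy as np

def didclass_predict(classifier, x_test, s_test, models, s_default):
    """Steps 8-14: normalize each unlabelled measurement under every class
    hypothesis j, classify each variant with C, and average the resulting
    class probabilities.

    classifier : trained probabilistic classifier C with predict_proba()
    models     : {class j: (alpha_j, beta_j)} linear models from Step 1
    Returns an (m, |J|) array of averaged probabilities P^i.
    """
    labels = sorted(models)
    result = []
    for xi, si in zip(x_test, s_test):
        per_hypothesis = []
        for j in labels:
            alpha, beta = models[j]
            xij = xi - (alpha + beta * si) + (alpha + beta * s_default)  # x_i^{j'}
            per_hypothesis.append(classifier.predict_proba([[xij]])[0])
        result.append(np.mean(per_hypothesis, axis=0))  # average over the a hypotheses
    return np.array(result)
```

For clarity the sketch treats each observation as a scalar; in the case study below, each observation is a 336-dimensional weekly trace and the normalization is applied component-wise.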
Verification of DIDclass
In this section, we show that the proposed methodology yields linearly separable categories of objects, under the specified conditions.
Theorem 1
Let \( N \) be the training set of a classification problem as described by Eq. (2).
Proof
This means that every normalized measurement for different classes is contained in a convex compact ball of radius \( 2\delta \) centered on the normalized measurements \( x_{{l_{j} }} \). The different balls are disjoint since the distance between their centers is greater than \( 4\delta \), and therefore the distance between the balls is greater than 0. Hence there exists a hyperplane separating any two classes.\( \square \)
An analogous statement can be proven if the model is unknown, but the kind of dependency is known. The estimation of errors in this case is inherent to the model. Theorem 2 proves this statement for the case of linear dependency. Other regression models can be treated in a similar manner.
Theorem 2
Let N be the training set of a classification problem as described by Eq. (2).
Further suppose that the model (3) is unknown (i.e., the functions \( f_{j} \) have to be estimated based on the data), and let \( \delta \) be an upper bound on the error \( \varepsilon_{i} \) in (6).
Proof
This means that every normalized measurement for different classes is contained in a convex compact ball of radius \( 2\sqrt n \delta \) centered on the normalized measurements \( x_{{l_{j} }} \). The different balls are disjoint since the distance between their centers is greater than \( 4\sqrt n \delta \), and therefore the distance between the balls is greater than 0. Hence there exists a hyperplane separating any two classes. \( \square \)
We have shown that if (a) Assumptions 1 and 2 are satisfied, (b) the regression model describing the relations between the input variables is reasonable, and (c) the different classes are far from each other, then the normalized training set yielded by DIDClass is linearly separable.
Application of DIDClass to household classification based on smart electricity meter data and weather conditions
In this section we present a classification of residential units based on smart electricity meter data and weather variables by DIDClass. Results indicate that DIDClass outperforms the best existing classifier with regard to the accuracy and temporal characteristics.
Data description
(a) The power consumption data at 30-min granularity originating from the Irish Commission for Energy Regulation (CER) (ISSDA. Data from the commission for energy regulation 2014). It was gathered during a smart-metering trial over a 76-week period in 2009–2010 and encompasses 4200 private dwellings. It is the dependent data set \( X \), according to the definition given in “Problem definition and mathematical formulation”.
(b) The respective customer survey data containing energy-efficiency-related attributes of households (such as type of heating, floor area, age of house, number of residents, employment, etc.). It is the data set of known object categories \( Y \), according to the definition given in “Problem definition and mathematical formulation”. For the classification problem we consider 12 different household properties, listed on the left-hand side of Table 2. The classification is made for each property individually. The properties can take different values (“class labels”), shown on the right-hand side of Table 2. For example, each household can be classified as either “electrical” or “not electrical” with respect to the property “type of cooking facility”. Continuous values are divided into discrete intervals [e.g., the property “age of building” is expressed by two alternative class labels, “old” (>30 years) and “new” (≤30 years)]. For the three household properties “age of house”, “floor area” and “number of bedrooms”, the discrete class labels were defined according to the training data (surveys). For the properties “number of residents” and “number of devices”, the classes were defined to yield a roughly equal distribution of households (Beckel et al. 2013).
Table 2
The properties and their class labels
Household property  Classes and their labels

Number of appliances and entertainment devices (N_devices)  Low (<8), Medium (8–11), High (>11)
Number of bedrooms (N_bedrooms)  Very low (1–2), Low (3), High (4), Very high (>4)
Type of cooking facility (cooking)  Electrical, Not electrical
Employment of chief income earner (employment)  Employed, Not employed
Family (family)  Family, No family
Floor area (floor_area)  Small (<100 m^{2}), Medium (100–200 m^{2}), Big (>200 m^{2})
Children (children)  Children, No children
Age of building (age_house)  Old (>30 years), New (≤30 years)
Number of residents (N_residents)  Few (<3), Many (>2)
Single (single)  Single, Not single
Retirement status of chief income earner (retirement)  Retired, Not retired
Social class of chief income earner according to NRS social grades (social_class)  A or B, C1 or C2, D or E
(c) Multivariate weather data at 30-min granularity, including outdoor temperature, wind speed, and precipitation in the investigated region, provided by the US National Climatic Data Center (NCDC 2014). It is the independent multivariate data set \( S \), where \( S_{1} = \) outdoor temperature, \( S_{2} = \) wind speed, and \( S_{3} = \) precipitation.
In the current implementation, we assume that an observation refers to a 1-week data trace (including the weekend), because it represents a typical consumption cycle of the inhabitants. One week of data at 30-min granularity implies that an input trace contains 336 data samples for each variable.
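Slicing the meter readings into such observations is a simple reshape. A sketch, assuming each household's readings form one continuous half-hourly series (the layout is an assumption for illustration):

```python
import numpy as np

def weekly_traces(readings, samples_per_week=7 * 48):
    """Cut a continuous half-hourly series into 1-week observations of
    7 x 48 = 336 samples each; a trailing partial week is dropped."""
    readings = np.asarray(readings, dtype=float)
    n_weeks = len(readings) // samples_per_week
    return readings[: n_weeks * samples_per_week].reshape(n_weeks, samples_per_week)
```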
Since the CER data set does not contain any facts about household locations or about the geographical distribution of the households, we calculated the average of the independent variables over all 25 weather stations in Ireland.
Prediction results
In the present study, we split the input data into training and test cases in the proportion 80:20. The training instances are used to estimate the interdependencies between electricity consumption and outdoor temperature and to train the classifier; the test instances are then used to evaluate the accuracy of the classification results.
Step 1: Influence estimation
First, we check if Assumptions 1 and 2 hold for the given variables.
Assumption 1
The condition of independence of the weather (\( S \)) from the household properties (\( Y \)) is trivially satisfied, since the weather is the same for all dwellings.
Assumption 2
A regression model must be found to approximate the influence of weather (\( S \)) on energy consumption (\( X \)).
Coefficients in the linear regression model (7)
Coefficient  Estimate  Standard error  Significance index* 

α  0.5477  0.00298  *** 
β  −0.00486  0.000273  *** 
Model (7) has a slight bias toward positive correlation, which can be explained by the fact that the values of both variables are lower during the night. This is accounted for by computing the model for each time stamp separately (i.e., “00:00”, “00:30”, …, “23:30”).
Coefficients in model (7) for different times of the day
Time  R^{2}  β  Sig. Ind. 

05:00  0.2142  −0.0016  *** 
09:00  0.3691  −0.0069  *** 
19:00  0.7135  −0.0391  *** 
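The per-timestamp estimation above can be sketched as follows; the function and input names are hypothetical, and the inputs are assumed to be day-by-slot matrices:

```python
import numpy as np

def fit_per_timestamp(consumption, temperature):
    """Fit x = alpha + beta * s independently for each half-hour slot of
    the day. Rows of both inputs are days, columns are the 48 slots.
    Returns {slot index: (alpha, beta)}."""
    n_days, n_slots = consumption.shape
    coefs = {}
    for slot in range(n_slots):
        A = np.column_stack([np.ones(n_days), temperature[:, slot]])
        (alpha, beta), *_ = np.linalg.lstsq(A, consumption[:, slot], rcond=None)
        coefs[slot] = (alpha, beta)
    return coefs
```

Fitting each slot separately removes the day-night confound, since within one slot the level of consumption no longer co-varies with the time of day.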
Summary of model (10)
Estimate  Standard error  Sig. Ind.  

α  0.5013  0.00405  *** 
β_{1}  −0.00589  0.00028  *** 
β_{2}  0.00527  0.00031  *** 
Summary of model (8) for different hours
R^{2}  β_{1}  Sig. Ind. for β_{1}  β_{2}  Sig. Ind. for β_{2}  

09:00  0.369  −0.0039  ***  0  
19:00  0.7387  −0.0223  ***  0.0082  *** 
05:00  0.2337  −0.0009  ***  0.0005  *** 
Result of the complete linear model (9)
Estimate  Standard error  Sig. Ind.  

α  0.6069  0.00779  *** 
β_{1}  −0.0032  0.0001555  *** 
β_{2}  0.0049  0.0003382  *** 
β_{3}  0.6884  0.6558081 
The coefficient \( \beta_{3} \) for daily precipitation is not significant. The values of daily precipitation vary minimally—between 0 and 0.02 mm per 30 min.
To summarize, we have shown that temperature is a strong predictor of energy consumption. Moreover, the prediction power also depends on the time of the day. Other factors, like wind speed and precipitation, are less relevant in predicting energy demand.
Step 2: Integration of dependent and independent measurements
Based on the results of the previous step, we choose predictor (7) and consider different times of the day.
We expect the influence of weather to be notably different for at least some class labels (e.g., households with large floor area or many residents use more heating energy).
We use mean temperature as the default independent measurement (\( s_{1} \)) for normalization. This way the normalized consumption values correspond to the consumption expected at mean temperature.
Step 3: Classification
1. Comparison of DIDClass with a baseline classifier.
First, we show the advantages of DIDClass over a naïve classification with respect to accuracy and runtime complexity. As the naïve classification, we consider one that simply uses all observation variables as features [i.e., it tries to find a classification function f with Y = f(X, S)]. We chose SVM as the baseline classifier because it is currently the best-known algorithm for predicting energy-efficiency-related characteristics of residential units (Beckel et al. 2013). To ensure an objective comparison, we first tuned the SVM to achieve the best accuracy, and only then used this SVM version in the core of DIDClass.
Since the independent variables (weather) are identical for all households in each observation (1 week), including these variables would not influence the single-week classification results of a conventional algorithm (with weather taken as features). To cope with this challenge, several observations can be included in the analysis. In our experiments, we used the data of three consecutive weeks for the training of both the baseline classifier and DIDClass. Moreover, we repeated the comparison for four different timespans to ensure the stability of the results. These (calendar) weeks are: 47–49 (November) in 2009, 5–7 (January–February) in 2010, 11–13 (March–April) in 2010, and 31–33 (August) in 2010. The comparison results are shown in Table 8. It can be seen that DIDClass performs at least as well as the baseline on all runs.
Table 8
Results with SVM (S) and SVM-based DIDClass (SD)
Property  Weeks 47–49/09  Weeks 05–07/10  Weeks 11–13/10  Weeks 31–33/10
          S  SD  S  SD  S  SD  S  SD

Single  80.1  81.1  80.6  81.6  80.4  81.4  79  79.8
N_devices  50.7  50.9  50.8  51  50.7  50.8  50.6  50.8
Cooking  73  73  72.7  72.8  72.5  72.5  73.3  73.3
Family  75  75.8  75.7  76.4  75.9  76.8  75.6  76.3
Children  74  74.6  73.7  74.4  74.7  75.3  73.6  74.1
Age_house  56.7  57.9  57.9  59.2  57.4  58.4  57.1  58.4
Social_class  50.5  50.8  51.8  52.3  51.1  51.3  50.5  50.6
Floor area  62.9  64.1  62.8  64.1  63.6  64.6  61.8  63
N_residents  71.4  72.6  71.3  72.4  69.9  71.1  71.9  73.1
N_bedrooms  48.2  49.2  47.8  48.9  48.6  49.7  49.3  50.3
Employment  66.8  67.8  66.9  67.9  65.9  67  66.2  67.2
Retirement  69.9  71.5  70.1  71.7  71.2  72.8  71.2  72.8
For clarity, we calculated the average accuracy over the four runs. Table 9 (columns 2–3) indicates that DIDClass achieves better accuracy for 11 out of 12 properties, and the same accuracy for only one property (“cooking”). In other words, DIDClass reduces the error rate by 2.8 % on average, and by up to 5.6 % for the property “retirement”.
Table 9
The average classification results as accuracy in % for each class
Property  Comparison with trivial approach  Comparison with the best-known household classifier
          SVM  DIDClass with SVM  CLASS  DIDClass with CLASS

Single  79.9  80.8  83.3  83.5
N_devices  50.6  50.8  52.9  53.5
Cooking  72.9  72.9  72.7  73
Family  75.1  75.9  77.1  77.6
Children  73.9  74.5  74.5  74.8
Age_house  56.9  58.1  61.5  61.6
Social_class  50.7  51  50.9  51.3
Floor area  62.9  64.1  63.5  65.2
N_residents  71.7  72.9  72.8  73.8
N_bedrooms  48.3  49.3  49.2  49.5
Employment  66.7  67.8  68.3  69.2
Retirement  70.1  71.7  71.9  72.7
A single experiment with SVM took 95 min of computation on average, while the classification by DIDClass took 25 min on average on the same laptop with a 1.7 GHz Intel Core i7 CPU and 8 GB of 1600 MHz RAM. The asymptotic complexity is O(n^{3}) for both algorithms, but the larger dimension of the training set raises the runtime of the SVM classifier (Sreekanth et al. 2010).
2. Comparison of DIDClass with the state-of-the-art household classifier CLASS (Beckel et al. 2013).
In their work, Beckel et al. (2013) applied an SVM tuned with feature extraction and selection. The algorithm was run on calendar week 2 of 2010. For comparison, we use the same data and the same SVM version in the core of DIDClass.
Similarly to the previous case, we repeat the experiment four times and average the results. The results, shown in Tables 9 and 10, indicate that DIDclass improves the classification accuracy by 1–3 % compared to the CLASS algorithm. Floor area in particular can be predicted more precisely, which seems natural because a larger dwelling requires more heating at cold temperatures.

Table 10 Results with CLASS (C) and CLASS-based DIDclass (DC)
Property       Week 46/09    Week 02/10    Week 16/10    Week 37/10
               C     DC      C     DC      C     DC      C     DC
Single         83.3  83.6    84.0  84.2    82.4  82.6    83.4  83.6
N_devices      53.1  53.9    53.5  54.4    53.8  54.4    51.2  51.4
Cooking        73.0  73.3    72.2  72.4    73.3  73.6    72.2  72.6
Family         76.4  77.0    78.8  79.1    77.3  78.0    76.0  76.5
Children       74.7  75.2    75.5  75.6    74.1  74.1    73.4  74.1
Age_house      61.1  61.5    61.3  61.2    62.0  62.2    61.4  61.4
Social_class   51.5  51.9    51.1  51.3    49.7  50.4    51.3  51.5
Floor area     63.5  65.2    63.1  64.7    63.9  65.7    63.6  65.0
N_residents    72.8  73.7    71.0  72.2    72.1  72.9    75.2  76.4
N_bedrooms     49.7  50.0    49.4  49.5    49.5  49.9    48.3  48.5
Employment     68.4  69.2    67.4  68.4    69.2  70.0    68.4  69.4
Retirement     71.4  72.3    71.5  72.3    72.3  73.1    72.3  73.0
In both cases, DIDclass performs better.
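This claim can be spot-checked directly against Table 10. The toy sketch below (our own illustration, not part of the paper's pipeline) transcribes three rows of Table 10 as (C, DC) pairs per week and counts the cells in which CLASS-based DIDclass beats plain CLASS:

```python
# Three rows of Table 10, transcribed as (C, DC) accuracy pairs for the
# four evaluation weeks 46/09, 02/10, 16/10, and 37/10.
table10 = {
    "Single":     [(83.3, 83.6), (84.0, 84.2), (82.4, 82.6), (83.4, 83.6)],
    "Floor area": [(63.5, 65.2), (63.1, 64.7), (63.9, 65.7), (63.6, 65.0)],
    "Retirement": [(71.4, 72.3), (71.5, 72.3), (72.3, 73.1), (72.3, 73.0)],
}

# Count, per property and week, whether DIDclass (DC) strictly beats CLASS (C).
wins = sum(dc > c for rows in table10.values() for c, dc in rows)
cells = sum(len(rows) for rows in table10.values())
print(f"DIDclass wins {wins} of {cells} cells")  # 12 of 12 for these rows
```

For a few properties (e.g., "Age_house" in week 02/10) the full table contains ties or marginal losses, so the advantage is consistent but not strictly universal per cell.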
Conclusion
Findings suggest that targeted feedback doubles the energy savings from smart metering, from about 3 % with conventional systems to 6 % (Loock et al. 2011). This amounts to an additional efficiency gain of about 100 kWh per household and year, with the appeal that the approach scales to virtually all households equipped with off-the-shelf smart metering systems. Moreover, such tools allow allocating resources for energy conservation and load-shifting campaigns to households whose characteristics are known.
This research goes beyond the state of the art by providing a method that effectively reduces the dimensionality of consumption time series and additional power-usage-relevant data (e.g., weather, energy price, GDP, holidays and weekends) while minimizing information loss and enhancing the accuracy of results. This forms a cornerstone of subsequent policy analysis through personalized smart-metering-based interventions at a usable level.
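To illustrate the underlying idea, the following toy sketch (our own simplification, not the DIDclass implementation) compresses a synthetic half-hourly weekly load profile plus outdoor temperature into a handful of features, keeping the consumption-temperature correlation as the "interdependence" indicator instead of the raw 336-dimensional time series:

```python
import math
import random

def corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def features(consumption, temperature):
    """Compress a load/temperature week into three low-dimensional features."""
    n = len(consumption)
    return [
        sum(consumption) / n,            # mean load
        max(consumption),                # peak load
        corr(consumption, temperature),  # weather dependence indicator
    ]

# Synthetic example: 336 half-hourly readings (one week) for a household
# whose consumption rises when it gets cold, as with electric heating.
random.seed(0)
temp = [5 + 10 * math.sin(2 * math.pi * t / 48) for t in range(336)]
load = [2.0 - 0.05 * T + random.gauss(0, 0.1) for T in temp]

f = features(load, temp)
print(f)  # a strongly negative correlation hints at electric heating
```

A classifier trained on such features instead of raw readings faces far fewer dimensions, which is the intuition behind the reported robustness against the curse of dimensionality.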
The satisfactory performance of the DIDclass method on the validation datasets illustrates its ability to classify potentially any household equipped with a smart electricity meter. Additionally, any energy-consumption-related data can be incorporated to further improve performance.
The developed model could also be used for other classification problems where “external” information is available: for instance, license plate recognition based on images from highway cameras, with illumination conditions and time of day as independent variables; or credit scoring based on customer information, credit history, and loan applications, with economic indicators such as GDP, unemployment rate, and price index as independent observations.
Future research can be directed toward extending the model to cases with specific nonlinear relationships between the pairs of dependent and independent variables. Additionally, future research could enhance DIDclass by extending the set of properties, integrating other energy consumption figures (gas and warm water), combining it with other methods (multidimensional scaling, Isomap, diffusion maps, etc. (Lee et al. 2010)), and developing a tool for a real-world setting. In the long term, empirical validation of targeted interventions made using the gained information could show the value of the developed methodology and tool.
Nomenclature
\( a \): number of different class labels
\( C \): classifier
\( d \): dummy variables
\( f_{j} \): normalization function for class \( j \)
\( J \): set of all possible class labels
\( \bar{M}, M \): unlabelled test set
\( m \): number of elements in the test set
\( \bar{N}, N \): labelled training set
\( N^{\prime} \): normalized training set
\( n \): number of elements in the training set
\( p_{kl}^{i} \): probability of \( x_{i}^{k^{\prime}} \) to belong to class \( l \)
\( P_{j}^{i} \): probability of \( x_{i} \) to belong to class \( j \)
\( S \): family of all independent measurements
\( s_{i} \): independent measurement
\( \bar{x}_{i} \): measurement
\( x_{i} \): dependent measurement
\( x_{i}^{\prime} \): normalized measurement \( x_{i} \)
\( x_{i}^{j^{\prime}} \): measurement \( x_{i} \) normalized as class \( j \)
\( \bar{X} \): family of all measurements
\( X \): family of all dependent measurements
\( X^{\prime} \): family of the normalized dependent measurements
\( Y \): family of all class labels
\( y_{i} \): class label
Declarations
Authors’ contributions
MS, IK and TS have written the abstract, introduction, and conclusion sections. MS and IK have written the sections 2 and 3 with MS guiding the research and TS providing feedback. The ideas behind the case study in section 3 were developed by MS and TS. The ideas behind the algorithm described in section 2 were developed by MS and IK. The calculations and algorithm implementation were performed by IK. All authors read and approved the final manuscript.
Acknowledgements
The research presented in this paper was financially supported by the Swiss Federal Office of Energy (Grant number SI/50105301) and the Commission for Technology and Innovation in Switzerland (CTI Grant number 16702.2 PFENES).
Competing interests
The authors declare that they have no competing interests.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
 Apadula F, Bassini A, Elli A, Scapin S. Relationships between meteorological variables and monthly electricity demand. Appl Energy. 2012;98:346–56.
 Beckel C, Sadamori L, Santini S. Automatic socio-economic classification of households using electricity consumption data. In: Proceedings of e-Energy ’13. 2013.
 Beidas BF, Weber CL. Higher-order correlation-based approach to modulation classification of digitally frequency-modulated signals. IEEE J Select Areas Commun. 1995;13(1):89–101.
 Bessec M, Fouquau J. The non-linear link between electricity consumption and temperature in Europe: a threshold panel approach. Energy Econ. 2008;30(5):2705–21.
 Carrizosa E, Morales DR. Supervised classification and mathematical optimization. Comput Oper Res. 2013;40(1):150–65.
 Dobson AJ. An introduction to generalized linear models. USA: CRC Press; 2001.
 D’Agostino RB. Goodness-of-fit techniques, vol. 68. USA: CRC Press; 1986.
 Elias CN, Hatziargyriou ND. An annual midterm energy forecasting model using fuzzy logic. IEEE Trans Power Syst. 2009;24(1):469–78.
 European Commission. Energy efficiency. 2008. http://ec.europa.eu/energy/strategies/2008/doc/2008_11_ser2/energy_efficiency_memo.pdf.
 Figueiredo V, Rodrigues F, Vale Z, Gouveia JB. An electric energy consumer characterization framework based on data mining techniques. IEEE Trans Power Syst. 2005;20(2):596–602.
 Fisher RA. The use of multiple measurements in taxonomic problems. Ann Eugen. 1936;7(2):179–88.
 Flitman AM. Towards analysing student failures: neural networks compared with regression analysis and multiple discriminant analysis. Comput Oper Res. 1997;24(4):367–77.
 Greco S, Matarazzo B, Slowinski R, Zanakis S. Rough set analysis of information tables with missing values. In: Proceedings of the Fifth International Conference of the Decision Sciences Institute. 1999. pp. 1359–62.
 Peng H, Long F, Ding C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell. 2005;27(8):1226–38.
 Hopf K, Sodenkamp M, Kozlovskiy I, Staake T. Feature extraction and filtering for household classification based on smart electricity meter data. Computer Science – Research and Development. 2014. pp. 1–8.
 ISSDA. Data from the Commission for Energy Regulation. 2014. http://www.ucd.ie/issda/data/commissionforenergyregulationcer/.
 International Energy Agency. 2014. https://www.iea.org/publications/freepublications/publication/Indicators_2008.pdf.
 Jain A, Zongker D. Feature selection: evaluation, application, and small sample performance. IEEE Trans Pattern Anal Mach Intell. 1997;19(2):153–8.
 Joachims T. Text categorization with support vector machines: learning with many relevant features. In: Nédellec C, Rouveirol C, editors. Machine learning: ECML-98, LNCS, vol. 1398. Berlin: Springer; 1998. pp. 137–42.
 Johansson S, Johansson J. Interactive dimensionality reduction through user-defined combinations of quality metrics. IEEE Trans Vis Comput Graph. 2009;15(6):993–1000.
 Lamont M, Connell M. Assessing the influence of observations on the generalization performance of the kernel Fisher discriminant classifier. PhD dissertation, Stellenbosch University. 2008.
 Lee JA, Verleysen M. Unsupervised dimensionality reduction: overview and recent advances. In: The 2010 International Joint Conference on Neural Networks (IJCNN). 2010. pp. 1–8.
 Lim TS, Loh WY, Shih YS. A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Mach Learn. 2000;40(3):203–28.
 Loock CM, Staake T, Landwehr J. Green IS design and energy conservation: an empirical investigation of social normative feedback. In: Proceedings of the International Conference on Information Systems (ICIS). 2011.
 Mastrogiannis N, Boutsinas B, Giannikos I. A method for improving the accuracy of data mining classification algorithms. Comput Oper Res. 2009;36(10):2829–39.
 NCDC. National Climatic Data Center, NCDC DSI 3505. 2014. https://gis.ncdc.noaa.gov/geoportal/catalog/search/resource/details.page?id=gov.noaa.ncdc:C00532.
 Nelder JA, Baker RJ. Generalized linear models. Encyclopedia of Statistical Sciences. 1972.
 Rao CR, Toutenburg H. Linear models. 2nd ed. New York: Springer; 1995.
 Roweis ST, Saul LK. Nonlinear dimensionality reduction by locally linear embedding. Science. 2000;290(5500):2323–6.
 Roy B. Aide multicritère à la décision: méthodes et cas. Paris: Economica; 1993.
 Santin OG. Behavioural patterns and user profiles related to energy consumption for heating. Energy Build. 2011;43(10):2662–72.
 Santos I, Souza GP, Sacramento RSW. Principal component analysis to reduce forecasting error of industrial energy consumption in models based on neural networks. In: Artificial Intelligence and Soft Computing. 2014. pp. 143–54.
 Smith CAB. Some examples of discrimination. Ann Eugen. 1946;13(1):272–82.
 Sodenkamp M, Kozlovskiy I, Staake T. Gaining IS business value through big data analytics: a case study of the energy sector. In: Proceedings of the International Conference on Information Systems. 2015.
 Sodenkamp M, Hopf K, Staake T. Using supervised machine learning to explore energy consumption data in private sector housing. In: Handbook of research on organizational transformations through big data analytics. 2014. pp. 320–33.
 Sreekanth V, Vedaldi A, Zisserman A, Jawahar CV. Generalized RBF feature maps for efficient detection. In: Proceedings of the British Machine Vision Conference, Aberystwyth. 2010.
 Srinivasan V, Shocker AD. Linear programming techniques for multidimensional analysis of preferences. Psychometrika. 1979;38(3):337–69.
 Suckling PW, Stackhouse LL. Impact of climatic variability on residential electrical energy consumption in the Eastern United States. Arch Met Geoph Biocl Ser B. 1983;33(3):219–27.
 Sánchez IB, Espinós ID, Sarrión LM, López AQ, Burgos IN. Clients segmentation according to their domestic energy consumption by the use of self-organizing maps. 2009.
 Kotsiantis SB. Supervised machine learning: a review of classification techniques. Informatica. 2007;31:249–68.
 Veit A, Goebel C, Tidke R, Doblander C, Jacobsen HA. Household electricity demand forecasting: benchmarking state-of-the-art methods. arXiv preprint. 2014.
 Wenig J, Sodenkamp M, Staake T. Data-based assessment of plug-in electric vehicle driving. Lect Notes Comput Sci. 2015.
 Xiong T, Bao Y, Hu Z. Interval forecasting of electricity demand: a novel bivariate EMD-based support vector regression modeling framework. Int J Electr Power Energy Syst. 2014;63:353–62.
 Yu W. Aide multicritère à la décision dans le cadre de la problématique du tri: concepts, méthodes et applications. 1992.
 Zaki MJ, Meira W Jr. Data mining and analysis: fundamental concepts and algorithms. Cambridge: Cambridge University Press; 2014.
 Zhang ZY, Gong DY, Ma JJ. A study on the electric power load of Beijing and its relationships with meteorological factors during summer and winter. Meteorol Appl. 2014;21(2):141–8.
 Hu Z, Bao Y, Xiong T. Comprehensive learning particle swarm optimization based memetic algorithm for model selection in short-term load forecasting using support vector regression. Appl Soft Comput. 2014;25:15–25.
 Zopounidis C, Doumpos M. PREFDIS: a multicriteria decision support system for sorting decision problems. Comput Oper Res. 2000;27(7–8):779–97.