Improving accuracy of students’ final grade prediction model using optimal equal width binning and synthetic minority over-sampling technique
- Syed Tanveer Jishan^{1},
- Raisul Islam Rashu^{1},
- Naheena Haque^{1} and
- Rashedur M Rahman^{1}
DOI: 10.1186/s40165-014-0010-2
© Jishan; licensee Springer. 2015
Received: 12 October 2014
Accepted: 18 December 2014
Published: 12 March 2015
Abstract
Demand for higher education has risen steadily all over the world in the last decade; therefore, the need to improve the education system is pressing. Educational data mining is an emerging area in the field of data mining, and it can be applied to better understand the educational systems in Bangladesh. In this research, we present how data can be preprocessed using a discretization method called Optimal Equal Width Binning and an over-sampling technique known as the Synthetic Minority Over-Sampling Technique (SMOTE) to improve the accuracy of the students’ final grade prediction model for a particular course. To validate our method, we used data from a course offered at North South University, Bangladesh. The results of the experiment indicate that the accuracy of the prediction model improves significantly when the discretization and over-sampling methods are applied.
Keywords
Educational data mining (EDM); Classification; Naive Bayes; Decision tree; Neural network; Discretization; Equal width binning; Over-sampling; SMOTE; Class imbalance
Background
Educational Data Mining (EDM) is an interdisciplinary research area that focuses on the use of data mining in the educational field. Educational data can come from different sources, most commonly academic institutions; nowadays, online learning systems are also an emerging environment for acquiring educational data, which can be used to analyze and extract useful information (Romero & Ventura 2010). The goal of this research is to predict students’ performance using attributes such as Cumulative Grade Point Average (CGPA), Quiz, Laboratory, Midterm and Attendance marks. To improve the prediction model, we introduced some preprocessing techniques so that the model provides more precise results, which could be used to alert students before the final examination regarding their final outcome.
We received the course data and student information from North South University. After acquiring the data, we preprocessed it and then applied three classification algorithms, namely Naïve Bayes, Decision Tree and Neural Network. To improve the models, we looked into techniques at the data preprocessing level. First, we discretized the continuous attributes using optimal equal width binning as proposed by Kayah (2008), and then used the Synthetic Minority Over-Sampling Technique (SMOTE) (Chawla et al. 2002) to increase the volume of the data, given that the acquired data contained a limited number of instances. There are four forms of the preprocessed data: the data as acquired, the data with discretization applied, class-balanced data obtained by oversampling, and data where both discretization and oversampling were used. We built twelve models by preprocessing the data in these four ways and applying the three classification techniques mentioned earlier. After all the models were built, we compared their accuracy and the precision, recall and F-measure of the class labels. ROC curves were generated for each of the models, and the Areas Under the Curve (AUC) were also calculated and compared.
Related works
Educational Data Mining is a vast domain which consists of different applications. Using data mining techniques, it is possible to build course planning systems, detect what type of learner a student is, group similar types of students, predict the performance of students, and help instructors gain insight into how to conduct their classes (Romero & Ventura 2010). Pal and Pal (2013) conducted studies at the VBS Purvanchal University, Jaunpur, India and used classification algorithms to identify the students who need special advising or counseling from the teachers.
Ayers et al. (2009) used several clustering algorithms such as hierarchical agglomerative clustering, K-means and model based clustering in order to understand skill levels of the students and group them based on their skill sets. Bharadwaj and Pal (2012) found that students’ grade in the senior secondary exam, living location, medium of teaching, mother’s qualification, family annual income, and student’s family status are correlated strongly and help to predict how the students perform academically. In another study Bharadwaj and Pal (2011) used students’ previous semester marks, class test grade, seminar performance, assignment performance, general proficiency, attendance in class and lab work to predict the end of the semester marks.
A comparison of machine learning methods has been carried out to predict success in a course (either passed or failed) in Intelligent Tutoring Systems (Hämäläinen & Vinni 2006). Nebot et al. (2006) applied different types of rule-based systems to predict student performance, such as mark prediction in an e-learning environment using fuzzy association rules. Several classification algorithms, such as discriminant analysis, neural networks, random forests and decision trees, have been applied to classify university students into three groups: low-risk, medium-risk and high-risk of failing (Superby et al. 2006).
Zhu et al. (2007) explain how to build a personalized learning recommendation system that helps learners know beforehand what they should learn before moving to the next step. Yadav et al. (2012) used students’ attendance, class test grade, seminar and assignment marks, and lab work to predict students’ performance at the end of the semester. They used the decision tree algorithms ID3, CART and C4.5 and made a comparative analysis. In their study, they achieved 52.08%, 56.25% and 45.83% accuracy with these classification techniques, respectively.
Prati et al. (2004) discussed recent works in the field of data mining that aim to overcome the imbalanced dataset problem. They mainly focused on concepts and methods for dealing with imbalanced datasets. Chawla et al. (2002) noted that, for a dataset to be balanced, the majority class and the minority class have to be equally represented. They combined over-sampling of the minority class with under-sampling of the majority class to achieve better classifier performance in ROC space. Their main contribution is the Synthetic Minority Over-sampling approach, a new over-sampling technique that, in combination with under-sampling, yields better results.
Chen (2009) used several re-sampling techniques to find the maximum classification accuracy on a fully labeled, imbalanced training data set. SMOTE, oversampling by duplicating minority examples, and random under-sampling were used to create new training data sets. Standard classifiers such as Decision Tree, Naive Bayes and Neural Network were trained on these data sets, and all of them showed improved accuracy except Naive Bayes. Rahman and Davis (2013) tried to address the class imbalance issue in medical datasets. They used under-sampling techniques as well as over-sampling techniques like SMOTE to balance the classes.
Some works have used neural networks to predict students’ grades. Gedeon and Turner (1993) compared different types of neural network models for predicting final student grades; they mainly used backpropagation and feedforward neural networks. Want and Mitrovic (2002) used feedforward and backpropagation to predict the number of errors a student will make. Oladokun et al. (2008) used multilayer perceptron topology for predicting the likely performance of a candidate being considered for admission into the university.
There are a handful of works on grade prediction models; however, our focus was to address the issue of class imbalance and to discretize the continuous attributes effectively instead of making an assumption such as a normal distribution. The primary goal was to observe whether the synthetic minority over-sampling method and optimal equal width binning together result in better performance of the grade prediction models, given that most of the attributes in course mark sheets or data sets are continuous in nature and the number of instances was low.
Methods
Data selection
The dataset we are using contains 181 instances, which is the number of students enrolled in the course during the prior 18 months. This dataset is from a course titled “Numerical Analysis”, which is a core course in the EEE discipline at North South University, Dhaka, Bangladesh. Originally the dataset had student ID, student name, five quiz marks, midterm marks, attendance, laboratory marks, final marks and final grade as attributes. We have selected the attribute which contains the percentage of marks obtained by the students in quizzes rather than taking all the quizzes into account. Final grade is considered as the class label. The same dataset is used for creating the over-sampled dataset, where the number of instances is 360.
Data preparation
Attributes of the dataset
Attributes | Remarks |
---|---|
CGPA | Cumulative Grade Point Average. It ranges from 0.00 to 4.00. This is a measure to evaluate students’ past record |
Quiz marks | The best 4 out of 5 quizzes are counted as per the course policy, which remained unchanged throughout the five semesters. The average is taken and normalized to the range 0 to 100. |
Midterm marks | The number of midterm examinations varied between 1 and 2 across the semesters taken into consideration. For the semesters where two midterms were held, their average is taken. The data is then normalized to the range 0 to 100. |
Laboratory mark | The weight of the laboratory marks varied from semester to semester; therefore, the marks are normalized to the range 0 to 100. |
Attendance marks | Ranges from 0 to 100. |
Final grade | This is the label our classification models try to predict. The final grade consists of five classes: A, B, C, D, F. |
Balancing the dataset using synthetic minority over-sampling
More details of the SMOTE algorithm can be found in Rahman and Davis (2013).
The data mining software Weka was used to implement the SMOTE oversampling technique. The over-sampled data was then randomized twice for class balancing.
Weka (Holmes et al. 1994) is open-source software designed to carry out data analysis. It is widely used for machine learning and data mining purposes.
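The interpolation step at the core of SMOTE (Chawla et al. 2002) can be sketched as follows. This is a minimal pure-Python illustration, not the Weka filter used in the study; the function name `smote`, the parameters `k` and `seed`, and the toy marks are assumptions made for the example. Each synthetic instance is placed at a random position on the line segment between a minority instance and one of its nearest minority-class neighbours.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points from a list of minority-class feature vectors.

    Assumes `minority` holds at least two distinct points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Hypothetical (midterm, quiz) marks of four students in a minority grade class
minority_marks = [[35.0, 40.0], [42.0, 38.0], [30.0, 45.0], [38.0, 36.0]]
synthetic_marks = smote(minority_marks, n_new=6, k=3)
```

Because every synthetic point is a convex combination of two real minority points, it always stays inside the range spanned by the original minority instances.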
Handling continuous data using probability distribution function
In the model built using Naive Bayes classification, a probability distribution function is used for the continuous attributes in the dataset.
Handling continuous data using optimal equal width binning
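As a rough illustration of the idea, equal width binning splits the range of an attribute into a fixed number of intervals of the same width, and an "optimal" setting can be found by trying several candidate settings and keeping the one that scores best. The sketch below assumes the search is over the number of bins and uses a toy scoring function (`majority_accuracy`) as a stand-in; in practice the score would more plausibly be the validation accuracy of the classifier being tuned. All names here are hypothetical and are not taken from the original study.

```python
from collections import Counter, defaultdict

def equal_width_bins(values, n_bins):
    """Map each value to a bin index 0..n_bins-1 using bins of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width when all values are equal
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def majority_accuracy(bins, labels):
    """Toy score: accuracy of predicting the majority label within each bin."""
    by_bin = defaultdict(Counter)
    for b, y in zip(bins, labels):
        by_bin[b][y] += 1
    return sum(c.most_common(1)[0][1] for c in by_bin.values()) / len(labels)

def best_bin_count(values, labels, score, candidates=range(2, 11)):
    """Return the candidate bin count with the highest score (ties: smallest count)."""
    return max(candidates, key=lambda n: score(equal_width_bins(values, n), labels))

# Hypothetical marks and pass/fail labels, for illustration only
marks = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
grades = ["F" if m < 50 else "P" for m in marks]
chosen = best_bin_count(marks, grades, majority_accuracy)
```

The toy score is computed on the same data it is tuned on, so it tends to favour many bins; a held-out or cross-validated score avoids that bias.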
Naive Bayes for classification
The Naive Bayes classifier (Tan et al. 2006) is a probabilistic classifier based on applying Bayes’ theorem. Naive Bayes assumes that all the attributes used for classification are independent of each other given the class. We used Naïve Bayes classification to create four different models. In the first model we handled the continuous attributes using a probability distribution function (PDF), in the second model we used optimal equal width binning, for the third model we over-sampled the data using SMOTE, and for the fourth model we used both optimal equal width binning and SMOTE.
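The PDF-based variant can be illustrated with a small sketch. It assumes that each continuous attribute follows a normal (Gaussian) distribution within each class, which is a common choice but is not confirmed by the text above; the class priors, means and standard deviations below are invented for illustration.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

def naive_bayes_posteriors(x, priors, params):
    """Posterior probability of each class for one instance.

    x: list of attribute values; params[c]: list of (mean, std), one pair per attribute.
    """
    scores = {}
    for c, prior in priors.items():
        likelihood = prior
        # Naive assumption: attribute likelihoods multiply within a class
        for value, (mean, std) in zip(x, params[c]):
            likelihood *= gaussian_pdf(value, mean, std)
        scores[c] = likelihood
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

# Hypothetical per-class statistics for (midterm mark, CGPA)
priors = {"A": 0.5, "F": 0.5}
params = {"A": [(85.0, 5.0), (3.6, 0.2)], "F": [(35.0, 8.0), (2.0, 0.4)]}
posteriors = naive_bayes_posteriors([80.0, 3.5], priors, params)
```

With binned attributes, the Gaussian density would be replaced by the relative frequency of each bin within the class.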
C4.5 algorithm for classification
C4.5 is an extended version of the Iterative Dichotomiser 3 (ID3) decision tree algorithm. In this algorithm, we calculate the entropy of every attribute of the dataset and then split the data set into subsets using the attribute with minimum entropy, that is, maximum information gain. Some of the major extensions of C4.5 over ID3 are that it accepts both continuous and discrete features, handles incomplete data points, and allows different weights to be applied to the features that comprise the training data (Quinlan 1993). We split the data using gain ratio, and the minimal size for a split was set to 4. Therefore, only nodes containing at least 4 instances are split.
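The split criterion can be made concrete with a short sketch for discrete attributes. Gain ratio is the information gain of a split divided by the entropy of the split itself; the function names and the toy data are assumptions for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(attribute_values, labels):
    """Information gain of splitting on the attribute, divided by the split information."""
    n = len(labels)
    groups = {}
    for v, y in zip(attribute_values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the class labels after the split
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(attribute_values)
    return info_gain / split_info if split_info > 0 else 0.0

# Toy data: the first attribute separates the grades perfectly, the second does not
grades = ["F", "F", "A", "A"]
informative = gain_ratio(["low", "low", "high", "high"], grades)
uninformative = gain_ratio(["x", "y", "x", "y"], grades)
```

An attribute that separates the classes perfectly gets the highest ratio, while one that leaves the class mix unchanged gets a ratio of zero.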
Backpropagation algorithm for classification
Backpropagation is a method for training artificial neural networks. It is used along with an optimization method such as gradient descent. The backpropagation algorithm is divided into two phases: propagation and weight update (Haykin 2008).
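The two phases can be seen in a deliberately tiny sketch: a network with one input, one hidden unit and one output unit, sigmoid activations and a squared-error loss. The network shape, learning rate and training values are assumptions chosen for brevity and do not reflect the topology used in the study.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(x, target, w1, w2, lr=0.5):
    """One backpropagation step; returns updated weights and the loss before the update."""
    # Phase 1: propagation (forward pass, then backward pass of the error)
    h = sigmoid(w1 * x)
    y = sigmoid(w2 * h)
    delta_out = (y - target) * y * (1 - y)       # error signal at the output unit
    delta_hidden = delta_out * w2 * h * (1 - h)  # error propagated back to the hidden unit
    # Phase 2: weight update by gradient descent
    w2 -= lr * delta_out * h
    w1 -= lr * delta_hidden * x
    loss = 0.5 * (y - target) ** 2
    return w1, w2, loss

w1, w2 = 0.1, 0.1
losses = []
for _ in range(200):
    w1, w2, loss = train_step(1.0, 1.0, w1, w2)
    losses.append(loss)
```

Repeating the step drives the output toward the target, so the recorded loss shrinks over the iterations.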
Implementation of the models
Results
Bin width values for the classification methods
Attribute | Decision tree | Naïve Bayes | Neural network |
---|---|---|---|
Quiz | 3 | 7 | 6 |
Midterm | 7 | 6 | 8 |
Laboratory | 5 | 6 | 4 |
Attendance | 2 | 2 | 2 |
CGPA | 4 | 4 | 4 |
Bin width values for the classification methods on SMOTE over-sampled data
Attribute | Decision tree | Naïve Bayes | Neural network |
---|---|---|---|
Quiz | 8 | 8 | 8 |
Midterm | 3 | 9 | 8 |
Laboratory | 5 | 6 | 6 |
Attendance | 2 | 2 | 2 |
CGPA | 6 | 4 | 4 |
Naive Bayes classification
Detailed analysis of the naive Bayes model
| | True C | True A | True D | True F | True B | Class precision |
---|---|---|---|---|---|---|
Pred. C | 33 | 0 | 8 | 2 | 16 | 55.93% |
Pred. A | 0 | 22 | 0 | 0 | 15 | 59.46% |
Pred. D | 6 | 0 | 9 | 6 | 0 | 42.86% |
Pred. F | 0 | 0 | 1 | 1 | 0 | 50.00% |
Pred. B | 9 | 6 | 0 | 1 | 45 | 73.77% |
Class recall | 68.75% | 78.57% | 50.00% | 10.00% | 59.21% | |
F-measure | 61.68% | 67.69% | 46.15% | 16.66% | 65.69% |
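The precision, recall and F-measure figures in the table above follow directly from the confusion matrix. As a worked check, the sketch below recomputes them from the counts in that table; the variable and function names are illustrative only.

```python
classes = ["C", "A", "D", "F", "B"]
# Rows are predicted classes, columns are true classes (same order as `classes`)
matrix = [
    [33, 0, 8, 2, 16],
    [0, 22, 0, 0, 15],
    [6, 0, 9, 6, 0],
    [0, 0, 1, 1, 0],
    [9, 6, 0, 1, 45],
]

def per_class_metrics(matrix, classes):
    """Return {class: (precision, recall, f_measure)} for a square confusion matrix."""
    metrics = {}
    for i, c in enumerate(classes):
        tp = matrix[i][i]
        predicted = sum(matrix[i])               # row total: instances predicted as c
        actual = sum(row[i] for row in matrix)   # column total: instances truly of class c
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        total = precision + recall
        f_measure = 2 * precision * recall / total if total else 0.0
        metrics[c] = (precision, recall, f_measure)
    return metrics

metrics = per_class_metrics(matrix, classes)
accuracy = sum(matrix[i][i] for i in range(len(classes))) / sum(map(sum, matrix))
```

For class C this gives a precision of 33/59, a recall of 33/48 and an F-measure of 66/107, and the overall accuracy is 110/180, matching the reported 55.93%, 68.75%, 61.68% and 61.11%.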
Detailed analysis of the naive Bayes model with optimal equal width binning
| | True C | True A | True D | True F | True B | Class precision |
---|---|---|---|---|---|---|
Pred. C | 34 | 0 | 6 | 0 | 13 | 64.15% |
Pred. A | 0 | 22 | 0 | 0 | 7 | 75.86% |
Pred. D | 6 | 0 | 9 | 5 | 1 | 42.86% |
Pred. F | 1 | 0 | 3 | 3 | 0 | 42.86% |
Pred. B | 7 | 6 | 0 | 2 | 55 | 78.57% |
Class recall | 70.83% | 78.57% | 50.00% | 30.00% | 72.37% | |
F-measure | 67.32% | 77.19% | 46.15% | 35.29% | 75.34% |
Detailed analysis of the naive Bayes model with SMOTE oversampling
| | True D | True C | True F | True A | True B | Class precision |
---|---|---|---|---|---|---|
Pred. D | 64 | 15 | 35 | 0 | 1 | 55.65% |
Pred. C | 8 | 47 | 14 | 0 | 13 | 57.32% |
Pred. F | 0 | 0 | 19 | 0 | 0 | 100.00% |
Pred. A | 0 | 0 | 0 | 65 | 17 | 79.27% |
Pred. B | 0 | 10 | 2 | 5 | 45 | 72.58% |
Class recall | 88.89% | 65.28% | 27.14% | 92.86% | 59.21% | |
F-measure | 68.44% | 61.04% | 42.69% | 85.52% | 65.21% |
Detailed analysis of the naive Bayes model with optimal equal width binning and SMOTE oversampling
| | True D | True C | True F | True A | True B | Class precision |
---|---|---|---|---|---|---|
Pred. D | 56 | 12 | 9 | 0 | 2 | 70.89% |
Pred. C | 7 | 49 | 3 | 0 | 14 | 67.12% |
Pred. F | 9 | 4 | 56 | 0 | 2 | 78.87% |
Pred. A | 0 | 0 | 0 | 65 | 13 | 83.33% |
Pred. B | 0 | 7 | 2 | 5 | 45 | 76.27% |
Class recall | 77.78% | 68.06% | 80.00% | 92.86% | 59.21% | |
F-measure | 74.17% | 67.58% | 79.43% | 87.83% | 66.66% |
Decision tree classification
Detailed analysis of the decision tree model
| | True C | True A | True D | True F | True B | Class precision |
---|---|---|---|---|---|---|
Pred. C | 29 | 0 | 14 | 7 | 11 | 47.54% |
Pred. A | 0 | 25 | 0 | 0 | 20 | 55.56% |
Pred. D | 2 | 0 | 1 | 1 | 0 | 25.00% |
Pred. F | 0 | 0 | 1 | 2 | 0 | 66.67% |
Pred. B | 17 | 3 | 2 | 0 | 45 | 67.16% |
Class recall | 60.42% | 89.29% | 5.56% | 20.00% | 59.21% | |
F-measure | 53.21% | 68.49% | 9.09% | 30.76% | 62.93% |
Detailed analysis of the decision tree model with optimal equal width binning
| | True C | True A | True D | True F | True B | Class precision |
---|---|---|---|---|---|---|
Pred. C | 33 | 1 | 11 | 3 | 17 | 50.77% |
Pred. A | 0 | 19 | 0 | 0 | 4 | 82.61% |
Pred. D | 9 | 0 | 5 | 3 | 1 | 27.78% |
Pred. F | 1 | 0 | 1 | 4 | 0 | 66.67% |
Pred. B | 5 | 8 | 1 | 0 | 54 | 79.41% |
Class recall | 68.75% | 67.86% | 27.78% | 40.00% | 71.05% | |
F-measure | 58.40% | 74.51% | 27.78% | 50.00% | 74.99% |
Detailed analysis of the decision tree model with SMOTE oversampling
| | True D | True C | True F | True A | True B | Class precision |
---|---|---|---|---|---|---|
Pred. D | 63 | 23 | 16 | 0 | 6 | 58.33% |
Pred. C | 4 | 33 | 6 | 1 | 8 | 63.46% |
Pred. F | 5 | 4 | 47 | 0 | 0 | 83.93% |
Pred. A | 0 | 1 | 0 | 58 | 8 | 86.57% |
Pred. B | 0 | 11 | 1 | 11 | 54 | 70.13% |
Class recall | 87.50% | 45.83% | 67.14% | 82.86% | 71.05% | |
F-measure | 69.99% | 53.22% | 74.60% | 84.67% | 70.58% |
Detailed analysis of the decision tree model with optimal equal width binning and SMOTE oversampling
| | True D | True C | True F | True A | True B | Class precision |
---|---|---|---|---|---|---|
Pred. D | 59 | 23 | 8 | 0 | 7 | 60.82% |
Pred. C | 5 | 39 | 2 | 1 | 14 | 63.93% |
Pred. F | 7 | 2 | 58 | 0 | 0 | 86.57% |
Pred. A | 0 | 0 | 0 | 63 | 20 | 75.90% |
Pred. B | 1 | 8 | 2 | 6 | 35 | 67.31% |
Class recall | 81.94% | 54.17% | 82.86% | 90.00% | 46.05% | |
F-measure | 69.81% | 58.64% | 84.67% | 82.35% | 54.68% |
Classification using neural network
Detailed analysis of the neural network model
| | True C | True A | True D | True F | True B | Class precision |
---|---|---|---|---|---|---|
Pred. C | 33 | 0 | 10 | 2 | 14 | 55.93% |
Pred. A | 0 | 18 | 0 | 0 | 6 | 75.00% |
Pred. D | 2 | 0 | 8 | 6 | 0 | 50.00% |
Pred. F | 0 | 0 | 0 | 0 | 0 | 0.00% |
Pred. B | 13 | 10 | 0 | 2 | 56 | 69.14% |
Class recall | 68.75% | 64.29% | 44.44% | 0.00% | 73.68% | |
F-measure | 61.68% | 69.23% | 47.05% | 0.00% | 71.33% |
Detailed analysis of the neural network model with optimal equal width binning
| | True C | True A | True D | True F | True B | Class precision |
---|---|---|---|---|---|---|
Pred. C | 38 | 0 | 7 | 1 | 13 | 64.41% |
Pred. A | 0 | 18 | 0 | 0 | 5 | 78.26% |
Pred. D | 4 | 0 | 11 | 7 | 1 | 47.83% |
Pred. F | 0 | 0 | 0 | 0 | 0 | 0.00% |
Pred. B | 6 | 10 | 0 | 2 | 57 | 76.00% |
Class recall | 79.17% | 64.29% | 61.11% | 0.00% | 75.00% | |
F-measure | 71.03% | 70.59% | 53.66% | 0.00% | 75.49% |
Detailed analysis of the neural network model with SMOTE oversampling
| | True D | True C | True F | True A | True B | Class precision |
---|---|---|---|---|---|---|
Pred. D | 53 | 12 | 6 | 0 | 3 | 71.62% |
Pred. C | 7 | 44 | 4 | 0 | 10 | 67.69% |
Pred. F | 12 | 6 | 59 | 0 | 0 | 76.62% |
Pred. A | 0 | 0 | 0 | 58 | 12 | 82.86% |
Pred. B | 0 | 10 | 1 | 12 | 51 | 68.92% |
Class recall | 73.61% | 61.11% | 84.29% | 82.86% | 67.11% | |
F-measure | 72.60% | 64.23% | 80.27% | 82.86% | 68.00% |
Detailed analysis of the neural network model with optimal equal width binning and SMOTE oversampling
| | True D | True C | True F | True A | True B | Class precision |
---|---|---|---|---|---|---|
Pred. D | 52 | 10 | 3 | 0 | 4 | 75.36% |
Pred. C | 10 | 50 | 6 | 0 | 11 | 64.94% |
Pred. F | 10 | 4 | 59 | 0 | 0 | 80.82% |
Pred. A | 0 | 0 | 1 | 57 | 8 | 86.36% |
Pred. B | 0 | 8 | 1 | 13 | 53 | 70.67% |
Class recall | 72.22% | 69.44% | 84.29% | 81.43% | 69.74% | |
F-measure | 73.75% | 67.11% | 82.51% | 83.82% | 70.20% |
Receiver operating characteristic (ROC) curve comparisons
The ROC curve, which stands for receiver operating characteristic curve, is a graphical representation of the performance of a binary classifier system as its discrimination threshold is varied (Tan et al. 2006). The horizontal axis represents the fraction of false positives out of total actual negatives (FPR = False positive rate) and the vertical axis represents the fraction of true positives out of total actual positives (TPR = True positive rate).
Since the ROC curve applies to binary classifiers and we have five class labels for the grade, we present five ROC curves. For each ROC curve, one class is considered the True class and the rest of the classes are considered the False class. The ROC curves change when over-sampled data is used for classification, as discussed in Section 4.5.
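The one-versus-rest construction can be sketched as follows: the grade labels are reduced to "is this class" versus "is not this class", instances are ranked by the classifier's confidence for that class, and each step down the ranking adds one point to the curve. The scores below are invented, and the sketch assumes there are no tied scores.

```python
def roc_points(scores, is_positive):
    """Return (FPR, TPR) points obtained by lowering the decision threshold step by step."""
    pos = sum(is_positive)
    neg = len(is_positive) - pos
    ranked = sorted(zip(scores, is_positive), key=lambda t: -t[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, positive in ranked:
        if positive:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))

# One-vs-rest view for class A: A is the True class, every other grade is False
grades = ["A", "B", "A", "C", "F", "B"]
score_for_a = [0.9, 0.4, 0.8, 0.2, 0.1, 0.3]  # hypothetical classifier confidence for class A
is_a = [g == "A" for g in grades]
curve = roc_points(score_for_a, is_a)
area = auc(curve)
```

Repeating this for each of the five grades yields the five curves, and averaging the five areas gives one possible summary AUC per model.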
ROC curve comparisons after oversampling using SMOTE
Summary of the analysis
Analysis of the models
Model | Accuracy | Avg. Precision | Avg. Recall | Avg. F-Measure | Avg. AUC |
---|---|---|---|---|---|
Naive Bayes | 61.11% | 56.40% | 53.30% | 51.58% | 75.6% |
Naive Bayes (optimal binning) | 68.33% | 60.86% | 60.35% | 60.26% | 68.9% |
Naive Bayes (SMOTE) | 66.67% | 72.96% | 66.67% | 64.58% | 81.4% |
Naive Bayes (optimal binning + SMOTE) | 75.28% | 75.30% | 75.58% | 75.13% | 71.8% |
Decision tree | 56.11% | 56.90% | 45.73% | 43.44% | 40.1% |
Decision tree (optimal binning) | 60.56% | 50.56% | 48.96% | 49.54% | 47.9% |
Decision tree (SMOTE) | 70.83% | 72.48% | 70.87% | 70.61% | 64.8% |
Decision tree (optimal binning + SMOTE) | 70.56% | 70.91% | 71.00% | 70.03% | 68.4% |
Neural net | 65.56% | 70.21% | 60.41% | 62.65% | 72.3% |
Neural net (optimal binning) | 68.89% | 66.62% | 69.89% | 67.69% | 73.1% |
Neural net (SMOTE) | 73.61% | 73.54% | 73.38% | 73.59% | 81.3% |
Neural net (Optimal Binning + SMOTE) | 75.28% | 75.63% | 75.42% | 75.48% | 71.6% |
Pearson correlation coefficient for the validation measures
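For reference, the Pearson correlation coefficient between two validation measures can be computed as in the sketch below. The choice of columns (accuracy against average F-measure, taken from the summary table above) is only an example and is not necessarily the pairing the authors analysed.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Accuracy and Avg. F-Measure columns of the summary table, in row order
accuracy = [61.11, 68.33, 66.67, 75.28, 56.11, 60.56, 70.83, 70.56, 65.56, 68.89, 73.61, 75.28]
avg_f_measure = [51.58, 60.26, 64.58, 75.13, 43.44, 49.54, 70.61, 70.03, 62.65, 67.69, 73.59, 75.48]
r = pearson(accuracy, avg_f_measure)
```

A value close to 1 would indicate that the two measures rank the twelve models in nearly the same way.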
Conclusion
Our primary objective was to improve the models we build through preprocessing and then to determine the model which gives the highest accuracy. As the number of instances in the dataset was small, oversampling was necessary. However, in order to distribute the instances, we had to randomize the dataset twice. The two models with the highest accuracy, about 75%, are the Neural Network and Naive Bayes classifiers with SMOTE oversampling and optimal equal width binning. Misclassification between the neighboring classes D and F was high until the dataset was over-sampled and balanced. When the Naive Bayes classifier was used on the original data, the accuracy was around 61%, which means there was an increase of about 14 percentage points in accuracy when the discretization method was applied to the class-balanced data. We can observe that the Naive Bayes and Neural Network models produced almost the same level of accuracy. However, Naive Bayes classification is computationally faster than the Neural Network backpropagation algorithm, and so it is the preferable choice. Our results also suggest that the accuracy of a prediction model can improve significantly when SMOTE oversampling and optimal equal width binning are used together to preprocess a dataset which is small in size and contains continuous attributes. The level of misclassification error might be reduced further if more attributes were taken into consideration, such as students’ grades in prerequisite courses. In future, we would also like to explore how the same optimization technique works for other data binning methods, for example, binning by frequency or binning by size.
Declarations
Acknowledgements
The authors would like to thank the Department of Electrical and Computer Engineering, North South University for providing the students' course data for conducting the research.
Authors’ Affiliations
References
- Ayers, E, Nugent, R, & Dean, N. (2009). A comparison of student skill knowledge estimates. In International Conference On Educational Data Mining, Cordoba, Spain (pp. 1–10).
- Bharadwaj, BK, & Pal, S. (2011). Mining educational data to analyze students’ performance. International Journal of Advance Computer Science and Applications, 2(6), 63–69.
- Bharadwaj, BK, & Pal, S. (2012). Data mining: a prediction for performance improvement using classification. International Journal of Computer Science and Information Security, 9(4), 136–140.
- Chawla, NV, Bowyer, KW, Hall, LO, & Kegelmeyer, WP. (2002). SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357.
- Chen, Y. (2009). Learning Classifiers from Imbalanced, Only Positive and Unlabeled Data Sets. Department of Computer Science, Iowa State University, USA. Retrieved July 25, 2014, from https://www.cs.iastate.edu/~yetianc/cs573/files/CS573_ProjectReport_YetianChen.pdf.
- Gedeon, TD, & Turner, HS. (1993). Explaining student grades predicted by a neural network. In International conference on Neural Networks, Nagoya (pp. 609–612).
- Hämäläinen, W, & Vinni, M. (2006). Comparison of machine learning methods for intelligent tutoring systems. In International Conference in Intelligent Tutoring Systems, Taiwan (pp. 525–534).
- Haykin, S. (2008). Neural Networks and Learning Machines, 3rd Edition. Pearson Education Inc.: Upper Saddle River, New Jersey, USA.
- Holmes, G, Donkin, A, & Witten, IH. (1994). Weka: a machine learning workbench. In Intelligent Information Systems, 1994. Proceedings of the 1994 Second Australian and New Zealand Conference on (pp. 357–361). IEEE.
- Kayah, F. (2008). Discretizing Continuous Features for Naive Bayes and C4.5 Classifiers. University of Maryland publications: College Park, MD, USA.
- Nebot, A, Castro, F, Vellido, A, & Mugica, F. (2006). Identification of fuzzy models to predict students performance in an e-learning environment. In International Conference on Web-based Education, Puerto Vallarta (pp. 74–79).
- Oladokun, VO, Adebanjo, AT, & Charles-owaba, OE. (2008). Predicting student’s academic performance using artificial neural network: a case study of an engineering course. Pacific Journal of Science and Technology, 9(1), 72–79.
- Pal, AK, & Pal, S. (2013). Analysis and Mining of Educational Data for Predicting the Performance of Students. International Journal of Electronics Communication and Computer Engineering, 4(5), 1377–1381.
- Prati, RC, Batista, GE, & Monard, MC. (2004). Class imbalances versus class overlapping: an analysis of a learning system behavior. In MICAI 2004: Advances in Artificial Intelligence (pp. 312–321). Heidelberg: Springer Berlin.
- Quinlan, JR. (1993). C4.5: programs for machine learning. Morgan Kaufmann: San Francisco, CA, USA.
- Rahman, MM, & Davis, DN. (2013). Addressing the Class Imbalance Problem in Medical Datasets. International Journal of Machine Learning and Computing, 3(2), 224–228.
- Romero, C, & Ventura, S. (2010). Educational data mining: a review of the state of the art. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 40(6), 601–618.
- Superby, JF, Vandamme, JP, & Meskens, N. (2006). Determination of factors influencing the achievement of the first-year university students using data mining methods. In International Conference on Intelligent Tutoring Systems, Educational Data Mining Workshop, Taiwan (pp. 1–8).
- Tan, P, Kumar, V, & Steinbach, M. (2006). Introduction to Data Mining. New Delhi: Dorling Kindersley (India) Pvt. Ltd.
- Want, T, & Mitrovic, A. (2002). Using neural networks to predict student’s performance. In International Conference on Computers in Education, Washington, DC (pp. 1–5).
- Yadav, SK, Bharadwaj, B, & Pal, S. (2012). Data Mining Applications: A Comparative Study for Predicting student’s Performance. arXiv preprint arXiv:1202.4815.
- Zhu, F, Ip, HH, Fok, AW, & Cao, J. (2007). Peres: a personalized recommendation education system based on multi-agents & scorm. In Advances in Web Based Learning–ICWL (pp. 31–42). Heidelberg: Springer Berlin.
Copyright
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.