The early and accurate prediction of defects helps in testing software and therefore leads to an overall higher-quality product. Due to drift in software defect data, the performance of prediction models may degrade over time. Very few earlier works have investigated the significance of concept drift (CD) in software-defect prediction (SDP). Their results have shown that CD is present in software defect data and that it has a significant impact on the performance of defect prediction. Motivated by this observation, this paper presents a paired learner-based drift detection and adaptation approach for SDP that dynamically adapts to varying concepts by updating one of the learners in the pair. For a given defect dataset, a subset of data modules is analyzed at a time by both learners based on their learning experience from the past. The difference in the accuracies of the two is used to detect drift in the data. We evaluate the presented approach using defect datasets collected from the SEACraft and PROMISE data repositories. The experimental results show that the presented approach successfully detects concept drift points and performs better than existing methods, as is evident from the comparative analysis performed using various performance parameters, such as the number of drift points, ROC-AUC score, and accuracy, and from statistical analysis using the Wilcoxon signed rank test.
Keywords: concept drift; naive Bayes; random forest; software defect prediction; software quality assurance
To minimize software testing efforts by predicting defect-prone software modules beforehand, many software defect-prediction (SDP) approaches have been proposed, as described in [[
These prediction models perform well (more accurately) if the software metrics data analyzed are stable. However, the prediction performance of these models may degrade if metrics data evolve dynamically over time. This change is affected by the variation in the underlying properties of the metrics data. This variation in the properties of the data over time is known as concept drift (CD) [[
SDP aims to ease the optimal allocation of limited Software Quality Assurance (SQA) resources via prior prognosis of the defect-proneness of software components [[
Only a couple of studies [[
In this paper, we present a dynamic approach based on a paired learner to detect and adapt to concept drift in software-defect prediction. The behavior is dynamic because the experience learned from already processed modules is used to predict the presence of drift in unseen modules. We investigate the presence of sudden and gradual drifts in software-defect data. The presented approach first divides the validation dataset into assorted separate subsets using a partitioning technique, and then trains one of the learners on the modules in all previously exercised subsets and the other learner on the modules in the most recent subset only. Afterwards, both learners independently predict on unseen data modules; the difference in the predictions of the two learners is then compared to a threshold value to detect drift. If drift is detected, the first learner is updated to handle the CD.
The following are the contributions of this study:
- We present a dynamic approach based on paired learner to detect and adapt to CD in SDP.
- An exhaustive assessment of the presented method on several defect datasets collected from different open-source data repositories is provided. We evaluate the presented method in terms of the improvement in the detection of concept drift points using the prediction model in SDP.
- We present a comparative evaluation of the presented method with the base learning methods used in the study.
In our current study, we aim to answer the following research questions, which were not answered by previous studies:
- RQ1—Available studies on CD in SDP are mainly focused on showing the proneness of defect data to concept drift. Therefore, there is a need to explore how significantly concept drift points are present, in terms of their number, in the defect data used for SDP.
- RQ2—Studies in other domains of data analysis have proven that CD adaptation is required to prevent performance degradation of the prediction model, yet no previous study has used a paired learner for CD adaptation in SDP. Therefore, how would the proposed approach adapt to the CD discovered in SDP?
In this work, we provide a procedure for concept-drift detection and adaptation in SDP along with a comparative analysis of the number of drift points discovered to answer RQ1. To answer RQ2, we provide an algorithmic procedure explaining the CD adaptation and a comprehensive comparative performance analysis of the results, obtained by implementing a window-based approach.
The remainder of this paper is organized into the following sections. The background of the study is presented in Section 2. Section 3 presents the synopsis of the related studies. A detailed description of the proposed approach is presented in Section 4. The experimental settings are presented in Section 5. Section 6 presents the experimentation results and discussion. The comparative analysis results are provided in Section 7. Section 8 describes the threats to validity. Finally, Section 9 concludes the paper.
Concept drift has been previously studied by many researchers in different domains. The authors in [[
Formally, let $P_t(X, Y)$ denote the joint distribution of the feature vector $X$ and the class variable $Y$ at time $t$. If $P_t(X, Y) \neq P_{t+1}(X, Y)$, then a concept drift is said to have occurred between times $t$ and $t+1$.
The change in concept over a period of time is known as concept drift. This concept change may happen in four ways, giving four types of CD categories, defined as follows (a small synthetic illustration of the first two types follows the list):
- There is a concept change within a short duration of time (Sudden).
- An old concept is gradually replaced by a new concept over a period of time (Gradual).
- A new concept is incrementally reached over a period of time (Incremental).
- The old concept reoccurs after some time (Reoccurring).
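As a small synthetic illustration of the first two categories (ours, not from the paper), the following sketch generates a stream with a sudden swap of the underlying concept at one point, and a stream in which the new concept gradually replaces the old one with growing probability. The distributions and change points are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
def old(n): return rng.normal(0.0, 1.0, n)   # old concept: mean 0
def new(n): return rng.normal(3.0, 1.0, n)   # new concept: mean 3

sudden = np.concatenate([old(500), new(500)])           # abrupt change at t = 500
mix = np.linspace(0.0, 1.0, 500)                        # probability of the new concept
gradual = np.where(rng.random(500) < mix, new(500), old(500))
stream = np.concatenate([old(250), gradual, new(250)])  # old -> mixed -> new
```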
Dong et al. in [[
While predicting defects in software, drift could happen when the trained predictor does not perform well due to the presence of a changing concept. Therefore, these changing concepts must be discovered to update the predictor. Our proposed approach, as described in the following sections, efficiently discovers the concept-drift points in software-defect data.
A few investigations assessing CD in SDP can be found in the literature. Below, we discuss these works, which are summarized in Table 1.
Rathore et al. [[
Hall et al. [[
These exhaustive review studies on SDP showed that the SDP models are trained using the historical metrics data. Thereafter, these models use this learned experience to predict the number of faults or the presence/absence of faults in the currently used software modules. This provided the basis for diagnosing the effects of changing data distributions in SDP processes. Only a few studies in this domain have been presented so far, as reviewed below:
Ekanayake et al. [[
Kabir et al. [[
A recent study by Kwabena E. Bennin et al. [[
The research in these studies did not provide any solution for handling concept drift in SDP, nor did it explore a paired learner for CD adaptation in SDP. Therefore, in our current work, we provide a solution that overcomes these shortcomings: we propose a dynamic approach for detecting CD in SDP and then perform a comprehensive comparative analysis of the results using the Wilcoxon statistical test.
In this section, we present the proposed approach to handling CD in software-defect prediction. The concept of paired learner was previously discussed by Bach and Maloof [[
Bach and Maloof [[
Zhang et al. in [[
In this section, we present a PL-based approach in which batches of data examples are processed one at a time to predict the presence of defects in each batch. The stable learner is trained using past data, whereas the reactive one always learns from the recent window only. Both the stable and reactive learners of the pair predict at the same time using the software metrics data samples in the currently processed window. The difference in the accuracies of the learners in the pair is used to detect drift in the data. If drift is detected, it is assumed that the past learned experience of the stable learner is causing harm, as the reactive one performs better; therefore, it is time to update the stable learner of the pair. Consequently, the algorithm forces the stable learner to be updated by creating a new instance of it and training this newly created stable learner with the data samples in the currently processed window only. This removes the chance that experience learned from old data examples will degrade the performance of the paired learner.
Figure 1 below depicts the working of the proposed approach. A batch size of six samples from the defect data was selected for the pictorial representation of the concept. At any point in time, the stable learner (SL) of the pair is trained using all examples from the beginning or from the last drift point onward, and the reactive learner (RL) is trained using the examples in the most recent batch. As soon as a new batch of examples arrives, both SL and RL make predictions based on their training up to that point, as described earlier. The approach uses the actual values of the target variable to compute the performances of SL and RL in terms of correct classifications and misclassifications. As RL and SL predict on the same test data, the difference in their performances is used to detect drift. If CD is detected, the approach forces the SL of the pair to be reinitialized with the current learning experience of the RL. Then, both SL and RL are trained using the examples in the currently tested batch, and RL is made to forget its learning from the previous batch. Therefore, at any point in time, RL's learning experience comes only from the examples in the recent window.
Algorithm 1 presents the algorithmic procedure of the proposed approach. The input to the algorithm (line 1) is a collection of example instances from the defect dataset; the remaining inputs are the window size k (line 4), the threshold value θ for drift detection (line 5), and the stable and reactive learners S and R (lines 6 and 7).
The workings of the algorithm are described in Figure A1. This diagram represents the block diagram showing various steps followed in the proposed approach.
Algorithm 1: PL-based CD detection and adaptation in SDP.
▹ Inputs to the Algorithm
1: D: collection of example instances from the defect dataset
2: X: vector of independent variables
3: Y: the value of the dependent variable
4: k: size of the window
5: θ: threshold for drift detection
6: S: stable learner
7: R: reactive learner
8: Ŷ_S: k predicted values by the stable learner
9: Ŷ_R: k predicted values by the reactive learner
10: C: count to hold misclassifications, initially 0
11: index: start counter for the stable learner, initially 1
▹ Initially train the models using the samples in the first window
12: S.train(X_1..k, Y_1..k); R.train(X_1..k, Y_1..k)
13: for each subsequent window i do
14:   Ŷ_S ← S.classify(X_i) ▹ Make classification using the stable learner
15:   Ŷ_R ← R.classify(X_i) ▹ Make classification using the reactive learner
16:   C ← |{ j : Ŷ_S[j] ≠ Y_i[j] and Ŷ_R[j] = Y_i[j] }| ▹ Count misclassifications
17:   if C/k > θ then ▹ Detecting CD in the current batch of data samples
18:     Output concept drift at i and adapt to drift ▹ New instance of SL
19:     S ← new instance of the stable learner; index ← i
20:   end if
21:   ▹ Train the learners with the corresponding set of data
22:   S.train(X_index..i, Y_index..i)
23:   R ← new instance of the reactive learner; R.train(X_i, Y_i)
24: end for
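For concreteness, the following is a minimal Python sketch of Algorithm 1 in the scikit-learn style used in the paper's tooling. The function name, default parameter values, and slicing details are our assumptions, not the authors' exact implementation; any scikit-learn classifier can stand in for the base learner.

```python
# A minimal sketch of Algorithm 1, assuming numpy arrays X (features) and y (labels).
import numpy as np
from sklearn.naive_bayes import GaussianNB

def paired_learner_drift(X, y, k=500, theta=0.05, make_learner=GaussianNB):
    """Process (X, y) in windows of k samples; return the detected drift points."""
    stable, reactive = make_learner(), make_learner()
    stable.fit(X[:k], y[:k])              # train both learners on the first window
    reactive.fit(X[:k], y[:k])
    index, drift_points = 0, []
    for start in range(k, len(X) - k + 1, k):
        Xw, yw = X[start:start + k], y[start:start + k]
        pred_s, pred_r = stable.predict(Xw), reactive.predict(Xw)
        # C: samples the stable learner misclassifies but the reactive one gets right
        C = np.sum((pred_s != yw) & (pred_r == yw))
        if C / k > theta:                 # concept drift detected in this window
            drift_points.append(start)
            stable = make_learner()       # replace the stable learner (adaptation)
            index = start
        stable.fit(X[index:start + k], y[index:start + k])  # all data since last drift
        reactive = make_learner()
        reactive.fit(Xw, yw)              # reactive learner forgets older windows
    return drift_points
```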
The study by Žliobaitė et al. [[
For our experimentation, we used paired learners of naive Bayes (NB) and random forest (RF) classifiers (also known as PL-NB and PL-RF). We chose these learners due to the following reasons:
Naive Bayes has been widely used in the literature [[
Let the prior probability for each class $H_i$ be $P(H_i)$, estimated from the training data. The conditional probability for each class is
$P(X \mid H_i) = \prod_{j=1}^{n} P(x_j \mid H_i),$
where $X = (x_1, \ldots, x_n)$ is the feature vector and the features are assumed to be conditionally independent given the class. Then, the most probable class $H$ is
$H = \arg\max_{H_i} P(H_i) \prod_{j=1}^{n} P(x_j \mid H_i).$
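As a small numeric illustration of this decision rule, the following sketch (ours) uses scikit-learn's GaussianNB, one assumed naive Bayes variant, to recover the class with the highest posterior probability.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

X_train = np.array([[2.0, 1.0], [1.5, 0.5], [8.0, 6.0], [9.0, 7.0]])
y_train = np.array([0, 0, 1, 1])           # 0 = non-defective, 1 = defective

nb = GaussianNB().fit(X_train, y_train)    # estimates P(H_i) and P(x_j | H_i)
print(nb.predict_proba([[7.5, 6.5]]))      # posteriors P(H_i | X) for each class
print(nb.predict([[7.5, 6.5]]))            # argmax over classes -> H (here: 1)
```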
Previous studies [[
We also evaluated several learners used in the defect-prediction literature, including decision tree (DT), K-nearest neighbor (KNN), naive Bayes (NB), random forest (RF), and support vector machine (SVM). The accuracy scores reported by these learners when predicting the defects in a software dataset were used to select the two top-performing classifiers (NB and RF) for our experimental study.
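A hedged sketch of this pre-selection step is shown below; the synthetic dataset and the cross-validation settings are illustrative stand-ins for the defect datasets and the paper's actual protocol.

```python
# Score the five candidate classifiers by cross-validated accuracy; keep the two best.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=21,
                           weights=[0.85], random_state=0)  # imbalanced, like SDP data
candidates = {"DT": DecisionTreeClassifier(random_state=0),
              "KNN": KNeighborsClassifier(),
              "NB": GaussianNB(),
              "RF": RandomForestClassifier(random_state=0),
              "SVM": SVC()}
scores = {name: cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
          for name, clf in candidates.items()}
top_two = sorted(scores, key=scores.get, reverse=True)[:2]
print(scores, top_two)   # in the paper's experiments, NB and RF came out on top
```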
We chose the versions of publicly available NASA (JM1 and KC1) and Jureczko PROP datasets because they have already been used in past research [[
We conducted our experiments on the datasets collected from the PROMISE and SEACraft online public repositories. The JM1 and KC1 datasets consist of Lines of Code, McCabe's [[
The Chidamber & Kemerer object-oriented metrics suite originated in [[
Catal and others in [[
Table 2 and Table 3 present the summary of the attributes used in the selected datasets for our experiment.
Table 4 presents a summary of the datasets selected for our experiment. There are 22 attribute features in the NASA datasets and 21 attribute features in the Jureczko datasets. The last attribute in these datasets is the target feature.
Previous studies [[
We selected the best fixed, small window for each dataset based on the ROC-AUC scores and prediction accuracy after repeatedly running the procedure for window sizes from 100 to 500 samples; refer to Table 5. We successfully detected the sudden drift in the experimented software defect data.
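A sketch of this selection loop follows; `paired_learner_roc_auc` is a hypothetical helper that runs the PL procedure of Algorithm 1 with window size k on a previously loaded dataset (X, y) and returns the ROC-AUC of the adapted model.

```python
# X, y: a defect dataset loaded beforehand
best_k, best_auc = None, 0.0
for k in (100, 200, 300, 400, 500):
    auc = paired_learner_roc_auc(X, y, k=k)   # hypothetical helper
    if auc > best_auc:
        best_k, best_auc = k, auc
print(best_k, best_auc)  # e.g., k = 100 for jm1 and k = 400 for KC1 (Table 5)
```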
AUC-ROC: We used the area under the ROC curve to compare the performance of the proposed approach. The accuracies of PL-NB and PL-RF are computed from the true and false positives and negatives of the predictions. In addition, the number of misclassifications by the stable learner in a window where the corresponding reactive learner made correct classifications, together with a user-specified threshold value, is used to detect CD. Accuracy alone is not used for drift detection, to avoid missed detections in scenarios such as the following: the stable learner's predictions are correct and the reactive learner's are wrong for the first half of the window, whereas the reactive learner's predictions are correct and the stable learner's are wrong for the second half. In this scenario, although the concept changes mid-window (the reactive learner's accuracy rises while the stable learner's falls), both learners finish the window with an accuracy of 50%, so the accuracy measure alone cannot detect the changing concept. To avoid such missed detections, the second proposed measure, i.e., the difference in the misclassifications of the two learners, is used to detect CD.
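The following worked example (ours, not the paper's) makes this failure mode concrete: both learners end the window at 50% accuracy, yet the misclassification-difference count C clearly signals the change.

```python
# A window of 10 samples: the stable learner is right on the first half,
# the reactive learner on the second half.
import numpy as np

y_true = np.ones(10, dtype=int)
pred_stable   = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])  # correct early, wrong late
pred_reactive = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])  # wrong early, correct late

acc_s = (pred_stable == y_true).mean()    # 0.5
acc_r = (pred_reactive == y_true).mean()  # 0.5 -> no accuracy gap to detect
C = np.sum((pred_stable != y_true) & (pred_reactive == y_true))  # 5 of 10
print(acc_s, acc_r, C / len(y_true))      # 0.5 0.5 0.5 -> exceeds typical thresholds
```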
The area under the ROC curve corresponds to the confidence in positive classifications. As neither the true positive rate nor the false positive rate considers the number of true negatives, the classifier performance is not impacted by a skewed class distribution. Therefore, we chose these rates, rather than the accuracy measure, to evaluate the classification performance, following the method employed by Lessmann et al. in [[
Cost–Benefit Analysis: To determine the cost-effectiveness of SDP models, a cost–benefit analysis of the employed CD detection and adaptation technique is undertaken. In the context of SDP, Wagner first developed the concept of a cost–benefit analysis [[
When defect-prediction results are combined with the software-testing process, Equations (1) and (2) are used to compute the estimated defect elimination cost and the normalized defect removal cost, respectively, where:
- Ec: estimated software defect elimination costs based on the software defect prediction results
- NormCost: normalized defect removal cost of the software when software defect prediction along with CD handling is used
- FP: false positives in numbers
- FN: false negatives in numbers
- TP: true positives in numbers
The definitions of the remaining notations, such as the per-phase testing costs and defect identification efficiencies, are the same as those specified in one of the studies by [[
The defect identification efficiency of the various testing phases is defined in staff hours per defect and is based on research by Jones [[
Statistical analysis of the results is performed using the Wilcoxon statistical test [[
To test the hypothesis, the Wilcoxon signed rank test is executed with the zero_method, alternative, correction, and mode parameters in their default settings. A significance level of 0.05, corresponding to a confidence interval of 95%, is chosen. That is, for all of the experiments, the null hypothesis, that there is no significant difference between the two data distributions, is retained when the observed significance level is greater than or equal to 0.05. Otherwise, the null hypothesis is rejected and the alternate hypothesis is assumed, declaring the detection of concept drift.
The null hypothesis and the alternate hypothesis for the Wilcoxon statistical test may be stated as follows:
Null Hypothesis (H0): There is no significant difference between the two compared distributions.
Alternate Hypothesis (H1): There is a significant difference between the two compared distributions.
The p-values less than alpha (0.05) indicate that the Wilcoxon test rejects the null hypothesis and accepts the alternate hypothesis.
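A minimal sketch of this test with scipy's Wilcoxon signed rank implementation is shown below, using its default zero_method, alternative, correction, and mode settings as described above. The two AUC lists are taken from the NB and PL-NB columns of Table 6 purely for illustration.

```python
from scipy.stats import wilcoxon

auc_nb    = [0.50, 0.95, 0.59, 0.65, 0.59, 0.55, 0.53, 0.62, 0.65, 0.68, 0.55]
auc_pl_nb = [0.56, 0.97, 0.63, 0.67, 0.59, 0.58, 0.55, 0.63, 0.65, 0.69, 0.56]

stat, p = wilcoxon(auc_nb, auc_pl_nb)   # zero differences are dropped by default
if p < 0.05:
    print(f"Reject H0 (p = {p:.3f}); the paired learner differs significantly")
```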
This experimentation investigates the presence of CD in software-defect datasets and reports the improvements in prediction performance achieved by handling the discovered CD. Around 9600 sample instances from the NASA jm1 dataset, 2100 from the NASA KC1 dataset, and 27,750 from the Jureczko prop dataset are used for our experimentation. For the jm1 and prop datasets, 500 samples are initially used to train the model; thereafter, samples in chunks of 500 are used to detect and handle CD in subsequent runs. For the KC1 dataset, chunks of 100 sample instances are used following a similar process. We trained the model by taking all of the samples up to the current window for one learner and only the most recent chunk for the other. We iterated this process until all data samples had been processed. We used paired learners of naive Bayes and random forest classifiers for the prediction. A threshold value corresponding to a 95% confidence level, along with the predictions by the stable and reactive learners of the pair, was used to detect the change in concept.
Initially, we executed the base versions of the NB and RF classifiers to compute the accuracy and ROC-AUC scores.
Secondly, we executed the paired versions of NB and RF to find the concept-drift points.
Finally, we updated the stable learner of the pair to adapt to the detected concept drift and computed the accuracy and ROC-AUC scores using the paired versions.
This experiment first established the presence of CD in the studied software defect datasets using the proposed PL-based method. Afterwards, we handled the discovered drift by updating the prediction model. The error rate between the predictions of the stable and reactive learners in every experimental run was monitored to investigate the existence of drift. Table 6 presents the accuracy and ROC-AUC scores obtained after applying the PL-based method to the jm1 and KC1 datasets and nine versions of the prop dataset.
To track changes in the recent data distribution, we monitored the error rate between the classifications made by the stable learner, trained on the overall distribution, and those made by the reactive learner, trained on the recent distribution. A significant change in the error rate between the two indicates a change in concept.
Changes in concept were detected in the experimented prop, jm1, and KC1 defect datasets, discovering drift. The prediction models were then updated and obtained better prediction performances, as shown in Table 6.
This section presents the results obtained after conducting experiments on the NASA jm1 and KC1 defect datasets collected from the PROMISE online public data repository. There are 22 features in each dataset, of which the first 21 represent the feature vector and the 22nd serves as the class variable. A window size of 500 samples for jm1 and 100 samples for the KC1 dataset was used for the experiment.
The AUC-ROC curves shown in Figure 2a,b indicate that the ROC-AUC scores of PL-NB and PL-RF are better than the ROC-AUC scores of base NB and base RF (refer to Table 6), respectively. That is to say, the application of PL for CD adaptation in SDP showed a significant improvement in the performance of the prediction model.
In this section, we present the results obtained after conducting experiments on versions of the Jureczko 'prop' dataset collected from the SEACraft [[
The AUC-ROC curves shown in Figure 2a,b and Figure 3a–i indicate that the ROC-AUC scores of PL-NB and PL-RF are better than the ROC-AUC scores of base NB and base RF (refer to Table 6), respectively. That is to say, the application of PL for CD adaptation in SDP showed a significant improvement in the performance of the prediction model.
SQA is a process that runs concurrently with software development. It aims to improve the software-development process so that issues can be avoided before they become a big problem. SQA is a form of umbrella operation that covers the entire software-development process. Testing is the process of using every function of a product in order to ensure that it meets quality-control requirements. This might include putting the product to use, stress testing it, or checking to see if the actual service results match the predicted ones. This method detects any flaws before a product or service goes live [[
Software testing is a fundamental activity, but exhaustively testing for software defects is prohibitively expensive. An investigation based on the Cost of Software Quality (CoSQ) model reported that the cost of software defects in the USA is 2.84 trillion dollars and that, furthermore, defects have affected more than four billion people worldwide [[
It is critical for a software manager to know whether they can depend on a bug prediction model as a wrong prediction of the number or the location of future bugs can lead to problems [[
The normalized cost values (NormCost) of the basic and proposed approach are presented in Table 7. The NormCost value from the respective PL for each dataset is reported in the table, and values less than 1.0 indicate that the suggested approach is cost-effective. This means that, if SDP results are combined with CD adaptation, overall testing costs and time can be reduced. Values greater than 1.0, on the other hand, indicate that CD adaptation in SDP in that scenario is ineffective at reducing testing costs and effort; therefore, it is recommended that CD adaptation in SDP models be avoided in such cases. In contrast, the value of 1 for NormCost indicates that the PL-based technique could not reduce the testing costs and efforts. From Table 7, it can be seen that the normalized cost values of PL-NB are less than 1 for the KC1, V40, V192, V236, and V318 datasets. This proves that, by applying the PL-NB approach for SDP, overall testing costs and efforts can be reduced on these datasets. Similarly, PL-RF showed improvements in costs for the jm1, KC1, V40, V85, V192, V236, and V318 datasets.
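Since the paper's exact Equations (1) and (2) follow the cited cost model, the following is only a simplified, assumed sketch of how a NormCost-style ratio can be computed from a confusion matrix. The cost structure (unit-test the flagged modules, pay a higher system-testing cost for escaped defects, normalize by the cost of unit-testing every module) and all constants are placeholders, not the authors' formulation.

```python
def norm_cost(tp: int, fp: int, fn: int, n_modules: int,
              c_unit: float = 1.0, c_system: float = 5.0) -> float:
    cost_with_sdp = c_unit * (tp + fp) + c_system * fn   # Ec-style estimate (assumed)
    cost_without_sdp = c_unit * n_modules                # unit-test everything
    return cost_with_sdp / cost_without_sdp

# Values below 1.0 would indicate that SDP with CD handling is cost-effective.
print(norm_cost(tp=300, fp=150, fn=26, n_modules=2109))
```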
Comparative analysis primarily involves comparing the performance of the proposed PL-based approach with existing approaches in terms of the number of drift points for defect datasets. The work in [[
We also performed a comparison of the proposed approach with similar work in reference to various parameters, as shown in Table 9.
From the results shown in Table 8 and Table 9, it can be deduced that the proposed PL-based approach performed significantly better than existing works for the detection of concept-drift points in software-defect datasets. Furthermore, to observe the accuracy of the PL-based approach in handling the CD, we applied the base versions of the naive Bayes and random forest classifiers, with the same settings as their paired versions, to the datasets selected for the experiments and computed the prediction performances of the base methods in terms of the ROC-AUC score and accuracy measures. We then compared the performances of PL-NB and PL-RF with the performances of the base NB and base RF classifiers, respectively. We performed a comparative analysis using the experimentation results reported in Section 6.
We performed a statistical analysis of the results using the Wilcoxon statistical test [[
The Wilcoxon signed rank test statistic is defined as the smaller of W+ (the sum of the positive ranks) and W− (the sum of the negative ranks). In our experimental results, all of the AUC scores for PL-NB and PL-RF are higher than or equal to the corresponding AUC scores for basic NB and basic RF. Therefore, the Wilcoxon signed rank test resulted in a test statistic value of 0.
In this work, we applied the PL approach in SDP and computed ROC-AUC scores for basic as well as paired learners. We selected two NASA datasets and nine versions of the Jureczko prop defect datasets from public online data repositories. We answer the questions raised in Section 1 by referring to the results posted in previous sections.
RQ1- How significantly are the concept-drift points present in defect data used for SDP?
The software-defect data summarized in Table 4 comprise both the independent variables of software metrics data and a categorical dependent variable, collected from various software projects. A defect-prediction model predicts the value of the dependent variable by analyzing the data of the independent variables. The presence of CD in the data may lead to worse predictions and therefore may degrade the model's performance. Paired learners of NB and RF were used for detecting the presence of CD in the defect data. The experimental results presented in Table 8 show that the PL-based approach detected significantly more drift points compared to a similar approach [[
RQ2- How would the proposed approach for CD adaptation in SDP impact the performance of the defect prediction model?
The ROC-AUC measure was used as a performance-evaluation parameter in our current study. The performances of the base learner and its corresponding paired learner on the NASA and Jureczko datasets are presented in Table 6 and in Section 6.1 and Section 6.2. Running the Wilcoxon test on accuracy scores of NB and PL-NB resulted in a rejection of the null hypothesis, with a corresponding p-value of 0.007, whereas that of RF and PL-RF also rejected the null hypothesis, with a corresponding p-value of 0.027. The hypothesis test results are presented in Table 10.
This section describes the threats to the validity of the experimental analyses presented in the current study.
For our experiment, we selected publicly available defect datasets consisting of a large range of metrics data. These datasets have been highly referenced by many previous studies in SDP [[
This paper presents a PL-based approach to detect and adapt to CD in the jm1, KC1, and prop software defect datasets. The proposed approach was evaluated and validated by training the models and making predictions following a window-based approach.
The experiments assessed the model's prediction accuracy along with the presence or absence of drift points. The numbers of drift points detected by PL-NB and PL-RF are 23 and 34 for the jm1 dataset, 36 and 5 for the prop dataset, and 1 and 2 for the KC1 dataset, respectively, which are better than the numbers of drift points reported by a similar study. The adaptation of concept drift in the experimented defect datasets showed a significant improvement in the ROC-AUC scores of PL-NB and PL-RF over the base NB and base RF models, as evidenced by the Wilcoxon statistical test results presented in Table 10. For reference, the AUC scores, as presented in Table 6, of NB and PL-NB on the jm1 and KC1 datasets are 0.50 and 0.56, and 0.95 and 0.97, respectively. In contrast, the ROC-AUC scores of RF and PL-RF on these datasets are 0.83 and 0.87, and 0.81 and 0.84, respectively. This shows that the use of the PL-based approach for CD detection and adaptation in software-defect prediction significantly improves the performance of the prediction model.
As future work, we intend to extend our current study by exploring the paired learners of more classifiers. Another consideration is an assessment of the performance of the proposed approach on real-world software-defect datasets.
Graph: Figure 1 Concept of the proposed approach based on PL.
Graph: Figure 2 ROC curves of NB and PL-NB, and RF and PL-RF for jm1 and KC1 datasets.
Graph: Figure 3 ROC curves of NB and PL-NB, and RF and PL-RF for prop defect datasets.
Table 1 Related works.
| Feature | [ | [ | [ | [ | This Work |
| --- | --- | --- | --- | --- | --- |
| Prediction technique | CPE | 48 DT | DDM, Naive | NB, NN | Paired Learners |
| Evaluation | ROC, AUC | ROC, AUC | Accuracy | Recall | Accuracy |
| Tested datasets | Eclipse | Eclipse | jm1 | Eclipse | KC1, jm1, prop |
| Tools | Weka | Weka | Statistical | Weka | Sci-kit Learn |
| Statistical test | No | Mann–Whitney | Chi-square | No | Wilcoxon signed rank |
Table 2 Attribute details of the NASA datasets.
| Sr. No. | Metric Type | Description | Attribute | Category |
| --- | --- | --- | --- | --- |
| 1 | McCabe's | cyclomatic complexity | v(g) | numeric |
| 2 | McCabe's | design complexity | iv(g) | numeric |
| 3 | McCabe's | essential complexity | ev(g) | numeric |
| 4 | McCabe's | line count | loc | numeric |
| 5 | Halstead | count of blank lines | lOBlank | numeric |
| 6 | Halstead | count of lines of comments | lOComment | numeric |
| 7 | Halstead | difficulty | D | numeric |
| 8 | Halstead | effort | E | numeric |
| 9 | Halstead | effort | b | numeric |
| 10 | Halstead | intelligence | I | numeric |
| 11 | Halstead | line count | lOCode | numeric |
| 12 | Halstead | program length | L | numeric |
| 13 | Halstead | time | T | numeric |
| 14 | Halstead | total operators + operands | N | numeric |
| 15 | Halstead | volume | v | numeric |
| 16 | LOC | lines of code and comment | lOC & C | numeric |
| 17 | Operator | unique operators | uniq_Op | numeric |
| 18 | Operands | unique operands | uniq_Opnd | numeric |
| 19 | Operator | total operators | total_Op | numeric |
| 20 | Operands | total operands | total_Opnd | numeric |
| 21 | Branch | branch count of the flow graph | bcount | numeric |
| 22 | Class | defect / no defect | T/F | Boolean |
Table 3 Attribute details of the Jureczko prop dataset.
| Sr. No. | Metric | Description | Attribute | Category |
| --- | --- | --- | --- | --- |
| 1 | C & K | weighted methods per class | wmc | numeric |
| 2 | C & K | depth of inheritance tree | dit | numeric |
| 3 | C & K | number of children | noc | numeric |
| 4 | C & K | coupling between objects | cbo | numeric |
| 5 | C & K | response for a class | rfc | numeric |
| 6 | C & K | lack of cohesion of methods | lcom | numeric |
| 7 | Martin's | afferent couplings | ca | numeric |
| 8 | Martin's | efferent couplings | ce | numeric |
| 9 | QMOOD | number of public methods | npm | numeric |
| 10 | H-S | lack of cohesion of methods | lcom3 | numeric |
| 11 | LOC | lines of code | loc | numeric |
| 12 | QMOOD | data access metric | dam | numeric |
| 13 | QMOOD | measure of aggregation | moa | numeric |
| 14 | QMOOD | measure of functional abstraction | mfa | numeric |
| 15 | QMOOD | cohesion among methods of class | cam | numeric |
| 16 | C & K | inheritance coupling | ic | numeric |
| 17 | C & K | coupling between methods | cbm | numeric |
| 18 | C & K | average method complexity | amc | numeric |
| 19 | McCabe's | max cyclomatic complexity | max_cc | numeric |
| 20 | McCabe's | avg cyclomatic complexity | avg_cc | numeric |
| 21 | Class | number of bugs | bug | numeric |
Table 4 Summary of datasets.
| Dataset | Modules in Dataset | Faulty Modules | % of Faulty Modules |
| --- | --- | --- | --- |
| NASA jm1 | 9593 | 1760 | 18.3% |
| NASA KC1 | 2109 | 326 | 15.5% |
| prop V4 | 3022 | 214 | 7.1% |
| prop V40 | 4053 | 337 | 8.3% |
| prop V44 | 4620 | 295 | 6.4% |
| prop V85 | 3077 | 942 | 30.6% |
| prop V192 | 3598 | 85 | 2.4% |
| prop V236 | 2231 | 76 | 3.4% |
| prop V256 | 1964 | 625 | 31.8% |
| prop V318 | 2395 | 364 | 15.2% |
| prop V355 | 2791 | 924 | 33.1% |
Table 5 ROC-AUC scores of the models for various window sizes.
| Dataset | Window Size | Basic-NB | PL-NB | Basic-RF | PL-RF |
| --- | --- | --- | --- | --- | --- |
| jm1 | 100 | 0.5099 | 0.5694 | 0.8356 | 0.8716 |
| jm1 | 200 | 0.5110 | 0.5318 | 0.8308 | 0.8636 |
| jm1 | 300 | 0.5108 | 0.5580 | 0.8207 | 0.8611 |
| jm1 | 400 | 0.5105 | 0.5311 | 0.8201 | 0.8437 |
| jm1 | 500 | 0.5130 | 0.5239 | 0.8053 | 0.8339 |
| KC1 | 100 | 0.9347 | 0.9516 | 0.8688 | 0.9185 |
| KC1 | 200 | 0.9366 | 0.9575 | 0.8466 | 0.8908 |
| KC1 | 300 | 0.89 | 0.9063 | 0.838 | 0.8671 |
| KC1 | 400 | 0.9522 | 0.9722 | 0.8193 | 0.8421 |
| KC1 | 500 | 0.951 | 0.9563 | 0.8409 | 0.8589 |
| P-V4 | 100 | 0.601 | 0.592 | 0.5952 | 0.59 |
| P-V4 | 200 | 0.6065 | 0.5892 | 0.5996 | 0.5856 |
| P-V4 | 300 | 0.6054 | 0.6398 | 0.5843 | 0.5860 |
| P-V4 | 400 | 0.6074 | 0.6485 | 0.5803 | 0.5851 |
| P-V4 | 500 | 0.5977 | 0.6323 | 0.5754 | 0.5760 |
| P-V40 | 100 | 0.658 | 0.6794 | 0.6454 | 0.6513 |
| P-V40 | 200 | 0.6275 | 0.6506 | 0.6078 | 0.6096 |
| P-V40 | 300 | 0.6232 | 0.6454 | 0.5996 | 0.5985 |
| P-V40 | 400 | 0.5839 | 0.5909 | 0.565 | 0.5601 |
| P-V40 | 500 | 0.5834 | 0.5817 | 0.5411 | 0.5411 |
| P-V44 | 100 | 0.5986 | 0.6183 | 0.643 | 0.6431 |
| P-V44 | 200 | 0.5974 | 0.6176 | 0.6248 | 0.6229 |
| P-V44 | 300 | 0.6037 | 0.5883 | 0.6202 | 0.619 |
| P-V44 | 400 | 0.5994 | 0.5989 | 0.6276 | 0.6314 |
| P-V44 | 500 | 0.6019 | 0.6059 | 0.6329 | 0.6139 |
| P-V85 | 100 | 0.5742 | 0.5733 | 0.6977 | 0.6902 |
| P-V85 | 200 | 0.5715 | 0.5615 | 0.6928 | 0.6949 |
| P-V85 | 300 | 0.5669 | 0.5666 | 0.6915 | 0.6968 |
| P-V85 | 400 | 0.5579 | 0.5877 | 0.6295 | 0.6309 |
| P-V85 | 500 | 0.5695 | 0.5834 | 0.7063 | 0.7084 |
| P-V236 | 100 | 0.6346 | 0.6091 | 0.5783 | 0.5721 |
| P-V236 | 200 | 0.6439 | 0.6291 | 0.5543 | 0.5681 |
| P-V236 | 300 | 0.5977 | 0.5977 | 0.5833 | 0.5836 |
| P-V236 | 400 | 0.6401 | 0.6401 | 0.5516 | 0.5698 |
| P-V236 | 500 | 0.6264 | 0.635 | 0.5845 | 0.5852 |
| P-V318 | 100 | 0.6653 | 0.6694 | 0.702 | 0.6834 |
| P-V318 | 200 | 0.6966 | 0.6826 | 0.7232 | 0.7149 |
| P-V318 | 300 | 0.7007 | 0.6987 | 0.719 | 0.6871 |
| P-V318 | 400 | 0.6879 | 0.6992 | 0.7198 | 0.7352 |
| P-V318 | 500 | 0.7152 | 0.7152 | 0.6839 | 0.6951 |
Table 6 Accuracy and ROC-AUC scores.
| Dataset | Window | Base NB Accuracy | Base RF Accuracy | Base NB AUC | Base RF AUC | PL-NB Accuracy | PL-RF Accuracy | PL-NB AUC | PL-RF AUC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| jm1 | 100 | 0.81 | 0.79 | 0.50 | 0.83 | 0.80 | 0.87 | 0.56 | 0.87 |
| KC1 | 400 | 0.90 | 0.64 | 0.95 | 0.81 | 0.94 | 0.68 | 0.97 | 0.84 |
| P-V4 | 500 | 0.84 | 0.92 | 0.59 | 0.58 | 0.81 | 0.92 | 0.63 | 0.58 |
| P-V40 | 100 | 0.77 | 0.92 | 0.65 | 0.64 | 0.78 | 0.92 | 0.67 | 0.65 |
| P-V44 | 400 | 0.87 | 0.92 | 0.59 | 0.61 | 0.83 | 0.92 | 0.59 | 0.62 |
| P-V85 | 400 | 0.66 | 0.73 | 0.55 | 0.63 | 0.67 | 0.74 | 0.58 | 0.63 |
| P-V192 | 100 | 0.71 | 0.98 | 0.53 | 0.62 | 0.78 | 0.98 | 0.55 | 0.62 |
| P-V236 | 500 | 0.87 | 0.97 | 0.62 | 0.57 | 0.89 | 0.97 | 0.63 | 0.57 |
| P-V256 | 500 | 0.69 | 0.85 | 0.65 | 0.81 | 0.69 | 0.84 | 0.65 | 0.82 |
| P-V318 | 400 | 0.83 | 0.87 | 0.68 | 0.71 | 0.84 | 0.88 | 0.69 | 0.73 |
| P-V355 | 400 | 0.69 | 0.81 | 0.55 | 0.76 | 0.70 | 0.80 | 0.56 | 0.76 |
Table 7 Normalized costs.
| Dataset | Value of NormCost for PL-NB | Value of NormCost for PL-RF |
| --- | --- | --- |
| jm1 | 1.02 | < 1 |
| KC1 | < 1 | < 1 |
| P-V4 | 1.06 | 1 |
| P-V40 | < 1 | < 1 |
| P-V44 | 1.07 | 1 |
| P-V85 | 1 | < 1 |
| P-V192 | < 1 | < 1 |
| P-V236 | < 1 | < 1 |
| P-V256 | 1 | 1 |
| P-V318 | < 1 | < 1 |
| P-V355 | 1 | 1 |
Table 8 Comparative analysis with the existing CD-detection method.
| Approach | Dataset Used | Number of Drift Points Detected |
| --- | --- | --- |
| [ | jm1 | 2 |
| [ | jm1 | 2 |
| PL-NB | jm1 | 23 |
| PL-RF | jm1 | 34 |
| [ | prop | 5 |
| [ | prop | 4 |
| PL-NB | prop | 35 |
| PL-RF | prop | 3 |
| PL-NB | KC1 | 1 |
| PL-RF | KC1 | 2 |
Table 9 Comparison with similar methods.
| Work | Aim | Dataset | Evaluation | Technique | Results |
| --- | --- | --- | --- | --- | --- |
| [ | To show that | Eclipse | ROC-AUC | Weka's | Change |
| [ | To find | Eclipse | ROC-AUC | Weka's | Detected |
| [ | CD | jm1 | Error Rate | NB, DT | 1: NB (jm1) |
| PL (this work) | CD | jm1, KC1, prop | Accuracy | NB, RF | 23: PL-NB (jm1) |
Table 10 Wilcoxon hypothesis test results.
| | NB and PL-NB | RF and PL-RF |
| --- | --- | --- |
| p-value | 0.007 | 0.027 |
| Test Statistic | 0 | 0 |
| Critical Value | | |
| Sig. Diff. | Yes | Yes |
Conceptualization, A.K.G. and S.K.; methodology, A.K.G. and S.K.; software, A.K.G.; validation, A.K.G.; formal analysis, A.K.G. and S.K.; investigation, A.K.G.; resources, A.M. and S.K.; data curation, A.K.G.; writing—original draft preparation, A.K.G. and S.K.; writing—review and editing, S.K. and A.M.; visualization, A.K.G.; supervision, S.K.; project administration, S.K.; funding acquisition, A.M. All authors have read and agreed to the published version of the manuscript.
This research received no external funding.
Not applicable.
Not applicable.
The datasets used in this study are publicly available at
The authors declare no conflict of interest.
The following abbreviations are used in this manuscript: AUC: Area Under the ROC Curve; C&K: Chidamber and Kemerer; CBO: Coupling Between Objects; CD: Concept Drift; CoSQ: Cost of Software Quality; DDM: Drift-Detection Method; DIT: Depth of Inheritance Tree; DT: Decision Tree; FN: False Negative; FP: False Positive; FPR: False-Positive Rate; IoT: Internet of Things; KNN: K-Nearest Neighbor; LCOM: Lack of Cohesion of Methods; LOC: Lines of Code; LR: Linear Regression; MAE: Mean Absolute Error; NB: Naive Bayes; NOC: Number of Children; PL: Paired Learner; PL-NB: Paired Learner of Naive Bayes; PL-RF: Paired Learner of Random Forest; QoS: Quality of Service; RF: Random Forest; RFC: Response For a Class; RL: Reactive Learner; RMSE: Root Mean Square Error; ROC: Receiver Operating Characteristics; SL: Stable Learner; SQA: Software Quality Assurance; SVM: Support Vector Machine; TN: True Negative; TP: True Positive; TPR: True Positive Rate; USA: United States of America; WMC: Weighted Method Count.
The following symbols are used in this manuscript:
θ: Threshold Value; α: Significance Level; k: Size of the Window; C: Misclassification Count; X: Feature Vector; Y: Class Variable; S: Stable Learner; R: Reactive Learner; P_t: Distribution of data at time t.
The authors acknowledge Santosh S. Rathore, Assistant Professor, IIIT, Gwalior, for his assistance during concept formulation.
Graph: Figure A1 PL-based Approach in SDP for detecting CD.
By Arvind Kumar Gangwar; Sandeep Kumar and Alok Mishra