PreciseBugCollector: Extensible, Executable and Precise Bug-fix Collection
In: 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2023, pp. 1899-1910
Bug datasets are vital for enabling deep learning techniques to address software maintenance tasks related to bugs. However, existing bug datasets suffer from limitations in precision and scale: they are either small-scale but precise, thanks to manual validation, or large-scale but imprecise, because they rely on simple commit message processing. In this paper, we introduce PreciseBugCollector, a precise, multi-language bug collection approach that overcomes these two limitations. PreciseBugCollector is based on two novel components: a) a bug tracker that maps codebase repositories to external bug repositories to trace bug type information, and b) a bug injector that generates project-specific bugs by injecting noise into correct codebases and then executing them against their test suites to obtain test failure messages. We implement PreciseBugCollector against three sources: 1) a bug tracker that links to the National Vulnerability Database (NVD) to collect general vulnerabilities, 2) a bug tracker that links to OSS-Fuzz to collect general bugs, and 3) a bug injector based on 16 injection rules to generate project-specific bugs. To date, PreciseBugCollector comprises 1,057,818 bugs extracted from 2,968 open-source projects. Of these, 12,602 bugs are sourced from bug repositories (NVD and OSS-Fuzz), while the remaining 1,045,216 project-specific bugs are generated by the bug injector. Considering the challenge objectives, we argue that a bug injection approach is highly valuable in the industrial setting, since project-specific bugs align with domain knowledge, share the same codebase, and adhere to the coding style employed in industrial projects.
Title: | PreciseBugCollector: Extensible, Executable and Precise Bug-fix Collection |
---|---|
Author / Contributor: | He, Ye ; Chen, Zimin ; Le Goues, Claire |
Journal: | 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2023, pp. 1899-1910 |
Published: | 2023 |
Media type: | unknown |
ISSN: | 1527-1366 (print) |
DOI: | 10.1109/ASE56229.2023.00163 |