Machine Learning Based Bug Handling In Large Scale Software Development


Download Machine Learning Based Bug Handling In Large Scale Software Development PDF/ePub or read online books in Mobi eBooks. Click Download or Read Online button to get Machine Learning Based Bug Handling In Large Scale Software Development book now. This website allows unlimited access to, at the time of writing, more than 1.5 million titles, including hundreds of thousands of titles in various foreign languages.

Download

Machine Learning-Based Bug Handling in Large-Scale Software Development


Machine Learning-Based Bug Handling in Large-Scale Software Development

Author: Leif Jonsson

language: en

Publisher: Linköping University Electronic Press

Release Date: 2018-05-17


DOWNLOAD





This thesis investigates the possibilities of automating parts of the bug handling process in large-scale software development organizations. The bug handling process is a large part of the mostly manual, and very costly, maintenance of software systems. Automating parts of this time consuming and very laborious process could save large amounts of time and effort wasted on dealing with bug reports. In this thesis we focus on two aspects of the bug handling process, bug assignment and fault localization. Bug assignment is the process of assigning a newly registered bug report to a design team or developer. Fault localization is the process of finding where in a software architecture the fault causing the bug report should be solved. The main reason these tasks are not automated is that they are considered hard to automate, requiring human expertise and creativity. This thesis examines the possi- bility of using machine learning techniques for automating at least parts of these processes. We call these automated techniques Automated Bug Assignment (ABA) and Automatic Fault Localization (AFL), respectively. We treat both of these problems as classification problems. In ABA, the classes are the design teams in the development organization. In AFL, the classes consist of the software components in the software architecture. We focus on a high level fault localization that it is suitable to integrate into the initial support flow of large software development organizations. The thesis consists of six papers that investigate different aspects of the AFL and ABA problems. The first two papers are empirical and exploratory in nature, examining the ABA problem using existing machine learning techniques but introducing ensembles into the ABA context. In the first paper we show that, like in many other contexts, ensembles such as the stacked generalizer (or stacking) improves classification accuracy compared to individual classifiers when evaluated using cross fold validation. The second paper thor- oughly explore many aspects such as training set size, age of bug reports and different types of evaluation of the ABA problem in the context of stacking. The second paper also expands upon the first paper in that the number of industry bug reports, roughly 50,000, from two large-scale industry software development contexts. It is still as far as we are aware, the largest study on real industry data on this topic to this date. The third and sixth papers, are theoretical, improving inference in a now classic machine learning tech- nique for topic modeling called Latent Dirichlet Allocation (LDA). We show that, unlike the currently dominating approximate approaches, we can do parallel inference in the LDA model with a mathematically correct algorithm, without sacrificing efficiency or speed. The approaches are evaluated on standard research datasets, measuring various aspects such as sampling efficiency and execution time. Paper four, also theoretical, then builds upon the LDA model and introduces a novel supervised Bayesian classification model that we call DOLDA. The DOLDA model deals with both textual content and, structured numeric, and nominal inputs in the same model. The approach is evaluated on a new data set extracted from IMDb which have the structure of containing both nominal and textual data. The model is evaluated using two approaches. First, by accuracy, using cross fold validation. Second, by comparing the simplicity of the final model with that of other approaches. In paper five we empirically study the performance, in terms of prediction accuracy, of the DOLDA model applied to the AFL problem. The DOLDA model was designed with the AFL problem in mind, since it has the exact structure of a mix of nominal and numeric inputs in combination with unstructured text. We show that our DOLDA model exhibits many nice properties, among others, interpretability, that the research community has iden- tified as missing in current models for AFL.

Machine Learning-Based Bug Handling in Large-Scale Software Development


Machine Learning-Based Bug Handling in Large-Scale Software Development

Author: Leif Jonsson

language: en

Publisher:

Release Date: 2018


DOWNLOAD





This thesis investigates the possibilities of automating parts of the bug handling process in large-scale software development organizations. The bug handling process is a large part of the mostly manual, and very costly, maintenance of software systems. Automating parts of this time consuming and very laborious process could save large amounts of time and effort wasted on dealing with bug reports. In this thesis we focus on two aspects of the bug handling process, bug assignment and fault localization. Bug assignment is the process of assigning a newly registered bug report to a design team or developer. Fault localization is the process of finding where in a software architecture the fault causing the bug report should be solved. The main reason these tasks are not automated is that they are considered hard to automate, requiring human expertise and creativity. This thesis examines the possi- bility of using machine learning techniques for automating at least parts of these processes. We call these automated techniques Automated Bug Assignment (ABA) and Automatic Fault Localization (AFL), respectively. We treat both of these problems as classification problems. In ABA, the classes are the design teams in the development organization. In AFL, the classes consist of the software components in the software architecture. We focus on a high level fault localization that it is suitable to integrate into the initial support flow of large software development organizations. The thesis consists of six papers that investigate different aspects of the AFL and ABA problems. The first two papers are empirical and exploratory in nature, examining the ABA problem using existing machine learning techniques but introducing ensembles into the ABA context. In the first paper we show that, like in many other contexts, ensembles such as the stacked generalizer (or stacking) improves classification accuracy compared to individual classifiers when evaluated using cross fold validation. The second paper thor- oughly explore many aspects such as training set size, age of bug reports and different types of evaluation of the ABA problem in the context of stacking. The second paper also expands upon the first paper in that the number of industry bug reports, roughly 50,000, from two large-scale industry software development contexts. It is still as far as we are aware, the largest study on real industry data on this topic to this date. The third and sixth papers, are theoretical, improving inference in a now classic machine learning tech- nique for topic modeling called Latent Dirichlet Allocation (LDA). We show that, unlike the currently dominating approximate approaches, we can do parallel inference in the LDA model with a mathematically correct algorithm, without sacrificing efficiency or speed. The approaches are evaluated on standard research datasets, measuring various aspects such as sampling efficiency and execution time. Paper four, also theoretical, then builds upon the LDA model and introduces a novel supervised Bayesian classification model that we call DOLDA. The DOLDA model deals with both textual content and, structured numeric, and nominal inputs in the same model. The approach is evaluated on a new data set extracted from IMDb which have the structure of containing both nominal and textual data. The model is evaluated using two approaches. First, by accuracy, using cross fold validation. Second, by comparing the simplicity of the final model with that of other approaches. In paper five we empirically study the performance, in terms of prediction accuracy, of the DOLDA model applied to the AFL problem. The DOLDA model was designed with the AFL problem in mind, since it has the exact structure of a mix of nominal and numeric inputs in combination with unstructured text. We show that our DOLDA model exhibits many nice properties, among others, interpretability, that the research community has iden- tified as missing in current models for AFL.

System-Level Design of GPU-Based Embedded Systems


System-Level Design of GPU-Based Embedded Systems

Author: Arian Maghazeh

language: en

Publisher: Linköping University Electronic Press

Release Date: 2018-12-07


DOWNLOAD





Modern embedded systems deploy several hardware accelerators, in a heterogeneous manner, to deliver high-performance computing. Among such devices, graphics processing units (GPUs) have earned a prominent position by virtue of their immense computing power. However, a system design that relies on sheer throughput of GPUs is often incapable of satisfying the strict power- and time-related constraints faced by the embedded systems. This thesis presents several system-level software techniques to optimize the design of GPU-based embedded systems under various graphics and non-graphics applications. As compared to the conventional application-level optimizations, the system-wide view of our proposed techniques brings about several advantages: First, it allows for fully incorporating the limitations and requirements of the various system parts in the design process. Second, it can unveil optimization opportunities through exposing the information flow between the processing components. Third, the techniques are generally applicable to a wide range of applications with similar characteristics. In addition, multiple system-level techniques can be combined together or with application-level techniques to further improve the performance. We begin by studying some of the unique attributes of GPU-based embedded systems and discussing several factors that distinguish the design of these systems from that of the conventional high-end GPU-based systems. We then proceed to develop two techniques that address an important challenge in the design of GPU-based embedded systems from different perspectives. The challenge arises from the fact that GPUs require a large amount of workload to be present at runtime in order to deliver a high throughput. However, for some embedded applications, collecting large batches of input data requires an unacceptable waiting time, prompting a trade-off between throughput and latency. We also develop an optimization technique for GPU-based applications to address the memory bottleneck issue by utilizing the GPU L2 cache to shorten data access time. Moreover, in the area of graphics applications, and in particular with a focus on mobile games, we propose a power management scheme to reduce the GPU power consumption by dynamically adjusting the display resolution, while considering the user's visual perception at various resolutions. We also discuss the collective impact of the proposed techniques in tackling the design challenges of emerging complex systems. The proposed techniques are assessed by real-life experimentations on GPU-based hardware platforms, which demonstrate the superior performance of our approaches as compared to the state-of-the-art techniques.