Impact of Refactoring on Quality Metrics in Android Applications

Oumayma Hamdi
Oumayma Hamdi is a Master’s degree student at ÉTS. Her research focuses on the quality of Android applications.

Ali Ouni
Ali Ouni is a professor in the Department of Software Engineering and IT at ÉTS. His work focuses on software quality, maintenance and evolution, and artificial intelligence application techniques to software engineering.


Mobile applications must evolve continuously, and sometimes time pressure is such that poor design or implementation choices are made, inevitably resulting in various software quality problems. Refactoring is the widely accepted approach to fix such quality problems. While the impact of refactoring on software quality has been widely studied in object-oriented software, its impact is still unclear in the context of mobile apps. This research reports on the first empirical study to address this gap. We conducted a large empirical study that analyzed the evolution history of 300 open-source Android apps exhibiting 42,181 refactoring operations. We analyzed the impact of these refactoring operations on 10 common quality metrics using a causal inference method based on the Difference-in-Differences (DiD) model. Our results indicate that when refactoring affects the metrics, it generally improves them. However, in many cases refactoring has no significant impact on quality metrics, whereas a cohesion metric (LCOM) deteriorates overall as a result of refactoring. These findings provide practical insights into the current practice of refactoring in the context of Android app development.

Keywords: Mobile app, refactoring, quality metrics, Android, empirical study


Android applications go through modifications and improvements to cope with the rapid and evolving user requirements. Such maintenance activities can cause a decrease in quality if improperly conducted [1], [2]. In order to facilitate software evolution, developers need to continuously improve software structure. Refactoring is the most common approach to improve the internal structure of software systems without affecting their external behavior [3], [4], [5].

Mobile apps differ significantly from traditional software systems [6], [7], [8]: they must cope with limited hardware resources such as memory, CPU, and display size, as well as with the highly dynamic nature of the mobile app market and ever-increasing user requirements. These differences can play an important role in mobile app development and evolution. Yet, while the impact of refactoring on quality metrics has been studied extensively for object-oriented software systems [4], [9], [10], it has received little attention in mobile apps.

To fill this gap, we conducted an empirical study on a dataset composed of 300 open-source Android apps. We analyzed the impact of 10 refactoring operations on 10 quality metrics.

Study Design

To conduct our study, we designed a controlled experiment for which we selected two groups of code changes. The first group consisted of refactoring-related code changes (i.e., the treatment group), and the second consisted of non-refactoring code changes (i.e., the control group). Afterwards, we measured the impact of both groups on quality metrics and compared them statistically using the Difference-in-Differences (DiD) model. Figure 1 describes the overall process of our study, which consists of six main steps: (1) Android apps selection, (2) refactoring detection, (3) commit changes extraction, (4) non-refactoring changes extraction, (5) quality metrics measurement, and (6) refactoring impact analysis.

Figure 1. Overall process of our empirical study

Step 1: Android apps selection

We targeted open-source Android apps that are freely distributed on Google Play and whose versioning history is hosted on GitHub. A custom search on GitHub returned 19,212 apps, which we filtered down to 1,923 valid apps. Finally, we randomly selected 300 apps, roughly 15% of the filtered set, which together exhibit 42,181 refactoring operations.

Step 2: Refactoring detection

In this step, we collected all the refactoring operations applied to the studied apps. We used RefactoringMiner, an open-source command-line tool [11]. Overall, our extraction process identified 10 common refactoring types [3].
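As a rough illustration of this step, the sketch below tallies detected refactorings by type from a JSON report. It assumes the report has a top-level "commits" list whose entries carry a "sha1" and a "refactorings" list with a "type" field; the sample report and the function name are hypothetical, and the field names should be checked against the RefactoringMiner version in use.

```python
import json
from collections import Counter

# Hypothetical miniature report. The field names ("commits", "sha1",
# "refactorings", "type") are an assumption about the tool's JSON output.
SAMPLE_REPORT = """
{"commits": [
  {"sha1": "a1", "refactorings": [{"type": "Move Method"}, {"type": "Extract Method"}]},
  {"sha1": "b2", "refactorings": []},
  {"sha1": "c3", "refactorings": [{"type": "Move Method"}]}
]}
"""


def summarize_refactorings(report_text):
    """Return (IDs of refactoring commits, number of operations per type)."""
    report = json.loads(report_text)
    refactoring_commits = []
    type_counts = Counter()
    for commit in report.get("commits", []):
        operations = commit.get("refactorings", [])
        if operations:  # commits without detected operations are not refactoring commits
            refactoring_commits.append(commit["sha1"])
        type_counts.update(op["type"] for op in operations)
    return refactoring_commits, type_counts


commits, counts = summarize_refactorings(SAMPLE_REPORT)
print(commits)
print(counts.most_common())
```

The list of refactoring commit IDs is what the next step would consume; the per-type counts are only a convenience for inspecting which operations dominate.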

Step 3: Commit changes extraction

After extracting all refactoring operations, we collected the ID of every refactoring commit together with the ID of the commit that immediately precedes it. We relied on the GitHub API and the git clone command to download the source code of each refactoring commit and of its immediately preceding commit.
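One way to obtain the "before" and "after" snapshots of a commit is sketched below. This is an assumption about how the step could be scripted, not the authors' tooling: the repository URL, the commit ID, the directory layout, and the function names are all hypothetical. The only git facts relied on are that `<id>^` denotes the first parent of a commit and that `git checkout --detach` checks out a specific commit.

```python
import subprocess


def snapshot_commands(repo_url, workdir, revision):
    """Build the git commands that clone a repository and check out one revision."""
    return [
        ["git", "clone", repo_url, workdir],
        ["git", "-C", workdir, "checkout", "--detach", revision],
    ]


def before_after_commands(repo_url, commit_id, base_dir="snapshots"):
    """Plan one snapshot for the commit and one for its immediate predecessor."""
    return {
        # "<id>^" is git's notation for the first parent of a commit
        "before": snapshot_commands(repo_url, f"{base_dir}/{commit_id}/before", commit_id + "^"),
        "after": snapshot_commands(repo_url, f"{base_dir}/{commit_id}/after", commit_id),
    }


def run_all(commands):
    """Execute the planned commands; raises if any git call fails."""
    for command in commands:
        subprocess.run(command, check=True)


# Hypothetical repository and commit ID, used only to show the planned commands.
plan = before_after_commands("https://github.com/example/app.git", "abc123")
for side, commands in plan.items():
    for command in commands:
        print(side, " ".join(command))
```

Separating the planning from `run_all` keeps the sketch inspectable without network access; cloning once and checking out both revisions from the same clone would be cheaper for large histories.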

Step 4: Non-refactoring changes extraction

In this step, we extracted a set of commits that contain only non-refactoring changes. To do this, we randomly selected a set of non-refactoring commits to form our control group. For each of these commits, we collected its ID as well as the ID of the commit that immediately precedes it. Afterwards, we followed the same procedure as in Step 3 to collect their source code.
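The random draw of control commits could look like the following sketch. The sample size, the fixed seed, and the function name are assumptions made for illustration; the article does not state how many control commits were drawn.

```python
import random


def sample_control_commits(all_commit_ids, refactoring_commit_ids, sample_size, seed=0):
    """Randomly draw commits that contain no detected refactoring (control group)."""
    refactoring = set(refactoring_commit_ids)
    candidates = [c for c in all_commit_ids if c not in refactoring]
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    return rng.sample(candidates, min(sample_size, len(candidates)))


# Hypothetical commit history of ten commits, two of which are refactoring commits.
history = [f"c{i}" for i in range(10)]
control = sample_control_commits(history, ["c2", "c5"], sample_size=4)
print(control)
```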

Step 5: Quality metrics measurement

We measured a set of quality metrics for each refactoring change as well as for each non-refactoring change. To calculate the values of these metrics, we used CK-metrics, a widely used open-source tool that implements the CK metrics suite.
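Assuming the metrics tool writes one row per class to a CSV report, the values can be loaded into a per-class dictionary as sketched below. The column names and the sample rows are hypothetical and may differ from the actual report layout.

```python
import csv
import io

# Hypothetical excerpt of a class-level report; column names are assumed.
SAMPLE_CSV = """class,cbo,wmc,lcom,loc
com.example.MainActivity,7,12,15,140
com.example.Repo,3,5,2,60
"""


def load_class_metrics(csv_text):
    """Map each class name to a dict of numeric metric values."""
    metrics = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row.pop("class")
        metrics[name] = {metric: float(value) for metric, value in row.items()}
    return metrics


class_metrics = load_class_metrics(SAMPLE_CSV)
for name, values in class_metrics.items():
    print(name, values)
```

Loading the report of the "before" and "after" snapshots this way gives two dictionaries keyed by class name, which makes the class-level comparison in the next step straightforward.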

Step 6: Refactoring impact analysis

For each changed class, we calculated the difference between its quality metric values before and after the change. Afterwards, we applied two kinds of statistical analysis: (1) statistical significance testing with the Wilcoxon rank-sum test [12], complemented by Cliff’s delta [12], a non-parametric effect size measure, and (2) causal inference [13] based on the DiD model.
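The analysis can be illustrated on toy numbers. The sketch below computes per-class deltas, Cliff's delta between the treatment and control deltas, and the textbook two-group DiD estimate (mean change in the treatment group minus mean change in the control group). The CBO values and function names are invented for illustration, and the study itself may well use a regression-based DiD formulation rather than this simplified estimator.

```python
from statistics import mean


def metric_deltas(before, after):
    """Change in one metric (after - before) for classes present in both snapshots."""
    return [after[name] - before[name] for name in before if name in after]


def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs (x, y)."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def did_estimate(treated_deltas, control_deltas):
    """Mean change under treatment minus mean change in the control group."""
    return mean(treated_deltas) - mean(control_deltas)


# Invented CBO values per class, before and after each kind of change.
treated_before = {"A": 8, "B": 6, "C": 9, "D": 7}
treated_after = {"A": 6, "B": 5, "C": 7, "D": 6}
control_before = {"E": 7, "F": 6, "G": 8, "H": 7}
control_after = {"E": 7, "F": 6, "G": 8, "H": 8}

treated = metric_deltas(treated_before, treated_after)
control = metric_deltas(control_before, control_after)

effect_size = cliffs_delta(treated, control)
did = did_estimate(treated, control)
print("Cliff's delta:", effect_size)
print("DiD estimate:", did)
```

A negative DiD estimate on a coupling metric such as CBO means that coupling dropped more under refactoring than under ordinary changes. For the significance test itself, SciPy provides a rank-sum implementation (`scipy.stats.ranksums`) that could be applied to the same two lists of deltas.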

Impact on Quality Metrics

1. Refactoring Changes

  1. Coupling: Refactoring has a significant positive impact on coupling in terms of both the CBO and RFC metrics, while no significant impact was found on the NOSI metric. The most influential refactorings promoting low coupling are “Move Method” and “Extract and Move Method”.
  2. Cohesion: Cohesion quality metrics, LCOM, TCC and LCC, tend to exhibit statistically significant variations with attribute and method-level moving-related refactoring operations. The refactorings that most influence cohesion are “Move Attribute”, “Move Method”, and “Extract and Move Method”. However, LCOM tends to be more volatile under refactoring.
  3. Complexity: Several refactoring types tend to reduce complexity, as shown by a decrease in the WMC metric. The most impactful refactorings are “Extract Superclass”, “Extract Method”, and “Move Method”.
  4. Inheritance: Hierarchy-level refactorings tend to improve the inheritance quality attribute (DIT). The most influential refactorings are “Pull Up Attribute”, “Extract Superclass”, and “Push Down Method”.
  5. Design size: Most refactoring types tend to reduce the design size metrics LOC and VQTY. The most influential refactorings are “Extract Superclass”, “Pull Up/Push Down Method”, and “Move Method”.

2. Non-Refactoring Changes

The quality metrics did not exhibit any significant change with non-refactoring changes (control group), except for the LOC and VQTY metrics, which tend to increase after each commit. This is expected, since design size-related metrics naturally grow as a project evolves. These results provide further evidence that the metric changes observed in the experiment data are due to refactoring activities and not to chance.


We presented a study investigating the impact of refactoring on quality metrics in Android apps. Our results show that several refactoring types, such as “Move Method” and “Extract and Move Method”, have a positive impact on quality metrics. Moreover, the cohesion metric LCOM stood out as the least consistent metric, improving for some refactoring types and deteriorating for others. Finally, for the non-refactoring commits, the metrics exhibit no significant change other than, not surprisingly, the design size metrics.
