SCIENTIFIC NEWS AND INNOVATION FROM ÉTS
Automatic Summarization for Application Programming Interface

By: Amirhossein Naghshzan, Latifa Guerrouj, Olga Baysal


Amirhossein Naghshzan
Amirhossein Naghshzan is a PhD student in the Department of Software Engineering at ÉTS. His research focuses on natural language processing and machine learning.

Latifa Guerrouj
Latifa Guerrouj is an Associate Professor in the Department of Software Engineering at ÉTS. Her work focuses on software analytics, mining software repositories, and empirical software engineering.

Olga Baysal
Olga Baysal is an Associate Professor at the School of Computer Science, Carleton University. Her work focuses on empirical software engineering, data mining, and AI/ML for software engineering.


SUMMARY

Automated source code summarization is the task of generating natural language descriptions of code entities (e.g., classes and methods). In this article, we propose an automatic approach for summarizing Android API methods discussed on Stack Overflow. Our approach takes an API method's name as input and generates a natural language summary based on Stack Overflow discussions of that method. We conducted a survey of 16 Android developers to evaluate the quality of the generated summaries and to compare them with the official Android documentation. Our results show that although developers generally find the official documentation more useful, the generated summaries are competitive and can serve as a complementary source to guide developers in software development tasks.

Keywords: code summarization, unsupervised learning, unofficial documentation, survey, professional developers.

Using Stack Overflow as Additional Documentation

In many cases, developers are unaware of the purpose or usage of a code entity. They must examine a large volume of code and/or documentation to grasp the concept behind it. It would therefore be valuable to have an automatic approach that provides summaries of the purpose, implementation, and/or usage of the code entities involved in their tasks. Consider, for instance, a developer trying to fix a bug introduced by someone else. To understand the bug and how to reproduce it, the developer must first read all related bug reports and review previous discussions. Similarly, a developer who wants to use a method for the first time needs to read all of its related documentation to become familiar with the method and understand how to use it.

Although developers use official documentation as their main source of information on code entities [1], researchers have shown that official documentation sometimes lacks completeness, insight, and conciseness [2][3]. As a result, developers may turn to other sources such as Stack Overflow, a question-and-answer website where programmers ask and answer programming questions.

To fill this gap, we propose an automatic approach that summarizes APIs by leveraging unofficial documentation and unsupervised learning. In this study, we used Stack Overflow as our source of unofficial documentation. We focused on extractive summarization, which selects the most important sentences from the source documents (here, Stack Overflow posts) rather than generating new text.
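
Extractive summarization can be pictured as two steps: split the source text into sentences, then keep the highest-ranked ones. The minimal Python sketch below illustrates only that selection step, assuming a naive regular-expression sentence splitter and scores supplied by some ranking method (TextRank in this study). The function names `split_sentences` and `select_top_sentences` are hypothetical and are not part of the authors' tool.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def select_top_sentences(sentences: list[str], scores: list[float], k: int) -> list[str]:
    """Keep the k highest-scoring sentences, returned in their original order."""
    if len(sentences) != len(scores):
        raise ValueError("one score per sentence is required")
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:k])  # restore document order for readability
    return [sentences[i] for i in keep]


text = "Call setText on the view. It replaces the current text. Use append to add more."
sentences = split_sentences(text)
# Scores would normally come from a ranker such as TextRank; these are placeholders.
print(select_top_sentences(sentences, [0.9, 0.2, 0.7], 2))
```

Returning the selected sentences in document order, rather than in score order, is a common design choice for extractive summaries because it tends to preserve the local flow of the source text.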

Generating and Evaluating Summaries

Our study has two main parts: generating summaries and evaluating them. For the first part, we collected Stack Overflow's Android posts from January 2009 to April 2020, which yielded 3,084,143 unique posts. We then used TextRank, an unsupervised, graph-based ranking algorithm, to generate the summaries.
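
The article does not describe how posts were filtered or linked to a given API method. As a hypothetical illustration only, the sketch below assumes a simplified `Post` record (its fields are invented, not the Stack Exchange data dump schema), treats a post as an Android post when it carries the `android` tag, reads "up to April 2020" as "before May 1, 2020", deduplicates by post id, and selects posts that mention a method name as a whole word.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Post:
    # Hypothetical, simplified record; field names are illustrative only.
    post_id: int
    created: datetime
    tags: tuple[str, ...]
    body: str


START = datetime(2009, 1, 1)
END = datetime(2020, 5, 1)  # assumption: "up to April 2020" = before May 1, 2020


def android_posts(posts: Iterable[Post]) -> Iterator[Post]:
    """Yield each Android-tagged post in the study window once (deduplicated by id)."""
    seen: set[int] = set()
    for post in posts:
        if post.post_id in seen or "android" not in post.tags:
            continue
        if START <= post.created < END:
            seen.add(post.post_id)
            yield post


def posts_mentioning(posts: Iterable[Post], method_name: str) -> list[Post]:
    """Keep posts whose body mentions the method name as a whole word."""
    pattern = re.compile(r"\b" + re.escape(method_name) + r"\b")
    return [post for post in posts if pattern.search(post.body)]
```

Whole-word matching avoids counting longer identifiers that merely contain the method name, but it will still miss posts that refer to the method only indirectly.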

Figure 1. Overview of the TextRank algorithm.
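
As a rough, non-authoritative sketch of the algorithm in Figure 1, the code below follows the sentence-extraction variant of TextRank described by Mihalcea and Tarau (2004): each sentence is a node, edges are weighted by word overlap normalized by sentence length, and scores come from a weighted PageRank iteration with damping factor 0.85. The tokenizer (no stop-word removal or stemming), the iteration limit, the convergence tolerance, and the function names are assumptions; the authors' actual preprocessing and settings may differ.

```python
import math
import re


def tokens(sentence: str) -> set[str]:
    # Lowercased word set; no stop-word removal or stemming (simplifying assumption).
    return set(re.findall(r"[a-z0-9_]+", sentence.lower()))


def similarity(a: set[str], b: set[str]) -> float:
    # TextRank sentence similarity: shared words / (log|a| + log|b|).
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    return len(a & b) / denom if denom > 0 else 0.0


def textrank_scores(sentences: list[str], d: float = 0.85,
                    max_iter: int = 100, tol: float = 1e-6) -> list[float]:
    n = len(sentences)
    if n == 0:
        return []
    toks = [tokens(s) for s in sentences]
    # Symmetric weight matrix of the sentence graph (no self-loops).
    w = [[similarity(toks[i], toks[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out_weight = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(max_iter):
        # WS(i) = (1 - d) + d * sum_j (w[j][i] / out_weight[j]) * WS(j)
        new = [(1 - d) + d * sum(w[j][i] / out_weight[j] * scores[j]
                                 for j in range(n) if out_weight[j] > 0)
               for i in range(n)]
        done = max(abs(x - y) for x, y in zip(new, scores)) < tol
        scores = new
        if done:
            break
    return scores


def summarize(sentences: list[str], k: int = 3) -> list[str]:
    # Keep the k best-ranked sentences, in their original order.
    scores = textrank_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Sentences that share vocabulary with many others accumulate higher scores, so an off-topic sentence in a discussion of `setText` would tend to rank last and be left out of the summary.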

For the second part, we invited 28 Android developers to evaluate the quality of our generated summaries and to compare them with the official Android documentation. Sixteen developers agreed to participate. To prevent confusion and fatigue, we assigned only three APIs to each participant.
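
The article states only that each participant received three APIs; it does not say how they were chosen. One hypothetical way to spread three APIs per participant evenly over the 15 summarized APIs is a simple round-robin plan, sketched below. The participant and API labels are placeholders, not the study's actual assignment.

```python
from collections import Counter


def assign_apis(participants: list[str], apis: list[str],
                per_person: int = 3) -> dict[str, list[str]]:
    """Hypothetical round-robin plan: each participant receives `per_person`
    distinct APIs, and the evaluation load differs by at most one across APIs."""
    if per_person > len(apis):
        raise ValueError("per_person cannot exceed the number of APIs")
    plan: dict[str, list[str]] = {}
    cursor = 0
    for person in participants:
        # Consecutive positions modulo len(apis) are distinct within one participant.
        plan[person] = [apis[(cursor + k) % len(apis)] for k in range(per_person)]
        cursor += per_person
    return plan


def load_per_api(plan: dict[str, list[str]]) -> Counter:
    # Number of participants who evaluate each API.
    return Counter(api for assigned in plan.values() for api in assigned)


plan = assign_apis([f"P{i}" for i in range(1, 17)], [f"API{j}" for j in range(1, 16)])
print(load_per_api(plan))
```

With 16 participants and 15 APIs this yields 48 evaluations: the first three APIs in the list receive four evaluators each and the other twelve receive three.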

A Useful Tool for Developers

We analyzed a total of 3,084,143 unique Stack Overflow posts and summarized the 15 most popular Android APIs. The most important findings of our survey are as follows:

  • All participants (100%) agreed that the length of the summaries was appropriate.
  • More than half of the participants (58%) found the automatically generated summaries coherent.
  • Most participants (73%) found that our summaries contained accurate information about Android methods.
  • A majority of participants (59%) believed that the automatically generated summaries contained important and necessary information about Android methods.
  • When comparing the automatically generated summaries with the official Android documentation, participants reported few differences between the two: the proportions obtained for both were almost the same, with only a slightly smaller difference regarding implementation than usage.
  • On average, participants rated the helpfulness of an integrated plugin showing our automatically generated summaries at 4.1 out of 5.
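
Agreement percentages and average ratings such as those reported above are typically derived from Likert-scale answers. The article does not state the exact scale or how agreement was counted, so the following sketch assumes a five-point scale where "agree" and "strongly agree" count as agreement. The sample responses are invented for illustration and are not the study's data.

```python
from statistics import mean

# Assumed five-point Likert scale (not stated in the article).
LIKERT = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}


def percent_agree(responses: list[str]) -> float:
    """Percentage of answers that are 'agree' or 'strongly agree'."""
    if not responses:
        raise ValueError("no responses")
    hits = sum(1 for r in responses if LIKERT[r] >= 4)
    return 100.0 * hits / len(responses)


def mean_rating(responses: list[str]) -> float:
    """Average position on the 1-5 scale."""
    return float(mean(LIKERT[r] for r in responses))


# Invented responses for illustration only.
sample = ["agree", "strongly agree", "neutral", "agree", "disagree"]
print(percent_agree(sample))  # 60.0
print(mean_rating(sample))    # 3.6
```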

Figure 2. Which one is better – generated summaries or official documentation?

Figure 3. Developers’ satisfaction with the quality of generated summaries.

Conclusion

We presented a novel code summarization approach for API methods based on unofficial documentation and unsupervised learning. We used Stack Overflow Android posts as our dataset and applied the TextRank algorithm as our main summarization technique. The generated summaries were evaluated by 16 professional developers. We found that the automatically generated summaries can be useful to developers in software development tasks and are almost as useful as official documentation for understanding the usage and implementation of Android methods. Participants also agreed that the generated summaries can serve as a complementary source to official documentation.

Additional Information

For more information on this research, please read the following conference paper:

Naghshzan, A.; Guerrouj, L. and Baysal, O. 2021. “Leveraging Unsupervised Learning to Summarize APIs Discussed in Stack Overflow”. IEEE 21st International Working Conference on Source Code Analysis and Manipulation (SCAM). pp. 142-152.

Amirhossein Naghshzan

Program: Software Engineering

Research laboratories: LASI – Computer System Architecture Research Laboratory

Latifa Guerrouj

Program: Software Engineering

Research laboratories: LASI – Computer System Architecture Research Laboratory
