Maskeri, Girish and Sarkar, Santonu and Heafield, Kenneth (2008) Mining Business Topics in Source Code using Latent Dirichlet Allocation. In: ISEC '08 Proceedings of the 1st India software engineering conference. ACM, New York, NY, pp. 113-120. ISBN 978-1-59593-917-3. https://resolver.caltech.edu/CaltechAUTHORS:20161130-151624356
Full text is not posted in this repository. Consult Related URLs below.
Abstract
One of the difficulties in maintaining a large software system is the absence of documented business domain topics and of a correlation between these domain topics and the source code. Without such a correlation, people without prior knowledge of the application would find it hard to comprehend the functionality of the system. Latent Dirichlet Allocation (LDA), a statistical model, has emerged as a popular technique for discovering topics in large corpora of text documents, but its applicability to extracting business domain topics from source code has not been explored so far. This paper investigates LDA in the context of comprehending large software systems and proposes a human-assisted approach based on LDA for extracting domain topics from source code. This method has been applied to a number of open source and proprietary systems. Preliminary results indicate that LDA is able to identify some of the domain topics and is a satisfactory starting point for further manual refinement of topics.
| Item Type: | Book Section |
|---|---|
| Related URLs: | |
| Additional Information: | © 2008 ACM. |
| Subject Keywords: | Theory, Algorithms, Experimentation, Maintenance, Program comprehension, LDA |
| Classification Code: | D.2.7 [Software Engineering]: Distribution, Maintenance, and Enhancement - Restructuring, reverse engineering, and reengineering |
| DOI: | 10.1145/1342211.1342234 |
| Record Number: | CaltechAUTHORS:20161130-151624356 |
| Persistent URL: | https://resolver.caltech.edu/CaltechAUTHORS:20161130-151624356 |
| Official Citation: | Girish Maskeri, Santonu Sarkar, and Kenneth Heafield. 2008. Mining business topics in source code using latent dirichlet allocation. In Proceedings of the 1st India software engineering conference (ISEC '08). ACM, New York, NY, USA, 113-120. DOI=http://dx.doi.org/10.1145/1342211.1342234 |
| Usage Policy: | No commercial reproduction, distribution, display or performance rights in this work are provided. |
| ID Code: | 72472 |
| Collection: | CaltechAUTHORS |
| Deposited By: | INVALID USER |
| Deposited On: | 30 Nov 2016 23:30 |
| Last Modified: | 11 Nov 2021 05:02 |