Detecting and Tracking Cross-Language Identifiers through History Mining

Abstract:

Software systems typically consist of code written in multiple programming languages: open source projects use 9 languages on average, of which around 3 are general-purpose languages (GPLs) and 6 are domain-specific languages (DSLs).

In many cases, the code written in each of these languages does not stand alone. Instead, it depends on code written in another language through cross-language referencing mechanisms, which are often based on shared string identifiers. The connection between the individual code blocks is usually established by lookups at runtime, which only succeed if the names match exactly.

During system maintenance, identifiers may be changed. While there is usually tool support for rename refactorings within one language, keeping identifiers consistent across languages is usually the (manual) responsibility of the developer.

Finding linked identifiers across many languages and frameworks given just the source code is an expensive operation, since each combination of languages and frameworks has its own set of rules and must therefore be supported individually. However, another option for finding linked identifiers is to perform a change analysis on revisions of the software in a version control system: if identifiers are changed in more than one language from the same pre-identifier to the same post-identifier in the same commit, there is a good chance that the shared name is semantically significant.
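
The co-change heuristic described above can be illustrated with a minimal sketch. It assumes that the renames of a single commit have already been extracted as (language, pre-identifier, post-identifier) records; the `Rename` type and the `cross_language_links` function are hypothetical names chosen for illustration and are not taken from the actual implementation.

```python
from collections import defaultdict
from typing import NamedTuple


class Rename(NamedTuple):
    """One identifier rename observed in a single commit (hypothetical record type)."""
    language: str  # e.g. "java", "xml"
    old: str       # pre-identifier
    new: str       # post-identifier


def cross_language_links(renames):
    """Return the (old, new) pairs of one commit that were renamed in two or more languages."""
    languages = defaultdict(set)
    for r in renames:
        languages[(r.old, r.new)].add(r.language)
    # A pair renamed identically in several languages is a candidate cross-language link.
    return {pair: langs for pair, langs in languages.items() if len(langs) >= 2}


commit = [
    Rename("java", "userName", "loginName"),
    Rename("xml", "userName", "loginName"),
    Rename("java", "tmp", "buffer"),  # renamed in one language only, so not a link
]
links = cross_language_links(commit)
```

Here only the pair `("userName", "loginName")` is reported, because it is the only rename that occurs in more than one language within the same commit.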

In this work, we attempt to use information on identifiers changed in this manner to alert the user when one of the identifiers is changed in the future without its corresponding identifiers being changed as well. Based on an initial analysis from the version history, we keep track of identifier positions as the code changes and thus can find incomplete renamings both in existing revisions (retrospective analysis) and on new commits (online analysis); the latter can be triggered right within the IDE to alert developers to problems before the code is even submitted.
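
The check for incomplete renamings can be sketched in the same spirit. The sketch assumes that previously mined links are available as a mapping from an identifier name to the set of languages in which it is known to be linked, and it leaves out the tracking of identifier positions entirely; `incomplete_renames` and its parameters are hypothetical names for illustration only.

```python
def incomplete_renames(linked, commit_renames):
    """Report linked identifiers that a commit renames in only some of their languages.

    linked:         dict mapping an identifier name to the set of languages
                    in which it is known to be linked (assumed input format)
    commit_renames: iterable of (language, old, new) tuples for one commit
    Returns a list of (old, new, missing_languages) warnings.
    """
    renamed_in = {}
    for language, old, new in commit_renames:
        renamed_in.setdefault((old, new), set()).add(language)

    warnings = []
    for (old, new), langs in renamed_in.items():
        known = linked.get(old, set())
        # Only warn if the rename touches a linked language but misses others.
        missing = known - langs
        if known & langs and missing:
            warnings.append((old, new, sorted(missing)))
    return warnings


linked = {"loginName": {"java", "xml"}}
partial = incomplete_renames(linked, [("java", "loginName", "accountName")])
complete = incomplete_renames(
    linked,
    [("java", "loginName", "accountName"), ("xml", "loginName", "accountName")],
)
```

In this example, `partial` contains one warning naming `xml` as the language in which the rename is missing, while `complete` is empty. The same check could in principle be run over existing revisions or on a pending commit.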

The usefulness and fitness for purpose of our implementation are demonstrated through an empirical investigation of the revision histories of a set of open source case studies.
