Hapax is an Information Retrieval tool to analyze the vocabulary of software systems, i.e. how classes and methods are related by topic rather than by structure. It was written by Adrian Kuhn in 2005 as validation of his Master’s thesis.
Semantic Clustering identifies topics in source code. Based on Latent Semantic Indexing and clustering, it groups source artifacts that use similar vocabulary. We call these groups semantic clusters and interpret them as linguistic topics that reveal the intention of the code. We compare the topics to each other, identify links between them, provide automatically retrieved labels, and use a visualization to illustrate how they are distributed over the system’s structure. Our approach is language independent, as it works at the level of identifier names and comments.
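To make the idea of "artifacts that use similar vocabulary" concrete, here is a minimal sketch in Python. It is not the Hapax implementation: it leaves out the Latent Semantic Indexing step (the rank reduction of the term-document matrix) and the clustering itself, and only shows how terms can be taken from identifier names and comments and compared with plain cosine similarity. The artifact names, the sample texts, the stop-word list, and the helper names split_terms and cosine are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real analysis would use a larger one.
STOP_WORDS = {"the", "a", "to", "in"}

def split_terms(text):
    """Split identifiers and comment words into lowercase terms."""
    terms = []
    for word in re.findall(r"[A-Za-z]+", text):
        # Break camelCase words such as "JobOfferRepository" into their parts,
        # keeping acronyms such as "XML" together.
        parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", word)
        terms.extend(p.lower() for p in parts)
    return [t for t in terms if t not in STOP_WORDS]

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical artifacts: identifier names followed by a comment.
artifacts = {
    "JobOfferRepository": "JobOfferRepository saveJobOffer offer store the job offer in the database",
    "JobSearchPage": "JobSearchPage renderJobList jobs show job offers matching the search",
    "FileLogger": "FileLogger writeLogLine line append a line to the log file",
}

# One term-frequency vector per artifact.
vectors = {name: Counter(split_terms(text)) for name, text in artifacts.items()}

for other in ("JobSearchPage", "FileLogger"):
    score = cosine(vectors["JobOfferRepository"], vectors[other])
    print(f"JobOfferRepository vs {other}: {score:.2f}")
```

In this toy data the two job-related artifacts share the term "job" and score higher than the pair that has no vocabulary in common. Because the sketch only looks at names and comments, it does not depend on the programming language of the analyzed system.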
Hapax is a software analysis tool, built on top of Moose. Adrian Kuhn developed Hapax as part of his Master’s thesis, and it implements Semantic Clustering. The name of the tool is derived from the term hapax legomenon, which refers to a word occurring only once in a given body of text.
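As a small illustration of the term behind the name, the following Python sketch lists the hapax legomena of a short text, that is, the words that occur exactly once. The function name hapax_legomena and the sample sentence are made up for this example and are not part of the tool.

```python
import re
from collections import Counter

def hapax_legomena(text):
    """Return the words that occur exactly once in the text, sorted alphabetically."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    return sorted(word for word, n in counts.items() if n == 1)

# Hypothetical sample text.
sample = "the job portal lists job offers and the portal stores offers"
print(hapax_legomena(sample))  # ['and', 'lists', 'stores']
```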
We are frequently asked about the difference between topics and concepts. You may find the following excerpt from the journal paper useful for understanding it.
When starting this work, one of our hypotheses was that Semantic Clustering would reveal a system’s domain semantics. But our experiments disproved this hypothesis: most linguistic topics are application concepts or architectural components, such as layers. In most case studies, our approach partitioned the system into one (or sometimes two) large domain-specific part and up to a dozen domain-independent parts, such as input/output or data storage facilities. Consider for example the application below, Outsight, a web-based job portal application. It is divided into nine parts as follows:
The current research prototype of Hapax is available at the following Store coordinates:
Bundle: HapaxDevelopment
interface: PostgresSQLEXDIConnection
environment: db.iam.unibe.ch_scgStore
user name: storeguest
password: storeguest
table owner: BERN
Please drop a mail to for questions and feedback.