same: A Code Duplication Detector

Written by Do Te Kien

Supervised by: Professor Terence Parr
University of San Francisco
2005-2006



Introduction

same is a GUI-based tool that detects duplicate code chunks within a set of Java files. In a sense, it is the opposite of the UNIX diff tool. same normalizes Java code, so it can find duplicate code chunks even when the formatting is radically different, when variable names have changed, and even when constants have changed. For example, same sees the two code fragments in each of the following examples as identical. In the second example, same normalizes IDs and constants so that the fragments appear to be the same. The user may specify whether this normalization occurs.

Example 1, chunk 1:

int testscore = 76;
char grade;
if (testscore >= 90) {
    grade = 'A';
}

Example 1, chunk 2:

int testscore = 76; char grade;
if (testscore >= 90) {grade = 'A';}

Example 2, chunk 1:

public class Manager {
    public void repeat(){
        for (int j = 0; j < 100; j++) {
            System.out.println("Some thing here....");
        }
    }
}

Example 2, chunk 2:

public class Employer {
    public void sayAgain(){
        for (int i = 0; i < 111; i++) {
            System.out.println("I want...");
        }
    }
}
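
The normalization idea described above can be illustrated with a minimal sketch. This is not same's actual implementation: the workshop slides indicate that same works on ASTs, whereas this sketch compares flat token sequences, which is a simpler stand-in. The class name NormalizeSketch, the method normalize, the placeholder tokens ID and CONST, and the short keyword list are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: token-level normalization, not same's AST-based method.
class NormalizeSketch {
    // Deliberately incomplete keyword list; enough for the fragments shown above.
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
        "public", "class", "void", "int", "char", "for", "if", "else",
        "return", "new", "static"));

    // Alternatives, in order: string literal, char literal, number,
    // identifier or keyword, any other single non-space character.
    // Multi-character operators such as >= are split into single characters,
    // which is consistent across fragments and good enough for a sketch.
    private static final Pattern TOKEN = Pattern.compile(
        "\"(?:\\\\.|[^\"\\\\])*\"|'(?:\\\\.|[^'\\\\])*'|\\d+(?:\\.\\d+)?"
        + "|[A-Za-z_$][A-Za-z0-9_$]*|\\S");

    // Returns the token sequence with whitespace dropped, every literal
    // replaced by CONST, and every non-keyword name replaced by ID.
    static List<String> normalize(String source) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            String t = m.group();
            char c = t.charAt(0);
            if (c == '"' || c == '\'' || Character.isDigit(c)) {
                out.add("CONST");
            } else if (Character.isJavaIdentifierStart(c) && !KEYWORDS.contains(t)) {
                out.add("ID");
            } else {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String a = "for (int j = 0; j < 100; j++) { System.out.println(\"Some thing here....\"); }";
        String b = "for (int i = 0; i < 111; i++) {\n    System.out.println(\"I want...\");\n}";
        // Both loops normalize to the same token sequence.
        System.out.println(normalize(a).equals(normalize(b))); // prints true
    }
}
```

Two fragments count as duplicates in this sketch when their normalized token lists are equal. Making the ID and CONST replacements optional would correspond to the user-selectable normalization mentioned above.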

Here are the slides from the ANTLR2005 workshop: Code duplication detection using ASTs.

Download

Execution

To run same on a Linux box, type this at the command line:

java -Xmx512M -jar same.jar

On OS X or Windows, just click the jar.

Screen Shots

Select Java source to check for duplicates
Statistics view
View of duplicated code chunks across files
Which parts of Java file are duplicated?

Acknowledgements

Many thanks to Prof. Terence Parr, Ari Blenkhorn, Jean Bovet, my classmates in CS690 and CS684, and the ANTLR parser generator community!

Discussion

coming soon...

References