same: A Code Duplication Detector
Written by Do Te Kien
Supervised by: Professor Terence Parr
University of San Francisco
2005-2006
Introduction
same is a GUI-based tool that detects duplicate code chunks
within a set of Java files. In a sense it is the opposite of the UNIX
diff tool. In a nutshell, same normalizes Java code
so that it can find duplicate code chunks even when the formatting is
radically different, when variable names have changed, and even when
constants have changed. For example, same treats the two chunks in each
pair below as identical. In the second pair,
same normalizes IDs and constants so that they appear to be
the same. The user may specify whether this normalization occurs.
Example 1, chunk 1:

    int testscore = 76;
    char grade;
    if (testscore >= 90) {
        grade = 'A';
    }

Example 1, chunk 2:

    int testscore = 76; char grade;
    if (testscore >= 90) {grade = 'A';}

Example 2, chunk 1:

    public class Manager {
        public void repeat(){
            for (int j = 0; j < 100; j++) {
                System.out.println("Some thing here....");
            }
        }
    }

Example 2, chunk 2:

    public class Employer {
        public void sayAgain(){
            for (int i = 0; i < 111; i++) {
                System.out.println("I want...");
            }
        }
    }
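The normalization idea described above can be illustrated with a minimal sketch. This is not how same itself is implemented: same works on ASTs built with ANTLR, while the sketch below swaps in a simple regular-expression tokenizer. The class name NormalizeSketch, the method tokens, the placeholder names ID, NUM, STR and CHR, and the short keyword list are all assumptions made for illustration. Comments in the source are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: token-level normalization, not same's AST-based method.
class NormalizeSketch {
    // Small illustrative subset of Java keywords; these are never replaced by ID.
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
        "public", "private", "static", "class", "void", "int", "char",
        "for", "if", "else", "return", "new"));

    // One alternative per token kind; whitespace between tokens is skipped.
    private static final Pattern TOKEN = Pattern.compile(
        "\"(?:\\\\.|[^\"\\\\])*\""            // string literal
      + "|'(?:\\\\.|[^'\\\\])'"                // char literal
      + "|\\d+(?:\\.\\d+)?"                    // numeric literal
      + "|[A-Za-z_$][A-Za-z0-9_$]*"            // keyword or identifier
      + "|\\+\\+|--|>=|<=|==|!=|&&|\\|\\|"     // two-character operators
      + "|\\S");                                // any other single character

    // Returns the token list; when normalize is true, identifiers and
    // constants are replaced by placeholders so that renamed variables and
    // changed constants compare as equal.
    static List<String> tokens(String source, boolean normalize) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            String t = m.group();
            char c = t.charAt(0);
            if (!normalize) {
                out.add(t);
            } else if (c == '"') {
                out.add("STR");
            } else if (c == '\'') {
                out.add("CHR");
            } else if (Character.isDigit(c)) {
                out.add("NUM");
            } else if ((Character.isLetter(c) || c == '_' || c == '$')
                       && !KEYWORDS.contains(t)) {
                out.add("ID");
            } else {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String a = "int testscore = 76;\nchar grade;\nif (testscore >= 90) {\n  grade = 'A';\n}";
        String b = "int testscore = 76; char grade;\nif (testscore >= 90) {grade = 'A';}";
        String c = "for (int j = 0; j < 100; j++) { System.out.println(\"Some thing here....\"); }";
        String d = "for (int i = 0; i < 111; i++) { System.out.println(\"I want...\"); }";

        // Formatting differences vanish once whitespace is dropped.
        System.out.println(tokens(a, false).equals(tokens(b, false))); // true
        // Renamed variables and changed constants need normalization.
        System.out.println(tokens(c, false).equals(tokens(d, false))); // false
        System.out.println(tokens(c, true).equals(tokens(d, true)));   // true
    }
}
```

The boolean flag mirrors the user option mentioned above: with normalization off, only formatting differences are ignored; with it on, chunks that differ in identifier names and constant values also compare as equal.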
Here are the slides from the ANTLR2005 workshop: Code duplication detection using ASTs.
Download
Execution
To run same on a Linux box, type this at the command line:
java -Xmx512M -jar same.jar
On OS X or Windows, just double-click the jar file.
Screen Shots
- Select Java source to check for duplicates
- Statistics view
- View of duplicated code chunks across files
- Which parts of a Java file are duplicated?
Acknowledgements
Many thanks to Prof. Terence Parr, Ari Blenkhorn, Jean Bovet, my
classmates in CS690 and CS684, and the ANTLR
parser generator community!
Discussion
coming soon...
References
- Richard Wettel and Radu Marinescu. Automated Detection of Code Duplication Clusters.
Diploma Thesis, the Politehnica University of Timisoara, June 2004.
- ANTLR parser generator. http://www.antlr.org