- Code Clone Detection and Analysis
- Software Evolution and Maintenance
- Software Analytics Research
- Empirical Software Engineering
- Recommender Systems in Software Engineering
Our research focuses on building tools that help software engineers develop reliable, scalable, sustainable, and cost-effective software. Almost all of our tools are open source. If the source code for a tool is not publicly available, please email us and we will share it at our earliest convenience.
Clone detection systems:
NiCad: Our widely used NiCad clone detection system; give it a try. It efficiently detects Type 1, Type 2, and Type 3 clones with high precision and recall, and its transformation and filtering features help it detect more Type 3 clones. We also have NiCad-File, a file-level clone detection system built on NiCad, and ForkSim, a system that detects software forks using NiCad variants.
SimCad: An improvement on NiCad for scalability that uses Google’s simhashing. Its IDE version, SimEclipse, adds clone management features and is another good option to explore. We have also developed simLib, a library that lets others detect clones in their own environments.
CloneWorks: This is the fast, scalable and user-guided version of our clone detection system.
SourcererCC: Another scalable clone detection system, developed with our collaborator.
CCAligner: A large-gap clone detector developed with our collaborator. Only the paper link is available for now.
SAGA: An ultra-large-scale, GPU-accelerated clone detector developed with our collaborator. Only the paper link is available for now.
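SimCad is described above as using simhashing for scalability. The sketch below illustrates the general simhash idea for near-miss clone comparison: each token votes on every bit of a 64-bit fingerprint, and fragments whose fingerprints differ in only a few bits are reported as candidate clones. It is not SimCad’s implementation; the tokenizer, the MD5-derived token hash, and the Hamming threshold of 3 are all hypothetical choices made for illustration.

```python
import hashlib
import re


def tokenize(code: str) -> list[str]:
    # Hypothetical normalization: drop // line comments, then split into
    # identifier, number, and single-character operator/punctuation tokens.
    # Whitespace disappears entirely, so Type 1 (layout/comment) differences vanish.
    code = re.sub(r"//.*", "", code)
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def simhash(tokens: list[str], bits: int = 64) -> int:
    # Each token votes +1 or -1 on every bit position according to its own hash.
    weights = [0] * bits
    for tok in tokens:
        h = int.from_bytes(hashlib.md5(tok.encode()).digest()[:8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    # Fingerprint bit i is set when the positive votes win.
    return sum(1 << i for i in range(bits) if weights[i] > 0)


def hamming(a: int, b: int) -> int:
    # Number of differing bits between two fingerprints.
    return bin(a ^ b).count("1")


def is_clone_pair(code_a: str, code_b: str, threshold: int = 3) -> bool:
    # Hypothetical threshold: report a clone pair when the fingerprints
    # differ in at most `threshold` bits.
    return hamming(simhash(tokenize(code_a)), simhash(tokenize(code_b))) <= threshold
```

For example, `is_clone_pair("int sum(int a, int b) { return a + b; }", "int sum(int a,int b){\n  return a + b; // add\n}")` returns `True`, because both fragments reduce to the same token sequence. Comparing fixed-size fingerprints instead of full token sequences is what makes this family of techniques attractive at scale.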
Benchmarking clone detection systems:
BigCloneBench: The largest validated clone benchmark so far. An updated version of the benchmark is available on the BigCloneEval page; BigCloneEval provides tool support that makes BigCloneBench easier to use. We also have BigCloneWE, a web-based evaluation framework for BigCloneBench, where you can evaluate a clone detection tool’s results online and compare them with several state-of-the-art tools.
Mutation Injection Framework: We developed a mutation-based framework that generates thousands of artificial clones and uses them to evaluate clone detection tools automatically.
SemanticBench: We also developed a semantic clone benchmark, built from Stack Overflow code fragments, for comparing semantic clone detection tools.
CloneCognition: Automatic, cloud-based validation of clone detection results for measuring precision. A desktop version is also available.
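The Mutation Injection Framework above generates artificial clones and evaluates tools against them. As a rough illustration of that idea, the sketch below applies one Type 2 style mutation (consistent identifier renaming) to produce an artificial clone, and measures recall as the fraction of injected clone pairs that a tool reports. The mutation operator, the function names, and the fragment IDs are hypothetical; the framework’s actual mutation operators and evaluation procedure are not reproduced here.

```python
import re


def rename_identifiers(code: str, mapping: dict[str, str]) -> str:
    # Hypothetical Type 2 mutation: consistently rename identifiers,
    # matching whole words only so that "a" does not match inside "total".
    if not mapping:
        return code
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], code)


def _unordered(pairs: set[tuple[str, str]]) -> set[tuple[str, ...]]:
    # Treat (x, y) and (y, x) as the same clone pair.
    return {tuple(sorted(p)) for p in pairs}


def recall(injected: set[tuple[str, str]], reported: set[tuple[str, str]]) -> float:
    # Fraction of injected clone pairs that the tool under test reported.
    if not injected:
        return 0.0
    found = _unordered(injected) & _unordered(reported)
    return len(found) / len(injected)
```

For instance, `rename_identifiers("int total = a + b; return total;", {"total": "sum", "a": "x"})` yields `"int sum = x + b; return sum;"`. If a tool reports only one of two injected pairs, `recall` returns `0.5`. Because the injected clones are known by construction, recall can be computed without manual validation.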
Clone visualization and analysis:
Clone-Swarm: Given a web link to a subject system, Clone-Swarm detects and visualizes that system’s clones in the cloud. This tool received the People’s Choice Award at IWSC 2020! We also have Clone-World for large-scale clone visualization and analysis in the cloud, and gCad, a tool that helps study the evolution of near-miss clones. VisCad, an older tool, also supports clone visualization.
SPCP-Miner: A framework with a set of tools that help mine important clones for refactoring and tracking.
Recommender Systems in Software Engineering:
CSCC: Simple, Efficient, Context Sensitive Code Completion Support for Better API Usability
SurfClipse: An IDE-based Context-Aware Meta Search Engine [CSMR-WCRE 2014, ICSME 2014]
RACK: An automated API recommendation system for code search using crowd knowledge from Stack Overflow [SANER 2016, EMSE 2018]
SurfExamples: An IDE-based Context-Aware Exception Handling Code Search Engine [SCAM 2014]
BLIZZARD: A context-aware query reformulation approach for improved IR-based bug localization [ESEC/FSE 2018]
NLP2API: An effective query reformulation for code search using crowdsourced knowledge and extra-large data analytics [ICSME 2018]
ACER: An improved query reformulation for concept location using CodeRank and source document structures [ASE 2017]
Parc: Recommending API Methods Parameters.
CoRRecT: An automated code reviewer recommender for pull requests on GitHub, developed in collaboration with VendAsta Technologies [ICSE 2016]
Femir: Framework Extension Miner and Recommender: A Tool for Recommending Framework Extension Examples.
A few other tools in our lab that support software evolution and maintenance:
LHDiff: A Language Independent Technique for Tracking Source Code Lines.
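To make the line-tracking problem that LHDiff addresses concrete, the sketch below maps each line of an old file version to its line in the new version, or to `None` if no match is found. It deliberately swaps in a much simpler technique, a plain diff via Python’s `difflib.SequenceMatcher` over whitespace-stripped lines, and is not LHDiff’s algorithm. The function name `map_lines` is hypothetical. A baseline like this marks a modified line as deleted, which is exactly the weakness that more sophisticated line trackers aim to overcome.

```python
import difflib
from typing import Dict, List, Optional


def map_lines(old: List[str], new: List[str]) -> Dict[int, Optional[int]]:
    # Map each 1-based line number in `old` to a 1-based line number in `new`,
    # or None when the line was deleted or changed (plain diff baseline).
    matcher = difflib.SequenceMatcher(
        a=[line.strip() for line in old],  # ignore indentation changes
        b=[line.strip() for line in new],
        autojunk=False,  # never treat frequent lines as junk
    )
    mapping: Dict[int, Optional[int]] = {i + 1: None for i in range(len(old))}
    # Each matching block is a run of identical lines in both versions;
    # the final block returned has size 0 and is skipped naturally.
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            mapping[block.a + k + 1] = block.b + k + 1
    return mapping
```

For example, `map_lines(["x", "y", "z"], ["x", "z"])` returns `{1: 1, 2: None, 3: 2}`: line 2 was deleted, and line 3 moved up to line 2.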
A few research highlights of our group (Tool demos):
(1) Clone Detection and Management
An overview of our clone detection and management research [by the StarPhoenix]
Clone-World: Studying, finding, and fixing buggy clones. You can visualize thousands of clones and see how they change as the software evolves, how they relate to bugs, how to identify them, which ones are the most crucial, how to edit or fix them, and much more. We make extensive use of Big Data analytics in this work.
Clone-Swarm: Clone detection, analysis, and visualization in the cloud. Simply provide a web link to your software repository, and the tool will show all of its clones, visualize them, and help you find bugs and trace their evolution.
CloneCognition: Using machine learning to mimic human behaviour in identifying the right clones, with the goal of reducing bugs.
BigCloneWE: Web-based evaluation of clone detection tools using BigCloneBench
CloneWorks: Large scale clone detection with minimal hardware
SPCP-Miner: A framework for finding the most important clones, so that risky ones can be removed or tracked over time.
(2) Recommender Systems in Software Engineering
RACK: The tool searches for code examples from within the IDE using crowdsourced knowledge from Stack Overflow.
CORRECT: Finding the right person to review a given piece of code with CORRECT.
SurfClipse: We propose SurfClipse, a context-aware meta search tool that analyzes an exception encountered in the IDE, along with its context, and recommends not only suitable search queries but also relevant web pages for that exception and its context.
Parc: Recommending API Methods Parameters