Research Areas

  • Code Clone Detection and Analysis
  • Software Evolution and Maintenance
  • Software Analytics Research
  • Empirical Software Engineering
  • Recommender Systems in Software Engineering
As part of our industrial-stream, inter-university NSERC CREATE grant, we are focusing on five major areas of software analytics in direct collaboration with industrial partners: (1) Software Systems Quality, (2) Evolutionary Software Design, (3) Technical Debt, (4) Social Software Engineering, and (5) Trustworthy, Explainable, and Visual Analytics for Software Teams.
We are looking for postdoctoral fellows and PhD and MSc students (and a lot of them) to work on the above topics with industrial partners. Potential applicants are welcome to contact Prof. Chanchal Roy at for further details.


Our research focuses on building tools that help software engineers develop software that is reliable, scalable, sustainable, and cost-effective. Almost all of our tools are open source. If the source code of a tool is not available, send us an email and we will share it at our earliest convenience.

Clone Detection systems:

NiCad: Our widely used NiCad clone detection system; try it out. NiCad efficiently detects Type 1, 2, and 3 clones with high precision and recall, and its transformation and filtering features help it detect more Type 3 clones. We also have NiCad-File, a file-level clone detection system built on NiCad, and ForkSim, a system that detects software forks using NiCad variants.
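To make the clone types above concrete, here is a minimal Python sketch of the general ideas, assuming Python-like fragments: Type 1 clones are identical once comments and layout are removed, Type 2 clones are identical once identifiers and literals are also blinded, and Type 3 clones differ in only a bounded fraction of their normalized lines. This is not NiCad's TXL-based implementation; the tokenizer, the function names, and the 0.3 dissimilarity threshold are illustrative assumptions.

```python
from __future__ import annotations

import keyword
import re

# Illustrative sketch only; NOT NiCad's actual (TXL-based) implementation.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def _blind(tok: str) -> str:
    """Replace identifiers with ID and integer literals with LIT (Type 2)."""
    if tok.isdigit():
        return "LIT"
    if (tok[0].isalpha() or tok[0] == "_") and not keyword.iskeyword(tok):
        return "ID"
    return tok


def normalize(fragment: str, blind: bool) -> list[str]:
    """Drop '#' comments and layout (Type 1); optionally blind names (Type 2)."""
    lines = []
    for raw in fragment.splitlines():
        tokens = TOKEN.findall(raw.split("#", 1)[0])
        if blind:
            tokens = [_blind(t) for t in tokens]
        if tokens:  # blank and comment-only lines disappear
            lines.append(" ".join(tokens))
    return lines


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two line lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def clone_type(f1: str, f2: str, threshold: float = 0.3) -> int | None:
    """Return 1, 2 or 3 for the clone type of (f1, f2), or None if not a clone."""
    if normalize(f1, blind=False) == normalize(f2, blind=False):
        return 1
    n1, n2 = normalize(f1, blind=True), normalize(f2, blind=True)
    if n1 == n2:
        return 2
    # Share of lines in the larger fragment that are outside the common subsequence.
    longest = max(len(n1), len(n2))
    if longest and 1 - lcs_length(n1, n2) / longest <= threshold:
        return 3
    return None
```

For example, `clone_type("def area(w, h):\n    return w * h\n", "def size(x, y):\n    return x * y\n")` returns 2, because the fragments match once names are blinded.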

SimCad: An improvement on NiCad for scalability that uses Google’s simhashing. SimEclipse, its IDE version with clone management features, is another good option to explore. We have also developed simLib, a library that lets others detect clones in their own environment.
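Simhashing is a locality-sensitive fingerprinting technique (due to Charikar) that Google used for near-duplicate web page detection: similar inputs get fingerprints that differ in only a few bits. As a minimal sketch, assuming token 3-grams as features and MD5 as the per-feature hash (both our own choices, not necessarily SimCad's), a 64-bit simhash of a code fragment can be computed as follows.

```python
import hashlib
import re

BITS = 64


def _hash64(feature: str) -> int:
    """Stable 64-bit hash of a feature: the first 8 bytes of its MD5 digest."""
    return int.from_bytes(hashlib.md5(feature.encode("utf-8")).digest()[:8], "big")


def simhash(code: str) -> int:
    """64-bit simhash of a code fragment using token 3-grams as features."""
    tokens = re.findall(r"\w+|[^\w\s]", code)  # whitespace is ignored
    grams = [" ".join(tokens[i:i + 3]) for i in range(max(1, len(tokens) - 2))]
    votes = [0] * BITS
    for gram in grams:
        h = _hash64(gram)
        for bit in range(BITS):
            # Each feature votes +1 or -1 on every bit position.
            votes[bit] += 1 if (h >> bit) & 1 else -1
    # A fingerprint bit is set when the positive votes win.
    return sum(1 << bit for bit in range(BITS) if votes[bit] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

A detector in this style would treat two fragments as candidate near-miss clones when `hamming(simhash(f1), simhash(f2))` is at most some small `k`; the value of `k` is an assumption that has to be calibrated.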

CloneWorks: A fast, scalable, and user-guided version of our clone detection system.

SourcererCC: Yet another scalable clone detection system, developed with our collaborators.

CCAligner: A large-gap clone detector developed with our collaborators. Only the paper link is available for now.

SAGA: GPU-accelerated, ultra-large-scale clone detection, developed with our collaborators. Only the paper link is available for now.

Benchmarking Clone detection systems:

BigCloneBench: The largest validated clone benchmark to date. An updated version of the benchmark is available on the BigCloneEval page; BigCloneEval provides tool support that makes BigCloneBench easier to use. We also have BigCloneWE, a web-based evaluation framework for BigCloneBench, where one can evaluate a clone detection tool’s results and compare them online with those of several state-of-the-art tools.

Mutation Injection Framework: We developed a mutation-based framework that can generate thousands of artificial clones and automatically evaluate clone detection tools.
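The framework's core idea is to apply clone-producing edits to an original fragment, inject the resulting artificial clones, and measure how many of them a detector finds. The Python sketch below illustrates that loop under our own simplifying assumptions: the two mutation operators, the function names, and the detector signature are hypothetical and much simpler than the mutation operators used in the framework itself.

```python
import random
import re
from typing import Callable, List

# Hypothetical mutation operators: each turns an original fragment into an
# artificial clone of a known type, so the expected answer is known up front.


def rename_identifier(fragment: str, old: str, new: str) -> str:
    """Type 2 mutation: consistently rename one identifier (whole words only)."""
    return re.sub(rf"\b{re.escape(old)}\b", new, fragment)


def insert_line(fragment: str, line: str, rng: random.Random) -> str:
    """Type 3 mutation: insert a line somewhere after the first line.

    Assumes a non-empty fragment.
    """
    lines = fragment.splitlines()
    pos = rng.randint(1, len(lines))  # inclusive: may also append at the end
    return "\n".join(lines[:pos] + [line] + lines[pos:])


def recall(detector: Callable[[str, str], bool], original: str,
           mutants: List[str]) -> float:
    """Fraction of injected mutants that the detector reports as clones."""
    found = sum(1 for m in mutants if detector(original, m))
    return found / len(mutants)
```

Because every mutant is generated from a known original with a known edit, recall can be computed automatically for any detector wrapped as a `(original, candidate) -> bool` function, without manual validation.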

SemanticBench: We also developed a semantic clone benchmark for comparing semantic clone detection tools using code fragments from Stack Overflow.

CloneCognition: Cloud-based automatic validation of clone detection results for measuring precision. A desktop version is also available.

Clone visualization and analysis:

Clone-Swarm: Given a web link to a subject system, Clone-Swarm detects and visualizes the clones of that system in the cloud. We received the People’s Choice Award for this tool at IWSC 2020! We also have Clone-World for large-scale clone visualization and analysis in the cloud. gCad is another tool that helps study the evolution of near-miss clones. We also have VisCad, an older tool for clone visualization.

SPCP-Miner: A framework and set of tools that help mine important clones for refactoring and tracking.

Recommender Systems in Software Engineering:

CSCC: Simple, Efficient, Context Sensitive Code Completion Support for Better API Usability

SurfClipse: An IDE-based Context-Aware Meta Search Engine [CSMR-WCRE 2014, ICSME 2014]

RACK: An automated API recommendation system for code search using crowd knowledge from Stack Overflow [SANER 2016, EMSE 2018]

SurfExamples: An IDE-based Context-Aware Exception Handling Code Search Engine [SCAM 2014]

BLIZZARD: A context-aware query reformulation approach for improved IR-based bug localization [ESEC/FSE 2018]

NLP2API: An effective query reformulation for code search using crowdsourced knowledge and extra-large data analytics [ICSME 2018]

ACER: An improved query reformulation for concept location using CodeRank and source document structures [ASE 2017]

Parc: Recommending API Methods Parameters.

CORRECT: An automated code reviewer recommender for pull requests on GitHub, in collaboration with Vendasta Technologies [ICSE 2016]

Femir: Framework Extension Miner and Recommender: A Tool for Recommending Framework Extension Examples.

A few other tools in our lab to support software evolution and maintenance:

LHDiff: A Language Independent Technique for Tracking Source Code Lines.

CSCC: Simple, Efficient, Context Sensitive Code Completion

Muhammad Asaduzzaman, Chanchal K. Roy, Kevin Schneider and Daqing Hou, “CSCC: Simple, Efficient, Context Sensitive Code Completion”, In Proceedings of the 30th International Conference on Software Maintenance and Evolution (ICSME 2014), 10 pp., Victoria, Canada, September 2014.

A Mutation / Injection-based Automatic Framework for Evaluating Code Clone Detection Tools

Chanchal K. Roy and James R. Cordy, “A Mutation / Injection-based Automatic Framework for Evaluating Code Clone Detection Tools”, In Proceedings of the ICST 4th International Workshop on Mutation Analysis (Mutation 2009), IEEE Press, pp. 157-166, Denver, Colorado, USA, April 2009.

A few research highlights of our group (Tool demos):

(1) Clone Detection and Management 

An overview of our clone detection and management research [by the StarPhoenix]

Clone-World: Studying, finding, and fixing buggy clones. With Clone-World, you can visualize thousands of clones and see what happens to them as the software evolves: how they relate to bugs, how to identify them, which ones are the most crucial, how to edit or fix them, and much more. We apply Big Data analytics extensively in this work.

Clone-Swarm: Clone detection, analysis, and visualization in the cloud. You simply provide a web link to your software repository, and the tool shows all the clones, visualizes them, and helps find bugs and trace their evolution.

CloneCognition: Using machine learning to mimic human behaviour to find the right clones for reducing bugs.

BigCloneWE: Web-based evaluation of clone detection tools using BigCloneBench


CloneWorks: Large scale clone detection with minimal hardware

SPCP-Miner: A framework for finding the most important clones for removing and/or tracking risky ones over time.

(2) Recommender Systems in Software Engineering 

RACK: Searches for code examples from within the IDE using crowdsourced knowledge from Stack Overflow.

CORRECT: Finding the right person to fix a bug for a given piece of code.

SurfClipse: We propose a context-aware meta search tool, SurfClipse, that analyzes an encountered exception and its context in the IDE, and recommends not only suitable search queries but also relevant web pages for the exception (and its context).

Parc: Recommending API Methods Parameters