# Alumni

PhD Students

Zephyrin Soh

Zephyrin Soh graduated in Dec 2015. Previously, he completed his Master's program (DEA in the French system) in Computer Science at the University of Yaoundé 1 (Cameroon) in 2005. He also worked as an assistant lecturer at the University of Ngaoundéré (Cameroon) in 2008. His research interest is improving the comprehension and quality of software systems using eye-trackers and, more generally, interaction traces (e.g., from Mylyn) in an industrial setting. Zephyrin focused on how context can be used to improve comprehension and recommendation.

###### Publications
• R. Morales, Z. Soh, F. Khomh, G. Antoniol, and F. Chicano, “On the use of developers’ context for automatic refactoring of software anti-patterns,” Journal of systems and software, p. -, 2016.
[Bibtex]
@article{Morales2016,
title = "On the use of developers’ context for automatic refactoring of software anti-patterns ",
journal = "Journal of Systems and Software ",
volume = "",
number = "",
pages = " - ",
year = "2016",
note = "",
issn = "0164-1212",
doi = "http://dx.doi.org/10.1016/j.jss.2016.05.042",
url = "http://www.sciencedirect.com/science/article/pii/S0164121216300632",
author = "Rodrigo Morales and Zéphyrin Soh and Foutse Khomh and Giuliano Antoniol and Francisco Chicano",
abstract = "Anti-patterns are poor solutions to design problems that make software systems hard to understand and extend. Entities involved in anti-patterns are reported to be consistently related to high change and fault rates. Refactorings, which are behavior-preserving changes, are often performed to remove anti-patterns from software systems. Developers are advised to interleave refactoring activities with their regular coding tasks to remove anti-patterns, and consequently improve software design quality. However, because the number of anti-patterns in a software system can be very large, and their interactions can require a solution in a set of conflicting objectives, the process of manual refactoring can be overwhelming. To automate this process, previous works have modeled anti-patterns refactoring as a batch process where a program provides a solution for the total number of classes in a system, and the developer has to examine a long list of refactorings, which is not feasible in most situations. Moreover, these proposed solutions often require that developers modify classes on which they never worked before (i.e., classes on which they have little or no knowledge). To improve on these limitations, this paper proposes an automated refactoring approach, ReCon (Refactoring approach based on task Context), that leverages information about a developer’s task (i.e., the list of code entities relevant to the developer’s task) and metaheuristics techniques to compute the best sequence of refactorings that affects only entities in the developer’s context. We mine 1705 task contexts (collected using the Eclipse plug-in Mylyn) and 1013 code snapshots from three open-source software projects (Mylyn, PDE, Eclipse Platform) to assess the performance of our proposed approach. Results show that ReCon can remove more than 50\% of anti-patterns in a software system, using fewer resources than the traditional approaches from the literature."
}
• Z. Soh, F. Khomh, Y. Guéhéneuc, G. Antoniol, and B. Adams, “On the effect of program exploration on maintenance tasks,” in Wcre, 2013, pp. 391-400.
[Bibtex]
@inproceedings{conf/wcre/SohKGAA13,
author = {Z{\'e}phyrin Soh and Foutse Khomh and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Giuliano Antoniol and Bram Adams},
title = {On the effect of program exploration on maintenance tasks},
booktitle = {WCRE},
year = {2013},
pages = {391-400},
ee = {http://doi.ieeecomputersociety.org/10.1109/WCRE.2013.6671314},
crossref = {DBLP:conf/wcre/2013},
bibsource = {DBLP, http://dblp.uni-trier.de},
}
• Z. Sharafi, Z. Soh, Y. Guéhéneuc, and G. Antoniol, “Women and men – different but equal: on the impact of identifier style on source code reading,” in Icpc, 2012, pp. 27-36.
[Bibtex]
@inproceedings{conf/iwpc/SharafiSGA12,
author = {Zohreh Sharafi and Z{\'e}phyrin Soh and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Giuliano Antoniol},
title = {Women and men - Different but equal: On the impact of identifier style on source code reading},
booktitle = {ICPC},
year = {2012},
pages = {27-36},
ee = {http://dx.doi.org/10.1109/ICPC.2012.6240505},
crossref = {DBLP:conf/iwpc/2012},
bibsource = {DBLP, http://dblp.uni-trier.de},
}
• Z. Soh, Z. Sharafi, B. V. den Plas, G. C. Porras, Y. Guéhéneuc, and G. Antoniol, “Professional status and expertise for uml class diagram comprehension: an empirical study,” in Icpc, 2012, pp. 163-172.
[Bibtex]
@inproceedings{conf/iwpc/SohSPPGA12,
author = {Z{\'e}phyrin Soh and Zohreh Sharafi and Bertrand Van den Plas and Gerardo Cepeda Porras and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Giuliano Antoniol},
title = {Professional status and expertise for UML class diagram comprehension: An empirical study},
booktitle = {ICPC},
year = {2012},
pages = {163-172},
ee = {http://dx.doi.org/10.1109/ICPC.2012.6240484},
crossref = {DBLP:conf/iwpc/2012},
bibsource = {DBLP, http://dblp.uni-trier.de},
}
• Z. Soh, F. Khomh, Y. Guéhéneuc, and G. Antoniol, “Towards understanding how developers spend their effort during maintenance activities,” in Wcre, 2013, pp. 152-161.
[Bibtex]
@inproceedings{conf/wcre/SohKGA13,
author = {Z{\'e}phyrin Soh and Foutse Khomh and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Giuliano Antoniol},
title = {Towards understanding how developers spend their effort during maintenance activities},
booktitle = {WCRE},
year = {2013},
pages = {152-161},
ee = {http://doi.ieeecomputersociety.org/10.1109/WCRE.2013.6671290},
crossref = {DBLP:conf/wcre/2013},
bibsource = {DBLP, http://dblp.uni-trier.de},
}

Aminata SABANÉ

Aminata SABANÉ successfully defended her PhD in Dec 2015. She is a member of the SoccerLab and Ptidej teams. She obtained her Bachelor's and Master's degrees at École Supérieure d’Informatique (Université Polytechnique de Bobo Dioulasso, Burkina Faso). She contributed to the development of a parser from Java source code to PADL models. Her research interests include software quality, evolution, and refactoring. Her topic was Improving System Testability and Testing with Patterns, and her PhD dissertation focused on anti-patterns and the cost of testing.

###### Publications
• A. Sabane, Y. G. Guéhéneuc, V. Arnaoudova, and G. Antoniol, “Fragile base-class problem, problem?,” Empirical software engineering, p. To Appear, 2016.
[Bibtex]
@article{Aminta2016ense,
author = {Aminata Sabane and
Yann Ga{\"{e}}l Gu{\'{e}}h{\'{e}}neuc and Venera Arnaoudova and
Giuliano Antoniol},
title = {Fragile Base-class Problem, Problem?},
journal = {Empirical Software Engineering},
volume = {},
number = {},
pages = {To Appear},
year = {2016},
abstract = { }
}
• R. Morales, A. Sabane, P. Musavi, F. Khomh, F. Chicano, and G. Antoniol, “Finding the best compromise between design quality and testing effort during refactoring,” in Saner, 2016, p. To Appear.
[Bibtex]
@inproceedings{rodrigo2016saner,
author = {Rodrigo Morales and Aminata Sabane and Pooya Musavi and Foutse Khomh and Francisco Chicano and Giuliano Antoniol},
title = {Finding the Best Compromise Between Design Quality and Testing Effort During Refactoring},
booktitle = {SANER},
year = {2016},
pages = {To Appear},
abstract = {
Anti-patterns are poor design choices that hinder code evolution and understandability. Practitioners perform refactorings, which are semantic-preserving code transformations, to correct anti-patterns and to improve design quality. However, manual refactoring is a time-consuming task and a heavy burden for developers who have to struggle to complete their coding tasks and maintain the design quality of the system at the same time. For that reason, researchers and practitioners have proposed several approaches to bring automated support to developers, with solutions that range from single anti-pattern correction to multiobjective solutions. The latter attempt to reduce refactoring effort, or to improve semantic similarity between classes and methods, in addition to removing anti-patterns. To the best of our knowledge, none of the previous approaches have considered the impact of refactoring on another important aspect of software development, which is the testing effort. In this paper we propose a novel search-based multiobjective approach for removing five well-known anti-patterns and minimizing testing effort. To assess the effectiveness of our proposed approach, we implement three different multiobjective metaheuristics (NSGA-II, SPEA2, MOCell) and apply them to a benchmark comprised of four open-source systems. Results show that MOCell is the metaheuristic that provides the best performance.
},
}
• A. Sabane, M. D. Penta, G. Antoniol, and Y. Guéhéneuc, “A study on the relation between antipatterns and the cost of class unit testing,” in Csmr, 2013, pp. 167-176.
[Bibtex]
@inproceedings{06498465,
author = {Aminata Sabane and Massimiliano Di Penta and Giuliano Antoniol and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc},
title = {A Study on the Relation between Antipatterns and the Cost of Class Unit Testing},
booktitle = {CSMR},
year = {2013},
pages = {167-176},
ee = {http://dx.doi.org/10.1109/CSMR.2013.26, http://doi.ieeecomputersociety.org/10.1109/CSMR.2013.26},
crossref = {DBLP:conf/csmr/2013},
bibsource = {DBLP, http://dblp.uni-trier.de},
abstract = {
Antipatterns are known as recurring, poor design choices; recent and past studies indicated that they negatively affect software systems in terms of understandability and maintainability, also increasing change- and defect-proneness. For this reason, refactoring actions are often suggested. In this paper, we investigate a different side-effect of antipatterns, which is their effect on testability and on testing cost in particular. We consider as (upper bound) indicator of testing cost the number of test cases that satisfy the minimal data member usage matrix (MaDUM) criterion proposed by Bashir and Goel. A study, carried out on four Java programs (Ant 1.8.3, ArgoUML 0.20, CheckStyle 4.0, and JFreeChart 1.0.13), supports the evidence that, on the one hand, antipatterns unit testing requires, on average, a number of test cases substantially higher than unit testing for non-antipattern classes. On the other hand, antipattern classes must be carefully tested because they are more defect-prone than other classes. Finally, we illustrate how specific refactoring actions, applied to classes participating in antipatterns, could reduce testing cost.
},
pdf = {2013/06498465.pdf},
}
• A. Maiga, N. Ali, N. Bhattacharya, A. Sabane, Y. Guéhéneuc, G. Antoniol, and E. Aïmeur, “Support vector machines for anti-pattern detection,” in Ase, 2012, pp. 278-281.
[Bibtex]
@inproceedings{conf/kbse/MaigaABSGAA12,
author = {Abdou Maiga and Nasir Ali and Neelesh Bhattacharya and Aminata Sabane and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Giuliano Antoniol and Esma A\"{\i}meur},
title = {Support vector machines for anti-pattern detection},
booktitle = {ASE},
year = {2012},
pages = {278-281},
ee = {http://doi.acm.org/10.1145/2351676.2351723},
crossref = {DBLP:conf/kbse/2012},
bibsource = {DBLP, http://dblp.uni-trier.de},
}
• N. Ali, A. Sabane, Y. Guéhéneuc, and G. Antoniol, “Improving bug location using binary class relationships,” in Scam, 2012, pp. 174-183.
[Bibtex]
@inproceedings{conf/scam/AliSGA12,
author = {Nasir Ali and Aminata Sabane and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Giuliano Antoniol},
title = {Improving Bug Location Using Binary Class Relationships},
booktitle = {SCAM},
year = {2012},
pages = {174-183},
ee = {http://doi.ieeecomputersociety.org/10.1109/SCAM.2012.26},
crossref = {DBLP:conf/scam/2012},
bibsource = {DBLP, http://dblp.uni-trier.de},
}

Laleh M. Eshkevari

Laleh M. Eshkevari joined the PhD program of the Department of Computer Science and Software Engineering of École Polytechnique de Montréal in fall 2009 and graduated in Dec 2015. She received her Master's degree in Computer Science from Concordia University (Montreal, Canada) and a Bachelor's degree in Mathematics Applied in Computer Science from Amirkabir University of Technology (Tehran, Iran). Her research interests are in the domain of software maintenance and evolution: change impact analysis, linguistic refactoring, source code analysis, programming languages, and data mining. Laleh's dissertation focused on Java and PHP renamings. She is currently a postdoctoral fellow at Concordia University.

###### Publications
• L. M. Eshkevari, F. D. Santos, J. R. Cordy, and G. Antoniol, “Are php applications ready for hack,” in International conference on software analysis, evolution, and reengineering (saner), 2015, pp. 63-72.
[Bibtex]
@inproceedings{laleh2015,
title = {Are PHP applications ready for Hack},
author = {Laleh Mousavi Eshkevari and Fabien Dos Santos and James R. Cordy and Giuliano Antoniol},
year = {2015},
date = {2015-01-01},
booktitle = {International Conference on Software Analysis, Evolution, and Reengineering (SANER)},
abstract = {
PHP is by far the most popular WEB scripting language, accounting
for more than 80\% of existing websites.
PHP is dynamically typed, which means that variables take on the type
of the objects that they are assigned, and may change type as execution proceeds.
While some type changes are likely not harmful, others involving function calls and
global variables may be more difficult to understand and the source of many bugs.
Hack, a new PHP variant endorsed by Facebook, attempts to address this
problem by adding static typing to PHP variables, which limits them to
a single consistent type throughout execution.
This paper defines an empirical taxonomy of PHP type changes along three dimensions:
the complexity or burden imposed to understand the type change;
whether or not the change is potentially harmful;
and the actual types changed.
We apply static and dynamic analyses to three widely used WEB applications coded in
PHP (WordPress, Drupal and phpBB) to investigate (1) to what extent developers really use dynamic typing,
(2) what kinds of type changes are actually encountered; and
(3) how difficult it might be to refactor the code to avoid type changes, and thus meet
the constraints of Hack's static typing.
We report evidence that dynamic typing is actually a relatively uncommon practice
in production PHP programs, and that most dynamic type changes are simple
representational changes, such as between strings and integers.
We observe that most PHP type changes in these programs are relatively simple,
and that the largest proportion of them are easy to refactor to consistent static typing
using simple local renaming transformations.
Overall, the paper casts doubt on the usefulness of dynamic typing in PHP, and
indicates that for many production applications, conversion to Hack's static typing
may not be very difficult.
},
pages = {63-72},
}
• V. Arnaoudova, L. M. Eshkevari, R. Oliveto, Y. Guéhéneuc, and G. Antoniol, “Physical and conceptual identifier dispersion: measures and relation to fault proneness,” in Icsm, 2010, pp. 1-5.
[Bibtex]
@inproceedings{conf/icsm/ArnaoudovaEOGA10,
author = {Venera Arnaoudova and Laleh Mousavi Eshkevari and Rocco Oliveto and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Giuliano Antoniol},
title = {Physical and conceptual identifier dispersion: Measures and relation to fault proneness},
booktitle = {ICSM},
year = {2010},
pages = {1-5},
ee = {http://dx.doi.org/10.1109/ICSM.2010.5609748},
crossref = {DBLP:conf/icsm/2010},
bibsource = {DBLP, http://dblp.uni-trier.de},
abstract = {Poorly-chosen identifiers have been reported in the literature as misleading and increasing the program comprehension effort. Identifiers are composed of terms, which can be dictionary words, acronyms, contractions, or simple strings. We conjecture that the use of identical terms in different contexts may increase the risk of faults. We investigate our conjecture using a measure combining term entropy and term context-coverage to study whether certain terms increase the odds ratios of methods to be fault-prone. We compute term entropy and context-coverage in Rhino v1.4R3 and ArgoUML v0.16, and we show statistically that methods and attributes containing terms with high entropy and context-coverage are more fault-prone.},
}
• L. M. Eshkevari, G. Antoniol, J. R. Cordy, and M. D. Penta, “Identifying and locating interference issues in php applications: the case of wordpress,” in Icpc, 2014, pp. 157-167.
[Bibtex]
@inproceedings{conf/iwpc/EshkevariACP14,
author = {Laleh Mousavi Eshkevari and Giuliano Antoniol and James R. Cordy and Massimiliano Di Penta},
title = {Identifying and locating interference issues in PHP applications: the case of WordPress},
booktitle = {ICPC},
year = {2014},
pages = {157-167},
ee = {http://doi.acm.org/10.1145/2597008.2597153},
crossref = {DBLP:conf/iwpc/2014},
abstract = {
The large success of Content Management Systems (CMS) such as WordPress is largely due to the rich ecosystem of themes and plugins developed around the CMS that allows users to easily build and customize complex Web applications featuring photo galleries, contact forms, and blog pages. However, the design of the CMS, the plugin-based architecture, and the implicit characteristics of the programming language used to develop them (often PHP), can cause interference or unwanted side effects between the resources declared and used by different plugins. This paper describes the problem of interference between plugins in CMS, specifically those developed using PHP, and outlines an approach combining static and dynamic analysis to detect and locate such interference. Results of a case study conducted over 10 WordPress plugins show that the analysis can help to identify and locate plugin interference, and thus be used to enhance CMS quality assurance.
},
bibsource = {DBLP, http://dblp.uni-trier.de},
}
• V. Arnaoudova, L. M. Eshkevari, M. D. Penta, R. Oliveto, G. Antoniol, and Y. Guéhéneuc, “Repent: analyzing the nature of identifier renamings,” Ieee trans. software eng., vol. 40, iss. 5, pp. 502-532, 2014.
[Bibtex]
@article{journals/tse/ArnaoudovaEPOAG14,
author = {Venera Arnaoudova and Laleh Mousavi Eshkevari and Massimiliano Di Penta and Rocco Oliveto and Giuliano Antoniol and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc},
title = {REPENT: Analyzing the Nature of Identifier Renamings},
journal = {IEEE Trans. Software Eng.},
volume = {40},
number = {5},
year = {2014},
pages = {502-532},
ee = {http://doi.ieeecomputersociety.org/10.1109/TSE.2014.2312942},
bibsource = {DBLP, http://dblp.uni-trier.de},
}
• L. M. Eshkevari, V. Arnaoudova, M. D. Penta, R. Oliveto, Y. Guéhéneuc, and G. Antoniol, “An exploratory study of identifier renamings,” in Msr, 2011, pp. 33-42.
[Bibtex]
@inproceedings{conf/msr/EshkevariAPOGA11,
author = {Laleh Mousavi Eshkevari and Venera Arnaoudova and Massimiliano Di Penta and Rocco Oliveto and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Giuliano Antoniol},
title = {An exploratory study of identifier renamings},
booktitle = {MSR},
year = {2011},
pages = {33-42},
ee = {http://doi.acm.org/10.1145/1985441.1985449},
crossref = {DBLP:conf/msr/2011},
bibsource = {DBLP, http://dblp.uni-trier.de},
}

Zohreh Sharafi

Zohreh Sharafi graduated in June 2015. Her work focuses on program understanding and software visualization; she investigated the role of different visualization models, of native language, and of gender bias. She used eye-trackers to propose a framework for evaluating the usability of software architecture visualization techniques.

###### Publications
• N. Ali, Z. Sharafi, Y. G. Guéhéneuc, and G. Antoniol, “An empirical study on the importance of source code entities for requirements traceability,” Empirical software engineering, vol. 20, iss. 2, pp. 442-478, 2015.
[Bibtex]
@article{AliSGA15,
author = {Nasir Ali and
Zohreh Sharafi and
Yann Ga{\"{e}}l Gu{\'{e}}h{\'{e}}neuc and
Giuliano Antoniol},
title = {An empirical study on the importance of source code entities for requirements
traceability},
journal = {Empirical Software Engineering},
volume = {20},
number = {2},
pages = {442--478},
year = {2015},
abstract = {
Requirements Traceability (RT) links help developers during program comprehension
and maintenance tasks. However, creating RT links is a laborious and resource-consuming
task. Information Retrieval (IR) techniques are useful to automatically create
traceability links. However, IR-based techniques typically have low accuracy (precision,
recall, or both) and thus, creating RT links remains a human intensive process. We conjecture
that understanding how developers verify RT links could help improve the accuracy of
IR-based RT techniques to create RT links. Consequently, we perform an empirical study
consisting of four case studies. First, we use an eye-tracking system to capture developers’
eye movements while they verify RT links. We analyse the obtained data to identify
and rank developers’ preferred types of Source Code Entities (SCEs), e.g., domain vs.
implementation-level source code terms and class names vs. method names. Second, we
perform another eye-tracking case study to confirm that it is the semantic content of the
developers’ preferred types of SCEs and not their locations that attract developers’ attention
and help them in their task to verify RT links. Third, we propose an improved term
weighting scheme, i.e., Developers Preferred Term Frequency/Inverse Document Frequency
(DPTF/IDF), that uses the knowledge of the developers’ preferred types of SCEs to
give more importance to these SCEs into the term weighting scheme. We integrate this
weighting scheme with an IR technique, i.e., Latent Semantic Indexing (LSI), to create
a new technique to RT link recovery. Using three systems (iTrust, Lucene, and Pooka),
we show that the proposed technique statistically improves the accuracy of the recovered
RT links over a technique based on LSI and the usual Term Frequency/Inverse Document
Frequency (TF/IDF) weighting scheme. Finally, we compare the newly proposed
DPTF/IDF with our original Domain Or Implementation/Inverse Document Frequency
(DOI/IDF) weighting scheme.
}
}
• B. D. Smet, L. Lempereur, Z. Sharafi, Y. Guéhéneuc, G. Antoniol, and N. Habra, “Taupe: visualizing and analyzing eye-tracking data,” Sci. comput. program., vol. 79, pp. 260-278, 2014.
[Bibtex]
@article{1s20S0167642312000135main,
author = {Beno\^{\i}t De Smet and Lorent Lempereur and Zohreh Sharafi and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Giuliano Antoniol and Naji Habra},
title = {Taupe: Visualizing and analyzing eye-tracking data},
journal = {Sci. Comput. Program.},
volume = {79},
year = {2014},
pages = {260-278},
ee = {http://dx.doi.org/10.1016/j.scico.2012.01.004},
bibsource = {DBLP, http://dblp.uni-trier.de},
abstract = {
Program comprehension is an essential part of any maintenance activity. It allows developers to build mental models of the program before undertaking any change. It has been studied by the research community for many years with the aim to devise models and tools to understand and ease this activity. Recently, researchers have introduced the use of eye-tracking devices to gather and analyze data about the developers’ cognitive processes during program comprehension. However, eye-tracking devices are not completely reliable and, thus, recorded data sometimes must be processed, filtered, or corrected. Moreover, the analysis software tools packaged with eye-tracking devices are not open-source and do not always provide extension points to seamlessly integrate new sophisticated analyses. Consequently, we develop the Taupe software system to help researchers visualize, analyze, and edit the data recorded by eye-tracking devices. The two main objectives of Taupe are compatibility and extensibility so that researchers can easily: (1) apply the system on any eye-tracking data and (2) extend the system with their own analyses. To meet our objectives, we base the development of Taupe: (1) on well-known good practices, such as design patterns and a plug-in architecture using reflection, (2) on a thorough documentation, validation, and verification process, and (3) on lessons learned from existing analysis software systems. This paper describes the context of development of Taupe, the architectural and design choices made during its development, and its documentation, validation and verification process. It also illustrates the application of Taupe in three experiments on the use of design patterns by developers during program comprehension.
},
pdf = {2014/1s20S0167642312000135main.pdf},
}
• N. Ali, Z. Sharafi, Y. Guéhéneuc, and G. Antoniol, “An empirical study on requirements traceability using eye-tracking,” in Icsm, 2012, pp. 191-200.
[Bibtex]
@inproceedings{06405271,
author = {Nasir Ali and Zohreh Sharafi and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Giuliano Antoniol},
title = {An empirical study on requirements traceability using eye-tracking},
booktitle = {ICSM},
year = {2012},
pages = {191-200},
ee = {http://doi.ieeecomputersociety.org/10.1109/ICSM.2012.6405271},
crossref = {DBLP:conf/icsm/2012},
bibsource = {DBLP, http://dblp.uni-trier.de},
abstract = {
},
pdf = {2012/06405271.pdf},
}
• Z. Sharafi, A. Marchetto, A. Susi, G. Antoniol, and Y. Guéhéneuc, “An empirical study on the efficiency of graphical vs. textual representations in requirements comprehension,” in Icpc, 2013, pp. 33-42.
[Bibtex]
@inproceedings{conf/iwpc/SharafiMSAG13,
author = {Zohreh Sharafi and Alessandro Marchetto and Angelo Susi and Giuliano Antoniol and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc},
title = {An empirical study on the efficiency of graphical vs. textual representations in requirements comprehension},
booktitle = {ICPC},
year = {2013},
pages = {33-42},
ee = {http://doi.ieeecomputersociety.org/10.1109/ICPC.2013.6613831},
crossref = {DBLP:conf/iwpc/2013},
bibsource = {DBLP, http://dblp.uni-trier.de},
}
• Z. Sharafi, Z. Soh, Y. Guéhéneuc, and G. Antoniol, “Women and men – different but equal: on the impact of identifier style on source code reading,” in Icpc, 2012, pp. 27-36.
[Bibtex]
@inproceedings{conf/iwpc/SharafiSGA12,
author = {Zohreh Sharafi and Z{\'e}phyrin Soh and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Giuliano Antoniol},
title = {Women and men - Different but equal: On the impact of identifier style on source code reading},
booktitle = {ICPC},
year = {2012},
pages = {27-36},
ee = {http://dx.doi.org/10.1109/ICPC.2012.6240505},
crossref = {DBLP:conf/iwpc/2012},
bibsource = {DBLP, http://dblp.uni-trier.de},
}
• Z. Soh, Z. Sharafi, B. V. den Plas, G. C. Porras, Y. Guéhéneuc, and G. Antoniol, “Professional status and expertise for uml class diagram comprehension: an empirical study,” in Icpc, 2012, pp. 163-172.
[Bibtex]
@inproceedings{conf/iwpc/SohSPPGA12,
author = {Z{\'e}phyrin Soh and Zohreh Sharafi and Bertrand Van den Plas and Gerardo Cepeda Porras and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Giuliano Antoniol},
title = {Professional status and expertise for UML class diagram comprehension: An empirical study},
booktitle = {ICPC},
year = {2012},
pages = {163-172},
ee = {http://dx.doi.org/10.1109/ICPC.2012.6240484},
crossref = {DBLP:conf/iwpc/2012},
bibsource = {DBLP, http://dblp.uni-trier.de},
}

Wei Wu

Wei Wu successfully defended his PhD in winter 2014. He graduated in Software Engineering at École Polytechnique de Montréal under the supervision of Prof. Yann-Gaël Guéhéneuc and Prof. Giuliano Antoniol. Previously, he received his M.S. in Computer Science at Université de Montréal under the supervision of Prof. Yann-Gaël Guéhéneuc in 2010. From 1997 to 2006, he worked at TravelSky Technology Ltd. as a network engineer, software developer, and project manager. He received his M.S. and B.S. in E&E at Beijing University of Aeronautics and Astronautics in 1997 and 1994, respectively.

###### Publications
• W. Wu, F. Khomh, B. Adams, Y. G. Guéhéneuc, and G. Antoniol, “An exploratory study of api changes and usages based on apache and eclipse ecosystems,” Empirical software engineering, pp. 1-47, 2015.
[Bibtex]
@article{Wei2015emse,
author = {Wei Wu and
Foutse Khomh and Bram Adams and
Yann Ga{\"{e}}l Gu{\'{e}}h{\'{e}}neuc and
Giuliano Antoniol},
title = {An exploratory study of api changes and usages based on apache and eclipse ecosystems},
journal = {Empirical Software Engineering},
volume = {},
number = {},
pages = {1-47},
year = {2015},
abstract = { Frameworks are widely used in modern software development to reduce
development costs. They are accessed through their Application
Programming Interfaces (APIs), which specify the contracts with client
programs. When frameworks evolve, API backward-compatibility cannot
always be guaranteed and client programs must upgrade to use the new
releases. Because framework upgrades are not cost-free, observing API
changes and usages together at fine-grained levels is necessary to help
developers understand, assess, and forecast the cost of each framework
upgrade. Whereas previous work studied API changes in frameworks and API
usages in client programs separately, we analyse and classify API
changes and usages together in 22 framework releases from the Apache and
Eclipse ecosystems and their client programs. We find that (1) missing
classes and methods happen more often in frameworks and affect client
programs more often than the other API change types do, (2) missing
interfaces occur rarely in frameworks but affect client programs often,
(3) framework APIs are used on average in 35\% of client classes and
interfaces, (4) most of such usages could be encapsulated locally and
reduced in number, and (5) about 11\% of APIs usages could cause ripple
effects in client programs when these APIs change. Based on these
findings, we provide suggestions for developers and researchers to
reduce the impact of API evolution through language mechanisms and
design strategies. }
}
• W. Wu, A. Serveaux, Y. G. Guéhéneuc, and G. Antoniol, “The impact of imperfect change rules on framework api evolution identification: an empirical study,” Empirical software engineering, vol. 20, pp. 1126-1158, 2014.
[Bibtex]
@article{Wei2014emse,
author = {Wei Wu and
A. Serveaux and
Yann Ga{\"{e}}l Gu{\'{e}}h{\'{e}}neuc and
Giuliano Antoniol},
title = {The Impact of Imperfect Change Rules on Framework API Evolution Identification: An Empirical Study},
journal = {Empirical Software Engineering},
volume = {20},
number = {},
pages = {1126--1158},
year = {2014},
abstract = { Softwareframeworkskeepevolving.Itisoftentime-consumingfordevelopersto
keep their client code up-to-date. Not all frameworks have documentation
ease the impact of non-documented framework evolution on developers by
identifying change rules between two releases of a framework, but these
change rules are imperfect, i.e., not 100 \% correct. To the best of our
knowledge, there is no empirical study to show the usefulness of these
imperfect change rules. Therefore, we design and conduct an experiment
to evaluate their impact. In the experiment, the subjects must find the
replacements of 21 missing methods in the new releases of three
open-source frameworks with the help of (1) all-correct, (2) imperfect,
and (3) no change rules. The statistical analysis results show that the
precision of the replace- ments found by the subjects with the three
sets of change rules are significantly different. The precision with
all-correct change rules is the highest while that with no change rules
is the lowest, while imperfect change rules give a precision in
between. The effect size of the difference between the subjects with no
and imperfect change rules is large and that between the subjects with
imperfect and correct change rules is moderate. The results of this
study show that the change rules generated by framework API evolution
approaches do help developers, even they are not always correct. The
imperfect change rules can be used by developers upgrading their code
when documentation is not available or as a complement to partial
documentation. The moderate difference between results from subjects
with imperfect and all-correct change rules also suggests that
improving precision of change rules will still help developers.
}
}
• W. Wu, Y. Guéhéneuc, G. Antoniol, and M. Kim, “Aura: a hybrid approach to identify framework evolution,” in Icse (1), 2010, pp. 325-334.
[Bibtex]
@inproceedings{p325-wu,
author = {Wei Wu and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Giuliano Antoniol and Miryung Kim},
title = {AURA: a hybrid approach to identify framework evolution},
booktitle = {ICSE (1)},
year = {2010},
pages = {325-334},
ee = {http://doi.acm.org/10.1145/1806799.1806848},
crossref = {DBLP:conf/icse/2010-1},
bibsource = {DBLP, http://dblp.uni-trier.de},
pdf = {2010/p325-wu.pdf},
abstract = {Software frameworks and libraries are indispensable to today’s software systems. As they evolve, it is often time-consuming for developers to keep their code up-to-date, so approaches have been proposed to facilitate this. Usually, these approaches cannot automatically identify change rules for one-replaced-by-many and many-replaced-by-one methods, and they trade off recall for higher precision using one or more experimentally-evaluated thresholds. We introduce AURA, a novel hybrid approach that combines call dependency and text similarity analyses to overcome these limitations. We implement it in a Java system and compare it on five frameworks with three previous approaches by Dagenais and Robillard, M. Kim et al., and Sch{\"a}fer et al. The comparison shows that, on average, the recall of AURA is 53.07 \% higher while its precision is similar, e.g., 0.10 \% lower.},
}
• N. Ali, W. Wu, G. Antoniol, M. D. Penta, Y. Guéhéneuc, and J. H. Hayes, “Moms: multi-objective miniaturization of software,” in Icsm, 2011, pp. 153-162.
[Bibtex]
@inproceedings{conf/icsm/AliWAPGH11,
author = {Nasir Ali and Wei Wu and Giuliano Antoniol and Massimiliano Di Penta and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Jane Huffman Hayes},
title = {MoMS: Multi-objective miniaturization of software},
booktitle = {ICSM},
year = {2011},
pages = {153-162},
ee = {http://dx.doi.org/10.1109/ICSM.2011.6080782},
crossref = {DBLP:conf/icsm/2011},
bibsource = {DBLP, http://dblp.uni-trier.de},
}

Soumaya Medini

Soumaya Medini successfully defended her Ph.D. in mid-fall 2014. She holds a Master's degree in Software Engineering from Sherbrooke University and a National Engineering Degree from the Institut National des Sciences Appliquées et de Technologie in Tunisia. Her research interests are Software Maintenance and Evolution, Search Based Software Engineering, and Information Retrieval. Her main research topic has been Concept Identification in Execution Traces.

###### Publications
• S. Medini, V. Arnaoudova, M. D. Penta, G. Antoniol, Y. Guéhéneuc, and P. Tonella, “Scan: an approach to label and relate execution trace segments,” Journal of software: evolution and process (jsep), vol. 26, iss. 11, pp. 962-995, 2014.
[Bibtex]
@article{SCAN-14,
title = {SCAN: An Approach to Label and Relate Execution Trace Segments},
author = {Soumaya Medini and Venera Arnaoudova and Massimiliano Di Penta and Giuliano Antoniol and Yann-Gaël Guéhéneuc and Paolo Tonella},
year = {2014},
date = {2014-01-01},
journal = {Journal of Software: Evolution and Process (JSEP)},
volume = {26},
number = {11},
pages = {962--995},
abstract = {Program comprehension is a prerequisite to any maintenance and evolution task. In particular, when performing feature location, developers perform program comprehension by abstracting software features and identifying the links between high-level abstractions (features) and program elements.
We present Segment Concept AssigNer (SCAN), an approach to support developers in feature location. SCAN uses a search-based approach to split execution traces into cohesive segments. Then, it labels the segments with relevant keywords and, finally, uses formal concept analysis to identify relations among segments. In a first study, we evaluate the performances of SCAN on six Java programs by 31 participants. We report an average precision of 69\% and a recall of 63\% when comparing the manual and automatic labels and a precision of 63\% regarding the relations among segments identified by SCAN. After that, we evaluate the usefulness of SCAN for the purpose of feature location on two Java programs. We provide evidence that SCAN (i) identifies 69\% of the gold set methods and (ii) is effective in reducing the quantity of information that developers must process to locate features—reducing the number of methods to understand by an average of 43\% compared to the entire execution traces.}
}
• S. Medini, P. Galinier, M. D. Penta, Y. Guéhéneuc, and G. Antoniol, “A fast algorithm to locate concepts in execution traces,” in Ssbse, 2011, pp. 252-266.
[Bibtex]
@inproceedings{chp3A1010072F978364223716422,
author = {Soumaya Medini and Philippe Galinier and Massimiliano Di Penta and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Giuliano Antoniol},
title = {A Fast Algorithm to Locate Concepts in Execution Traces},
booktitle = {SSBSE},
year = {2011},
pages = {252-266},
ee = {http://dx.doi.org/10.1007/978-3-642-23716-4_22},
crossref = {DBLP:conf/ssbse/2011},
bibsource = {DBLP, http://dblp.uni-trier.de},
abstract = {
The identification of cohesive segments in execution traces is an important step in concept location which, in turn, is of paramount importance for many program-comprehension activities. In this paper, we reformulate concept location as a trace segmentation problem solved via dynamic programming. Unlike approaches based on genetic algorithms, dynamic programming can compute an exact solution with better performance than previous approaches, even on long traces. We describe the new problem formulation and the algorithmic details of our approach. We then compare the performances of dynamic programming with those of a genetic algorithm, showing that dynamic programming reduces dramatically the time required to segment traces, without sacrificing precision and recall; even slightly improving them.
},
pdf = {2011/chp3A1010072F978364223716422.pdf},
}
• S. Medini, G. Antoniol, Y. Guéhéneuc, M. D. Penta, and P. Tonella, “Scan: an approach to label and relate execution trace segments,” in Wcre, 2012, pp. 135-144.
[Bibtex]
@inproceedings{06385109,
author = {Soumaya Medini and Giuliano Antoniol and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Massimiliano Di Penta and Paolo Tonella},
title = {SCAN: An Approach to Label and Relate Execution Trace Segments},
booktitle = {WCRE},
year = {2012},
pages = {135-144},
ee = {http://doi.ieeecomputersociety.org/10.1109/WCRE.2012.23},
crossref = {DBLP:conf/wcre/2012},
bibsource = {DBLP, http://dblp.uni-trier.de},
pdf = {2012/06385109.pdf},
abstract = {Identifying concepts in execution traces is a task often necessary to support program comprehension or maintenance activities. Several approaches---static, dynamic or hybrid---have been proposed to identify cohesive, meaningful sequences of methods in execution traces. However, none of the proposed approaches is able to label such segments and to identify relations identified in other segments of the same trace. This paper presents SCAN (Segment Concept AssigNer), an approach to assign labels to sequences of methods in execution traces, and to identify relations between such segments. SCAN uses information retrieval methods and formal concept analysis to produce sets of words helping the developer to understand the concept implemented by a segment. Specifically, formal concept analysis allows SCAN to discover commonalities between segments in different trace areas, as well as terms more specific to a given segment and higher level relation between segments. The paper describes SCAN along with a preliminary manual validation---upon execution traces collected from usage scenarios of JHotDraw and ArgoUML---of SCAN accuracy in assigning labels representative of concepts implemented by trace segments.},
}

Venera Arnaoudova

Venera successfully defended her Ph.D. in mid-August 2014. Before that, she received her engineering degree in Computer Science, Microelectronics and Automation from Polytech Lille (Lille, France) and her Master's degree in Computer Science from Concordia University (Montreal, Canada). Her research interest is in the domain of software evolution: source code analysis, change impact analysis, refactoring, patterns and anti-patterns. Currently she is investigating different linguistic aspects of source code identifiers.

###### Publications
• S. Medini, V. Arnaoudova, M. D. Penta, G. Antoniol, Y. Guéhéneuc, and P. Tonella, “Scan: an approach to label and relate execution trace segments,” Journal of software: evolution and process (jsep), vol. 26, iss. 11, pp. 962-995, 2014.
[Bibtex]
@article{SCAN-14,
title = {SCAN: An Approach to Label and Relate Execution Trace Segments},
author = {Soumaya Medini and Venera Arnaoudova and Massimiliano Di Penta and Giuliano Antoniol and Yann-Gaël Guéhéneuc and Paolo Tonella},
year = {2014},
date = {2014-01-01},
journal = {Journal of Software: Evolution and Process (JSEP)},
volume = {26},
number = {11},
pages = {962--995},
abstract = {Program comprehension is a prerequisite to any maintenance and evolution task. In particular, when performing feature location, developers perform program comprehension by abstracting software features and identifying the links between high-level abstractions (features) and program elements.
We present Segment Concept AssigNer (SCAN), an approach to support developers in feature location. SCAN uses a search-based approach to split execution traces into cohesive segments. Then, it labels the segments with relevant keywords and, finally, uses formal concept analysis to identify relations among segments. In a first study, we evaluate the performances of SCAN on six Java programs by 31 participants. We report an average precision of 69\% and a recall of 63\% when comparing the manual and automatic labels and a precision of 63\% regarding the relations among segments identified by SCAN. After that, we evaluate the usefulness of SCAN for the purpose of feature location on two Java programs. We provide evidence that SCAN (i) identifies 69\% of the gold set methods and (ii) is effective in reducing the quantity of information that developers must process to locate features—reducing the number of methods to understand by an average of 43\% compared to the entire execution traces.}
}
• V. Arnaoudova, M. D. Penta, G. Antoniol, and Y. Guéhéneuc, “A new family of software anti-patterns: linguistic anti-patterns,” in Csmr, 2013, pp. 187-196.
[Bibtex]
@inproceedings{06498467,
author = {Venera Arnaoudova and Massimiliano Di Penta and Giuliano Antoniol and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc},
title = {A New Family of Software Anti-patterns: Linguistic Anti-patterns},
booktitle = {CSMR},
year = {2013},
pages = {187-196},
ee = {http://dx.doi.org/10.1109/CSMR.2013.28, http://doi.ieeecomputersociety.org/10.1109/CSMR.2013.28},
crossref = {DBLP:conf/csmr/2013},
bibsource = {DBLP, http://dblp.uni-trier.de},
abstract = {
Recent and past studies have shown that poor source code lexicon negatively affects software understandability, maintainability, and, overall, quality. Besides a poor usage of lexicon and documentation, sometimes a software artifact description is misleading with respect to its implementation. Consequently, developers will spend more time and effort when understanding these software artifacts, or even make wrong assumptions when they use them. This paper introduces the definition of software linguistic antipatterns, and defines a family of them, i.e., those related to inconsistencies (i) between method signatures, documentation, and behavior and (ii) between attribute names, types, and comments. Whereas "design" antipatterns represent recurring, poor design choices, linguistic antipatterns represent recurring, poor naming and commenting choices. The paper provides a first catalogue of one family of linguistic antipatterns, showing real examples of such antipatterns and explaining what kind of misunderstanding they can cause. Also, the paper proposes a detector prototype for Java programs called LAPD (Linguistic Anti-Pattern Detector), and reports a study investigating the presence of linguistic antipatterns in four Java software projects.
},
pdf = {2013/06498467.pdf},
}
• S. L. Abebe, V. Arnaoudova, P. Tonella, G. Antoniol, and Y. Guéhéneuc, “Can lexicon bad smells improve fault prediction?,” in Wcre, 2012, pp. 235-244.
[Bibtex]
@inproceedings{06385119,
author = {Surafel Lemma Abebe and Venera Arnaoudova and Paolo Tonella and Giuliano Antoniol and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc},
title = {Can Lexicon Bad Smells Improve Fault Prediction?},
booktitle = {WCRE},
year = {2012},
pages = {235-244},
ee = {http://doi.ieeecomputersociety.org/10.1109/WCRE.2012.33},
crossref = {DBLP:conf/wcre/2012},
bibsource = {DBLP, http://dblp.uni-trier.de},
pdf = {2012/06385119.pdf},
abstract = {In software development, early identification of fault-prone classes can save a considerable amount of resources. In the literature, source code structural metrics have been widely investigated as one of the factors that can be used to identify faulty classes. Structural metrics measure code complexity, one aspect of the source code quality. Complexity might affect program understanding and hence increase the likelihood of inserting errors in a class. Besides the structural metrics, we believe that the quality of the identifiers used in the code may also affect program understanding and thus increase the likelihood of error insertion. In this study, we measure the quality of identifiers using the number of Lexicon Bad Smells (LBS) they contain. We investigate whether using LBS in addition to structural metrics improves fault prediction. To conduct the investigation, we assess the prediction capability of a model while using i) only structural metrics, and ii) structural metrics and LBS. The results on three open source systems, ArgoUML, Rhino, and Eclipse, indicate that there is an improvement in the majority of the cases.},
}
• V. Arnaoudova, M. D. Penta, and G. Antoniol, “Linguistic antipatterns: what they are and how developers perceive them,” Empirical software engineering (emse), pp. 104-158, 2015.
[Bibtex]
@article{LAsPerception-15,
title = {Linguistic Antipatterns: What They are and How Developers Perceive Them},
author = {Venera Arnaoudova and Massimiliano Di Penta and Giuliano Antoniol},
year = {2015},
date = {2015-01-29},
journal = {Empirical Software Engineering (EMSE)},
pages = {104-158},
abstract = {Antipatterns are known as poor solutions to recurring problems. For example, Brown et al. and Fowler define practices concerning poor design or implementation solutions. However, we know that the source code lexicon is part of the factors that affect the psychological complexity of a program, i.e., factors that make a program difficult to understand and maintain by humans. The aim of this work is to identify recurring poor practices related to inconsistencies among the naming, documentation, and implementation of an entity---called Linguistic Antipatterns (LAs)---that may impair program understanding. To this end, we first mine examples of such inconsistencies in real open-source projects and abstract them into a catalog of 17 recurring LAs related to methods and attributes. Then, to understand the relevancy of LAs, we perform two empirical studies with developers---30 external (i.e., not familiar with the code) and 14 internal (i.e., people developing or maintaining the code). Results indicate that the majority of the participants perceive LAs as poor practices and therefore must be avoided---69\% and 51\% of the external and internal developers, respectively. As further evidence of LAs’ validity, open source developers that were made aware of LAs reacted to the issue by making code changes in 10\% of the cases. Finally, in order to facilitate the use of LAs in practice, we identified a sub-set of LAs which were universally agreed upon as being problematic; those which had a clear dissonance between code behavior and lexicon.},
keywords = {developers' perception, empirical study, linguistic antipatterns, source code identifiers},
}
• S. Panichella, V. Arnaoudova, M. D. Penta, and G. Antoniol, “Would static analysis tools help developers with code reviews?,” in International conference on software analysis, evolution, and reengineering (saner), 2015, pp. 161-170.
[Bibtex]
@inproceedings{Panichella:saner15:CodeReviewsWarnings,
title = {Would Static Analysis Tools Help Developers with Code Reviews?},
author = {Sebastiano Panichella and Venera Arnaoudova and Massimiliano Di Penta and Giuliano Antoniol},
year = {2015},
date = {2015-01-01},
booktitle = {International Conference on Software Analysis, Evolution, and Reengineering (SANER)},
abstract = {
Code reviews have been conducted since decades in
software projects, with the aim of improving code quality from
many different points of view. During code reviews, developers
are supported by checklists, coding standards and, possibly, by
various kinds of static analysis tools. This paper investigates
whether warnings highlighted by static analysis tools are taken
care of during code reviews and, whether there are kinds of
warnings that tend to be removed more than others. Results
of a study conducted by mining the Gerrit repository of six
Java open source projects indicate that the density of warnings
only slightly vary after each review. The overall percentage
of warnings removed during reviews is slightly higher than
what previous studies found for the overall project evolution
history. However, when looking (quantitatively and qualitatively)
at specific categories of warnings, we found that during code
reviews developers focus on certain kinds of problems. For such
categories of warnings the removal percentage tend to be very
high, often above 50\% and sometimes up to 100\%. Examples
of those are warnings in the imports, regular expressions, and type resolution
categories. In conclusion, while a broad warning
detection might produce way too many false positives, enforcing
the removal of certain warnings prior to the patch submission
could reduce the amount of effort provided during the code review
process.
},
pages = {161-170},
}
• V. Arnaoudova, L. M. Eshkevari, R. Oliveto, Y. Guéhéneuc, and G. Antoniol, “Physical and conceptual identifier dispersion: measures and relation to fault proneness,” in Icsm, 2010, pp. 1-5.
[Bibtex]
@inproceedings{conf/icsm/ArnaoudovaEOGA10,
author = {Venera Arnaoudova and Laleh Mousavi Eshkevari and Rocco Oliveto and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Giuliano Antoniol},
title = {Physical and conceptual identifier dispersion: Measures and relation to fault proneness},
booktitle = {ICSM},
year = {2010},
pages = {1-5},
ee = {http://dx.doi.org/10.1109/ICSM.2010.5609748},
crossref = {DBLP:conf/icsm/2010},
bibsource = {DBLP, http://dblp.uni-trier.de},
abstract = {Poorly-chosen identifiers have been reported in the literature as misleading and increasing the program comprehension effort. Identifiers are composed of terms, which can be dictionary words, acronyms, contractions, or simple strings. We conjecture that the use of identical terms in different contexts may increase the risk of faults. We investigate our conjecture using a measure combining term entropy and term context-coverage to study whether certain terms increase the odds ratios of methods to be fault-prone. We compute term entropy and context-coverage in Rhino v1.4R3 and ArgoUML v0.16, and we show statistically that methods and attributes containing terms with high entropy and context-coverage are more fault-prone.},
}
• V. Arnaoudova, L. M. Eshkevari, M. D. Penta, R. Oliveto, G. Antoniol, and Y. Guéhéneuc, “Repent: analyzing the nature of identifier renamings,” Ieee trans. software eng., vol. 40, iss. 5, pp. 502-532, 2014.
[Bibtex]
@article{journals/tse/ArnaoudovaEPOAG14,
author = {Venera Arnaoudova and Laleh Mousavi Eshkevari and Massimiliano Di Penta and Rocco Oliveto and Giuliano Antoniol and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc},
title = {REPENT: Analyzing the Nature of Identifier Renamings},
journal = {IEEE Trans. Software Eng.},
volume = {40},
number = {5},
year = {2014},
pages = {502-532},
ee = {http://doi.ieeecomputersociety.org/10.1109/TSE.2014.2312942},
bibsource = {DBLP, http://dblp.uni-trier.de},
}
• L. M. Eshkevari, V. Arnaoudova, M. D. Penta, R. Oliveto, Y. Guéhéneuc, and G. Antoniol, “An exploratory study of identifier renamings,” in Msr, 2011, pp. 33-42.
[Bibtex]
@inproceedings{conf/msr/EshkevariAPOGA11,
author = {Laleh Mousavi Eshkevari and Venera Arnaoudova and Massimiliano Di Penta and Rocco Oliveto and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Giuliano Antoniol},
title = {An exploratory study of identifier renamings},
booktitle = {MSR},
year = {2011},
pages = {33-42},
ee = {http://doi.acm.org/10.1145/1985441.1985449},
crossref = {DBLP:conf/msr/2011},
bibsource = {DBLP, http://dblp.uni-trier.de},
}

Latifa Guerrouj

Latifa Guerrouj successfully defended her Ph.D. in mid-August 2013. She received her engineering degree with honours in software engineering in 2008 and began her PhD program in 2009 under the supervision of Drs. Giuliano Antoniol and Yann-Gaël Guéhéneuc. Her research areas are program comprehension and software quality, in particular through the development of theories, approaches, and tools that ease program understanding and enhance the quality of source code. Her first contribution was a contextual approach that tackles the problem of splitting identifiers. She is currently studying the impact of using sophisticated splitting algorithms in the context of feature location and traceability. This latter research work is the first to combine identifier splitting approaches with feature location and traceability techniques. In addition, Latifa Guerrouj is investigating the effect of context on the splitting and expansion of program identifiers by means of experimental studies. Latifa Guerrouj is also interested in data mining, empirical software engineering, and search-based software engineering.

###### Publications
• L. Guerrouj, Z. Kermansaravi, V. Arnaoudova, B. Fung, F. Khomh, G. Antoniol, and Y. Gueheneuc, “An empirical study on the impact of lexical smells on change- and fault- proneness,” Software quality journal, p. To Appear, 2016.
[Bibtex]
@article{latifa2016,
author = {Latifa Guerrouj and Zeinab Kermansaravi and Venera Arnaoudova and Benjamin Fung and Foutse Khomh and Giuliano Antoniol and Yann-Gael Gueheneuc},
title = {An Empirical Study on the Impact of Lexical Smells on Change- and Fault- Proneness},
journal = {Software Quality Journal},
year = {2016},
pages = {To Appear},
abstract = {
Anti-patterns are poor design choices that hinder code evolution and understandability. Practitioners perform refactorings, which are semantic-preserving code transformations, to correct anti-patterns and to improve design quality. However, manual refactoring is a time-consuming task and a heavy burden for developers who have to struggle to complete their coding tasks and maintain the design quality of the system at the same time. For that reason, researchers and practitioners have proposed several approaches to bring automated support to developers, with solutions that range from single anti-pattern correction to multiobjective solutions. The latter attempt to reduce refactoring effort, or to improve semantic similarity between classes and methods in addition to removing anti-patterns. To the best of our knowledge, none of the previous approaches has considered the impact of refactoring on another important aspect of software development, which is the testing effort. In this paper we propose a novel search-based multiobjective approach for removing five well-known anti-patterns and minimizing testing effort. To assess the effectiveness of our proposed approach, we implement three different multiobjective metaheuristics (NSGA-II, SPEA2, MOCell) and apply them to a benchmark comprised of four open-source systems. Results show that MOCell is the metaheuristic that provides the best performance.
}
}
• B. Dit, L. Guerrouj, D. Poshyvanyk, and G. Antoniol, “Can better identifier splitting techniques help feature location?,” in Icpc, 2011, pp. 11-20.
[Bibtex]
@inproceedings{05970159,
author = {Bogdan Dit and Latifa Guerrouj and Denys Poshyvanyk and Giuliano Antoniol},
title = {Can Better Identifier Splitting Techniques Help Feature Location?},
booktitle = {ICPC},
year = {2011},
pages = {11-20},
ee = {http://doi.ieeecomputersociety.org/10.1109/ICPC.2011.47},
crossref = {DBLP:conf/iwpc/2011},
bibsource = {DBLP, http://dblp.uni-trier.de},
pdf = {2011/05970159.pdf},
abstract = {The paper presents an exploratory study of two feature location techniques utilizing three strategies for splitting identifiers: CamelCase, Samurai and manual splitting of identifiers. The main research question that we ask in this study is if we had a perfect technique for splitting identifiers, would it still help improve accuracy of feature location techniques applied in different scenarios and settings? In order to answer this research question we investigate two feature location techniques, one based on Information Retrieval and the other one based on the combination of Information Retrieval and dynamic analysis, for locating bugs and features using various configurations of preprocessing strategies on two open-source systems, Rhino and jEdit. The results of an extensive empirical evaluation reveal that feature location techniques using Information Retrieval can benefit from better preprocessing algorithms in some cases, and that their improvement in effectiveness while using manual splitting over state-of-the-art approaches is statistically significant in those cases. However, the results for feature location technique using the combination of Information Retrieval and dynamic analysis do not show any improvement while using manual splitting, indicating that any preprocessing technique will suffice if execution data is available. Overall, our findings outline potential benefits of putting additional research efforts into defining more sophisticated source code preprocessing techniques as they can still be useful in situations where execution information cannot be easily collected.},
}
• L. Guerrouj, M. D. Penta, Y. G. Guéhéneuc, and G. Antoniol, “An experimental investigation on the effects of context on source code identifiers splitting and expansion,” Empirical software engineering, vol. 19, iss. 6, pp. 1706-1753, 2014.
[Bibtex]
@article{journals/ese/GuerroujPGA14,
author = {Latifa Guerrouj and Massimiliano Di Penta and Yann Ga{\"{e}}l Gu{\'{e}}h{\'{e}}neuc and Giuliano Antoniol},
title = {An experimental investigation on the effects of context on source code identifiers splitting and expansion},
journal = {Empirical Software Engineering},
volume = {19},
number = {6},
pages = {1706--1753},
year = {2014},
url = {http://dx.doi.org/10.1007/s10664-013-9260-1},
abstract = {
Recent and past studies indicate that source code lexicon plays an important role in program comprehension. Developers often compose source code identifiers with abbreviated words and acronyms, and do not always use consistent mechanisms and explicit separators when creating identifiers. Such choices and inconsistencies impede the work of developers that must understand identifiers by decomposing them into their component terms, and mapping them onto dictionary, application or domain words. When software documentation is scarce, outdated or simply not available, developers must therefore use the available contextual information to understand the source code. This paper aims at investigating how developers split and expand source code identifiers, and, specifically, the extent to which different kinds of contextual information could support such a task. In particular, we consider (i) an internal context consisting of the content of functions and source code files in which the identifiers are located, and (ii) an external context involving external documentation. We conducted a family of two experiments with 63 participants, including bachelor, master, Ph.D. students, and post-docs. We randomly sampled a set of 50 identifiers from a corpus of open source C programs and we asked participants to split and expand them with the availability (or not) of internal and external contexts. We report evidence on the usefulness of contextual information for identifier splitting and acronym/abbreviation expansion. We observe that the source code files are more helpful than just looking at function source code, and that the application-level contextual information does not help any further. The availability of external sources of information only helps in some circumstances. Also, in some cases, we observe that participants better expanded acronyms than abbreviations, although in most cases both exhibit the same level of accuracy. 
Finally, results indicated that the knowledge of English plays a significant effect in identifier splitting/expansion. The obtained results confirm the conjecture that contextual information is useful in program comprehension, including when developers split and expand identifiers to understand them. We hypothesize that the integration of identifier splitting and expansion tools with IDE could help to improve developers’ productivity.
},
doi = {10.1007/s10664-013-9260-1},
}
• L. Guerrouj, P. Galinier, Y. Guéhéneuc, G. Antoniol, and M. D. Penta, “Tris: a fast and accurate identifiers splitting and expansion algorithm,” in Wcre, 2012, pp. 103-112.
[Bibtex]
@inproceedings{conf/wcre/GuerroujGGAP12,
author = {Latifa Guerrouj and Philippe Galinier and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Giuliano Antoniol and Massimiliano Di Penta},
title = {TRIS: A Fast and Accurate Identifiers Splitting and Expansion Algorithm},
booktitle = {WCRE},
year = {2012},
pages = {103-112},
ee = {http://doi.ieeecomputersociety.org/10.1109/WCRE.2012.20},
crossref = {DBLP:conf/wcre/2012},
bibsource = {DBLP, http://dblp.uni-trier.de},
abstract = {Understanding source code identifiers, by identifying words composing them, is a necessary step for many program comprehension, reverse engineering, or redocumentation tasks. To this aim, researchers have proposed several identifier splitting and expansion approaches such as Samurai, TIDIER and more recently GenTest. The ultimate goal of such approaches is to help disambiguating conceptual information encoded in compound (or abbreviated) identifiers. This paper presents TRIS, TRee-based Identifier Splitter, a two-phase approach to split and expand program identifiers. TRIS takes as input a dictionary of words, the identifiers to split/expand, and the identifiers source code application. First, TRIS pre-compiles transformed dictionary words into a tree representation, associating a cost to each transformation. In a second phase, it maps the identifier splitting/expansion problem into a minimization problem, i.e., the search of the shortest path (optimal split/expansion) in a weighted graph. We apply TRIS to a sample of 974 identifiers extracted from JHotDraw, 3,085 from Lynx, and to a sample of 489 identifiers extracted from 340 C programs. Also, we compare TRIS with GenTest on a set of 2,663 mixed Java, C and C++ identifiers. We report evidence that TRIS split (and expansion) is more accurate than state-of-the-art approaches and that it is also efficient in terms of computation time.},
}
• L. Guerrouj, M. D. Penta, G. Antoniol, and Y. Guéhéneuc, “Tidier: an identifier splitting approach using speech recognition techniques,” Journal of software: evolution and process, vol. 25, iss. 6, pp. 575-599, 2013.
[Bibtex]
@article{journals/smr/GuerroujPAG13,
author = {Latifa Guerrouj and Massimiliano Di Penta and Giuliano Antoniol and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc},
title = {TIDIER: an identifier splitting approach using speech recognition techniques},
journal = {Journal of Software: Evolution and Process},
volume = {25},
number = {6},
year = {2013},
pages = {575-599},
ee = {http://dx.doi.org/10.1002/smr.539},
bibsource = {DBLP, http://dblp.uni-trier.de},
abstract = {The software engineering literature reports empirical evidence on the relation between various characteristics of a software system and its quality. Among other factors, recent studies have shown that a proper choice of identifiers influences understandability and maintainability. Indeed, identifiers are developers' main source of information and guide their cognitive processes during program comprehension when high-level documentation is scarce or outdated and when source code is not sufficiently commented. This paper proposes a novel approach to recognize words composing source code identifiers. The approach is based on an adaptation of Dynamic Time Warping used to recognize words in continuous speech. The approach overcomes the limitations of existing identifier-splitting approaches when naming conventions (e.g. Camel Case) are not used or when identifiers contain abbreviations. We apply the approach on a sample of more than 1,000 identifiers extracted from 340 C programs and compare its results with a simple Camel Case splitter and with an implementation of an alternative identifier splitting approach, Samurai. Results indicate the capability of the novel approach: (i) to outperform the alternative ones, when using a dictionary augmented with domain knowledge or a contextual dictionary and (ii) to expand 48\% of a set of selected abbreviations into dictionary words.},
}
• N. Madani, L. Guerrouj, M. D. Penta, Y. Guéhéneuc, and G. Antoniol, “Recognizing words from source code identifiers using speech recognition techniques,” in Csmr, 2010, pp. 68-77.
[Bibtex]
@inproceedings{conf/csmr/MadaniGPGA10,
author = {Nioosha Madani and Latifa Guerrouj and Massimiliano Di Penta and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Giuliano Antoniol},
title = {Recognizing Words from Source Code Identifiers Using Speech Recognition Techniques},
booktitle = {CSMR},
year = {2010},
pages = {68-77},
ee = {http://dx.doi.org/10.1109/CSMR.2010.31},
crossref = {DBLP:conf/csmr/2010},
bibsource = {DBLP, http://dblp.uni-trier.de},
abstract = {The existing software engineering literature has empirically shown that a proper choice of identifiers influences software understandability and maintainability. Researchers have noticed that identifiers are one of the most important sources of information about program entities and that the semantics of identifier components guide the cognitive process. Recognizing the words forming identifiers is not an easy task when naming conventions (e.g., Camel Case) are not used or strictly followed and--or when these words have been abbreviated or otherwise transformed. This paper proposes a technique inspired from speech recognition, dynamic time warping, to split identifiers into component words. The proposed technique has been applied to identifiers extracted from two different applications: JHotDraw and Lynx. Results compared with manually-built oracles and with Camel Case split are encouraging. In fact, they show that the technique successfully recognizes words composing identifiers (even when abbreviated) in about 90\% of cases and that it performs better than Camel Case. Furthermore, it was even able to spot mistakes in the manually built oracle.},
}

Nasir Ali

Nasir Ali successfully defended his Ph.D. in late 2013; since then, he has been a post-doctoral fellow at Queen’s University and then at the University of Waterloo. He holds an MS in computer science from the University of Lahore, Pakistan (under the supervision of Dr. Nadeem Asif). He also holds an MBA from the National College of Business Administration & Economics. His research interests include requirements engineering, empirical software engineering, information retrieval, reverse engineering, and program comprehension. He was the first to introduce trust-based requirements traceability in software engineering, an approach that recovers traceability links between requirements or high-level documents and low-level documents and increases the trust in each link using temporal information. He is currently working on improving the precision and recall of requirements traceability.

###### Publications
• N. Ali, Z. Sharafi, Y. G. Guéhéneuc, and G. Antoniol, “An empirical study on the importance of source code entities for requirements traceability,” Empirical software engineering, vol. 20, iss. 2, pp. 442-478, 2015.
[Bibtex]
@article{AliSGA15,
author = {Nasir Ali and
Zohreh Sharafi and
Yann Ga{\"{e}}l Gu{\'{e}}h{\'{e}}neuc and
Giuliano Antoniol},
title = {An empirical study on the importance of source code entities for requirements
traceability},
journal = {Empirical Software Engineering},
volume = {20},
number = {2},
pages = {442--478},
year = {2015},
abstract = {
Requirements Traceability (RT) links help developers during program comprehension and maintenance tasks. However, creating RT links is a laborious and resource-consuming task. Information Retrieval (IR) techniques are useful to automatically create traceability links. However, IR-based techniques typically have low accuracy (precision, recall, or both) and thus, creating RT links remains a human intensive process. We conjecture that understanding how developers verify RT links could help improve the accuracy of IR-based RT techniques to create RT links. Consequently, we perform an empirical study consisting of four case studies. First, we use an eye-tracking system to capture developers’ eye movements while they verify RT links. We analyse the obtained data to identify and rank developers’ preferred types of Source Code Entities (SCEs), e.g., domain vs. implementation-level source code terms and class names vs. method names. Second, we perform another eye-tracking case study to confirm that it is the semantic content of the developers’ preferred types of SCEs and not their locations that attract developers’ attention and help them in their task to verify RT links. Third, we propose an improved term weighting scheme, i.e., Developers Preferred Term Frequency/Inverse Document Frequency (DPTF/IDF), that uses the knowledge of the developers’ preferred types of SCEs to give more importance to these SCEs into the term weighting scheme. We integrate this weighting scheme with an IR technique, i.e., Latent Semantic Indexing (LSI), to create a new technique to RT link recovery. Using three systems (iTrust, Lucene, and Pooka), we show that the proposed technique statistically improves the accuracy of the recovered RT links over a technique based on LSI and the usual Term Frequency/Inverse Document Frequency (TF/IDF) weighting scheme. Finally, we compare the newly proposed DPTF/IDF with our original Domain Or Implementation/Inverse Document Frequency (DOI/IDF) weighting scheme.
}
}
• N. Ali, Y. Guéhéneuc, and G. Antoniol, “Trust-based requirements traceability,” in Icpc, 2011, pp. 111-120.
[Bibtex]
@inproceedings{05970169,
author = {Nasir Ali and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Giuliano Antoniol},
title = {Trust-Based Requirements Traceability},
booktitle = {ICPC},
year = {2011},
pages = {111-120},
ee = {http://doi.ieeecomputersociety.org/10.1109/ICPC.2011.42},
crossref = {DBLP:conf/iwpc/2011},
bibsource = {DBLP, http://dblp.uni-trier.de},
pdf = {2011/05970169.pdf},
}
• N. Ali, Z. Sharafi, Y. Guéhéneuc, and G. Antoniol, “An empirical study on requirements traceability using eye-tracking,” in Icsm, 2012, pp. 191-200.
[Bibtex]
@inproceedings{06405271,
author = {Nasir Ali and Zohreh Sharafi and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Giuliano Antoniol},
title = {An empirical study on requirements traceability using eye-tracking},
booktitle = {ICSM},
year = {2012},
pages = {191-200},
ee = {http://doi.ieeecomputersociety.org/10.1109/ICSM.2012.6405271},
crossref = {DBLP:conf/icsm/2012},
bibsource = {DBLP, http://dblp.uni-trier.de},
pdf = {2012/06405271.pdf},
}
• N. Ali, Y. Guéhéneuc, and G. Antoniol, “Requirements traceability for object oriented systems by partitioning source code,” in Wcre, 2011, pp. 45-54.
[Bibtex]
@inproceedings{06079774,
author = {Nasir Ali and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Giuliano Antoniol},
title = {Requirements Traceability for Object Oriented Systems by Partitioning Source Code},
booktitle = {WCRE},
year = {2011},
pages = {45-54},
ee = {http://doi.ieeecomputersociety.org/10.1109/WCRE.2011.16},
crossref = {DBLP:conf/wcre/2011},
bibsource = {DBLP, http://dblp.uni-trier.de},
abstract = {
Requirements traceability ensures that source code is consistent with documentation and that all requirements have been implemented. During software evolution, features are added, removed, or modified, and the code drifts away from its original requirements. Thus, traceability recovery approaches become necessary to re-establish the traceability relations between requirements and source code. This paper presents an approach (Coparvo) complementary to existing traceability recovery approaches for object-oriented programs. Coparvo reduces false positive links recovered by traditional traceability recovery processes, thus reducing the manual validation effort. Coparvo assumes that information extracted from different entities (i.e., class names, comments, class variables, or methods signatures) are different information sources; they may have different levels of reliability in requirements traceability and each information source may act as a different expert recommending traceability links. We applied Coparvo on three data sets, Pooka, SIP Communicator, and iTrust, to filter out false positive links recovered via the information retrieval approach, i.e., vector space model. The results show that Coparvo significantly improves the accuracy of the recovered links and also reduces by up to 83\% the effort required to manually remove false positive links.
},
pdf = {2011/06079774.pdf},
}
• N. Ali, Y. Guéhéneuc, and G. Antoniol, “Trustrace: mining software repositories to improve the accuracy of requirement traceability links,” Ieee trans. software eng., vol. 39, iss. 5, pp. 725-741, 2013.
[Bibtex]
@article{06341764,
author = {Nasir Ali and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Giuliano Antoniol},
title = {Trustrace: Mining Software Repositories to Improve the Accuracy of Requirement Traceability Links},
journal = {IEEE Trans. Software Eng.},
volume = {39},
number = {5},
year = {2013},
pages = {725-741},
ee = {http://doi.ieeecomputersociety.org/10.1109/TSE.2012.71},
bibsource = {DBLP, http://dblp.uni-trier.de},
pdf = {2013/06341764.pdf},
abstract = {Traceability is the only means to ensure that the source code of a system is consistent with its requirements and that all and only the specified requirements have been implemented by developers. During software maintenance and evolution, requirement traceability links become obsolete because developers do not/cannot devote effort to update them. Yet, recovering these traceability links later is a daunting and costly task for developers. Consequently, the literature proposed methods, techniques, and tools to recover these traceability links semi-automatically or automatically. Among the proposed techniques, the literature showed that information retrieval (IR) techniques can automatically recover traceability links between free-text requirements and source code. However, IR techniques lack accuracy (precision and recall). In this paper, we show that mining software repositories and combining mined results with IR techniques can improve the accuracy (precision and recall) of IR techniques and we propose Trustrace, a trust-based traceability recovery approach. We apply Trustrace on four medium-size open-source systems to compare the accuracy of its traceability links with those recovered using state-of-the-art IR techniques from the literature, based on the Vector Space Model and Jensen--Shannon model. The results of Trustrace are up to 22.7\% more precise and have 7.66\% better recall values than those of the other techniques, on average. We thus show that mining software repositories and combining the mined data with existing results from IR techniques improves the precision and recall of requirement traceability links.},
}
• N. Ali, Y. Guéhéneuc, and G. Antoniol, “Factors impacting the inputs of traceability recovery approaches,” in Software and systems traceability, 2012, pp. 99-127.
[Bibtex]
@incollection{chp3A1010072F97814471223955,
author = {Nasir Ali and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Giuliano Antoniol},
title = {Factors Impacting the Inputs of Traceability Recovery Approaches},
booktitle = {Software and Systems Traceability},
year = {2012},
pages = {99-127},
ee = {http://dx.doi.org/10.1007/978-1-4471-2239-5_5},
crossref = {DBLP:books/daglib/0028967},
bibsource = {DBLP, http://dblp.uni-trier.de},
pdf = {2012/chp3A1010072F97814471223955.pdf},
}
• N. Ali, W. Wu, G. Antoniol, M. D. Penta, Y. Guéhéneuc, and J. H. Hayes, “Moms: multi-objective miniaturization of software,” in Icsm, 2011, pp. 153-162.
[Bibtex]
@inproceedings{conf/icsm/AliWAPGH11,
author = {Nasir Ali and Wei Wu and Giuliano Antoniol and Massimiliano Di Penta and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Jane Huffman Hayes},
title = {MoMS: Multi-objective miniaturization of software},
booktitle = {ICSM},
year = {2011},
pages = {153-162},
ee = {http://dx.doi.org/10.1109/ICSM.2011.6080782},
crossref = {DBLP:conf/icsm/2011},
bibsource = {DBLP, http://dblp.uni-trier.de},
}
• A. Maiga, N. Ali, N. Bhattacharya, A. Sabane, Y. Guéhéneuc, G. Antoniol, and E. Aïmeur, “Support vector machines for anti-pattern detection,” in Ase, 2012, pp. 278-281.
[Bibtex]
@inproceedings{conf/kbse/MaigaABSGAA12,
author = {Abdou Maiga and Nasir Ali and Neelesh Bhattacharya and Aminata Sabane and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Giuliano Antoniol and Esma A\"{\i}meur},
title = {Support vector machines for anti-pattern detection},
booktitle = {ASE},
year = {2012},
pages = {278-281},
ee = {http://doi.acm.org/10.1145/2351676.2351723},
crossref = {DBLP:conf/kbse/2012},
bibsource = {DBLP, http://dblp.uni-trier.de},
}
• N. Ali, A. Sabane, Y. Guéhéneuc, and G. Antoniol, “Improving bug location using binary class relationships,” in Scam, 2012, pp. 174-183.
[Bibtex]
@inproceedings{conf/scam/AliSGA12,
author = {Nasir Ali and Aminata Sabane and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Giuliano Antoniol},
title = {Improving Bug Location Using Binary Class Relationships},
booktitle = {SCAM},
year = {2012},
pages = {174-183},
ee = {http://doi.ieeecomputersociety.org/10.1109/SCAM.2012.26},
crossref = {DBLP:conf/scam/2012},
bibsource = {DBLP, http://dblp.uni-trier.de},
}

Segla Kpodjedo

Segla Kpodjedo received his Engineering Diploma in computer engineering in 2005 from UTBM, France. From 2006 to 2007, he pursued and completed his Master of Applied Science (M.Sc.A.) at Ecole Polytechnique de Montreal, Canada. Since September 2007, he has been a Ph.D. student under the supervision of Professors Philippe Galinier and Giuliano Antoniol at SOCCERLab, Ecole Polytechnique de Montreal, Canada. His current work mainly revolves around search-based software engineering and software evolution.

###### Publications
• S. Kpodjedo, F. Ricca, P. Galinier, and G. Antoniol, “Recovering the evolution stable part using an ecgm algorithm: is there a tunnel in mozilla?,” in Csmr, 2009, pp. 179-188.
[Bibtex]
@inproceedings{04812751,
author = {Segla Kpodjedo and Filippo Ricca and Philippe Galinier and Giuliano Antoniol},
title = {Recovering the Evolution Stable Part Using an ECGM Algorithm: Is There a Tunnel in Mozilla?},
booktitle = {CSMR},
year = {2009},
pages = {179-188},
ee = {http://dx.doi.org/10.1109/CSMR.2009.24},
crossref = {DBLP:conf/csmr/2009},
bibsource = {DBLP, http://dblp.uni-trier.de},
pdf = {2009/04812751.pdf},
abstract = {Analyzing the evolutionary history of the design of Object-Oriented Software is an important and difficult task where matching algorithms play a fundamental role. In this paper, we investigate the applicability of an error-correcting graph matching (ECGM) algorithm to object-oriented software evolution. By means of a case study, we report evidence of ECGM applicability in studying the Mozilla class diagram evolution. We collected 144 Mozilla snapshots over the past six years, reverse-engineered class diagrams and recovered traceability links between subsequent class diagrams. Our algorithm allows us to identify evolving classes that maintain a stable structure of relations (associations, inheritances and aggregations) with other classes and thus likely constitute the backbone of Mozilla.},
}
• A. Belderrar, S. Kpodjedo, Y. Guéhéneuc, G. Antoniol, and P. Galinier, “Sub-graph mining: identifying micro-architectures in evolving object-oriented software,” in Csmr, 2011, pp. 171-180.
[Bibtex]
@inproceedings{05741259,
author = {Ahmed Belderrar and Segla Kpodjedo and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Giuliano Antoniol and Philippe Galinier},
title = {Sub-graph Mining: Identifying Micro-architectures in Evolving Object-Oriented Software},
booktitle = {CSMR},
year = {2011},
pages = {171-180},
ee = {http://dx.doi.org/10.1109/CSMR.2011.23},
crossref = {DBLP:conf/csmr/2011},
bibsource = {DBLP, http://dblp.uni-trier.de},
abstract = {
Developers introduce novel and undocumented micro-architectures when performing evolution tasks on object-oriented applications. We are interested in understanding whether those organizations of classes and relations can bear, much like cataloged design and anti-patterns, potential harm or benefit to an object-oriented application. We present SGFinder, a sub-graph mining approach and tool based on an efficient enumeration technique to identify recurring micro-architectures in object-oriented class diagrams. Once SGFinder has detected instances of micro-architectures, we exploit these instances to identify their desirable properties, such as stability, or unwanted properties, such as change or fault proneness. We perform a feasibility study of our approach by applying SGFinder on the reverse-engineered class diagrams of several releases of two Java applications: ArgoUML and Rhino. We characterize and highlight some of the most interesting micro-architectures, e.g., the most fault prone and the most stable, and conclude that SGFinder opens the way to further interesting studies.
},
pdf = {2011/05741259.pdf},
}
• S. Kpodjedo, F. Ricca, P. Galinier, G. Antoniol, and Y. Guéhéneuc, “Madmatch: many-to-many approximate diagram matching for design comparison,” Ieee trans. software eng., vol. 39, iss. 8, pp. 1090-1111, 2013.
[Bibtex]
@article{06464271,
author = {Segla Kpodjedo and Filippo Ricca and Philippe Galinier and Giuliano Antoniol and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc},
title = {MADMatch: Many-to-Many Approximate Diagram Matching for Design Comparison},
journal = {IEEE Trans. Software Eng.},
volume = {39},
number = {8},
year = {2013},
pages = {1090-1111},
ee = {http://doi.ieeecomputersociety.org/10.1109/TSE.2013.9},
bibsource = {DBLP, http://dblp.uni-trier.de},
pdf = {2013/06464271.pdf},
abstract = {Matching algorithms play a fundamental role in many important but difficult software engineering activities, especially design evolution analysis and model comparison. We present MADMatch, a fast and scalable Many-to-many Approximate Diagram Matching approach based on an Error-Tolerant Graph matching (ETGM) formulation. Diagrams are represented as graphs, costs are assigned to possible differences between two given graphs, and the goal is to retrieve the cheapest matching. We address the resulting optimisation problem with a tabu search enhanced by the novel use of lexical and structural information. Through several case studies with different types of diagrams and tasks, we show that our generic approach obtains better results than dedicated state-of-the-art algorithms, such as AURA, PLTSDiff or UMLDiff, on the exact same datasets used to introduce (and evaluate) these algorithms.},
}
• S. Kpodjedo, F. Ricca, P. Galinier, Y. Guéhéneuc, and G. Antoniol, “Design evolution metrics for defect prediction in object oriented systems,” Empirical software engineering, vol. 16, iss. 1, pp. 141-175, 2011.
[Bibtex]
@article{art3A1010072Fs1066401091517,
author = {Segla Kpodjedo and Filippo Ricca and Philippe Galinier and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Giuliano Antoniol},
title = {Design evolution metrics for defect prediction in object oriented systems},
journal = {Empirical Software Engineering},
volume = {16},
number = {1},
year = {2011},
pages = {141-175},
ee = {http://dx.doi.org/10.1007/s10664-010-9151-7},
bibsource = {DBLP, http://dblp.uni-trier.de},
pdf = {2011/art3A1010072Fs1066401091517.pdf},
abstract = {Testing is the most widely adopted practice to ensure software quality. However, this activity is often a compromise between the available resources and software quality. In object-oriented development, testing effort should be focused on defective classes. Unfortunately, identifying those classes is a challenging and difficult activity on which many metrics, techniques, and models have been tried. In this paper, we investigate the usefulness of elementary design evolution metrics to identify defective classes. The metrics include the numbers of added, deleted, and modified attributes, methods, and relations. The metrics are used to recommend a ranked list of classes likely to contain defects for a system. They are compared to Chidamber and Kemerer's metrics on several versions of Rhino and of ArgoUML. Further comparison is conducted with the complexity metrics computed by Zimmermann \textit{et al.} on several releases of Eclipse. The comparisons are made according to three criteria: presence of defects, number of defects, and defect density in the top-ranked classes. They show that the design evolution metrics, when used in conjunction with known metrics, improve the identification of defective classes. In addition, they show that the design evolution metrics make significantly better predictions of defect density than other metrics and, thus, can help in reducing the testing effort by focusing test activity on a reduced volume of code.},
}
• S. Kpodjedo, P. Galinier, and G. Antoniol, “Enhancing a tabu algorithm for approximate graph matching by using similarity measures,” in Evocop, 2010, pp. 119-130.
[Bibtex]
@inproceedings{conf/evoW/KpodjedoGA10,
author = {Segla Kpodjedo and Philippe Galinier and Giuliano Antoniol},
title = {Enhancing a Tabu Algorithm for Approximate Graph Matching by Using Similarity Measures},
booktitle = {EvoCOP},
year = {2010},
pages = {119-130},
ee = {http://dx.doi.org/10.1007/978-3-642-12139-5_11},
crossref = {DBLP:conf/evoW/2010cop},
bibsource = {DBLP, http://dblp.uni-trier.de},
abstract = {In this paper, we investigate heuristics in order to solve the Approximated Matching Problem (AGM). We propose a tabu search algorithm which exploits a simple neighborhood but is initialized by a greedy procedure which uses a measure of similarity between the vertices of the two graphs. The algorithm is tested on a large collection of graphs of various sizes (from 300 vertices and up to 3000 vertices) and densities. Computing times range from less than 1 second up to a few minutes. The algorithm obtains consistently very good results, especially on labeled graphs. The results obtained by the tabu algorithm alone (without the greedy procedure) were very poor, illustrating the importance of using vertex similarity during the early steps of the search process.},
}
• S. Kpodjedo, F. Ricca, P. Galinier, G. Antoniol, and Y. Guéhéneuc, “Studying software evolution of large object-oriented software systems using an etgm algorithm,” Journal of software: evolution and process, vol. 25, iss. 2, pp. 139-163, 2013.
[Bibtex]
@article{journals/smr/KpodjedoRGAG13,
author = {Segla Kpodjedo and Filippo Ricca and Philippe Galinier and Giuliano Antoniol and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc},
title = {Studying software evolution of large object-oriented software systems using an ETGM algorithm},
journal = {Journal of Software: Evolution and Process},
volume = {25},
number = {2},
year = {2013},
pages = {139-163},
ee = {http://dx.doi.org/10.1002/smr.519},
bibsource = {DBLP, http://dblp.uni-trier.de},
}
• S. Kpodjedo, P. Galinier, and G. Antoniol, “Using local similarity measures to efficiently address approximate graph matching,” Discrete applied mathematics, vol. 164, pp. 161-177, 2014.
[Bibtex]
@article{journals/dam/KpodjedoGA14,
author = {Segla Kpodjedo and Philippe Galinier and Giuliano Antoniol},
title = {Using local similarity measures to efficiently address approximate graph matching},
journal = {Discrete Applied Mathematics},
volume = {164},
year = {2014},
pages = {161-177},
ee = {http://dx.doi.org/10.1016/j.dam.2012.01.019},
bibsource = {DBLP, http://dblp.uni-trier.de},
}
• S. Kpodjedo, P. Galinier, and G. Antoniol, “On the use of similarity metrics for approximate graph matching,” Electronic notes in discrete mathematics, vol. 36, pp. 687-694, 2010.
[Bibtex]
@article{journals/endm/KpodjedoGA10,
author = {Segla Kpodjedo and Philippe Galinier and Giuliano Antoniol},
title = {On the use of similarity metrics for approximate graph matching},
journal = {Electronic Notes in Discrete Mathematics},
volume = {36},
year = {2010},
pages = {687-694},
ee = {http://dx.doi.org/10.1016/j.endm.2010.05.087},
bibsource = {DBLP, http://dblp.uni-trier.de},
}
• S. Kpodjedo, F. Ricca, P. Galinier, and G. Antoniol, “Error correcting graph matching application to software evolution,” in Wcre, 2008, pp. 289-293.
[Bibtex]
@inproceedings{conf/wcre/KpodjedoRGA08,
author = {Segla Kpodjedo and Filippo Ricca and Philippe Galinier and Giuliano Antoniol},
title = {Error Correcting Graph Matching Application to Software Evolution},
booktitle = {WCRE},
year = {2008},
pages = {289-293},
ee = {http://dx.doi.org/10.1109/WCRE.2008.48},
crossref = {DBLP:conf/wcre/2008},
bibsource = {DBLP, http://dblp.uni-trier.de},
abstract = {Graph representations and graph algorithms are widely adopted to model and resolve problems in many different areas from telecommunications, to bio-informatics, to civil and software engineering. Many software artifacts such as the class diagram can be thought of as graphs and thus, many software evolution problems can be reformulated as a graph matching problem. In this paper, we investigate the applicability of an error-correcting graph matching algorithm to object-oriented software evolution and report results obtained on a small system --- the Latazza application --- supporting applicability and usefulness of our proposal.},
}

Massimiliano Di Penta

Massimiliano Di Penta is an assistant professor at the Department of Engineering, University of Sannio – Benevento (Italy). He graduated in computer engineering from the University of Sannio in December 1999. In May 2000 he joined the Software Engineering Research Group as a Ph.D. student, under the supervision of Prof. Giuliano Antoniol, and he obtained his Ph.D. in computer engineering in July 2003. He is a member of the IEEE, the IEEE Computer Society, and the ACM.

###### Publications
• E. Merlo, M. Dagenais, P. Bachand, J. S. Sormani, S. Gradara, and G. Antoniol, “Investigating large software system evolution: the linux kernel,” in Compsac, 2002, pp. 421-426.
[Bibtex]
@inproceedings{01045038,
author = {Ettore Merlo and Michel Dagenais and P. Bachand and J. S. Sormani and Sara Gradara and Giuliano Antoniol},
title = {Investigating Large Software System Evolution: The Linux Kernel},
booktitle = {COMPSAC},
year = {2002},
pages = {421-426},
ee = {http://doi.ieeecomputersociety.org/10.1109/CMPSAC.2002.1045038},
crossref = {DBLP:conf/compsac/2002},
bibsource = {DBLP, http://dblp.uni-trier.de},
pdf = {2002/01045038.pdf},
abstract = {Large multi-platform multi-million lines of codes software systems evolve to cope with new platform or to meet user ever changing needs. While there has been several studies focused on the similarity of code fragments or modules few studies addressed the need to monitor the overall system evolution. Meanwhile the decision to evolve or to refactor a large software system needs to be supported by high level information representing the system overall picture abstracting from unnecessary details. This paper proposes to extend the concept of similarity of code fragments to quantify similarities at the release/system level. Similarities are captured by four software metrics representative of the commonalities and differences within and among software artifacts. To show the feasibility of characterizing large software system with the new metrics 365 releases of the Linux kernel were analyzed. The metrics the experimental results as well as the lessons learned are presented in the paper.},
}
• G. Antoniol, S. Gradara, and G. Venturi, “Methodological issues in a cmm level 4 implementation,” Software process: improvement and practice, vol. 9, iss. 1, pp. 33-50, 2004.
[Bibtex]
@article{journals/sopr/AntoniolGV04,
author = {Giuliano Antoniol and Sara Gradara and Gabriele Venturi},
title = {Methodological issues in a CMM Level 4 implementation},
journal = {Software Process: Improvement and Practice},
volume = {9},
number = {1},
year = {2004},
pages = {33-50},
ee = {http://dx.doi.org/10.1002/spip.183},
bibsource = {DBLP, http://dblp.uni-trier.de},
abstract = {The Capability Maturity Model (CMM) developed by the Software Engineering Institute is an improvement paradigm. It provides a framework for assessing the maturity of software processes on a five level scale, and guidelines which help to improve software process and artifact quality. Moving towards CMM Level 4 and Level 5, is a very demanding task even for large software companies already accustomed to the CMM and ISO certifications. It requires, for example, quality monitoring, control, feedback, and process optimization. In fact, going beyond CMM Level 3 requires a radical change in the way projects are carried out and managed. It involves quantitative and statistical techniques to control software processes and quality, and it entails substantial changes in the way the organization approaches software life cycle activities. In this paper we describe the process changes, adaptation, integration and tailoring, and we report lessons learned while preparing an Italian solution centre of EDS for the Level 4 internal assessment. The solution centre has about 350 people and carries out about 40 software development and maintenance projects each year. We describe how Level 4 Key Process Areas have been implemented building a methodological framework which leverages both existing available methodologies and practices already in place (e.g., derived from ISO compliance). We discuss how methodologies have been adapted to the company's internal and external situation and what are the underlying assumptions for the methodology adaptation. Furthermore we discuss cultural and organizational changes required to obtain a CMM Level 4 certification. The steps and the process improvement we have carried out, and the challenges we have faced were most likely those with the highest risk and cost driving factor common to all organizations aiming at achieving CMM Level 4.},
}
• M. D. Penta, S. Gradara, and G. Antoniol, “Traceability recovery in rad software systems,” in Iwpc, 2002, pp. 207-218.
[Bibtex]
@inproceedings{conf/iwpc/PentaGA02,
author = {Massimiliano Di Penta and Sara Gradara and Giuliano Antoniol},
title = {Traceability Recovery in RAD Software Systems},
booktitle = {IWPC},
year = {2002},
pages = {207-218},
ee = {http://computer.org/proceedings/iwpc/1495/14950207abs.htm},
crossref = {DBLP:conf/iwpc/2002},
bibsource = {DBLP, http://dblp.uni-trier.de},
}
Fabio Rollo
###### Publications
• G. Antoniol, V. F. Rollo, and G. Venturi, “Linear predictive coding and cepstrum coefficients for mining time variant information from software repositories,” Acm sigsoft software engineering notes, vol. 30, iss. 4, pp. 1-5, 2005.
[Bibtex]
@article{p14-antoniol,
author = {Giuliano Antoniol and Vincenzo Fabio Rollo and Gabriele Venturi},
title = {Linear predictive coding and cepstrum coefficients for mining time variant information from software repositories},
journal = {ACM SIGSOFT Software Engineering Notes},
volume = {30},
number = {4},
year = {2005},
pages = {1-5},
ee = {http://doi.acm.org/10.1145/1082983.1083156},
bibsource = {DBLP, http://dblp.uni-trier.de},
abstract = {
This paper presents an approach to recover time variant information from software repositories. It is widely accepted that software evolves due to factors such as defect removal, market opportunity or adding new features. Software evolution details are stored in software repositories which often contain the changes history. On the other hand there is a lack of approaches, technologies and methods to efficiently extract and represent time dependent information. Disciplines such as signal and image processing or speech recognition adopt frequency domain representations to mitigate differences of signals evolving in time. Inspired by time-frequency duality, this paper proposes the use of Linear Predictive Coding (LPC) and Cepstrum coefficients to model time varying software artifact histories. LPC or Cepstrum allow obtaining very compact representations with linear complexity. These representations can be used to highlight components and artifacts evolved in the same way or with very similar evolution patterns. To assess the proposed approach we applied LPC and Cepstral analysis to 211 Linux kernel releases (i.e., from 1.0 to 1.3.100), to identify files with very similar size histories. The approach, the preliminary results and the lesson learned are presented in this paper.
},
pdf = {2005/p14-antoniol.pdf},
}
• E. Merlo, G. Antoniol, M. D. Penta, and V. F. Rollo, “Linear complexity object-oriented similarity for clone detection and software evolution analyses,” in Icsm, 2004, pp. 412-416.
[Bibtex]
@inproceedings{conf/icsm/MerloAPR04,
author = {Ettore Merlo and Giuliano Antoniol and Massimiliano Di Penta and Vincenzo Fabio Rollo},
title = {Linear Complexity Object-Oriented Similarity for Clone Detection and Software Evolution Analyses},
booktitle = {ICSM},
year = {2004},
pages = {412-416},
ee = {http://doi.ieeecomputersociety.org/10.1109/ICSM.2004.1357826},
crossref = {DBLP:conf/icsm/2004},
bibsource = {DBLP, http://dblp.uni-trier.de},
}
• G. Antoniol, M. Ceccarelli, V. F. Rollo, W. Longo, T. Nutile, M. Ciullo, E. Colonna, A. Calabria, M. Astore, A. Lembo, P. Toriello, and G. M. Persico, “Browsing large pedigrees to study of the isolated populations in the "parco nazionale del cilento e vallo di diano",” in Wirn, 2003, pp. 258-268.
[Bibtex]
@inproceedings{conf/wirn/AntoniolCRLNCCCALTP03,
author = {Giuliano Antoniol and Michele Ceccarelli and Vincenzo Fabio Rollo and Wanda Longo and Teresa Nutile and Marina Ciullo and Enza Colonna and Antonietta Calabria and Maria Astore and Anna Lembo and Paola Toriello and M. Grazia Persico},
title = {Browsing Large Pedigrees to Study of the Isolated Populations in the "Parco Nazionale del Cilento e Vallo di Diano"},
booktitle = {WIRN},
year = {2003},
pages = {258-268},
ee = {http://dx.doi.org/10.1007/978-3-540-45216-4_29},
crossref = {DBLP:conf/wirn/2003},
bibsource = {DBLP, http://dblp.uni-trier.de},
}
• G. Antoniol, V. F. Rollo, and G. Venturi, “Detecting groups of co-changing files in cvs repositories,” in Iwpse, 2005, pp. 23-32.
[Bibtex]
@inproceedings{conf/iwpse/AntoniolRV05,
author = {Giuliano Antoniol and Vincenzo Fabio Rollo and Gabriele Venturi},
title = {Detecting groups of co-changing files in CVS repositories},
booktitle = {IWPSE},
year = {2005},
pages = {23-32},
ee = {http://doi.ieeecomputersociety.org/10.1109/IWPSE.2005.11},
crossref = {DBLP:conf/iwpse/2005},
bibsource = {DBLP, http://dblp.uni-trier.de},
}

Master Students

Amir Sabouri

Amir graduated in December 2016 in Software Engineering at Polytechnique Montréal under the supervision of Prof. Foutse Khomh and Prof. Giuliano Antoniol. His work focused on JavaScript code smells and JavaScript-specific poor programming practices.

###### Publications

Ons Mlouki

Ons graduated in early 2015 in Software Engineering at Polytechnique Montréal under the supervision of Prof. Foutse Khomh and Prof. Giuliano Antoniol. Her work focused on Android licensing and license compatibility.

###### Publications
• O. Mlouki, F. Khomh, and G. Antoniol, “On the detection of licenses violations in android ecosystem,” in Saner, 2016, pp. 382-392.
[Bibtex]
@inproceedings{ons2016saner,
author = {Ons Mlouki and Foutse Khomh and Giuliano Antoniol},
title = {On the Detection of Licenses Violations in Android Ecosystem},
booktitle = {SANER},
year = {2016},
pages = {382-392},
abstract = {
In mobile applications (apps), developers often reuse code from existing libraries and frameworks in order to reduce development costs. However, these libraries and frameworks are governed by licenses with which developers must comply. A failure to comply with a license is likely to result in penalties and fines. In this paper we define a three-step approach that helps to identify licenses used in a system and thus to detect license violations. We validate our approach on a set of apps from the F-Droid market. We first identify the most common license used in mobile open source apps. Then we propose our model that identifies licenses across different categories of mobile apps, some kinds of violation and license changes in the process of software
},
}

Neelesh Bhattacharya

Neelesh Bhattacharya graduated in 2012. His specific research interests were Software Testing, Metaheuristics, Search Based Software Engineering and Exception Handling.

###### Publications
• N. Bhattacharya, A. Sakti, G. Antoniol, Y. Guéhéneuc, and G. Pesant, “Divide-by-zero exception raising via branch coverage,” in Ssbse, 2011, pp. 204-218.
[Bibtex]
@inproceedings{chp3A1010072F978364223716419,
author = {Neelesh Bhattacharya and Abdelilah Sakti and Giuliano Antoniol and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Gilles Pesant},
title = {Divide-by-Zero Exception Raising via Branch Coverage},
booktitle = {SSBSE},
year = {2011},
pages = {204-218},
ee = {http://dx.doi.org/10.1007/978-3-642-23716-4_19},
crossref = {DBLP:conf/ssbse/2011},
bibsource = {DBLP, http://dblp.uni-trier.de},
abstract = {
In this paper, we discuss how a search-based branch coverage approach can be used to design an effective test data generation approach, specifically targeting divide-by-zero exceptions. We first propose a novel testability transformation combining approach level and branch distance. We then use different search strategies, i.e., hill climbing, simulated annealing, and genetic algorithm, to evaluate the performance of the novel testability transformation on a small synthetic example as well as on methods known to throw divide-by-zero exceptions, extracted from real world systems, namely Eclipse and Android. Finally, we also describe how the test data generation for divide-by-zero exceptions can be formulated as a constraint programming problem and compare the resolution of this problem with a genetic algorithm in terms of execution time. We thus report evidence that genetic algorithm using our novel testability transformation out-performs hill climbing and simulated annealing and a previous approach (in terms of numbers of fitness evaluation) but is out-performed by constraint programming (in terms of execution time).
},
pdf = {2011/chp3A1010072F978364223716419.pdf},
}
• N. Bhattacharya, O. El-Mahi, E. Duclos, G. Beltrame, G. Antoniol, S. L. Digabel, and Y. Guéhéneuc, “Optimizing threads schedule alignments to expose the interference bug pattern,” in Ssbse, 2012, pp. 90-104.
[Bibtex]
@inproceedings{conf/ssbse/BhattacharyaEDBADG12,
author = {Neelesh Bhattacharya and Olfat El-Mahi and Etienne Duclos and Giovanni Beltrame and Giuliano Antoniol and S{\'e}bastien Le Digabel and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc},
title = {Optimizing Threads Schedule Alignments to Expose the Interference Bug Pattern},
booktitle = {SSBSE},
year = {2012},
pages = {90-104},
ee = {http://dx.doi.org/10.1007/978-3-642-33119-0_8},
crossref = {DBLP:conf/ssbse/2012},
bibsource = {DBLP, http://dblp.uni-trier.de},
}
• A. Maiga, N. Ali, N. Bhattacharya, A. Sabane, Y. Guéhéneuc, G. Antoniol, and E. Aïmeur, “Support vector machines for anti-pattern detection,” in Ase, 2012, pp. 278-281.
[Bibtex]
@inproceedings{conf/kbse/MaigaABSGAA12,
author = {Abdou Maiga and Nasir Ali and Neelesh Bhattacharya and Aminata Sabane and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Giuliano Antoniol and Esma A\"{\i}meur},
title = {Support vector machines for anti-pattern detection},
booktitle = {ASE},
year = {2012},
pages = {278-281},
ee = {http://doi.acm.org/10.1145/2351676.2351723},
crossref = {DBLP:conf/kbse/2012},
bibsource = {DBLP, http://dblp.uni-trier.de},
}

Ferdaous Boughanmi

Ferdaous Boughanmi joined the Department of Computer Science and Software Engineering of École Polytechnique de Montréal in winter 2009. She graduated in 2012 with research on licensing and software architectures, more precisely on the impact of license constraints on system architecture.

###### Publications
• S. Hassaine, F. Boughanmi, Y. Guéhéneuc, S. Hamel, and G. Antoniol, “Change impact analysis: an earthquake metaphor,” in Icpc, 2011, pp. 209-210.
[Bibtex]
@inproceedings{05970184,
author = {Salima Hassaine and Ferdaous Boughanmi and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Sylvie Hamel and Giuliano Antoniol},
title = {Change Impact Analysis: An Earthquake Metaphor},
booktitle = {ICPC},
year = {2011},
pages = {209-210},
ee = {http://doi.ieeecomputersociety.org/10.1109/ICPC.2011.54},
crossref = {DBLP:conf/iwpc/2011},
bibsource = {DBLP, http://dblp.uni-trier.de},
abstract = {
Impact analysis is crucial to make decisions among different alternative implementations and to anticipate future maintenance tasks. Several approaches were proposed to identify software artefacts being affected by a change. However, to the best of our knowledge, none of these approaches have been used to study the scope of changes in a program. Yet, this information would help developers assess their change efforts and perform more adequate changes. Thus, we present a metaphor inspired by seismology and propose a mapping between the concepts of seismology and software evolution. We show the applicability and usefulness of our metaphor using Rhino and Xerces-J.
},
pdf = {2011/05970184.pdf},
}
• S. Hassaine, F. Boughanmi, Y. Guéhéneuc, S. Hamel, and G. Antoniol, “A seismology-inspired approach to study change propagation,” in Icsm, 2011, pp. 53-62.
[Bibtex]
@inproceedings{conf/icsm/HassaineBGHA11,
author = {Salima Hassaine and Ferdaous Boughanmi and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Sylvie Hamel and Giuliano Antoniol},
title = {A seismology-inspired approach to study change propagation},
booktitle = {ICSM},
year = {2011},
pages = {53-62},
ee = {http://dx.doi.org/10.1109/ICSM.2011.6080772},
crossref = {DBLP:conf/icsm/2011},
bibsource = {DBLP, http://dblp.uni-trier.de},
}

Etienne Duclos

My name is Etienne Duclos. I am an engineer in computer science from ISIMA, a French engineering school. I am now a Master's student at École Polytechnique de Montréal, under the supervision of Sébastien Le Digabel and Yann-Gaël Guéhéneuc. My interests are software engineering, software testing, and memory management in software.

###### Publications
• N. Bhattacharya, O. El-Mahi, E. Duclos, G. Beltrame, G. Antoniol, S. L. Digabel, and Y. Guéhéneuc, “Optimizing threads schedule alignments to expose the interference bug pattern,” in Ssbse, 2012, pp. 90-104.
[Bibtex]
@inproceedings{conf/ssbse/BhattacharyaEDBADG12,
author = {Neelesh Bhattacharya and Olfat El-Mahi and Etienne Duclos and Giovanni Beltrame and Giuliano Antoniol and S{\'e}bastien Le Digabel and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc},
title = {Optimizing Threads Schedule Alignments to Expose the Interference Bug Pattern},
booktitle = {SSBSE},
year = {2012},
pages = {90-104},
ee = {http://dx.doi.org/10.1007/978-3-642-33119-0_8},
crossref = {DBLP:conf/ssbse/2012},
bibsource = {DBLP, http://dblp.uni-trier.de},
}
Zeina Awedikian

Zeina Awedikian received a B.E. in Computer Engineering from the Lebanese American University and later, in 2010, a master's degree in Software Engineering in the Soccer Lab, under the supervision of Giuliano Antoniol. The master's project consists of automatically generating test data to cover the modified condition/decision coverage (MC/DC) test criterion. The aim is to extract all decisions in the code and generate the correct data to be able to test them, covering the MC/DC criterion. This criterion is used in critical systems such as avionic systems, where even the tiniest errors in decisions need to be found.

###### Publications
• Z. Awedikian, K. Ayari, and G. Antoniol, “Mc/dc automatic test input data generation,” in Gecco, 2009, pp. 1657-1664.
[Bibtex]
@inproceedings{conf/gecco/AwedikianAA09,
author = {Zeina Awedikian and Kamel Ayari and Giuliano Antoniol},
title = {MC/DC automatic test input data generation},
booktitle = {GECCO},
year = {2009},
pages = {1657-1664},
ee = {http://doi.acm.org/10.1145/1569901.1570123},
crossref = {DBLP:conf/gecco/2009g},
bibsource = {DBLP, http://dblp.uni-trier.de},
abstract = {In regulated domain such as aerospace and in safety critical domains, software quality assurance is subject to strict regulation such as the RTCA DO-178B standard. Among other conditions, the DO-178B mandates for the satisfaction of the modified condition/decision coverage (MC/DC) testing criterion for software where failure condition may have catastrophic consequences. MC/DC is a white box testing criterion aiming at proving that all conditions involved in a predicate can influence the predicate value in the desired way. In this paper, we propose a novel fitness function inspired by chaining test data generation to efficiently generate test input data satisfying the MC/DC criterion. Preliminary results show the superiority of the novel fitness function that is able to avoid plateau leading to a behavior close to random test of traditional white box fitness functions.},
}
Ahmed Belderrar

Ahmed Belderrar received a Master's degree at École Polytechnique de Montréal under the supervision of Prof. Giuliano Antoniol. His research interests include software maintenance, software comprehension, software evolution, design patterns, and formal verification for software. His master's project focuses on micro-architecture inference.

###### Publications
• A. Belderrar, S. Kpodjedo, Y. Guéhéneuc, G. Antoniol, and P. Galinier, “Sub-graph mining: identifying micro-architectures in evolving object-oriented software,” in Csmr, 2011, pp. 171-180.
[Bibtex]
@inproceedings{05741259,
author = {Ahmed Belderrar and Segla Kpodjedo and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Giuliano Antoniol and Philippe Galinier},
title = {Sub-graph Mining: Identifying Micro-architectures in Evolving Object-Oriented Software},
booktitle = {CSMR},
year = {2011},
pages = {171-180},
ee = {http://dx.doi.org/10.1109/CSMR.2011.23},
crossref = {DBLP:conf/csmr/2011},
bibsource = {DBLP, http://dblp.uni-trier.de},
abstract = {
Developers introduce novel and undocumented micro-architectures when performing evolution tasks on object-oriented applications. We are interested in understanding whether those organizations of classes and relations can bear, much like cataloged design and anti-patterns, potential harm or benefit to an object-oriented application. We present SGFinder, a sub-graph mining approach and tool based on an efficient enumeration technique to identify recurring micro-architectures in object-oriented class diagrams. Once SGFinder has detected instances of micro-architectures, we exploit these instances to identify their desirable properties, such as stability, or unwanted properties, such as change or fault proneness. We perform a feasibility study of our approach by applying SGFinder on the reverse-engineered class diagrams of several releases of two Java applications: ArgoUML and Rhino. We characterize and highlight some of the most interesting micro-architectures, e.g., the most fault prone and the most stable, and conclude that SGFinder opens the way to further interesting studies.
},
pdf = {2011/05741259.pdf},
}

Nioosha Madani

Nioosha Madani received a Master's degree at the Department of Computer Engineering at École Polytechnique de Montréal in 2010. She received her bachelor's degree in software engineering from Azad University of Iran. Her research interests are in the area of software engineering, specifically software testing, object-oriented programming, and database design. Her master's project is on identifier splitting via dynamic programming techniques.

###### Publications
• N. Madani, L. Guerrouj, M. D. Penta, Y. Guéhéneuc, and G. Antoniol, “Recognizing words from source code identifiers using speech recognition techniques,” in Csmr, 2010, pp. 68-77.
[Bibtex]
@inproceedings{conf/csmr/MadaniGPGA10,
author = {Nioosha Madani and Latifa Guerrouj and Massimiliano Di Penta and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Giuliano Antoniol},
title = {Recognizing Words from Source Code Identifiers Using Speech Recognition Techniques},
booktitle = {CSMR},
year = {2010},
pages = {68-77},
ee = {http://dx.doi.org/10.1109/CSMR.2010.31},
crossref = {DBLP:conf/csmr/2010},
bibsource = {DBLP, http://dblp.uni-trier.de},
abstract = {The existing software engineering literature has empirically shown that a proper choice of identifiers influences software understandability and maintainability. Researchers have noticed that identifiers are one of the most important sources of information about program entities and that the semantics of identifier components guide the cognitive process. Recognizing the words forming identifiers is not an easy task when naming conventions (e.g., Camel Case) are not used or strictly followed and/or when these words have been abbreviated or otherwise transformed. This paper proposes a technique inspired from speech recognition, dynamic time warping, to split identifiers into component words. The proposed technique has been applied to identifiers extracted from two different applications: JHotDraw and Lynx. Results compared with manually-built oracles and with Camel Case split are encouraging. In fact, they show that the technique successfully recognizes words composing identifiers (even when abbreviated) in about 90\% of cases and that it performs better than Camel Case. Furthermore, it was even able to spot mistakes in the manually built oracle.},
}

Fatemeh Asadi

Fatemeh Asadi received her bachelor's degree in Software Engineering from National University of Iran in 2004 and her master's degree in Software Design and Development from Iran University of Science and Technology in 2006. During her master's, she worked on concurrency control in native XML databases. In 2010 she received a master's degree at the Department of Computer Engineering at École Polytechnique de Montréal under the supervision of Prof. Giuliano Antoniol.

###### Publications
• F. Asadi, M. D. Penta, G. Antoniol, and Y. Guéhéneuc, “A heuristic-based approach to identify concepts in execution traces,” in Csmr, 2010, pp. 31-40.
[Bibtex]
@inproceedings{05714415,
author = {Fatemeh Asadi and Massimiliano Di Penta and Giuliano Antoniol and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc},
title = {A Heuristic-Based Approach to Identify Concepts in Execution Traces},
booktitle = {CSMR},
year = {2010},
pages = {31-40},
ee = {http://dx.doi.org/10.1109/CSMR.2010.17},
crossref = {DBLP:conf/csmr/2010},
bibsource = {DBLP, http://dblp.uni-trier.de},
abstract = {
Concept or feature identification, i.e., the identification of the source code fragments implementing a particular feature, is a crucial task during software understanding and maintenance. This paper proposes an approach to identify concepts in execution traces by finding cohesive and decoupled fragments of the traces. The approach relies on search-based optimization techniques, textual analysis of the system source code using latent semantic indexing, and trace compression techniques. It is evaluated to identify features from execution traces of two open source systems from different domains, JHotDraw and ArgoUML. Results show that the approach is always able to identify trace segments implementing concepts with a high precision and, for highly cohesive concepts, with a high overlap with the manually-built oracle.
},
pdf = {2010/05714415.pdf},
}

Kamel Ayari

Kamel Ayari received a Master's degree at École Polytechnique de Montréal under the supervision of Prof. Giuliano Antoniol. His master's project focuses on ant colony techniques to automatically generate test input data for integer and real-valued parameters.

###### Publications
• K. Ayari, P. Meshkinfam, G. Antoniol, and M. D. Penta, “Threats on building models from cvs and bugzilla repositories: the mozilla case study,” in Cascon, 2007, pp. 215-228.
[Bibtex]
@inproceedings{p215-ayari,
author = {Kamel Ayari and Peyman Meshkinfam and Giuliano Antoniol and Massimiliano Di Penta},
title = {Threats on building models from CVS and Bugzilla repositories: the Mozilla case study},
booktitle = {CASCON},
year = {2007},
pages = {215-228},
ee = {http://doi.acm.org/10.1145/1321211.1321234},
crossref = {DBLP:conf/cascon/2007},
bibsource = {DBLP, http://dblp.uni-trier.de},
pdf = {2007/p215-ayari.pdf},
abstract = {Information obtained by merging data extracted from problem reporting systems -- such as Bugzilla -- and versioning systems -- such as Concurrent Version System (CVS) -- is widely used in quality assessment approaches. This paper attempts to shed some light on threats and difficulties faced when trying to integrate information extracted from Mozilla CVS and bug repositories. Indeed, the heterogeneity of Mozilla bug reports, often dealing with non-defect issues, and lacking of traceable information may undermine validity of quality assessment approaches relying on repositories integration. In the reported Mozilla case study, we observed that available integration heuristics are unable to recover thousands of traceability links. Furthermore, Bugzilla classification mechanisms do not enforce a distinction between different kinds of maintenance activities. Obtained evidence suggests that a large amount of information is lost; we conjecture that to benefit from CVS and problem reporting systems, more systematic issue classification and more reliable traceability mechanisms are needed.},
}
• G. Antoniol, K. Ayari, M. D. Penta, F. Khomh, and Y. Guéhéneuc, “Is it a bug or an enhancement?: a text-based approach to classify change requests,” in Cascon, 2008, p. 23.
[Bibtex]
@inproceedings{conf/cascon/AntoniolAPKG08,
author = {Giuliano Antoniol and Kamel Ayari and Massimiliano Di Penta and Foutse Khomh and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc},
title = {Is it a bug or an enhancement?: a text-based approach to classify change requests},
booktitle = {CASCON},
year = {2008},
pages = {23},
ee = {http://doi.acm.org/10.1145/1463788.1463819},
crossref = {DBLP:conf/cascon/2008},
abstract = {
Bug tracking systems are valuable assets for managing maintenance activities. They are widely used in open-source projects as well as in the software industry. They collect many different kinds of issues: requests for defect fixing, enhancements, refactoring/restructuring activities and organizational issues. These different kinds of issues are simply labeled as "bug" for lack of a better classification support or of knowledge about the possible kinds.
This paper investigates whether the text of the issues posted in bug tracking systems is enough to classify them into corrective maintenance and other kinds of activities.
We show that alternating decision trees, naive Bayes classifiers, and logistic regression can be used to accurately distinguish bugs from other kinds of issues. Results from empirical studies performed on issues for Mozilla, Eclipse, and JBoss indicate that issues can be classified with between 77\% and 82\% of correct decisions.
},
bibsource = {DBLP, http://dblp.uni-trier.de},
}
• K. Ayari, S. Bouktif, and G. Antoniol, “Automatic mutation test input data generation via ant colony,” in Gecco, 2007, pp. 1074-1081.
[Bibtex]
@inproceedings{conf/gecco/AyariBA07,
author = {Kamel Ayari and Salah Bouktif and Giuliano Antoniol},
title = {Automatic mutation test input data generation via ant colony},
booktitle = {GECCO},
year = {2007},
pages = {1074-1081},
ee = {http://doi.acm.org/10.1145/1276958.1277172},
crossref = {DBLP:conf/gecco/2007},
bibsource = {DBLP, http://dblp.uni-trier.de},
abstract = {Fault-based testing is often advocated to overcome limitations of other testing approaches; however it is also recognized as being expensive. On the other hand, evolutionary algorithms have been proved suitable for reducing the cost of data generation in the context of coverage based testing. In this paper, we propose a new evolutionary approach based on ant colony optimization for automatic test input data generation in the context of mutation testing to reduce the cost of such a test strategy. In our approach the ant colony optimization algorithm is enhanced by a probability density estimation technique. We compare our proposal with other evolutionary algorithms, e.g., Genetic Algorithm. Our preliminary results on JAVA testbeds show that our approach performed significantly better than other alternatives.},
}
• Z. Awedikian, K. Ayari, and G. Antoniol, “Mc/dc automatic test input data generation,” in Gecco, 2009, pp. 1657-1664.
[Bibtex]
@inproceedings{conf/gecco/AwedikianAA09,
author = {Zeina Awedikian and Kamel Ayari and Giuliano Antoniol},
title = {MC/DC automatic test input data generation},
booktitle = {GECCO},
year = {2009},
pages = {1657-1664},
ee = {http://doi.acm.org/10.1145/1569901.1570123},
crossref = {DBLP:conf/gecco/2009g},
bibsource = {DBLP, http://dblp.uni-trier.de},
abstract = {In regulated domain such as aerospace and in safety critical domains, software quality assurance is subject to strict regulation such as the RTCA DO-178B standard. Among other conditions, the DO-178B mandates for the satisfaction of the modified condition/decision coverage (MC/DC) testing criterion for software where failure condition may have catastrophic consequences. MC/DC is a white box testing criterion aiming at proving that all conditions involved in a predicate can influence the predicate value in the desired way. In this paper, we propose a novel fitness function inspired by chaining test data generation to efficiently generate test input data satisfying the MC/DC criterion. Preliminary results show the superiority of the novel fitness function that is able to avoid plateau leading to a behavior close to random test of traditional white box fitness functions.},
}