Software Engineering Domain Knowledge to Identify Duplicate Bug Reports


Earlier work proposed detecting duplicate bug reports by comparing the textual content of bug reports to subject-specific contextual material, namely lists of software-engineering terms such as non-functional requirements and architecture keywords. When a bug report contains a word from one of these word-list contexts, the report is considered to be associated with that context, and this information tends to improve bug-deduplication methods. Here, we propose a technique to partially automate the extraction of contextual word lists from software-engineering literature. Evaluating this software-literature context technique on real-world bug reports produces useful results, which indicate that this semi-automated method has the potential to significantly decrease the manual effort required in contextual bug deduplication while suffering only a minor loss in accuracy.
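The word-list association described above can be illustrated with a minimal Python sketch. It is not the implementation evaluated here: the two word lists, the tokenizer, and the function names `context_features` and `shared_contexts` are all hypothetical, chosen only to show how a report might be linked to a context when it contains a word from that context's list.

```python
import re

# Hypothetical contextual word lists. In the approach described above, such
# lists would be extracted (semi-automatically) from software-engineering
# literature; these few terms are placeholders for illustration only.
CONTEXTS = {
    "non_functional": {"performance", "security", "usability", "reliability"},
    "architecture": {"layer", "component", "module", "interface"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def context_features(report, contexts=CONTEXTS):
    """Count, per context, how many tokens of the report appear in its word list."""
    tokens = tokenize(report)
    return {
        name: sum(1 for token in tokens if token in words)
        for name, words in contexts.items()
    }


def shared_contexts(report_a, report_b, contexts=CONTEXTS):
    """Return the sorted names of contexts that both reports are associated with."""
    features_a = context_features(report_a, contexts)
    features_b = context_features(report_b, contexts)
    return sorted(
        name for name in contexts
        if features_a[name] > 0 and features_b[name] > 0
    )


if __name__ == "__main__":
    a = "Security hole in parser component"
    b = "Component leaks security tokens"
    print(context_features(a))
    print(shared_contexts(a, b))
```

In a deduplication pipeline, per-context counts such as these could serve as additional features alongside textual similarity when a classifier decides whether two reports are duplicates; how the features are weighted or combined is left open in this sketch.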

Keywords: software literature; duplicate bug reports; information retrieval; machine learning; documentation.

