Demystifying the Practices of Software Design and the Impact on Codebase Quality and Sustainability
Software systems continue to increase in size and complexity to match ever-increasing user expectations. Designing and engineering such complex software systems brings unique challenges. Complex systems are not only expensive to develop, but even more expensive to maintain. Most software systems must adapt to continuously changing business contexts and requirements. In the process, software accumulates arbitrary complexity, making its maintenance even more challenging. Design and modeling are the primary means of developing reliable, sustainable, maintainable systems. The development of novel design languages, tools, and methodologies frequently fails to keep up with the exponential increase in software complexity. As a result, engineers often experiment with new design languages and methodologies with insufficient understanding of their efficacy or their impact on software maintainability and sustainability. This dissertation explores the practices of software design and investigates the impact of design on code quality, maintainability, and sustainability. The research aims to answer the following main research questions. What impact do design activities and software modeling have on code quality? What are the key quality attributes that are most impacted by design and modeling activities? What are the practices and perceptions of professional software engineers with respect to design effectiveness? The dissertation also explores the potential negative impacts of software design in certain contexts. For example, it explores how software engineers integrate handwritten code with code that is automatically generated from design models. This dissertation presents four main contributions. The first contribution is a survey study that investigates contemporary software design practices and explores the perceptions of practitioners and the tools and notations they use.
I conducted a survey of 228 software practitioners, replicating a previous survey study that was conducted ten years earlier. The goal of the study is to uncover trends in the practice of software design and the adoption patterns of modeling languages such as UML. The first phase was conducted in April-December 2007 and included 113 responses. The second phase was conducted in March-November 2017 and included 115 responses. The results uncover key trends in practice, including a significant increase in the adoption of informal modeling and of modeling with domain-specific notations. The study also uncovers key deficiencies in the prevalent design tools, including the perception that modeling tools are complex and have a steep learning curve, and insufficient support for model-based collaboration. The second contribution is a study that investigates whether there is a correlation between design activities, using languages such as UML, and improvements in code quality and sustainability. The general consensus among researchers and practitioners is that up-front and continuous software design using modeling languages such as UML improves code quality and reliability, particularly as the software evolves over time. However, our understanding of the impact of using such modeling and design languages remains limited. My aim is to characterize this impact on code quality and sustainability. I identified a sample of open-source software repositories that make extensive use of design and modeling and compared their code quality with that of similar code-centric repositories. My evaluation focuses on code quality attributes such as code smells and technical debt. I also conducted a code evolution analysis over a five-year period and collected additional data from questionnaires and interviews with active repository contributors.
The study finds that repositories with significant use of models and design activities are associated with fewer critical code smells but also with more non-critical code smells. The results suggest that modeling and design activities are associated with a significant reduction in measures of technical debt. Analyzing code evolution over a five-year period reveals that UML repositories start with significantly lower technical debt density measures but tend to decline over time. The third contribution is a study that investigates whether there are unexpected side effects of using design and generative approaches. For example, typical design practices involve generating code from design models. This generated code is fundamentally different from what engineers would typically write by hand. As a result, extending and integrating with this code presents unique challenges to engineers. Therefore, this study aims to understand the impact of UML modeling on code quality in model-driven engineering (MDE) and non-MDE contexts. I investigate the quality of the handwritten code that is unique to the MDE context. The study analyzes these code fragments and compares their characteristics to those of handwritten code in repositories where code generation is not present. The study finds that handwritten code in the MDE context suffers from elevated Technical Debt (TD) and Code Smells (CS). I observe key code smells that are particularly evident in this handwritten code. These findings imply that code generators must optimize for human comprehension, prioritize extensibility, and facilitate integration with handwritten code elements. The fourth contribution is an investigation of model usage in open-source projects. To identify model-based repositories, I used ModelMine, a tool developed in-house. This tool can identify model-based repositories on GitHub and other open-source platforms. I identified seventeen repositories and their modeling artifacts.
Modeling artifacts include UML files. I then analyzed the model files to understand the practices and usage of models in the selected repositories. The analysis investigated how models are used, whether those who create the models are the practitioners who develop the main systems, how models are maintained and updated, and how the model life cycle relates to the repository life cycle. The results show that models are not frequently used in open-source software development, that they are rarely maintained and updated by developers, and that the model life cycle starts early in software development and ends early.
Computer science|Computer Engineering
Rahad, Khandoker Abdul, "Demystifying the Practices of Software Design and the Impact on Codebase Quality and Sustainability" (2021). ETD Collection for University of Texas, El Paso. AAI28860413.