Software Maintenance & Evolution

Most software effort happens after the first release. This graduate course examines the theories, tools, and techniques that keep large systems healthy as they grow, change, and age over years of active use.

Course Overview

Building software is hard. Keeping it running, comprehensible, and adaptable over many years is arguably harder. Industry studies consistently show that maintenance activities — fixing defects, adding features to existing systems, migrating platforms, and paying down technical debt — consume the majority of a software project's total lifetime cost. Yet most computer science curricula focus almost exclusively on the initial construction phase.

Software Maintenance & Evolution fills that gap. The course picks up where a traditional software engineering sequence leaves off: your system has shipped, users depend on it, and now the real work begins. Over the semester you will study why software degrades over time, how to measure and manage internal quality, and what modern research has to offer in the way of automated support for the grueling day-to-day work of keeping large codebases alive.

Who is this course for? Graduate students in computer science with an interest in software engineering research or anyone who wants a deeper understanding of what it takes to maintain systems that must evolve continuously. Prior exposure to version control, testing, and at least one large collaborative project is expected.

Understand

Why long-lived software degrades and the theoretical frameworks that explain it.

Measure

How to quantify code quality, complexity, and maintainability with practical metrics.

Improve

Systematic refactoring, smell detection, and strategies for reducing technical debt.

Automate

NLP and ML techniques that help triage bugs, comprehend code, and mine repositories.

Software Decay & Aging

Software does not wear out like a physical machine, yet it deteriorates in its own way. Every quick fix that bypasses the original architecture, every requirement change that bends a module beyond its intended purpose, and every dependency that drifts out of sync contributes to a gradual erosion of structural integrity. Left unchecked, these forces make the system progressively harder to understand, extend, and debug.

Technical Debt

Ward Cunningham introduced the metaphor of "technical debt" to describe the accumulated cost of expedient decisions. Like financial debt, technical debt compounds: shortcuts taken today generate interest in the form of future rework. The course explores how to identify, classify, and prioritize repayment of various forms of technical debt — from architecture-level misalignments to small-scale code-level shortcuts.

Lehman's Laws of Software Evolution

Manny Lehman's empirical laws, formulated through decades of observation, provide one of the few theoretical frameworks for understanding how large systems change over time. Key principles include the law of continuing change (a system must be continually adapted or it becomes progressively less satisfactory) and the law of increasing complexity (as a system evolves, its complexity increases unless work is done to maintain or reduce it). These laws remain remarkably relevant and serve as a recurring touchstone throughout the course.

Software entropy. Much like the second law of thermodynamics, software tends toward disorder without deliberate effort to counteract it. Understanding this tendency is the first step toward effective maintenance planning.

Software Quality

Maintenance decisions depend on the ability to measure quality — but quality is a multidimensional concept. This part of the course distinguishes between external quality (attributes visible to users, such as reliability and performance) and internal quality (attributes visible to developers, such as readability, modularity, and low coupling).

Complexity Metrics

Cyclomatic complexity, Halstead volume, and cognitive complexity each capture different facets of how difficult a piece of code is to understand and modify. The course teaches you to compute these metrics, interpret their values in context, and understand their known limitations.
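To make the idea concrete, here is a minimal sketch of computing an approximate cyclomatic complexity for Python code using the standard library's `ast` module. It counts only the most common branch points; production tools also count boolean operators, comprehension conditions, and `match` arms.

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 plus the number of branch points."""
    tree = ast.parse(source)
    complexity = 1  # a function with no branches has complexity 1
    for node in ast.walk(tree):
        # Each of these nodes introduces an additional independent path.
        if isinstance(node, (ast.If, ast.For, ast.While, ast.ExceptHandler)):
            complexity += 1
    return complexity

snippet = """
def classify(x):
    if x < 0:
        return "negative"
    for _ in range(x):
        pass
    return "non-negative"
"""
print(cyclomatic_complexity(snippet))  # one if + one for -> 3
```

Even this toy version illustrates the metric's main limitation: it counts paths, not how confusing those paths are, which is the gap cognitive complexity tries to fill.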

Maintainability Indices

Composite scores like the Maintainability Index combine several lower-level measures into a single number intended to estimate how easy a module will be to change. You will learn how these indices are computed, where they are useful, and why a single number can never tell the full story about a system's health.
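As an illustration, the widely cited (unscaled) Maintainability Index formula combines Halstead volume, cyclomatic complexity, and lines of code. The coefficients below follow the classic Oman/Hagemeister variant; tool vendors use rescaled versions, so treat the exact numbers as illustrative rather than definitive.

```python
import math

def maintainability_index(halstead_volume: float,
                          cyclomatic_complexity: float,
                          lines_of_code: int) -> float:
    """Classic unscaled MI: higher means easier to maintain."""
    return (171
            - 5.2 * math.log(halstead_volume)
            - 0.23 * cyclomatic_complexity
            - 16.2 * math.log(lines_of_code))

# A small, simple module scores high; a large, complex one scores low.
print(round(maintainability_index(100.0, 3, 20), 1))
print(round(maintainability_index(8000.0, 45, 900), 1))
```

Note how the logarithms dampen size effects: doubling a module's length costs far less than doubling its complexity, which is one reason a single composite number can mislead.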

External Quality

  • Correctness and reliability
  • Performance and efficiency
  • Usability and accessibility
  • Security and robustness

Internal Quality

  • Readability and clarity
  • Low coupling, high cohesion
  • Adequate test coverage
  • Consistent naming and style

Code Smells & Refactoring

A "code smell" is a surface-level indication that something deeper may be wrong with the design. Long methods, feature envy, data clumps, and god classes are among the most commonly cited patterns. The course surveys the catalog of known smells and teaches you to recognize them in real codebases.

Systematic Refactoring

Refactoring is the disciplined process of restructuring existing code without altering its observable behavior. Each refactoring move — extract method, move field, replace conditional with polymorphism, and many others — addresses a specific structural problem. The course covers how to chain these small, well-defined transformations into larger architectural improvements while keeping the test suite green at every step.

Automated Detection

Manual inspection does not scale to systems with millions of lines of code. Researchers have developed metric-based heuristics, rule-based detectors, and more recently machine-learning classifiers that flag potential smells automatically. You will study how these tools work, evaluate their precision and recall, and understand the practical trade-offs involved in deploying them on real projects.
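The simplest of these detectors are metric thresholds. Below is a sketch of a "long method" detector over Python sources; the thresholds (30 lines, 5 parameters) are illustrative defaults, since real detectors calibrate them per project.

```python
import ast

def long_method_candidates(source: str, max_lines=30, max_params=5):
    """Flag functions whose size or parameter count exceeds a threshold."""
    tree = ast.parse(source)
    smells = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            length = node.end_lineno - node.lineno + 1
            params = len(node.args.args)
            if length > max_lines or params > max_params:
                smells.append((node.name, length, params))
    return smells

code = "def f(a, b, c, d, e, g):\n    return a\n"
print(long_method_candidates(code))  # [('f', 2, 6)] -- too many parameters
```

Evaluating such a detector means asking how many of its flags are real problems (precision) and how many real problems it misses (recall), exactly the trade-off discussed above.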

The refactoring safety net. Refactoring without adequate tests is dangerous. A recurring theme in this course is the interplay between testing and maintenance: good tests make refactoring possible, and refactoring makes future testing easier.

Bug Report Analysis

Every major open-source project and enterprise system accumulates thousands of bug reports over its lifetime. Managing this volume manually is unsustainable. This section of the course — closely connected to the instructor's own research — investigates how automated techniques can assist developers in triaging, deduplicating, localizing, and prioritizing defect reports.

Bug Triage

When a new report arrives, someone must decide who should fix it, how urgent it is, and whether it duplicates an existing issue. Automated triage systems use text classification and information retrieval to recommend assignees and severity levels, significantly reducing the time maintainers spend on administrative overhead.

Duplicate Detection

Large projects frequently receive multiple reports describing the same underlying defect. Detecting these duplicates requires comparing unstructured natural-language descriptions, often written by different reporters with different vocabularies. Techniques range from traditional TF-IDF similarity to modern neural embedding models.
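The traditional end of that spectrum can be sketched with the standard library alone: build TF-IDF vectors over report texts and compare them with cosine similarity. The example reports are invented.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """One sparse TF-IDF vector (dict) per whitespace-tokenized document."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(t for doc in tokenized for t in set(doc))
    n = len(docs)
    return [{t: tf[t] * math.log(n / df[t]) for t in tf}
            for tf in map(Counter, tokenized)]

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

reports = [
    "app crashes when saving file to network drive",
    "crash on save when the file is on a network drive",
    "dark mode colors are wrong in settings panel",
]
v = tfidf_vectors(reports)
# The two crash reports are far more similar than the unrelated one.
print(cosine(v[0], v[1]) > cosine(v[0], v[2]))  # True
```

The example also shows the vocabulary problem: "crashes"/"crash" and "saving"/"save" do not match at the token level, which is precisely what stemming and neural embeddings address.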

Bug Localization

Given a bug report, which source files are most likely to contain the fault? Information retrieval approaches treat the report as a query and the codebase as a document collection, ranking files by textual similarity. More advanced methods incorporate version history, stack traces, and program structure to improve accuracy.
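The query-over-documents framing can be reduced to a toy ranker: score each file by how many terms it shares with the report. Real localizers use TF-IDF or BM25 weighting plus history and stack traces; the file contents here are hypothetical.

```python
def rank_files(report, files):
    """Rank source files by shared-term count with the bug report."""
    query = set(report.lower().split())
    scores = [(path, len(query & set(text.lower().split())))
              for path, text in files.items()]
    return sorted(scores, key=lambda s: -s[1])

files = {
    "net/save.py": "def save file to network drive remote path",
    "ui/theme.py": "def apply dark mode theme colors",
}
ranked = rank_files("crash when saving file to network drive", files)
print(ranked[0][0])  # net/save.py ranks first
```

Even this crude scorer puts the right file on top, but only because the report and the code happen to share vocabulary; bridging the lexical gap between reporter language and identifier names is the hard part.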

Maintenance by the numbers:

  • ~60% of maintenance time is spent understanding code
  • ~30% of bug reports are duplicates in large projects
  • 3–5x cost ratio: fixing a bug post-release vs. during development
  • 1000+ new reports per month in major OSS projects

Mining Repositories for Maintenance

Version control systems, issue trackers, code review platforms, and continuous integration logs are rich sources of historical data. Mining software repositories (MSR) techniques extract actionable insights from this data to guide maintenance decisions.

Change History Analysis

By examining which files tend to change together, how frequently modules are modified, and which areas of the code attract the most bug-fixing commits, maintainers can identify hot spots that deserve attention. Change coupling and code churn analysis are powerful tools for prioritizing refactoring and testing efforts.
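Change-coupling analysis reduces to counting co-changed file pairs across commits. The commit data below is made up; in practice it comes from `git log --name-only` or a mining library such as PyDriller.

```python
from collections import Counter
from itertools import combinations

# Each commit is modeled as the set of files it touched (hypothetical data).
commits = [
    {"parser.py", "lexer.py"},
    {"parser.py", "lexer.py", "ast.py"},
    {"parser.py", "lexer.py"},
    {"docs.md"},
]

pair_counts = Counter()
for files in commits:
    for pair in combinations(sorted(files), 2):
        pair_counts[pair] += 1

# The most frequently co-changed pair hints at a logical dependency,
# even if no import or call connects the two files.
print(pair_counts.most_common(1))  # [(('lexer.py', 'parser.py'), 3)]
```

Pairs that change together despite having no structural dependency are exactly the hidden couplings that make maintenance surprising and expensive.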

Defect Prediction

Statistical models trained on historical defect data can estimate which components are most likely to contain latent faults. Features commonly used in these models include code complexity metrics, change frequency, developer experience on the module, and the age of the code. The course examines the methodological challenges — data leakage, class imbalance, cross-project generalization — that make this a rich area of ongoing research.
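To show the shape of the computation, here is an illustrative risk score over the features named above. The weights and module data are invented; real predictors learn weights from labeled defect history rather than setting them by hand.

```python
# Hypothetical per-module features:
# (name, complexity, churn, author_experience, age_in_releases)
modules = [
    ("payments", 48, 30, 2, 1),
    ("logging",  10,  2, 9, 12),
]

def risk(complexity, churn, experience, age):
    # Illustrative hand-set weights: complexity and churn raise risk;
    # developer experience and code age lower it.
    return 0.4 * complexity + 0.4 * churn - 2.0 * experience - 0.5 * age

ranked = sorted(modules, key=lambda m: -risk(*m[1:]))
print(ranked[0][0])  # the riskiest module first
```

The methodological pitfalls listed above bite precisely here: if "churn" is measured after the defects were fixed, the model has leaked its own labels.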

Connection to Module 01. If you have taken the Generative AI for Software Development course on CodeLab, you will recognize the MSR pipeline from Module 01: Mining Software Repositories. This section extends those ideas specifically toward maintenance-oriented analysis rather than dataset construction for machine learning.

NLP & ML for Code Comprehension

Understanding unfamiliar code is the single most time-consuming activity in software maintenance. Natural language processing and machine learning offer promising avenues for reducing this burden by automatically generating summaries, answering developer queries, and linking documentation to source code.

Code Summarization

Automatically producing concise natural-language descriptions of what a method or class does can dramatically accelerate the onboarding process for new maintainers. Approaches range from template-based generation to sequence-to-sequence neural models trained on large corpora of code-comment pairs.
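The template end of that range is simple enough to sketch: pull the function name and parameters out of the AST and render an English sentence. The splitting heuristic assumes `verb_object` naming and is purely illustrative.

```python
import ast

def summarize(source: str) -> str:
    """Template-based summary of the first function in `source`."""
    fn = ast.parse(source).body[0]
    params = ", ".join(a.arg for a in fn.args.args)
    # Naive heuristic: treat the first name fragment as the verb.
    verb, *rest = fn.name.split("_")
    obj = " ".join(rest) or "a value"
    return f"{verb.capitalize()}s {obj}, given {params}."

print(summarize("def merge_sorted_lists(left, right):\n    ..."))
# -> Merges sorted lists, given left, right.
```

Templates are precise but brittle (they fail the moment a name is uninformative), which is why the field moved toward neural models trained on code-comment pairs.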

Traceability and Linking

Maintenance tasks often require tracing connections between artifacts: linking a requirement to its implementation, a test case to the feature it validates, or a commit message to the issue it resolves. NLP-based traceability recovery uses textual similarity and semantic embeddings to reconstruct these links when they are missing or outdated.

Developer Query Answering

Developers constantly ask questions about code: "Where is this feature implemented?" "Why was this design decision made?" "What will break if I change this?" Research in this area builds systems that answer such questions by jointly reasoning over source code, documentation, and version history.

Verification & Validation

Testing in a maintenance context poses unique challenges. The system already exists, a large test suite is (hopefully) already in place, and the primary concern is ensuring that changes do not introduce regressions or violate existing contracts.

Regression Testing

Running the entire test suite after every change becomes impractical as the system grows. Test selection and prioritization techniques use dependency analysis, change impact analysis, and historical failure data to run the most relevant tests first, catching regressions earlier without wasting resources on tests unlikely to be affected.
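Dependency-based selection can be sketched with a map from tests to the modules they exercise. The map below is hypothetical; real systems derive it from coverage data or static import analysis.

```python
# Hypothetical test-to-dependency map (in practice: from coverage data).
test_deps = {
    "test_parser.py":   {"parser.py", "lexer.py"},
    "test_renderer.py": {"renderer.py"},
    "test_cli.py":      {"cli.py", "parser.py"},
}

def select_tests(changed_files):
    """Select only tests whose dependencies intersect the change."""
    return sorted(t for t, deps in test_deps.items()
                  if deps & set(changed_files))

print(select_tests(["parser.py"]))  # ['test_cli.py', 'test_parser.py']
```

Safety hinges on the map being conservative: an overly precise dependency map that misses an indirect dependency silently skips a test that would have caught the regression.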

Impact Analysis

Before making a change, developers need to understand its potential ripple effects. Static impact analysis traces dependencies through the call graph and data flow; dynamic approaches use execution traces to identify which tests exercise the modified code. Combining these perspectives gives maintainers a more complete picture of what a proposed change might affect.
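At its core, static impact analysis is reachability over a dependency graph. The sketch below walks a reverse call graph (function to its callers, invented here) to find everything that might be affected by changing one function.

```python
from collections import deque

# Hypothetical reverse call graph: each function maps to its callers.
callers = {
    "parse":   {"compile", "lint"},
    "compile": {"main"},
    "lint":    {"main"},
    "main":    set(),
}

def impacted(target):
    """All functions transitively reachable via caller edges (BFS)."""
    seen, queue = set(), deque([target])
    while queue:
        fn = queue.popleft()
        for caller in callers.get(fn, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen

print(sorted(impacted("parse")))  # ['compile', 'lint', 'main']
```

Static reachability like this over-approximates (it flags callers that never exercise the changed path), while dynamic traces under-approximate, which is why the two are usually combined.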

Test Selection

Choose a subset of tests that are relevant to the change, skipping those that cannot possibly be affected. Reduces feedback time without sacrificing safety.

Test Prioritization

Reorder the full test suite so that tests most likely to reveal faults run first. Useful when you can eventually run everything but want fast initial signal.

About the Instructor

Oscar Chaparro is an Associate Professor of Computer Science at William & Mary, where he directs the Software Evolution and Analysis (SEA) Lab. As of Fall 2026, he holds the Wilson P. & Martha Claiborne Stephens Associate Professorship.

Chaparro earned his PhD from the University of Texas at Dallas in 2019, working with Dr. Andrian (Andi) Marcus on problems at the intersection of natural language processing and software maintenance. His research focuses on understanding and automating the ways developers communicate about defects — particularly through bug reports — and on applying NLP and information retrieval techniques to support software comprehension, traceability, and evolution.

His work has been published at top software engineering venues and has contributed practical tools and datasets to the research community. At William & Mary, Chaparro's teaching bridges the gap between foundational software engineering concepts and the cutting-edge research produced by the SEA Lab.

Learn more. Visit Oscar Chaparro's website for publications, ongoing projects, and information about joining the SEA Lab.

Cross-Course Connections

Software Maintenance & Evolution sits naturally alongside the other courses hosted on CodeLab. While each course has its own focus, the overlap in methods and concerns creates valuable opportunities for deeper understanding.

Generative AI for Software Development

Prof. Mastropaolo's course covers how large language models are trained on code, how to prompt them effectively, and how to evaluate their output. Many of the MSR and NLP foundations explored here — repository mining, code representation, evaluation metrics — are directly applicable to maintenance tasks studied in this course.

Shared Foundations

Both courses draw on mining software repositories, natural language processing for source code, and empirical evaluation methodology. Students who take both will develop a particularly rich perspective on how data-driven techniques are reshaping software engineering practice.
