Start Here

Your orientation guide to CodeLab — three courses, three research labs, and one shared mission: mastering AI-powered software development.

What is CodeLab?

CodeLab is an open learning hub at William & Mary where three research labs come together to teach the intersection of artificial intelligence and software engineering. Whether you want to build with AI, understand the full software lifecycle, or learn how production systems age and get renewed — there is a path for you here.

Three Courses, One Ecosystem

Generative AI for Software Development

Prof. Antonio Mastropaolo · AURA Lab

  • Language models for code
  • Prompt engineering & evaluation
  • Hallucination detection

Software Engineering

Prof. Denys Poshyvanyk · SEMERU Lab

  • Requirements to deployment
  • Design patterns & testing
  • Deep learning in SE practice

Software Maintenance & Evolution

Prof. Oscar Chaparro · SEA Lab

  • Code smells & refactoring
  • Bug report analysis
  • NLP/ML for comprehension

Generative vs. Discriminative AI

A concept that runs across every course: Generative AI learns the underlying probability distribution of data and produces entirely new outputs. Discriminative AI learns decision boundaries — it classifies and labels but does not create.

  • Generative — code completion, documentation writing, image synthesis
  • Discriminative — spam detection, sentiment analysis, bug classification

Key Insight: GenAI models learn probability distributions over data and sample from them to generate new outputs. This is why the same prompt can produce different results each time — the model is sampling from a distribution, not looking up an answer.
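The distinction is easy to see in a toy sketch. Everything below is invented for illustration: the token set, the probabilities, and the classification rule stand in for real models.

```python
import random

# A toy "generative" model: a probability distribution over next tokens.
# These probabilities are made up purely for illustration.
next_token_probs = {"return": 0.5, "print": 0.3, "raise": 0.2}

def generate_token():
    # Sampling from the distribution: repeated calls can yield different tokens.
    tokens, weights = zip(*next_token_probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

def classify_token(token):
    # A toy "discriminative" model: a fixed decision rule.
    # The same input always yields the same label.
    return "keyword" if token in {"return", "print", "raise"} else "identifier"

print(generate_token())          # may differ between runs (sampling)
print(classify_token("return"))  # always "keyword" (decision boundary)
```

The generative side samples, so its output varies; the discriminative side applies a fixed boundary, so its output is deterministic. Real models differ enormously in scale, but the contrast is the same.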

The Evolution of AI for Code

AI-assisted programming did not appear overnight. Decades of research built the foundation for today’s large language models.

1970s – 1990s

Rule-Based Systems

Expert systems and static analysis tools relied on hand-crafted rules written by domain experts. Effective for narrow tasks like linting and style checking, but brittle and unable to generalize.

2000s

Statistical Methods

N-gram models and probabilistic approaches treated source code as a natural language. Researchers discovered that code is even more predictable than English text — a key insight explored in Module 2.
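A minimal bigram model makes the idea concrete. The tiny token corpus below is invented for illustration, but the mechanism is the same one those statistical approaches used: count which token follows which, then predict the most frequent successor.

```python
from collections import Counter, defaultdict

# Toy corpus of tokenized code lines (illustrative, not a real dataset).
corpus = [
    ["for", "i", "in", "range", "(", "n", ")", ":"],
    ["for", "j", "in", "range", "(", "m", ")", ":"],
    ["for", "k", "in", "items", ":"],
]

# Count bigrams: how often each token follows each other token.
bigrams = defaultdict(Counter)
for line in corpus:
    for prev, nxt in zip(line, line[1:]):
        bigrams[prev][nxt] += 1

def predict_next(token):
    """Return the most frequent successor of `token` in the corpus."""
    return bigrams[token].most_common(1)[0][0]

print(predict_next("in"))     # → "range" (seen twice, vs "items" once)
print(predict_next("range"))  # → "("
```

Even this three-line corpus yields confident predictions, which is the heart of the "code is predictable" observation: real codebases are full of such repetitive, local regularities.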

2010s

Deep Learning

Recurrent neural networks (RNNs) and sequence-to-sequence models enabled code completion, bug detection, and code summarization. The introduction of the Transformer architecture in 2017 was a turning point.

2020s

Large Language Models

GPT, Codex, CodeLlama, StarCoder — models trained on billions of lines of code can now generate entire functions, explain complex code, and assist with debugging. This is the era we focus on.

Across the Hub: Each CodeLab course addresses a different slice of this timeline. Mastropaolo’s course dives deep into the LLM era. Poshyvanyk’s course covers the full lifecycle — from rule-based to modern DL applications. Chaparro’s course focuses on the long tail: maintaining and evolving the software those tools help you build.

GenAI in Action: Before & After

What does AI-assisted development actually look like in practice?

Before GenAI

  • Search & Copy — browse Stack Overflow, copy snippets, adapt them manually. 15–30 min per problem.
  • Write Boilerplate — manually write CRUD endpoints, data models, config files.
  • Debug Alone — read error traces, add print statements, search forums for hours.
  • Write Tests Manually — think through edge cases yourself. Coverage was often an afterthought.

With GenAI

  • AI Autocomplete — Copilot suggests the next line or entire function as you type.
  • Scaffold Features — describe what you need in plain English. AI generates a working starting point in seconds.
  • AI-Assisted Debugging — paste the error into an LLM. Get an explanation and suggested fix instantly.
  • Generated Test Suites — ask AI to generate unit tests. It identifies edge cases you might have missed.

Important: AI does not replace your need to understand the code. It accelerates your workflow, but you are still the engineer. This platform teaches you to be a critical collaborator, not a passive consumer.
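The "paste the error into an LLM" workflow can even be scripted. Below is a hedged sketch using the openai Python client; the model name, the failing snippet, and the prompt wording are all assumptions, not prescribed choices.

```python
# Sketch of AI-assisted debugging: package failing code and its traceback
# as chat messages for an LLM. The snippet and traceback are hypothetical.

def build_debug_prompt(code: str, traceback_text: str) -> list:
    """Bundle the failing code and its error as chat messages."""
    return [
        {"role": "system",
         "content": "You are a debugging assistant. Explain the error and suggest a fix."},
        {"role": "user",
         "content": f"This code:\n{code}\n\nraised:\n{traceback_text}"},
    ]

messages = build_debug_prompt(
    "nums.sort(reversed=True)",
    "TypeError: sort() got an unexpected keyword argument 'reversed'",
)

# With an API key configured, the call would look like:
#   from openai import OpenAI
#   client = OpenAI()
#   reply = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
#   print(reply.choices[0].message.content)
print(messages[1]["content"])
```

Keeping the prompt construction in a small helper like this makes it easy to reuse the same debugging workflow across projects, regardless of which model sits behind it.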

The AI + Software Engineering Landscape

Generative AI is not a distant future — it is transforming software development right now.

  • 92% of developers use AI coding tools
  • 55% faster task completion
  • $30B+ AI coding tools market size
  • 150M+ GitHub Copilot users

What’s Working

  • Code autocomplete — accepted suggestions save significant keystrokes
  • Boilerplate generation — CRUD, configs, and scaffolds in seconds
  • Documentation — generating docstrings and README files
  • Debugging assistance — explaining errors and suggesting fixes

The Reality Check

  • Complex logic — AI still struggles with multi-step reasoning
  • Security — generated code often contains vulnerabilities
  • Testing — AI tests frequently miss critical edge cases
  • Architecture — system-level design remains a human skill

AI Across the Software Lifecycle

Requirements → Design → Coding → Testing → Code Review → Documentation → Deployment

Hype vs. Reality: Headlines claim AI will replace developers. The research tells a different story: AI is a powerful amplifier for skilled engineers, but it cannot replace understanding, judgment, and creativity.

What Can Go Wrong?

AI-assisted development brings real risks. Understanding these upfront makes you a more responsible and effective engineer.

Code Hallucinations

AI can confidently generate code that calls APIs that do not exist, uses deprecated methods, or invents function signatures.

  • Fabricated library functions
  • Incorrect API parameters
  • Plausible but wrong logic
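One practical defense against fabricated APIs is to verify that a function actually exists before running generated code. A minimal sketch using only Python's standard library (the "fabricated" function name below is hypothetical):

```python
import importlib

def api_exists(module_name: str, attr: str) -> bool:
    """Check whether `module_name.attr` really exists before trusting generated code."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)

# A real function: exists.
print(api_exists("json", "loads"))        # True
# A plausible-sounding but invented function: does not exist.
print(api_exists("json", "load_string"))  # False
```

Checks like this catch only the crudest hallucinations; incorrect parameters and plausible-but-wrong logic still require reading the documentation and testing the code.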

Security Vulnerabilities

AI-generated code frequently contains security flaws: SQL injection, XSS, hardcoded secrets, and improper input validation.

  • Insecure default configurations
  • Missing input sanitization
  • Exposed credentials in examples
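The SQL injection risk is concrete enough to demonstrate in a few lines. The sketch below contrasts a vulnerable string-interpolated query (a pattern assistants often produce) with a parameterized one, using Python's built-in sqlite3; the table and payload are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

user_input = "alice' OR '1'='1"  # a classic injection payload

# VULNERABLE: string interpolation lets the payload rewrite the query.
unsafe = conn.execute(
    f"SELECT * FROM users WHERE name = '{user_input}'"
).fetchall()

# SAFE: a parameterized query treats the payload as a plain string value.
safe = conn.execute(
    "SELECT * FROM users WHERE name = ?", (user_input,)
).fetchall()

print(len(unsafe))  # 1 — the injected OR clause matched every row
print(len(safe))    # 0 — no user is literally named "alice' OR '1'='1"
```

When reviewing generated database code, interpolated query strings are one of the first things to look for.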

Over-Reliance

Developers who lean too heavily on AI risk losing fundamental skills. If the AI is down or wrong, can you still solve the problem?

  • Atrophy of debugging skills
  • Reduced deep understanding
  • Dependency on AI availability

Licensing & IP

AI models trained on open-source code may reproduce copyrighted snippets. The legal landscape is still evolving.

  • Copilot lawsuit precedent
  • License compliance questions
  • Attribution requirements

AI models also reflect the biases in their training data — generating code that follows outdated patterns, reinforces non-inclusive variable naming, or performs poorly on underrepresented programming languages and paradigms.

Tools You’ll Use

Throughout CodeLab, you will work with a variety of AI models and tools — both commercial and open-source.

GPT

OpenAI · Commercial

OpenAI's models for code generation, analysis, and reasoning.

Claude

Anthropic · Commercial

Strong coding, analysis, and long-context capabilities.

GitHub Copilot

GitHub · IDE-integrated

AI pair programmer for real-time code suggestions in your editor.

CodeLlama / StarCoder

Meta / BigCode · Open Source

Open-source code LLMs you can run locally, fine-tune, and study.

Jupyter + Python

Open Source

Your primary workspace for experiments, model evaluation, and data analysis.

HuggingFace

Open Source Platform

Access pre-trained models, datasets, and the Transformers library.

Commercial vs. Open-Source

Commercial (GPT, Claude)

  • Strengths: highest performance, large context, multimodal, regular updates
  • Tradeoffs: API costs, data sent to third parties, rate limits
  • Best for: complex generation, debugging, rapid prototyping

Open Source (CodeLlama, StarCoder)

  • Strengths: free, self-hosted, full data privacy, fine-tunable
  • Tradeoffs: requires GPU hardware, generally lower performance on complex tasks
  • Best for: research, fine-tuning, privacy-sensitive projects

What is Vibe Coding?

A development approach where you describe what you want in natural language and iterate with AI until the code works.

1

Describe Your Intent

Write a clear, detailed prompt describing what you want to build. Include constraints, technologies, and expected behavior.

2

AI Generates Code

The LLM produces a first draft — often a working scaffold with routing, data models, and basic UI. This is your starting point, not your final product.

3

Test and Evaluate

Run the generated code. Does it work? Does it handle edge cases? Identify what is correct, what is broken, and what is missing.

4

Refine the Prompt

Based on your evaluation, refine your instructions. Be more specific about what went wrong. Iterate until the code meets your requirements.

5

Integrate and Deploy

Once the components work, integrate them into your application. Add manual refinements, write tests, and document which parts were AI-generated.

Vibe coding is not about being lazy — it is about being strategically efficient. The best vibe coders understand the code AI generates, can debug it when it breaks, and know when to write code manually instead.
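Step 3 is where most of the value is won or lost. Below is a sketch of probing a generated function with edge cases; `slugify` is a hypothetical AI draft written for this example, not output from any real session.

```python
# Imagine this came back from a prompt like "convert a title to a URL slug".
def slugify(title: str) -> str:
    return "-".join(title.lower().split())

# Probe the draft with edge cases before trusting it.
edge_cases = ["Hello World", "", "  padded  ", "Hello, World!"]
for case in edge_cases:
    print(f"{case!r:18} -> {slugify(case)!r}")
```

Running it reveals a gap the happy path hides: punctuation survives, so "Hello, World!" becomes "hello,-world!" rather than "hello-world". That observation feeds directly into step 4, where the refined prompt would spell out how punctuation should be handled.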

Ethics & Responsible AI

As future engineers, you will shape how AI is used in software development. These are the ethical considerations you need to understand.

Bias in Training Data

AI models learn patterns from their training data — including biases.

  • Non-inclusive variable naming patterns
  • Underperformance on non-English codebases
  • Reinforcing outdated practices

Environmental Cost

Training large language models requires massive computational resources with significant carbon footprints.

  • LLM training: ~$100M+ in compute
  • Inference costs scale with usage
  • Push toward efficient, smaller models

Job Displacement

The evidence points to augmentation, not replacement — but the nature of development work is changing.

  • Augmentation vs. full replacement
  • Shifting skill requirements
  • New roles: prompt engineer, AI auditor

Intellectual Property

AI models trained on open-source code raise complex legal questions about copyright and attribution.

  • The GitHub Copilot class-action lawsuit
  • Fair use in model training
  • License compliance in generated code

This platform teaches you not just how to use AI for code — but to think critically about when and whether you should.

Prerequisites & Getting Set Up

Required

  • Python Proficiency — comfort with functions, classes, file I/O, and pip/conda
  • Basic Data Structures — lists, dictionaries, trees, graphs
  • Git Basics — clone, commit, push, pull, branch

Helpful but Not Required

  • Probability & Statistics — useful for understanding language models and evaluation metrics
  • Linear Algebra — helps with neural network internals (we provide intuitions)
  • Prior ML Exposure — familiarity with training/testing, overfitting, and loss functions

Environment Checklist


Python 3.8+ installed — verify with python --version
Jupyter Notebook ready — install via pip install jupyterlab
Git installed & configured — set your name and email with git config
GitHub account created — sign up at github.com
VS Code + Copilot extension — free for students via GitHub Student Developer Pack
API keys obtained (OpenAI, HuggingFace) — free tiers available
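The locally verifiable items in the checklist can be sanity-checked with a short script. A sketch only: it covers the tools on your machine, not accounts or API keys.

```python
import shutil
import sys

# Quick environment check for the locally testable checklist items.
checks = {
    "Python 3.8+": sys.version_info >= (3, 8),
    "git on PATH": shutil.which("git") is not None,
    "jupyter on PATH": shutil.which("jupyter") is not None,
}

for name, ok in checks.items():
    print(f"[{'x' if ok else ' '}] {name}")
```

Run it from a terminal with `python check_env.py` (any filename works); an unchecked box tells you which install step to revisit.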

Three Courses, Three Paths

CodeLab brings together three complementary courses. Each addresses a different dimension of AI-powered software development — pick the path that matches your goals, or explore all three.

Learning Paths

Each course follows a distinct trajectory. Here is how topics flow across all three:

AI for Code
MSR → Code Models → Eval Metrics → DL Foundations → Prompting → Hallucinations → Genetic Alg.
SE Lifecycle
Requirements → Design → Version Control → Testing → Agile → Mining → DL in SE
Maintenance
Quality → Code Smells → Refactoring → Bug Reports → Repo Mining → NLP for Code

Shared Foundations: Several topics appear across multiple courses — mining software repositories, testing & code quality, NLP for code, and evaluation metrics. These shared foundations reinforce each other no matter which path you take first.

Where to Start

New to Software Engineering?

Start with Prof. Poshyvanyk’s SE course. It covers the full lifecycle from requirements to deployment and gives you the foundational vocabulary for everything else.

  • Best first course for undergraduates
  • Covers design, testing, and agile

Want to Build with AI?

Jump into Prof. Mastropaolo’s GenAI course. Seven hands-on modules take you from mining repositories to prompting LLMs and detecting hallucinations.

  • Ideal if you know SE basics
  • Hands-on with real AI tools

Already Shipping Code?

Head to Prof. Chaparro’s Maintenance & Evolution course. Learn how production systems age, how to assess and improve code quality, and how NLP can help.

  • Great for working developers
  • Focuses on real-world codebases