Spring 2026 — Core Modules
00
Course Introduction
Welcome to Generative AI for Software Development (CSCI 455/555). Covers what GenAI is, the evolution of AI for code, course structure, tools of the trade, the AI usage policy, ethics & responsible AI, and getting set up for the semester.
30 Slides
4 Demos
~40m Duration
GenAI primer
AI4SE landscape
tools of the trade
AI usage policy
ethics
vibe coding
01
Mining Software Repositories
How to collect, clean, and prepare source-code data from public repositories. Covers preprocessing, lexer vs BPE tokenization, ASTs, deduplication with hashing, data splitting, ethics & licensing, and data provenance.
32 Slides
8 Demos
~50m Duration
repositories
preprocessing
tokenization
BPE
ASTs
deduplication
ethics & licensing
02
Probabilistic Source Code Modeling
Probability refresher, maximum likelihood estimation (MLE), chain rule, n-gram models, Zipf's law, the naturalness hypothesis, perplexity & cross-entropy, smoothing, temperature & sampling strategies, out-of-vocabulary (OOV) handling, and the path from n-grams to LLMs.
28 Slides
9 Demos
~45m Duration
MLE
n-grams
perplexity
cross-entropy
smoothing
temperature
sampling
03
Evaluating AI-enabled Software Development Techniques
Classification metrics, BLEU (with worked example), ROUGE, METEOR, CodeBLEU, CrystalBLEU, pass@k for code generation, embeddings primer, cosine similarity, contrastive learning, SIDE framework, human evaluation, and common evaluation pitfalls.
30 Slides
12 Demos
~50m Duration
BLEU
ROUGE
pass@k
CodeBLEU
embeddings
contrastive learning
human evaluation
04
Deep Learning for Software Development Foundations
Neural network fundamentals, backpropagation, hyperparameters, non-generative tasks (clone detection, vulnerability prediction), embeddings, LSTMs, GRUs, seq2seq, attention, transformers, autoregressive generation, pre-training vs fine-tuning, and the DL4SE toolkit.
34 Slides
15 Demos
~60m Duration
neural networks
backpropagation
LSTM / GRU
transformers
autoregressive
fine-tuning
CodeBERT
05
Prompting LLMs for Software Development Automation
In-context learning (ICL), few-shot prompting, chain-of-thought, prompt engineering best practices, retrieval-augmented generation (RAG), tool use & function calling, context window management, prompt chaining, self-consistency, and evaluating prompt effectiveness.
30 Slides
10 Demos
~50m Duration
ICL
chain-of-thought
RAG
tool use
prompt chaining
self-consistency
06
Hallucinations in Coding Tasks
How LLMs generate code, temperature & sampling as a hallucination factor, the CodeHalu taxonomy, spot-the-hallucination exercises, RAG for mitigation, prompt engineering defenses, tool-augmented generation, production case studies, and building hallucination-resistant workflows.
30 Slides
7 Demos
~50m Duration
LLM generation
temperature
CodeHalu
RAG mitigation
tool-augmented
production cases
Extra Module
GA
Genetic Algorithms & LLMs
GA fundamentals (selection, crossover, mutation), the evaluation bottleneck in code generation, why CodeBLEU is not enough, fitness approximation with LLM predictors (95% accuracy, 5 orders of magnitude speedup), the GA+LLM architecture, honest limitations, and research frontiers.
30 Slides
6 Simulations
~45m Duration
population
selection
crossover
fitness approximation
evaluation bottleneck
GA + LLM