# CYB-4203/6203: Secure and Trustworthy AI — Midterm Study Guide

**Midterm Exam**: Monday, March 23, 2026, 12:30 PM
**Coverage**: Units 1-6 (all material through Wednesday, March 4)
**Format**: Two-part comprehensive exam

---

## Exam Structure

The midterm exam has two components:

1. **Common knowledge**: Questions drawn from the course material covered in class (Units 1-6). This study guide covers this component.
2. **Individual knowledge**: Questions specific to each student, drawn from your submitted assignments (Assignments 1-5). Review each of your submissions and be prepared to answer questions about what you wrote, the reasoning behind your analysis, and the concepts you engaged with.

**To prepare**: Use this study guide for the common knowledge component. For the individual component, re-read your own submitted assignments and make sure you can explain and defend your work.

---

## How to Use This Guide

This guide is organized by unit and topic. For each topic, key concepts, terms, and frameworks are listed. Use this guide alongside your lecture notes, presentation slides, and assigned readings to prepare. Bolded terms and frameworks are especially important.

---

## Unit 1 — WHY: Ethics, Dangers, Society, and Accountability

### Topic 1.1: Course Introduction and Objectives

- Course rationale: why security and trustworthiness matter for AI systems
- The dual nature of AI: transformative promise and systemic risk
- The scope of "Secure and Trustworthy AI" as a discipline

### Topic 1.2: Societal Stakes — The Transformative Promise and Risk of AI

- AI's impact across domains: healthcare, finance, defense, education, critical infrastructure
- Concentration of power and resources in AI development
- The speed of AI deployment vs. the pace of governance and regulation

### Topic 1.3: Prominent Failures, Scandals, and Incidents

- Key incidents to know and their lessons:
  - Microsoft Tay (2016) — feedback loop manipulation
  - Amazon hiring algorithm bias
  - COMPAS recidivism prediction and racial bias
  - Clearview AI and surveillance concerns
  - Samsung ChatGPT data leak (2023)
- Pattern recognition: what common factors lead to AI failures?

### Topic 1.4: Influence of AI on Society, Economy, and Geopolitics

- AI and labor market disruption
- Geopolitical competition: U.S.-China AI race, AlphaGo's impact on China's national AI strategy
- AI in defense, intelligence, and critical infrastructure

### Topic 2.1: Philosophical Ethical Frameworks Applied to AI

- **Virtue ethics**: character and moral excellence; what does a "virtuous" AI developer do?
- **Utilitarianism**: greatest good for the greatest number; cost-benefit analysis in AI deployment
- **Deontology**: duty-based ethics; rules and obligations regardless of outcome
- Applying these frameworks to real AI dilemmas (e.g., autonomous vehicles, predictive policing)

### Topic 2.2: Core AI Values

- **Fairness**: equal treatment, non-discrimination, equitable outcomes
- **Transparency**: openness about how AI systems work and make decisions
- **Accountability**: clear responsibility for AI outcomes
- **Privacy**: protecting personal data and individual autonomy
- **Autonomy**: preserving human agency and decision-making
- **Safety**: preventing harm from AI system behavior
- **Sustainability**: environmental and long-term societal impact of AI

### Topic 2.3: AI and Human Rights

- Cross-cultural perspectives on AI ethics
- Legal frameworks protecting rights in AI contexts
- Environmental impact of large-scale AI training and deployment
- AI and surveillance: balancing security with civil liberties

### Topic 2.4: Human-AI Collaboration

- Designing systems that augment rather than replace human capabilities
- Human-in-the-loop, human-on-the-loop, and human-out-of-the-loop paradigms
- Trust calibration: when to trust AI outputs and when to override

---

### Topic 3.1: Unintended Harms — Algorithmic Bias and Discrimination

- Bias in criminal justice (COMPAS), hiring (Amazon), lending, and healthcare
- Sources of bias: training data, feature selection, labeling, evaluation metrics
- Disparate impact vs. disparate treatment
- Feedback loops that amplify existing inequalities

### Topic 3.2: Intentional Misuse

- **Deepfakes**: synthetic media for deception and manipulation
- **Coordinated misinformation**: AI-generated content at scale
- **Surveillance**: facial recognition, predictive policing, social scoring
- **Autonomous cybercrime**: AI-assisted attack generation
- The dual-use nature of AI capabilities

### Topic 3.3: The Alignment Problem and Catastrophic Risks

- **The alignment problem**: ensuring AI systems pursue intended goals
- **Value learning**: teaching AI systems human values
- **Instrumental convergence**: why capable AI systems might develop dangerous sub-goals
- The spectrum from narrow misalignment to existential risk
- AI safety research perspectives and organizations

### Topic 3.4: Responsible Innovation

- Professional obligations in AI development
- Responsible disclosure of AI capabilities and vulnerabilities
- Impact assessments before deployment
- Ethics review processes and institutional oversight

---

### Topic 4.1: International AI Governance

- **GDPR** (EU): data protection, right to explanation, purpose limitation
- **EU AI Act**: risk-based classification (unacceptable, high, limited, minimal risk), requirements for high-risk AI
- Emerging global frameworks: China's AI regulations, UNESCO AI ethics recommendation

### Topic 4.2: U.S. AI Regulation

- **CCPA/CPRA**: California privacy and AI-related requirements
- Executive Orders on AI (Biden administration)
- Sectoral regulation: FDA for medical AI, SEC for financial AI, FTC enforcement actions
- The fragmented U.S. regulatory landscape vs. comprehensive EU approach

### Topic 4.3: Organizational Frameworks

- **NIST AI Risk Management Framework (AI RMF)**: Govern, Map, Measure, Manage functions
- **ISO/IEC 42001**: AI management system standard
- Industry best practices and voluntary commitments
- The relationship between governance frameworks and regulatory compliance

### Topic 4.4: Documentation and Auditability

- **Model cards**: documenting model performance, intended use, and limitations
- **Datasheets for datasets**: documenting data collection, composition, and use
- **Transparency reporting**: regular disclosure of AI system behavior and incidents
- Why documentation matters for accountability and regulatory compliance

---

## Unit 2 — WHAT: Technical and Operational Foundations and Vulnerabilities

### Topic 5.1: Core Architecture of Modern AI/ML Systems

- How AI/ML systems differ fundamentally from traditional software
- **Traditional software**: encoded decisions, deterministic, explicit rules
- **AI/ML systems**: approximated functions, probabilistic, learned from data
- Key comparison dimensions: design philosophy, development process, failure modes, debugging, testing/verification, advantages and use cases
- Biological/neurological/psychological connections to AI system design
- Neural networks and the biological neuron analogy
- Key bio-inspired AI concepts: attention mechanisms, reinforcement learning, experience replay, emergence, continual learning

### Topic 5.2: AI/ML System Lifecycles

- **The 4-stage AI/ML lifecycle**:
  1. **Data Collection and Preparation**: sourcing, cleaning, labeling, feature engineering, data versioning
  2. **Model Training and Evaluation**: architecture selection, hyperparameter tuning, training on GPUs/TPUs, benchmarking
  3. **Deployment and Integration**: inference, model optimization (compression, quantization, distillation), API serving, edge deployment
  4. **Monitoring and Maintenance**: data drift, concept drift, retraining, performance tracking, system prompt patching
- How each stage introduces unique security considerations

### Topic 5.3: Threat Modeling Frameworks and Risk Assessment

- What is threat modeling and why it matters for AI/ML
- Traditional threat modeling concepts adapted for AI systems
- Introduction to the attack surface concept for AI/ML pipelines
- Risk = likelihood x impact

---

### Topic 6.1: Vulnerabilities Across the AI/ML Lifecycle

- The pipeline is the attack surface: every stage has unique vulnerabilities
- **Cross-stage attacks**: planted in one stage, triggered in another
- Key terminology:
  - **Inference**: running a trained model on new inputs
  - **Attack surface**: all points where an attacker can interact with or influence a system
  - **Threat model**: structured analysis of who might attack, how, and what's at stake
- The many meanings of **"adversarial"**: security (attack), training (defensive technique), generation (GANs), game AI (strategy), LLMs (prompting/jailbreaking)

### Topic 6.2: Traditional ML Attack Vectors

**Stage 1 — Data Collection and Preparation Attacks**:
- **Data poisoning**: intentional corruption of training data
  - Data injection, label flipping, clean-label attacks, backdoor insertion via data
- **Supply chain risks in data**: third-party datasets, web scraping risks, PyTorch dependency attack (2022)
- **Training data exposure**: PII/proprietary data memorization and regurgitation
- Real-world examples: medical LLM poisoning, DeepSeek-R1 backdoor, Nightshade, Grok "!Pliny" trigger
- **OWASP CycloneDX ML-BOM**: Software Bill of Materials for ML provenance

**Stage 2 — Model Training and Evaluation Attacks**:
- **Supply chain risks in models**: compromised pre-trained models, poisoned fine-tuning adapters (LoRA/PEFT)
  - PoisonGPT (2023): tampered GPT-J spreading misinformation on Hugging Face
  - Shadow Ray framework vulnerabilities (2024)
- **Backdoor / Trojan implantation**: hidden triggers, clean benchmark performance
  - Planted via poisoned data (Stage 1) or direct parameter manipulation (Stage 2)
  - Activated at inference (Stage 3) — a cross-stage attack
  - Defenses: Neural Cleanse, fine-tuning/pruning
- **Direct parameter manipulation**: modifying model weights post-training
  - PoisonGPT, ONEFLIP (Rowhammer-based single bit flip), ProFlip
- **Sleeper agents** (Hubinger et al., 2024, Anthropic): LLMs trained to be deceptive persist through RLHF/safety training
  - Code backdoor timebombs, standard safety training insufficient
- **Federated learning attacks**: Byzantine gradient attacks, gradient leakage
  - USENIX 2025 SoK: real-world conditions limit some attacks
- **Reward hacking / specification gaming**: gaming the objective without solving the problem
  - CoastRunners boat, Chess LLMs, Q*bert infinite score, RLHF sycophancy
  - Generalizes: 2.6x increase on held-out tasks (METR 2025)
- **Training data memorization**: models memorize verbatim training data, not just patterns

**Stage 3 — Deployment and Integration Attacks**:
- **Adversarial examples**: crafted inputs causing misclassification
  - Perturbation budgets, Lp norms, white-box vs. black-box attacks, transferability
  - Real-world: stop sign stickers (Eykholt 2018), GhostStripe LED attack, Tesla lane-swerving, Proofpoint echospoofing
- **Backdoor/sleeper agent activation**: triggers from Stages 1-2 fire at inference
- **Model extraction / model stealing**: systematic querying to replicate model functionality
  - Tramer et al. (2016), "Thieves on Sesame Street," 100 images to replicate a CNN (Praetorian 2026)
  - Adversarial distillation: OpenAI/DeepSeek controversy
- **Model inversion**: reconstructing training data from outputs (Fredrikson et al. 2015)
- **Membership inference**: determining if a specific record was in training data (Shokri et al. 2017)
- **Resource exhaustion / denial of service**: sponge examples (+200% compute), LLM token exhaustion, OWASP LLM04:2025

**Stage 4 — Monitoring and Maintenance Attacks**:
- **Feedback loop manipulation**: Microsoft Tay, recommendation gaming, VIA attack
- **Monitoring evasion**: clean-label poisoning evading validation, low-rate perturbations below detection thresholds
- **Audit trail integrity**: attacks that corrupt logging and explainability tools

### Topic 6.3: LLM-Specific Vulnerabilities

**Why LLMs Are Qualitatively Different**:
- Natural language is the attack surface — no separation between instruction and data channels
- Emergent capabilities at scale create unpredictable behavior
- No equivalent of parameterized queries for natural language

**The Lethal Trifecta** (Simon Willison, 2025):
- Three capabilities that together create critical risk:
  1. Private data access
  2. Untrusted content exposure
  3. Ability to take external actions
- If an agent has all three, attackers can access and exfiltrate private data
- Most real-world AI agents violate this principle by default

**Agentic AI**:
- Definition: "An LLM agent runs tools in a loop to achieve a goal" (Willison)
- Agents use tools, take actions, make decisions, delegate to sub-agents
- More capability = larger blast radius of compromise
- **Confused deputy problem**: agent acts with server's permissions, not user's

**Model Context Protocol (MCP)** (Anthropic, Nov 2024):
- Open standard for connecting AI to external tools and data
- Security implications: tool descriptions can contain hidden instructions
- Tool poisoning, confused deputy, no universal auth standard

**Prompt Injection** (OWASP LLM01:2025):
- **Direct prompt injection**: user-crafted prompts overriding system instructions
  - "Ignore previous instructions and..." — the canonical pattern
  - Fundamental problem: no reliable instruction/data separation
- **Indirect prompt injection**: hidden instructions in retrieved content
  - RAG poisoning: 5 crafted documents can manipulate responses 90% of the time
  - Resume injection: hidden instructions in job applications
- Why this is fundamentally hard: SQL injection had parameterized queries; no LLM equivalent exists
- Real-world CVEs: GitHub Copilot RCE (CVE-2025-53773, CVSS 9.6), ChatGPT cross-plugin forgery, MCPTox

**Jailbreaking**:
- Many-shot prompting, roleplay/persona (DAN), encoding tricks (Base64), crescendo attacks
- Cat-and-mouse dynamic: new techniques emerge faster than patches deploy
- Anthropic's Constitutional AI and Constitutional Classifiers

**Model and Data Extraction from LLMs**:
- API-based extraction and adversarial distillation
- Anthropic distillation report (Feb 2026): DeepSeek, Kimi K2, MiniMax — 16M+ queries from 24K fraudulent accounts
- Training data extraction: memorization exploitation, divergence attacks (Carlini et al. 2021)
- Legal dimension: NYT v. OpenAI, copyright in training data

**Emerging Agentic Threats**:
- **Excessive agency and tool use exploitation** (OWASP LLM06:2025)
- **OpenClaw case study**: 1,800+ exposed instances, 335 malicious skills (~12% of registry), CVE-2026-25253
- **StrongDM Software Factory**: no human code review, recursive trust problem
- **Slopsquatting**: 20% of LLM-recommended packages don't exist, 43% of hallucinated names repeatable
- **Hallucination as security risk**: fabricated citations, phantom dependencies
- **Context window poisoning**: manipulating long-context inputs, cross-modal injection
- Agent-to-agent attack propagation in multi-agent systems

### Topic 6.4: Threat Modeling Frameworks

- **MITRE ATLAS**: ATT&CK-style matrix adapted for AI/ML systems
  - Tactics, techniques, and real-world case studies
- **OWASP Top 10 for LLM Applications (2025)**:
  - LLM01: Prompt Injection
  - LLM02: Sensitive Information Disclosure
  - LLM03: Supply Chain
  - LLM04: Data and Model Poisoning
  - LLM05: Improper Output Handling
  - LLM06: Excessive Agency
  - LLM07: System Prompt Leakage
  - LLM08-10: Vector/Misinformation/Unbounded Consumption
- **STRIDE adapted for AI/ML**: Spoofing (model impersonation), Tampering (poisoning), Repudiation (unauditable decisions), Information disclosure (extraction), DoS (sponge examples), Elevation of privilege (prompt injection)
- **CSA MAESTRO** (Feb 2025): seven-layer architecture for agentic AI security
- **NIST AI RMF** and **NIST Adversarial ML Taxonomy** (Mar 2025)
- **Building an AI threat model**: identify assets, enumerate threats, assess risk, map mitigations

---

## Key Frameworks and Models to Know

| Framework | Purpose | Key Elements |
|-----------|---------|--------------|
| NIST AI RMF | Risk management governance | Govern, Map, Measure, Manage |
| EU AI Act | Regulatory classification | Risk tiers: unacceptable, high, limited, minimal |
| OWASP Top 10 for LLMs | LLM vulnerability catalog | 10 ranked vulnerability categories |
| MITRE ATLAS | AI/ML attack taxonomy | ATT&CK-style tactics and techniques |
| CSA MAESTRO | Agentic AI security | 7-layer architecture |
| STRIDE for AI/ML | Threat categorization | 6 threat types adapted for AI |
| The Lethal Trifecta | Agentic risk assessment | Private data + untrusted content + external actions |
| 4-Stage AI/ML Lifecycle | System development | Data, Training, Deployment, Monitoring |

---

## Key Comparisons to Understand

1. **Traditional software vs. AI/ML systems**: encoded decisions vs. approximated functions, deterministic vs. probabilistic, explicit debugging vs. interpretability challenges
2. **Direct vs. indirect prompt injection**: user-crafted override vs. hidden instructions in retrieved content
3. **White-box vs. black-box attacks**: full model access vs. query-only access
4. **Data poisoning vs. model poisoning**: corrupting training data vs. directly modifying weights
5. **Backdoors vs. sleeper agents**: trigger-pattern activation vs. context-dependent deceptive behavior
6. **Reward hacking vs. adversarial attack**: alignment failure (self-inflicted) vs. external attack
7. **Model extraction vs. model inversion**: stealing functionality vs. reconstructing training data

---

## Study Tips

- Be able to map attacks to specific stages of the 4-stage AI/ML lifecycle
- Understand how cross-stage attacks work (e.g., backdoors planted in data/training, triggered at deployment)
- Know real-world examples for each major attack type
- Be able to apply threat modeling frameworks (OWASP, ATLAS, STRIDE, MAESTRO) to a given AI system
- Understand the ethical frameworks and be able to apply them to AI dilemmas
- Know the key regulatory frameworks and how they differ (EU comprehensive vs. U.S. sectoral)
- Be prepared to analyze a scenario and identify relevant vulnerabilities, applicable frameworks, and appropriate mitigations

---

**Questions?** Contact me at dallas-elleman@utulsa.edu or visit office hours (by appointment).

---

## Study Guide Creation AI Use Disclaimer

This study guide was drafted using Claude Opus 4.6 and Claude Code. All content was reviewed, verified, and approved by Dallas Elleman, who takes full responsibility for its publication.