Monday, April 27, 2026
Dallas Elleman — Spring 2026
Section 4 — SYNTHESIS
What I worked on, and what I want you to know about
code.claude.com/docs/en/permission-modes#eliminate-prompts-with-auto-mode
Unit 11 — survey only
Unit 12 — survey only
Unit 13 — today's focus
Four sectors where AI security and trust requirements are the most differentiated — each gets a single best-reference URL you can use as a launching pad.
Healthcare
FDA, HIPAA, clinical AI
Finance
SR 11-7, fair lending, fraud
Defense
DoD RAI, dual-use, ethics
Critical Infrastructure
CISA, EO 14110, ICS/OT
SaMD framework + Good Machine Learning Practice
FDA's SaMD classification + the 10 GMLP principles set baseline safety, validation, and clinical-evaluation expectations for any AI/ML medical device.
Predetermined Change Control Plan (PCCP)
Final FDA guidance (December 2024) governs post-market model updates and drift monitoring — the regulatory answer to "the model isn't static after deployment."
Transparency + lifecycle expectations
FDA Transparency Guiding Principles (June 2024) + AI-enabled device lifecycle draft (January 2025) address bias, hallucination risk, and clinician-facing labeling.
Existing model-risk frameworks apply — with gaps
SR 11-7 model risk and third-party-risk guidance carry over to AI; gaps remain for credit unions and generative AI.
AI is dual-use in finance
Amplifies cybersecurity, data-privacy, fair-lending bias, and synthetic-identity fraud risks — while strengthening fraud detection.
Trustworthy deployment essentials
Explainability, bias testing, governance, and continuous monitoring — coordinated across OCC, FDIC, Fed, SEC, CFPB, NCUA.
AI-first warfighting posture
Establishes seven Pace-Setting Projects with mandated cross-Service data access, compute, and talent provisioning — AI moves from "responsible enabler" to capability frontier.
CDAO benchmarks become acquisition criteria
CDAO must publish model-objectivity and trust benchmarks as primary procurement criteria — vendors compete on measured trust, not stated principles.
Subordinate documents still operate
2024 RAI Strategy & Implementation Pathway and DoDD 3000.09 (2023) remain in force as implementing documents under the new top-level strategy.
Four secure-AI principles for OT
Awareness of AI use · threat-informed design · secure-by-design AI · secure AI lifecycle management.
OT-focused, all 16 sectors
Targets operators across the 16 U.S. critical-infrastructure sectors — co-authored with allied cyber agencies (ACSC, CCCS, NCSC-UK, others).
Operationalizes current policy
Implements the July 2025 AI Action Plan's CISA mandate. Effectively supersedes the Biden-era April 2024 DHS guidance (which is still hosted but orphaned by EO 14179).
Several of the references on the prior slides come from two different administrations. Here's the delta.
| Dimension | Biden (2021–Jan 2025) | Trump (Jan 2025–present) |
|---|---|---|
| Overarching EO | EO 14110 (Oct 2023) — risk + safety + civil rights | EO 14179 (Jan 2025) — "Removing Barriers"; revokes 14110. AI Action Plan (Jul 2025). |
| Federal AI use | OMB M-24-10 + M-24-18 | OMB M-25-21 + M-25-22 (Apr 2025); CAIO + risk-mgmt skeleton retained, framing toward speed/innovation |
| Frontier model reporting | DPA-based mandatory reporting + pre-deployment safety-test sharing | Mandatory reporting paused; voluntary CAISI engagement; red-teaming reframed as ideological-bias check |
| "AI Bill of Rights" | OSTP Blueprint (Oct 2022), cited across agencies | Framing dropped; document moved to Biden archive |
| AI Safety Institute | US AISI at NIST (Nov 2023); MOUs with frontier labs | Renamed CAISI (Jun 2025) — standards/competitiveness framing replaces "safety" |
| State preemption | Implicit deference; state experimentation encouraged | Active preemption: Dec 2025 EO + DOJ AI Litigation Task Force (legal force contested) |
| Critical infra AI | DHS/CISA April 2024 guidelines under EO 14110 | CISA Dec 2025 OT principles (joint w/ allies); 2024 doc orphaned but still hosted |
What stayed: NIST AI RMF + GenAI Profile, the institute itself (rebranded), sector regulators, and a rapidly growing body of state AI laws.
Three debates that will shape the regulatory environment your career will live in.
Open Model Release
Transparency vs. proliferation
Foundation Model Regulation
Compute thresholds, evals, audits
Governance Evolution
Voluntary → binding, national → international
Benefits of open weights
Research access, competition, privacy — cited and substantiated.
Marginal misuse risks
CBRN, cyber, CSAM, disinformation — framed as marginal risk over the closed-model baseline, not absolute.
Recommended posture
Active monitoring + evidence collection rather than preemptive restriction; preserve flexibility for future regulatory pivots.
An IPCC-style consensus document for AI
Multilateral: 30 governments + UN, EU, OECD; chaired by Yoshua Bengio; ~100 nominated experts.
Surveys the regulatory toolkit
Capability evaluations, red-teaming, transparency mandates, third-party audits — and the limits of compute thresholds as a policy lever.
Frames the 2025–2026 trajectory
EU AI Act GPAI tier · the AISI network (UK, US, plus follow-ons) · emerging frontier safety frameworks.
Voluntary → binding
NIST AI RMF and OECD AI Principles giving way to enforceable EU AI Act, US executive orders, and state laws.
National → multilateral
AI Safety Institutes proliferating since Bletchley 2023; international AISI network formalized at Seoul 2024 and Paris 2025.
Velocity
Legislative AI mentions up 21.3% across 75 countries in 2024; US state AI laws went from 49 to 131.
Top student-vote topics — deeper treatment
Real production incidents have stopped looking like academic adversarial-example papers. The frontier moved.
Indirect prompt injection
EchoLeak (CVE-2025-32711) was the first zero-click LLM exploit at production scale — emails → Copilot → data exfil. The pattern repeats: attacker text reaches the context window through any retrieval channel.
Tool-call abuse / excessive agency
Agents wired into email, calendar, repos, and browsers gain real-world leverage. 2025–2026 incidents centered on agent tool-misuse, not on "the model said something bad."
Supply-chain attacks on models
Pickle-deserialization RCE in HF model files (PyTorch / TensorFlow loaders); typosquatted model names; poisoned LoRA adapters distributed via community hubs.
RAG-corpus poisoning
Wiki, Drive, ticket-system content treated as authoritative once retrieved — the trust boundary is set at indexing time, not query time. (See Pres 20.)
AI Safety Institutes
UK AISI, US AISI, Japan AISI, Singapore AI Safety Centre, EU AI Office — doing pre-deployment evals on frontier models. Network formalized at Seoul 2024, Paris 2025.
Lab-internal red teams
Anthropic, OpenAI, Google DeepMind, Meta — structured red-team programs publishing capability/safety reports per release. Frontier safety frameworks now standard.
Eval ecosystem
METR, Apollo Research, Pattern Labs, MLCommons AILuminate, plus academic shops (CHAI, MILA, FAR.AI). Evals are professionalizing — with all the methodology debates that implies.
Defensive frameworks landing
NIST AI RMF + GenAI Profile (AI 600-1), MITRE ATLAS, Meta's Agents Rule of Two, OWASP LLM Top 10. Different layers, complementary, none sufficient alone.
Three frontiers worth watching — each will reshape the attack surface in the next 12–24 months.
Agentic Systems
Tool use, MCP, computer use, multi-agent
Mechanistic Interpretability
Sparse autoencoders, circuits, probes
AI for AI Safety
Constitutional AI, debate, scalable oversight
Model Context Protocol (MCP)
Anthropic's open standard for tool/server integration with LLM clients. Now the de-facto agent ↔ tool wiring layer (Claude, Cursor, Continue, ChatGPT, others).
Computer use / browser agents
Claude Computer Use, OpenAI ChatGPT Agent (Operator merged in Aug 2025), Google Gemini Agent (Project Mariner winds down May 4, 2026). Agents that drive a real browser or desktop — full web/app capability + every web/app vulnerability.
Multi-agent orchestration
Long-running agent crews (engineering, research, ops). New problems: cross-agent prompt injection, principal/delegate identity, audit trails across agent boundaries.
Reasoning + extended thinking
GPT-5.5 Thinking (o-series folded into GPT-5 line), Claude Opus 4.7 adaptive thinking, DeepSeek V4-Pro (R1 superseded; V4 collapses chat + reasoner). Models that plan before acting — longer attack chains.
From "black box" to "we can name some circuits"
Sparse autoencoders identify human-interpretable features inside frontier models. Anthropic's Scaling Monosemanticity mapped millions of features in Claude 3 Sonnet; later work extends this to Claude 3.5+ and Llama-class models.
Why it matters for security
If we can detect deception features, sycophancy features, or jailbreak-trigger features at activation time, we get a defense layer that doesn't depend on prompt-text classification.
Where it's still pre-paradigmatic
Feature-finding works; causally intervening to suppress unsafe behavior at scale, in deployed systems, is research-grade. Don't ship interpretability-based safety as your only line.
Scalable oversight
Use AI systems to help humans evaluate other AI systems on tasks humans cannot reliably grade alone. Constitutional AI, RLAIF, debate, recursive reward modeling.
Automated red-teaming
Models that generate jailbreaks, edge cases, and adversarial inputs against other models. Currently used in production at every frontier lab.
Verifiable / formal-method assists
LLMs as theorem-proving assistants (Lean, Coq), policy-as-code generators, and formal-spec drafters — bringing program-verification rigor into AI safety.