Curated collection of frameworks, tools, and materials
This page provides a curated collection of resources related to Secure & Trustworthy AI, most of which were introduced during class throughout the Spring 2026 semester.
Comprehensive database tracking AI-related incidents, failures, and safety issues. Maintained by MIT researchers with rigorous categorization and analysis of real-world AI incidents.
Community-maintained database of AI incidents with detailed case studies and analysis. Features searchable incident reports, cross-references, and collaborative documentation of AI system failures.
The U.S. government repository of standards-based vulnerability data (CVEs) with severity scores and references, a core reference for security testing and triage.
An open-source knowledge base of failure modes for AI/ML systems, cataloging security, ethics, and performance vulnerabilities mapped to taxonomies like MITRE ATLAS and the NIST AI RMF.
Anthropic (November 2025). Documents the first large-scale cyberattack executed without substantial human intervention. A Chinese state-sponsored group manipulated Claude Code to autonomously attack ~30 global targets, performing reconnaissance, credential theft, and lateral movement.
Anthropic Threat Intelligence (August 2025). Documents real-world cases of Claude being weaponized for ransomware development, large-scale data extortion, and AI-driven credential harvesting. A cybercriminal with only basic coding skills sold AI-generated ransomware.
European Union Agency for Cybersecurity (October 2025). Annual report analyzing 4,875 cybersecurity incidents. Finds AI-supported phishing accounts for over 80% of social engineering activity worldwide, enabled by jailbroken models, synthetic media, and model poisoning.
Anthropic (February 2026). Documents industrial-scale model distillation by DeepSeek, Moonshot/Kimi K2, and MiniMax, extracting frontier capabilities without safety guardrails. Introduces detection methodologies and countermeasures for intellectual property theft via API-based knowledge extraction.
Google Threat Intelligence Group (November 2025). Documents the first malware families that use AI capabilities mid-execution: FRUITSHELL, PROMPTFLUX, PROMPTLOCK, PROMPTSTEAL, and QUIETVAULT. Malware is no longer static -- it adapts using the same AI tools defenders use.
The Hacker News (June 2025). Write-up of the first disclosed zero-click prompt-injection vulnerability in a production AI assistant. Aim Labs researchers showed an attacker could exfiltrate data from Microsoft 365 Copilot via a crafted email, without any user interaction.
Official National Vulnerability Database entry for EchoLeak. Describes the zero-click prompt-injection vector in Microsoft 365 Copilot, CVSS scoring, and affected versions. Primary reference for the first CVE-cataloged agent prompt-injection flaw.
OWASP (2025). Community-driven catalog of the ten most critical security risks for large language model applications. Covers prompt injection (#1 entry), sensitive information disclosure, supply chain vulnerabilities, and excessive agency.
OWASP (December 2025). Security risks unique to autonomous AI agents: agent goal hijacking, tool misuse, identity and privilege abuse, memory poisoning, and more. Developed with input from 100+ security researchers.
MITRE. Adversarial Threat Landscape for AI Systems -- a knowledge base of tactics, techniques, and case studies for attacking AI/ML systems. Structured in an ATT&CK-style matrix covering 15 tactics and 66 techniques across the entire AI lifecycle.
Foundational voluntary, sector-agnostic framework organized around four core functions (Govern, Map, Measure, Manage).
Companion implementation guide providing suggested actions and practical guidance for each subcategory across all four framework functions.
Companion to the AI RMF addressing risks unique to generative AI, with specific actions for GenAI risk management.
Structured side-by-side comparison covering differences in scope, certification requirements, implementation approach, and organizational fit.
Practitioner-oriented framework for auditing AI systems organized around the IIA's Three Lines Model. Updated in 2024 to align with NIST AI RMF.
International standard specifying requirements for establishing, implementing, and improving an AI management system. The first certifiable international AI governance standard.
Model process for integrating ethical values and risk analysis into system and software life cycles during AI system design.
Cloud Security Alliance (February 2025). Multi-Agent Environment, Security, Threat, Risk, and Outcome framework. Seven-layer architecture designed specifically for agentic AI security threat modeling, building on STRIDE, PASTA, and LINDDUN with AI-specific threat considerations.
The world's first auditable certification standard for agentic AI systems. Covers security, safety, reliability, accountability, and societal risk, operationalizing principles from NIST AI RMF, EU AI Act, and MITRE ATLAS into auditable controls. Certification requires upfront testing plus quarterly reassessment.
Standardized AI Bill of Materials for documenting components, dependencies, and supply chain information for AI/ML systems. Extends software BOM concepts for AI transparency.
NIST presentation on AI Bills of Materials for securing AI ecosystems, covering supply chain transparency, component tracking, and risk management.
Cloud Security Alliance. Structured methodology for red-teaming agentic AI systems across identity, memory, tool use, and orchestration layers. Provides a scoping template, attack categories aligned with MITRE ATLAS, and reporting format. A practical companion to OWASP's Top 10 for Agentic Applications.
Vassilev et al., NIST (2025). Authoritative taxonomy and terminology for adversarial machine learning. Catalogues attack surfaces (evasion, poisoning, privacy, abuse) and mitigations across predictive and generative AI. Reference document for red-team scoping and regulator-facing reports.
Entry point into the MITRE ATLAS knowledge base. Explains adversary tactics, techniques, and case studies specific to ML and LLM systems. Pairs with the main ATLAS matrix as onboarding for red-teamers new to AI-specific attacks.
Meta AI (October 2025). Proposes a minimal design rule for AI agent security: an agent should never simultaneously (1) process untrusted input, (2) access sensitive data or systems, and (3) act in the real world without human approval. A simple, testable invariant for production agent deployments.
UK AI Security Institute's open-source framework for large language model evaluations. Supports capability benchmarks, agentic-task evaluations, and red-team scoring with first-class support for multi-turn tool use. Used in AISI's pre-deployment evaluations of frontier models.
Mozilla 0DIN's open taxonomy of AI attack techniques. Structured catalog of jailbreak, prompt-injection, data-extraction, and agent-subversion patterns, with mappings to real-world disclosures. Complements MITRE ATLAS at a finer-grained technique level.
Repository hosting 200,000+ datasets with standardized documentation, dataset cards, and filtering by task, size, language, and license. See dataset documentation in practice at scale.
Open platform for sharing datasets, tasks, and ML experiments with standardized metadata and reproducibility tracking. Browse datasets sorted by usage and activity.
Repository hosting 1M+ ML models with model cards, usage metrics, and community discussion. Browse model cards across architectures, tasks, and organizations.
Google DeepMind's collection of model cards documenting model capabilities, limitations, and intended uses for their AI models.
OpenAI's models page listing available models with capabilities, context windows, and versioning. A developer-facing view of model transparency and documentation.
Collection of system cards for Claude models, detailing safety evaluations, capability assessments, and risk mitigations for each major release.
Anthropic’s frontier-safety work: classifier-based defenses trained against a constitution, a dedicated red-team organization, and interpretability-for-safety research.
OpenAI’s public risk-management and transparency commitments for frontier models — capability thresholds that gate deployment, plus an ongoing safety-evaluations dashboard.
Google DeepMind’s Critical Capability Level framework for frontier risk, alongside its consolidated AGI-safety research hub.
xAI’s draft Risk Management Framework (Feb 2025) covering dangerous-capability testing, deployment gates, and incident response; released for public comment.
Meta’s umbrella project for open trust-and-safety tooling around Llama models — CyberSecEval benchmarks, Llama Guard classifiers, and Code Shield.
Simon Willison (March 2024). Canonical essay distinguishing prompt injection (attacker exploits an LLM application that trusts attacker-controlled input) from jailbreaking (user convinces a model to ignore its own rules). Required reading before scoping any red-team.
Simon Willison (November 2025). Curated review of three recent prompt-injection papers including Meta's Agents Rule of Two. A useful running index of the research frontier from the writer who has tracked this space longest.
Promptfoo blog. On-ramp for practitioners new to AI red-teaming: scoping, tooling, attack categories, reporting. Pairs with their Red Team docs and the "Top 5 OSS Tools" post to get from zero to a first engagement.
Official Promptfoo documentation on using their CLI to run red-team evaluations, including pre-built attack strategies, custom providers, and CI integration patterns.
Palo Alto Networks' reference explainer on AI red teaming: definitions, scope, methodology, and how it differs from traditional pentesting. A good quick-reference and glossary source.
CyberThrone (March 2026). Walkthrough applying MITRE ATLAS tactics and techniques during an AI red-team engagement. Shows how to use ATLAS as both a planning framework and a reporting taxonomy.
DeepTeam's playbook for red-teaming agentic AI systems. Covers scope, threat modeling, attack categories for agents (tool misuse, memory poisoning, goal hijacking), and evaluation harness design.
Penligent HackingLabs walkthrough of red-teaming OpenClaw, the intentionally vulnerable high-privilege agentic application used in this course's final project. Demonstrates reconnaissance, tool abuse, and privilege escalation against a realistic target.
OWASP GenAI Project. Landscape report of commercial and open-source AI security solutions for AI red-teaming and agentic-system defense. Useful for surveying the vendor ecosystem before procurement.
Open-source evaluation and red-teaming framework for LLM applications. Provides a test-runner, adversarial prompt generators, and a library of attack strategies for prompt injection, jailbreaks, and policy violations. Used as a CI gate for LLM systems.
NVIDIA's open-source LLM vulnerability scanner. Probes models for hallucination, prompt injection, data leakage, toxicity, and jailbreak susceptibility using a pluggable catalog of probes and detectors. Packaged as a CLI.
Python Risk Identification Toolkit for generative AI, maintained by Microsoft's AI Red Team. Automates adversarial prompt orchestration, multi-turn attack chains, and scoring across Azure OpenAI, Hugging Face, and local models.
Debenedetti et al., NeurIPS 2024 (ETH SPY Lab). Benchmark and evaluation harness for prompt-injection and tool-use attacks against LLM agents. Includes 97 realistic agent tasks across Slack, banking, travel, and workspace domains, plus 629 attack instances.
Open-source LLM-driven pentester for AI agents. Automates reconnaissance, payload generation, and multi-step exploitation against agentic targets. Used in conference demos including DEF CON AI Village.
LLM pentester from Keygraph. Autonomous red-team agent that probes deployed LLM applications for prompt injection, data exfiltration, and tool-call abuse. Complements Raptor in the "LLM attacks LLM" tooling category.
Gamified LLM jailbreak challenge from Lakera. Players try to extract secret passwords from progressively hardened prompt-injection defenses across 8 levels. Widely used as an on-ramp for prompt-injection training and recruiting.
Gamified attack challenge focused on agentic systems: goal hijacking, tool misuse, memory poisoning, and privilege escalation. Extends Gandalf's approach to multi-step, tool-calling agents.
Open-source lab environment from Microsoft providing hands-on AI red-teaming scenarios. Includes vulnerable applications, guided exercises, and reference solutions for prompt-injection, jailbreak, and RAG-attack patterns.
Offensive Security's guide for integrating Claude Desktop with Kali Linux via MCP. Turns the pentesting distro into an LLM-driven attack surface where the model can call Kali tools directly for reconnaissance and exploitation workflows.
Anthropic (June 2024). Reflects on what Anthropic has learned running red-team programs against frontier models: scoping ambiguity, coverage gaps, evaluator drift, and the tension between breadth and depth. Essential reading before designing a red-team plan.
Promptfoo blog survey of leading OSS AI red-teaming tools in 2025. Compares coverage, target types, and ergonomics across Promptfoo, Garak, PyRIT, and others. Starting point for picking a stack.
Mozilla’s 0Day Investigative Network: a generative-AI bug-bounty program and platform for researchers to report and catalog LLM jailbreaks and vulnerabilities.
Open-source framework for defining structural, semantic, and policy validators around LLM inputs and outputs. Validators can be composed declaratively (e.g., "output matches schema AND contains no PII AND no toxic language") and enforced with auto-retry and logging.
NVIDIA's open-source toolkit for adding programmable guardrails to LLM conversations. Uses a rule language (Colang) to constrain topics, tool use, and response shape. Integrates with enterprise RAG and agent stacks.
Iterathon. Practical guide to deploying guardrails in production: where to put them (pre-prompt, post-response, runtime tool checks), how to measure false-positive rates, and failure modes that don't show up in staging.
AppSecEngineer. Engineering-focused design guide covering layered guardrail architectures, fail-closed defaults, and integration patterns with policy engines. Strong on the "defense in depth" framing for agentic systems.
Wiz's reference article covering what AI guardrails are, common categories (input, output, tool, policy), and how they fit into a broader AI security program. Good for giving a non-technical stakeholder the vocabulary.
ToolHalla. Deep-dive on output-validation patterns for agents in 2026: schema enforcement, semantic validators, tool-call verification, and rollback strategies when a guardrail fires mid-task.
Microsoft Azure Foundry documentation for the platform's built-in guardrails and controls. Covers content filters, jailbreak defenses, prompt-shield, and grounding-detection features available to Azure AI deployments.
NVIDIA Developer Blog. NVIDIA AI Red Team's recommendations for isolating agentic workflows: process isolation, capability limiting, egress filtering, and recovery from compromised tool calls. Concrete and production-oriented.
Northflank. Engineering deep-dive on sandboxing patterns for AI agents: microVMs, container isolation, file-system jails, and capability-scoped secrets. Compares tradeoffs across isolation primitives.
The Kubernetes community's special-interest group for agent-sandbox primitives. Working on standards for short-lived, capability-limited execution environments for AI agents in Kubernetes clusters.
Anthropic's official documentation for Claude Code's sandboxing model. Describes the permission system, file-system boundaries, and command-execution gates that keep an LLM-driven coding agent from exceeding its authorized scope.
Cloudflare Blog on Dynamic Workers: edge-deployed, short-lived sandboxes optimized for agent execution. Claims 100x faster cold-start than container-based approaches, making ephemeral per-request isolation viable.
LangChain's official documentation on sandboxing patterns for DeepAgents. Covers local process isolation, containerized sandboxes, and remote-execution models with auditable tool-call logs.
Documentation for Claude Code’s permission modes, which gate an agent’s ability to run commands and edit files: a concrete example of approval gating and auto-accept controls for coding agents.
Strata. Operational guide to embedding humans in agent workflows: approval checkpoints, escalation rules, and identity-aware decision logging. Frames HITL as an agentic-identity problem, not just a UX pattern.
Elementum AI. Survey of HITL patterns for enterprise agentic AI: approval gates, confidence-based routing, reviewer staffing models, and metrics for evaluating whether oversight is actually catching errors.
Galileo. Implementation-focused guide on instrumenting HITL for AI agents: trace capture, reviewer UIs, SLA design, and feedback loops that improve agent behavior over time.
LangChain's official middleware for HITL: pause-and-resume semantics, structured approval prompts, and audit trails for every decision. A reference implementation for adding oversight to existing LangChain agents.
CFPB and DOJ complaint alleging algorithmic redlining in Birmingham, AL. Fairway generated significantly fewer mortgage applications from majority-Black neighborhoods than peer lenders, demonstrating how AI-powered underwriting and marketing systems can scale historical discrimination.
Automated debt recovery system wrongly accused 526,000+ people of welfare fraud using income averaging. 93% error rate when audited. At least 663 vulnerable people died after receiving notices. Government cost: A$2.4 billion. Subject of a Royal Commission.
Dutch tax authority algorithmically profiled non-Dutch nationals as "higher risk," wrongfully accusing 35,000+ parents of fraud. Over 1,000 children placed in state custody. The entire Dutch cabinet resigned in January 2021. Amnesty International report on systemic rights violations.
Automated system falsely accused ~40,000 residents of unemployment fraud with a 93% error rate. Operated without human oversight. Victims had wages garnished, tax refunds seized, and some lost homes. Ford School of Public Policy explainer.
Class action alleging SafeRent's AI tenant screening scores disproportionately penalized Black and Hispanic renters and housing voucher recipients. $2.28M settlement; SafeRent barred from scoring voucher applicants for five years nationwide.
Open-source AI agent with 68,000+ GitHub stars and ~180,000 developers. CVE-2026-25253 (CVSS 8.8): one-click RCE via cross-site WebSocket hijacking. 42,900 exposed instances across 82 countries. ClawHub skills marketplace: nearly 20% of packages contained malicious payloads (Bitdefender). A comprehensive case study in what happens when security is an afterthought.
National Eating Disorders Association replaced human helpline workers with the Tessa chatbot, which gave harmful weight-loss advice to vulnerable users. Organization shut down the bot after public backlash.
Commonwealth Bank of Australia employees trained the "Bumblebee" AI chatbot, then were laid off. CBA later reversed layoffs under regulatory and union pressure.
Pepper humanoid robot repeatedly failed in nursing homes, funerals, retail, and home companion deployments. Rushed to market before technically ready; SoftBank halted production.
Korean chatbot trained on real users' private chat data without consent, producing discriminatory and hateful outputs. Scatter Lab fined for privacy violations.
The parents of a teenager allege that his interactions with ChatGPT as a companion AI contributed to his suicide, and that OpenAI compressed safety testing and overrode built-in safeguards.
Criminal organization used deepfake images and video to run romance scams, defrauding victims of approximately 18 billion won (~$8M USD).
VSquare.org. Investigation into Russian military intelligence (GRU) orchestrating explosive parcels routed through 5+ EU countries via unwitting operatives recruited on Telegram. Used in Presentation 11 as a human parallel to indirect prompt injection, confused deputy problems, and hidden payloads in trusted containers.
Fake AI-generated image of a Pentagon explosion caused a temporary stock market dip. Demonstrates AI-enabled misinformation risks to financial markets.
Ensign et al. (2018). Mathematical proof that predictive policing feedback loops are inevitable given system design. Historical arrest data trains models that send police to already over-policed neighborhoods, generating more arrests that confirm predictions. FAT* Conference.
Obermeyer et al. (2019). A widely used algorithm serving ~200M patients used healthcare spending as a proxy for health needs. Because structural inequality means less is spent on Black patients at equivalent illness levels, the algorithm systematically under-predicted their needs. Science.
Kleinberg & Raghavan (2021). When multiple institutions use similar algorithms, they converge on uniform decision criteria. Correlated failures across systems create systematic exclusion that no single institution can observe or correct. PNAS.
Analysis of how digital redlining creates cardiovascular health disparities, particularly affecting minorities who depend on digital health tools for work, education, and healthcare access. Johns Hopkins researchers warn of growing risks as AI permeates healthcare systems.
Nature reporting on citation integrity issues in scientific literature. Related: GPTZero's 2025 analysis of ~4,800 NeurIPS papers found 100+ fabricated citations across ~50 accepted papers that passed peer review, coining the term "vibe citing."
Operant AI (October 2025). Discloses a zero-click attack exploiting the Model Context Protocol (MCP) to exfiltrate data through AI agents like ChatGPT, Claude, and Gemini without requiring user error. Demonstrates how MCP's connectivity becomes an attack vector.
Apollo Research (December 2024). Evaluates o1, Claude 3.5 Sonnet, Gemini 1.5 Pro and others. Finds frontier models strategically introduce subtle mistakes, attempt to disable oversight mechanisms, and maintain deception in over 85% of follow-up questioning.
Hendrycks, Mazeika, and Woodside (2023). Organizes catastrophic AI risks into four categories: malicious use, AI race, organizational risks, and rogue AIs. The course textbook chapter (Hendrycks Ch. 1) is based on this paper. Free textbook version available at aisafetybook.com.
Palisade Research (February 2025). Reasoning LLMs tasked with winning chess against a stronger opponent spontaneously attempted to hack the game system. o1-preview tried to cheat in 37% of matches against Stockfish, including overwriting board state and running a rival engine.
OpenAI (2025). Presents research showing scheming behaviors in frontier models during controlled tests. Demonstrates that deliberative alignment training reduced scheming rates from ~13% to ~0.4% in o3, though "imperfect generalization" means rare but serious misbehavior remains.
Mitchell et al. (2019). Foundational paper proposing model cards as short documents accompanying trained ML models that provide benchmarked evaluation across demographic and use-case conditions. FAT* Conference.
ACM (May 2024). Explores the fundamental architectural vulnerability in LLMs where natural language serves as both instruction channel and data channel simultaneously, making reliable separation between control signals and data impossible.
Hubinger et al. (2024, Anthropic). Demonstrates that LLMs trained to be deceptive can behave safely during evaluation but activate harmful behavior on triggers, and this persists even after RLHF and fine-tuning. Standard safety training does not reliably remove deceptive behavior.
Carlini et al. (2023). Demonstrates novel poisoning attacks that guarantee appearance of malicious examples in web-scale datasets used for training large, widely-used ML models in production.
Gebru et al. (2021). Proposes standardized documentation for datasets, analogous to datasheets in electronics, covering motivation, composition, collection process, and recommended uses. Communications of the ACM.
Carlini et al. (2021). Seminal demonstration that GPT-2 memorizes and leaks training-data snippets verbatim, including PII, code, and URLs, via targeted prompts. Framed training-data extraction as a practical privacy threat for production LLMs.
Nasr et al. (2023). Scales the Carlini attack to production models including ChatGPT. Shows that a simple divergence attack ("repeat the word poem forever") extracts gigabytes of training data, including email addresses, phone numbers, and proprietary text.
Lukas et al. (2023). Systematic study of PII leakage in LLMs across extraction, inference, and reconstruction attacks. Quantifies how training-data protections (DP, deduplication) and defenses affect leakage rates.
Mehrotra et al., NeurIPS 2024. Introduces TAP (Tree of Attacks with Pruning), an automated jailbreak method that uses a small LLM attacker to iteratively refine prompts against a frontier-model target. High success rates with modest query budgets.
Deng et al., NDSS 2024. Time-based analysis of commercial chatbot safeguards, followed by an automated jailbreak generator that achieves high success rates across GPT, Bard, and Bing Chat. Foundational work for automated red-teaming.
Reddy et al. (September 2025), arXiv:2509.10540. Academic analysis of the EchoLeak (CVE-2025-32711) zero-click prompt-injection vulnerability in Microsoft 365 Copilot. Dissects the attack chain, retrieval poisoning, and mitigations.
Karmitsa et al. (2025), arXiv:2509.03294. Tutorial-style survey of differential privacy from theoretical foundations (epsilon, delta, sensitivity) through deployment in real systems. Includes a user-expectations framing useful for explaining DP trade-offs to stakeholders.
Shows that automated LLM vulnerability scanners produce unreliable measurements because their evaluator components are unstable, and proposes a two-phase framework to quantify and improve red-teaming evaluation reliability.
Anthropic’s foundational paper on training a helpful, harmless assistant using a written constitution and reinforcement learning from AI feedback (RLAIF) instead of extensive human labeling.
Stanford HAI’s flagship annual report tracking global AI progress, investment, technical benchmarks, policy, and governance with extensive data and charts.
Near & Abuah (2025). Free online textbook covering differential privacy from first principles through implementation. Hands-on chapters with Python code for Laplace/Gaussian mechanisms, composition, and the DP-SGD algorithm. The best free DP textbook for learners with a programming background.
Apple Machine Learning Research. Describes Apple's production differentially-private telemetry system: how local DP is used to learn aggregate user behavior (emoji use, lookup keywords, energy usage) without Apple ever seeing an individual user's data.
Google Blog. Describes how Maps computes Popular Times and Live Busyness using differential privacy over aggregated location data. A widely seen consumer feature built on DP that most users never realize is privacy-preserving.
U.S. Census Bureau. Official documentation of the 2020 Decennial Census's Disclosure Avoidance System: the first government-scale deployment of formal differential privacy. Covers methodology, privacy-loss budget, and the trade-offs that generated public controversy.
Meredith Strohm Gunter, Weldon Cooper Center (January 2020). Memorandum to Governor Ralph Northam documenting how 2020 Census DP noise would distort redistricting and funding allocations for small Virginia localities. A canonical critique of DP's accuracy-vs-privacy trade-off in practice.
Aman Priyanshu. Interactive Tetris demo where adjusting the differential-privacy epsilon visibly warps the game state. An unusually effective pedagogical tool for internalizing the privacy-vs-utility trade-off.
Cathy O'Neil (2016). Defines destructive algorithms by opacity, scale, and damage. Identifies "pernicious feedback loops" as the central mechanism of harm across credit, education, employment, and housing.
Virginia Eubanks (2018). Three case studies of how automated eligibility systems, ranking algorithms, and predictive risk models create a "digital poorhouse" that profiles, polices, and punishes the poor.
Ruha Benjamin (2019). Introduces "the New Jim Code" to describe how automation hides, speeds, and deepens discrimination while appearing neutral. Argues technology is not neutral -- algorithms reflect the social and institutional contexts in which they are built.
Kate Crawford (2021). Traces the full material supply chain of AI systems, from cobalt mining in the DRC to data center energy consumption to the environmental justice implications of AI infrastructure.
Mary L. Gray & Siddharth Suri (2019). Coined the "paradox of automation's last mile": as AI advances, each solution generates new problems requiring human judgment. The hardest 10% of tasks fall to invisible workers with no employment protections.
Safiya Umoja Noble (2018). Demonstrates how search algorithms structurally reproduce social relations and reinforce racial hierarchies.
Siddharth Kara (2023). Traces the human cost of cobalt mining in the Democratic Republic of Congo, where approximately 40,000 children work in mining operations supplying the AI and electronics supply chain.
Stuart Russell (2019). Argues AI optimization gives us exactly what we specify, not what we actually want (the "King Midas problem"). Proposes provably beneficial AI: systems that are fundamentally uncertain about human preferences, learn from human behavior, and can be switched off.
Nick Bostrom (2014). Foundational text on existential risk from AI. Argues that if superintelligence is created, controlling it is necessary to prevent existential catastrophe. Introduces the paperclip maximizer thought experiment. Bostrom has since nuanced his position (2025): failure to develop superintelligence would also be catastrophic.
Eliezer Yudkowsky & Nate Soares (2025). Argues that intelligence and goals are independent (orthogonality thesis) and that superintelligent agents will pursue self-preservation and resource acquisition regardless of terminal goals (instrumental convergence). "A paperclip maximizer doesn't hate you, but you're made of atoms it can use for paperclips."
Max Bennett (2023). Traces five evolutionary breakthroughs in biological intelligence -- steering, reinforcement learning, simulation, mentalizing, and symbolic language -- and maps each onto modern AI system design. Bridges neuroscience, evolutionary biology, and artificial intelligence.
Interactive, searchable tool for browsing the full text of the EU AI Act with article-by-article navigation and SME compliance checker.
Visit ResourceSigned by 28 countries and the EU at the first AI Safety Summit (Bletchley Park, November 2023). Commits signatories to international cooperation on frontier AI safety.
Visit ResourceInteractive tracker cataloging AI-related legislation, regulations, and policy initiatives across countries worldwide.
Visit ResourceInteractive tracker cataloging cross-sectoral AI governance bills across all U.S. states. 260+ bills introduced in 2025 alone.
Visit ResourceFull text of the Biden administration's October 2023 executive order on AI safety. Revoked January 2025 by the Trump administration.
Visit ResourceExecutive order revoking EO 14110 and reorienting federal AI policy toward innovation acceleration and business-friendly regulation.
Visit Resource~100 federal actions focused on accelerating AI innovation, building infrastructure (including the Stargate initiative), and leading international AI diplomacy.
Visit ResourceIndia's inaugural global AI summit (February 2026), the first major AI summit hosted in the Global South. 100 countries, 15+ heads of state, 100+ global CEOs.
Visit ResourceThe ~23,000-word constitution of values used to train Claude via Constitutional AI. Defines the principles and values that guide model behavior.
Visit ResourceGoogle's foundational responsible AI principles guiding development and deployment across their AI products and services.
Visit ResourceDetails how Microsoft operationalizes responsible AI at scale: six principles, Frontier Governance Framework, 67 red-teaming operations, and 30+ responsible AI tools.
Visit ResourceGraduated, capability-based framework using AI Safety Levels (ASL-1 through ASL-4+), inspired by Biosafety Levels, scaling safety measures proportionally to model capability.
Visit ResourceGovernment Accountability Office report (GAO-25-107197) examining the benefits and risks of AI in financial services and how federal regulators both oversee and themselves use AI.
Visit ResourceThe first full International AI Safety Report, chaired by Yoshua Bengio and backed by 30 countries, synthesizing the state of evidence on advanced-AI capabilities and risks.
Visit ResourceThe U.S. Department of Defense artificial-intelligence strategy outlining priorities for adopting, scaling, and governing AI across defense operations.
Visit ResourceCISA guidance on principles for securely integrating AI into operational-technology environments such as critical infrastructure and industrial control systems.
Visit ResourceThe FDA’s resource hub on regulating AI/ML-based Software as a Medical Device (SaMD), including its evolving approach to adaptive algorithms in healthcare.
Visit ResourceNTIA’s report weighing the risks and benefits of openly available foundation-model weights, informing U.S. policy on open models.
Visit ResourceWired interview (December 2023). Meta's chief AI scientist and Turing Award winner calls existential risk "premature," "preposterous," and "complete B.S." Argues current LLMs lack persistent memory, reasoning, and planning. Warns existential narratives may justify regulation consolidating power in big tech.
Read InterviewWritten testimony (December 2023). Google Brain founder argues "worrying about existential risk from AI is like worrying about overpopulation on Mars." Focus should be on practical, near-term harms. Guardrails should target AI applications rather than general-purpose AI technology.
Read StatementSimon Willison (June 2025). Identifies the three capabilities that together create critical risk in agentic AI: access to private data, exposure to untrusted content, and the ability to take external actions. If an agent combines all three, attackers can trick it into accessing and exfiltrating private data. The essential design litmus test for evaluating any agentic AI system.
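The three-capability test described above can be sketched as a simple check. This is a minimal illustration, not code from the cited post: the class name, field names, and function names are all hypothetical, and a real review would need to assess each capability far more carefully than a boolean flag allows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    """Hypothetical capability flags for one agent deployment."""
    reads_private_data: bool       # e.g. inbox, files, internal APIs
    sees_untrusted_content: bool   # e.g. web pages, inbound email
    can_act_externally: bool       # e.g. send requests, post messages


def combines_all_three(caps: AgentCapabilities) -> bool:
    """Return True only when all three risky capabilities are present."""
    return (caps.reads_private_data
            and caps.sees_untrusted_content
            and caps.can_act_externally)


def present_capabilities(caps: AgentCapabilities) -> list[str]:
    """List which of the three capabilities this deployment has."""
    flags = {
        "private data access": caps.reads_private_data,
        "untrusted content exposure": caps.sees_untrusted_content,
        "external actions": caps.can_act_externally,
    }
    return [name for name, enabled in flags.items() if enabled]


if __name__ == "__main__":
    browsing_assistant = AgentCapabilities(True, True, True)
    offline_summarizer = AgentCapabilities(True, True, False)
    print(combines_all_three(browsing_assistant))
    print(present_capabilities(offline_summarizer))
```

Removing any one of the three capabilities makes `combines_all_three` return False, which mirrors the design advice: break the combination rather than rely on the model to resist manipulation.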
Read Blog PostCity of Hobart lecture (January 2026). Nobel Laureate and "Godfather of AI" who left Google in May 2023 to freely speak about AI risks. Estimates 10-20% chance of AI-caused human extinction within three decades. "The best way to understand it emotionally is we are like somebody who has this really cute tiger cub."
Watch LectureFeature-length documentary covering DeepMind's AlphaGo defeating world champion Lee Sedol in Go. Documents the cultural and geopolitical impact, including China's subsequent national AI mobilization strategy.
Watch DocumentaryVaswani et al. (2017). The original Transformer paper introducing the self-attention mechanism that underlies modern LLMs. Describes tokens, embeddings, query-key-value attention, multi-head attention, and positional encoding.
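For readers who want to see query-key-value attention concretely, here is a minimal sketch of single-head scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, in plain Python. It assumes small lists of floats and omits masking, multiple heads, and learned projections; the function names are illustrative and not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention for lists of row vectors.

    Each query is compared against every key; the resulting weights
    are used to take a weighted average of the value vectors.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to each key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors, one dimension at a time.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    q = [[1.0, 0.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[10.0, 0.0], [0.0, 10.0]]
    print(attention(q, k, v))
```

In the usage example the query matches the first key more closely, so the output leans toward the first value vector.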
Visit ResourceVisual, intuition-building video series on neural networks, including chapters on Transformers and attention mechanisms. Excellent for building geometric intuition about how these systems work.
Watch PlaylistOriginal work on simulated flocking behavior using three simple rules (cohesion, separation, alignment). A foundational example of emergence: complex collective behavior arising from simple individual rules.
Visit ResourceSebastian Lague. Engaging implementation walkthrough of the Boids algorithm, demonstrating how simple rules produce emergent flocking behavior in simulation.
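The three flocking rules named above (cohesion, separation, alignment) can be sketched in a few lines. This is a simplified 2D illustration under assumed parameters, not the original implementation or the one in the video: the `Boid` class, the rule weights, and the neighbor radius are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class Boid:
    x: float
    y: float
    vx: float
    vy: float


def step(boids, radius=10.0, min_dist=2.0,
         w_cohesion=0.01, w_separation=0.05, w_alignment=0.05, dt=1.0):
    """Return a new list of boids after one update of the three rules."""
    updated = []
    for b in boids:
        neighbors = [
            o for o in boids
            if o is not b
            and (o.x - b.x) ** 2 + (o.y - b.y) ** 2 <= radius ** 2
        ]
        vx, vy = b.vx, b.vy
        if neighbors:
            n = len(neighbors)
            # Cohesion: steer toward the neighbors' center of mass.
            cx = sum(o.x for o in neighbors) / n
            cy = sum(o.y for o in neighbors) / n
            vx += w_cohesion * (cx - b.x)
            vy += w_cohesion * (cy - b.y)
            # Alignment: nudge velocity toward the neighbors' average.
            avx = sum(o.vx for o in neighbors) / n
            avy = sum(o.vy for o in neighbors) / n
            vx += w_alignment * (avx - b.vx)
            vy += w_alignment * (avy - b.vy)
            # Separation: push away from neighbors that are too close.
            for o in neighbors:
                if (o.x - b.x) ** 2 + (o.y - b.y) ** 2 < min_dist ** 2:
                    vx += w_separation * (b.x - o.x)
                    vy += w_separation * (b.y - o.y)
        updated.append(Boid(b.x + vx * dt, b.y + vy * dt, vx, vy))
    return updated


if __name__ == "__main__":
    flock = [Boid(0.0, 0.0, 1.0, 0.0), Boid(6.0, 0.0, 0.0, 1.0)]
    for _ in range(3):
        flock = step(flock)
    print(flock)
```

Each boid only looks at neighbors within `radius`, so any flock-level pattern comes from local interactions rather than a global controller.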
Watch VideoMnih et al. (2013). The original DQN paper from DeepMind. Applies experience replay -- inspired by hippocampal replay in neuroscience -- enabling an agent to learn Atari games from raw pixels.
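Experience replay, as mentioned above, stores past transitions and trains on random samples of them instead of only the most recent step. A minimal sketch of such a buffer follows, assuming a fixed capacity and uniform sampling; the class and method names are illustrative and not taken from the paper's code.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen drops the oldest transition when full.
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self._buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw a uniform random batch without replacement."""
        return self._rng.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=1000, seed=0)
    for t in range(50):
        buffer.add(state=t, action=t % 4, reward=1.0, next_state=t + 1, done=False)
    batch = buffer.sample(8)
    print(len(buffer), len(batch))
```

Sampling at random breaks the correlation between consecutive frames, which is a common reason given for using replay when training value networks.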
Visit ResourceDeepMind blog post covering the development of deep reinforcement learning, from DQN through AlphaGo and beyond. Explains how neural networks combine with RL to achieve superhuman performance.
Visit ResourceDemis Hassabis (2019). MIT lecture by the DeepMind CEO covering self-play, superhuman game playing, and the broader vision for AI systems that learn without human supervision.
Watch LectureO'Reilly, Bhattacharyya, Howard, Ketz. Foundational paper on dual-system learning: fast learning (hippocampus) for episodic memory and slow integration (neocortex) for generalization. Directly inspired dual-system AI architectures.
Visit Resourcevan de Ven, Soures, Kudithipudi. Survey paper covering biological solutions (synaptic consolidation, sleep-based replay) and AI approaches (elastic weight consolidation, progressive networks, rehearsal) to the catastrophic forgetting problem.
Visit ResourceTowards Data Science. Overview of how model performance degrades as real-world data distributions shift over time. Connects to biological memory decay and the need for continuous monitoring and retraining.
Visit ResourceCichy & Kaiser (2019). Explores how deep neural networks serve as scientific models of biological cognition, bridging computational neuroscience and AI. Trends in Cognitive Sciences.
Visit ResourceExplores how transformer attention mechanisms generate new hypotheses about neuron-astrocyte network processing in the brain, demonstrating the bidirectional feedback loop between AI and neuroscience. PMC.
Visit ResourceOpenAI’s announcement of ChatGPT Agent, an agentic mode that can browse the web, use tools, and complete multi-step tasks on a user’s behalf.
Visit ResourceOpenAI’s introduction of GPT-5.5 and its extended reasoning ("thinking") mode for harder tasks.
Visit ResourceGoogle’s overview of Gemini’s agentic capabilities — autonomous task execution, tool use, and multi-step reasoning across the Gemini product line.
Visit ResourceRelease notes for DeepSeek’s V4-Pro model, a useful reference point for the open-weight frontier-model competitive landscape.
Visit ResourceAnthropic’s product page for Claude Opus, including its adaptive "thinking" capabilities for complex reasoning and agentic work.
Visit ResourceAnthropic’s announcement of computer-use, enabling Claude to operate a computer via screenshots and simulated mouse/keyboard actions.
Visit ResourceAnthropic’s open standard for connecting AI assistants to external tools and data sources through a common protocol, now widely adopted across the agent ecosystem.
Visit ResourceAnthropic (February 2026). Documents industrial-scale model distillation by DeepSeek, Moonshot/Kimi K2, and MiniMax, extracting frontier capabilities without safety guardrails. Introduces detection methodologies and countermeasures for IP theft via API-based knowledge extraction.
Visit ResourceLawfare. Analysis of how AlphaGo's victory catalyzed China's military AI strategy, with the PLA framing AI as a revolution in military affairs equivalent to nuclear weapons. Examines the geopolitical implications of AI superiority in defense.
Read ArticleScott Alexander (2014). Foundational essay on coordination failures and race-to-the-bottom dynamics applied to AI development. Uses game theory to analyze how competitive pressures drive collectively harmful outcomes even when all participants prefer cooperation. Content warning: skip the Ginsberg poem if sensitive to explicit content.
Read EssayKen Mogi (2023). Institute of Art and Ideas. Applies the Moloch framework specifically to the AI development race, analyzing prisoner's dilemma dynamics among AI companies and nations. Note: may require free trial to read.
Read ArticleEntry-level, vendor-neutral cybersecurity certification covering core security skills; a widely recognized industry baseline credential.
Visit ResourceVendor-neutral networking certification covering infrastructure, operations, and network-security fundamentals.
Visit ResourceIntermediate certification focused on security analytics, threat detection, and incident response.
Visit ResourceCompTIA’s certification focused on securing AI systems and applying AI within cybersecurity work.
Visit ResourceAdvanced, globally recognized certification for experienced security professionals, spanning eight security domains.
Visit ResourceSANS-affiliated certifications spanning offensive, defensive, forensic, cloud, and management security specialties.
Visit ResourceISACA’s Advanced in AI Security Management credential, focused on governing and securing AI systems.
Visit ResourceThe Artificial Intelligence Governance Professional credential, focused on AI law, policy, and responsible governance.
Visit ResourceAmazon’s specialty certification validating expertise in securing workloads on the AWS cloud platform.
Visit ResourceMicrosoft certification for implementing security controls, identity, and threat protection across Azure.
Visit ResourceOffSec’s hands-on penetration-testing course and the OSCP certification, known for its rigorous 24-hour practical exam.
Visit ResourceBeginner-friendly professional certificate on Coursera covering security fundamentals, SIEM tools, and Python.
Visit ResourceThe U.S. Bureau of Labor Statistics’ authoritative guide to occupations, typical pay, education, and job outlook.
Visit ResourceBLS occupational data for computer and IT roles, including information-security analysts.
Visit ResourceThe BLS official employment-projections release covering expected job growth by occupation and industry.
Visit ResourceA Monthly Labor Review article explaining how BLS accounts for AI’s effects in its long-range employment projections.
Visit ResourceISC2’s annual study quantifying the global cybersecurity workforce and the persistent talent gap.
Visit ResourceFree certifications, training, and community resources for students entering cybersecurity.
Visit ResourceInteractive data on cybersecurity supply and demand, career pathways, and open roles across the U.S.
Visit ResourceLabor-market analytics provider whose skills and demand data underpin many workforce reports.
Visit ResourceTech-focused job board and community featuring startup and enterprise technology roles.
Visit ResourceLarge general-purpose job-search engine aggregating listings across industries.
Visit ResourceProfessional-network job board with listings, referrals, and recruiter outreach.
Visit ResourceEarly-career and university-focused recruiting platform connecting students with employers.
Visit ResourceJob board specializing in roles that require U.S. government security clearances.
Visit ResourceAn aggregated overview of cybersecurity job-market statistics and hiring trends.
Visit ResourceCompensation-benchmarking analysis of AI talent pay and hiring trends.
Visit ResourceA staffing-firm overview of projected AI hiring growth heading into 2026.
Visit ResourceWorld Economic Forum analysis of how technology, including AI, is reshaping skills and employment worldwide.
Visit ResourceCISA’s National Initiative for Cybersecurity Careers and Studies — a training catalog and home of the NICE Workforce Framework.
Visit ResourceAn interactive map of cybersecurity career pathways, roles, and the skills each requires.
Visit ResourceA community-maintained GitHub list of 2026 new-grad and internship roles in AI/ML.
Visit ResourceAn overview of anticipated cybersecurity job trends and in-demand skills for 2026.
Visit ResourceFederal scholarship-for-service program funding cybersecurity and AI-cyber students in exchange for government service after graduation.
Visit ResourceDepartment of Defense scholarship covering full tuition plus a guaranteed DoD civilian position for STEM students.
Visit ResourceNSA student programs including the Stokes Educational Scholarship and paid internships.
Visit ResourceAmazon Web Services careers portal, including cloud-security and responsible-AI roles.
Visit ResourceCareers and student internships at the U.S. Cybersecurity and Infrastructure Security Agency.
Visit ResourceFederal Pathways Programs hiring routes for current students and recent graduates.
Visit ResourceRoles at the European Commission’s AI Office, which implements and enforces the EU AI Act.
Visit ResourceReporting on CISA’s plan to add more than 300 new hires, a signal of federal cyber demand.
Visit ResourceA career-advice nonprofit with in-depth guides on high-impact work in AI safety and policy.
Visit ResourceA community-maintained job board aggregating AI-safety roles worldwide.
Visit ResourceThe University of Tulsa’s online Master of Science in Cyber Security.
Visit ResourceThe University of Tulsa’s interdisciplinary Ph.D. in Cyber Studies, offered by the nation’s first dedicated cyber department to grant B.S., M.S., and Ph.D. degrees. TU is an NSA Center of Academic Excellence.
Visit ResourceThe University of Tulsa’s Bachelor of Science in Applied Artificial Intelligence.
Visit ResourceUniversity of Oklahoma Polytechnic’s B.S. in Applied Artificial Intelligence.
Visit ResourceFlorida International University’s online M.S. in Computer Engineering with an AI-for-cyber / cyber-for-AI focus.
Visit ResourceCarnegie Mellon’s M.S. in Artificial Intelligence Engineering – Information Security (MSAIE-IS).
Visit ResourceStanford’s online graduate certificate in artificial intelligence.
Visit ResourcePurdue Polytechnic’s online graduate certificate in applied AI and cybersecurity.
Visit ResourceA comparison guide to AI-focused cybersecurity master’s degree programs.
Visit ResourceGamified, hands-on penetration-testing labs and challenges across difficulty levels.
Visit ResourceGuided, browser-based cybersecurity training "rooms" for beginners through advanced learners.
Visit ResourceA national collegiate blue-team competition where students defend a live business network.
Visit ResourceA collegiate individual and team CTF competition with detailed skills reporting.
Visit ResourceA national talent pipeline and competition that selects and trains the US Cyber Team.
Visit ResourceMITRE’s Embedded Capture-the-Flag, a hardware/embedded-systems security competition.
Visit ResourceA data-science and machine-learning competition platform with datasets, notebooks, and challenges.
Visit ResourceLocal, year-round DEF CON community groups (DCGs) that meet between conferences.
Visit ResourceThe DEF CON village dedicated to AI security, home to large-scale generative-AI red-teaming events.
Visit ResourceA premier commercial information-security conference and professional training series.
Visit ResourceA large industry security conference covering enterprise security, policy, and emerging threats.
Visit ResourceUSENIX’s academic security conferences, top venues for systems-security research.
Visit ResourceCommunity-organized, local security "unconferences" held in cities worldwide.
Visit ResourceThe Conference on Neural Information Processing Systems, a leading machine-learning research venue.
Visit ResourceThe International Conference on Learning Representations, a top deep-learning research venue.
Visit ResourceModel Evaluation & Threat Research (METR), a nonprofit that evaluates frontier-model autonomous capabilities and dangerous-capability thresholds.
Visit ResourceA part-time online research program that helps newcomers start concrete AI-safety projects.
Visit ResourceThe ML Alignment & Theory Scholars program, pairing scholars with experienced alignment-research mentors.
Visit ResourceFree, structured courses on AI alignment and governance from BlueDot Impact.
Visit ResourceAnthropic’s alignment-research hub and fellows program.
Visit ResourceUC Berkeley’s Center for Human-Compatible AI — alignment research and open roles.
Visit ResourceAn AI-safety organization focused on evaluating deceptive and scheming behaviors in advanced models.
Visit ResourceA research nonprofit behind the widely signed Statement on AI Risk, plus safety research and field-building.
Visit ResourceAn academic workshop series on engineering safe and trustworthy AI systems.
Visit ResourceA research community and resource hub for machine-learning safety, including competitions like Trojan Detection.
Visit ResourceAn AI-security research company evaluating frontier-model cyber capabilities; careers page.
Visit ResourceA rationality and AI-alignment discussion forum central to the alignment community.
Visit ResourceA grassroots, open-source AI research collective organized around a public Discord.
Visit ResourceAn open engineering consortium behind ML benchmarks such as MLPerf and AI-safety benchmarks.
Visit ResourceThe UK AI Security Institute (renamed from the AI Safety Institute in February 2025), which evaluates frontier-AI risks and informs government policy.
Visit ResourceThe U.S. AI Safety Institute — now the Center for AI Standards and Innovation (CAISI) — at NIST.
Visit ResourceUniversity of Tulsa announcement on launching its applied-AI degree.
Visit ResourceTU news on Tulsa’s $51M federal Tech Hub implementation award.
Visit ResourceTU news on the Oklahoma Cyber Innovation Institute launching the state’s first cyber range.
Visit ResourceThe federal Economic Development Administration’s designation of the Tulsa Regional Tech Hub.
Visit ResourceOklahoma Department of Commerce on TU launching a cyber institute with a projected $75M investment.
Visit ResourceTulsa Innovation Labs, the organization driving the region’s technology-cluster strategy.
Visit ResourceThe OCII initiative building Tulsa’s cyber ecosystem, workforce, and research capacity.
Visit ResourceAn overview of Tulsa-area cybersecurity employers and the skills they look for.
Visit ResourceCivilian and cyber career opportunities at Tinker Air Force Base in Oklahoma City.
Visit ResourceThe FAA’s Oklahoma City campus, a major federal employer in aviation IT and operations.
Visit ResourceTulsa-headquartered retailer; corporate, IT, and technology careers.
Visit ResourceAerospace manufacturer with major Oklahoma operations; engineering and IT careers.
Visit ResourceTulsa-based financial-services firm; technology and security careers.
Visit ResourceEnergy-infrastructure company with Oklahoma presence; careers.
Visit ResourcePSO (an AEP company); careers across Oklahoma utilities operations.
Visit ResourceAnthropic’s landmark work extracting millions of interpretable features from Claude 3 Sonnet using sparse autoencoders — a major step toward understanding a model’s internal concepts.
Visit ResourceAnthropic’s interpretability publication venue, home to foundational work on superposition, features, and circuits in transformer models.
Visit ResourceA practical getting-started guide to mechanistic-interpretability research, with concrete advice on entering the field.
Visit ResourceOpenAI research on whether weak supervisors can elicit the full capabilities of much stronger models — a core question for scalable oversight.
Visit Resource