OpenAI Launches Safety Bug Bounty: Up to $100K for AI Safety Vulnerabilities, Agent Risks Prioritized
OpenAI launched the industry's first Safety Bug Bounty targeting AI abuse and safety risks, covering agentic risks (MCP abuse, prompt injection, data exfiltration), proprietary information leaks, and platform integrity violations. High-severity reports earn up to $100K. Concurrent disclosures include a Codex command injection flaw and ChatGPT data leak channel, both patched.
OpenAI Safety Bug Bounty: AI Security Enters the 'Crowdsourced Defense' Era
Why AI-Specific Bounties?
Traditional bug bounties target software engineering vulnerabilities (XSS, SQL injection, buffer overflow). AI systems face fundamentally different threats: prompt injection manipulating model behavior, agent privilege abuse exceeding authorization, data exfiltration leaking training data or user privacy, and model information leakage inferring internal parameters. OpenAI recognized these AI-specific risks need dedicated security research communities.
Coverage Scope
Agent risks (highest priority): MCP protocol abuse, third-party prompt injection when agents process external data, data exfiltration to unauthorized destinations, large-scale unauthorized operations. Proprietary information: reasoning process exposing internal information, training data extraction methods. Platform integrity: bypassing anti-automation controls, manipulating trust signals, evading bans.
High-severity reports earn up to $100K. General jailbreaks are out of scope, but those causing direct user harm are evaluated case-by-case.
Concurrent Real-World Disclosures
Codex command injection (BeyondTrust): hidden channel in coding agent exfiltrating user GitHub tokens for full private repository access. ChatGPT data leak: code execution sandbox with information leakage channel encoding conversation history into seemingly normal output. MCP protocol injection (February 2026): multiple independent discoveries of malicious tools injecting hidden instructions via MCP responses.
Paradigm Shift: Model Security to System Security
Early AI security focused on model outputs (jailbreaks, harmful content). Safety Bug Bounty extends to entire AI systems — agent architecture, tool integration, platform infrastructure. AI security is no longer 'what the model shouldn't say' but 'whether the entire AI system is secure across complex interactions.'
Developer Implications
The bounty effectively defines an 'AI application security checklist': Can your agent resist third-party prompt injection? Does it follow least-privilege principles? Does it have data exfiltration protection? Does your platform have anti-automation controls?
Expect Anthropic, Google, Meta to launch similar programs within months. AI security is transitioning from 'internal-only defense' to 'global security community crowdsourced collaboration' — a fundamental shift that will reshape how AI products are developed and maintained.