OpenAI Launches Safety Bug Bounty: First AI Safety Vulnerability Program with Up to $100K Rewards

OpenAI launched a Safety Bug Bounty program — the industry's first vulnerability bounty specifically targeting AI abuse and safety risks. It covers agentic risks (MCP abuse, prompt injection, data exfiltration), proprietary information leaks, and platform integrity violations. High-severity reports can earn up to $100K. Simultaneously, researchers disclosed and patched a Codex command injection flaw and a ChatGPT data leak vulnerability.

OpenAI Safety Bug Bounty: AI Security Enters the

'Crowdsourced Defense' Era #

Why AI-Specific Bounties?

Traditional security bounties target software engineering vulnerabilities. AI systems face novel threats — prompt injection, agent privilege abuse, data exfiltration, model information leakage — that require specialized security research communities. #

Coverage

Agentic risks (MCP abuse, third-party prompt injection, data exfiltration, unauthorized operations), proprietary information leakage, and platform integrity violations. High-severity reports earn up to $100K. General jailbreaks without safety impact are excluded. #

Concurrent Disclosures

Simultaneously, researchers disclosed a Codex command injection flaw (GitHub token theft potential) and a ChatGPT data exfiltration channel — both patched by OpenAI. #

Industry Impact

OpenAI's program pioneers 'crowdsourced defense' for AI safety. Expect Anthropic, Google, and Meta to launch similar programs soon, marking AI security's transition from internal-only to community-collaborative defense. #

Concrete Agent Security Incidents

Recent disclosed incidents illustrating Safety Bug Bounty's necessity: Codex GitHub Token theft (March 2026, hidden channel in code execution environment exfiltrating user tokens for full private repository access), ChatGPT data exfiltration channel (encoding sensitive conversation history into seemingly normal code execution output), and MCP protocol injection (February 2026, multiple independent discoveries of malicious tools injecting hidden instructions via MCP responses to control agent behavior). #

New Paradigm for AI Security Research Safety Bug

Bounty marks several shifts: from 'model security' to 'system security' (covering agent architecture, tool integration, platform infrastructure), from 'academic research' to 'combat defense' (incentivizing real-world vulnerability discovery over theoretical risks), and from 'internal-only' to 'crowdsourced' (leveraging global security research community to expand testing coverage). #

Developer Implications

The bounty's coverage effectively defines an 'AI application security checklist': Can your agent resist third-party prompt injection from external data? Does your agent follow least-privilege principles? Does your agent have data exfiltration protection? Does your platform have anti-automation controls? These questions should be part of every AI application's security review process. #

Historical Context: From Security Bounties to Safety Bounties

The evolution from traditional bug bounties to AI safety bounties reflects a fundamental shift in what constitutes 'vulnerability.' Traditional bounties focus on unauthorized access (reading data you shouldn't, executing code you shouldn't). AI safety bounties introduce a new category: unauthorized influence (making the AI do things it shouldn't, extracting information through conversational manipulation rather than code exploitation). This shift requires different skills from security researchers. Traditional penetration testers think in terms of code execution paths and memory corruption. AI safety researchers need to think in terms of semantic manipulation, context exploitation, and emergent behaviors in complex multi-agent systems. The Safety Bug Bounty effectively creates a new professional discipline: AI red-teaming as a service. #

Comparison

with Anthropic's Approach Anthropic has maintained an internal red team and partners with external researchers but hasn't launched a formal public bounty program as of March 2026. Google runs a vulnerability rewards program that includes some AI-specific categories but doesn't have a dedicated AI safety bounty. OpenAI's move puts pressure on competitors to follow suit, potentially creating an industry-wide AI safety testing ecosystem.