Skip to main content

AI Bug Bounty

Organizations: Best practices for running a successful AI-focused bounty program

Updated today

HackerOne bug bounty programs for AI leverage the world’s largest community of security researchers to continuously test AI systems for security vulnerabilities and safety issues. This guide outlines the best practices for running a successful AI-focused bounty program, from defining scope to engaging effectively with the research community.

Understanding AI Bug Bounties

AI bug bounties focus on identifying and mitigating issues unique to AI systems. These programs cover both AI security and AI safety domains.

AI Security

AI security vulnerabilities are technical issues that exploit weaknesses specific to AI systems. Examples include:

  • System prompt leaks that expose confidential instructions or logic.

  • Retrieval-Augmented Generation (RAG) manipulation that alters or misuses knowledge sources.

  • Adversarial input exploitation that triggers unintended model behavior.

AI Safety

AI safety testing focuses on Trust and Safety concerns that go beyond traditional security. These include:

  • Harmful or biased outputs that could lead to ethical or reputational risks.

  • Hallucinations that produce inaccurate or fabricated information.

  • Misuse scenarios that are difficult to simulate in-house, such as generating disallowed content or evading safeguards.

Defining and Scoping AI Model Assets

Properly defining your AI assets is the first step to effective testing. HackerOne provides specific tooling to support this.

  • The AI Model Asset Type: Our platform includes a dedicated AI Model asset type. Use this for any large language model (LLM), AI integration endpoint, or direct model link. This asset type allows HackerOne to track these unique assets and match researchers with the right AI skills for your program.

  • Adding AI Model Assets: You can add AiModel assets individually or in bulk via CSV import. This is a case-sensitive name.

Platform Asset Type

Description

Case-Sensitive Name for Importing

Example

AiModel

AI integration endpoint or direct model link.

AiModel

LLM-06-12-2023

  • Scoping Your Program: Once defined, you can add your AI Models as in-scope for your bounty program. A clear scope should include the AI-specific vulnerabilities you want to receive reports on, such as those listed in the OWASP Top 10 for LLMs.

The AI Model Leaderboard

HackerOne has an Asset Types Leaderboard to help you identify security researchers with proven experience in specific asset types. This leaderboard highlights specialists with a strong reputation for discovering valid vulnerabilities in areas such as AI Models. You can use it to invite top AI researchers directly to your program.

AI Security Bounty Engagement Best Practices

AI is now a critical component of most organizations, which makes robust security testing essential. Because AI systems behave differently from traditional web applications, it’s important to equip researchers with a clear scope, robust testing environments, attractive incentives, and the right resources.

The following sections outline the key pillars for launching a successful AI-focused bounty engagement. They also provide guidance for conversations with your Customer Success Manager (CSM) or Technical Account Manager (TAM).

Defining a Clear, Focused Scope

When defining the scope for an AI model, consider not only the traditional program scope but also the specific AI assets and the boundaries of testing. For example:

  • Program scope: AI Chatbot

  • Testing scope: AI Chatbot Safety

Start by cataloging every AI component you plan to test—such as chat interfaces, API endpoints, backend orchestration layers, file upload processors, and downstream services the AI can access. Clearly list the AI-specific vulnerabilities you want to receive reports on, relevant to your business context. For instance:

“The AI assistant should not access data belonging to other users or make account changes for the current user.”

Include any additional AI safety concerns, such as bias exploits, harmful content bypasses, or legal and regulatory risks. As a starting point, we recommend consulting the OWASP Top 10 for LLMs.

Equally important is identifying what falls out of scope—for example, harmless hallucinations, irrelevant outputs, or bias without security impact. Defining both in-scope and out-of-scope areas up front prevents confusion and ensures researchers focus their efforts where it matters most.

We recommend running threat modeling exercises periodically, either with the HackerOne team or internally, to verify that your scope continues to align with your organization’s evolving security goals.

Preparing a Supportive Testing Environment

Researchers need an environment that is both safe and as close to production as possible. Ideally, set up a sandbox or staging instance populated with dummy data that can be reset as needed.

If a full sandbox is not possible, make sure production safeguards such as rate limits and content filters are documented and adjustable for testers. Streamline access by pre-creating test accounts or API keys, and set rate limits that allow researchers to probe thoroughly without being blocked.

We recommend freezing non-critical updates to the AI system during the bounty window. If an emergency patch is required, communicate the change promptly to prevent researchers from testing moving targets.

Attractive, Aligned Incentives

Given the novelty and complexity of AI exploits, reward structures must reflect both effort and impact. Set competitive bounty ranges for AI vulnerabilities by mapping issues such as prompt injections that expose private data to higher payout levels. Define severity criteria with concrete examples to ensure consistency.

Assuming the AI asset in scope has some security and safety controls, we recommend the following minimum bounty levels:

  • $7000–$10,000 for Critical reports

  • $2500 – $5000 for High reports

  • $750 – $2000 for Medium reports

  • $100 – $500 for Low reports

For AI safety issues, use flags with a fixed bounty amount per severity level. This approach accounts for multiple unique methods of triggering the same issue. For example, a prompt that generates an image containing blood would qualify as a valid flag with a potential bounty of $250.

If an AI asset has already undergone extensive testing or requires a complex setup, adjust the bounty table to remain competitive:

  • $7000–$10,000 for Critical reports

  • $2500 – $5000 for High reports

  • $750 – $2000 for Medium reports

  • $100 – $500 for Low reports

AI assets that are significantly hardened may warrant a higher reward tier.

You can also boost engagement with AI-specific bonuses, such as first-finder awards for certain vulnerability types. Non-monetary recognition—such as blog features, hall-of-fame mentions, shout-outs, or speaking opportunities—can further motivate the community. Above all, transparency around how you grade and reward AI vulnerabilities builds trust and drives sustained participation.

Example Severity Criteria

Severity level

Example vulnerability categories

Bounty payment

Critical

  • Insecure Plugin Design

  • Insecure Output Handling

  • High-Impact Prompt Injection

Examples:

RAG Data Poisoning: An Attacker can inject or overwrite retrieval sources so that every user sees malicious or misleading content.

Unauthorized Account Takeover: Attacker uses the assistant to change another user’s password or permissions without proper authentication.

$7000

High

  • Supply Chain Vulnerabilities

  • Broad Sensitive Information Disclosure / Inferred Sensitive Data

Examples:

Prompt Injection with Limited Impact: Attacker crafts a prompt that causes the assistant to execute administrative commands (e.g. modify user settings) on behalf of another user.

Context Leakage: Assistant reveals hidden system prompts or sensitive request headers.

$5000

Medium

  • Sensitive Information Disclosure / Inferred Sensitive Data about another user

  • Excessive Agency

Examples:

  • System Information Disclosure: Finding a way to call internal APIs that should be restricted, but no actual data or settings change occurs without additional steps.

  • RAG Retrieval Bypass: Attacker triggers the retrieval of non‐public documents without altering them

$1500

Low

  • Low Severity Context Leakage

Examples:

  • Prompt leaks that lead to non-sensitive internal information about the model, such as its original conditioning prompt(s).

  • Sensitive Information Disclosure / Inferred Sensitive Data only about the current user.

$500

Comprehensive Documentation and Tooling

High-quality documentation is key to a successful AI security bug bounty program. Give researchers the context and tools they need to test effectively.

  • Share an architecture overview of your AI pipeline. Include data flows between chat interfaces, APIs, orchestration layers, and downstream services.

  • Add sample code or a Postman collection that demonstrates how to query the model as an attachment on the Security Page.

  • Summarize existing safety controls and data-handling practices so researchers understand the baseline and any known issues.

  • Provide any internal or community tools for prompt fuzzing or adversarial input generation.

  • Document rate limits, content filters, and other guardrails; note how testers can request adjustments during the engagement.

We recommend keeping these materials current and versioned. Lowering the learning curve attracts a broader pool of researchers and leads to higher-quality findings.

Tip: Hai can generate architecture diagrams for you!

Active Communication and Educational Support

AI security is still emerging; researchers value real-time guidance. Establish clear channels and proactive education to keep momentum high.

  • Create a dedicated contact channel. Set up an email alias monitored by security and engineering. Publish expected response hours and escalation steps.

  • Provide a short kickoff recording—a 5–10 minute demo that explains intended behavior, known boundaries, test accounts, and how to request rate-limit adjustments. Attach it to the Security Page.

  • Offer office hours. Host sessions during launch and after major updates; record them and add links on the Security Page.

  • Publish a living FAQ or clarifications document. Post updates quickly in response to common questions. Date each change and summarize what changed.

  • Announce changes promptly. If you ship an emergency patch or modify safeguards, post a brief notice with a timestamp and version so researchers are not testing a moving target.

  • Centralize resources. Keep links to architecture overviews, sample requests, safety controls, and tools in one place on the Security Page.

We recommend keeping communications concise and consistent. Clear guidance reduces friction, attracts more researchers, and sustains engagement.

Did this answer your question?