EVMbench
Open-source benchmark evaluating AI agents' ability to detect, patch, and exploit smart contract vulnerabilities on the Ethereum Virtual Machine.
What is EVMbench?
EVMbench is an open-source benchmark framework launched by OpenAI and Paradigm in February 2026 that evaluates how well AI agents can analyze, detect, patch, and exploit smart contract vulnerabilities. The benchmark draws on 120 curated vulnerabilities from 40 real-world audits and security competitions, including scenarios from the Tempo blockchain. It measures three capability modes: Detect (vulnerability auditing), Patch (vulnerability remediation), and Exploit (end-to-end attack execution in a sandboxed environment). EVMbench aims to encourage the use of AI defensively to audit and strengthen deployed smart contracts that secure over $100B in crypto assets.
Best for: AI agent evaluation · Security research · DeFi security benchmarking

Developer Stack Fit
Quick read on where EVMbench fits in a software team's AI stack. Validate final fit against your codebase, data policy, and deployment model.
- Stack layer: AI Security
- Deployment model: Open-source deployable
- Open-source status: Yes or source-available
- API support: API or integration-friendly
- MCP support: No MCP signal found
- Security posture: Stronger controls worth validating
- Best use case: AI agent evaluation
Key Features
- 01
Detect mode: AI agents audit smart contracts and score on vulnerability recall
First benchmark for AI smart contract security capabilities
- 02
Patch mode: AI agents modify vulnerable contracts while preserving functionality
Real-world vulnerabilities from professional audits
- 03
Exploit mode: AI agents execute end-to-end fund-draining attacks in sandbox
Three evaluation modes covering full security workflow
- 04
120 curated vulnerabilities from 40 real-world audits
Open-source and freely available
- 05
Scenarios from Tempo blockchain for payment-oriented contracts
- 06
Automated task auditing agents for quality control
- 07
Custom graders and red-teaming to prevent exploitation
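To make the Detect-mode idea of scoring on vulnerability recall concrete, here is a minimal Python sketch. It is not EVMbench's actual grader: the `DetectResult` type, the `detect_recall` function, the vulnerability IDs, and the choice to pool all audits into one recall figure are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectResult:
    """One audit's outcome in a hypothetical Detect-mode run."""
    audit_id: str
    known_vulns: frozenset      # ground-truth vulnerability IDs for the audit
    reported_vulns: frozenset   # IDs the agent reported (may include false positives)


def detect_recall(results):
    """Share of known vulnerabilities the agent found, pooled over all audits.

    Pooling (micro-averaging) is an assumption; a real grader could instead
    average per audit or weight findings by severity.
    """
    total = sum(len(r.known_vulns) for r in results)
    if total == 0:
        return 0.0
    found = sum(len(r.known_vulns & r.reported_vulns) for r in results)
    return found / total


# Hypothetical data: 4 known vulnerabilities across 2 audits, 2 of them found.
results = [
    DetectResult("audit-A", frozenset({"V1", "V2", "V3"}), frozenset({"V1", "V3", "X9"})),
    DetectResult("audit-B", frozenset({"V4"}), frozenset()),
]
print(detect_recall(results))  # 0.5
```

Recall only counts known vulnerabilities that were found, so the spurious report `X9` does not lower the score here. A grader that also penalizes false positives would need a precision-style term.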
Pros & Cons
What stands out
- First-of-its-kind benchmark for AI security capabilities in DeFi
- Based on real audit data, not synthetic vulnerabilities
- Encourages defensive use of AI for contract auditing
- Open-source framework for researchers and developers
- Covers the full security workflow (detect, patch, exploit)
Watch outs
- Research benchmark, not a production security tool
- Limited to EVM-compatible contracts
- Exploit mode is for evaluation only, not actual attacks
- Requires AI agents to run the benchmark
Pricing Plans
EVMbench Pricing
Open Source
Most teams land on the Open Source plan.
FAQ
What is EVMbench and how does it work?
EVMbench is an open-source benchmark that evaluates AI agents' ability to detect, patch, and exploit smart contract vulnerabilities on the Ethereum Virtual Machine. It runs agents in three modes (Detect, Patch, and Exploit) against 120 curated vulnerabilities drawn from 40 real-world audits and security competitions, with Exploit tasks executed in a sandboxed environment.
Is EVMbench free to use?
EVMbench offers a completely free plan. You can get started without paying anything.
Is there a free plan or trial?
EVMbench is open source and free to use, so no trial is needed. This listing shows no paid plans.
What can EVMbench do?
More Development Tools
Cursor
AI-powered code editor with autonomous agents, multi-model support, and Automations for triggering agents via code changes, Slack, or timers.
TurboQuant
Revolutionary KV cache compression achieving 6x memory reduction and 8x speedup for LLM inference with zero accuracy loss.
Ollama
Local-first LLM runtime for running models on your hardware with local privacy, no per-token API costs, and offline-capable workflows.
Affiliate Disclosure: We may earn a commission when you purchase through links on our site. This doesn't affect our editorial independence or the price you pay.