Methodology
Data Collection, Ranking Rules & Model Classification
Data Sources
All benchmark results are collected from published papers and official repositories. We do not re-run experiments.
Ranking Rules
Models are ranked by their primary metric on each benchmark. For Adroit, DexArt, and Bi-DexHands, this is the Mean Success Rate.
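The ranking rule above can be sketched in a few lines of Python. This is a minimal illustration, not the leaderboard's actual implementation: the record layout, the `rank_by_benchmark` helper, and the model names and scores are all hypothetical placeholders. It also assumes that ties keep their input order, which the text does not specify.

```python
from collections import defaultdict

# Hypothetical records: names and scores are placeholders, not real results.
results = [
    {"model": "model_a", "benchmark": "Adroit", "mean_success_rate": 0.72},
    {"model": "model_b", "benchmark": "Adroit", "mean_success_rate": 0.81},
    {"model": "model_a", "benchmark": "DexArt", "mean_success_rate": 0.64},
    {"model": "model_b", "benchmark": "DexArt", "mean_success_rate": 0.58},
]


def rank_by_benchmark(records, metric="mean_success_rate"):
    """Rank models separately within each benchmark, highest metric first."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["benchmark"]].append(record)

    rankings = {}
    for benchmark, rows in grouped.items():
        # sorted() is stable, so tied models keep their input order (an assumption).
        ordered = sorted(rows, key=lambda r: r[metric], reverse=True)
        rankings[benchmark] = [
            (position, row["model"], row[metric])
            for position, row in enumerate(ordered, start=1)
        ]
    return rankings


rankings = rank_by_benchmark(results)
for benchmark, rows in rankings.items():
    for position, model, score in rows:
        print(f"{benchmark}: #{position} {model} ({score:.2f})")
```

Each benchmark gets its own ranking, so a model's position on one benchmark says nothing about its position on another.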
Known Limitations
Results across different benchmarks are not directly comparable. Because we do not re-run experiments, results on the same benchmark may also come from papers that use slightly different evaluation protocols.
Model Classification
We classify models into two categories based on their open-source status:
Open-Source Models
Models with publicly available code are marked with an "Open Source" badge. These models provide the highest level of reproducibility and transparency.
Other Models
Models without the "Open Source" badge include: (1) models whose code repository we could not find, and (2) models whose code release was still marked "Coming Soon" as of the data collection deadline. These models are hidden by default but can be shown using the "Include All Models" toggle.
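The default visibility rule can be illustrated as a simple filter. This is a sketch under stated assumptions: the `open_source` field, the `visible_models` helper, and the model names are hypothetical and only mirror the behaviour of the "Include All Models" toggle described above.

```python
# Hypothetical entries: the "open_source" flag stands in for the "Open Source" badge.
models = [
    {"model": "model_a", "open_source": True},
    {"model": "model_b", "open_source": False},  # code repository not found
    {"model": "model_c", "open_source": False},  # code marked "Coming Soon"
]


def visible_models(entries, include_all=False):
    """Return the entries shown on the leaderboard for a given toggle state."""
    if include_all:
        # Toggle on: show every model, with or without the badge.
        return list(entries)
    # Toggle off (default): show only models with the "Open Source" badge.
    return [entry for entry in entries if entry["open_source"]]


default_view = visible_models(models)
full_view = visible_models(models, include_all=True)
print([entry["model"] for entry in default_view])
print([entry["model"] for entry in full_view])
```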
Data Notice
- Data notice last updated: Feb 2, 2026.
- If you find any errors or omissions, please let us know by creating an issue on GitHub or contacting us via email: business@evomind-tech.com
Disclaimer
Cross-benchmark comparisons should be avoided. Each benchmark has its own evaluation protocol and metrics.
⭐ Support This Project
If you find this leaderboard helpful for your research, please consider giving us a star on GitHub!
Contact Us
Found errors or want to submit your model? Reach out via GitHub Issue or email!